Bharat MiniGPT 350M: A Custom GPT-Style AI Model Built from Scratch in India

The global AI industry is currently dominated by large language models such as GPT, LLaMA, Gemini, and Claude. However, independent developers and researchers in India are also beginning to build their own custom AI models from the ground up. One such project is Bharat MiniGPT 350M, a custom GPT-style causal language model developed by Harshvardhan Mishra.

Unlike many projects that simply fine-tune existing models, Bharat MiniGPT 350M was trained from scratch using a manually implemented Transformer architecture in PyTorch. The project focuses on understanding and building modern LLM systems rather than relying entirely on existing pretrained models.

What is Bharat MiniGPT 350M?

Bharat MiniGPT 350M is a decoder-only Transformer language model with approximately 350 million parameters. The model is built from modern LLM architecture components, including:

RoPE (Rotary Position Embedding)
RMSNorm
SwiGLU feed-forward layers
Scaled dot-product attention (SDPA)
Flash Attention compatibility

The current release is an experimental base model pretrained on about 3 billion tokens. It has not been instruction-tuned yet.
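
Two of the components listed above, RMSNorm and SwiGLU, are small enough to sketch in a few lines of PyTorch. This is a minimal illustration of the general techniques, not the project's own code: the class names, the `eps` value, the bias-free linear layers, and the feed-forward hidden size of 2730 are all assumptions, since the article does not publish them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescales features, no mean subtraction."""

    def __init__(self, dim: int, eps: float = 1e-6):  # eps is an assumed value
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable per-feature gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Multiply by 1 / RMS over the last dimension, then apply the gain.
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * inv_rms * self.weight


class SwiGLU(nn.Module):
    """Gated feed-forward layer: down(silu(gate(x)) * up(x))."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden_dim, bias=False)
        self.up = nn.Linear(dim, hidden_dim, bias=False)
        self.down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))


x = torch.randn(2, 8, 1024)           # (batch, sequence, embedding size 1024)
h = RMSNorm(1024)(x)                  # same shape, unit RMS per token
y = SwiGLU(1024, hidden_dim=2730)(h)  # hidden_dim is an assumed value
print(h.shape, y.shape)
```

Both layers preserve the input shape, so they can be dropped into a pre-norm Transformer block around the attention and feed-forward sublayers.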

Built from Scratch, Not Fine-Tuned

One of the most important aspects of Bharat MiniGPT 350M is that it is not a fine-tuned GPT-2 or LLaMA variant. The architecture, Transformer blocks, attention pipeline, and training workflow were manually implemented in PyTorch.
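
To give a sense of what a hand-written attention pipeline involves, the sketch below applies rotary position embeddings to the queries and keys and then calls PyTorch's built-in scaled dot-product attention with a causal mask. It illustrates the general technique only. The function names, the "rotate half" RoPE layout, and the base of 10000 are assumptions; the head size of 64 follows from the stated embedding size of 1024 and 16 heads.

```python
import torch
import torch.nn.functional as F


def rope_tables(seq_len: int, head_dim: int, base: float = 10000.0):
    """Precompute cos/sin tables of shape (seq_len, head_dim)."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)  # (seq, head_dim/2)
    angles = torch.cat([angles, angles], dim=-1)                   # (seq, head_dim)
    return angles.cos(), angles.sin()


def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Rotate channel pairs by a position-dependent angle ("rotate half" form)."""
    x1, x2 = x.chunk(2, dim=-1)
    rotated = torch.cat([-x2, x1], dim=-1)
    return x * cos + rotated * sin


def causal_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """q, k, v: (batch, heads, seq, head_dim). Returns the same shape."""
    cos, sin = rope_tables(q.shape[-2], q.shape[-1])
    q, k = apply_rope(q, cos, sin), apply_rope(k, cos, sin)
    # PyTorch selects a fused kernel (including Flash Attention) when one is available.
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)


# 16 heads x 64 dims = embedding size 1024.
q = k = v = torch.randn(1, 16, 32, 64)
out = causal_attention(q, k, v)
print(out.shape)
```

Because RoPE is a pure rotation, it changes the direction of each query and key vector but not its length, and it adds no learned parameters.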

After the implementation was working and tested, the model was integrated into the HuggingFace ecosystem to make loading and inference easier.

This makes the project an important independent LLM engineering experiment originating from India.

Model Architecture

Bharat MiniGPT 350M uses a modern decoder-only Transformer architecture optimized for efficient language modeling.

Component	Details
Parameters	~350 Million
Architecture	Decoder-only Transformer
Layers	24 Transformer Blocks
Attention Heads	16
Embedding Size	1024
Context Length	768 Tokens
Vocabulary Size	50,257
Position Encoding	RoPE
Normalization	RMSNorm
Feed Forward	SwiGLU
Attention	SDPA / Flash Attention Compatible
Precision	FP16 Training
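
As a sanity check, the figures in the table can be combined into a rough parameter count. The calculation below is an estimate under stated assumptions, not the author's own accounting: it assumes bias-free linear layers, tied input and output embeddings, and a SwiGLU hidden size of 2730 (about 8/3 of the embedding size), none of which the article specifies.

```python
def estimate_parameters(
    vocab_size: int = 50_257,
    d_model: int = 1024,
    n_layers: int = 24,
    ffn_hidden: int = 2730,        # ASSUMPTION: about 8/3 * d_model; not published
    tied_embeddings: bool = True,  # ASSUMPTION: weight tying is enabled
) -> int:
    embedding = vocab_size * d_model
    attention = 4 * d_model * d_model        # Q, K, V and output projections
    feed_forward = 3 * d_model * ffn_hidden  # gate, up and down projections
    norms = 2 * d_model                      # two RMSNorm gains per block
    per_layer = attention + feed_forward + norms
    total = embedding + n_layers * per_layer + d_model  # + final RMSNorm gain
    if not tied_embeddings:
        total += vocab_size * d_model        # separate output projection
    return total


total = estimate_parameters()
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
```

Under these assumptions the estimate comes to 353,454,080 parameters, which is consistent with the stated size of roughly 350 million.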

Training Data

The model was trained on a weighted mixture of publicly available datasets.

Dataset	Weight
FineWeb	40%
FineWeb-Edu	30%
Wikipedia English	30%

This dataset mixture was designed to balance internet-scale knowledge, educational content, and encyclopedic information.

The total training corpus contained approximately 3 billion tokens.
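
A weighted mixture like this can be read in two ways: as a share of the total token budget, or as the probability of drawing the next document from each source. The sketch below shows both readings. It is illustrative only; the article does not say which interpretation the training pipeline used, and the function name `sample_source` is hypothetical.

```python
import random

MIXTURE = {"FineWeb": 0.40, "FineWeb-Edu": 0.30, "Wikipedia English": 0.30}
TOTAL_TOKENS = 3_000_000_000

# Reading 1: each weight is a share of the ~3B-token corpus.
token_budget = {name: round(weight * TOTAL_TOKENS) for name, weight in MIXTURE.items()}


def sample_source(rng: random.Random) -> str:
    """Reading 2: pick the dataset that supplies the next training document."""
    names = list(MIXTURE)
    return rng.choices(names, weights=[MIXTURE[n] for n in names], k=1)[0]


rng = random.Random(0)
counts = {name: 0 for name in MIXTURE}
for _ in range(10_000):
    counts[sample_source(rng)] += 1

for name in MIXTURE:
    print(f"{name:<18} budget={token_budget[name]:>13,}  sampled={counts[name] / 10_000:.1%}")
```

Under the first reading, FineWeb would contribute about 1.2 billion tokens and the other two sources about 0.9 billion each.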

Training Setup

The training pipeline relied on optimization techniques that are standard in large-scale LLM training.

Setting	Value
Optimizer	AdamW
Learning Rate	3e-4
Min LR	3e-5
Warmup Steps	51,200
LR Scheduler	Cosine Decay
Gradient Accumulation	128
Mixed Precision	FP16
Gradient Clipping	1.0
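
The learning-rate settings in the table describe a warmup followed by cosine decay. A minimal sketch of such a schedule is shown below. The linear warmup shape and the total step count are assumptions, since the article states neither, and it is also not stated whether the 51,200 warmup steps are counted in micro-batches or in optimizer updates.

```python
import math

MAX_LR, MIN_LR, WARMUP_STEPS = 3e-4, 3e-5, 51_200


def learning_rate(step: int, total_steps: int) -> float:
    """Linear warmup to MAX_LR, then cosine decay down to MIN_LR."""
    if step < WARMUP_STEPS:
        return MAX_LR * (step + 1) / WARMUP_STEPS
    if step >= total_steps:
        return MIN_LR
    progress = (step - WARMUP_STEPS) / (total_steps - WARMUP_STEPS)
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 500_000  # hypothetical; the article does not state the step count
for step in (0, 25_600, 51_200, 275_600, 500_000):
    print(f"step {step:>7}: lr = {learning_rate(step, TOTAL_STEPS):.2e}")
```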

Gradient checkpointing was also used. It reduces activation memory at the cost of extra compute, which helps training fit on limited hardware.
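
The sketch below shows how gradient accumulation, gradient clipping, and gradient checkpointing typically fit together in one PyTorch optimizer update. It is a simplified illustration rather than the project's training loop: the tiny stand-in model, the random data, and the loss are placeholders, and FP16 autocast with loss scaling is left out so that the example runs on a CPU.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)
# Tiny stand-in model; the real network is the 24-block Transformer.
blocks = nn.ModuleList([nn.Sequential(nn.Linear(32, 32), nn.GELU()) for _ in range(4)])
head = nn.Linear(32, 1)
params = list(blocks.parameters()) + list(head.parameters())
optimizer = torch.optim.AdamW(params, lr=3e-4)

ACCUM_STEPS = 128  # from the settings table
CLIP_NORM = 1.0    # from the settings table


def forward(x: torch.Tensor) -> torch.Tensor:
    for block in blocks:
        # Recompute this block's activations during backward instead of storing them.
        x = checkpoint(block, x, use_reentrant=False)
    return head(x)


optimizer.zero_grad(set_to_none=True)
for micro_step in range(ACCUM_STEPS):
    x, y = torch.randn(4, 32), torch.randn(4, 1)  # random stand-in batch
    loss = nn.functional.mse_loss(forward(x), y)
    (loss / ACCUM_STEPS).backward()               # average over micro-batches
grad_norm = torch.nn.utils.clip_grad_norm_(params, CLIP_NORM)
optimizer.step()                                  # one update per 128 micro-batches
print(f"gradient norm before clipping: {grad_norm.item():.4f}")
```

Dividing each loss by the number of accumulation steps makes the accumulated gradient equal to the average over all micro-batches, so the effective batch is 128 times the micro-batch size.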

Key Features

Bharat MiniGPT 350M includes multiple modern language model features:

Custom GPT architecture
RoPE positional embeddings
RMSNorm normalization
SwiGLU feed-forward layers
Flash Attention-compatible SDPA
HuggingFace generate() support
KV-cache compatible inference
Weight tying support
Gradient checkpointing during training

Benchmark Results

The model was evaluated using the EleutherAI LM Evaluation Harness.

ARC Easy

acc: 0.3312
acc_norm: 0.3413

HellaSwag

acc: 0.2650
acc_norm: 0.2636

PIQA

acc: 0.5631
acc_norm: 0.5533

These results are from the current pretrained base checkpoint. The author expects them to improve after future instruction tuning and fine-tuning stages.
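
One way to read these numbers is against the accuracy of random guessing. The snippet below computes that margin from the reported `acc` values. The chance levels are approximations supplied here, not figures from the article: HellaSwag offers four endings and PIQA two solutions, while ARC Easy questions mostly have four options.

```python
# task: (reported acc, approximate accuracy of random guessing)
RESULTS = {
    "ARC Easy": (0.3312, 0.25),   # mostly four answer options
    "HellaSwag": (0.2650, 0.25),  # four candidate endings
    "PIQA": (0.5631, 0.50),       # two candidate solutions
}

margins = {task: round(acc - chance, 4) for task, (acc, chance) in RESULTS.items()}
for task, margin in margins.items():
    print(f"{task:<10} {margin * 100:+.2f} points over chance")
```

On this reading, the base checkpoint sits about 8 points above chance on ARC Easy, about 6 on PIQA, and under 2 on HellaSwag.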

HuggingFace Compatibility

The model can be loaded through the HuggingFace Transformers library for inference and experimentation. Because the architecture is custom, loading requires trust_remote_code=True:

from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "Harshhvm/bharat-minigpt-350m-pretrain-3b-tokens"

# trust_remote_code=True lets Transformers run the custom model code shipped with the repository.
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

model.eval()  # inference mode: disables dropout

Text Generation Example

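import torch

# Uses the model and tokenizer loaded in the previous snippet.
prompt = "India is a land of"

inputs = tokenizer(prompt, return_tensors="pt")

# no_grad() skips gradient tracking, which saves memory during inference.
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=100,  # upper bound on generated tokens
        temperature=0.8,     # below 1.0 sharpens the sampling distribution
        top_k=50,            # sample only from the 50 most likely tokens
        top_p=0.95,          # then keep the smallest set covering 95% probability
        do_sample=True       # sample instead of greedy decoding
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))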
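
Since the model was trained with a context length of 768 tokens, it is reasonable to keep the prompt plus the generated tokens within that window. The helper below is a hypothetical convenience function, not part of the model's API, that caps the requested number of new tokens accordingly.

```python
CONTEXT_LENGTH = 768  # from the architecture table


def clamp_new_tokens(prompt_tokens: int, requested: int, context: int = CONTEXT_LENGTH) -> int:
    """Largest number of new tokens that keeps prompt + output within the context."""
    if prompt_tokens >= context:
        raise ValueError("prompt already fills the context window; truncate it first")
    return min(requested, context - prompt_tokens)


# With the tokenizer from the example above, prompt_tokens would be
# inputs["input_ids"].shape[1]; literals are used here so the sketch runs alone.
print(clamp_new_tokens(prompt_tokens=5, requested=100))    # 100
print(clamp_new_tokens(prompt_tokens=700, requested=100))  # 68
```

The result can be passed as `max_new_tokens` to `generate()`.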

Future Plans

The future roadmap for Bharat MiniGPT includes:

Larger-scale token training
Better multilingual support
Instruction tuning
Improved tokenizer optimization
Extended context length
Quantized inference support
Better KV-cache optimization
Integration into BharatAI ecosystem projects

A larger and more capable version of the model is planned for future releases.

Independent AI Development in India

Bharat MiniGPT 350M represents an independent effort to understand and build modern GPT-style architectures from scratch. Projects like this demonstrate how developers in India are actively experimenting with large language model training, optimization, and deployment workflows.

As AI development continues to grow globally, independent research and engineering projects such as Bharat MiniGPT help expand local innovation and technical expertise in the field of artificial intelligence.

Explore More

Official Project Article:
Bharat MiniGPT 350M Official Article

Disclaimer

Bharat MiniGPT 350M is an experimental pretrained base model developed for research and educational purposes. The model is not instruction-tuned yet and may generate inaccurate, biased, or incomplete responses.

Harshvardhan Mishra
