The global AI industry is currently dominated by large language models such as GPT, LLaMA, Gemini, and Claude. However, independent developers and researchers in India are also beginning to build their own custom AI models from the ground up. One such project is Bharat MiniGPT 350M, a custom GPT-style causal language model developed by Harshvardhan Mishra.
Unlike many projects that simply fine-tune existing models, Bharat MiniGPT 350M was trained from scratch using a manually implemented Transformer architecture in PyTorch. The project focuses on understanding and building modern LLM systems rather than relying entirely on existing pretrained frameworks.
What is Bharat MiniGPT 350M?
Bharat MiniGPT 350M is a decoder-only Transformer language model containing approximately 350 million parameters. It is built from architecture components common in recent LLMs, including:
- RoPE (Rotary Position Embedding)
- RMSNorm
- SwiGLU feed-forward layers
- Scaled dot-product attention (SDPA)
- Flash Attention compatibility
The current release is an experimental base model pretrained on roughly 3 billion tokens. It has not been instruction-tuned yet.
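To make the RoPE entry more concrete, here is a minimal plain-Python sketch of the rotation idea. It is not the project's PyTorch code: the adjacent-pair layout, the base of 10000, and the helper names `rope_rotate` and `dot` are assumptions chosen for illustration. It shows the property that makes RoPE useful: the attention score between a rotated query and key depends only on their relative offset.

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles."""
    dim = len(vec)
    assert dim % 2 == 0, "RoPE needs an even head dimension"
    out = []
    for i in range(0, dim, 2):
        # The pair starting at index i rotates with frequency base^(-i/dim).
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy query and key vectors (illustrative values only).
q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# Same relative offset (3) at two different absolute positions.
score_a = dot(rope_rotate(q, 5), rope_rotate(k, 2))
score_b = dot(rope_rotate(q, 105), rope_rotate(k, 102))
print(abs(score_a - score_b) < 1e-9)
```

Real implementations apply the same rotation to whole tensors of queries and keys at once, and some use a split-half layout instead of adjacent pairs.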
Built from Scratch, Not Fine-Tuned
One of the most important aspects of Bharat MiniGPT 350M is that it is not a fine-tuned GPT-2 or LLaMA variant. The architecture, Transformer blocks, attention pipeline, and training workflow were manually implemented in PyTorch.
Once the implementation had been tested, the model was integrated into the HuggingFace ecosystem to make loading and inference easier.
Because nothing was inherited from an existing checkpoint, the project stands as an independent LLM engineering experiment originating from India.
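For readers wondering what a manually implemented block of this kind might look like, the following is a hedged PyTorch sketch, not the project's actual source. The class names, the tiny dimensions, the bias-free linear layers, and the pre-norm layout are all assumptions, and RoPE is left out to keep it short. It combines RMSNorm, a SwiGLU feed-forward layer, and causal attention through PyTorch's `scaled_dot_product_attention`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # Scale by the reciprocal root-mean-square over the feature dimension.
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * inv_rms * self.weight

class SwiGLU(nn.Module):
    def __init__(self, dim, hidden):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        # SiLU-gated product of two projections, then project back down.
        return self.down(F.silu(self.gate(x)) * self.up(x))

class CausalSelfAttention(nn.Module):
    def __init__(self, dim, n_heads):
        super().__init__()
        assert dim % n_heads == 0
        self.n_heads = n_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape each tensor to (batch, heads, time, head_dim).
        q, k, v = (
            z.view(b, t, self.n_heads, d // self.n_heads).transpose(1, 2)
            for z in (q, k, v)
        )
        # A full model would apply RoPE to q and k here (omitted in this sketch).
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.proj(y.transpose(1, 2).reshape(b, t, d))

class Block(nn.Module):
    def __init__(self, dim, n_heads, hidden):
        super().__init__()
        self.norm1 = RMSNorm(dim)
        self.attn = CausalSelfAttention(dim, n_heads)
        self.norm2 = RMSNorm(dim)
        self.mlp = SwiGLU(dim, hidden)

    def forward(self, x):
        # Pre-norm residual layout (an assumption, common in recent LLMs).
        x = x + self.attn(self.norm1(x))
        return x + self.mlp(self.norm2(x))

torch.manual_seed(0)
block = Block(dim=64, n_heads=4, hidden=172)  # toy sizes, not the 350M config
x = torch.randn(2, 8, 64)
y = block(x)
print(y.shape)
```

`scaled_dot_product_attention` requires PyTorch 2.0 or newer and can dispatch to a Flash Attention kernel when the hardware and dtypes allow it.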
Model Architecture
Bharat MiniGPT 350M uses a decoder-only Transformer architecture with the following configuration.
| Component | Details |
|---|---|
| Parameters | ~350 Million |
| Architecture | Decoder-only Transformer |
| Layers | 24 Transformer Blocks |
| Attention Heads | 16 |
| Embedding Size | 1024 |
| Context Length | 768 Tokens |
| Vocabulary Size | 50,257 |
| Position Encoding | RoPE |
| Normalization | RMSNorm |
| Feed Forward | SwiGLU |
| Attention | SDPA / Flash Attention Compatible |
| Precision | FP16 Training |
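As a sanity check on the "~350 Million" figure, the parameter count can be estimated from the table. This is a back-of-the-envelope sketch, not the author's accounting: the feed-forward hidden size is not stated in the article, so the common 8/3 × embedding-size SwiGLU sizing is assumed, along with bias-free projections and tied input/output embeddings. The function name `estimate_params` is hypothetical.

```python
def estimate_params(n_layers, d_model, vocab_size, ffn_hidden, tied_embeddings=True):
    embed = vocab_size * d_model
    attn = 4 * d_model * d_model        # Q, K, V and output projections, no biases
    ffn = 3 * d_model * ffn_hidden      # SwiGLU: gate, up and down projections
    norms = 2 * d_model                 # two RMSNorm weight vectors per block
    per_layer = attn + ffn + norms
    total = embed + n_layers * per_layer + d_model  # plus a final RMSNorm
    if not tied_embeddings:
        total += vocab_size * d_model   # separate output head
    return total

# ffn_hidden is an ASSUMPTION (8/3 * d_model, rounded down); the article does not state it.
total = estimate_params(24, 1024, 50_257, ffn_hidden=int(8 * 1024 / 3))
print(f"{total / 1e6:.1f}M parameters")  # prints "353.5M parameters"
```

Under these assumptions the estimate lands close to the stated size. RoPE adds no learned parameters, so it does not appear in the count.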
Training Data
The model was trained on a weighted mixture of publicly available datasets.
| Dataset | Weight |
|---|---|
| FineWeb | 40% |
| FineWeb-Edu | 30% |
| Wikipedia English | 30% |
This dataset mixture was designed to balance internet-scale knowledge, educational content, and encyclopedic information.
The total training corpus contained approximately 3 billion tokens.
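The mixture weights can be read in two ways: as a share of the token budget, or as a sampling probability when drawing the next training document. The sketch below illustrates both readings using only the numbers in the table. The article does not say which one the training pipeline uses, and the helper name `sample_sources` is hypothetical.

```python
import random

MIXTURE_PERCENT = {"FineWeb": 40, "FineWeb-Edu": 30, "Wikipedia English": 30}
TOTAL_TOKENS = 3_000_000_000

# Reading 1: each weight is a share of the total token budget.
token_budget = {
    name: TOTAL_TOKENS * pct // 100 for name, pct in MIXTURE_PERCENT.items()
}

# Reading 2: each weight is the probability of drawing from that source.
def sample_sources(n, seed=0):
    rng = random.Random(seed)
    names = list(MIXTURE_PERCENT)
    weights = [MIXTURE_PERCENT[name] for name in names]
    return rng.choices(names, weights=weights, k=n)

draws = sample_sources(10_000)
for name, budget in token_budget.items():
    share = draws.count(name) / len(draws)
    print(f"{name}: ~{budget / 1e9:.1f}B tokens, sampled share {share:.2f}")
```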
Training Setup
The training pipeline relied on optimization techniques that are standard in large-scale LLM training.
| Setting | Value |
|---|---|
| Optimizer | AdamW |
| Learning Rate | 3e-4 |
| Min LR | 3e-5 |
| Warmup Steps | 51,200 |
| LR Scheduler | Cosine Decay |
| Gradient Accumulation | 128 |
| Mixed Precision | FP16 |
| Gradient Clipping | 1.0 |
Additional optimizations such as gradient checkpointing, which trades extra compute for lower activation memory, made training feasible on limited hardware resources.
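The learning-rate settings in the table describe a linear warmup followed by cosine decay. A minimal sketch of such a schedule follows. The total step count is not given in the article, so `TOTAL_STEPS` is a hypothetical placeholder, and it is an assumption that the warmup figure is counted in the same unit as the schedule's steps (the article does not say whether these are micro-batches or optimizer updates).

```python
import math

def lr_at(step, total_steps, max_lr=3e-4, min_lr=3e-5, warmup_steps=51_200):
    """Linear warmup to max_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        return min_lr
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + cosine * (max_lr - min_lr)

TOTAL_STEPS = 1_000_000  # hypothetical; not stated in the article

for step in (0, 25_600, 51_200, 525_600, TOTAL_STEPS):
    print(step, f"{lr_at(step, TOTAL_STEPS):.2e}")
```

In PyTorch the same function could be wrapped in `torch.optim.lr_scheduler.LambdaLR` by returning the ratio to the base learning rate instead of the absolute value.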
Key Features
Bharat MiniGPT 350M includes multiple modern language model features:
- Custom GPT architecture
- RoPE positional embeddings
- RMSNorm normalization
- SwiGLU feed-forward layers
- Flash Attention compatible SDPA
- HuggingFace generate() support
- KV-cache compatible inference
- Weight tying support
- Gradient checkpointing during training
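To give a sense of what KV-cache compatible inference costs in memory, the cache size can be estimated from the architecture figures. This is a rough sketch under stated assumptions: standard multi-head attention where keys and values each span the full 1024-wide embedding (the article does not mention grouped-query attention), an FP16 cache, and batch size 1. The function name `kv_cache_bytes` is hypothetical.

```python
def kv_cache_bytes(n_layers, d_model, seq_len, bytes_per_value=2, batch_size=1):
    # Two cached tensors (keys and values) per layer, each seq_len x d_model.
    return 2 * n_layers * d_model * seq_len * bytes_per_value * batch_size

per_token = kv_cache_bytes(24, 1024, seq_len=1)
full_context = kv_cache_bytes(24, 1024, seq_len=768)

print(per_token, "bytes per token")              # 98304 bytes per token
print(full_context / 2**20, "MiB at 768 tokens")  # 72.0 MiB at 768 tokens
```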
Benchmark Results
The model was evaluated using the EleutherAI LM Evaluation Harness.
| Benchmark | acc | acc_norm |
|---|---|---|
| ARC Easy | 0.3312 | 0.3413 |
| HellaSwag | 0.2650 | 0.2636 |
| PIQA | 0.5631 | 0.5533 |
These results come from the current pretrained base checkpoint. The author expects them to improve after the planned instruction tuning and fine-tuning stages, which have not been carried out yet.
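One way to put the reported acc values in context is to compare them with random guessing. The sketch below does that arithmetic. The chance levels are approximations based on the usual answer-option counts for each task (four for HellaSwag and most ARC Easy questions, two for PIQA) and are not taken from the article.

```python
# Reported accuracy from the article, paired with an approximate chance level.
results = {
    "ARC Easy": {"acc": 0.3312, "chance": 0.25},
    "HellaSwag": {"acc": 0.2650, "chance": 0.25},
    "PIQA": {"acc": 0.5631, "chance": 0.50},
}

margins = {task: r["acc"] - r["chance"] for task, r in results.items()}

for task, margin in margins.items():
    print(f"{task}: {margin * 100:+.1f} points over chance")
```

Small margins are typical of a model of this size trained on a few billion tokens, which is consistent with the article's description of the release as an early base checkpoint.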
HuggingFace Compatibility
The checkpoint can be loaded through HuggingFace Transformers for inference and experimentation. Because the architecture is custom, the loading calls pass trust_remote_code=True.
from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code=True lets Transformers run the custom model code
# shipped with the checkpoint; review that code before enabling it.
model = AutoModelForCausalLM.from_pretrained(
    "Harshhvm/bharat-minigpt-350m-pretrain-3b-tokens",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "Harshhvm/bharat-minigpt-350m-pretrain-3b-tokens",
    trust_remote_code=True,
)
Text Generation Example
import torch

# Reuses the model and tokenizer loaded above.
prompt = "India is a land of"
inputs = tokenizer(prompt, return_tensors="pt")

# Sampling is enabled, so the output differs from run to run.
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=100,
        temperature=0.8,
        top_k=50,
        top_p=0.95,
        do_sample=True,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
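The temperature, top_k, and top_p arguments in the call above control how the next token is drawn from the model's output distribution. The following plain-Python sketch shows the usual order of operations on a toy list of logits. It is an illustration of the idea, not the Transformers library's implementation, and the function name `sample_next_token` is hypothetical.

```python
import math
import random

def sample_next_token(logits, temperature=0.8, top_k=50, top_p=0.95, rng=random):
    # 1. Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [x / temperature for x in logits]
    # 2. Top-k: keep only the k highest-scoring token ids.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # 3. Softmax over the survivors (shifted by the max for numerical stability).
    peak = scaled[order[0]]
    weights = [math.exp(scaled[i] - peak) for i in order]
    total = sum(weights)
    probs = [w / total for w in weights]
    # 4. Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for idx, p in zip(order, probs):
        kept.append((idx, p))
        cumulative += p
        if cumulative >= top_p:
            break
    ids, kept_probs = zip(*kept)
    # 5. Draw one token id from the remaining candidates.
    return rng.choices(ids, weights=kept_probs, k=1)[0]

toy_logits = [5.0, 1.0, 0.5, -3.0]  # illustrative values for a 4-token vocabulary
print(sample_next_token(toy_logits, rng=random.Random(0)))
```

Setting top_k=1 reduces this to greedy decoding, since only the highest-scoring token survives the filter.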
Future Plans
The roadmap for Bharat MiniGPT includes:
- Larger-scale token training
- Better multilingual support
- Instruction tuning
- Improved tokenizer optimization
- Extended context length
- Quantized inference support
- Better KV-cache optimization
- Integration into BharatAI ecosystem projects
A larger and more capable version of the model is planned for future releases.
Independent AI Development in India
Bharat MiniGPT 350M represents an independent effort to understand and build modern GPT-style architectures from scratch. Projects like this demonstrate how developers in India are actively experimenting with large language model training, optimization, and deployment workflows.
As AI development continues to grow globally, independent research and engineering projects such as Bharat MiniGPT help expand local innovation and technical expertise in the field of artificial intelligence.
Explore More
Official Project Article:
Bharat MiniGPT 350M Official Article
Disclaimer
Bharat MiniGPT 350M is an experimental pretrained base model developed for research and educational purposes. The model is not instruction-tuned yet and may generate inaccurate, biased, or incomplete responses.








