Bharat MiniGPT 350M: A Custom GPT-Style AI Model Built from Scratch in India

The global AI industry is currently dominated by large language models such as GPT, LLaMA, Gemini, and Claude. However, independent developers and researchers in India are also beginning to build their own custom AI models from the ground up. One such project is Bharat MiniGPT 350M, a custom GPT-style causal language model developed by Harshvardhan Mishra.

Unlike many projects that simply fine-tune existing models, Bharat MiniGPT 350M was trained from scratch using a manually implemented Transformer architecture in PyTorch. The project focuses on understanding and building modern LLM systems rather than relying entirely on existing pretrained models and frameworks.


What is Bharat MiniGPT 350M?

Bharat MiniGPT 350M is a decoder-only Transformer language model with approximately 350 million parameters. It is built from modern LLM architecture components, including:

  • RoPE (Rotary Position Embedding)
  • RMSNorm
  • SwiGLU feed-forward layers
  • SDPA (scaled dot-product attention)
  • Flash Attention compatibility
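
As a rough illustration of the first item, RoPE encodes position by rotating pairs of query and key dimensions through position-dependent angles, so that attention scores depend on the relative offset between tokens. The sketch below is pure Python for readability and is not the project's PyTorch code. The interleaved pairing and the base of 10000 are assumptions taken from the common formulation, and `rope` and `dot` are hypothetical helper names.

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles.

    Assumes an even-length vector and the interleaved (x0, x1), (x2, x3), ...
    pairing; some implementations pair the first and second halves instead.
    """
    dim = len(vec)
    assert dim % 2 == 0, "RoPE needs an even number of dimensions"
    out = []
    for i in range(0, dim, 2):
        # Early pairs rotate quickly with position, later pairs more slowly.
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Same relative offset (2) at two different absolute positions.
score_near = dot(rope(q, 5), rope(k, 3))
score_far = dot(rope(q, 105), rope(k, 103))
print(abs(score_near - score_far) < 1e-9)
```

The two scores agree because each rotation only changes the angle between a query pair and a key pair by the difference of their positions, which is the property that makes RoPE a relative position encoding.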

The current release is an experimental base model pretrained on approximately 3 billion tokens. It is not yet instruction-tuned.


Built from Scratch, Not Fine-Tuned

One of the most important aspects of Bharat MiniGPT 350M is that it is not a fine-tuned GPT-2 or LLaMA variant. The architecture, Transformer blocks, attention pipeline, and training workflow were manually implemented in PyTorch.

After implementation and testing, the model was integrated into the HuggingFace ecosystem to support easier loading and inference.

This makes the project an important independent LLM engineering experiment originating from India.


Model Architecture

Bharat MiniGPT 350M uses a modern decoder-only Transformer architecture optimized for efficient language modeling.

Component           Details
Parameters          ~350 Million
Architecture        Decoder-only Transformer
Layers              24 Transformer Blocks
Attention Heads     16
Embedding Size      1024
Context Length      768 Tokens
Vocabulary Size     50,257
Position Encoding   RoPE
Normalization       RMSNorm
Feed Forward        SwiGLU
Attention           SDPA / Flash Attention Compatible
Precision           FP16 Training
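
The figures in the table can be sanity-checked with a back-of-the-envelope parameter count. The sketch below assumes bias-free linear layers, tied input and output embeddings, and a SwiGLU hidden width of 8/3 times the embedding size. The article does not state the feed-forward width, so `FFN_HIDDEN` is a hypothetical value, and `count_params` is an illustrative helper rather than project code.

```python
def count_params(vocab_size, d_model, n_layers, ffn_hidden, tie_embeddings=True):
    """Approximate parameter count for a bias-free decoder-only Transformer."""
    embedding = vocab_size * d_model
    # Q, K, V and output projections, each d_model x d_model.
    attention = 4 * d_model * d_model
    # SwiGLU uses three matrices: gate, up and down projections.
    feed_forward = 3 * d_model * ffn_hidden
    # Two RMSNorm scale vectors per block.
    norms = 2 * d_model
    per_block = attention + feed_forward + norms
    total = embedding + n_layers * per_block + d_model  # + final RMSNorm
    if not tie_embeddings:
        total += vocab_size * d_model  # separate output projection
    return total

# Values from the table; FFN_HIDDEN is an assumed 8/3 * d_model.
D_MODEL = 1024
FFN_HIDDEN = int(8 * D_MODEL / 3)  # 2730, hypothetical
total = count_params(50257, D_MODEL, 24, FFN_HIDDEN)
print(f"{total / 1e6:.1f}M parameters")  # prints 353.5M parameters
```

Under these assumptions the estimate lands close to the stated ~350 million. RoPE contributes no learned parameters, and untying the embeddings would add roughly another 51 million.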

Training Data

The model was trained on a weighted mixture of publicly available datasets.

Dataset             Weight
FineWeb             40%
FineWeb-Edu         30%
Wikipedia English   30%

This dataset mixture was designed to balance internet-scale knowledge, educational content, and encyclopedic information.

The total training corpus contained approximately 3 billion tokens.
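
A weighted mixture like this is often implemented by choosing the source of each training document at random in proportion to its weight. The sketch below shows that idea together with the implied token budget per source. It assumes the weights are token-level proportions, which the article does not confirm, and `sample_sources` is a hypothetical helper, not the project's data loader.

```python
import random

# Weights from the table above.
MIXTURE = {"FineWeb": 0.40, "FineWeb-Edu": 0.30, "Wikipedia English": 0.30}
TOTAL_TOKENS = 3_000_000_000

# Token budget per source, assuming the weights are token-level proportions.
token_budget = {name: round(weight * TOTAL_TOKENS) for name, weight in MIXTURE.items()}

def sample_sources(n, seed=0):
    """Draw n source names with probability proportional to their weight."""
    rng = random.Random(seed)
    names = list(MIXTURE)
    weights = list(MIXTURE.values())
    return rng.choices(names, weights=weights, k=n)

draws = sample_sources(10_000)
for name in MIXTURE:
    share = draws.count(name) / len(draws)
    print(f"{name}: budget {token_budget[name]:,} tokens, sampled share {share:.2f}")
```

With enough draws the sampled shares settle near 40/30/30, which is how a streaming loader can honor the mixture without materializing a combined dataset first.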


Training Setup

The training pipeline used optimization techniques that are standard in large-scale LLM training.

Setting                 Value
Optimizer               AdamW
Learning Rate           3e-4
Min LR                  3e-5
Warmup Steps            51,200
LR Scheduler            Cosine Decay
Gradient Accumulation   128
Mixed Precision         FP16
Gradient Clipping       1.0
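
The learning-rate settings in the table describe a warmup followed by cosine decay. A minimal sketch of such a schedule is shown below. It assumes a linear warmup shape and uses a hypothetical `TOTAL_STEPS`, since the article states neither the warmup shape nor the total step count, and it does not say whether steps are counted per micro-batch or per optimizer update.

```python
import math

PEAK_LR = 3e-4
MIN_LR = 3e-5
WARMUP_STEPS = 51_200

def lr_at(step, total_steps):
    """Linear warmup to PEAK_LR, then cosine decay down to MIN_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine

TOTAL_STEPS = 1_000_000  # hypothetical; the article does not state the step count
for step in (0, 25_600, 51_200, 525_600, 1_000_000):
    print(step, f"{lr_at(step, TOTAL_STEPS):.2e}")
```

The rate climbs from zero to 3e-4 over the warmup, sits halfway between peak and minimum at the midpoint of the decay phase, and ends at 3e-5.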

Gradient checkpointing was also used to reduce activation memory on limited hardware, at the cost of recomputing activations during the backward pass.
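
Gradient accumulation is the other lever in the table for training on limited hardware: gradients from 128 small batches are summed before each optimizer update. The arithmetic below estimates how many tokens feed a single update and how many updates a 3 billion token run would take. The micro-batch size is not reported in the article, so the values tried here are hypothetical.

```python
CONTEXT_LENGTH = 768          # tokens per sequence, from the architecture table
GRAD_ACCUM_STEPS = 128        # from the training table
TOTAL_TOKENS = 3_000_000_000  # approximate corpus size

def tokens_per_update(micro_batch_size):
    """Tokens contributing to a single optimizer update."""
    return micro_batch_size * GRAD_ACCUM_STEPS * CONTEXT_LENGTH

def optimizer_updates(micro_batch_size):
    """Full optimizer updates needed for one pass over the token budget."""
    return TOTAL_TOKENS // tokens_per_update(micro_batch_size)

# micro_batch_size is hypothetical; the article does not report it.
for micro_batch_size in (1, 4, 8):
    print(micro_batch_size, tokens_per_update(micro_batch_size), optimizer_updates(micro_batch_size))
```

Even with a micro-batch of one sequence, each update would see 98,304 tokens, which shows how accumulation recovers a large effective batch on a memory-constrained GPU.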


Key Features

Bharat MiniGPT 350M includes multiple modern language model features:

  • Custom GPT architecture
  • RoPE positional embeddings
  • RMSNorm normalization
  • SwiGLU feed-forward layers
  • Flash Attention-compatible SDPA
  • HuggingFace generate() support
  • KV-cache compatible inference
  • Weight tying support
  • Gradient checkpointing during training
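
Two of the listed features, RMSNorm and SwiGLU, are compact enough to sketch directly. The pure-Python version below follows the standard formulations and is not the project's PyTorch code. The epsilon value, the absence of biases, and the toy weight matrices are assumptions made for illustration.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Scale x by the reciprocal of its root mean square, then by a learned weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]

def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def matvec(matrix, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]

def swiglu(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: down(silu(gate(x)) * up(x)), without biases."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)

x = [1.0, -2.0, 3.0, 0.5]
normed = rms_norm(x, weight=[1.0] * 4)

# Toy weights: model width 4, hidden width 2.
w_gate = [[0.1, 0.2, 0.0, -0.1], [0.3, 0.0, 0.1, 0.2]]
w_up = [[0.5, -0.5, 0.2, 0.0], [0.0, 0.1, 0.1, 0.4]]
w_down = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 1.0]]
print(swiglu(normed, w_gate, w_up, w_down))
```

Unlike LayerNorm, RMSNorm skips mean subtraction and has no bias term. SwiGLU replaces the usual two-matrix feed-forward layer with three matrices, where one projection gates the other.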

Benchmark Results

The model was evaluated using the EleutherAI LM Evaluation Harness.

ARC Easy

  • acc: 0.3312
  • acc_norm: 0.3413

HellaSwag

  • acc: 0.2650
  • acc_norm: 0.2636

PIQA

  • acc: 0.5631
  • acc_norm: 0.5533

These benchmark results are from the current pretrained base checkpoint. Improvements are expected after future instruction tuning and fine-tuning stages.
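
To put the accuracy figures in context, it helps to compare them with random guessing. The snippet below does that using the `acc` values reported above. The chance levels are assumptions based on the usual task formats (four answer choices for most ARC Easy and all HellaSwag items, two for PIQA) and are not taken from the article.

```python
# acc values reported above for the pretrained base checkpoint.
RESULTS = {"ARC Easy": 0.3312, "HellaSwag": 0.2650, "PIQA": 0.5631}

# Assumed chance levels: ARC Easy and HellaSwag are (mostly) four-way
# multiple choice, PIQA is two-way.
CHANCE = {"ARC Easy": 0.25, "HellaSwag": 0.25, "PIQA": 0.50}

def margin_over_chance(task):
    """Accuracy minus the random-guess baseline, in percentage points."""
    return 100 * (RESULTS[task] - CHANCE[task])

for task in RESULTS:
    print(f"{task}: {margin_over_chance(task):+.1f} points over chance")
```

Read this way, the checkpoint is a few points above chance on ARC Easy and PIQA and close to chance on HellaSwag, which is plausible for a model of this size after about 3 billion tokens of pretraining.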


HuggingFace Compatibility

The model was later integrated into the HuggingFace Transformers ecosystem for easier inference and experimentation.

from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code=True lets Transformers run the custom model code
# shipped in the model repository.
model = AutoModelForCausalLM.from_pretrained(
    "Harshhvm/bharat-minigpt-350m-pretrain-3b-tokens",
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(
    "Harshhvm/bharat-minigpt-350m-pretrain-3b-tokens",
    trust_remote_code=True
)

Text Generation Example

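import torch

prompt = "India is a land of"

# Tokenize the prompt into PyTorch tensors.
inputs = tokenizer(prompt, return_tensors="pt")

# Disable gradient tracking for inference and sample up to 100 new tokens.
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=100,
        temperature=0.8,
        top_k=50,
        top_p=0.95,
        do_sample=True
    )

# The output contains the prompt followed by the generated continuation.
print(tokenizer.decode(output[0], skip_special_tokens=True))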
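
The `temperature`, `top_k`, and `top_p` arguments above control how the next token is drawn from the model's output distribution. The pure-Python sketch below illustrates the usual meaning of each setting on a toy list of logits. It is a simplified illustration, not the library's implementation, and `sample_next_token` is a hypothetical helper name.

```python
import math
import random

def sample_next_token(logits, temperature=0.8, top_k=50, top_p=0.95, rng=random):
    """Pick a token index using temperature, top-k and top-p (nucleus) filtering."""
    # Temperature below 1 sharpens the distribution, above 1 flattens it.
    scaled = [value / temperature for value in logits]
    # Keep only the top_k highest-scoring token indices.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    peak = scaled[ranked[0]]
    exps = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for index, p in zip(ranked, probs):
        kept.append((index, p))
        cumulative += p
        if cumulative >= top_p:
            break
    indices = [i for i, _ in kept]
    weights = [p for _, p in kept]  # random.choices renormalizes these
    return rng.choices(indices, weights=weights, k=1)[0]

toy_logits = [4.0, 3.5, 1.0, 0.2, -2.0]
rng = random.Random(0)
print([sample_next_token(toy_logits, rng=rng) for _ in range(10)])
```

Lowering `top_k` or `top_p` restricts sampling to the most likely tokens and makes output more conservative, while raising `temperature` spreads probability toward less likely tokens.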

Future Plans

The future roadmap for Bharat MiniGPT includes:

  • Larger-scale token training
  • Better multilingual support
  • Instruction tuning
  • Improved tokenizer optimization
  • Extended context length
  • Quantized inference support
  • Better KV-cache optimization
  • Integration into BharatAI ecosystem projects

A larger and more capable version of the model is planned for future releases.


Independent AI Development in India

Bharat MiniGPT 350M represents an independent effort to understand and build modern GPT-style architectures from scratch. Projects like this demonstrate how developers in India are actively experimenting with large language model training, optimization, and deployment workflows.

As AI development continues to grow globally, independent research and engineering projects such as Bharat MiniGPT help expand local innovation and technical expertise in the field of artificial intelligence.


Explore More

Official Project Article:
Bharat MiniGPT 350M Official Article


Disclaimer

Bharat MiniGPT 350M is an experimental pretrained base model developed for research and educational purposes. The model is not instruction-tuned yet and may generate inaccurate, biased, or incomplete responses.

  • Harshvardhan Mishra

