Sumit Pokharel

Machine Learning Engineer and independent ML researcher, driven to contribute to breakthroughs in AI research and ML engineering that make the world a better place.

Professional Experience

FPT Japan Holdings, Machine Learning Engineer
Tokyo, Japan
  • Building proof-of-concept RAG systems and AI agents across multiple active client engagements.
  • Running internal fine-tuning experiments alongside client work, primarily QLoRA adaptation and embedding-model training.
Rakuten Group, Inc., Software Developer - Frontend
Tokyo, Japan
  • Led frontend delivery of 4 major checkout initiatives from development through production: Okihai (contactless delivery), Credit Card Installments, Credit Card Scan, and Digital Address.
  • Architected and built 50+ reusable React components for the new checkout system, creating a shared component foundation for 7 critical Ichiba pages.
  • Developed performance-optimized, accessibility-conscious checkout interfaces for Rakuten Ichiba's 50M+ user base as part of a large-scale checkout revamp.
Best Path Research, Machine Learning Intern
Tokyo, Japan
  • Delivered a Python script that streamlines converting digital text into handwritten-character images, using hex-code matching to retrieve images from a directory of 1M+ handwritten characters.
  • Contributed to an application that corrects distortions in Japanese receipt images by synthesizing a dataset of artificially distorted images and training an open-source model to rectify them.

Projects

  • Fine-tuned Qwen3-8B with LoRA on a 25k-row math SFT set derived from OpenMathInstruct-2, comparing attention-only and all-layer adapter scopes across three seeds and evaluating on GSM8K.
  • Both conditions beat the 84.5% baseline (attention-only 90.5%, all-layer 90.1%); the 0.455-pp gap fell below the pre-specified 1-pp (0.01) winner threshold, so it is reported as directional only.
  • Froze the contamination check, checkpoint rule, and null-result threshold before any benchmark runs; all six runs converged on the step-3169 checkpoint, and attention-only came out ahead on paired-seed disagreements at every seed (+7, +6, +5).
  • Trained a GPT-2-based sparse Mixture-of-Experts Transformer with 8 experts on a mix of code, math, and prose to study whether domain-specialized routing emerges at small scale.
  • Demonstrated clear expert specialization and improved aggregate eval loss by 3.6% over a dense baseline, with math showing the largest gain at 14%.
  • Ran ablations on load balancing and top-2 routing, showing expert collapse within 500 steps without load balancing and only a negligible 0.14% gain from top-2 routing.
  • Implemented a complete sequence-to-sequence neural network from first principles, faithfully reproducing the seminal Sutskever et al. (2014) architecture for machine translation.
  • Features an LSTM encoder-decoder, beam-search decoding, and CUDA acceleration, and ships with pre-trained weights from my own custom training pipeline.
  • Ongoing project replicating foundational ML research papers through complete from-scratch implementations to build a deep understanding of Transformer architectures and modern LLM designs.
  • Implemented the original Transformer, GPT-2, LLaMA-2/3, and Mistral 7B with faithful attention mechanisms and positional encodings, and continue to add new architectures.

Technical Skills

Languages: Python, TypeScript, JavaScript, HTML, CSS

Architectures & Concepts: Transformer, LLM Architectures (LLaMA, Mistral, GPT-2, gpt-oss, nanoVLM, etc.), Seq2Seq (LSTM), GRPO, Reverse-Mode Autograd, Tinygrad

Frameworks & Tools: PyTorch, NumPy, React.js, Next.js, Astro, Git, Vim

Languages

English (fluent) · Nepali (native) · Japanese (proficient, JLPT N2 certified)

Education

Ritsumeikan Asia Pacific University, Bachelor's Degree in Business Administration
Beppu, Japan

CGPA 3.65