Sumit Pokharel

pokharel.sumit@proton.me · Tokyo, Japan · sumit-pokharel · git.sumit.so

Machine Learning Engineer in Tokyo building LLM training, evaluation, and applied AI systems. Recent work spans RAG and agent prototypes, LoRA and embedding-model experiments, sparse MoE training, and production frontend delivery for large-scale marketplace checkout systems.

Professional Experience

FPT Japan Holdings, Machine Learning Engineer

Apr 2026 – Present • Tokyo, Japan

Building client-facing RAG and AI-agent prototypes across multiple engagements, covering retrieval design, prompt/tool orchestration, and integration planning.
Running internal LoRA and embedding-model experiments, including dataset preparation, training configuration, and qualitative/quantitative evaluation of model behavior.

Rakuten Group, Inc., Software Developer - Frontend

Apr 2024 – Mar 2026 • Tokyo, Japan

Led frontend delivery for 4 major checkout initiatives from development to production: Okihai, Credit Card Installments, Credit Card Scan, and Digital Address.
Architected and built 50+ reusable React components for the new checkout system, creating a shared component foundation for 7 critical Ichiba pages.
Shipped performance-conscious, accessible checkout flows for Rakuten Ichiba's 50M+ user marketplace, collaborating across product, backend, QA, and design teams.

Best Path Research, Machine Learning Intern

Jul 2023 – Aug 2023 • Tokyo, Japan

Built Python tooling for handwritten-character image generation and hex-code matching, enabling retrieval across a directory of 1M+ character images.
Supported Japanese receipt-correction research by generating synthetic distortion datasets and training an open-source vision model for image rectification.

Projects

LoRA and Friends: Qwen3-8B Math SFT Target-Module Study

Fine-tuned Qwen3-8B with LoRA on a 25k-row OpenMathInstruct-2-derived math SFT set, comparing attention-only and all-layer adapter scopes across three seeds, evaluated on GSM8K.
Both LoRA variants improved GSM8K accuracy over the 84.5% baseline (attention-only 90.5%, all-layer 90.1%); attention-only was directionally stronger but below the predefined winner threshold.
Defined contamination checks, checkpoint selection, and null-result criteria in advance of the benchmark runs; all six training runs converged consistently.

Expert Emergence in a Sparse MoE Transformer

Trained a GPT-2-based sparse Mixture-of-Experts Transformer with 8 experts across code, math, and prose to study whether domain-specialized routing emerges at small scale.
Demonstrated clear expert specialization and reduced aggregate eval loss by 3.6% relative to a dense baseline, with math showing the largest gain at 14%.
Ran ablations on load balancing and top-2 routing, showing expert collapse within 500 steps without load balancing and only a negligible 0.14% gain from top-2 routing.

Robuchan: Recipe Adaptation Fine-Tuning with Mistral

Fine-tuned a Mistral 8B Instruct model for dietary recipe adaptation using synthetic training data generated from 530K Food.com recipes.
Built the data generation, quality-gating, fine-tuning, and evaluation pipeline with W&B tracking and a Hugging Face demo.
Improved judged adaptation quality from 67% to 82% against the base model using Mistral Large as evaluator.

Seq2Seq Neural Machine Translation from Scratch

Implemented a sequence-to-sequence translation model from scratch, reproducing the Sutskever et al. (2014) architecture for machine translation.
Built with an LSTM encoder-decoder, beam search decoding, GPU-aware training support, and pre-trained weights from a custom training pipeline.

Technical Skills

Languages: Python, C, TypeScript, JavaScript

ML / LLM: PyTorch, NumPy, Hugging Face Transformers, PEFT (LoRA), RAG, MoE

Engineering: Hugging Face, Git, uv, Ruff, React

Languages

English (fluent) · Nepali (native) · Japanese (proficient, JLPT N2 certified)

Education

Ritsumeikan Asia Pacific University, Bachelor's Degree in Business Administration

Sep 2019 – Sep 2023 • Beppu, Japan

CGPA 3.65
