I’m driven by a passion for building practical, intelligent systems that bridge learning with real-world usability. My projects reflect a focus on efficiency, scalability, and creativity, from cloud-deployed LLMs to voice agents and control systems. I’m always eager to learn, explore emerging tools, and collaborate on solving meaningful challenges in AI.
(Ongoing project at CU Boulder)
Project Scope: Currently designing a disaggregated LLM inference system that streams Key-Value (KV) caches layer-by-layer between prefill and decode GPUs. The project aims to overlap computation and transfer to reduce Time-to-First-Token (TTFT) and overall inference latency while improving GPU utilization.
Result & Impact: The system is intended to demonstrate layer-wise KV streaming as a method for prefill–decode parallelization, targeting faster response generation and better GPU utilization.
Tools & Technologies: PyTorch, CUDA, vLLM, Triton Inference Server, NCCL, Multiprocessing, L4 GPUs
Technical Focus Areas: Layer-wise KV-cache streaming, Prefill–decode disaggregation, Overlapped compute–transfer scheduling, GPU utilization optimization, Low-latency LLM inference
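The overlap idea described above can be illustrated with a small CPU-only simulation. This is only a sketch under stated assumptions: `time.sleep` stands in for per-layer prefill compute and for the KV transfer, and the names `NUM_LAYERS`, `COMPUTE_S`, `TRANSFER_S`, `run_overlapped`, and `run_sequential` are hypothetical. It does not use CUDA, NCCL, or vLLM and is not the project's implementation.

```python
import queue
import threading
import time

NUM_LAYERS = 4      # hypothetical layer count
COMPUTE_S = 0.02    # stand-in for per-layer prefill compute time
TRANSFER_S = 0.02   # stand-in for per-layer KV transfer time


def prefill(out_q):
    """Produce one KV block per layer and hand it off as soon as it is ready."""
    for layer in range(NUM_LAYERS):
        time.sleep(COMPUTE_S)               # simulate computing this layer
        out_q.put((layer, f"kv-{layer}"))   # enqueue for transfer immediately
    out_q.put(None)                         # sentinel: no more layers


def transfer(in_q, received):
    """Consume KV blocks in order, simulating the prefill-to-decode copy."""
    while True:
        item = in_q.get()
        if item is None:
            break
        time.sleep(TRANSFER_S)              # simulate sending this layer's KV
        received.append(item)


def run_overlapped():
    """Transfer layer i while layer i+1 is still being computed."""
    q = queue.Queue()
    received = []
    start = time.perf_counter()
    sender = threading.Thread(target=transfer, args=(q, received))
    sender.start()
    prefill(q)
    sender.join()
    return received, time.perf_counter() - start


def run_sequential():
    """Baseline: finish all layers first, then transfer every KV block."""
    received = []
    start = time.perf_counter()
    for _ in range(NUM_LAYERS):
        time.sleep(COMPUTE_S)
    for layer in range(NUM_LAYERS):
        time.sleep(TRANSFER_S)
        received.append((layer, f"kv-{layer}"))
    return received, time.perf_counter() - start
```

With equal compute and transfer times, the overlapped run should take roughly the time of all layers' compute plus one transfer, while the baseline pays for both phases in full. Real gains depend on the actual compute-to-bandwidth ratio.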
(GitHub) (Project is hosted; give it a try!)
Project Scope: Built a full-stack RAG system integrating LoRA-fine-tuned Sentence Transformers for dense vector retrieval with a FLAN-T5 generator, configured over a 12,000+ document corpus. Implemented semantic chunking, vector indexing using FAISS, and prompt coordination. Containerised with Docker and deployed using GitHub Actions.
Result & Impact: Improved Top-3 retrieval accuracy from 81% to 92.4% by tuning cosine thresholds and enabling ranked document retrieval.
Tools & Technologies: Sentence Transformers (LoRA), FAISS, FLAN-T5, Hugging Face, Python, Docker, Streamlit, GitHub Actions, Heroku.
Technical Focus Areas: Retriever–generator alignment, semantic vector search, LoRA fine-tuning, pipeline orchestration, containerized QA systems with CI/CD.
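As an illustration of cosine-threshold ranking and the Top-3 accuracy metric mentioned above, here is a minimal pure-Python sketch. The function names, the `min_score` default, and the dictionary-based index are assumptions made for the example. The project itself uses FAISS and Sentence Transformers embeddings, which are not shown here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, docs, k=3, min_score=0.5):
    """Return up to k document ids ranked by cosine score, dropping weak matches.

    docs maps a document id to its embedding; min_score is a hypothetical
    threshold, not the value tuned in the project.
    """
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in docs.items()]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in kept[:k]]


def top_k_accuracy(labelled_queries, docs, k=3, min_score=0.5):
    """Fraction of (query_vec, gold_id) pairs whose gold id is in the top k."""
    hits = sum(
        1 for query_vec, gold_id in labelled_queries
        if gold_id in top_k(query_vec, docs, k, min_score)
    )
    return hits / len(labelled_queries)
```

Raising `min_score` trades recall for precision: fewer irrelevant chunks reach the generator, but a gold document scoring below the threshold counts as a miss.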
Project Scope: Trained and deployed a T5-small model on CNN/DailyMail using QLoRA and 8-bit quantization in Amazon SageMaker, exposed via a low-latency REST API using AWS Lambda + API Gateway, and served through a Streamlit frontend.
Result & Impact: Achieved a 20% improvement in ROUGE-L score (up to 42.7) through quantized training and inference pipeline optimization.
Tools & Technologies: T5-small, QLoRA, Hugging Face, SageMaker, AWS Lambda, API Gateway, 8-bit quantization, Streamlit, Python.
Technical Focus Areas: Parameter-efficient LLM tuning, quantized model deployment, serverless NLP APIs, multi-stage token cleanup.
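For readers unfamiliar with the metric, ROUGE-L is based on the longest common subsequence (LCS) between a reference and a generated summary. The sketch below computes a simplified ROUGE-L F1 on whitespace tokens. It is an assumption-laden illustration (no stemming, no sentence-level splitting), so its numbers will not exactly match standard evaluation packages or the scores reported above.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """Simplified ROUGE-L F1 over lowercased whitespace tokens."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)   # share of candidate tokens in the LCS
    recall = lcs / len(ref)       # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)
```

For example, "the cat sat on the mat" against "the cat lay on the mat" shares a five-token subsequence out of six tokens on each side, giving precision and recall of 5/6.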
Project Scope: Built a voice-enabled AI appointment agent using Whisper for transcription, LangGraph for stateful orchestration, and Groq's LLM for slot-filling dialogue processing. Integrated AWS SES for email confirmations and deployed via Docker, triggered through Lambda + API Gateway.
Result & Impact: Achieved 86% success rate across 15 real-world scenarios with complete voice-to-email booking flows.
Tools & Technologies: Whisper, LangGraph, LangChain, Groq's LLM, AWS SES, Lambda, API Gateway, Docker, GitHub Actions, S3, Python.
Technical Focus Areas: Agentic workflows, voice-to-text integration, cloud function triggers, multi-agent coordination, real-time pipeline deployment.
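The slot-filling part of such a booking flow can be sketched as a small state update loop. This is a plain-Python illustration, not LangGraph code: the slot names in `REQUIRED_SLOTS` and the helpers `merge_slots` and `next_action` are hypothetical, and the extraction step (normally done by the LLM on the transcript) is assumed to return a dictionary.

```python
REQUIRED_SLOTS = ("name", "date", "time", "email")  # hypothetical slot names


def merge_slots(state, extracted):
    """Return a new state with any newly extracted, non-empty slots filled in."""
    updated = dict(state)
    for key in REQUIRED_SLOTS:
        value = extracted.get(key)
        if value:
            updated[key] = value
    return updated


def next_action(state):
    """Ask for the first missing slot, or confirm once everything is known."""
    for key in REQUIRED_SLOTS:
        if key not in state:
            return ("ask", key)
    return ("confirm", None)
```

Each user turn would call `merge_slots` with whatever the model extracted and then `next_action` to decide whether to prompt again or trigger the confirmation email.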
(GitHub)
Project Scope: Developed a context-aware cold email generator using ChromaDB for semantic search, Sentence Transformers for profile embeddings, and GPT/Claude APIs for personalized text generation. Enabled dynamic prompt construction and batch profile ingestion via CSV.
Result & Impact: Automated creation of personalized emails conditioned on vector-matched profile data with high semantic relevance.
Tools & Technologies: ChromaDB, Sentence Transformers, GPT/Claude APIs, CSV, Python, Streamlit, Jupyter.
Technical Focus Areas: Vector similarity retrieval via ChromaDB, multi-profile batch generation, prompt engineering with LLMs.
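Batch ingestion and dynamic prompt construction might look roughly like the following sketch. The CSV column names (`name`, `role`, `company`), the template text, and the helper names are assumptions for illustration. The retrieval step is represented by a plain list of context snippets, not a ChromaDB query.

```python
import csv
import io

# Hypothetical prompt template; the project's actual wording is not shown in the source.
PROMPT_TEMPLATE = (
    "Write a short, friendly cold email to {name} ({role} at {company}).\n"
    "Relevant context retrieved for this profile:\n{context}\n"
)


def load_profiles(csv_text):
    """Parse CSV text with a header row into a list of profile dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def build_prompt(profile, context_snippets):
    """Fill the template with profile fields and retrieved context lines."""
    context = "\n".join(f"- {snippet}" for snippet in context_snippets)
    return PROMPT_TEMPLATE.format(
        name=profile["name"],
        role=profile["role"],
        company=profile["company"],
        context=context,
    )
```

In a full pipeline, `context_snippets` would come from the vector store's nearest matches for each profile, and each prompt would then be sent to the chosen LLM API.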
(GitHub)
Project Scope: Built a phishing URL classifier using Random Forest, Logistic Regression, and Decision Tree models. Integrated MLflow to track experiments, versions, and metrics such as accuracy, precision, and recall.
Result & Impact: Achieved 95% classification accuracy on a dataset of 11,000+ samples with reproducible pipeline evaluation.
Tools & Technologies: Scikit-learn, Pandas, MLflow, Python, Matplotlib, NumPy.
Technical Focus Areas: Experiment tracking, binary classification pipelines, feature selection, hyperparameter logging.
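The tracked metrics can be computed directly from predictions, as in this minimal sketch. It assumes binary labels where 1 marks a phishing URL. The function name `binary_metrics` is hypothetical, and the project itself relies on scikit-learn and MLflow, which are not used here.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = phishing)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }
```

Reporting precision and recall alongside accuracy matters for phishing detection, since a model can score high accuracy on an imbalanced dataset while still missing many malicious URLs.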
(Private Project)
Project Scope: Designed and trained a custom CNN model with concatenated convolution layers for classifying tea leaf diseases. Applied extensive data augmentation and preprocessing using OpenCV to improve generalization in low-sample conditions.
Result & Impact: Reached 96% classification accuracy using optimized hyperparameters and regularization.
Tools & Technologies: TensorFlow, Keras, OpenCV, Python, NumPy, ImageDataGenerator.
Technical Focus Areas: Custom CNN design, leaf segmentation, image-based classification, augmentation-based generalization.
(Private Project)
Project Scope: Implemented a model predictive current controller (MPC) on a TMS320F28379D DSP for grid-connected inverter control. Built the predictive model in MATLAB/Simulink, deployed C code via Code Composer Studio, and tested dynamic behavior with real-time feedback loops.
Result & Impact: Reduced transient response time by 56% (2.1 ms → 0.914 ms), significantly improving dynamic stability.
Tools & Technologies: TMS320F28379D, MATLAB, Simulink, Code Composer Studio, Embedded C, PWM, Current Sensors.
Technical Focus Areas: Real-time embedded control, predictive algorithm tuning, grid-interfaced inverter regulation, DSP-based MPC deployment.
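One common way to realize predictive current control is a one-step, finite-control-set scheme: predict the next current for each candidate inverter voltage and apply the one that lands closest to the reference. The sketch below assumes that formulation, a single-phase RL filter model discretized with forward Euler, and hypothetical parameter values. The source does not state which MPC variant or plant model was used, and the deployed controller ran as embedded C, not Python.

```python
def predict_current(i_now, v_inv, e_grid, R, L, Ts):
    """Forward-Euler prediction for L*di/dt = v_inv - e_grid - R*i."""
    return (1.0 - R * Ts / L) * i_now + (Ts / L) * (v_inv - e_grid)


def choose_voltage(i_now, i_ref, e_grid, candidates, R, L, Ts):
    """Pick the candidate voltage whose predicted current is closest to i_ref."""
    best_v, best_i = None, None
    for v in candidates:
        i_pred = predict_current(i_now, v, e_grid, R, L, Ts)
        if best_i is None or abs(i_ref - i_pred) < abs(i_ref - best_i):
            best_v, best_i = v, i_pred
    return best_v, best_i


# Hypothetical plant and converter values, for illustration only.
R_OHM = 0.5
L_H = 10e-3
TS_S = 50e-6
CANDIDATES = (-400.0, 0.0, 400.0)  # possible inverter output voltages

v_star, i_next = choose_voltage(0.0, 5.0, 0.0, CANDIDATES, R_OHM, L_H, TS_S)
```

With these assumed values, each sampling period can move the current by at most about 2 A, so the controller selects the positive voltage level until the prediction approaches the reference.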