I’m driven by a passion for building practical, intelligent systems that bridge machine learning with real-world usability. My projects reflect a focus on efficiency, scalability, and creativity, ranging from cloud-deployed LLMs to voice agents and control systems. I’m always eager to learn, explore emerging tools, and collaborate on solving meaningful challenges in AI!
LLM Systems & AI Engineering
Pre-Training & Foundation Models (GitHub) (Loss Curve during Training)
GPT-2 (124M) Pretraining from Scratch Distributed PyTorch
Designed and implemented a decoder-only GPT-2 (124M) transformer model from scratch in PyTorch and trained it on 0.8B tokens using fully distributed multi-GPU DDP with gradient accumulation and mixed precision.
Tools & Technologies:-
PyTorch, PyTorch Distributed Data Parallel (DDP), torch.distributed, Custom Transformer Implementation, Causal Self-Attention, FlashAttention, AdamW, Cosine Learning Rate Scheduler, Gradient Accumulation, Mixed Precision, Selective Weight Decay Parameter Grouping, Multi-GPU Token Sharding, HellaSwag Benchmarking.
Results:-
Achieved 0.294 HellaSwag accuracy, matching OpenAI’s GPT-2 (124M) baseline performance within a single epoch of training.
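The cosine learning-rate schedule with linear warmup used in this run can be sketched in plain Python; the hyperparameter values below (`max_lr`, `warmup_steps`, `max_steps`) are illustrative placeholders, not the actual training configuration:

```python
import math

def cosine_lr(step, max_lr=6e-4, min_lr=6e-5, warmup_steps=715, max_steps=19073):
    """Linear warmup followed by cosine decay (hypothetical hyperparameters)."""
    if step < warmup_steps:
        # ramp linearly from ~0 up to max_lr over the warmup window
        return max_lr * (step + 1) / warmup_steps
    if step > max_steps:
        # past the decay horizon, hold at the floor learning rate
        return min_lr
    ratio = (step - warmup_steps) / (max_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * ratio))  # decays 1 -> 0
    return min_lr + coeff * (max_lr - min_lr)
```

In the training loop this value is assigned to each optimizer parameter group per step, alongside gradient accumulation and mixed precision.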
Fine-Tuning & Alignment with DPO
8B Model Fine-Tuning for Low-Resource Translation (QLoRA + DDP) with DPO
Fine-tuned an 8B-parameter LLM using QLoRA with multi-GPU DDP training to improve translation quality for low-resource and morphologically complex languages, followed by Direct Preference Optimization (DPO) for alignment stabilization.
Tools & Technologies:-
PyTorch, Hugging Face Transformers, PEFT (QLoRA), TRL, Multi-GPU DDP, 4-bit NF4 Quantization, Gradient Checkpointing, Direct Preference Optimization, LLM-as-a-Judge Evaluation, Vertex AI, Docker, GKE
Results:-
Achieved a 10x BLEU and 36% BLEURT improvement over the baseline.
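The DPO objective applied after QLoRA fine-tuning can be illustrated on scalar sequence log-probabilities. This is a minimal sketch of the per-pair loss (the same quantity TRL's DPO trainer averages over a batch); `beta=0.1` is a common default, not necessarily the value used here:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are sequence log-probabilities under the trainable policy (pi_*)
    and the frozen reference model (ref_*); beta scales the implicit KL penalty.
    """
    # implicit reward margin: how much more the policy prefers the chosen
    # response than the reference model does
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log(sigmoid(logits)): pushes the margin to be positive
    return math.log(1.0 + math.exp(-logits))
```

When the policy equals the reference, the margin is zero and the loss sits at log 2; it falls as the policy learns to prefer the chosen translation.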
LLM Inference Systems & GPU Optimization (GitHub) (Project Slides)
Dual-GPU LLM Inference Engine
Engineered a disaggregated LLM inference system separating prefill and decode across dual NVIDIA L4 GPUs.
Tools & Technologies:-
PyTorch, CUDA, vLLM, Layer-wise KV Streaming, Prefill–Decode Separation, Multi-GPU Architecture, GPU Memory Management, Async Batching, Compute–Transfer Overlap, torch.cuda Streams, Latency Profiling, Throughput Benchmarking
Results:-
Achieved 60–96% compute/transfer overlap, a 20x TTFT improvement, and a 4x throughput gain over the single-GPU baseline.
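The prefill–decode separation with layer-wise KV streaming can be sketched with asyncio standing in for CUDA streams: the prefill side pushes KV blocks layer by layer, and the decode side starts consuming them before prefill has finished, which is where the compute–transfer overlap comes from. All names here are illustrative; the real system moves tensors between two L4 GPUs with torch.cuda streams:

```python
import asyncio

async def prefill(prompt, n_layers=4):
    # "prefill GPU": compute the KV cache layer by layer and stream each
    # layer out as soon as it is ready, rather than transferring all at once
    queue = asyncio.Queue()
    async def run():
        for layer in range(n_layers):
            await asyncio.sleep(0)                      # stands in for attention compute
            await queue.put((layer, f"kv[{layer}]({prompt})"))
        await queue.put(None)                           # end-of-cache sentinel
    asyncio.create_task(run())
    return queue

async def decode(queue, steps=3):
    # "decode GPU": begins receiving KV blocks while prefill is still running
    kv = []
    while (item := await queue.get()) is not None:
        kv.append(item)
    # with the full cache assembled, run the autoregressive decode loop
    return [f"tok{i}" for i in range(steps)], len(kv)

async def main():
    q = await prefill("hello")
    return await decode(q)
```

Running `asyncio.run(main())` yields the decoded tokens plus the count of streamed KV blocks.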
Agents, RAG & Applied AI Systems
Low-Latency Dense Retrieval RAG System (GitHub) (Architecture)
Designed and built a full-stack Two-Stage Retrieval RAG system over a 12,000+ document corpus using LoRA-fine-tuned Sentence Transformers and FLAN-T5 for answer synthesis.
Tools & Technologies:-
Python, PyTorch, Transformers, Sentence Transformers, FAISS, LoRA, Two-Stage Retrieval, Semantic Chunking, Docker.
Results:-
Improved top-3 retrieval accuracy from 81% to 92.4% and achieved a 1.1x retrieval-latency speedup.
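The two-stage retrieval pattern reduces to: a cheap dense pass narrows the corpus, then a stronger scorer reranks only the survivors. A minimal sketch with toy vectors; in the deployed system stage 1 is FAISS over LoRA-fine-tuned Sentence-Transformer embeddings, and `rerank_fn` is a placeholder for the second-stage scorer:

```python
import math

def cosine(a, b):
    # cosine similarity between two dense vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def two_stage_retrieve(query_vec, doc_vecs, rerank_fn, k1=3, k2=1):
    # stage 1: cheap dense similarity narrows the corpus to k1 candidates
    candidates = sorted(range(len(doc_vecs)),
                        key=lambda i: cosine(query_vec, doc_vecs[i]),
                        reverse=True)[:k1]
    # stage 2: the stronger (slower) scorer reranks only those candidates
    return sorted(candidates, key=rerank_fn, reverse=True)[:k2]
```

The latency win comes from running the expensive scorer over k1 candidates instead of the full 12,000+ document corpus.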
Voice-Driven AI Appointment Agent (GitHub) (DockerHub)
Built a voice-enabled AI appointment booking agent using Whisper for transcription, LangGraph for stateful orchestration, and Groq's LLM for dialogue slot-filling.
Tools & Technologies:-
Whisper, LangChain, LangGraph, Groq API, AWS SES, Docker, Streamlit, Python
Results:-
86% booking success across 15+ real-world test scenarios.
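The stateful slot-filling that LangGraph orchestrates can be sketched as a simple policy over the conversation state: ask for the first missing slot, and confirm once all slots are filled. The slot names are hypothetical, and the LLM extraction step is omitted:

```python
REQUIRED_SLOTS = ("name", "date", "time")  # illustrative slot schema

def next_action(state):
    """Given the slots filled so far, decide the agent's next dialogue move."""
    for slot in REQUIRED_SLOTS:
        if not state.get(slot):
            # a missing slot drives the next question to the caller
            return f"ask_{slot}"
    # every slot filled: route to the booking/confirmation node
    return "confirm_booking"
```

In the real graph each `ask_*` action is a node whose reply is transcribed by Whisper and parsed into slot values by the Groq-hosted LLM before the state loops back through this decision.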
News Summarization (GitHub)
Fine-tuned T5-small on CNN/DailyMail using QLoRA and 8-bit quantization for abstractive summarization, deployed via AWS Lambda + API Gateway with Streamlit frontend.
Tools & Technologies:-
T5-small, QLoRA, Hugging Face, SageMaker, AWS Lambda, API Gateway, 8-bit quantization, Streamlit, Python.
Results:-
Improved ROUGE-L score by 20% (to 42.7).
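The serving path can be sketched as a minimal AWS Lambda proxy handler behind API Gateway; `summarize` here is a stand-in for the call to the fine-tuned T5-small endpoint, not the actual inference code:

```python
import json

def summarize(text, max_words=30):
    # placeholder: the real handler invokes the deployed T5-small model
    return " ".join(text.split()[:max_words])

def lambda_handler(event, context=None):
    """Hypothetical handler: API Gateway proxies {"text": ...} to the model."""
    body = json.loads(event.get("body") or "{}")
    text = body.get("text", "")
    if not text:
        return {"statusCode": 400,
                "body": json.dumps({"error": "missing 'text'"})}
    return {"statusCode": 200,
            "body": json.dumps({"summary": summarize(text)})}
```

The Streamlit frontend then POSTs article text to the API Gateway URL and renders the returned summary.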
Cold Email Generator with ChromaDB (GitHub)
Built a context-aware cold email generator using ChromaDB for semantic search and GPT/Claude APIs for personalized text generation.
Tools & Technologies:-
ChromaDB, Sentence Transformers, GPT/Claude APIs, CSV, Python, Streamlit, Jupyter.
Results:-
Focused on embedding-driven semantic matching for high-relevance content generation.
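The embedding-driven matching step can be sketched in plain Python. In the deployed system the portfolio embeddings live in a ChromaDB collection and the match is a collection query; the names and the template below are illustrative stand-ins for the GPT/Claude generation call:

```python
import math

def cosine(a, b):
    # cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def pick_portfolio_item(job_vec, portfolio):
    # portfolio: list of (description, embedding) pairs; in the real system
    # this is a ChromaDB collection query over Sentence-Transformer embeddings
    return max(portfolio, key=lambda item: cosine(job_vec, item[1]))[0]

def draft_email(company, role, portfolio_item):
    # template stand-in for the LLM generation step
    return (f"Hi {company} team, I noticed your {role} opening. "
            f"Relevant work: {portfolio_item}.")
```

Grounding the email in the most similar portfolio item is what keeps the generated text specific to the job posting rather than generic.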