Harleen Kaur

I'm a Hokie, Data Science Enthusiast, Full-Stack Developer, Tech Leader & Mentor, Creative Thinker, Storyteller, and NITian.

Master's student in Computer Science at Virginia Tech, specializing in Machine Learning and Data Analytics. With a knack for transforming data into insights and ideas into impact, I blend technology, creativity, and leadership to shape meaningful solutions.

Actively Looking for Job Opportunities in Full Stack, Data Science and Data Analyst Roles

Academic Projects

ML-Based Rental Price Prediction System

Overview: Developed machine learning models to predict rental prices and classify properties into market segments, helping renters and property owners make data-driven decisions.

Key Features:

  • Implemented supervised models (Linear Regression, XGBoost, LightGBM) and unsupervised clustering (K-Means), achieving ~90% accuracy
  • Improved predictions through dimensionality reduction (PCA), feature engineering, and geospatial data enrichment using the Geopy library
  • Validated model performance using cross-validation, hypothesis testing (T-tests, F-tests), and statistical validation techniques
  • Discovered market patterns through clustering analysis and association rule mining
Python, scikit-learn, TensorFlow, Pandas, NumPy, Matplotlib, Seaborn
Emotion Classification using NLP

Overview: Created a text-based emotion classification system using the Hugging Face "Emotion Dataset" to categorize text into six emotions: Sadness, Joy, Love, Anger, Fear, and Surprise.

Key Features:

  • Performed extensive data preprocessing, including lowercasing, tokenization, stopword removal, and lemmatization, while preserving emotional cues
  • Implemented multiple feature extraction techniques: Bag of Words, TF-IDF (with feature selection using Chi-square), and pre-trained embeddings (Word2Vec, GloVe, Sentence-BERT)
  • Trained and compared multiple models including Random Forest, SVM, and XGBoost
  • Achieved best performance with Word2Vec + XGBoost (68.90% accuracy) versus Sentence-BERT (66.95%) and traditional approaches (59%)
Python, NLP, Word2Vec, BERT, TF-IDF, XGBoost, SVM, Random Forest
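To illustrate the TF-IDF feature extraction mentioned in this project, here is a minimal sketch in plain Python. It is not the project's code: the `tfidf` helper and the toy corpus are hypothetical, and it uses the plain `tf * log(N / df)` weighting, whereas scikit-learn's `TfidfVectorizer` applies smoothed IDF and L2 normalization by default.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # Term frequency (relative count) times inverse document frequency.
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy corpus, already lowercased and tokenized.
corpus = [
    ["i", "feel", "so", "happy", "today"],
    ["i", "feel", "scared", "and", "alone"],
    ["i", "feel", "angry", "about", "this"],
]
vectors = tfidf(corpus)
```

Words that appear in every document ("i", "feel") receive a weight of zero, while words unique to one document ("happy", "scared", "angry") carry the signal a downstream classifier can use.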
Implementing Naive Bayes & Decision Tree on FMNIST Dataset

Overview: Implemented a Naive Bayes classifier from scratch and a Decision Tree model using Scikit-learn to classify clothing items from the Fashion MNIST dataset, focusing on distinguishing between Trousers and Pullovers.

Key Features:

  • Built Naive Bayes classifier from scratch, computing conditional probabilities and posterior probabilities without using ML libraries
  • Developed a Decision Tree classifier (max depth = 10, Gini index) for performance comparison
  • Manually implemented evaluation metrics including accuracy, precision, recall, and ROC curve without using existing libraries
  • Experimented with various binarization thresholds (100 to 200) and analyzed impact on classification performance
Python, Naive Bayes, Decision Trees, scikit-learn, FMNIST, ROC Curve, Feature Preprocessing
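The from-scratch approach described above can be sketched roughly as follows. This is an illustrative sketch, not the project's code: it assumes a Bernoulli-style Naive Bayes over binarized pixels with Laplace smoothing, and the function names, the threshold of 128, and the 4-pixel toy "images" are all hypothetical.

```python
import math


def binarize(image, threshold):
    """Map each grayscale pixel (0-255) to 1 if above the threshold, else 0."""
    return [1 if pixel > threshold else 0 for pixel in image]


def fit_bernoulli_nb(X, y):
    """Estimate class priors and per-pixel P(pixel = 1 | class) with Laplace smoothing."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, target in zip(X, y) if target == label]
        prior = len(rows) / len(X)
        # Add-one smoothing keeps every probability strictly between 0 and 1.
        probs = [(sum(col) + 1) / (len(rows) + 2) for col in zip(*rows)]
        model[label] = (prior, probs)
    return model


def predict(model, x):
    """Return the class with the highest log-posterior for one binarized image."""
    def log_posterior(label):
        prior, probs = model[label]
        return math.log(prior) + sum(
            math.log(p if bit else 1 - p) for bit, p in zip(x, probs)
        )
    return max(model, key=log_posterior)


def precision_recall(y_true, y_pred, positive):
    """Compute precision and recall for one class without any ML library."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical 4-pixel "images" standing in for 28x28 Fashion MNIST samples.
raw = [
    [250, 240, 10, 5],
    [230, 200, 30, 0],
    [20, 10, 220, 245],
    [0, 40, 210, 250],
]
labels = ["trouser", "trouser", "pullover", "pullover"]

X = [binarize(image, threshold=128) for image in raw]
model = fit_bernoulli_nb(X, labels)
preds = [predict(model, x) for x in X]
precision, recall = precision_recall(labels, preds, positive="trouser")
```

Changing the `threshold` argument changes which pixels count as "on", which is one way to study how binarization affects the classifier.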
Interactive Data Visualization Platform

Overview: Created a web-based analysis platform with dynamic visualizations enabling users to perform complex data exploration without programming knowledge.

Key Features:

  • Built interactive dashboard with dynamic filtering, feature selection, and statistical analysis tools
  • Implemented comprehensive visualization suite for distribution analysis, relationship visualization, and categorical data exploration
  • Integrated analytical capabilities for outlier detection, PCA-based dimensionality reduction, and normality testing
  • Deployed containerized application to Google Cloud Platform for worldwide accessibility
Python, Dash, Plotly, Matplotlib, Seaborn, Docker, GCP
E-Commerce Platform: Java & Angular Web Application

Overview: Engineered a scalable e-commerce platform with a Java (Spring Boot) backend and an Angular frontend to streamline online purchasing.

Key Features:

  • Integrated Stripe for secure, seamless payment processing
  • Optimized the Java backend, reducing database query response times by 40%
  • Built a responsive, intuitive Angular frontend
  • Automated customer communication workflows to improve user engagement and retention
  • Optimized complex SQL queries for efficient data retrieval and management

Impact: Developed a high-performance e-commerce solution that significantly improved transaction speed, security, and overall user experience.

Java, Spring Boot, Angular, SQL, Stripe API, RESTful Services

Work Experience

Full Stack Engineer

Fidelity Investments

August 2021 - August 2024
Data Engineering & Analytics Projects
Cloud-Based ETL Pipeline
  • Designed and implemented a comprehensive data transformation architecture using Spring Batch, AWS S3, and Snowflake, creating a scalable system that replaced legacy flat file processing
  • Implemented statistical validation procedures and data normalization techniques to ensure data quality and consistency
  • Achieved $2.5M annual cost savings through architecture optimization and process automation
Spring Batch, AWS S3, Snowflake, Java, Statistical Analysis
Real-Time Data Processing System
  • Led development of an end-to-end data pipeline that processed 50K+ records per minute from mainframe systems to cloud environments
  • Engineered data transformation workflows using Python, SQL, Spring Boot, and CDC replication with Kafka integration
  • Implemented statistical analysis methods to validate data integrity during high-volume migrations
Python, Spring Boot, SQL, Kafka, CDC, AWS, Oracle
Automated Data Validation Framework
  • Created a robust validation framework achieving 90% test coverage for critical customer account workflows
  • Developed rule-based validation checks and statistical thresholds to automate quality assurance
  • Designed modular components that could be reused across multiple data pipelines
Java, JUnit, Cucumber Framework, SQL, Statistical Methods
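As a rough illustration of combining rule-based checks with a statistical threshold, here is a minimal Python sketch. It is not the framework described above, which was built with Java, JUnit, and Cucumber. The field names (`account_id`, `balance`), the z-score limit of 3, and the sample data are hypothetical.

```python
from statistics import mean, stdev


def rule_checks(record):
    """Return the names of the rule-based checks that the record fails."""
    failures = []
    if not record.get("account_id"):
        failures.append("missing_account_id")
    if record.get("balance") is None or record["balance"] < 0:
        failures.append("invalid_balance")
    return failures


def outside_threshold(value, history, z_limit=3.0):
    """Flag a value more than z_limit standard deviations from the historical mean."""
    if len(history) < 2:
        return False  # Not enough history to estimate spread.
    spread = stdev(history)
    if spread == 0:
        return value != mean(history)
    return abs(value - mean(history)) / spread > z_limit


# Hypothetical daily record counts from earlier pipeline runs.
history = [1000, 1020, 980, 1010, 990]

# Hypothetical batch of account records to validate.
batch = [
    {"account_id": "A1", "balance": 150.0},
    {"account_id": "", "balance": -5.0},
]
failures = [rule_checks(record) for record in batch]
```

Keeping each check as a small function is one way to make the same rules reusable across pipelines.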
System Optimization & Infrastructure
Production Performance Analysis
  • Conducted comprehensive analysis of production systems by extracting patterns from error logs and performance metrics
  • Identified 50+ critical system bottlenecks through data-driven investigation
  • Implemented SQL query and stored procedure optimizations that reduced latency by 40%
SQL, Performance Tuning, Log Analysis, Statistical Methods, Datadog, Perf Testing
Cloud Infrastructure Automation
  • Architected containerized data pipelines using Terraform, Docker, and Kubernetes (EKS)
  • Implemented CI/CD workflows that increased deployment frequency and reduced manual intervention
  • Improved data processing efficiency by 60% through automated workflow optimization
Terraform, Docker, Kubernetes, AWS EKS, CI/CD
Database Architecture Analysis
  • Conducted data-driven proof of concept comparing CockroachDB and Oracle performance metrics
  • Analyzed read/write speeds and transaction processing capabilities under various workloads
  • Provided evidence-based recommendations for database architecture to support high-volume financial data processing
CockroachDB, Oracle, Performance Testing, SQL
API Development
Mutual Funds Domain
  • Developed a dynamic Rule Engine API with multiple parameters for accurate fund selection and streamlined rules creation
  • Maintained comprehensive Swagger documentation for RESTful APIs to ensure seamless integration
  • Designed and implemented microservices architecture serving 10K+ daily users with robust Java backend services
Java, Spring Boot, REST API, Swagger, Microservices, Datadog, AWS, Docker
Frontend Integration
  • Debugged and enhanced frontend applications for mutual fund management using Angular and React
  • Implemented feature enhancements that improved user experience and application performance
  • Conducted UI testing using Selenium framework to ensure cross-browser compatibility and responsive design
Angular, React, TypeScript, JavaScript, Selenium

Summer Analyst

Fidelity Investments

May 2020 - July 2020
Feedback Analytics System
  • Engineered a data collection and analytics system for 60,000+ employees using SharePoint
  • Implemented A/B testing methodologies to optimize feature placement and user engagement
  • Developed interactive dashboards using Power BI and Tableau to transform feedback data into actionable insights
React, TypeScript, Power BI, Tableau, A/B Testing, SharePoint
SharePoint Performance Optimization
  • Reduced SharePoint page load time by 40% for all employees by standardizing JavaScript and CSS file references across all sites
  • Improved user productivity and increased platform engagement through optimized web performance
SharePoint, JavaScript, CSS, Web Performance

Skills & Technologies

Programming Languages

Java, Python, SQL, PL/SQL, TypeScript, JavaScript, HTML/CSS, C, Node.js

Frameworks & Tools

Spring Boot, Spring Batch, Angular, React, Maven, Git, JIRA, Swagger, JUnit, Selenium, Cucumber

Cloud & DevOps

AWS (EC2, EKS, S3), Google Cloud Platform, Azure, Docker, Kubernetes, CI/CD, Terraform, Jenkins, Agile Methodology

Security & Compliance

OWASP Top 10, Application Security, Pentesting, Data Compliance, Security Scanners

Database & Big Data

Oracle, Snowflake, CockroachDB, NoSQL, Apache Kafka, ETL Pipelines

Data Science & ML

Scikit-learn, TensorFlow, XGBoost, LightGBM, Pandas, NumPy, SciPy, Neural Networks, NLP

Statistical Methods

Hypothesis Testing, A/B Testing, Regression Analysis, Time Series Analysis, Cross-validation, Anomaly Detection, Classification Models, Predictive Modeling

Data Visualization

Tableau, Power BI, Plotly, Dash, Matplotlib, Seaborn, Interactive Dashboards

Certifications

AWS Certified Cloud Practitioner
Database Certified SQL Associate
Information Security (OWASP)

Reach Out

Location

Virginia, United States

About Me

A storyteller at heart and an engineer by trade, I thrive at the crossroads of creativity and technology. Whether it's designing intelligent systems, leading impactful initiatives, or bringing ideas to life through code and art, I believe in innovation with purpose.

Currently, I am a Master's student at Virginia Tech, specializing in Machine Learning and Data Analytics, with three years of experience at Fidelity Investments as a Full-Stack Developer. I've built large-scale data pipelines, optimized cloud infrastructure, and engineered solutions that saved millions, earning me two Impact Awards. My leadership journey includes heading Learning & Development at Fidelity, mentoring teams, and driving organization-wide upskilling initiatives. Now, as a Graduate School Assistant at Virginia Tech, I blend research, strategy, and innovation to support academic growth.

Beyond tech, I paint, teach, and design. As the Social Media Head of E-Cell, I nurtured my love for branding and storytelling, crafting digital experiences that inspire. I also enjoy mentoring students, breaking down complex topics into engaging lessons.

When I'm not coding or strategizing, you'll find me bringing stories to canvas.

Location:

Arlington, Virginia, USA

Degree:

MEng in Computer Science

Explore My Creative Side

Discover my artwork - where technology meets creativity and stories come to life through colors and shapes.

Visit My Art Gallery