Portfolio

Explore my collection of data engineering, web development, and machine learning projects showcasing real-world solutions.

Filtered Projects (7)

Real-Time Analytics

AdTech Real-Time Analytics Platform

AdTech real-time analytics pipeline using Kinesis, Flink, and Iceberg on AWS.

Apache Flink Apache Iceberg Athena AWS Kinesis Python
Data Warehousing

Travel Booking SCD2 Data Warehouse

Travel booking data warehouse on Delta Lake that tracks dimension history with Slowly Changing Dimension Type 2 (SCD2).

Delta Lake PySpark Python
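
For readers unfamiliar with SCD2, the idea can be sketched in plain Python: when a tracked attribute changes, the current row is closed and a new current row is appended, so history is preserved. This is only an illustration under assumed names (`apply_scd2`, `valid_from`, `valid_to`, `is_current`, `customer_id`, `city` are all hypothetical); the project itself lists Delta Lake and PySpark, and its actual schema and merge logic are not shown here.

```python
from datetime import date


def apply_scd2(dimension, incoming, key, tracked, effective):
    """Return a new list of rows with SCD2 history applied.

    dimension: list of dicts holding the key, the tracked attributes,
               and the hypothetical columns valid_from, valid_to, is_current.
    incoming:  list of dicts holding the key and the tracked attributes.
    """
    # Copy rows so the caller's dimension is left untouched.
    result = [dict(row) for row in dimension]
    current = {row[key]: row for row in result if row["is_current"]}

    for record in incoming:
        existing = current.get(record[key])
        if existing is not None:
            if all(existing[col] == record[col] for col in tracked):
                continue  # nothing changed, keep the current row
            # Close the old version instead of overwriting it.
            existing["valid_to"] = effective
            existing["is_current"] = False
        new_row = {
            key: record[key],
            **{col: record[col] for col in tracked},
            "valid_from": effective,
            "valid_to": None,
            "is_current": True,
        }
        result.append(new_row)
        current[record[key]] = new_row
    return result


# Illustrative data only.
dim = [{"customer_id": 1, "city": "Pune", "valid_from": date(2023, 1, 1),
        "valid_to": None, "is_current": True}]
incoming = [{"customer_id": 1, "city": "Mumbai"},
            {"customer_id": 2, "city": "Delhi"}]
history = apply_scd2(dim, incoming, key="customer_id",
                     tracked=["city"], effective=date(2024, 6, 1))
```

After this run, `history` holds three rows: the closed "Pune" version, the current "Mumbai" version, and a new current row for customer 2.
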
Streaming Analytics

UPI CDC Streaming Analytics Pipeline

Real-time change data capture (CDC) pipeline for UPI transactions using Delta Lake Change Data Feed (CDF) on Databricks.

Databricks Delta Lake Python Spark Structured Streaming Unity Catalog
Machine Learning

Customer Segmentation ML Pipeline

End-to-end ML pipeline for customer segmentation using RFM analysis. Includes data preprocessing, feature engineering, clustering algorithms, and model deployment.

FastAPI Pandas Python Scikit-learn
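
As a rough illustration of the RFM feature-engineering step, the sketch below derives recency, frequency, and monetary values per customer from a list of orders. It uses only the standard library rather than the Pandas stack listed above, and the function name `rfm_features`, the tuple layout, and the sample orders are assumptions made for this example, not the project's code.

```python
from collections import defaultdict
from datetime import date


def rfm_features(orders, as_of):
    """Compute RFM features from (customer_id, order_date, amount) tuples."""
    last_order = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)

    for customer_id, order_date, amount in orders:
        # Recency is based on the most recent order per customer.
        if customer_id not in last_order or order_date > last_order[customer_id]:
            last_order[customer_id] = order_date
        frequency[customer_id] += 1      # number of orders
        monetary[customer_id] += amount  # total spend

    return {
        cid: {
            "recency_days": (as_of - last_order[cid]).days,
            "frequency": frequency[cid],
            "monetary": monetary[cid],
        }
        for cid in last_order
    }


# Illustrative data only.
orders = [
    ("c1", date(2024, 1, 5), 40.0),
    ("c1", date(2024, 3, 1), 60.0),
    ("c2", date(2024, 2, 10), 25.0),
]
features = rfm_features(orders, as_of=date(2024, 3, 31))
```

The resulting per-customer dictionaries could then be scaled and passed to a clustering algorithm; which algorithm and scaling the project uses is not stated in the description.
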
DevOps & Cloud

Data Lake Architecture on AWS

Serverless data lake implementation on AWS using S3, Lambda, Glue, and Athena. Processes and stores ~20 GB daily with automated partitioning and compression.

AWS PySpark Python Terraform
Data Engineering

ETL Orchestration Framework

Custom ETL framework built on Prefect for orchestrating complex data workflows. Features include automatic retry logic, data quality checks, and monitoring dashboards.

Docker PostgreSQL Prefect Python Redis
Data Engineering

Real-time Data Pipeline with Kafka & Spark

High-throughput streaming data pipeline processing millions of events per day using Apache Kafka and PySpark. Implements real-time aggregations and anomaly detection.

Apache Kafka Docker PostgreSQL PySpark Python