Portfolio
Explore my collection of data engineering, web development, and machine learning projects showcasing real-world solutions.
Filtered Projects (7)
AdTech Real-Time Analytics Platform
AdTech real-time analytics pipeline using Kinesis, Flink, and Iceberg on AWS.
Travel Booking SCD2 Data Warehouse
Travel booking data warehouse on Delta Lake that tracks dimension history with Slowly Changing Dimension Type 2 (SCD2) tables.
UPI CDC Streaming Analytics Pipeline
Real-time change data capture (CDC) pipeline for UPI transactions using Delta Lake Change Data Feed (CDF) on Databricks.
Customer Segmentation ML Pipeline
End-to-end ML pipeline for customer segmentation using RFM (recency, frequency, monetary) analysis. Covers data preprocessing, feature engineering, clustering, and model deployment.
Data Lake Architecture on AWS
Serverless data lake implementation on AWS using S3, Lambda, Glue, and Athena. Processes and stores about 20 GB of data daily with automated partitioning and compression.
ETL Orchestration Framework
Custom ETL framework built on Prefect for orchestrating complex data workflows. Features include automatic retry logic, data quality checks, and monitoring dashboards.
Real-time Data Pipeline with Kafka & Spark
High-throughput streaming data pipeline processing millions of events per day using Apache Kafka and PySpark. Implements real-time aggregations and anomaly detection.