Data Engineering

Real-time Data Pipeline with Kafka & Spark

A high-throughput streaming data pipeline that processes millions of events per day with Apache Kafka and PySpark. It computes real-time aggregations and runs anomaly detection on the event stream.
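To make the two processing steps concrete, here is a minimal plain-Python sketch of a tumbling-window count followed by a z-score anomaly check. It is an illustration only and not the repository's code: the function names, the 60-second window, the (timestamp, key) event shape, and the z-score rule are all assumptions, and a real deployment would express the same logic with Kafka as the source and PySpark as the engine.

```python
from collections import defaultdict
from statistics import mean, pstdev


def tumbling_window_counts(events, window_seconds=60):
    """Count events per (window_start, key).

    `events` is an iterable of (timestamp_seconds, key) pairs.
    This event shape is a hypothetical stand-in for messages read from a topic.
    """
    counts = defaultdict(int)
    for ts, key in events:
        # Align each timestamp to the start of its fixed-size window.
        window_start = int(ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


def flag_anomalies(series, threshold=3.0):
    """Return indices of values more than `threshold` population standard
    deviations away from the mean of `series`.

    The z-score rule is an assumed detection method chosen for illustration.
    """
    if len(series) < 2:
        return []
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [i for i, v in enumerate(series) if abs(v - mu) / sigma > threshold]
```

A typical use would feed the per-window counts for one key, ordered by window start, into flag_anomalies to find windows with unusual traffic.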

Apache Kafka, Docker, PostgreSQL, PySpark, Python

December 2025, 127 stars, 34 forks

Repository Statistics

GitHub Stars: 127
Forks: 34
Primary Language: Python

More from Data Engineering

Data Engineering

ETL Orchestration Framework

Custom ETL framework built on Prefect for orchestrating complex data workflows. Features include automatic retry logic, data quality checks, and monitoring dashboards.

Docker, PostgreSQL, Prefect, Python, Redis

89 stars, 21 forks
