Data Engineering
Real-time Data Pipeline with Kafka & Spark
High-throughput streaming data pipeline that processes millions of events per day using Apache Kafka and PySpark. It implements real-time aggregations and anomaly detection.
Apache Kafka
Docker
PostgreSQL
PySpark
Python
December 2025
127 stars
34 forks
Repository Statistics
GitHub Stars: 127
Forks: 34
Primary Language: Python
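The project description mentions real-time aggregations and anomaly detection but does not say how either is done. As a rough illustration only, the sketch below uses plain Python rather than Kafka or PySpark: it counts events in fixed (tumbling) time windows and flags values with a large z-score. The function names, the `(timestamp, key)` event shape, the 60-second window, and the z-score rule are all assumptions made for this example, not details taken from the repository.

```python
from collections import defaultdict
from statistics import mean, pstdev


def tumbling_window_counts(events, window_seconds=60):
    """Count events per (window_start, key) in fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_in_seconds, key) pairs.
    This event shape is hypothetical, chosen only for the example.
    """
    counts = defaultdict(int)
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


def flag_anomalies(values, threshold=3.0):
    """Return indices of values whose absolute z-score exceeds `threshold`.

    A z-score rule is one simple choice; the project may use another method.
    """
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]
```

In a streaming setting the same two steps would typically run continuously over a Kafka topic, with the windowing handled by the stream processor instead of a dictionary held in memory.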
More from Data Engineering
ETL Orchestration Framework
Custom ETL framework built on Prefect for orchestrating complex data workflows. Features include automatic retry logic, data quality checks, and monitoring dashboards.
Docker
PostgreSQL
Prefect
Python
Redis
89 stars
21 forks