Data Engineering

Real-time Data Pipeline with Kafka & Spark

High-throughput streaming data pipeline processing millions of events per day using Apache Kafka and PySpark. Implements real-time aggregations and anomaly detection.
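As a rough illustration of what "real-time aggregations and anomaly detection" can mean in practice, the sketch below counts events per fixed time window and flags windows whose counts deviate strongly from the mean. It is plain Python rather than PySpark or Kafka code, and it is not taken from the repository: the function names, the `(timestamp, key)` event shape, the 60-second window, and the z-score rule are all assumptions chosen for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def tumbling_window_counts(events, window_seconds=60):
    """Count events per (window_start, key) over fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_seconds, key) pairs (assumed shape).
    """
    counts = defaultdict(int)
    for ts, key in events:
        # Align each timestamp to the start of the window that contains it.
        window_start = int(ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


def flag_anomalies(values, threshold=3.0):
    """Return the indices of values whose z-score magnitude exceeds `threshold`.

    Uses the population standard deviation of `values` itself, so this is a
    batch-style check; a streaming job would typically keep running statistics.
    """
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical usage: per-window counts for one key, with one obvious spike.
window_counts = [10, 11, 9, 10, 10, 11, 9, 10, 10, 100]
suspicious = flag_anomalies(window_counts, threshold=2.0)
```

In a Spark Structured Streaming job the same idea would usually be expressed with a windowed `groupBy` over an event-time column; the pure-Python version is used here only so the example stays self-contained.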

Apache Kafka, Docker, PostgreSQL, PySpark, Python
December 2025 | 127 stars | 34 forks

Repository Statistics

GitHub Stars: 127
Forks: 34
Primary Language: Python

More from Data Engineering

Explore other projects in this category

Data Engineering

ETL Orchestration Framework

Custom ETL framework built on Prefect for orchestrating complex data workflows. Features include automatic retry logic, data quality checks, and monitoring dashboards.

Docker, PostgreSQL, Prefect, Python, Redis