About Me

Data Engineer | Full-Stack Developer | Problem Solver

Hello! I'm Abhishek Kumar

I'm a Data Engineer and Full-Stack Developer based in Bangalore, passionate about building scalable data pipelines and real-time analytics systems that drive business value.

With expertise in modern data engineering tools and frameworks, I specialize in designing and implementing end-to-end data solutions—from ingestion and transformation to visualization and insights. I thrive at the intersection of data engineering and software development, creating robust systems that handle massive scale.

My work spans diverse domains including AdTech, Travel & Hospitality, Financial Data Analytics, and Real-time Streaming Platforms. I'm particularly excited about building systems that process data at scale using technologies like Apache Spark, Databricks, Kafka, and modern data orchestration tools.

When I'm not architecting data pipelines or writing code, you'll find me exploring new technologies, contributing to open-source projects, or sharing my knowledge through technical writing.

What I Do

Data Engineering

Building scalable ETL/ELT pipelines, data warehouses, and streaming platforms

Real-Time Analytics

Streaming data processing with Kafka, Spark Streaming, and Delta Lake

Full-Stack Development

Building applications with Django, FastAPI, and React on modern architectures

Data Architecture

Designing scalable data platforms and cloud infrastructure (AWS, Azure)

Technical Skills

Data Engineering

PySpark · Apache Kafka · Databricks · Delta Lake · Prefect · dbt · Snowflake

Programming Languages

Python · SQL · JavaScript · TypeScript · Bash

Web Development

Django · FastAPI · React · Tailwind CSS · REST APIs

Cloud & DevOps

AWS · Azure · Docker · Git · CI/CD

Experience Highlights

AdTech Analytics Pipeline

Built a real-time data pipeline processing 100K URLs every 15 minutes using Prefect, PySpark, and Databricks. Implemented incremental processing with Delta Lake for efficient data updates.

PySpark · Databricks · Prefect · Delta Lake
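The core idea behind incremental processing with Delta Lake is that each run handles only records newer than a persisted watermark. A minimal sketch of that pattern, in plain Python for illustration (the record fields and shape here are hypothetical, not the production schema):

```python
from datetime import datetime, timedelta

def incremental_batch(records, last_watermark):
    """Return only records newer than last_watermark, plus the new watermark."""
    # Filter to records ingested after the previous run's watermark.
    fresh = [r for r in records if r["ingested_at"] > last_watermark]
    # Advance the watermark to the newest record seen (or keep it if none).
    new_watermark = max((r["ingested_at"] for r in fresh), default=last_watermark)
    return fresh, new_watermark

t0 = datetime(2024, 1, 1, 12, 0)
records = [
    {"url": "https://a.example", "ingested_at": t0},                         # already processed
    {"url": "https://b.example", "ingested_at": t0 + timedelta(minutes=15)}, # new this run
]
fresh, watermark = incremental_batch(records, last_watermark=t0)
```

In the real pipeline the filter and upsert happen inside Delta Lake rather than in Python, but the watermark logic is the same: only the 15-minute delta of new URLs is reprocessed each run.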

Real-Time Streaming Platform

Architected and deployed a streaming analytics platform for crypto and stock data using Kafka, Spark Streaming, and WebSockets. Processes ~20GB of market data daily with sub-second latency.

Kafka · Spark Streaming · WebSockets · Redis
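The streaming job's central operation is a windowed aggregation over market ticks. A minimal sketch of a tumbling-window average price per symbol, in plain Python for illustration (tick fields are hypothetical; the production version runs as a Spark Structured Streaming aggregation):

```python
from collections import defaultdict

def tumbling_window_avg(ticks, window_ms=1000):
    """Average price per (symbol, window) over non-overlapping windows."""
    buckets = defaultdict(list)
    for t in ticks:
        # Snap each tick's timestamp down to the start of its window.
        window_start = (t["ts_ms"] // window_ms) * window_ms
        buckets[(t["symbol"], window_start)].append(t["price"])
    # One average per symbol per window.
    return {key: sum(prices) / len(prices) for key, prices in buckets.items()}

ticks = [
    {"symbol": "BTC", "ts_ms": 1000, "price": 100.0},
    {"symbol": "BTC", "ts_ms": 1500, "price": 110.0},  # same 1s window as above
    {"symbol": "BTC", "ts_ms": 2100, "price": 120.0},  # next window
]
averages = tumbling_window_avg(ticks)
```

Here the first window averages to 105.0 and the second holds 120.0; pushing each completed window out over a WebSocket is what keeps end-to-end latency sub-second.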

CDC Data Warehouse

Designed and implemented Change Data Capture pipelines for a travel booking platform, synchronizing data across multiple sources with Databricks and Snowflake. Reduced data latency from hours to minutes.

CDC · Databricks · Snowflake · SQL
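Applying a CDC stream to a warehouse table mirrors a SQL MERGE: each change event carries an operation and a key, and the apply step upserts or deletes accordingly. A minimal sketch in plain Python (the event shape is a hypothetical example; in production this is a MERGE against Snowflake):

```python
def apply_cdc(target, events):
    """Apply insert/update/delete change events to a keyed target table."""
    for e in events:
        if e["op"] in ("insert", "update"):
            target[e["key"]] = e["row"]     # upsert: MATCHED -> update, NOT MATCHED -> insert
        elif e["op"] == "delete":
            target.pop(e["key"], None)      # MATCHED with a delete marker -> remove
    return target

bookings = {1: {"status": "pending"}}
events = [
    {"op": "update", "key": 1, "row": {"status": "confirmed"}},
    {"op": "insert", "key": 2, "row": {"status": "pending"}},
    {"op": "delete", "key": 1},
]
apply_cdc(bookings, events)
```

Because each event is applied as it arrives rather than in a nightly batch, the target table trails the source by minutes instead of hours.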

Let's Work Together

I'm always interested in hearing about new projects and opportunities.