Nike Data Pipeline, Sales Analysis & BI System

Production-grade ETL pipeline and BI system analyzing 2,000+ sales transactions with Python, revealing profit gaps, optimizing margins, and visualizing KPIs across regions with Tableau.

About This Project

Data Pipeline System and Data Analysis of Nike Sales
End-to-End Business Analytics Platform for Revenue Optimization

The Problem
Nike's sales data across 6 Indian regions was riddled with quality issues: 66.7% missing values in critical columns, 114 duplicate records, inconsistent region names, and 544 loss-making transactions. Without clean data and analysis, the business could not identify revenue leaks, optimize discount strategies, or predict seasonal demand patterns. Manual processing took 6+ hours per analysis cycle and was error-prone.

The Solution
I engineered a comprehensive business intelligence platform that transforms raw, messy sales data into actionable strategic insights through automated ETL pipelines, statistical analysis, and executive dashboards.
Impact: Identified 1.2M+ in profit optimization opportunities, reduced data processing time by 92%, and uncovered critical business patterns across 2,123 transactions.

Key Features
Automated ETL Pipeline

10-step data transformation process handling 2,500+ raw transactions
IQR-based outlier detection removing 263 extreme values
Missing value imputation strategies (median, mode, conservative assumptions)
Duplicate removal and data validation achieving 100% completeness
SQLite database and CSV integration with optimized indexing
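
A minimal sketch of the deduplication and imputation steps listed above, assuming pandas and hypothetical column names (order_id, region, units) that are not taken from the project's actual schema. It fills numeric gaps with the median and categorical gaps with the mode; the project's exact rules may differ.

```python
import pandas as pd

# Hypothetical sample; column names and values are illustrative assumptions.
raw = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "region": ["Mumbai", "Delhi", "Delhi", None, "Mumbai"],
    "units": [2.0, 4.0, 4.0, None, 6.0],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows, then impute missing values."""
    out = df.drop_duplicates().copy()
    # Numeric gaps: the median is robust to extreme values.
    out["units"] = out["units"].fillna(out["units"].median())
    # Categorical gaps: use the most frequent value (mode).
    out["region"] = out["region"].fillna(out["region"].mode().iloc[0])
    return out.reset_index(drop=True)

cleaned = clean(raw)
```

Duplicates are dropped before imputation so that repeated rows do not skew the median or mode.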

Comprehensive Exploratory Data Analysis

Statistical analysis across 5 product lines and 3 customer segments
Correlation analysis identifying discount impact on profitability
Time-series decomposition revealing 77.6% revenue concentration in December
Regional performance disparities ranging from 10.3% to 12.0% profit margins
Transaction value distribution with top 20% generating 33.1% of revenue
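
The "top 20% of transactions" share can be computed along these lines. This is a sketch with made-up revenue values, and the function name top_share is hypothetical; it only illustrates the calculation, not the project's code.

```python
def top_share(values, top_fraction=0.2):
    """Share of the total contributed by the largest `top_fraction` of values."""
    ordered = sorted(values, reverse=True)
    k = max(1, round(len(ordered) * top_fraction))  # at least one item
    return sum(ordered[:k]) / sum(ordered)

# Hypothetical transaction revenues, for illustration only.
revenues = [500, 300, 200, 150, 120, 100, 90, 80, 60, 40]
share = top_share(revenues)
print(f"Top 20% of transactions: {share:.1%} of revenue")
```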

Interactive Business Intelligence Dashboard

15+ professional visualizations using Matplotlib, Seaborn, and Plotly
Executive KPI panel tracking revenue, profit margins, AOV, and loss rates
4-panel regional analysis showing revenue, profit, margins, and market share
Product performance heatmaps with category-wise metrics
Time-series trend analysis with monthly and seasonal patterns
Customer segmentation charts with transaction value distributions
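
A sketch of how the KPI panel's figures might be derived from cleaned transactions. The definitions here are assumptions: AOV is taken as revenue divided by the number of transactions, and loss rate as the share of transactions with negative profit. The sample records are invented.

```python
# Hypothetical per-transaction records: (revenue, profit).
transactions = [(1000.0, 120.0), (2000.0, 260.0), (500.0, -40.0), (1500.0, 160.0)]

def kpis(rows):
    """Compute headline KPIs from (revenue, profit) pairs."""
    revenue = sum(r for r, _ in rows)
    profit = sum(p for _, p in rows)
    losses = sum(1 for _, p in rows if p < 0)
    return {
        "revenue": revenue,
        "profit_margin_pct": 100 * profit / revenue,
        "aov": revenue / len(rows),             # average order value (assumed definition)
        "loss_rate_pct": 100 * losses / len(rows),
    }

panel = kpis(transactions)
```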

Tableau Executive Dashboard

8+ interactive visualizations with drill-down capabilities
Geographic heatmaps showing regional performance
Real-time KPI monitoring across online (49.9%) and retail (50.1%) channels
Filter functionality by region, product line, gender, and sales channel
Cross-filtering across all visualizations for deep analysis
Hover tooltips with detailed metrics and context

Technical Stack
Data Processing: Python, Pandas, NumPy, SQLite database with custom schema
Analysis: Statistical modeling, correlation analysis, regression techniques
Visualization: Matplotlib, Seaborn, Plotly for programmatic charts, Tableau for interactive dashboards
Development: Jupyter Notebook for exploratory analysis, Git for version control
Performance: Automated pipeline reducing processing time from 6 hours to 30 minutes

Key Technical Challenges Solved
1. Extreme Data Quality Issues

Problem: 66.7% missing values in critical columns, 114 duplicates, inconsistent region names
Solution: Multi-step imputation strategy, fuzzy matching for region standardization, validation frameworks
Result: 100% data completeness with 99.5% quality score
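
One way fuzzy region standardization can be sketched is with the standard library's difflib. The canonical list below only contains regions named in this write-up, and the 0.8 similarity cutoff and the function name standardize_region are assumptions for illustration.

```python
from difflib import get_close_matches

# Only regions named in this write-up; the full list of six is not reproduced here.
CANONICAL = ["Mumbai", "Kolkata", "Hyderabad"]

def standardize_region(raw, cutoff=0.8):
    """Map a messy region string to its canonical name, or None if no close match."""
    if raw is None:
        return None
    cleaned = raw.strip().title()
    match = get_close_matches(cleaned, CANONICAL, n=1, cutoff=cutoff)
    return match[0] if match else None

examples = [" mumbai ", "Hyderbad", "Kolkatta", "Unknown"]
standardized = [standardize_region(x) for x in examples]
```

Returning None for unmatched values keeps them visible for manual review rather than silently assigning them to the nearest region.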

2. Revenue Calculation Inconsistencies

Problem: 408 transactions with negative units, revenue mismatches, invalid discount values
Solution: Recalculated revenue using the formula Units × MRP × (1 - Discount/100), and converted negative unit counts to absolute values
Result: Consistent revenue calculations across all 2,123 transactions
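
The recalculation above can be sketched in pandas as follows. Column names (units, mrp, discount_pct) are hypothetical, and the sketch does not cover how invalid discount values were handled, since that rule is not stated here.

```python
import pandas as pd

# Hypothetical rows; the second has a negative unit count.
df = pd.DataFrame({
    "units": [2, -3, 1],
    "mrp": [1000.0, 500.0, 2000.0],
    "discount_pct": [10.0, 20.0, 0.0],
})

# Convert negative unit counts to absolute values.
df["units"] = df["units"].abs()

# Revenue = Units x MRP x (1 - Discount/100)
df["revenue"] = df["units"] * df["mrp"] * (1 - df["discount_pct"] / 100)
```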

3. Outlier Detection Without Losing Business Context

Problem: Extreme values could be legitimate high-value sales or data errors
Solution: IQR method with business logic validation, preserved legitimate high-value transactions
Result: Removed 263 statistical outliers while retaining valid business data
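
A sketch of IQR-based outlier flagging combined with a business-rule override. The 1.5 multiplier is the conventional default rather than a value confirmed by the project, and the verified_bulk_order flag is a hypothetical stand-in for whatever validation logic was actually used.

```python
import pandas as pd

def iqr_bounds(series: pd.Series, k: float = 1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical data: two extreme values, one of them a verified bulk order.
df = pd.DataFrame({
    "revenue": [100, 110, 120, 130, 140, 150, 5000, 9000],
    "verified_bulk_order": [False] * 6 + [True, False],
})

low, high = iqr_bounds(df["revenue"])
is_outlier = ~df["revenue"].between(low, high)

# Assumed business rule: keep flagged rows that are verified bulk orders.
kept = df[~is_outlier | df["verified_bulk_order"]]
```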

4. Seasonal Pattern Recognition

Problem: Identifying meaningful trends in noisy time-series data
Solution: Month-over-month analysis, moving averages, seasonal decomposition
Result: Discovered 77.6% December concentration representing critical business risk
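
A sketch of the month-level checks named above (share of annual revenue, month-over-month change, moving average), using invented monthly totals indexed by month number. It does not perform a full seasonal decomposition, which would typically need a dedicated library and more than one year of data.

```python
import pandas as pd

# Hypothetical monthly revenue totals (not the project's figures); index is month number.
monthly = pd.Series(
    [100, 120, 90, 110, 95, 105, 100, 115, 90, 80, 130, 900],
    index=range(1, 13),
    name="revenue",
)

share_pct = 100 * monthly / monthly.sum()      # each month's share of the annual total
mom_change_pct = 100 * monthly.pct_change()    # month-over-month change
rolling_3m = monthly.rolling(window=3).mean()  # 3-month moving average

peak_month = share_pct.idxmax()
print(f"Peak month: {peak_month} ({share_pct.max():.1f}% of annual revenue)")
```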

5. Actionable Insight Generation

Problem: Converting statistical findings into business recommendations
Solution: Benchmarking against industry standards, A/B scenario modeling, ROI calculations
Result: Quantified 1.2M+ optimization potential with specific action items

Results
1.2M+ profit optimization — Identified through discount strategy, AOV standardization, and margin improvements
92% time reduction — Automated pipeline processes 2,500 records in 30 minutes vs. 6 hours manually
100% data completeness — From 66.7% missing values to fully validated dataset
99.5% data quality — Comprehensive validation and cleaning across all fields
77.6% seasonal risk — Discovered extreme December revenue concentration requiring mitigation
22.8% loss rate — Identified 544 loss-making transactions worth 281K+ recovery opportunity
15+ visualizations — Professional charts ready for executive presentations

Development Highlights
ETL Pipeline Architecture:

Phase 1: Extraction from CSV with error handling
Phase 2: Quality assessment identifying 5 critical issues
Phase 3: 10-step transformation (imputation, deduplication, standardization, validation)
Phase 4: Comprehensive validation with 7 automated checks
Phase 5: Loading to SQLite/CSV with metadata tracking
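
The loading phase might look roughly like this with Python's built-in sqlite3 module. The table names (sales, load_log), columns, index, and sample rows are all hypothetical; the project's custom schema is not shown in this write-up.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical cleaned rows: (order_id, region, order_date, revenue).
rows = [
    (1, "Mumbai", "2024-12-01", 1800.0),
    (2, "Kolkata", "2024-12-02", 1200.0),
    (3, "Mumbai", "2024-12-03", 2000.0),
]

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
conn.execute(
    """
    CREATE TABLE sales (
        order_id   INTEGER PRIMARY KEY,
        region     TEXT NOT NULL,
        order_date TEXT NOT NULL,
        revenue    REAL NOT NULL CHECK (revenue >= 0)
    )
    """
)
conn.execute("CREATE TABLE load_log (loaded_at TEXT NOT NULL, row_count INTEGER NOT NULL)")

with conn:  # commits on success, rolls back on error
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.execute(
        "INSERT INTO load_log VALUES (?, ?)",
        (datetime.now(timezone.utc).isoformat(), len(rows)),
    )

# Index the columns most often used for filtering and grouping.
conn.execute("CREATE INDEX idx_sales_region_date ON sales (region, order_date)")

by_region = dict(conn.execute("SELECT region, SUM(revenue) FROM sales GROUP BY region"))
```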

Statistical Analysis Techniques:

Descriptive statistics (mean, median, standard deviation, quartiles)
Correlation analysis between discounts and profit margins
Distribution analysis identifying high-value customer segments
Time-series decomposition for seasonality patterns
Comparative analysis across regions, products, and channels

Visualization Design Principles:

Executive-ready aesthetics with professional color schemes
Value labels and annotations for instant comprehension
Multi-panel layouts for comprehensive storytelling
Consistent styling across all charts for cohesion
Interactive elements in Tableau for exploration

Key Business Insights Delivered
Revenue Analysis:

Mumbai leads with 4.6M revenue (17.8% market share)
Balanced product portfolio: all categories contribute 18-22% of revenue
Online and retail channels are almost evenly balanced at 49.9%/50.1%
Top 20% of transactions generate 33.1% of total revenue

Profitability Patterns:

Kolkata has the highest profit margin at 12.0% despite moderate revenue
Hyderabad paradox: high revenue (4.4M) but lowest margin (10.3%)
Soccer category: lowest revenue (18.2%) but highest margin (12.0%)
Discount sweet spot: 10-20% maintains volume with healthy margins

Customer Behavior:

Men's category: 35-40% revenue, highest AOV, premium segment
Women's category: 30-35% revenue, growth opportunity identified
Kids' category: 25-30% revenue, seasonal purchasing patterns
High-value segment: Top 20% of customers drive the majority of profit

Seasonal Volatility:

December: 20.1M revenue (77.6% of annual sales) - critical risk
October: 374K revenue (lowest month), a 5,369% peak-to-trough ratio (December revenue is roughly 54 times October's)
Extreme concentration creates cash flow and inventory challenges
Opportunity: Develop Q1-Q3 promotional strategies

Strategic Recommendations Provided
Immediate Actions (0-30 Days):

Implement 25% discount cap to save 300K+ annually
Address Hyderabad's 1.7-percentage-point margin gap (74.8K profit opportunity)
Standardize AOV across regions (884K revenue potential)
Review 544 loss-making transactions for pricing corrections
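
The Hyderabad figure appears consistent with applying the margin gap to that region's revenue. The arithmetic below is an inference from the rounded numbers quoted in this write-up (4.4M revenue, 10.3% margin, 12.0% best margin in Kolkata), not a reproduction of the original calculation.

```python
# Rounded figures quoted in this write-up.
hyderabad_revenue = 4.4e6      # 4.4M
hyderabad_margin_pct = 10.3    # lowest regional margin
best_margin_pct = 12.0         # Kolkata, highest regional margin

# Gap in percentage points, then the profit gained if Hyderabad matched Kolkata.
gap_points = best_margin_pct - hyderabad_margin_pct
opportunity = hyderabad_revenue * gap_points / 100

print(f"Margin gap: {gap_points:.1f} points, opportunity: {opportunity:,.0f}")
```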

Short-Term Initiatives (60-90 Days):

Develop off-season promotional calendar to reduce December dependency
Launch VIP program targeting top 20% customers
Expand Soccer category inventory (highest margin opportunity)
Implement regional pricing strategies based on margin disparities

Long-Term Strategy (6-12 Months):

Build predictive models for demand forecasting
Develop real-time KPI dashboard for daily monitoring
Implement customer segmentation for personalized marketing
Expand to 2-3 new tier-2 cities based on profitability patterns

What I Learned
Data quality is foundational — 66.7% missing values taught me that cleaning is 80% of the work
Business context matters — Statistical outliers aren't always errors; domain knowledge prevents over-cleaning
Visualizations drive decisions — Executives responded more to charts than statistical tables
Automation creates value — 92% time reduction allows analysts to focus on insights, not data wrangling
Metrics tell compelling stories — Quantifying "1.2M+ optimization" resonates more than "improved profitability"

Technologies
Python • Pandas • NumPy • Matplotlib • Seaborn • Plotly • SQLite • Jupyter Notebook • Tableau • Git

Project Information

Technologies Used

Python 3.12 • Pandas • NumPy • SQLite • Tableau • Matplotlib • Seaborn • Plotly • Jupyter Notebook • Virtual Environment (venv) • Git • ETL Pipeline • SQL • CSV Processing • Statistical Analysis • Business Intelligence • KPI Dashboards • Data Validation • Executive Reporting

Project Timeline

Started October 2025
Last Updated October 2025
