Nike Data Pipeline, Sales Analysis & BI System
Production-grade ETL pipeline and BI system analyzing 2,000+ sales transactions with Python, revealing profit gaps, optimizing margins, and visualizing KPIs across regions with Tableau.
About This Project
Data Pipeline System and Sales Data Analysis for Nike
End-to-End Business Analytics Platform for Revenue Optimization
The Problem
Nike's sales data across 6 Indian regions was riddled with quality issues: 66.7% missing values in critical columns, 114 duplicate records, inconsistent region names, and 544 loss-making transactions. Without clean data and analytical insights, the business could not identify revenue leaks, optimize discount strategies, or predict seasonal demand patterns. Manual processing took 6+ hours per analysis cycle and was error-prone.
The Solution
I engineered a comprehensive business intelligence platform that transforms raw, messy sales data into actionable strategic insights through automated ETL pipelines, statistical analysis, and executive dashboards.
Impact: Identified 1.2M+ in profit optimization opportunities, reduced data processing time by 92%, and uncovered critical business patterns across 2,123 transactions.
Key Features
Automated ETL Pipeline
10-step data transformation process handling 2,500+ raw transactions
IQR-based outlier detection removing 263 extreme values
Missing value imputation strategies (median, mode, conservative assumptions)
Duplicate removal and data validation achieving 100% completeness
SQLite database and CSV integration with optimized indexing
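The deduplication and imputation steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names (Region, Units_Sold, MRP) are hypothetical, and the "conservative assumptions" strategy is not modeled because the write-up does not describe it.

```python
import numpy as np
import pandas as pd


def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows, then fill remaining gaps column by column."""
    out = df.drop_duplicates().copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            # Numeric columns: the median is robust to outliers handled later.
            out[col] = out[col].fillna(out[col].median())
        else:
            # Categorical columns: fall back to the most frequent value.
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


# Hypothetical sample with one duplicate row and one gap per column.
raw = pd.DataFrame({
    "Region": ["Mumbai", "Mumbai", None, "Kolkata", "Mumbai"],
    "Units_Sold": [2, 2, 5, np.nan, 3],
    "MRP": [4000.0, 4000.0, 3500.0, 6000.0, np.nan],
})
clean = clean_sales(raw)
print(len(clean), int(clean.isna().sum().sum()))  # 4 0
```

Deduplicating before imputing keeps the duplicated rows from skewing the medians and modes used to fill the gaps.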
Comprehensive Exploratory Data Analysis
Statistical analysis across 5 product lines and 3 customer segments
Correlation analysis identifying discount impact on profitability
Time-series decomposition revealing 77.6% revenue concentration in December
Regional performance disparities ranging from 10.3% to 12.0% profit margins
Transaction value distribution with top 20% generating 33.1% of revenue
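A "top 20% of transactions generate X% of revenue" figure like the one above can be computed by ranking transactions by value and summing the head of the ranking. The sketch below assumes a plain Series of per-transaction revenue; the function name and sample values are illustrative.

```python
import pandas as pd


def top_share(revenue: pd.Series, top_fraction: float = 0.2) -> float:
    """Fraction of total revenue produced by the highest-value `top_fraction` of transactions."""
    ranked = revenue.sort_values(ascending=False)
    n_top = max(1, round(len(ranked) * top_fraction))  # at least one transaction
    return float(ranked.iloc[:n_top].sum() / ranked.sum())


revenue = pd.Series([100, 90, 80, 70, 60, 50, 40, 30, 20, 10])
print(f"{top_share(revenue):.1%}")  # 34.5%
```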
Interactive Business Intelligence Dashboard
15+ professional visualizations using Matplotlib, Seaborn, and Plotly
Executive KPI panel tracking revenue, profit margins, AOV, and loss rates
4-panel regional analysis showing revenue, profit, margins, and market share
Product performance heatmaps with category-wise metrics
Time-series trend analysis with monthly and seasonal patterns
Customer segmentation charts with transaction value distributions
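The KPI panel's headline metrics (revenue, profit margin, AOV, loss rate) reduce to a few aggregations. This is a sketch under stated assumptions: the Revenue, Profit, and Region column names are hypothetical, and AOV is taken as revenue per row, i.e. it assumes one row per order.

```python
import pandas as pd


def kpi_summary(df: pd.DataFrame) -> dict:
    """Headline KPIs for a set of transactions."""
    revenue = df["Revenue"].sum()
    profit = df["Profit"].sum()
    return {
        "revenue": float(revenue),
        "profit_margin_pct": float(100 * profit / revenue),
        # Assumption: each row is one order, so AOV = revenue / row count.
        "aov": float(revenue / len(df)),
        # Share of transactions with negative profit.
        "loss_rate_pct": float(100 * (df["Profit"] < 0).mean()),
    }


sales = pd.DataFrame({
    "Region": ["Mumbai", "Mumbai", "Kolkata", "Kolkata"],
    "Revenue": [1000, 2000, 3000, 4000],
    "Profit": [100, -50, 300, 450],
})
overall = kpi_summary(sales)
by_region = {region: kpi_summary(group) for region, group in sales.groupby("Region")}
print(overall)
```

Reusing the same function per group gives the per-region figures needed for the regional panels.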
Tableau Executive Dashboard
8+ interactive visualizations with drill-down capabilities
Geographic heatmaps showing regional performance
Real-time KPI monitoring across online (49.9%) and retail (50.1%) channels
Filter functionality by region, product line, gender, and sales channel
Cross-filtering across all visualizations for deep analysis
Hover tooltips with detailed metrics and context
Technical Stack
Data Processing: Python, Pandas, NumPy, SQLite database with custom schema
Analysis: Statistical modeling, correlation analysis, regression techniques
Visualization: Matplotlib, Seaborn, Plotly for programmatic charts, Tableau for interactive dashboards
Development: Jupyter Notebook for exploratory analysis, Git for version control
Performance: Automated pipeline reducing processing time from 6 hours to 30 minutes
Key Technical Challenges Solved
1. Extreme Data Quality Issues
Problem: 66.7% missing values in critical columns, 114 duplicates, inconsistent region names
Solution: Multi-step imputation strategy, fuzzy matching for region standardization, validation frameworks
Result: 100% data completeness with 99.5% quality score
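Fuzzy matching for region names can be approximated with the standard library's difflib, which scores string similarity. The write-up does not say which matcher was used, so this is a swapped-in technique for illustration. The canonical list is a hypothetical subset: only Mumbai, Kolkata, and Hyderabad are named here, while the dataset has six regions. The 0.8 cutoff is also an assumption.

```python
import difflib
from typing import Optional

# Hypothetical canonical list: only these three names appear in the write-up;
# the real dataset has six regions.
CANONICAL_REGIONS = ["Mumbai", "Kolkata", "Hyderabad"]


def standardize_region(raw: Optional[str], cutoff: float = 0.8) -> Optional[str]:
    """Map a messy region label to its canonical spelling, or None if nothing is close enough."""
    if raw is None:
        return None
    key = raw.strip().title()  # normalize whitespace and casing first
    matches = difflib.get_close_matches(key, CANONICAL_REGIONS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for label in ["mumbai ", "KOLKATTA", "Hyderbad", "Chennai"]:
    print(repr(label), "->", standardize_region(label))
```

Similarity matching catches typos and casing differences, but historical or alternate names (for example, Bombay for Mumbai) would need an explicit alias map, since they share too few characters to match.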
2. Revenue Calculation Inconsistencies
Problem: 408 transactions with negative units, revenue mismatches, invalid discount values
Solution: Recalculated revenue as Units × MRP × (1 - Discount/100) and converted negative unit counts to absolute values
Result: Consistent revenue calculations across all 2,123 transactions
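A vectorized version of the revenue recalculation above might look like this. The column names (Units_Sold, MRP, Discount_Pct) are hypothetical. Clipping out-of-range discounts to 0-100% is an assumption, because the write-up does not say how invalid discount values were corrected.

```python
import pandas as pd


def recalc_revenue(df: pd.DataFrame) -> pd.DataFrame:
    """Recompute Revenue = Units x MRP x (1 - Discount/100) on a copy of the frame."""
    out = df.copy()
    # Negative unit counts are treated as sign errors and made absolute.
    out["Units_Sold"] = out["Units_Sold"].abs()
    # Assumption: out-of-range discounts are clipped to the 0-100% range.
    out["Discount_Pct"] = out["Discount_Pct"].clip(lower=0, upper=100)
    out["Revenue"] = out["Units_Sold"] * out["MRP"] * (1 - out["Discount_Pct"] / 100)
    return out


sales = pd.DataFrame({
    "Units_Sold": [2, -1, 1],
    "MRP": [5000, 6000, 3000],
    "Discount_Pct": [10, 50, 120],
})
print(recalc_revenue(sales)["Revenue"].tolist())  # [9000.0, 3000.0, 0.0]
```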
3. Outlier Detection Without Losing Business Context
Problem: Extreme values could be legitimate high-value sales or data errors
Solution: IQR method with business logic validation, preserved legitimate high-value transactions
Result: Removed 263 statistical outliers while retaining valid business data
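One way to combine the IQR method with a business-logic check is to drop a flagged row only if it also fails a validation rule. The rule used here is hypothetical, since the write-up does not specify one: a large sale is kept when its recorded revenue matches Units × MRP × (1 - Discount/100) within 1%. Column names and the k = 1.5 multiplier are likewise assumptions (1.5 is the conventional Tukey fence).

```python
import pandas as pd


def iqr_flags(values: pd.Series, k: float = 1.5) -> pd.Series:
    """True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


def drop_suspect_outliers(df: pd.DataFrame, k: float = 1.5, tol: float = 0.01):
    """Drop IQR outliers only when they also fail a (hypothetical) consistency rule."""
    flagged = iqr_flags(df["Revenue"], k)
    expected = df["Units_Sold"] * df["MRP"] * (1 - df["Discount_Pct"] / 100)
    # Hypothetical business rule: an extreme sale is legitimate if its
    # recorded revenue agrees with the formula within `tol`.
    consistent = (df["Revenue"] - expected).abs() <= tol * expected.abs()
    drop = flagged & ~consistent
    return df[~drop], int(drop.sum())


sales = pd.DataFrame({
    "Revenue": [1000, 1100, 1200, 1300, 1400, 1500, 40000, 45000],
    "Units_Sold": [1, 1, 1, 1, 1, 1, 10, 1],
    "MRP": [1000, 1100, 1200, 1300, 1400, 1500, 4000, 3000],
    "Discount_Pct": [0, 0, 0, 0, 0, 0, 0, 0],
})
kept, removed = drop_suspect_outliers(sales)
print(removed, kept["Revenue"].max())  # 1 40000
```

Both large sales are statistical outliers. Only the 45,000 row is dropped, because its revenue cannot be reproduced from its units, price, and discount.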
4. Seasonal Pattern Recognition
Problem: Identifying meaningful trends in noisy time-series data
Solution: Month-over-month analysis, moving averages, seasonal decomposition
Result: Found that December accounts for 77.6% of annual revenue, a critical business risk
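The month-over-month and moving-average analysis can be sketched with pandas resampling. This is an illustration under assumptions: the Order_Date and Revenue column names and the 3-month window are hypothetical, and a full seasonal decomposition would need more history than this toy sample.

```python
import pandas as pd


def monthly_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Monthly revenue with MoM change, a 3-month moving average, and share of total."""
    monthly = df.set_index("Order_Date")["Revenue"].resample("MS").sum()
    return pd.DataFrame({
        "revenue": monthly,
        "mom_change": monthly.pct_change(),
        "rolling_3m": monthly.rolling(3, min_periods=1).mean(),
        "share": monthly / monthly.sum(),
    })


orders = pd.DataFrame({
    "Order_Date": pd.to_datetime(["2024-09-05", "2024-09-20", "2024-10-11",
                                  "2024-11-02", "2024-12-15", "2024-12-24"]),
    "Revenue": [100, 100, 100, 300, 600, 800],
})
profile = monthly_profile(orders)
peak_to_trough = profile["revenue"].max() / profile["revenue"].min()
print(f"December share: {profile['share'].iloc[-1]:.1%}, peak/trough: {peak_to_trough:.1f}x")
# December share: 70.0%, peak/trough: 14.0x
```

Applied to the figures reported later in this write-up, 20.1M / 374K is about 53.7, which corresponds to the 5,369% peak-to-trough ratio.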
5. Actionable Insight Generation
Problem: Converting statistical findings into business recommendations
Solution: Benchmarking against industry standards, A/B scenario modeling, ROI calculations
Result: Quantified 1.2M+ optimization potential with specific action items
Results
1.2M+ profit optimization — Identified through discount strategy, AOV standardization, and margin improvements
92% time reduction — Automated pipeline processes 2,500 records in 30 minutes vs. 6 hours manually
100% data completeness — From 66.7% missing values to fully validated dataset
99.5% data quality — Comprehensive validation and cleaning across all fields
77.6% seasonal risk — Discovered extreme December revenue concentration requiring mitigation
22.8% loss rate — Identified 544 loss-making transactions worth 281K+ recovery opportunity
15+ visualizations — Professional charts ready for executive presentations
Development Highlights
ETL Pipeline Architecture:
Phase 1: Extraction from CSV with error handling
Phase 2: Quality assessment identifying 5 critical issues
Phase 3: 10-step transformation (imputation, deduplication, standardization, validation)
Phase 4: Comprehensive validation with 7 automated checks
Phase 5: Loading to SQLite and CSV with metadata tracking
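Phase 5 could be implemented with pandas' to_sql on a sqlite3 connection, plus a small metadata table that logs each load. The table names, the load_metadata schema, and the choice to index Region are hypothetical; the write-up only says that indexing and metadata tracking were used.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional

import pandas as pd


def load_sales(df: pd.DataFrame, conn: sqlite3.Connection,
               table: str = "sales", csv_path: Optional[str] = None) -> None:
    """Write the cleaned frame to SQLite (plus an optional CSV copy) and log the load."""
    df.to_sql(table, conn, if_exists="replace", index=False)
    # The table name is interpolated into SQL, so only pass trusted names.
    # Index the column dashboards are assumed to filter on most (Region).
    conn.execute(f"CREATE INDEX IF NOT EXISTS idx_{table}_region ON {table} (Region)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS load_metadata "
        "(table_name TEXT, row_count INTEGER, loaded_at TEXT)"
    )
    conn.execute(
        "INSERT INTO load_metadata VALUES (?, ?, ?)",
        (table, len(df), datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    if csv_path is not None:
        df.to_csv(csv_path, index=False)


conn = sqlite3.connect(":memory:")
clean = pd.DataFrame({"Region": ["Mumbai", "Kolkata"], "Revenue": [9000.0, 3000.0]})
load_sales(clean, conn)
print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])  # 2
```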
Statistical Analysis Techniques:
Descriptive statistics (mean, median, standard deviation, quartiles)
Correlation analysis between discounts and profit margins
Distribution analysis identifying high-value customer segments
Time-series decomposition for seasonality patterns
Comparative analysis across regions, products, and channels
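The discount-versus-margin analysis can be illustrated with a Pearson correlation and a per-band breakdown, which is one way to locate a discount "sweet spot". The band edges, labels, and column names (Discount_Pct, Revenue, Profit) are assumptions for illustration.

```python
import pandas as pd


def discount_margin_analysis(df: pd.DataFrame):
    """Pearson correlation between discount and margin, plus average margin per discount band."""
    data = df.assign(Margin=df["Profit"] / df["Revenue"])
    r = data["Discount_Pct"].corr(data["Margin"])  # Pearson by default
    data["Band"] = pd.cut(
        data["Discount_Pct"],
        bins=[0, 10, 20, 30, 101],
        right=False,  # bands are [0, 10), [10, 20), [20, 30), [30, 101)
        labels=["0-10%", "10-20%", "20-30%", "30%+"],
    )
    by_band = data.groupby("Band", observed=True).agg(
        transactions=("Margin", "size"),
        avg_margin=("Margin", "mean"),
    )
    return r, by_band


sales = pd.DataFrame({
    "Discount_Pct": [5, 15, 25, 35],
    "Revenue": [1000, 1000, 1000, 1000],
    "Profit": [150, 120, 60, -20],
})
r, by_band = discount_margin_analysis(sales)
print(round(r, 3))  # -0.981
print(by_band)
```

A single correlation coefficient summarizes the overall trend. The banded view shows where margins start to erode, which a linear coefficient alone cannot show.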
Visualization Design Principles:
Executive-ready aesthetics with professional color schemes
Value labels and annotations for instant comprehension
Multi-panel layouts for comprehensive storytelling
Consistent styling across all charts for cohesion
Interactive elements in Tableau for exploration
Key Business Insights Delivered
Revenue Analysis:
Mumbai leads with 4.6M revenue (17.8% market share)
Balanced product portfolio: all categories contribute 18-22% of revenue
Online and retail channels nearly evenly split at 49.9%/50.1%
Top 20% of transactions generate 33.1% of total revenue
Profitability Patterns:
Kolkata highest profit margin at 12.0% despite moderate revenue
Hyderabad paradox: high revenue (4.4M) but lowest margin (10.3%)
Soccer category: lowest revenue (18.2%) but highest margin (12.0%)
Discount sweet spot: 10-20% maintains volume with healthy margins
Customer Behavior:
Men's category: 35-40% revenue, highest AOV, premium segment
Women's category: 30-35% revenue, growth opportunity identified
Kids' category: 25-30% revenue, seasonal purchasing patterns
High-value segment: Top 20% customers drive majority of profit
Seasonal Volatility:
December: 20.1M revenue (77.6% of annual sales) - critical risk
October: 374K revenue (lowest month) - 5,369% peak-to-trough ratio (December revenue is roughly 54 times October's)
Extreme concentration creates cash flow and inventory challenges
Opportunity: Develop Q1-Q3 promotional strategies
Strategic Recommendations Provided
Immediate Actions (0-30 Days):
Implement 25% discount cap to save 300K+ annually
Address Hyderabad's 1.7-percentage-point margin gap (74.8K profit opportunity)
Standardize AOV across regions (884K revenue potential)
Review 544 loss-making transactions for pricing corrections
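The 74.8K Hyderabad figure appears to follow from applying the 1.7-point gap between Hyderabad's 10.3% margin and Kolkata's 12.0% benchmark to Hyderabad's 4.4M revenue. This derivation is inferred from the numbers rather than stated in the write-up, and the helper below is hypothetical.

```python
def margin_gap_opportunity(revenue: float, margin_pct: float, benchmark_pct: float):
    """Extra profit if `revenue` earned `benchmark_pct` instead of `margin_pct`."""
    gap_pp = benchmark_pct - margin_pct  # gap in percentage points
    return gap_pp, revenue * gap_pp / 100


gap, opportunity = margin_gap_opportunity(4.4e6, 10.3, 12.0)  # Hyderabad vs. Kolkata
print(f"{gap:.1f} pp -> {opportunity / 1e3:.1f}K")  # 1.7 pp -> 74.8K
```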
Short-Term Initiatives (60-90 Days):
Develop off-season promotional calendar to reduce December dependency
Launch VIP program targeting top 20% customers
Expand Soccer category inventory (highest margin opportunity)
Implement regional pricing strategies based on margin disparities
Long-Term Strategy (6-12 Months):
Build predictive models for demand forecasting
Develop real-time KPI dashboard for daily monitoring
Implement customer segmentation for personalized marketing
Expand to 2-3 new tier-2 cities based on profitability patterns
What I Learned
Data quality is foundational — 66.7% missing values taught me that cleaning is 80% of the work
Business context matters — Statistical outliers aren't always errors; domain knowledge prevents over-cleaning
Visualizations drive decisions — Executives responded more to charts than statistical tables
Automation creates value — 92% time reduction allows analysts to focus on insights, not data wrangling
Metrics tell compelling stories — Quantifying "1.2M+ optimization" resonates more than "improved profitability"
Technologies
Python • Pandas • NumPy • Matplotlib • Seaborn • Plotly • SQLite • Jupyter Notebook • Tableau • Git