Analysis Overview
Analysis overview and configuration
| Parameter | Value |
|---|---|
| cohort_period | month |
| retention_periods | 12 |
| churn_definition | 1 |
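The configuration groups customers by calendar month and follows them for up to 12 periods. Below is a minimal sketch of that cohort assignment. It assumes transactions arrive as `(customer_id, invoice_date)` pairs. `assign_cohorts` and `month_index` are hypothetical helpers, not the analysis's own code. The meaning of `churn_definition = 1` is not documented, so the sketch does not use it.

```python
from datetime import date

def month_index(d: date) -> int:
    """Map a date to a running month counter so month gaps become simple subtraction."""
    return d.year * 12 + (d.month - 1)

def assign_cohorts(transactions, max_period=12):
    """Return {customer_id: (cohort_month, active_periods)}.

    The cohort is the calendar month of a customer's first transaction; period t
    counts whole calendar months since that cohort month (t=0 is the cohort month).
    Periods beyond max_period (cf. retention_periods = 12) are ignored.
    """
    first = {}
    for customer_id, invoice_date in transactions:
        m = month_index(invoice_date)
        if customer_id not in first or m < first[customer_id]:
            first[customer_id] = m

    cohorts = {}
    for customer_id, invoice_date in transactions:
        start = first[customer_id]
        cohort_month = date(start // 12, start % 12 + 1, 1)
        _, periods = cohorts.setdefault(customer_id, (cohort_month, set()))
        t = month_index(invoice_date) - start
        if t <= max_period:
            periods.add(t)
    return cohorts

# Illustrative transactions (not from the dataset).
tx = [
    ("A", date(2009, 12, 5)),
    ("A", date(2010, 2, 14)),
    ("B", date(2010, 1, 20)),
]
print(assign_cohorts(tx))
```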
Purpose
This analysis examines customer retention patterns across 13 cohorts spanning 2009-2010 for an online retail store. The objective is to understand how customer cohorts behave over their lifecycle, identify churn trends, and quantify the relationship between retention and revenue generation. This foundational view establishes the scale and severity of customer attrition challenges.
Key Findings
- First Period Churn Rate: 77.0% - More than three-quarters of customers leave after their first period, indicating severe early-stage attrition
- Overall Retention (t=1): 23.0% - Fewer than one in four customers remain active after the first period
- Median Customer Lifetime: 4 periods - Typical customers remain active for approximately 4 time periods before churning
- Average Revenue per Customer: $2,047.29 - Averaged over all acquired customers, including early churners, revenue per customer remains substantial despite high churn
- Retention Stabilization: After 23.0% at t=1, retention holds between 24.8% and 28.5% at t=3, t=6, and t=12, suggesting survivors represent a stable core segment
Interpretation
The data reveals a classic "leaky bucket" pattern: massive initial attrition followed by stabilization among survivors. The 77% first-period churn dominates the retention profile, yet customers who survive this critical phase generate meaningful revenue.
Data preprocessing and column mapping
Purpose
This section documents the data preprocessing pipeline for the retention analysis covering 4,314 customers across 13 cohorts from December 2009 to December 2010. Perfect data retention indicates no rows were removed during cleaning, suggesting either exceptionally clean source data or minimal validation applied. Understanding preprocessing decisions is critical because they directly affect the reliability of retention metrics, survival curves, and revenue calculations used to assess customer lifetime value.
Key Findings
- Initial vs. Final Rows: 17,001 rows retained with zero removals, meaning no outliers, duplicates, or invalid records were filtered during preprocessing
- Data Retention Rate: 100% retention suggests either pristine data quality or absence of aggressive cleaning protocols
- Train/Test Split: Not specified—preprocessing documentation lacks clarity on whether data was partitioned for validation purposes
- Transformations Applied: No explicit transformations documented, raising questions about handling of missing values visible in cohort_summary_table (7.7%–92.3% NA rates across retention metrics)
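The 7.7%–92.3% NA range can be reproduced if the gaps come purely from the observation window. Below is a minimal sketch. It assumes 13 monthly cohorts from December 2009 to December 2010 and a window that closes in December 2010. The helper names are hypothetical.

```python
from datetime import date

def month_index(d: date) -> int:
    """Running month counter so month differences are simple subtraction."""
    return d.year * 12 + (d.month - 1)

# Assumed: 13 monthly cohorts, December 2009 through December 2010.
COHORTS = [date(2009 + (11 + k) // 12, (11 + k) % 12 + 1, 1) for k in range(13)]
WINDOW_END = date(2010, 12, 1)  # assumed last observed calendar month

def observable_periods(cohort_start: date) -> int:
    """Latest period index that still falls inside the observation window."""
    return month_index(WINDOW_END) - month_index(cohort_start)

def na_rate(t: int) -> float:
    """Share of cohorts for which period t lies beyond the window (right-censored)."""
    missing = sum(1 for c in COHORTS if observable_periods(c) < t)
    return missing / len(COHORTS)

for t in (1, 6, 12):
    print(f"t={t}: {na_rate(t):.1%} of cohorts unobserved")
```

Under those assumptions the sketch prints three values:

- 7.7% at t=1, where only the December 2010 cohort is unobserved
- 46.2% at t=6
- 92.3% at t=12, where only the December 2009 cohort is observed

The two extremes match the reported NA range. That supports reading the gaps as right-censoring rather than lost data.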
Interpretation
The perfect retention rate contrasts with observable data gaps in the cohort summary, where later cohorts show increasing missing values at the t6 and t12 retention periods. This pattern reflects the natural limitation of the observation window (December 2009 to December 2010) rather than data quality issues: later cohorts simply lack sufficient follow-up time to be observed at those periods.
Executive Summary
Executive summary with key findings and recommendations
| Finding | Value |
|---|---|
| Total Customers Analyzed | 4,314 |
| Number of Cohorts | 13 |
| Overall Retention (Period 1) | 23.0% |
| First-Period Churn Rate | 77.0% |
| Median Customer Lifetime | 4.0 periods |
Key Findings:
• First-period churn is 77.0% - CRITICAL: More than three-quarters of customers churn immediately. Prioritize onboarding improvements.
• Period-1 retention is 23.0%
• Median customer lifetime is 4.0 periods - use this for LTV forecasting and retention economics.
• The retention heatmap reveals cohort quality trends and lifecycle churn patterns
• Survival analysis provides statistical lifetime estimates with proper censoring handling
Recommendations:
1. Deploy retention interventions targeting the first period (highest churn window)
2. Investigate low-retention cohorts for acquisition quality or seasonality issues
3. A/B test onboarding improvements to reduce first-period churn
4. Calculate cohort ROI by combining these retention patterns with acquisition costs
5. Track retention trends monthly to detect early signs of retention degradation
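Recommendation 4 can be sketched as a simple ratio. The report contains no acquisition costs, so `cost_per_customer` and the optional `gross_margin` below are hypothetical inputs. The figures in the example are illustrative only.

```python
def cohort_roi(cohort_revenue: float, customers: int,
               cost_per_customer: float, gross_margin: float = 1.0) -> float:
    """ROI = (margin-adjusted revenue - acquisition cost) / acquisition cost.

    cost_per_customer and gross_margin are hypothetical inputs; with the default
    gross_margin of 1.0 the numerator uses raw revenue.
    """
    acquisition_cost = customers * cost_per_customer
    if acquisition_cost <= 0:
        raise ValueError("acquisition cost must be positive")
    return (cohort_revenue * gross_margin - acquisition_cost) / acquisition_cost

# Illustrative: a 200-customer cohort with $100,000 revenue and a $250 acquisition cost each.
print(cohort_roi(100_000, 200, 250))       # 1.0 on a revenue basis
print(cohort_roi(100_000, 200, 250, 0.5))  # 0.0 assuming a 50% gross margin
```

Comparing gross margin rather than raw revenue against acquisition cost usually gives the fairer picture of cohort profitability.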
Purpose
This analysis examines customer retention patterns across 4,314 customers organized into 13 cohorts spanning December 2009 through December 2010. The objective is to understand customer lifecycle behavior, identify retention risk periods, and quantify the business impact of churn on customer lifetime value and revenue sustainability.
Key Findings
- First-Period Churn Rate: 77.0% - Nearly four of five customers churn immediately after acquisition, representing the most critical retention failure point
- Period-1 Retention: 23.0% - Fewer than one in four customers are still active in the first period after acquisition, indicating severe onboarding or product-market fit issues
- Median Customer Lifetime: 4 periods - The median customer stays active for approximately 4 measurement periods before churning
- Survival Trajectory: Cohort retention is 28.5% at month 3 and 26.7% at month 6, while the Kaplan-Meier survival estimate falls to 8% by period 11, indicating a secondary churn wave later in the lifecycle
- Revenue Per Customer: $2,047 average despite high churn, suggesting strong monetization of retained customers
Interpretation
The data reveals a "leaky bucket" acquisition model where the business acquires customers at scale but loses the majority immediately. The 77% first-period churn is catastrophic—it suggests either mis
Cohort Summary
Overview of cohort sizes and key retention metrics
Purpose
This section establishes the foundational cohort structure for the retention analysis by segmenting 4,314 customers into 13 acquisition cohorts spanning December 2009 through December 2010. Understanding cohort composition is essential because retention patterns and revenue performance are meaningfully analyzed only when customers are grouped by acquisition timing, allowing fair comparison of behavior across different customer generations.
Key Findings
- Total Customer Base: 4,314 customers distributed across 13 monthly cohorts
- Cohort Size Range: 46 to 955 customers (mean: 332, median: 294), with the December 2009 cohort representing the largest acquisition group
- Acquisition Concentration: Early cohorts (Dec 2009–Apr 2010) contain substantially larger customer volumes than later cohorts (Aug–Dec 2010), suggesting declining acquisition velocity over the year
- Sample Size Concern: The December 2010 cohort contains only 46 customers, which may produce unstable retention estimates due to insufficient sample size
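The size statistics above can be recomputed from a mapping of cohort month to customer count. Below is a minimal sketch with hypothetical names. The closing check uses only aggregates quoted in the report: 4,314 customers, 13 cohorts, and a largest cohort of 955.

```python
from statistics import mean, median

def cohort_size_summary(sizes: dict) -> dict:
    """Range, mean, median, and the largest cohort's share of all customers."""
    values = list(sizes.values())
    total = sum(values)
    largest = max(sizes, key=sizes.get)
    return {
        "total": total,
        "min": min(values),
        "max": sizes[largest],
        "mean": mean(values),
        "median": median(values),
        "largest_cohort": largest,
        "largest_share": sizes[largest] / total,
    }

# Check against the report's aggregates: mean cohort size and the Dec 2009 share.
print(round(4314 / 13))     # 332
print(f"{955 / 4314:.0%}")  # 22%
```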
Interpretation
The cohort distribution reveals uneven acquisition patterns, with the earliest cohort (Dec 2009) capturing 22% of all customers. This imbalance means retention metrics for later cohorts will have wider confidence intervals and less statistical reliability. The declining cohort sizes toward year-end suggest either seasonal acquisition patterns or
Cohort Retention Heatmap
Retention rate heatmap showing percentage of each cohort still active at each lifecycle period
Purpose
This heatmap visualizes customer retention patterns across 13 cohorts over 12 lifecycle periods, revealing both universal churn triggers and cohort-specific quality differences. It serves as the diagnostic foundation for understanding whether retention problems stem from product/service issues (vertical patterns) or acquisition quality (horizontal patterns).
Key Findings
- First-Period Churn Rate: 77.0% — Nearly four of five customers never return after initial activity, indicating a severe onboarding or product-market fit issue at the earliest stage
- Overall Retention (t=1): 23.0%, meaning fewer than one in four customers are active in period 1, well below industry benchmarks for most sectors
- Cohort Variability: Retention values range from 6.67% to 100%, with mean 34.61% and SD 28.08%. If these statistics include period 0, which is 100% for every cohort by construction, part of this spread reflects lifecycle position rather than differences in cohort performance
- Diagonal Decline Pattern: Retention erodes over time, but the 2009-12-01 cohort declines less steeply at first (35.29% at t=1, versus 23.0% overall) and holds around 33-42% mid-lifecycle
Interpretation
The dominant vertical pattern—the catastrophic drop from 100% to 23% between periods 0 and 1—suggests a universal structural problem rather than cohort-specific acquisition issues.
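Each heatmap row is a cohort and each column is a lifecycle period, so a vertical pattern is a column that is low for every cohort. Below is a minimal sketch of the underlying table. It assumes retention at t counts customers active in period t. That fits the data: overall retention rises from 23.0% at t=1 to 28.5% at t=3, which a cumulative-survival definition could not do. The function name and row format are hypothetical.

```python
from collections import defaultdict

def retention_matrix(activity, last_observable, max_period=12):
    """Cohort x period retention table, the data behind a retention heatmap.

    activity:        iterable of (customer_id, cohort, period) rows, period 0 = cohort month.
    last_observable: {cohort: last period inside the observation window}.
    Retention at t = customers active in period t / customers in the cohort.
    Periods past a cohort's window are None (censored), not 0.
    """
    members = defaultdict(set)
    active = defaultdict(set)
    for customer_id, cohort, period in activity:
        members[cohort].add(customer_id)
        active[(cohort, period)].add(customer_id)

    matrix = {}
    for cohort, customers in members.items():
        row = []
        for t in range(max_period + 1):
            if t > last_observable[cohort]:
                row.append(None)  # period not yet observable for this cohort
            else:
                row.append(len(active[(cohort, t)]) / len(customers))
        matrix[cohort] = row
    return matrix

# Toy example: each row of the result is one heatmap row; columns are periods.
rows = [("A", "2010-01", 0), ("A", "2010-01", 2), ("B", "2010-01", 0),
        ("C", "2010-02", 0), ("C", "2010-02", 1)]
m = retention_matrix(rows, {"2010-01": 2, "2010-02": 1}, max_period=2)
print(m)
```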
Retention Curves
Retention decay curves by cohort showing how each cohort's retention declines over time
Purpose
Retention curves visualize how each of the 13 cohorts loses customers over its lifetime, starting at 100% and decaying through 12 periods. This section identifies which cohorts retain customers better than others, revealing differences in acquisition quality, timing, or product conditions that may explain overall retention performance (23–28.5% across key milestones).
Key Findings
- Earliest Cohort (Dec 2009) Performance: Retention of 35.29% at t=1 rises to 42.51% at t=3, then eases from roughly 38% to 24.82% by t=12, indicating stronger early stickiness than later cohorts.
- Later Cohorts (Oct–Dec 2010) Decline: Retention drops sharply; the Oct 2010 cohort shows only 25.73% at t=1 and 9.28% at t=2, suggesting deteriorating customer quality or product fit.
- Diverging Curve Pattern: Cohorts do not follow parallel paths; early cohorts outperform later ones by 10–25 percentage points, indicating systematic differences across the observation period.
- Censoring Effect: Later cohorts (Nov–Dec 2010) have incomplete lifetime data, limiting full retention trajectory visibility.
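Because later cohorts are censored, overall retention at a given period should pool only the cohorts observable at that period. Below is a minimal sketch with a hypothetical function name, checked against the report's own counts:

- 3,286 of 4,268 customers churned at t=1, leaving 982 active.
- The 46 customers not counted at t=1 match the December 2010 cohort size.

```python
def pooled_retention(cohort_sizes: dict, active_counts: dict, t: int):
    """Overall retention at period t, pooled over cohorts observable at t.

    cohort_sizes:  {cohort: customers acquired}
    active_counts: {(cohort, t): customers active in period t}
    A cohort with no (cohort, t) entry is treated as right-censored at t and is
    dropped from both numerator and denominator instead of counting as churn.
    """
    observed = [c for c in cohort_sizes if (c, t) in active_counts]
    denominator = sum(cohort_sizes[c] for c in observed)
    if denominator == 0:
        return None
    return sum(active_counts[(c, t)] for c in observed) / denominator

# Report totals, with all cohorts except Dec 2010 grouped together for brevity.
sizes = {"Dec 2009 to Nov 2010": 4268, "Dec 2010": 46}
active = {("Dec 2009 to Nov 2010", 1): 4268 - 3286}
print(f"{pooled_retention(sizes, active, 1):.1%}")  # 23.0%
print(f"{982 / 4314:.1%}")  # 22.8% if the censored cohort were counted as churned
```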
Interpretation
The divergence in retention
Survival Analysis
Kaplan-Meier survival curve showing probability of remaining active over time
Purpose
This section applies Kaplan-Meier survival analysis to estimate customer lifetime while accounting for censoring (recent customers still active when observation ends). Unlike raw cohort retention rates, the Kaplan-Meier estimator keeps censored customers in the at-risk population until their last observed period instead of counting them as churned. This yields a less biased estimate of customer persistence, provided censoring is unrelated to churn risk. The median customer lifetime of 4 periods serves as a critical anchor for forecasting customer lifetime value and retention economics across the 4,314-customer base.
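The Kaplan-Meier estimator works one churn period at a time. At each period with churn events, it multiplies the running estimate by the fraction of at-risk customers who do not churn: S(t) is the product over churn periods t_i ≤ t of (1 - d_i / n_i).

Below is a minimal stdlib sketch. It assumes each customer contributes one `(duration, event)` pair, with `event = 0` marking a customer still active when the window closed. The report does not state the exact churn rule behind its survival model, so this input encoding is an assumption.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate.

    durations: last observed period for each customer.
    events:    1 if the customer churned at that period, 0 if right-censored
               (still active when the observation window closed).
    Returns [(t, S(t))] at each period where at least one churn occurred.
    """
    churned = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # churned + censored exits per period
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = churned.get(t, 0)
        if d:
            # Customers censored at t still count as at risk at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve

def median_lifetime(curve):
    """Smallest period at which S(t) drops to 0.5 or below; None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Toy data (not from the report): five customers, two of them censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print(curve)
print(median_lifetime(curve))  # 3
```

Production analyses typically rely on a tested library such as `lifelines.KaplanMeierFitter`, which also reports confidence intervals around the curve and the median.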
Key Findings
- Median Customer Lifetime: 4 periods – Half of all customers remain active through period 4; half churn before this point
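- Initial Survival Probability: 64% at time 0 – 1,547 churn events (about 36% of customers) are recorded at t=0. That is well below the 77% first-period churn rate, which suggests the survival model applies a different churn definition than the cohort retention table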
- Steep Early Decline: Survival drops from 64% to 48% by period 4, indicating concentrated churn in early periods
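- Long-Tail Attrition: By period 11, only 8% of customers remain active. The 40-point fall from 48% at period 4 is faster per period than the 16-point fall from 64% to 48%, so attrition speeds up again later in the lifecycle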
Interpretation
The survival curve demonstrates that customer retention follows a classic "leaky bucket" pattern: aggressive early-stage churn (1,547 events at t=0), slower attrition through period 4, and a renewed decline afterward. The 4-period median lifetime aligns with the overall 26.7% retention at t=6
Churn Analysis
Churn rate by lifecycle period showing when in the customer lifecycle the biggest drop-offs occur
Purpose
This section identifies the critical lifecycle window where customer drop-off is most severe. Understanding churn timing reveals whether attrition is concentrated at onboarding (early friction) or distributed across the lifecycle, which fundamentally shapes retention strategy and resource allocation.
Key Findings
- First Period Churn Rate (t0→t1): 77.0%. Of the 4,268 customers with an observable first period (all except the 46-customer December 2010 cohort), 3,286 never return after initial activity, representing the single largest attrition event
- Stabilization Pattern: Churn rates drop dramatically after period 1, averaging 1.4% in periods 2–11, indicating survivors are substantially more committed
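- Late-Stage Spike: Period 12 shows 24.7% churn. Only the December 2009 cohort is old enough to be observed at period 12, so this spike rests on a single cohort and may reflect that cohort's behavior rather than a general secondary vulnerability point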
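The first-period figure is a simple ratio, and the later-period rates can be sketched as period-over-period churn. The report does not define its per-period formula, so `1 - r[t] / r[t-1]` below is an assumption, and the function names are hypothetical. Under this formula, churn turns negative whenever retention rises, which is consistent with the negative rates the report mentions.

```python
def first_period_churn(observable_customers: int, active_at_t1: int) -> float:
    """Share of customers with an observable period 1 who are not active in it."""
    return 1 - active_at_t1 / observable_customers

def period_churn(retention):
    """Assumed per-period churn: 1 - r[t] / r[t-1] for t >= 1.

    Yields None where the previous value is missing or zero, or the current
    value is missing (censored). Negative values mean retention went up.
    """
    rates = []
    for prev, cur in zip(retention, retention[1:]):
        if not prev or cur is None:
            rates.append(None)
        else:
            rates.append(1 - cur / prev)
    return rates

print(f"{first_period_churn(4268, 4268 - 3286):.1%}")  # 77.0%
```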
Interpretation
The data reveals a "leaky bucket" problem concentrated at entry. The 77% first-period churn dominates the overall 24.8% 12-month retention rate, meaning most customer loss occurs before meaningful engagement. Periods 2–11 show near-zero churn, indicating that customers who survive the critical first transition become highly stable. This pattern suggests onboarding friction or misaligned expectations drive initial attrition, not product quality issues.
Context
Negative churn rates in periods 2–
Revenue by Cohort
Revenue contribution by cohort over time showing lifetime value patterns
Purpose
This section measures economic retention by tracking cumulative revenue contribution across cohorts over their lifetime. While headcount retention shows customer survival rates, revenue-weighted retention reveals whether high-value customers retain better or worse than the average, directly impacting lifetime value (LTV) and acquisition ROI.
Key Findings
- Average Revenue per Customer: $2,047.29 — establishes the baseline economic value per acquired customer across all cohorts
- Revenue Distribution Skew (1.1): Highly right-skewed, indicating a small number of cohorts or periods generate disproportionately high revenue; the December 2009 cohort alone generated $4.93M cumulative revenue
- Period Revenue Range: $4,642–$686,654 per period, with median of $48,562 — shows extreme variability in period-level revenue generation
- Cohort Concentration: The earliest cohort (2009-12-01) accounts for ~40% of total tracked revenue, suggesting early customers are significantly higher-value
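Comparing a cohort's share of revenue with its share of customers shows whether that cohort is disproportionately valuable. One example is roughly 40% of tracked revenue coming from a cohort that holds 22% of customers. Below is a minimal sketch with hypothetical names and illustrative inputs.

```python
def revenue_metrics(cohort_revenue: dict, cohort_sizes: dict) -> dict:
    """Average revenue per customer plus each cohort's revenue and customer shares."""
    total_revenue = sum(cohort_revenue.values())
    total_customers = sum(cohort_sizes.values())
    return {
        "avg_revenue_per_customer": total_revenue / total_customers,
        "revenue_share": {c: r / total_revenue for c, r in cohort_revenue.items()},
        "customer_share": {c: n / total_customers for c, n in cohort_sizes.items()},
    }

# Illustrative inputs only: cohort "a" earns 60% of revenue with 20% of customers.
metrics = revenue_metrics({"a": 600.0, "b": 400.0}, {"a": 2, "b": 8})
print(metrics["avg_revenue_per_customer"])  # 100.0
```

A cohort whose revenue share clearly exceeds its customer share is worth more per customer than the platform average.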
Interpretation
The data reveals a highly concentrated revenue model where early cohorts drive disproportionate lifetime value. The 2009-12-01 cohort's $5.2M in cumulative revenue, from 22% of all customers, indicates acquisition timing or cohort quality dramatically affects economic
Overall Metrics
Platform-wide retention KPIs and summary statistics
| metric | value |
|---|---|
| Total Customers | 4,314 |
| Number of Cohorts | 13 |
| Overall Retention (t=1) | 23.0% |
| Overall Retention (t=3) | 28.5% |
| Overall Retention (t=6) | 26.7% |
| Overall Retention (t=12) | 24.8% |
| First Period Churn Rate | 77.0% |
| Median Customer Lifetime | 4.0 periods |
| Average Revenue per Customer | $2,047.29 |
Purpose
This section aggregates retention performance across all 13 cohorts into platform-wide KPIs, providing a single-number summary of customer retention health. These metrics serve as executive-level indicators for tracking retention trends over time and detecting early signals of improvement or decline in customer loyalty.
Key Findings
- First Period Churn Rate: 77.0% - More than 3 in 4 customers churn immediately after acquisition, indicating severe early-stage retention challenges
- Overall Retention (t=1): 23.0% - Fewer than 1 in 4 customers remain active after the first period
- Overall Retention (t=3): 28.5% - Retention rises above the t=1 level by period 3, so some customers inactive in period 1 return later, suggesting a more committed segment re-engages
- Median Customer Lifetime: 4 periods - Half of all customers remain active for 4 periods or fewer
- Average Revenue per Customer: $2,047.29 - Averaged over all acquired customers, including early churners, revenue per customer remains substantial despite high churn
Interpretation
The data reveals a classic "leaky bucket" pattern: aggressive early-stage churn (77%) filters out low-commitment customers, but those who survive the first period show modest stabilization. The gap between t=1 (23%) and t=3 (28.5%) retention suggests a small cohort of engaged