League of Legends Retention Factor Study

Riot Games public data set

League of Legends Survival Analysis

Applied professional retention-factor analysis to publicly available League of Legends match data to study which role, session-length, and performance patterns were associated with observed match survival.

Data source: League of Legends Summoners and Match Data

Challenge

Public match-history data is noisy, but it can be structured into player-level visibility signals that approximate retention questions. The challenge was to use professional statistical methods without overstating what public observational data can prove.

Approach

Built a player-level survival frame from public match-history payloads, converted activity into observed matches survived, and segmented players by inferred role, session length, and win-rate behavior. Then compared nonparametric curves, AFT models, and Cox hazard ratios across those segments.

Impact

The study showed how professional retention analytics can turn public gameplay telemetry into product-style hypotheses about engagement, session design, and risk factors. It also showed where the analysis must stay careful: last-seen-in-sample is a bounded-sample event signal, not verified churn.

Data source: publicly available League of Legends Summoners and Match Data on Kaggle. The last-seen-in-sample rate is the share of sampled players whose final observed match falls before the cutoff window, so it is useful for survival modeling but should not be read as verified account churn. This independent professional case-study experiment is not affiliated with Riot Games.

table

Dataset Frame

The participant sample keeps one row per player per match before aggregation. It gives the study a transparent bridge from raw match history to player-level retention factors.

PUUID | Timestamp | Inferred role | Champion | Kills | Deaths | Assists | Result | Match minutes
j8Pl_w | 2023-07-17 22:22 UTC | Top Lane | Zac | 6 | 1 | 4 | Win | 23.617
nGGGrA | 2023-07-17 22:22 UTC | Jungler | Viego | 9 | 1 | 2 | Win | 23.617
2gGqqw | 2023-07-17 22:22 UTC | Mid Lane | Syndra | 2 | 3 | 5 | Win | 23.617
3hsTXA | 2023-07-17 22:22 UTC | Bot Lane | Aphelios | 4 | 3 | 0 | Win | 23.617
6Rvcxg | 2023-07-17 22:22 UTC | Support | Milio | 0 | 4 | 5 | Win | 23.617
... | ... | ... | ... | ... | ... | ... | ... | ...
MW3U3Q | 2023-06-10 15:49 UTC | Jungler | Maokai | 8 | 9 | 15 | Loss | 31.5

PUUID values are shortened for readability.
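The step from per-match rows to a player-level frame can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the field names (`puuid`, `timestamp`), the `build_survival_frame` helper, the toy rows, and the cutoff date are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def build_survival_frame(rows, cutoff):
    """Collapse per-match rows into one record per player.

    duration: number of observed matches for the player
    event:    True when the final observed match falls before `cutoff`
              (a last-seen-in-sample signal, not verified churn)
    """
    stamps_by_player = defaultdict(list)
    for row in rows:
        stamps_by_player[row["puuid"]].append(row["timestamp"])

    frame = {}
    for puuid, stamps in stamps_by_player.items():
        last_seen = max(stamps)
        frame[puuid] = {
            "duration": len(stamps),
            "last_seen": last_seen,
            "event": last_seen < cutoff,
        }
    return frame


def utc(year, month, day, hour=0, minute=0):
    return datetime(year, month, day, hour, minute, tzinfo=timezone.utc)


# Hypothetical rows and cutoff, for illustration only.
rows = [
    {"puuid": "A", "timestamp": utc(2023, 6, 10, 15, 49)},
    {"puuid": "A", "timestamp": utc(2023, 7, 17, 22, 22)},
    {"puuid": "B", "timestamp": utc(2023, 6, 10, 15, 49)},
]
frame = build_survival_frame(rows, cutoff=utc(2023, 7, 1))
```

In this toy frame, player A is still visible after the cutoff and is treated as censored, while player B counts as a last-seen-in-sample event.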

table

Role Analysis

Role is based on Riot teamPosition, which is a server-inferred played position. The table compares role mix, last-seen-in-sample rates, average matches, session length, KDA, and win rate.

Role | Players | Sample share | Last-seen-in-sample | Avg matches | Avg session min | Avg KDA | Avg win rate | Read
Jungler | 3,744 | 18.8% | 83.1% | 1.28 | 27.55 | 4.07 | 49.4% | Above average observed survival; lower last-seen rate
Bot Lane | 3,849 | 19.3% | 82.7% | 1.19 | 27.48 | 3.46 | 49.7% | Below average observed survival; lower last-seen rate
Support | 3,809 | 19.1% | 82.5% | 1.19 | 27.52 | 3.55 | 49.8% | Below average observed survival; lower last-seen rate
Mid Lane | 3,784 | 19.0% | 82.6% | 1.29 | 27.61 | 3.28 | 49.7% | Above average observed survival; lower last-seen rate
Top Lane | 3,803 | 19.1% | 82.5% | 1.21 | 27.66 | 2.70 | 49.7% | Below average observed survival; lower last-seen rate
No Role Inferred | 946 | 4.7% | 99.6% | 1.37 | 15.92 | 2.50 | 50.2% | Higher last-seen rate; interpret separately

table

Model Families

The study uses nonparametric curves to explain the shape first, then compares compact parametric and adjusted models.

Model | Use | Watchout
Kaplan-Meier | Clear nonparametric survival curve by segment. | No covariate adjustment.
Nelson-Aalen | Shows accumulated hazard pressure over match count. | Less intuitive than survival probability.
Exponential / Weibull | Tests whether a compact baseline distribution explains the pattern. | Can be too rigid if behavior spikes or plateaus.
AFT | Frames coefficients as time ratios that stretch or shrink observed survival. | Sensitive to distribution choice and leakage.
Cox PH | Main adjusted hazard-ratio workhorse. | Associational here; proportional hazards must be considered.

chart

Core Survival Functions

Survival curves compare visibility beyond each observed match. The paired panels keep inferred role and average session length in the same visual read.

League retention study survival function chart by inferred role and average session length.
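For readers who want to see what sits behind a curve like this, the Kaplan-Meier product-limit estimator can be sketched in plain Python. The function name and the toy inputs are hypothetical, and the study's own tooling is not stated in the source, so treat this as an illustration of the estimator rather than the code that produced the chart.

```python
def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each match count where an event occurs.

    durations: observed match counts per player
    events:    True if last-seen-in-sample, False if censored
    """
    curve = []
    survival = 1.0
    at_risk = len(durations)
    for t in sorted(set(durations)):
        drops = sum(1 for d, e in zip(durations, events) if d == t and e)
        leaving = sum(1 for d in durations if d == t)
        if drops:
            # Product-limit step: multiply by the share that survives t.
            survival *= 1.0 - drops / at_risk
            curve.append((t, survival))
        # Events and censored players both leave the risk set after t.
        at_risk -= leaving
    return curve


# Hypothetical toy data: five players, two of them censored.
durations = [1, 1, 2, 3, 3]
events = [True, False, True, True, False]
curve = kaplan_meier(durations, events)
```

Running the same function once per segment (for example, per inferred role) gives the per-segment curves that a chart like this compares.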

chart

Cumulative Hazard

Cumulative hazard shows accumulated last-seen pressure over match count. It is useful when retention risk builds unevenly across early matches.

League retention study cumulative hazard chart by inferred role and average session length.
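The cumulative hazard view corresponds to the Nelson-Aalen estimator, which sums the per-step ratio of events to players at risk. A minimal sketch, with a hypothetical function name and toy data that are not taken from the study:

```python
def nelson_aalen(durations, events):
    """Return [(t, H(t))], the cumulative hazard at each event time.

    H(t) adds events_at_t / at_risk_at_t for every match count up to t.
    """
    curve = []
    cumulative = 0.0
    at_risk = len(durations)
    for t in sorted(set(durations)):
        drops = sum(1 for d, e in zip(durations, events) if d == t and e)
        leaving = sum(1 for d in durations if d == t)
        if drops:
            cumulative += drops / at_risk
            curve.append((t, cumulative))
        at_risk -= leaving
    return curve


# Hypothetical toy data: five players, two of them censored.
cum_hazard = nelson_aalen([1, 1, 2, 3, 3], [True, False, True, True, False])
```

Steep early increments in the returned curve are what "risk builds unevenly across early matches" looks like numerically.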

chart

Discrete Hazard

The discrete hazard view focuses on conditional last-seen pressure at each observed match count. It makes early-match drop-off patterns easier to inspect.

League retention study discrete hazard chart by inferred role and average session length.
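The discrete hazard at match count t is the share of players still visible at t who are last seen at exactly t. The sketch below assumes that definition; the helper name and the toy inputs are hypothetical.

```python
from collections import Counter


def discrete_hazard(durations, events):
    """Return {t: h(t)} with h(t) = events at t / players at risk at t."""
    exits = Counter(durations)  # everyone who leaves the risk set at t
    drops = Counter(d for d, e in zip(durations, events) if e)  # events only
    at_risk = len(durations)
    hazard = {}
    for t in sorted(exits):
        hazard[t] = drops[t] / at_risk
        at_risk -= exits[t]
    return hazard


# Hypothetical toy data: six players, two of them censored.
hazard = discrete_hazard(
    [1, 1, 1, 2, 3, 3],
    [True, True, False, True, True, False],
)
```

Because the risk set shrinks quickly when most players have one or two observed matches, hazards at higher match counts rest on few players and are noisier.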

table

Session-Length Segments

Average match duration is the strongest product-design read. The last-seen-in-sample rate shows where players stopped appearing inside this bounded sample, not whether they truly churned.

Session bucket | Players | Last-seen-in-sample | Avg matches | Avg match min | Avg kills | Avg deaths | Avg assists | Win rate
Short | 6,538 | 87.2% | 1.071 | 18.952 | 3.882 | 3.936 | 4.366 | 49.9%
Standard | 7,606 | 81.6% | 1.513 | 27.520 | 5.588 | 5.668 | 7.456 | 49.2%
Long | 5,791 | 81.7% | 1.069 | 35.441 | 7.226 | 7.355 | 10.598 | 50.0%
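A segment table like this one can be produced by bucketing each player's average match duration and then taking the share of event flags per bucket. The source does not state the bucket thresholds, so the cutoffs below (24 and 31 minutes) are purely hypothetical, as are the helper names and the toy players.

```python
def session_bucket(avg_match_min, short_max=24.0, long_min=31.0):
    """Assign a session bucket. Cutoffs are hypothetical placeholders."""
    if avg_match_min < short_max:
        return "Short"
    if avg_match_min >= long_min:
        return "Long"
    return "Standard"


def last_seen_rate_by_bucket(players):
    """Share of players per bucket whose event flag is True."""
    totals = {}
    events = {}
    for player in players:
        bucket = session_bucket(player["avg_match_min"])
        totals[bucket] = totals.get(bucket, 0) + 1
        events[bucket] = events.get(bucket, 0) + int(player["event"])
    return {bucket: events[bucket] / totals[bucket] for bucket in totals}


# Hypothetical players, for illustration only.
players = [
    {"avg_match_min": 18.9, "event": True},
    {"avg_match_min": 20.0, "event": False},
    {"avg_match_min": 27.5, "event": True},
    {"avg_match_min": 35.4, "event": True},
]
rates = last_seen_rate_by_bucket(players)
```

Changing the cutoffs moves players between buckets, so any comparison of bucket rates is only as stable as the threshold choice.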

table

Win-Rate Survival

Win-rate segments isolate performance context while keeping the retention read focused on observed match survival.

Segment | Players | Avg matches | Event rate | Avg kills | Avg deaths | Avg assists | Avg win rate
Low Win Rate | 9,852 | 1.025 | 83.8% | 4.274 | 6.769 | 5.602 | 0.1%
Mid Win Rate | 385 | 11.914 | 73.5% | 5.669 | 5.551 | 7.775 | 51.2%
High Win Rate | 9,698 | 1.033 | 83.5% | 6.748 | 4.394 | 9.120 | 99.9%

chart

Performance Survival and Event Rate

The performance charts compare survival curves and last-seen event rates by win-rate segment. They help separate match outcomes from observed visibility.

League retention study survival chart by win-rate performance segment.

chart

Last-Seen Event Rate

The event-rate chart summarizes how often each win-rate segment ends as a last-seen-in-sample signal. That makes it useful for retention-factor modeling, not as a confirmed churn rate.

League retention study event-rate chart by win-rate segment.

table

Fit Diagnostics

Model comparison keeps the adjusted analysis honest. Lower AIC and BIC are relative fit signals within this modeling setup.

Model | Family | AIC | BIC | Concordance | Read
Loglogistic AFT | AFT | -11,644.121 | -11,650.321 | 0.550 | Best relative fit
Lognormal AFT | AFT | 14,263.592 | 14,257.392 | 0.545 | Next AFT candidate
Weibull AFT | AFT | 45,702.804 | 45,696.604 | 0.526 | Weaker fit
Weibull baseline | Baseline | 45,931.062 | 45,946.863 | | Changing hazard baseline
Exponential baseline | Baseline | 46,427.718 | 46,435.618 | | Constant hazard baseline
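To make the AIC and BIC columns concrete, here is a sketch that fits the simplest model in the table, a constant-hazard exponential baseline, by maximum likelihood with right censoring and then scores it. It uses the textbook definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L with n taken as the number of players. Libraries differ on the BIC convention, and the source does not say which one the study used, so that choice is an assumption, as are the function names and toy data.

```python
import math


def exponential_fit(durations, events):
    """Maximum-likelihood fit of a constant hazard with right censoring.

    rate    = observed events / total exposure
    log_lik = events * ln(rate) - rate * total exposure
    """
    n_events = sum(events)
    exposure = sum(durations)
    rate = n_events / exposure
    log_lik = n_events * math.log(rate) - rate * exposure
    return rate, log_lik


def aic(log_lik, k):
    """Akaike information criterion for k fitted parameters."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion; n is assumed to be the player count."""
    return k * math.log(n) - 2 * log_lik


# Hypothetical toy data: five players, two of them censored.
durations = [1, 1, 2, 3, 3]
events = [True, False, True, True, False]
rate, log_lik = exponential_fit(durations, events)
fit_aic = aic(log_lik, k=1)
fit_bic = bic(log_lik, k=1, n=len(durations))
```

Both criteria only rank models fitted to the same data with the same likelihood definition, which is why the table reads them as relative signals.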

chart

Model Comparison Chart

The model comparison chart visualizes relative AIC differences and supports the choice to inspect AFT-style time ratios.

League retention study model comparison chart sorted by AIC.

table

AFT Time Ratios

Exponentiated AFT coefficients are read as time ratios. Values above one stretch observed survival; values below one shrink it.

Covariate | Time ratio | P value | Direction
KDA | 0.863 | 1.9e-48 | Time shrinking
Avg Deaths | 0.903 | 1.03e-19 | Time shrinking
Avg Assists | 1.083 | 6.51e-12 | Time extending
Avg Kills | 1.049 | 1.65e-06 | Time extending
Mid Lane | 1.104 | 6.89e-06 | Time extending
Jungler | 1.081 | 0.000419 | Time extending
Avg Game Minutes | 1.027 | 0.0218 | Time extending
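A time ratio converts directly into a percent change in expected observed survival per unit of the covariate. The sketch below applies that arithmetic to a few ratios copied from the table above. The `describe_time_ratio` helper is hypothetical, and because the source does not say how covariates were scaled, "per unit" means per unit of whatever scale the model used.

```python
import math

# Time ratios copied from the AFT table above.
time_ratios = {
    "KDA": 0.863,
    "Avg Deaths": 0.903,
    "Avg Assists": 1.083,
    "Mid Lane": 1.104,
}


def describe_time_ratio(time_ratio):
    """Return (coefficient, percent change, direction) for a time ratio."""
    coefficient = math.log(time_ratio)  # the AFT coefficient on the log scale
    percent_change = (time_ratio - 1.0) * 100.0
    direction = "Time extending" if time_ratio > 1.0 else "Time shrinking"
    return coefficient, percent_change, direction


summary = {name: describe_time_ratio(tr) for name, tr in time_ratios.items()}
```

For example, a ratio of 0.863 corresponds to roughly 13.7% shorter observed survival per unit, and a ratio of 1.104 to roughly 10.4% longer.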

table

Cox Hazard Ratios

Exponentiated Cox coefficients are read as hazard ratios. Above one is risk-increasing. Below one is survival-favoring.

Term | Hazard ratio | HR low | HR high | P value | Direction
No Role Inferred | 1.503 | 1.392 | 1.624 | 3.75e-25 | Risk increasing
KDA | 1.019 | 0.996 | 1.042 | 0.100 | Weak risk-increasing
Avg Deaths | 1.007 | 0.984 | 1.031 | 0.547 | Weak risk-increasing
Jungler | 1.003 | 0.956 | 1.051 | 0.914 | Weak risk-increasing
Avg Game Minutes | 0.979 | 0.957 | 1.002 | 0.0707 | Weak survival-favoring

Weak terms are kept visible so the table does not overstate precision.
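One way to see why a term is "weak" is to recover its standard error and Wald z statistic from the reported interval. The sketch below assumes the HR low and HR high columns are 95% Wald intervals computed on the log scale, which the source does not state. The helper name is hypothetical, and the inputs are copied from the table above.

```python
import math


def wald_from_interval(hazard_ratio, low, high, z_crit=1.959964):
    """Recover (standard error, z, two-sided p) from a hazard-ratio interval.

    Assumes a symmetric interval on the log scale:
    ln(HR) +/- z_crit * se.
    """
    log_hr = math.log(hazard_ratio)
    se = (math.log(high) - math.log(low)) / (2.0 * z_crit)
    z = log_hr / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return se, z, p_value


# Rows copied from the Cox table above.
no_role = wald_from_interval(1.503, 1.392, 1.624)
game_minutes = wald_from_interval(0.979, 0.957, 1.002)
```

Under that assumption, an interval that crosses one (as for Avg Game Minutes) yields a small absolute z and a p value above conventional thresholds, while the No Role Inferred interval sits well clear of one. Rounding in the published table limits how closely the recovered values can match the reported p values.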

chart

Primary Cox Hazard Ratios

The forest plot gives a compact visual read of adjusted hazard ratios and confidence intervals.

League retention study primary Cox hazard-ratio forest plot.

chart

Calendar-Time Sensitivity

The sensitivity model switches from match count to calendar days. It checks whether the main retention-factor read survives a different time axis.

League retention study calendar-time sensitivity hazard-ratio forest plot.
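Switching the time axis means recomputing each player's duration in days between first and last observed match instead of counting matches. A minimal sketch, with a hypothetical helper name and hypothetical timestamps:

```python
from datetime import datetime, timezone


def calendar_duration_days(first_seen, last_seen):
    """Fractional days between the first and last observed match."""
    return (last_seen - first_seen).total_seconds() / 86400.0


# Hypothetical player with two observed matches.
first_seen = datetime(2023, 6, 10, 15, 49, tzinfo=timezone.utc)
last_seen = datetime(2023, 7, 17, 22, 22, tzinfo=timezone.utc)

match_count_duration = 2  # match-count axis: observed matches
calendar_duration = calendar_duration_days(first_seen, last_seen)
```

A player with a single observed match has zero calendar days on this axis, so a calendar-time model needs some convention for those players; the source does not say which one the study used.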

Product read

How product teams could use the findings

Findings

Role differences appear, but public data alone is not enough for role-specific product decisions.

Session length is the stronger read. Shorter matches did not automatically show better observed survival.

Performance patterns need first-party context before they can guide product changes.

Potential actions

Use the patterns to form hypotheses around onboarding, role education, matchmaking quality, and post-match recovery.

Test whether harder-to-learn roles need clearer guidance or better next-match prompts.

Study pacing and frustration before assuming shorter sessions improve retention.

More useful platform data

First-party login, queue, mode, party, rank, matchmaking, and progression data would separate true retention from public visibility.

Tutorial, mastery, toxicity, reward, mission, and post-match telemetry would make the model more actionable.

Experiment data would be needed before making causal product recommendations.
