League of Legends Retention Factor Study

Riot Games public dataset

League of Legends Survival Analysis

Applied professional retention-factor analysis to publicly available League of Legends match data to study which role, session-length, and performance patterns were associated with observed match survival.

Data source: League of Legends Summoners and Match Data

Challenge

Public match-history data is noisy, but it can be structured into player-level visibility signals that stand in for retention behavior. The challenge was to use professional statistical methods without overstating what public observational data can prove.

Approach

Built a player-level survival frame from public match-history payloads, converted each player's activity into observed matches survived, segmented players by inferred role, session length, and win-rate behavior, then compared nonparametric curves, AFT models, and Cox hazard ratios.
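A minimal sketch of the survival-frame step. It assumes participant rows arrive as dicts with hypothetical `puuid` and `timestamp` keys, and that the last-seen event is defined against a cutoff window before the end of the sample. The helper name `build_survival_frame` and the 7-day window in the usage lines are illustrative; they are not the study's actual values.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def build_survival_frame(rows, sample_end, cutoff_window):
    """Collapse one-row-per-player-per-match data into one survival record per player.

    rows: iterable of dicts with hypothetical keys "puuid" and "timestamp"
        (timezone-aware datetimes).
    sample_end: latest timestamp covered by the sample.
    cutoff_window: timedelta; a player whose last match falls before
        sample_end - cutoff_window gets event=1 (last seen in sample),
        otherwise event=0 (right-censored).
    """
    match_times = defaultdict(list)
    for row in rows:
        match_times[row["puuid"]].append(row["timestamp"])

    cutoff = sample_end - cutoff_window
    frame = {}
    for puuid, times in match_times.items():
        last_seen = max(times)
        frame[puuid] = {
            "duration": len(times),            # observed matches survived
            "event": int(last_seen < cutoff),  # 1 = last seen before the cutoff window
            "last_seen": last_seen,
        }
    return frame


# Illustrative usage with made-up players and a hypothetical 7-day window.
utc = timezone.utc
rows = [
    {"puuid": "a", "timestamp": datetime(2023, 6, 1, tzinfo=utc)},
    {"puuid": "a", "timestamp": datetime(2023, 7, 15, tzinfo=utc)},
    {"puuid": "b", "timestamp": datetime(2023, 6, 10, tzinfo=utc)},
]
frame = build_survival_frame(rows, datetime(2023, 7, 17, tzinfo=utc), timedelta(days=7))
# frame["a"]: duration 2, event 0; frame["b"]: duration 1, event 1
```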

Impact

The study showed how professional retention analytics can turn public gameplay telemetry into product-style hypotheses about engagement, session design, and risk factors. It also showed where the analysis must stay careful: last-seen-in-sample is a bounded-sample event signal, not verified churn.

Data source: publicly available League of Legends Summoners and Match Data on Kaggle. The last-seen-in-sample rate is the share of sampled players whose final observed match falls before the cutoff window. It is useful for survival modeling but should not be read as verified account churn. This is an independent professional case study and is not affiliated with Riot Games.
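The rate can be written as a formula. The notation is introduced here for illustration:

- N is the number of sampled players.
- t_i^last is the timestamp of player i's final observed match.
- T_end is the end of the sample period.
- w is the length of the cutoff window; its value is not given on this page.

```latex
\[
\text{last-seen-in-sample rate}
  = \frac{1}{N} \sum_{i=1}^{N}
    \mathbf{1}\!\left[\, t_i^{\text{last}} < T_{\text{end}} - w \,\right]
\]
```

A survival model would treat players whose indicator is 0 as right-censored. They were still appearing near the end of the sample, so their final match is not observed.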

Dataset Frame

The participant sample keeps one row per player per match before aggregation. It gives the study a transparent bridge from raw match history to player-level retention factors.

PUUID	Timestamp	Inferred role	Champion	Kills	Deaths	Assists	Win	Match minutes
j8Pl_w	2023-07-17 22:22 UTC	Top Lane	Zac	6	1	4	Win	23.617
nGGGrA	2023-07-17 22:22 UTC	Jungler	Viego	9	1	2	Win	23.617
2gGqqw	2023-07-17 22:22 UTC	Mid Lane	Syndra	2	3	5	Win	23.617
3hsTXA	2023-07-17 22:22 UTC	Bot Lane	Aphelios	4	3	0	Win	23.617
6Rvcxg	2023-07-17 22:22 UTC	Support	Milio	0	4	5	Win	23.617
...	...	...	...	...	...	...	...	...
MW3U3Q	2023-06-10 15:49 UTC	Jungler	Maokai	8	9	15	Loss	31.5

PUUID values are shortened for readability.
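The aggregation from this one-row-per-player-per-match frame to player-level covariates can be sketched in plain Python. The row keys, the helper name, and the KDA definition are assumptions for illustration; the study's exact formulas are not stated. Here KDA is taken as (kills + assists) / max(deaths, 1) on the player's totals. The usage lines reuse two rows from the sample table.

```python
from collections import defaultdict


def player_covariates(rows):
    """Average per-match stats into one covariate record per player.

    Assumes hypothetical row keys: puuid, kills, deaths, assists, win (bool),
    minutes. KDA = (kills + assists) / max(deaths, 1) on the player's totals
    is an assumed definition, not necessarily the study's.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["puuid"]].append(row)

    out = {}
    for puuid, matches in grouped.items():
        n = len(matches)
        kills = sum(m["kills"] for m in matches)
        deaths = sum(m["deaths"] for m in matches)
        assists = sum(m["assists"] for m in matches)
        out[puuid] = {
            "avg_kills": kills / n,
            "avg_deaths": deaths / n,
            "avg_assists": assists / n,
            "kda": (kills + assists) / max(deaths, 1),  # guard against zero deaths
            "win_rate": sum(m["win"] for m in matches) / n,
            "avg_game_minutes": sum(m["minutes"] for m in matches) / n,
        }
    return out


# Two rows taken from the sample table above (shortened PUUIDs).
rows = [
    {"puuid": "j8Pl_w", "kills": 6, "deaths": 1, "assists": 4, "win": True, "minutes": 23.617},
    {"puuid": "MW3U3Q", "kills": 8, "deaths": 9, "assists": 15, "win": False, "minutes": 31.5},
]
cov = player_covariates(rows)
```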

Role Analysis

Role is based on Riot's teamPosition field, the server-inferred position a participant played in that match. The table compares role mix, last-seen-in-sample rates, average matches, session length, KDA, and win rate.

Role	Players	Sample share	Last-seen-in-sample	Avg matches	Avg session min	Avg KDA	Avg win rate	Read (vs. sample average)
Jungler	3,744	18.8%	83.1%	1.28	27.55	4.07	49.4%	Above-average observed survival; lower last-seen rate
Bot Lane	3,849	19.3%	82.7%	1.19	27.48	3.46	49.7%	Below-average observed survival; lower last-seen rate
Support	3,809	19.1%	82.5%	1.19	27.52	3.55	49.8%	Below-average observed survival; lower last-seen rate
Mid Lane	3,784	19.0%	82.6%	1.29	27.61	3.28	49.7%	Above-average observed survival; lower last-seen rate
Top Lane	3,803	19.1%	82.5%	1.21	27.66	2.70	49.7%	Below-average observed survival; lower last-seen rate
No Role Inferred	946	4.7%	99.6%	1.37	15.92	2.50	50.2%	Higher last-seen rate; interpret separately
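Mapping `teamPosition` onto the role labels above can be sketched as follows. The sketch assumes the Riot match-v5 values `TOP`, `JUNGLE`, `MIDDLE`, `BOTTOM`, and `UTILITY`, plus an empty string when no position was inferred. Players can appear in several roles, so the sketch assigns each player their most frequent role. That rule and the helper names are assumptions, not the study's documented method.

```python
from collections import Counter

# Assumed mapping from Riot match-v5 teamPosition values to the labels used above.
ROLE_LABELS = {
    "TOP": "Top Lane",
    "JUNGLE": "Jungler",
    "MIDDLE": "Mid Lane",
    "BOTTOM": "Bot Lane",
    "UTILITY": "Support",
}


def role_label(team_position):
    """Map one teamPosition value; empty or unknown values become 'No Role Inferred'."""
    return ROLE_LABELS.get(team_position or "", "No Role Inferred")


def player_role(team_positions):
    """Pick a player's most frequent inferred role across their matches.

    The modal-role rule is a hypothetical choice for illustration; ties
    resolve to the role encountered first.
    """
    counts = Counter(role_label(p) for p in team_positions)
    return counts.most_common(1)[0][0]


example = player_role(["JUNGLE", "MIDDLE", "JUNGLE"])  # "Jungler"
```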

Model Families

The study uses nonparametric curves to explain the shape first, then compares compact parametric and adjusted models.

Model	Use	Watchout
Kaplan-Meier	Clear nonparametric survival curve by segment.	No covariate adjustment.
Nelson-Aalen	Shows accumulated hazard pressure over match count.	Less intuitive than survival probability.
Exponential / Weibull	Tests whether a compact baseline distribution explains the pattern.	Can be too rigid if behavior spikes or plateaus.
AFT	Frames coefficients as time ratios that stretch or shrink observed survival.	Sensitive to distribution choice and leakage.
Cox PH	Main adjusted hazard-ratio workhorse.	Associational here; the proportional-hazards assumption must be checked.
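The exponential row has a closed-form fit, which makes it a useful sanity check before fitting the Weibull and AFT families. With right censoring, the maximum-likelihood hazard is the number of events divided by total exposure. The sketch assumes per-player `durations` (matches observed) and 0/1 `events`; the function name is hypothetical.

```python
import math


def fit_exponential(durations, events):
    """Closed-form maximum-likelihood fit of a constant-hazard (exponential) model.

    durations: observed matches per player, treated as continuous time
        (a simplifying assumption for this sketch).
    events: 1 = last seen in sample, 0 = right-censored.
    Returns (hazard_rate, log_likelihood).
    """
    d = sum(events)              # number of last-seen events
    exposure = sum(durations)    # total observed time across players
    if d == 0:
        raise ValueError("need at least one event to estimate the rate")
    rate = d / exposure
    # Censored log-likelihood: sum(event_i * log(rate)) - rate * sum(duration_i)
    log_lik = d * math.log(rate) - rate * exposure
    return rate, log_lik


rate, log_lik = fit_exponential([1, 1, 2, 4], [1, 1, 0, 1])
# rate == 0.375 (3 events over 8 matches of exposure)
```

The returned log-likelihood is the quantity that feeds AIC and BIC when comparing this baseline against richer families.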

Core Survival Functions

Survival curves show the share of players still visible beyond each observed match count. The paired panels show the inferred-role and average-session-length curves side by side.
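A from-scratch Kaplan-Meier estimator shows how such curves are built. At each match count with at least one last-seen event, survival is multiplied by one minus the share of at-risk players last seen there. This is a sketch for illustration only; a library such as lifelines' `KaplanMeierFitter` would normally be used. Segment curves come from running the estimator once per role or session bucket.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    S(t) = product over event times t_j <= t of (1 - d_j / n_j), where n_j is
    the number of players still observed (at risk) at t_j and d_j is the
    number last seen at t_j. Censored players leave the risk set after t.
    """
    last_seen = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # every player leaves the risk set at their duration
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = last_seen.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


curve = kaplan_meier([1, 1, 1, 2, 2, 3], [1, 1, 0, 1, 0, 1])
# approximately [(1, 0.667), (2, 0.444), (3, 0.0)]
```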

Cumulative Hazard

Cumulative hazard shows accumulated last-seen pressure over match count. It is useful when retention risk builds unevenly across early matches.
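The standard Nelson-Aalen estimator behind this kind of chart is shown below. The notation is introduced here:

- t_j are the distinct match counts at which at least one player is last seen.
- d_j is the number of players last seen at t_j.
- n_j is the number still observed (at risk) at t_j.

```latex
\[
\hat{H}(t) = \sum_{t_j \le t} \frac{d_j}{n_j},
\qquad
\hat{S}(t) \approx \exp\!\bigl(-\hat{H}(t)\bigr)
\]
```

Each term is the share of at-risk players last seen at that match count. Steep early steps in the cumulative hazard mark where last-seen pressure concentrates.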

Discrete Hazard

The discrete hazard view focuses on conditional last-seen pressure at each observed match count. It makes early-match drop-off patterns easier to inspect.
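In discrete time, the hazard and survival are linked as follows. The notation is introduced here:

- K is the match count at which a player is last seen.
- d_k is the number of players last seen at match k.
- n_k is the number of players still observed at match k.

```latex
\[
h(k) = \Pr(K = k \mid K \ge k) \approx \frac{d_k}{n_k},
\qquad
S(k) = \Pr(K > k) = \prod_{j=1}^{k} \bigl(1 - h(j)\bigr)
\]
```

A high h(1) means most players visible in a first match are not seen again. That is the early drop-off this view is meant to expose.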

Session-Length Segments

Average match duration is the strongest product-design read. The last-seen-in-sample rate shows where players stopped appearing inside this bounded sample, not whether they truly churned.

Session bucket	Players	Last-seen-in-sample	Avg matches	Avg match min	Avg kills	Avg deaths	Avg assists	Win rate
Short	6,538	87.2%	1.071	18.952	3.882	3.936	4.366	49.9%
Standard	7,606	81.6%	1.513	27.520	5.588	5.668	7.456	49.2%
Long	5,791	81.7%	1.069	35.441	7.226	7.355	10.598	50.0%
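Bucket assignment can be sketched as a simple threshold rule on each player's average match minutes. The cut points (23 and 32 minutes) are hypothetical placeholders, chosen only to sit between the bucket averages in the table. The study's actual boundaries are not stated.

```python
def session_bucket(avg_match_minutes, short_max=23.0, long_min=32.0):
    """Assign a session-length bucket from a player's average match minutes.

    short_max and long_min are hypothetical cut points, not the study's values.
    """
    if short_max >= long_min:
        raise ValueError("short_max must be below long_min")
    if avg_match_minutes < short_max:
        return "Short"
    if avg_match_minutes >= long_min:
        return "Long"
    return "Standard"


buckets = [session_bucket(m) for m in (18.952, 27.520, 35.441)]
# ["Short", "Standard", "Long"] for the three bucket averages in the table
```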

Win-Rate Survival

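Win-rate segments isolate performance context while keeping the retention read focused on observed match survival. Most sampled players appear in only one match, so their win rate is 0% or 100% by construction. The Low and High segments are therefore dominated by single-match players, and the Mid segment holds the players with longer observed histories.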

Segment	Players	Avg matches	Last-seen event rate	Avg kills	Avg deaths	Avg assists	Avg win rate
Low Win Rate	9,852	1.025	83.8%	4.274	6.769	5.602	0.1%
Mid Win Rate	385	11.914	73.5%	5.669	5.551	7.775	51.2%
High Win Rate	9,698	1.033	83.5%	6.748	4.394	9.120	99.9%
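A sketch of win-rate segmentation with hypothetical cut points: below 0.4 is Low and above 0.6 is High. The real boundaries are not stated. The usage lines include single-match players to show where they land.

```python
def win_rate_segment(wins, matches, low_max=0.4, high_min=0.6):
    """Assign a win-rate segment; the 0.4 / 0.6 cut points are hypothetical."""
    if matches <= 0:
        raise ValueError("matches must be positive")
    rate = wins / matches
    if rate < low_max:
        return "Low Win Rate"
    if rate > high_min:
        return "High Win Rate"
    return "Mid Win Rate"


# A single-match player has rate 0.0 or 1.0, so lands in Low or High
# for any cut points strictly between 0 and 1.
segments = [win_rate_segment(0, 1), win_rate_segment(1, 1), win_rate_segment(6, 12)]
```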

Performance Survival and Event Rate

The performance charts compare survival curves and last-seen event rates by win-rate segment. They help separate match outcomes from observed visibility.

Last-Seen Event Rate

The event-rate chart summarizes how often players in each win-rate segment end with a last-seen-in-sample event. It is useful for retention-factor modeling, not as a confirmed churn rate.

League retention study event-rate chart by win-rate segment.
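The event rate per segment is a group-wise mean of the 0/1 last-seen flag. A minimal sketch, assuming an iterable of hypothetical (segment, event) pairs; the helper name is illustrative. The same computation yields the last-seen columns in the role and session-length tables.

```python
from collections import defaultdict


def event_rate_by_segment(records):
    """Share of players per segment whose record ends in a last-seen event.

    records: iterable of (segment_label, event) pairs, with event in {0, 1}.
    """
    totals = defaultdict(int)
    events = defaultdict(int)
    for segment, event in records:
        totals[segment] += 1
        events[segment] += event
    return {seg: events[seg] / totals[seg] for seg in totals}


rates = event_rate_by_segment(
    [("Low", 1), ("Low", 1), ("Low", 0), ("Mid", 0), ("High", 1)]
)
```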

Fit Diagnostics

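Model comparison checks whether the AFT models fit better than the simpler exponential and Weibull baselines. Lower AIC and BIC are relative fit signals within this modeling setup, not absolute measures of model quality. Concordance, where reported, is the share of comparable player pairs a model orders correctly; 0.5 corresponds to random ordering.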

Model	Family	AIC	BIC	Concordance	Read
Loglogistic AFT	AFT	-11,644.121	-11,650.321	0.550	Best relative fit
Lognormal AFT	AFT	14,263.592	14,257.392	0.545	Next AFT candidate
Weibull AFT	AFT	45,702.804	45,696.604	0.526	Weaker fit
Weibull baseline	Baseline	45,931.062	45,946.863		Changing hazard baseline
Exponential baseline	Baseline	46,427.718	46,435.618		Constant hazard baseline
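For reference, the standard definitions behind these columns are below. Here k is the number of fitted parameters, n the number of observations, and L̂ the maximized likelihood.

```latex
\[
\mathrm{AIC} = 2k - 2\ln\hat{L},
\qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L},
\qquad
\Delta\mathrm{AIC}_m = \mathrm{AIC}_m - \min_{m'} \mathrm{AIC}_{m'}
\]
```

Both criteria share the -2 ln L̂ term. Only differences between models fit to the same players and durations are meaningful. ΔAIC is the relative difference a model comparison chart would plot.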

Model Comparison Chart

The model comparison chart visualizes relative AIC differences and supports the choice to inspect AFT-style time ratios.

AFT Time Ratios

Exponentiated AFT coefficients are read as time ratios. Values above one stretch expected observed survival; values below one shrink it.

Covariate	Time ratio	P value	Direction
KDA	0.863	1.9e-48	Time shrinking
Avg Deaths	0.903	1.03e-19	Time shrinking
Avg Assists	1.083	6.51e-12	Time extending
Avg Kills	1.049	1.65e-06	Time extending
Mid Lane	1.104	6.89e-06	Time extending
Jungler	1.081	0.000419	Time extending
Avg Game Minutes	1.027	0.0218	Time extending
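The time-ratio reading follows from the AFT model form. The notation is introduced here:

- T is a player's observed survival in matches.
- x is the covariate vector.
- ε is an error term whose distribution sets the family: logistic for log-logistic, normal for lognormal, and extreme-value for Weibull.
- e_j raises covariate j by one unit.
- t_p is the p-th quantile of T.

```latex
\[
\ln T = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x} + \sigma\varepsilon
\quad\Longrightarrow\quad
\frac{t_p(\mathbf{x} + \mathbf{e}_j)}{t_p(\mathbf{x})} = e^{\beta_j}
\quad \text{for every quantile } p
\]
```

Read this way, the KDA time ratio of 0.863 says a one-unit higher KDA is associated with roughly 13.7% shorter observed survival at every quantile, holding the other covariates fixed. This is an association, not an effect.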

Cox Hazard Ratios

Exponentiated Cox coefficients are read as hazard ratios. Values above one raise the last-seen hazard (risk-increasing); values below one lower it (survival-favoring).

Term	Hazard ratio	HR 95% CI low	HR 95% CI high	P value	Direction
No Role Inferred	1.503	1.392	1.624	3.75e-25	Risk increasing
KDA	1.019	0.996	1.042	0.100	Weak risk-increasing
Avg Deaths	1.007	0.984	1.031	0.547	Weak risk-increasing
Jungler	1.003	0.956	1.051	0.914	Weak risk-increasing
Avg Game Minutes	0.979	0.957	1.002	0.0707	Weak survival-favoring

Weak terms are kept visible so the table does not overstate precision.
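Under the Cox model h(t | x) = h0(t) exp(βᵀx), each exponentiated coefficient is a hazard ratio. The sketch below shows how a reported hazard ratio, 95% interval, and p value follow from a fitted coefficient and its standard error under the usual Wald approximation. The function name is hypothetical; a fitting library such as lifelines' `CoxPHFitter` reports these columns directly. For example, the No Role Inferred interval (1.392 to 1.624) implies a log-scale standard error of roughly 0.039.

```python
import math


def hazard_ratio_summary(beta, se, z_crit=1.959963984540054):
    """Turn a Cox coefficient and its standard error into HR, 95% CI, and p value.

    Wald approximation: CI = exp(beta +/- z_crit * se); the two-sided p value
    uses the standard normal tail, 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    hr = math.exp(beta)
    low = math.exp(beta - z_crit * se)
    high = math.exp(beta + z_crit * se)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return hr, low, high, p


# Roughly reproduces the No Role Inferred row: HR 1.503, CI about 1.39 to 1.62.
hr, low, high, p = hazard_ratio_summary(math.log(1.503), 0.0393)
```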

Primary Cox Hazard Ratios

The forest plot gives a compact visual read of adjusted hazard ratios and confidence intervals.

League retention study primary Cox hazard-ratio forest plot.

Calendar-Time Sensitivity

The sensitivity model switches from match count to calendar days. It checks whether the main retention-factor read survives a different time axis.
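A minimal sketch of one way to build the calendar-day axis, assuming timezone-aware UTC match timestamps per player. It counts the calendar days spanned from first to last observed match, inclusive. Single-match players therefore get a duration of 1 rather than 0, which matters because parametric fitters generally need positive durations. This definition and the helper name are assumptions; the study's exact calendar-time construction is not stated. The last-seen event flag can be reused unchanged.

```python
from datetime import datetime, timezone


def calendar_days_observed(match_times):
    """Inclusive count of calendar days from a player's first to last observed match.

    Hypothetical definition: dates are taken in the timestamps' own timezone
    (UTC here), and the first day counts as day 1.
    """
    first, last = min(match_times), max(match_times)
    return (last.date() - first.date()).days + 1


utc = timezone.utc
span = calendar_days_observed([
    datetime(2023, 6, 10, 15, 49, tzinfo=utc),   # timestamps from the sample table
    datetime(2023, 7, 17, 22, 22, tzinfo=utc),
])
single = calendar_days_observed([datetime(2023, 7, 17, 22, 22, tzinfo=utc)])
```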

Product read

How product teams could use the findings

Findings

Role differences appear, but public data alone is not enough for role-specific product decisions.

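Session length is the stronger read. Players in the short-session bucket had the highest last-seen-in-sample rate (87.2%, versus about 82% for standard and long sessions), so shorter matches did not automatically show better observed survival.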

Performance patterns need first-party context before they can guide product changes.

Potential actions

Use the patterns to form hypotheses around onboarding, role education, matchmaking quality, and post-match recovery.

Test whether harder-to-learn roles need clearer guidance or better next-match prompts.

Study pacing and frustration before assuming shorter sessions improve retention.

More useful platform data

First-party login, queue, mode, party, rank, matchmaking, and progression data would separate true retention from public visibility.

Tutorial, mastery, toxicity, reward, mission, and post-match telemetry would make the model more actionable.

Experiment data would be needed before making causal product recommendations.
