Methodology

This page explains how we collected, processed, and analyzed data for the History's Attention Gap research project. Our goal was to compare historical importance (as measured by academics) with current internet attention (as measured by Wikipedia pageviews).

Data Sources

1. MIT Pantheon (Historical Importance)

Pantheon is a project from MIT's Collective Learning group that ranks historical figures by their global cultural impact. We used their public API to fetch the top 1,000 figures by Historical Popularity Index (HPI).

The HPI score considers:

Number of Wikipedia language editions with an article
Average article length across editions
Historical view counts (with recency weighting)
Age of the figure (older = harder to maintain relevance)

This gives us a peer-reviewed, academically grounded measure of "who matters" across human history.

2. Wikipedia Pageviews API (Current Attention)

We used the Wikimedia REST API to fetch monthly pageview counts for the English Wikipedia article of each figure.

2024 data: Full year (January–December 2024)
2025 data: January–November 2025, annualized to 365 days

We use English Wikipedia specifically because it's the largest edition and, in our judgment, the closest single-edition proxy for global internet attention (see Limitations for the caveats).
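
As a rough illustration of the request described above, the sketch below builds a per-article URL for the Wikimedia Pageviews REST endpoint and reduces a decoded response to monthly totals. The helper names, the `user` agent filter, and the `all-access` setting are assumptions made for this example; the text does not say which filters the project's scripts use.

```python
from urllib.parse import quote

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(slug, start, end, project="en.wikipedia.org",
                  access="all-access", agent="user"):
    """Build a monthly per-article pageviews URL.

    start and end are YYYYMMDD strings, e.g. "20240101" and "20241231".
    The access and agent defaults are assumptions, not the project's settings.
    """
    # Wikipedia slugs use underscores; percent-encode everything else.
    article = quote(slug.replace(" ", "_"), safe="")
    return f"{BASE}/{project}/{access}/{agent}/{article}/monthly/{start}/{end}"


def monthly_views(payload):
    """Map 'YYYY-MM' to views from a decoded JSON response body."""
    return {
        f"{item['timestamp'][:4]}-{item['timestamp'][4:6]}": item["views"]
        for item in payload.get("items", [])
    }
```

To use it, fetch the URL with any HTTP client, send a descriptive User-Agent header as Wikimedia asks of API clients, and pass the decoded JSON to `monthly_views`.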

Metrics

Attention Score

Our core metric measures whether a figure is getting more or less attention than their historical importance would predict.

Attention Score = log₁₀(actual views) − log₁₀(expected views)

Expected views are calculated from HPI rank using a power-law model:

Expected = median_views × (100 / hpi_rank)^0.3

The 0.3 exponent was fitted to the observed rank-views relationship in our dataset.
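
The two formulas above can be sketched in Python as follows. The function and constant names are illustrative, and `median_views` is passed in as a parameter because the text does not specify which set of figures the median is taken over.

```python
import math

EXPONENT = 0.3      # fitted exponent quoted in the text
ANCHOR_RANK = 100   # from the formula: a figure at rank 100 is expected to get median_views


def expected_views(median_views, hpi_rank):
    """Expected = median_views * (100 / hpi_rank) ** 0.3"""
    return median_views * (ANCHOR_RANK / hpi_rank) ** EXPONENT


def attention_score(actual_views, median_views, hpi_rank):
    """log10(actual views) - log10(expected views)."""
    if actual_views <= 0:
        # log10 is undefined for zero or negative view counts.
        raise ValueError("actual_views must be positive")
    return math.log10(actual_views) - math.log10(expected_views(median_views, hpi_rank))
```

Under this model the top-ranked figure is expected to draw about 4× the median (100^0.3 ≈ 3.98), and the figure at rank 1,000 about half of it (0.1^0.3 ≈ 0.50).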

Interpreting the score:

Score	Meaning
`+1.0`	10× more views than expected
`+0.5`	~3× more views than expected
`0`	Attention matches importance
`-0.5`	~3× fewer views than expected
`-1.0`	10× fewer views than expected
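
Because the score is a difference of base-10 logarithms, the multipliers in the table follow from raising 10 to the score. A minimal sketch (the function name is illustrative):

```python
def views_ratio(score):
    """Ratio of actual to expected views implied by an attention score."""
    return 10 ** score


for score in (1.0, 0.5, 0.0, -0.5, -1.0):
    print(f"{score:+.1f} -> {views_ratio(score):.2f}x expected views")
```

A score of +0.5 works out to about 3.16×, which the table rounds to ~3×.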

Year-over-Year Change (YoY)

Compares annualized 2025 pageviews to actual 2024 pageviews.

YoY = ((annualized_2025 − actual_2024) / actual_2024) × 100

Positive = more attention in 2025. Negative = less attention.

The annualization for 2025 (which only has 11 months of data) is:

Annualized 2025 = total_jan_nov × (365 / 334)

334 = days from Jan 1 to Nov 30.
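
The annualization and YoY formulas can be sketched like this. The function names are illustrative; the day count is derived from the calendar rather than hard-coded, so the 334 above can be checked.

```python
from datetime import date


def annualize_2025(total_jan_nov):
    """Scale an 11-month (Jan 1 to Nov 30) total up to a 365-day year."""
    days_observed = (date(2025, 12, 1) - date(2025, 1, 1)).days  # 334
    return total_jan_nov * (365 / days_observed)


def yoy_change(annualized_2025, actual_2024):
    """Percentage change from actual 2024 views to annualized 2025 views."""
    return (annualized_2025 - actual_2024) / actual_2024 * 100
```

For example, a figure with 334,000 views through November annualizes to about 365,000; against 300,000 views in 2024, that is a YoY change of roughly +21.7%.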

Momentum (Q1 → Q3)

Measures the trend within 2025 using complete quarters only (to avoid partial-quarter bias).

Momentum = ((Q3_views − Q1_views) / Q1_views) × 100

Q1 = Jan+Feb+Mar. Q3 = Jul+Aug+Sep.

This identifies:

Rising figures: Gaining attention through the year
Falling figures: Early spike (like a death or movie release) followed by decline
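
A minimal sketch of the momentum calculation, assuming monthly views are stored in a dict keyed by 'YYYY-MM' strings (the storage format and function names are assumptions for this example):

```python
Q1_MONTHS = ("2025-01", "2025-02", "2025-03")
Q3_MONTHS = ("2025-07", "2025-08", "2025-09")


def quarter_total(monthly, months):
    """Sum views over the given month keys."""
    return sum(monthly[m] for m in months)


def momentum(monthly_2025):
    """Percentage change from Q1 views to Q3 views."""
    q1 = quarter_total(monthly_2025, Q1_MONTHS)
    q3 = quarter_total(monthly_2025, Q3_MONTHS)
    return (q3 - q1) / q1 * 100
```

A figure whose early spike has faded, say 600 views in Q1 and 300 in Q3, gets a momentum of -50.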

Labels

We assign descriptive labels based on combinations of attention score, momentum, and YoY change:

Label	Criteria
Breakout Star	High attention + YoY > 200%
Viral	High attention + fast rising momentum
Trending	High attention + rising momentum
Famous	High attention, stable
Rediscovered	Low attention + YoY surge
Forgotten	Very low attention
Fading	Low attention + falling momentum
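
The table gives only one numeric cutoff (YoY > 200% for Breakout Star), so the sketch below is one possible reading rather than the project's actual rules. Every other threshold is a hypothetical placeholder, and the decision to check labels in table order is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Only breakout_yoy comes from the table; the rest are hypothetical."""
    breakout_yoy: float = 200.0        # from the table: YoY > 200%
    high_attention: float = 0.5        # hypothetical
    low_attention: float = -0.5        # hypothetical
    very_low_attention: float = -1.0   # hypothetical
    surge_yoy: float = 50.0            # hypothetical
    rising: float = 10.0               # hypothetical
    fast_rising: float = 50.0          # hypothetical
    falling: float = -10.0             # hypothetical


def assign_label(score, momentum, yoy, t=Thresholds()):
    """Return a label, or None when no row of the table applies."""
    if score >= t.high_attention:
        if yoy > t.breakout_yoy:
            return "Breakout Star"
        if momentum >= t.fast_rising:
            return "Viral"
        if momentum >= t.rising:
            return "Trending"
        if momentum > t.falling:
            return "Famous"  # high attention, roughly stable
    elif score <= t.low_attention:
        if yoy >= t.surge_yoy:
            return "Rediscovered"
        if score <= t.very_low_attention:
            return "Forgotten"
        if momentum <= t.falling:
            return "Fading"
    return None
```

Keeping the cutoffs in one dataclass makes it easy to swap in the real values without touching the decision logic.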

Data Processing

Wikipedia Slug Matching

Pantheon provides Wikipedia slugs for each figure, which we use directly to query the Pageviews API. This avoids ambiguity issues (e.g., "Francis" could refer to multiple people).

Handling Missing Data

Figures with no Wikipedia data are excluded from analysis
We require at least 6 months of pageview data for a figure to be included
1,000 figures have complete data for 2024 and for January–November 2025
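
The inclusion rules above could be applied with a filter like the following sketch. It assumes each figure's views are held in a dict of month key to view count, with `None` for months the API returned nothing, and it counts the six months across the whole study window; the text does not say whether the minimum applies per year.

```python
MIN_MONTHS = 6  # from the text: at least 6 months of pageview data


def has_enough_data(monthly):
    """True if at least MIN_MONTHS months have a view count."""
    return sum(1 for views in monthly.values() if views is not None) >= MIN_MONTHS


def filter_figures(views_by_slug):
    """Drop figures with no Wikipedia data or too few months of it."""
    return {
        slug: monthly
        for slug, monthly in views_by_slug.items()
        if monthly and has_enough_data(monthly)
    }
```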

⚠️ Limitations

English Wikipedia only: Figures more prominent in non-English contexts may be underrepresented
Recency bias: Very recent figures (like Pope Leo XIV) may have inflated importance due to their novelty
2025 is incomplete: December data is not yet available; patterns may shift
Power law exponent: The 0.3 exponent in expected views is empirically fitted but not formally validated
Wikipedia ≠ true attention: Pageviews measure one platform; they don't capture books, podcasts, classroom discussion, etc.

Download the Data

All 1,000 figures with their metrics, labels, and monthly pageview data.

Open Explorer (with CSV export) →

Source Code

The data collection and analysis scripts are available on request. The main components are:

fetch_data.py – Pulls top 1,000 from Pantheon API
fetch_wiki_views.py – Fetches 2024 Wikipedia pageviews
fetch_wiki_2025.py – Fetches 2025 pageviews and calculates YoY
generate_figure_pages.py – Generates individual figure pages

The database is SQLite, and the analysis is done in Python.

Citation

If you use this research, please cite:

History's Not Boring. (2025). History's Attention Gap:
Who the Internet Ignores.
https://kidopoly.com/history/research/attention-gap/

Data sources:

Yu, A. Z., et al. (2016). Pantheon 1.0: A Manually Verified Dataset of Globally Famous Biographies. Scientific Data.
Wikimedia Foundation. (2025). Pageviews API. wikimedia.org/api/rest_v1