Methodology
This page explains how we collected, processed, and analyzed data for the History's Attention Gap research project. Our goal was to compare historical importance (as measured by academics) with current internet attention (as measured by Wikipedia pageviews).
Data Sources
1. MIT Pantheon (Historical Importance)
Pantheon is a project from MIT's Collective Learning group that ranks historical figures by their global cultural impact. We used their public API to fetch the top 1,000 figures by Historical Popularity Index (HPI).
The HPI score considers:
- Number of Wikipedia language editions with an article
- Average article length across editions
- Historical view counts (with recency weighting)
- Age of the figure (older = harder to maintain relevance)
This gives us a peer-reviewed, academically grounded measure of "who matters" across human history.
2. Wikipedia Pageviews API (Current Attention)
We used the Wikimedia REST API to fetch monthly pageview counts for the English Wikipedia article of each figure.
- 2024 data: Full year (January–December 2024)
- 2025 data: January–November 2025, annualized to 365 days
We use English Wikipedia because it is the largest edition and the closest single-edition proxy for global internet attention (see Limitations).
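As an illustration of the pageview fetch described above, here is a minimal sketch that builds a monthly per-article request URL for the Wikimedia REST API and turns a decoded JSON response into month totals. The function names, the `agent="user"` filter (which excludes bot traffic), and the `all-access` setting are assumptions for illustration; the project's actual scripts may use different settings.

```python
from urllib.parse import quote

# Per-article pageviews endpoint of the Wikimedia REST API.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(slug, start, end, project="en.wikipedia",
                  access="all-access", agent="user"):
    """Build a monthly per-article URL; start/end are YYYYMMDD strings.

    The access and agent defaults are assumptions, not confirmed settings.
    """
    article = quote(slug, safe="")  # percent-encode slashes and other specials
    return f"{BASE}/{project}/{access}/{agent}/{article}/monthly/{start}/{end}"


def monthly_views(payload):
    """Map 'YYYY-MM' to views from a decoded JSON response body."""
    result = {}
    for item in payload.get("items", []):
        stamp = item["timestamp"]  # e.g. "2024010100"
        result[f"{stamp[:4]}-{stamp[4:6]}"] = item["views"]
    return result
```

When sending the request, Wikimedia asks clients to set a descriptive User-Agent header. The full-year 2024 range would be `start="20240101"`, `end="20241231"`.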
Metrics
Attention Score
Our core metric measures whether a figure is getting more or less attention than their historical importance would predict. It is the base-10 logarithm of the ratio of actual to expected views:
Attention Score = log₁₀(actual views / expected views)
Expected views are calculated from HPI rank using a power law model. The model's 0.3 exponent was fitted to the observed rank-views relationship in our dataset.
Interpreting the score:
| Score | Meaning |
|---|---|
| +1.0 | 10× more views than expected |
| +0.5 | ~3× more views than expected |
| 0 | Attention matches importance |
| -0.5 | ~3× fewer views than expected |
| -1.0 | 10× fewer views than expected |
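The interpretation table is consistent with a base-10 logarithm of actual over expected views (a score of +1.0 means a ratio of 10, +0.5 a ratio of about 3.16). A minimal sketch under that assumption follows. The `scale` constant (expected views at rank 1) and the negative sign on the exponent are assumptions for illustration; only the 0.3 magnitude comes from the text.

```python
import math

EXPONENT = 0.3  # magnitude stated in the text


def expected_views(rank, scale):
    """Power-law expectation from HPI rank (1 = most important).

    `scale` is a hypothetical constant: the expected views at rank 1.
    """
    return scale * rank ** -EXPONENT


def attention_score(actual_views, rank, scale):
    """Base-10 log of actual over expected views (assumed definition)."""
    return math.log10(actual_views / expected_views(rank, scale))
```

Note that `math.log10` raises `ValueError` for zero views, so a figure with no recorded pageviews would need separate handling.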
Year-over-Year Change (YoY)
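Compares annualized 2025 pageviews to actual 2024 pageviews, expressed as a percentage change:
YoY = (annualized 2025 views − 2024 views) / 2024 views × 100%
Positive = more attention in 2025. Negative = less attention.
Because 2025 has only 11 months of data, we annualize it:
Annualized 2025 views = (Jan–Nov 2025 views) × 365 / 334
334 = days from Jan 1 to Nov 30.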
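The annualization and comparison can be worked through in a few lines. This is a sketch that assumes YoY is the percentage change from 2024 views to annualized 2025 views; the function names are hypothetical.

```python
DAYS_IN_YEAR = 365
DAYS_JAN_TO_NOV = 334  # Jan 1 through Nov 30


def annualize_2025(views_jan_nov):
    """Scale 11 months of views up to a 365-day year."""
    return views_jan_nov * DAYS_IN_YEAR / DAYS_JAN_TO_NOV


def yoy_change_pct(views_2024, views_jan_nov_2025):
    """Percentage change from 2024 to annualized 2025 (assumed definition)."""
    annualized = annualize_2025(views_jan_nov_2025)
    return (annualized - views_2024) / views_2024 * 100
```

For example, a figure with 365 views in 2024 and 668 views in January through November 2025 annualizes to 730 views, a YoY change of +100%.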
Momentum (Q1 → Q3)
Measures the trend within 2025 using complete quarters only (to avoid partial-quarter bias).
Q1 = Jan+Feb+Mar. Q3 = Jul+Aug+Sep.
This identifies:
- Rising figures: Gaining attention through the year
- Falling figures: Early spike (like a death or movie release) followed by decline
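One plausible way to compute momentum from monthly totals is the relative change from Q1 to Q3. The sketch below assumes that definition and assumes monthly views are keyed as 'YYYY-MM'; neither detail is confirmed by the text.

```python
def quarter_total(monthly, year, months):
    """Sum views for the given month numbers of one year."""
    return sum(monthly[f"{year}-{m:02d}"] for m in months)


def momentum(monthly, year=2025):
    """Relative change from Q1 to Q3 (assumed definition).

    Positive values mean attention rose through the year.
    """
    q1 = quarter_total(monthly, year, (1, 2, 3))
    q3 = quarter_total(monthly, year, (7, 8, 9))
    return (q3 - q1) / q1
```

A figure averaging 100 views per month in Q1 and 150 per month in Q3 would have a momentum of 0.5 under this definition.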
Labels
We assign descriptive labels based on combinations of attention score, momentum, and YoY change:
| Label | Criteria |
|---|---|
| Breakout Star | High attention + YoY > 200% |
| Viral | High attention + fast rising momentum |
| Trending | High attention + rising momentum |
| Famous | High attention, stable |
| Rediscovered | Low attention + YoY surge |
| Forgotten | Very low attention |
| Fading | Low attention + falling momentum |
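The table gives only one numeric cutoff (YoY > 200% for Breakout Star), so the sketch below fills in the rest with hypothetical thresholds purely to show how such rules could be combined. Every constant other than `BREAKOUT_YOY` is an assumption, as are the evaluation order (table order, first match wins) and the `None` result for figures that match no rule.

```python
from typing import Optional

BREAKOUT_YOY = 200.0       # from the table
HIGH_ATTENTION = 0.5       # hypothetical
LOW_ATTENTION = -0.5       # hypothetical
VERY_LOW_ATTENTION = -1.0  # hypothetical
FAST_RISING = 0.50         # hypothetical momentum cutoff
RISING = 0.10              # hypothetical momentum cutoff
FALLING = -0.10            # hypothetical momentum cutoff
SURGE_YOY = 100.0          # hypothetical "YoY surge" cutoff


def assign_label(score: float, momentum: float, yoy_pct: float) -> Optional[str]:
    """Return the first matching label in table order, or None."""
    if score >= HIGH_ATTENTION:
        if yoy_pct > BREAKOUT_YOY:
            return "Breakout Star"
        if momentum >= FAST_RISING:
            return "Viral"
        if momentum >= RISING:
            return "Trending"
        if momentum > FALLING:
            return "Famous"  # high attention, roughly stable
        return None
    if score <= LOW_ATTENTION:
        if yoy_pct >= SURGE_YOY:
            return "Rediscovered"
        if score <= VERY_LOW_ATTENTION:
            return "Forgotten"
        if momentum <= FALLING:
            return "Fading"
    return None
```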
Data Processing
Wikipedia Slug Matching
Pantheon provides Wikipedia slugs for each figure, which we use directly to query the Pageviews API. This avoids ambiguity issues (e.g., "Francis" could refer to multiple people).
Handling Missing Data
- Figures with no Wikipedia data are excluded from analysis
- We require at least 6 months of pageview data for a figure to be included
- 1,000 figures have complete data for 2024 and for January–November 2025
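The inclusion rule can be expressed as a simple filter. This sketch assumes each figure's pageviews are stored as a mapping from month to view count, with missing months either absent or set to `None`, and that the 6-month minimum applies to the whole mapping; the text does not say whether the minimum is per year or overall.

```python
MIN_MONTHS = 6  # minimum months of pageview data, from the text


def has_enough_data(monthly, min_months=MIN_MONTHS):
    """True if at least `min_months` months have a recorded view count."""
    return sum(1 for views in monthly.values() if views is not None) >= min_months


def filter_figures(figures):
    """Keep only figures that meet the minimum-data rule.

    `figures` maps a Wikipedia slug to its monthly-views mapping.
    """
    return {slug: m for slug, m in figures.items() if has_enough_data(m)}
```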
⚠️ Limitations
- English Wikipedia only: Figures more prominent in non-English contexts may be underrepresented
- Recency bias: Very recent figures (like Pope Leo XIV) may have inflated importance due to their novelty
- 2025 is incomplete: December data is not yet available; patterns may shift
- Power law exponent: The 0.3 exponent in expected views is empirically fitted but not formally validated
- Wikipedia ≠ true attention: Pageviews measure one platform; they don't capture books, podcasts, classroom discussion, etc.
Download the Data
All 1,000 figures with their metrics, labels, and monthly pageview data.
Open Explorer (with CSV export) →
Source Code
The data collection and analysis scripts are available on request. The main components are:
- fetch_data.py – Pulls top 1,000 from Pantheon API
- fetch_wiki_views.py – Fetches 2024 Wikipedia pageviews
- fetch_wiki_2025.py – Fetches 2025 pageviews and calculates YoY
- generate_figure_pages.py – Generates individual figure pages
The database is SQLite, and the analysis is done in Python.
Citation
If you use this research, please cite:
History's Not Boring. (2025). History's Attention Gap: Who the Internet Ignores.
https://kidopoly.com/history/research/attention-gap/
Data sources:
- Yu, A. Z., et al. (2016). Pantheon 1.0: A Manually Verified Dataset of Globally Famous Biographies. Scientific Data.
- Wikimedia Foundation. (2025). Pageviews API. wikimedia.org/api/rest_v1