Methodology

This page explains how we collected, processed, and analyzed data for the History's Attention Gap research project. Our goal was to compare historical importance (as measured by academics) with current internet attention (as measured by Wikipedia pageviews).

Data Sources

1. MIT Pantheon (Historical Importance)

Pantheon is a project from MIT's Collective Learning group that ranks historical figures by their global cultural impact. We used their public API to fetch the top 1,000 figures by Historical Popularity Index (HPI).

The HPI score considers:

This gives us a peer-reviewed, academically grounded measure of "who matters" across human history.

2. Wikipedia Pageviews API (Current Attention)

We used the Wikimedia REST API to fetch monthly pageview counts for the English Wikipedia article of each figure.

We use English Wikipedia specifically because it is the largest edition, and we treat it as the best single-edition proxy for global internet attention.
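As a rough illustration of the request involved, the sketch below builds a per-article URL for the Wikimedia pageviews endpoint and reduces a decoded JSON response to monthly totals. The helper names (`pageviews_url`, `monthly_views`), the `all-access` and `user` filter values, and the default date range are assumptions made for this example, not details confirmed by the project.

```python
from urllib.parse import quote

# Base path of the Wikimedia REST per-article pageviews endpoint.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(slug, start="20240101", end="20251130",
                  project="en.wikipedia", access="all-access", agent="user"):
    """Build a monthly pageviews URL for one Wikipedia slug.

    The access/agent filters and the date range are assumptions for
    this sketch; the project does not state which values it used.
    """
    # Slugs may contain characters such as "/" that must be escaped.
    article = quote(slug, safe="")
    return f"{BASE}/{project}/{access}/{agent}/{article}/monthly/{start}/{end}"


def monthly_views(payload):
    """Map 'YYYY-MM' to view counts from a decoded JSON response."""
    views = {}
    for item in payload.get("items", []):
        ts = item["timestamp"]  # formatted like "2025010100"
        views[f"{ts[:4]}-{ts[4:6]}"] = item["views"]
    return views
```

Fetching the URL with any HTTP client and passing the decoded JSON to `monthly_views` yields one entry per month. Wikimedia asks API clients to send a descriptive User-Agent header.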

Metrics

Attention Score

Our core metric measures whether a figure is getting more or less attention than their historical importance would predict.

Attention Score = log₁₀(actual views) − log₁₀(expected views)

where expected views are derived from HPI rank using a power-law model:

Expected = median_views × (100 / hpi_rank)^0.3

The 0.3 exponent was fitted to the observed rank-views relationship in our dataset. Under this formula, a figure at HPI rank 100 is expected to receive exactly median_views.
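The two formulas above can be sketched in a few lines of Python. The function and parameter names are hypothetical, and the handling of zero-view articles is left out because the page does not describe it.

```python
import math


def expected_views(hpi_rank, median_views, exponent=0.3, pivot_rank=100):
    """Expected = median_views * (100 / hpi_rank) ** 0.3"""
    return median_views * (pivot_rank / hpi_rank) ** exponent


def attention_score(actual_views, hpi_rank, median_views):
    """log10(actual views) - log10(expected views).

    Raises ValueError when actual_views is 0, since log10(0) is undefined.
    """
    expected = expected_views(hpi_rank, median_views)
    return math.log10(actual_views) - math.log10(expected)


def views_ratio(score):
    """Convert a score back into an actual/expected multiplier."""
    return 10 ** score
```

For example, a figure at rank 100 with ten times the median views scores about +1.0, and `views_ratio(0.5)` is roughly 3.16, which is the "~3×" shown in the interpretation table.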

Interpreting the score:

Score    Meaning
+1.0     10× more views than expected
+0.5     ~3× more views than expected
 0       Attention matches importance
−0.5     ~3× fewer views than expected
−1.0     10× fewer views than expected

Year-over-Year Change (YoY)

Compares annualized 2025 pageviews to actual 2024 pageviews.

YoY = ((annualized_2025 − actual_2024) / actual_2024) × 100

Positive = more attention in 2025. Negative = less attention.

Because 2025 has only 11 months of data (January through November), we annualize it by day count:

Annualized 2025 = total_jan_nov × (365 / 334)

334 is the number of days from Jan 1 through Nov 30, 2025 (inclusive).
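A minimal sketch of the annualization and YoY calculation follows. The function names are hypothetical, and the day count is recomputed from the calendar rather than hard-coded so the 334 figure can be checked.

```python
from datetime import date

# Days from Jan 1 through Nov 30, 2025, counting both endpoints.
DAYS_OBSERVED = (date(2025, 11, 30) - date(2025, 1, 1)).days + 1
DAYS_IN_2025 = 365


def annualize_2025(total_jan_nov):
    """Scale an 11-month total up to a full-year estimate."""
    return total_jan_nov * (DAYS_IN_2025 / DAYS_OBSERVED)


def yoy_change(total_jan_nov_2025, actual_2024):
    """Percent change from actual 2024 views to annualized 2025 views."""
    annualized = annualize_2025(total_jan_nov_2025)
    return (annualized - actual_2024) / actual_2024 * 100
```

A figure with 668,000 views in January through November 2025 and 365,000 views in all of 2024 annualizes to about 730,000 views, a YoY change of about +100%.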

Momentum (Q1 → Q3)

Measures the trend within 2025 using complete quarters only (to avoid partial-quarter bias).

Momentum = ((Q3_views − Q1_views) / Q1_views) × 100

Q1 = Jan+Feb+Mar. Q3 = Jul+Aug+Sep.
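The momentum formula can be sketched as below, assuming monthly totals are stored in a dictionary keyed by 'YYYY-MM' strings. That layout and the function name are hypothetical choices for this example.

```python
Q1_MONTHS = ("2025-01", "2025-02", "2025-03")
Q3_MONTHS = ("2025-07", "2025-08", "2025-09")


def momentum(monthly):
    """Percent change from Q1 2025 views to Q3 2025 views."""
    q1 = sum(monthly[m] for m in Q1_MONTHS)
    q3 = sum(monthly[m] for m in Q3_MONTHS)
    return (q3 - q1) / q1 * 100
```

A figure averaging 100 views per month in Q1 and 150 per month in Q3 has a momentum of +50%.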

This identifies:

Labels

We assign descriptive labels based on combinations of attention score, momentum, and YoY change:

Label            Criteria
Breakout Star    High attention + YoY > 200%
Viral            High attention + fast rising momentum
Trending         High attention + rising momentum
Famous           High attention, stable
Rediscovered     Low attention + YoY surge
Forgotten        Very low attention
Fading           Low attention + falling momentum
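One way the criteria could be turned into rules is sketched below. Only the 200% YoY cutoff comes from the table. Every other threshold, the order in which rules are checked, and the function name are hypothetical placeholders, since this page does not state the actual values.

```python
def assign_label(score, momentum, yoy, *,
                 high=0.5, low=-0.5, very_low=-1.0,
                 fast_rising=50.0, rising=10.0, falling=-10.0,
                 yoy_surge=50.0):
    """Return a label, or None when no rule matches.

    All keyword defaults are hypothetical; only YoY > 200 is taken
    from the criteria table.
    """
    if score >= high:
        if yoy > 200:
            return "Breakout Star"
        if momentum >= fast_rising:
            return "Viral"
        if momentum >= rising:
            return "Trending"
        # Simplification: anything else with high attention counts as stable.
        return "Famous"
    if score <= low:
        if yoy >= yoy_surge:
            return "Rediscovered"
        if score <= very_low:
            return "Forgotten"
        if momentum <= falling:
            return "Fading"
    return None
```

Keeping the thresholds as keyword arguments makes it easy to substitute the project's real cutoffs once they are known.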

Data Processing

Wikipedia Slug Matching

Pantheon provides Wikipedia slugs for each figure, which we use directly to query the Pageviews API. This avoids ambiguity issues (e.g., "Francis" could refer to multiple people).

Handling Missing Data

⚠️ Limitations

Download the Data

All 1,000 figures with their metrics, labels, and monthly pageview data.

Open Explorer (with CSV export) →

Source Code

The data collection and analysis scripts are available on request. The main components are:

The database is SQLite, and the analysis is done in Python.

Citation

If you use this research, please cite:

History's Not Boring. (2025). History's Attention Gap:
Who the Internet Ignores.
https://kidopoly.com/history/research/attention-gap/

Data sources: