ERR (Expected Reciprocal Rank)

A probabilistic ranking metric that accounts for varying document relevance levels across positions.

Year: 2010
Generality: 383

Expected Reciprocal Rank (ERR) is an evaluation metric for information retrieval and search systems. It scores a ranked list of documents by the expected reciprocal of the rank at which a user stops, that is, the rank of the first result that satisfies them. Unlike simpler metrics, ERR models user behavior probabilistically: it assumes a user scans results from top to bottom and stops as soon as a document is sufficiently relevant, with more relevant documents more likely to end the search. The probability of stopping at a given rank therefore depends on the relevance grade of the document at that rank and on the grades of all documents ranked above it, making ERR sensitive to both the position and the degree of relevance of each result.
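The description above can be written as a formula. The notation here is an assumption added for illustration: n is the length of the ranked list, g_i is the relevance grade of the document at rank i, g_max is the highest grade on the judgment scale, and R_i is the probability that the document at rank i satisfies the user. The exponential grade mapping shown is the one commonly used with ERR; other mappings are possible.

```latex
\mathrm{ERR} = \sum_{r=1}^{n} \frac{1}{r} \, R_r \prod_{i=1}^{r-1} \left(1 - R_i\right),
\qquad
R_i = \frac{2^{g_i} - 1}{2^{g_{\max}}}
```

Each term is the probability that the user reaches rank r unsatisfied, times the probability of being satisfied there, times the utility 1/r of stopping at that rank.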

The metric was introduced by Chapelle, Metzler, Zhang, and Grinspan in 2009 and quickly became influential in the information retrieval community. Its key innovation over predecessors like Mean Reciprocal Rank (MRR) and Normalized Discounted Cumulative Gain (NDCG) is its explicit cascade model of user behavior. In this cascade model, the probability that a user examines the document at rank r is the product, over all documents ranked above it, of the probability that each one failed to satisfy the user. A highly relevant document near the top therefore sharply discounts everything below it. Because satisfaction probabilities are derived from relevance grades, ERR is particularly well suited to graded relevance judgments, where documents are not simply relevant or irrelevant but exist on a spectrum of usefulness.
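The cascade model translates into a short loop that carries forward the probability of the user still reading. The following is a minimal sketch, not a reference implementation: the function name, the argument names, and the example grades are hypothetical, and the mapping from grade to satisfaction probability, (2^g - 1) / 2^g_max, is assumed to be the common exponential one.

```python
from typing import Sequence


def expected_reciprocal_rank(grades: Sequence[int], max_grade: int) -> float:
    """ERR of one ranked list under the cascade model.

    grades: relevance grade of each result, top-ranked first.
    max_grade: highest grade on the judgment scale (assumed known).
    """
    p_reach = 1.0  # probability that the user examines the current rank
    err = 0.0
    for rank, grade in enumerate(grades, start=1):
        # Assumed exponential mapping from grade to satisfaction probability.
        p_satisfied = (2 ** grade - 1) / (2 ** max_grade)
        # User reaches this rank, is satisfied here, and gains utility 1/rank.
        err += p_reach * p_satisfied / rank
        # Otherwise the user moves on to the next rank.
        p_reach *= 1.0 - p_satisfied
    return err


# Hypothetical judgments on a 0-4 scale, top-ranked result first.
print(expected_reciprocal_rank([3, 2, 0, 1], max_grade=4))
```

Keeping a running product in `p_reach` avoids recomputing the product over all higher ranks at every position, so the metric is computed in a single pass over the list.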

ERR matters in machine learning contexts primarily because modern search engines and recommendation systems are trained and evaluated using offline metrics before deployment. Choosing the right metric directly shapes what a learned ranking model optimizes for. ERR's cascade assumption aligns more closely with observed user behavior in web search than purely position-based metrics such as DCG, whose discount depends only on rank and not on what was shown above, making it a more faithful proxy for real-world user satisfaction. It is commonly used in learning-to-rank research and in competitions such as the Yahoo! Learning to Rank Challenge.
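The difference from a metric whose discount depends only on rank can be seen in a small experiment. The sketch below is illustrative only: the helper names `err` and `dcg`, the 0-4 grade scale, and the example lists are assumptions. It measures how much a grade-3 document at rank 2 adds to each metric, first under an irrelevant rank-1 result and then under a perfect one.

```python
import math
from typing import Sequence


def err(grades: Sequence[int], max_grade: int) -> float:
    """ERR with the assumed mapping (2**g - 1) / 2**max_grade."""
    p_reach, total = 1.0, 0.0
    for rank, g in enumerate(grades, start=1):
        r = (2 ** g - 1) / 2 ** max_grade
        total += p_reach * r / rank
        p_reach *= 1.0 - r
    return total


def dcg(grades: Sequence[int]) -> float:
    """DCG with exponential gain and a log2 discount that depends only on rank."""
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(grades, start=1))


# Value added by a grade-3 document at rank 2, under two different rank-1 results.
for top in (0, 4):
    with_doc, without_doc = [top, 3], [top, 0]
    print(f"rank-1 grade {top}: "
          f"ERR gain {err(with_doc, 4) - err(without_doc, 4):.4f}, "
          f"DCG gain {dcg(with_doc) - dcg(without_doc):.4f}")
```

Under these assumptions the DCG gain is the same in both cases, while the ERR gain shrinks considerably when the top result is already highly relevant, because the modeled user rarely reads past it.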

Despite its strengths, ERR has limitations: it is less interpretable than simpler metrics, and because relevance grades are converted into satisfaction probabilities, its values can be sensitive to the specific relevance scale used. Nonetheless, it remains a standard tool in the evaluation toolkit for ranking systems, particularly when fine-grained relevance distinctions and realistic user models are priorities.
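The sensitivity to the relevance scale can be illustrated by scoring the same labels against two different maximum grades. This is a sketch under stated assumptions: the helper `err`, the exponential grade mapping, and the example ranking are hypothetical and chosen only to make the effect visible.

```python
from typing import Sequence


def err(grades: Sequence[int], max_grade: int) -> float:
    """ERR with the assumed mapping (2**g - 1) / 2**max_grade."""
    p_reach, total = 1.0, 0.0
    for rank, g in enumerate(grades, start=1):
        r = (2 ** g - 1) / 2 ** max_grade
        total += p_reach * r / rank
        p_reach *= 1.0 - r
    return total


ranking = [2, 1, 0]  # hypothetical judgments, top-ranked first

# Identical labels, interpreted on a 0-2 scale and then on a 0-4 scale.
print(err(ranking, max_grade=2))
print(err(ranking, max_grade=4))
```

The second score is much lower even though the ranking is unchanged, so ERR values are only comparable when they were computed with the same scale and grade mapping.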

Related

Reranking

Reordering an initial set of retrieved results using a more sophisticated secondary model.

Generality: 580
RMSE (Root Mean Squared Error)

A regression metric that penalizes large prediction errors by squaring residuals before averaging.

Generality: 796
IR (Information Retrieval)

Finding and ranking relevant documents from large collections in response to user queries.

Generality: 838
Prediction Error

The gap between a model's predicted values and the actual observed outcomes.

Generality: 875
Rank Fusion

Combining multiple ranked lists into a single, more accurate aggregated ranking.

Generality: 527
Precision-Recall Curve

A plot evaluating classifier performance by trading off precision against recall across thresholds.

Generality: 729