Envisioning is an emerging technology research institute and advisory.

Landmarks

Specific reference points on objects that help AI systems interpret visual structure.

Year: 2012
Generality: 384

In computer vision and machine learning, landmarks are a set of predefined, semantically meaningful points used to represent the structure or shape of an object within an image or video frame. Rather than processing every raw pixel, models can operate on these compact coordinate representations to capture the spatial configuration of key features. In facial analysis, for example, landmarks typically denote the corners of the eyes, the tip of the nose, the edges of the lips, and the contour of the jawline, with the count often ranging from 68 to 478 points depending on the annotation scheme. Similar schemes exist for human body pose estimation, hand tracking, and medical imaging, where anatomical landmarks anchor downstream analysis.
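To make the idea of a compact coordinate representation concrete, here is a minimal sketch in plain Python. The five-point scheme, the landmark names, and the helper functions `to_pixels` and `distance` are all hypothetical and chosen for illustration; real annotation schemes define many more points and their own naming.

```python
from math import hypot

# Hypothetical five-point face scheme (real schemes use far more points).
# Coordinates are normalized to [0, 1] relative to image width and height.
face = {
    "left_eye_outer": (0.30, 0.40),
    "right_eye_outer": (0.70, 0.40),
    "nose_tip": (0.50, 0.55),
    "mouth_left": (0.38, 0.72),
    "mouth_right": (0.62, 0.72),
}

def to_pixels(landmarks, width, height):
    """Scale normalized (x, y) landmarks to pixel coordinates."""
    return {name: (x * width, y * height) for name, (x, y) in landmarks.items()}

def distance(landmarks, a, b):
    """Euclidean distance between two named landmarks."""
    (xa, ya), (xb, yb) = landmarks[a], landmarks[b]
    return hypot(xb - xa, yb - ya)

pixels = to_pixels(face, width=640, height=480)
eye_span = distance(pixels, "left_eye_outer", "right_eye_outer")
mouth_width = distance(pixels, "mouth_left", "mouth_right")

# A ratio of two landmark distances does not depend on how large the face
# appears in the frame, which is one reason landmarks are convenient features.
ratio = mouth_width / eye_span
print(f"eye span: {eye_span:.1f} px, mouth/eye ratio: {ratio:.2f}")
```

Ten numbers stand in for a 640 by 480 image here, and simple geometry on those numbers already yields scale-independent shape descriptors.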

Landmark detection is typically framed as a coordinate regression or heatmap prediction problem. A convolutional neural network takes an image as input and outputs either coordinate values for each landmark directly or, for each landmark, a spatial heatmap indicating the probability of the point's location. Models are trained on large annotated datasets in which human labelers have manually placed landmarks on thousands of images, providing the ground truth needed for supervised learning. Techniques such as data augmentation (including geometric transformations) and cascaded regression have been developed to improve robustness to occlusion, lighting variation, and pose changes.
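The heatmap formulation can be illustrated with a small decoding sketch. It assumes a single-landmark heatmap stored as a list of rows, and it uses a synthetic Gaussian bump in place of a real network output. The function names are hypothetical. Two common decoding ideas are shown: taking the highest-scoring cell, and taking a probability-weighted mean of cell positions (a soft-argmax-style decode) to recover sub-pixel positions.

```python
from math import exp

def gaussian_heatmap(width, height, cx, cy, sigma=1.0):
    """Synthetic stand-in for a network output: a Gaussian bump at (cx, cy)."""
    return [
        [exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2)) for x in range(width)]
        for y in range(height)
    ]

def decode_argmax(heatmap):
    """Return the (x, y) grid cell with the highest score."""
    cells = ((x, y) for y, row in enumerate(heatmap) for x in range(len(row)))
    return max(cells, key=lambda c: heatmap[c[1]][c[0]])

def decode_weighted_mean(heatmap):
    """Return the score-weighted mean (x, y), which can fall between cells."""
    total = sum(sum(row) for row in heatmap)
    x = sum(v * col for row in heatmap for col, v in enumerate(row)) / total
    y = sum(v * r for r, row in enumerate(heatmap) for v in row) / total
    return x, y

# True landmark position lies between grid cells.
heatmap = gaussian_heatmap(width=10, height=8, cx=5.3, cy=2.6)
peak = decode_argmax(heatmap)            # nearest grid cell
refined = decode_weighted_mean(heatmap)  # approximate sub-pixel estimate
print(peak, refined)
```

The argmax decode is limited to the grid resolution, while the weighted mean approximates the underlying position more closely. The weighted mean is also differentiable, which is one reason variants of it are used when coordinates must be trained end to end.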

The practical importance of landmarks extends across many domains. In augmented reality, real-time facial landmark detection enables accurate overlay of virtual elements onto a user's face. In medical imaging, anatomical landmarks guide registration algorithms that align scans from different time points or imaging modalities. In human-computer interaction, body pose landmarks derived from frameworks like MediaPipe or OpenPose power gesture recognition and motion capture without specialized hardware. Landmarks thus serve as an efficient, interpretable intermediate representation that bridges raw visual input and higher-level semantic understanding.
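As an illustration of landmark-guided registration, the following sketch fits a 2D similarity transform (rotation, uniform scale, translation) that maps one set of landmarks onto a corresponding set by least squares. This is a simplified stand-in under stated assumptions: the points are made up, the function names are hypothetical, and real medical registration usually works in 3D and often with affine or deformable models. Representing points as complex numbers keeps the 2D case short.

```python
import cmath
import math

def fit_similarity(source, target):
    """Least-squares a, b (complex) such that target ≈ a * source + b.

    abs(a) is the scale, cmath.phase(a) the rotation, b the translation.
    Requires at least two distinct source points.
    """
    p = [complex(x, y) for x, y in source]
    q = [complex(x, y) for x, y in target]
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)
    num = sum((pi - p_mean).conjugate() * (qi - q_mean) for pi, qi in zip(p, q))
    den = sum(abs(pi - p_mean) ** 2 for pi in p)
    a = num / den
    b = q_mean - a * p_mean
    return a, b

def apply_similarity(a, b, points):
    """Apply z -> a * z + b to a list of (x, y) points."""
    mapped = [a * complex(x, y) + b for x, y in points]
    return [(z.real, z.imag) for z in mapped]

# Made-up landmarks in the first image.
source = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (1.0, 5.0)]

# Simulate a second image: scaled 1.5x, rotated 30 degrees, shifted (10, -4).
a_true = cmath.rect(1.5, math.radians(30))
b_true = complex(10, -4)
target = apply_similarity(a_true, b_true, source)

a, b = fit_similarity(source, target)
scale = abs(a)
angle_deg = math.degrees(cmath.phase(a))
print(f"scale={scale:.3f}, rotation={angle_deg:.1f} deg, shift=({b.real:.2f}, {b.imag:.2f})")
```

Because the fit uses only a handful of corresponding points, it is cheap to compute and easy to inspect, which reflects the role of landmarks as an interpretable intermediate representation.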

The modern deep learning era significantly advanced landmark detection accuracy and speed, making real-time inference on mobile devices practical. Large-scale annotated datasets and benchmark challenges — such as those focused on facial alignment — drove rapid progress after 2012, cementing landmarks as a foundational tool in applied computer vision pipelines.

Related

Object Detection

A computer vision task that identifies and localizes multiple objects within images.

Generality: 838
Image Recognition

AI systems that identify and categorize objects, scenes, and content within images.

Generality: 871
Benchmark

A standardized test used to measure and compare AI model performance.

Generality: 796
Segmentation

Dividing images or data into meaningful regions to simplify analysis and recognition tasks.

Generality: 796
AlexNet

Landmark deep convolutional network that ignited the modern deep learning revolution in 2012.

Generality: 703
Semantic Segmentation

Classifying every pixel in an image into a meaningful object category.

Generality: 794