
Envisioning is an emerging technology research institute and advisory.



GIGO (Garbage In, Garbage Out)

Poor-quality input data inevitably produces poor-quality model outputs.

Year: 1957 · Generality: 794

GIGO — short for "Garbage In, Garbage Out" — is a foundational principle stating that the quality of a system's output is fundamentally bounded by the quality of its input. In machine learning, this means that no matter how sophisticated an algorithm is, it cannot compensate for training data that is noisy, mislabeled, incomplete, or systematically biased. The model will faithfully learn whatever patterns exist in the data it receives — including the flawed ones — and reproduce those flaws in its predictions.

The mechanism behind GIGO is straightforward: supervised learning models optimize their parameters to fit the statistical structure of training data. If that data contains label errors, the model learns incorrect associations. If it reflects historical biases — such as underrepresentation of certain demographic groups — the model encodes those biases into its decision boundaries. Even subtle issues like measurement noise, inconsistent labeling conventions, or data collected under non-representative conditions can degrade generalization performance in ways that are difficult to diagnose after the fact.
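This mechanism can be demonstrated with a toy experiment (an illustrative sketch, not from the source: synthetic two-class data and a simple nearest-centroid classifier). Training the same model on clean labels and on systematically mislabeled labels, then scoring both against a correctly labeled test set, shows the accuracy cost of garbage input directly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Two well-separated Gaussian classes (hypothetical toy data).
X_train = np.vstack([rng.normal((0, 0), 1.0, size=(n, 2)),
                     rng.normal((3, 3), 1.0, size=(n, 2))])
y_clean = np.array([0] * n + [1] * n)

# Systematic "garbage": mislabel 60% of class 1 as class 0,
# mimicking a biased or sloppy annotation process.
y_noisy = y_clean.copy()
y_noisy[n:n + int(0.6 * n)] = 0

def fit_centroids(X, y):
    """Learn one centroid per class from (possibly noisy) labels."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(centroids, X):
    """Assign each point to the class with the nearest centroid."""
    classes = sorted(centroids)
    dists = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in classes])
    return np.array(classes)[dists.argmin(axis=0)]

# Held-out test set with correct labels.
X_test = np.vstack([rng.normal((0, 0), 1.0, size=(n, 2)),
                    rng.normal((3, 3), 1.0, size=(n, 2))])
y_test = np.array([0] * n + [1] * n)

clean_acc = (predict(fit_centroids(X_train, y_clean), X_test) == y_test).mean()
noisy_acc = (predict(fit_centroids(X_train, y_noisy), X_test) == y_test).mean()
```

The mislabeled class pulls the class-0 centroid toward class 1, shifting the decision boundary and misclassifying more genuine class-1 points — the algorithm is unchanged, but the corrupted labels alone degrade test accuracy.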

This principle drives the field of data-centric AI, which argues that improving dataset quality often yields larger performance gains than architectural innovations. Practitioners invest heavily in data pipelines that include deduplication, outlier detection, class balance correction, and human-in-the-loop annotation review. Techniques like data augmentation, curriculum learning, and robust loss functions can partially mitigate the effects of imperfect data, but they are no substitute for high-quality ground truth.
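Two of those pipeline steps — deduplication and outlier detection — can be sketched in a few lines (hypothetical data; the 3-MAD threshold is an illustrative judgment call, not a universal rule):

```python
import numpy as np

# Hypothetical raw dataset with an exact duplicate and a gross outlier.
raw = np.array([
    [1.0, 2.0],
    [1.0, 2.0],     # exact duplicate row
    [1.1, 1.9],
    [0.9, 2.1],
    [50.0, -40.0],  # likely measurement error
])

# 1. Deduplication: drop exact duplicate rows.
deduped = np.unique(raw, axis=0)

# 2. Outlier detection: drop rows more than 3 median-absolute-deviations
#    (MADs) from the per-column median — a robust alternative to z-scores.
med = np.median(deduped, axis=0)
mad = np.median(np.abs(deduped - med), axis=0)
keep = (np.abs(deduped - med) <= 3 * mad + 1e-9).all(axis=1)
clean = deduped[keep]
```

Here the duplicate and the outlier row are both removed before the data ever reaches a model; real pipelines layer many more such checks (label audits, class rebalancing, provenance tracking) on the same principle.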

GIGO is especially consequential in high-stakes domains such as medical diagnosis, criminal justice, and financial lending, where biased or erroneous training data can produce models that cause real-world harm at scale. Regulatory frameworks increasingly require organizations to document data provenance and quality assurance processes precisely because of this principle. Understanding GIGO is not merely a technical concern — it is an ethical imperative that shapes how responsible AI systems are designed, audited, and deployed.

Related

Information Gap

The shortfall between information available and information needed for accurate decisions.

Generality: 626
Noise

Unwanted variation in data or signals that degrades machine learning model performance.

Generality: 794
Ground Truth

Verified reference data used to train and evaluate machine learning models.

Generality: 838
Training Data

The labeled examples used to teach a machine learning model.

Generality: 920
Bias

Systematic errors in data or algorithms that produce unfair or skewed outcomes.

Generality: 854
In-Group Bias

AI systems unfairly favoring certain demographic groups due to biased training data.

Generality: 520