
Envisioning is an emerging technology research institute and advisory.


2011 — 2026


LoRA (Low-Rank Adaptation)

A parameter-efficient method for fine-tuning large pre-trained models using low-rank matrices.

Year: 2021 · Generality: 398

LoRA (Low-Rank Adaptation) is a technique for fine-tuning large pre-trained models—particularly transformers—without modifying their original weights directly. Instead of updating all parameters in a layer, LoRA injects pairs of small, trainable matrices into specific layers (typically the attention projections) whose product represents a low-rank approximation of the weight update. Because the rank of these matrices is kept much smaller than the full weight dimensions, the number of trainable parameters introduced is a tiny fraction of the original model size, often less than 1%. The original pre-trained weights remain frozen throughout training, preserving the knowledge encoded during pretraining.
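The "often less than 1%" claim follows from simple arithmetic on the matrix shapes. A quick check, using illustrative dimensions (d = k = 4096, roughly an attention projection in a ~7B-parameter transformer, and rank r = 8) that are assumptions for the sake of the example, not figures from this entry:

```python
# Trainable-parameter count for a LoRA bypass vs. full fine-tuning.
# Dimensions are illustrative: d = k = 4096, rank r = 8.
d, k, r = 4096, 4096, 8

full = d * k          # parameters updated by full fine-tuning of this layer
lora = r * (d + k)    # parameters in the low-rank pair B (d x r) and A (r x k)

print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
# ratio: 0.3906% -- well under 1% of the layer's parameters
```

The fraction shrinks further as the base dimensions grow, since the low-rank cost scales with d + k while the full layer scales with d × k.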

The mechanics rely on a straightforward decomposition: for a weight matrix W of shape d × k, LoRA adds a bypass path BA, where B is d × r and A is r × k, with rank r ≪ min(d, k). During training, only A and B are updated. At inference time, the product BA can be merged directly into W with a simple addition, meaning LoRA introduces zero additional latency compared to a fully fine-tuned model. This merge-at-inference property is a key practical advantage over other parameter-efficient methods that require persistent adapter modules.
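The decomposition and the merge-at-inference equivalence can be sketched in a few lines of NumPy. This is a minimal illustration under assumed toy dimensions; it omits the α/r scaling factor and the training loop, and initializes B to zero (as in the original formulation) so the bypass starts as a no-op:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 64, 32, 4          # illustrative sizes; rank r << min(d, k)

W = rng.normal(size=(d, k))  # frozen pre-trained weight, never updated
A = rng.normal(size=(r, k))  # LoRA matrix A, Gaussian-initialized
B = np.zeros((d, r))         # LoRA matrix B, zero-initialized so BA = 0 at start

# ...after training, B and A would hold the learned low-rank update;
# here we stand in with random values to demonstrate the merge.
B = rng.normal(size=(d, r))

x = rng.normal(size=(k,))

# Training-time forward pass: frozen path plus low-rank bypass.
h_bypass = W @ x + B @ (A @ x)

# Inference-time merge: fold BA into W once; no extra latency per call.
W_merged = W + B @ A
h_merged = W_merged @ x

assert np.allclose(h_bypass, h_merged)
```

The assertion holds by distributivity: (W + BA)x = Wx + BAx, which is exactly why the merged model matches the adapted one with zero added inference cost.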

LoRA matters because fine-tuning frontier-scale language models—which can have tens or hundreds of billions of parameters—is prohibitively expensive for most practitioners. LoRA reduces GPU memory requirements dramatically, enabling adaptation on consumer-grade hardware and making customization of powerful models broadly accessible. It also simplifies multi-task deployment: a single base model can be paired with multiple lightweight LoRA adapters, each specialized for a different task, without storing redundant copies of the full model weights.
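The multi-task deployment pattern amounts to storing one frozen base weight plus one small (B, A) pair per task, merging on demand. A minimal sketch with NumPy arrays standing in for real model weights; the task names and dimensions are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, r = 64, 32, 4

W_base = rng.normal(size=(d, k))   # one frozen base weight, stored once

# One lightweight (B, A) adapter pair per task; names are hypothetical.
adapters = {
    "summarize": (rng.normal(size=(d, r)), rng.normal(size=(r, k))),
    "translate": (rng.normal(size=(d, r)), rng.normal(size=(r, k))),
}

def task_weight(task: str) -> np.ndarray:
    """Merge a task's low-rank update into the shared base weight."""
    B, A = adapters[task]
    return W_base + B @ A

x = rng.normal(size=(k,))
outputs = {t: task_weight(t) @ x for t in adapters}

# Each adapter stores r*(d+k) numbers instead of a d*k full-weight copy.
```

Serving frameworks exploit the same idea by keeping the base weights resident and hot-swapping adapters per request, rather than loading a separate fully fine-tuned model per task.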

Since its introduction, LoRA has become a foundational tool in the LLM ecosystem, underpinning popular fine-tuning frameworks and spawning variants such as QLoRA (which combines quantization with LoRA for even greater memory efficiency) and DoRA (which decomposes weight updates into magnitude and direction components). Its combination of simplicity, efficiency, and strong empirical performance has made it the default starting point for practitioners adapting large models to specific domains or tasks.

Related

LAQ (Locally-Adaptive Quantization)
Quantization method that adjusts precision locally based on data characteristics for better efficiency.
Generality: 101

Adapter Layer
Small trainable modules inserted into pre-trained models to enable efficient task adaptation.
Generality: 384

DoLa (Decoding by Contrasting Layers)
A decoding method that reduces hallucinations by contrasting outputs across transformer layers.
Generality: 101

Adapter
Small trainable modules added to frozen pre-trained models for efficient task-specific fine-tuning.
Generality: 520

Self-Adaptive LLMs (Large Language Models)
LLMs that autonomously adjust their behavior at runtime without full retraining.
Generality: 511

LRM (Large Reasoning Models)
Large-scale neural systems explicitly optimized for multi-step, structured reasoning tasks.
Generality: 384