Envisioning is an emerging technology research institute and advisory.


CUDA (Compute Unified Device Architecture)

NVIDIA's parallel computing platform enabling GPUs to accelerate general-purpose and AI workloads.

Year: 2007 · Generality: 794

CUDA is a parallel computing platform and programming model developed by NVIDIA that allows developers to harness the massive parallelism of graphics processing units (GPUs) for general-purpose computation. Introduced in 2007, it provides extensions to standard languages like C, C++, and Fortran that let programmers write code targeting the GPU's thousands of smaller, specialized cores. Unlike a CPU, which optimizes for low-latency sequential execution across a handful of cores, a GPU is architected to execute thousands of lightweight threads simultaneously — making it exceptionally well-suited for the kind of dense matrix and tensor arithmetic that underlies modern machine learning.
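A minimal sketch of what those language extensions look like in CUDA C++ (the kernel name, sizes, and pointer names here are illustrative, not part of any particular API):

```cuda
// Kernels are ordinary C/C++ functions marked __global__; they are
// compiled for the GPU and launched from CPU code. Each of the many
// lightweight threads picks out its own element via built-in indices.
__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) data[i] *= factor;                   // bounds guard
}

// The <<<grid, block>>> launch syntax is the extension standard C++
// lacks: here 4 blocks of 256 threads cover 1024 elements in parallel.
// scale<<<4, 256>>>(device_ptr, 2.0f, 1024);
```

Compiled with NVIDIA's `nvcc`, the same source file mixes CPU and GPU code; the compiler splits the two at the `__global__` boundary.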

In practice, CUDA exposes a hierarchy of parallelism through concepts like threads, thread blocks, and grids, which map onto the GPU's physical streaming multiprocessors. Developers write kernels — functions that execute in parallel across many threads — and manage data movement between CPU memory (host) and GPU memory (device). NVIDIA's ecosystem around CUDA includes highly optimized libraries such as cuBLAS for linear algebra and cuDNN for deep neural network primitives, which deep learning frameworks like PyTorch and TensorFlow rely on under the hood. This layered ecosystem means most practitioners benefit from CUDA without writing low-level GPU code directly.
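The host/device workflow described above (allocate on the device, copy inputs over, launch a kernel across a grid of blocks, copy results back) can be sketched as follows. This is a hedged illustration with error checking omitted; it requires an NVIDIA GPU and the CUDA toolkit to run:

```cuda
#include <cstdio>
#include <vector>

// One thread per element: the grid of blocks maps the 1D problem
// onto the GPU's streaming multiprocessors.
__global__ void vec_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n);

    // Allocate device (GPU) memory.
    float *da, *db, *dc;
    cudaMalloc(&da, n * sizeof(float));
    cudaMalloc(&db, n * sizeof(float));
    cudaMalloc(&dc, n * sizeof(float));

    // Host -> device transfers.
    cudaMemcpy(da, a.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(db, b.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Size the grid so every element gets a thread.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vec_add<<<blocks, threads>>>(da, db, dc, n);

    // Device -> host transfer (implicitly synchronizes with the kernel).
    cudaMemcpy(c.data(), dc, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", c[0]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}
```

In day-to-day deep learning work this layer is hidden: frameworks like PyTorch issue equivalent allocations, transfers, and kernel launches internally, typically through cuBLAS and cuDNN rather than hand-written kernels.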

CUDA's impact on AI has been transformative. The ability to train deep neural networks orders of magnitude faster than on CPUs was a key enabler of the deep learning revolution that accelerated through the early 2010s. Landmark results — such as AlexNet's 2012 ImageNet victory — were made practical largely because CUDA allowed researchers to train large convolutional networks in days rather than months. Today, virtually all serious deep learning training and inference pipelines depend on CUDA-enabled hardware, and GPU availability has become a primary constraint in large-scale AI development.

Beyond deep learning, CUDA is widely used in scientific simulation, computational finance, medical imaging, and any domain requiring high-throughput numerical computation. Its dominance has also shaped the competitive landscape: alternative platforms like OpenCL and AMD's ROCm exist, but CUDA's maturity, library support, and tight hardware-software integration have made it the de facto standard for GPU-accelerated AI research and production.

Related

  • GPU (Graphics Processing Unit): Massively parallel processor that accelerates deep learning by handling thousands of simultaneous computations. (Generality: 871)
  • Accelerated Computing: Using specialized hardware to dramatically speed up AI and machine learning workloads. (Generality: 794)
  • Accelerator: Specialized hardware that speeds up AI training and inference beyond CPU capabilities. (Generality: 792)
  • TPU (Tensor Processing Unit): Google's custom chip designed to accelerate machine learning workloads at scale. (Generality: 550)
  • Accelerator Chip: Specialized hardware that dramatically speeds up AI training and inference workloads. (Generality: 781)
  • Compute: The processing power and hardware resources required to train and run AI models. (Generality: 875)