An optimization method that searches over symbolic programs instead of tuning neural network weights
Symbolic descent is a proposed optimization technique that replaces gradient descent's continuous parameter tuning with a discrete search through the space of symbolic programs. Where gradient descent fits a parametric curve to data by iteratively adjusting weights, symbolic descent seeks the simplest symbolic expression (a compact program) that explains the observed input-output relationship. The term was introduced by François Chollet's lab Ndea in the context of building alternatives to deep learning from first principles.
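The contrast can be made concrete with a small sketch. The following toy code is illustrative only, not taken from Ndea's work; the helpers `grad_step` and `explains` are names invented for the example. A gradient-descent step nudges continuous weights to reduce a loss, while the symbolic check asks whether a discrete candidate program reproduces the observations exactly.

```python
# A toy contrast, not code from the source: one gradient-descent step tunes the
# continuous weights of a parametric model, while the symbolic check asks
# whether a discrete candidate program explains the data exactly.

def grad_step(w, b, data, lr=0.01):
    """One gradient-descent step for the linear model y = w*x + b under squared loss."""
    n = len(data)
    dw = sum(2 * (w * x + b - y) * x for x, y in data) / n
    db = sum(2 * (w * x + b - y) for x, y in data) / n
    return w - lr * dw, b - lr * db

def explains(program, data):
    """A symbolic candidate either reproduces every observation or it does not."""
    return all(program(x) == y for x, y in data)

data = [(0, 1), (1, 3), (2, 5), (3, 7)]   # generated by y = 2x + 1
candidate = lambda x: 2 * x + 1           # a compact symbolic hypothesis
assert explains(candidate, data)          # accepted outright; no iterative tuning
```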
The core motivation is a fundamental limitation of parametric models: they approximate functions by fitting enormous numbers of parameters, producing representations that are large, opaque, and brittle outside their training distribution. Symbolic descent instead aims to find minimal-length programs that capture the underlying structure of data. Because these programs are discrete symbolic structures rather than continuous numerical parameters, gradients are undefined and traditional gradient-based optimization cannot be applied directly. The method requires new search algorithms that navigate discrete program spaces efficiently, a challenge closer to program synthesis than to conventional machine learning.
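One simple way to picture such a search, offered here as a hedged sketch rather than the actual algorithm, is exhaustive enumeration of expressions in order of increasing size over a toy arithmetic DSL, accepting the first program consistent with every example. The grammar, the `expressions` and `symbolic_search` helpers, and the size cap below are all assumptions introduced for illustration.

```python
# A hedged sketch of discrete program search, not Ndea's method: enumerate
# expressions over a toy DSL (the variable x, small integer constants, + and *)
# in order of increasing size, and return the first one consistent with all
# observed input-output examples, i.e. a minimal-length explanation.

def expressions(size):
    """Yield (callable, source) pairs for every DSL expression with exactly `size` leaves."""
    if size == 1:
        yield (lambda x: x), "x"
        for c in range(4):                       # constants 0..3
            yield (lambda x, c=c: c), str(c)
        return
    for left in range(1, size):
        for lf, ls in expressions(left):
            for rf, rs in expressions(size - left):
                yield (lambda x, lf=lf, rf=rf: lf(x) + rf(x)), f"({ls} + {rs})"
                yield (lambda x, lf=lf, rf=rf: lf(x) * rf(x)), f"({ls} * {rs})"

def symbolic_search(examples, max_size=7):
    """Return the source of a smallest expression that explains every example, or None."""
    for size in range(1, max_size + 1):
        for fn, src in expressions(size):
            if all(fn(x) == y for x, y in examples):
                return src
    return None

# Finds a size-3 program equivalent to 2*x + 1, e.g. "(x + (x + 1))".
print(symbolic_search([(0, 1), (1, 3), (2, 5)]))
```

Ordering candidates by size is what makes this a search for the simplest explanation; a practical system would replace brute-force enumeration with something far more guided, which is exactly the open algorithmic challenge the paragraph above describes.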
The practical implications are significant if the approach scales. Symbolic models that are orders of magnitude smaller than their neural counterparts would require far less data to learn, run far more efficiently at inference time, and generalize more robustly to novel inputs because they encode structural understanding rather than statistical correlation. They would also compose more naturally, since small symbolic modules can be chained and recombined in ways that massive parameter tensors cannot. The approach represents a direct response to evidence — most dramatically from benchmarks like ARC-AGI — that scaling parametric models alone does not produce the adaptive, exploratory reasoning characteristic of general intelligence.