An AI model that runs inference directly on local devices rather than in the cloud.
An edge model is a machine learning model optimized to perform inference on the device where data is collected—such as a smartphone, IoT sensor, security camera, or embedded microcontroller—rather than sending that data to a remote server or cloud platform. This approach is a direct response to the limitations of centralized inference: network latency, bandwidth costs, connectivity requirements, and privacy concerns all create friction when raw data must travel to a data center before a prediction can be returned. By running the model locally, edge deployment eliminates that round trip entirely.
Building a model for the edge involves significant engineering tradeoffs. Most edge hardware operates under strict constraints on memory, compute, and power consumption that make deploying a standard deep learning model impractical. Practitioners rely on techniques such as quantization (reducing numerical precision from 32-bit floats to 8-bit integers), pruning (removing redundant weights), and knowledge distillation (training a smaller student model to mimic a larger teacher) to shrink models without unacceptable accuracy loss. Frameworks like TensorFlow Lite, PyTorch Mobile, and ONNX Runtime are specifically designed to package and execute these compressed models on resource-constrained hardware.
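As a concrete illustration, the sketch below uses TensorFlow Lite's post-training quantization path to convert a model from 32-bit floats to 8-bit integers. The tiny Keras network, the synthetic calibration data, and the output filename are all placeholders; a real deployment would convert a trained model using representative samples from its actual input distribution.

```python
import numpy as np
import tensorflow as tf

# Toy stand-in for a trained network; in practice this is your own model.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(4),
])

def representative_dataset():
    # Calibration samples let the converter choose int8 scale/zero-point
    # values; real code would yield batches drawn from production data.
    for _ in range(100):
        yield [np.random.rand(1, 32).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]        # enable quantization
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8                    # quantize inputs too
converter.inference_output_type = tf.int8                   # ...and outputs

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:                  # hypothetical filename
    f.write(tflite_model)
```

Full-integer quantization like this typically shrinks the model to roughly a quarter of its float32 size and allows it to run on integer-only accelerators, at the cost of a small, usually tolerable, accuracy drop.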
The practical importance of edge models has grown substantially alongside the proliferation of connected devices. Applications include real-time keyword spotting in smart speakers, on-device face unlock, predictive maintenance on industrial equipment, and autonomous vehicle perception systems where millisecond latency is non-negotiable. In healthcare and finance, edge inference also addresses regulatory and ethical concerns by keeping sensitive data on-premises rather than transmitting it externally.
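To ground what on-device inference looks like in practice, here is a minimal sketch that loads the quantized file from the previous example with TensorFlow Lite's Python interpreter. The model path and input shape are assumptions carried over from that sketch; on an actual phone or microcontroller the equivalent C++, Java, or Swift runtime would play this role.

```python
import numpy as np
import tensorflow as tf

# Load the compressed model and run a single prediction locally,
# with no network round trip involved.
interpreter = tf.lite.Interpreter(model_path="model_int8.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# A fully quantized model expects int8 input: scale a float sample
# using the quantization parameters stored in the model.
scale, zero_point = input_details["quantization"]
sample = np.random.rand(1, 32).astype(np.float32)
quantized = np.round(sample / scale + zero_point).astype(np.int8)

interpreter.set_tensor(input_details["index"], quantized)
interpreter.invoke()
raw_output = interpreter.get_tensor(output_details["index"])

# Dequantize the int8 output back to floats for downstream use.
out_scale, out_zero = output_details["quantization"]
print((raw_output.astype(np.float32) - out_zero) * out_scale)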
Edge models represent a fundamental shift in how AI is deployed at scale. Rather than a single powerful model serving all users from the cloud, edge deployment distributes intelligence across millions of endpoints. This creates new challenges around model versioning, over-the-air updates, and monitoring model drift in the field—but it also unlocks use cases that are simply impossible when inference depends on a reliable internet connection.
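Drift monitoring in particular lends itself to a simple illustration. The sketch below computes a population stability index (PSI), one common heuristic, comparing prediction scores logged from devices in the field against the distribution observed at training time. The score distributions, the helper name, and the ~0.2 alert threshold are illustrative assumptions, not part of any particular framework.

```python
import numpy as np

def population_stability_index(baseline, live, bins=10):
    """Rough drift signal: compare the distribution of on-device
    prediction scores against the distribution seen at training time."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    base_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    live_pct = np.histogram(live, bins=edges)[0] / len(live)
    # Floor the proportions to avoid log(0) / division by zero.
    base_pct = np.clip(base_pct, 1e-6, None)
    live_pct = np.clip(live_pct, 1e-6, None)
    return float(np.sum((live_pct - base_pct) * np.log(live_pct / base_pct)))

# Hypothetical usage: scores captured at training time vs. scores
# reported back from deployed devices.
baseline_scores = np.random.beta(2, 5, size=10_000)
field_scores = np.random.beta(2.5, 5, size=1_000)
psi = population_stability_index(baseline_scores, field_scores)
print(f"PSI = {psi:.3f}")  # values above ~0.2 are often treated as drift
```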