latency

Inference latency measures the delay between input and output in machine learning model predictions, affecting real-time usability and system responsiveness.

[Inference latency](/en/terms/inference-latency) is a core performance metric: the end-to-end time a model takes to accept input data, run its computations, and return a prediction. It can be decomposed into queuing delay, compute time, and data transfer overhead. Factors that influence latency include model size and architecture, hardware accelerators, the software runtime, batch size, data preprocessing and postprocessing, network latency, and the serving stack. Techniques for reducing latency include model optimization (pruning, quantization, distillation), compiler and runtime optimizations (operator fusion, graph optimizations), and hardware acceleration (GPUs, TPUs, NPUs). Real-time and near-real-time applications (autonomous systems, trading, interactive assistants) demand tight latency budgets and careful measurement of tail latency (e.g., p95/p99).
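As a rough illustration of measuring tail latency, the sketch below times repeated calls to a model and reports p50, p95, and p99 with a nearest-rank percentile. The `fake_predict` function is a hypothetical stand-in for a real model call, and the helper names and warm-up count are assumptions made for this example, not part of any particular serving framework.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure_latency_ms(predict, inputs, warmup=3):
    """Time each call to predict() and return per-request latencies in milliseconds."""
    # Warm-up calls are discarded so one-time costs (caches, lazy init) do not skew results.
    for x in inputs[:warmup]:
        predict(x)
    samples = []
    for x in inputs:
        start = time.perf_counter()
        predict(x)
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def fake_predict(x):
    # Hypothetical stand-in for a real model inference call.
    return sum(i * i for i in range(1000 + x))


samples = measure_latency_ms(fake_predict, list(range(200)))
summary = {p: percentile(samples, p) for p in (50, 95, 99)}
for p, value in summary.items():
    print(f"p{p}: {value:.3f} ms")
```

Reporting percentiles rather than the mean matters because a small share of slow requests can dominate perceived responsiveness even when the average looks fine.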

Related terms: [network-latency](/terms/network-latency)

🧒 Explain Like I'm 5

Latency is like the delay when you call someone's name and wait for them to say 'Hello'. If they are right next to you, latency is low. If they are across a football field, the sound takes time to travel, so latency is higher.

🤓 Expert Deep Dive

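Network latency is the sum of several delays: processing delay (the time a router or switch takes to examine and forward a packet), queuing delay (time spent waiting in buffers), transmission delay (the time to push the packet's bits onto the link, i.e. packet size divided by link bandwidth), and propagation delay (the time the signal takes to travel through the medium, at roughly two-thirds the speed of light in optical fiber). Every kilometer of fiber optic cable adds about 0.005 ms (5 µs) of one-way propagation delay, or roughly 0.008 ms per mile.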
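The four components can be added up in a short calculation. This is a minimal sketch that assumes a signal speed of about 2 × 10^8 m/s in fiber (roughly two-thirds of the speed of light in vacuum); the function names and the example packet size and link rate are illustrative choices, not measured values.

```python
# Assumption: signal speed in optical fiber is roughly 2/3 of c.
FIBER_SPEED_M_PER_S = 2.0e8


def propagation_delay_ms(distance_km, speed_m_per_s=FIBER_SPEED_M_PER_S):
    """Time for the signal to travel the given distance, in milliseconds."""
    return distance_km * 1000.0 / speed_m_per_s * 1000.0


def transmission_delay_ms(packet_bytes, link_bps):
    """Time to push all bits of the packet onto the link, in milliseconds."""
    return packet_bytes * 8 / link_bps * 1000.0


def one_way_latency_ms(distance_km, packet_bytes, link_bps,
                       processing_ms=0.0, queuing_ms=0.0):
    """Sum of processing, queuing, transmission, and propagation delay."""
    return (processing_ms
            + queuing_ms
            + transmission_delay_ms(packet_bytes, link_bps)
            + propagation_delay_ms(distance_km))


# Example: a 1500-byte packet over 1000 km of fiber on a 1 Gbit/s link.
total = one_way_latency_ms(distance_km=1000, packet_bytes=1500, link_bps=1e9,
                           processing_ms=0.05, queuing_ms=0.2)
print(f"one-way latency: {total:.3f} ms")
```

Over long distances the propagation term dominates and cannot be reduced by faster hardware, which is why placing servers closer to users is a common way to cut latency.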
