A modular framework for how agents select, compose, and blend external tools and internal policies.
TUMIX, or Tool-Use Mixture, is a probabilistic framework that formalizes how an intelligent agent decides which tools, subroutines, or internal reasoning strategies to employ when solving a task. Rather than treating tool use as a hard, discrete choice, TUMIX represents the agent's decision process as a mixture over a set of tool-specific and intrinsic policy primitives. A learned or inferred gating mechanism assigns context-dependent weights to candidate tools — which may include external APIs, symbolic subroutines, retrieval systems, or neural policy modules — allowing the agent to flexibly blend, switch between, or sequentially compose capabilities depending on the demands of the situation.
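As a concrete, if simplified, illustration, the sketch below shows how a context-conditioned gate might assign weights over a small registry of tool primitives and either blend their outputs or commit to a single call. All names and interfaces here are hypothetical; this is a minimal sketch of the gating idea under those assumptions, not a reference TUMIX implementation.

```python
# Hypothetical sketch: a softmax gate assigns context-dependent weights
# to a small registry of tool "experts". Names and interfaces are
# illustrative only.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class GatedToolMixture:
    def __init__(self, tools, context_dim, rng=None):
        self.tools = tools                      # list of (name, callable) pairs
        rng = np.random.default_rng(0) if rng is None else rng
        # One linear scoring head per tool; in practice this gate would be learned.
        self.W = rng.normal(scale=0.1, size=(len(tools), context_dim))

    def gate(self, context):
        """Return mixture weights over tools given a context vector."""
        return softmax(self.W @ context)

    def act(self, context, task_input, hard=False):
        weights = self.gate(context)
        if hard:
            # Discrete tool selection: commit to the highest-weight tool.
            name, tool = self.tools[int(np.argmax(weights))]
            return name, tool(task_input)
        # Soft blending: weighted combination of (numeric) tool outputs.
        outputs = np.array([tool(task_input) for _, tool in self.tools])
        return "blend", float(weights @ outputs)

# Toy tools: an exact arithmetic routine and a coarse heuristic estimate.
tools = [
    ("calculator", lambda x: x * 1.17),
    ("heuristic",  lambda x: round(x * 1.17)),
]
mixture = GatedToolMixture(tools, context_dim=3)
print(mixture.act(context=np.array([0.2, -1.0, 0.5]), task_input=40.0))
```

In a full agent the gate would be a learned network conditioned on the task state, and the tools would wrap real APIs, retrieval systems, or reasoning subroutines rather than toy arithmetic.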
At its core, TUMIX frames tool use as a latent-variable problem closely related to mixture-of-experts architectures. Observations and task context condition a gating distribution over tool experts and internal reasoning kernels, and the agent's resulting action distribution is the weighted combination or sequential composition induced by that gating. This formulation supports principled training via expectation-maximization, variational inference, or direct gradient-based optimization of the gating network, and enables efficient credit assignment across tool calls. It also facilitates modular transfer: a calibrated tool expert learned in one context can be reused in new settings without relearning the underlying reasoning, making the framework particularly attractive for building generalizable, composable agents.
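Written in standard mixture-of-experts notation, a minimal version of this formulation (with illustrative symbols, not taken from a specific TUMIX presentation) is:

```latex
% x: task context; a: action or output; k indexes tool experts.
p(a \mid x) \;=\; \sum_{k=1}^{K} \pi_k(x)\, p_k(a \mid x),
\qquad
\pi_k(x) \;=\; \operatorname{softmax}\!\big(g_\theta(x)\big)_k .

% EM-style credit assignment: posterior "responsibility" of expert k
% for an observed outcome, used to weight expert and gate updates.
r_k(a, x) \;=\; \frac{\pi_k(x)\, p_k(a \mid x)}{\sum_{j=1}^{K} \pi_j(x)\, p_j(a \mid x)} .
```

Maximizing the marginal log-likelihood of observed outcomes, or a variational bound on it, trains both the gate and the experts, and the responsibilities r_k supply the per-tool credit assignment, whether used in an E-step or as weights on gradient updates.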
TUMIX has practical implications for both capability and safety. By disentangling tool-selection probabilities from downstream reasoning, the framework makes it easier to audit why and when a particular external capability was invoked, to impose constraints on high-risk tools, and to measure compositional generalization when tools are recombined in novel ways. Implementations typically pair differentiable gating mechanisms with tool interfaces that either support end-to-end gradient flow or are wrapped with learned surrogate estimators for non-differentiable calls.
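One common surrogate for non-differentiable tool calls is a score-function (REINFORCE-style) estimator applied to the gate, sketched below; the setup, names, and toy reward are hypothetical stand-ins for whatever black-box tools and feedback a real agent would use.

```python
# Hypothetical sketch: score-function surrogate gradient for the gate when
# a tool call is a black box and gradients cannot flow through its output.
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def score_function_update(W, context, tools, reward_fn, lr=0.1, baseline=0.0):
    """Sample a tool, call it as a black box, and update the gate with
    grad log pi(k|x) * (reward - baseline)."""
    probs = softmax(W @ context)
    k = rng.choice(len(tools), p=probs)
    reward = reward_fn(tools[k](context))        # non-differentiable call
    # Gradient of log softmax w.r.t. the logits: one_hot(k) - probs.
    dlogits = -probs
    dlogits[k] += 1.0
    W += lr * (reward - baseline) * np.outer(dlogits, context)
    return k, reward

# Toy setup: two "tools"; the reward favors whichever matches the context's sign.
tools = [lambda c: c.sum(), lambda c: -c.sum()]
reward_fn = lambda y: 1.0 if y > 0 else 0.0
W = np.zeros((2, 3))
for _ in range(200):
    score_function_update(W, rng.normal(size=3), tools, reward_fn)
print(softmax(W @ np.array([1.0, 1.0, 1.0])))    # learned gate for one context
```

A hard constraint on a high-risk tool fits the same structure: mask its logit with a large negative value before the softmax so the gate can never select it, while auditing only requires logging the sampled tool index and the gate probabilities at each call.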
The framework sits at the intersection of hierarchical reinforcement learning, mixture-of-experts modeling, and program synthesis, and gained traction alongside the rapid growth of tool-augmented large language models and modular agent architectures. As these systems increasingly rely on external tools to extend their capabilities, TUMIX provides a theoretically grounded vocabulary and design template for understanding and improving how agents orchestrate those tools.