An ensemble of expert models that jointly guide a student model's training.
A teacher committee is an ensemble of pre-trained or otherwise high-performing models that collectively supervise the training of a single student model. Rather than relying on a single teacher's guidance, this approach pools the knowledge of several specialized models, each of which may excel at different aspects of the task. The student learns by aggregating signals from all committee members, which can take the form of soft probability distributions, intermediate feature representations, or ensemble-averaged predictions. This collective supervision tends to produce richer training signals than any individual teacher could provide alone.
The mechanism typically involves each teacher generating predictions or embeddings for a given input, which are then combined—through averaging, weighted aggregation, or learned attention—before being used to compute a distillation loss against the student's outputs. Because committee members often disagree on ambiguous examples, the pooled targets encode that uncertainty rather than overconfident one-hot labels, encouraging the student to learn smoother, more generalizable decision boundaries. Some frameworks also select teachers dynamically, routing different inputs to the most relevant specialists within the committee.
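The averaging-based variant of this mechanism can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names, the uniform default weighting, and the temperature value are assumptions, and the committee target here is a simple weighted average of temperature-softened teacher distributions fed into a KL-divergence distillation loss.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def committee_distillation_loss(teacher_logits, student_logits,
                                weights=None, temperature=2.0):
    """KL(committee target || student), averaged over the batch.

    teacher_logits: array of shape (num_teachers, batch, classes)
    student_logits: array of shape (batch, classes)
    weights: optional per-teacher weights; uniform if None
    (illustrative signature, not from any specific library).
    """
    num_teachers = teacher_logits.shape[0]
    if weights is None:
        weights = np.full(num_teachers, 1.0 / num_teachers)
    # Pool the committee: weighted average of softened teacher distributions.
    teacher_probs = np.tensordot(weights,
                                 softmax(teacher_logits, temperature), axes=1)
    student_log_probs = np.log(softmax(student_logits, temperature) + 1e-12)
    kl = np.sum(teacher_probs * (np.log(teacher_probs + 1e-12)
                                 - student_log_probs), axis=-1)
    return kl.mean()
```

Replacing the uniform `weights` with learned, input-dependent coefficients would give the weighted-aggregation or attention-based variants mentioned above.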
Teacher committees are closely related to knowledge distillation and model compression, where the goal is to transfer the collective intelligence of large or numerous models into a compact student suitable for deployment. They also appear in semi-supervised and self-supervised learning pipelines, where committees generate pseudo-labels for unlabeled data with higher reliability than a single model would. The diversity among committee members acts as a regularizer, reducing the risk that the student overfits to idiosyncratic biases of any one teacher.
The practical value of teacher committees lies in their ability to improve student accuracy, calibration, and robustness without requiring additional labeled data or architectural changes to the student. Because serving a full model ensemble is computationally expensive, distilling its knowledge into a single efficient student via a teacher committee offers an attractive trade-off between performance and inference cost—making this technique widely used in production machine learning systems across vision, language, and speech domains.