Safety refusals calibrated for the specific modality (speech vs. text) rather than generic text-based refusals.
Modality-appropriate refusals are safety responses calibrated for the communication modality in which a request is made, particularly the distinction between text and speech. A refusal in a text interface can be long, precise, and heavily qualified: "I'm sorry, but I can't help with that request because it involves [specific policy category]." A refusal in a spoken interface must be colloquial, brief, and natural-sounding. It expresses the same policy boundary in the cadence and vocabulary of ordinary speech, without sounding robotic or stilted. The aim is safety responses that are both effective and natural across text, audio, and video channels.
The challenge is that safety responses written for text can sound jarring or over-cautious when rendered as speech. A text refusal might include extensive hedging that reads naturally on the page but sounds unnatural in spoken dialogue. It might also be longer than a spoken exchange can comfortably hold, producing an awkwardly long turn in a voice conversation. Conversely, a refusal that sounds natural in speech might be too brief or too casual for a text interface. Modality-appropriate refusals address this by training separate refusal behaviors for each modality: the refusal boundary (what is refused versus what is permitted) stays equally firm across modalities, while the way that boundary is expressed is adapted to each one.
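To make the split between boundary and expression concrete, here is a toy Python sketch. It assumes a hypothetical keyword-set policy (`DISALLOWED_TOPICS`) and fixed refusal templates; `decide`, `render_refusal`, and `Modality` are illustrative names, not an existing API. In the approach described here, both the boundary and the phrasing are learned model behaviors rather than lookup tables. The sketch only shows the structural idea: one shared decision, with wording that varies by modality.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Modality(Enum):
    TEXT = "text"
    SPEECH = "speech"


# Hypothetical policy for illustration only: topics declined in every
# modality. The boundary is defined once and shared by all renderers.
DISALLOWED_TOPICS = {"weapon synthesis", "credential theft"}


@dataclass
class Decision:
    refuse: bool
    category: Optional[str] = None


def decide(topic: str) -> Decision:
    """Modality-independent boundary: a topic gets the same verdict everywhere."""
    if topic in DISALLOWED_TOPICS:
        return Decision(refuse=True, category=topic)
    return Decision(refuse=False)


def render_refusal(decision: Decision, modality: Modality) -> str:
    """Modality-specific wording for a refusal that has already been decided."""
    if not decision.refuse:
        raise ValueError("render_refusal called for a permitted request")
    if modality is Modality.TEXT:
        # Text can afford precision and an explicit policy category.
        return (
            "I'm sorry, but I can't help with that request because it "
            f"involves {decision.category}."
        )
    # Speech: short and colloquial, no policy jargon, but just as firm.
    return "Sorry, that's not something I can help with."


if __name__ == "__main__":
    d = decide("credential theft")
    for m in Modality:
        print(m.value, "->", render_refusal(d, m))
```

Because `decide` never sees the modality, the two channels cannot disagree about *whether* to refuse; only `render_refusal` changes between them.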
The training approach described in interaction model research uses a text-to-speech model to generate refusal and over-refusal training data across a range of disallowed topics. Refusal data covers requests that should be declined; over-refusal data covers borderline requests that resemble them but should not be declined. The data is calibrated so that spoken refusals are naturally phrased but no less firm than their text counterparts. This lets the model learn appropriate prosody, pacing, and phrasing for refusals in the audio modality while keeping the same policy precision as text refusals.
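A minimal sketch of how such a dataset might be assembled follows. It assumes a hypothetical `synthesize` function standing in for the text-to-speech model (stubbed here so the code runs without any audio dependency), an illustrative `AudioExample` record, and a made-up spoken refusal string. It is not the procedure from the research mentioned above, whose details are not given here. Refusal examples pair disallowed requests with a short spoken refusal. Over-refusal examples pair borderline but permitted requests with a helpful answer, so that the model learns where the boundary lies and not just how to decline.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AudioExample:
    prompt_text: str
    prompt_audio: bytes   # synthesized user turn
    target_text: str      # what the model should say back
    target_audio: bytes   # synthesized model turn (would carry prosody/pacing)
    should_refuse: bool


def synthesize(text: str) -> bytes:
    """Hypothetical stand-in for a text-to-speech model.

    A real pipeline would call a TTS system here; this stub only encodes
    the text so the sketch runs without any audio dependency.
    """
    return text.encode("utf-8")


# Illustrative spoken refusal: brief and colloquial, but still a refusal.
SPOKEN_REFUSAL = "Sorry, I can't help with that one."


def build_dataset(
    disallowed: List[str],
    borderline: List[Tuple[str, str]],
) -> List[AudioExample]:
    examples: List[AudioExample] = []
    # Refusal data: requests inside the disallowed boundary.
    for prompt in disallowed:
        examples.append(AudioExample(
            prompt, synthesize(prompt),
            SPOKEN_REFUSAL, synthesize(SPOKEN_REFUSAL),
            should_refuse=True,
        ))
    # Over-refusal data: requests that resemble disallowed ones but are
    # permitted, paired with a helpful answer so the model learns not to refuse.
    for prompt, answer in borderline:
        examples.append(AudioExample(
            prompt, synthesize(prompt),
            answer, synthesize(answer),
            should_refuse=False,
        ))
    return examples
```

Keeping `should_refuse` as an explicit label makes it straightforward to check that a spoken dataset and an equivalent text dataset agree on every prompt, which is one possible way to verify that the spoken boundary is no less firm.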
The deeper principle is that safety calibration is not modality-neutral. A model that is well-calibrated for text interactions may be over- or under-calibrated for audio or visual interactions, because the expression of uncertainty, the social meaning of a refusal, and the user's expectations of appropriate responses differ across modalities. Modality-appropriate safety is an active area of research as AI systems move from text-only to multimodal interaction.