Processing multi-talker speech in a tone language: Dumplings interfere with sleep at a cocktail party.

TL;DR

A robust benefit of talker F0 separation is observed in Mandarin cocktail party listening, but the advantage is constrained by the lexical role of F0, with real-word tonal minimal pairs lowering recognition accuracy relative to baseline.

Key Findings

Target word recognition was substantially more accurate when target and masker talkers were of different sexes compared to the same sex.

  • Different-sex talker condition yielded 85% recognition accuracy.
  • Same-sex talker condition yielded 48% recognition accuracy.
  • This 37-percentage-point difference represents a robust benefit of talker F0 separation in a cocktail party scenario.
  • The finding demonstrates that tone language listeners leverage talker-F0 differences similarly to non-tone language listeners.

Real-word tonal minimal pairs interfered with target word recognition accuracy relative to baseline, whereas nonword tonal minimal pairs did not.

  • Real-word tonal minimal pairs produced an average 4% decrease in recognition accuracy relative to baseline.
  • Nonword tonal minimal pairs did not compromise recognition performance.
  • This pattern indicates that the effect of word-F0 on recognition was modulated by lexical status.
  • The interference effect suggests competition at the lexical level when tonal alternates form real words.

F0 serves a dual function in Mandarin that creates a constraint on the talker-separation benefit observed in cocktail party listening.

  • In Mandarin, F0 serves both to distinguish talker identity (via differences in vocal fundamental frequency between sexes) and to distinguish word meaning (lexical tone).
  • Tone language listeners exploit talker-F0 differences for source separation just as non-tone language listeners do.
  • However, when F0 cues also activate competing real-word lexical entries (tonal minimal pairs), recognition accuracy is reduced.
  • The dual role of F0 in tone languages creates a lexically driven cost that is absent in non-tone language cocktail party listening.

What This Means

This research investigated how Mandarin Chinese listeners identify a target talker's words when another talker is speaking at the same time, the classic 'cocktail party problem.' In Mandarin, the pitch of a voice (fundamental frequency, or F0) plays two roles at once. It helps listeners tell talkers apart (men typically speak at lower pitches than women), and it also changes word meaning: Mandarin is a tone language, so the same syllables spoken with different pitch patterns can be entirely different words (for example, 'sleep' versus 'dumplings').

The study found a large benefit when the target and masking talkers were of different sexes (85% accuracy) compared to the same sex (48% accuracy), showing that Mandarin listeners, like listeners of non-tonal languages, use pitch differences between talkers to 'tune in' to the right voice. However, when the target word had a tonal 'twin' (another real Mandarin word that sounds identical except for its tone), recognition accuracy dropped by about 4% compared to a baseline condition. Importantly, this interference occurred only when the tonal alternate was a real word, not when it was a nonsense word. This suggests that the interference arises at the word (lexical) level: when pitch cues point toward a competing real word, listeners are pulled in two directions at once.

Overall, tone language listeners appear to be just as skilled as listeners of non-tonal languages at using pitch to separate competing voices in noisy environments, but there is a hidden cost: the same pitch information that helps them focus on one talker can also activate unintended words. This tension between pitch's two roles, talker identity and word meaning, is a challenge specific to listeners of tone languages like Mandarin in real-world noisy communication settings.

Citation

Wang X, Lee C, Zhang Y, Wiener S. (2026). Processing multi-talker speech in a tone language: Dumplings interfere with sleep at a cocktail party. JASA Express Letters. https://doi.org/10.1121/10.0042461