IF-MDD: INDIRECT FUSION FOR PROMPT-FREE MISPRONUNCIATION DETECTION AND DIAGNOSIS

Authors: Haopeng Geng, Saito Daisuke, Nobuaki Minematsu

Institution: Graduate School of Engineering, The University of Tokyo

Abstract

IF-MDD Overview

Mispronunciation detection and diagnosis (MDD) plays a vital role in computer-assisted language learning (CALL). Although recent approaches have achieved promising performance by using canonical phonemes as auxiliary inputs, this reliance limits their applicability in spontaneous language learning scenarios, where no reference text is available. In this work, we propose IF-MDD, an indirect fusion model that integrates canonical phoneme and error-related information as privileged information during training while removing the need for text prompts at inference. Despite being trained on limited data, IF-MDD achieves competitive diagnostic performance, reaching an F1 score of 60.67% and an error diagnosis rate of 19.98% on the L2-ARCTIC corpus. Furthermore, experiments show that IF-MDD generalizes reliably to unseen speakers with diverse L1 backgrounds. These findings underscore the potential of IF-MDD as a scalable and practical solution for language learners.
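For readers unfamiliar with how the two reported figures are usually derived, the following is a minimal sketch based on the conventional hierarchical MDD evaluation, not on the authors' evaluation code. It assumes that detected mispronunciations (true rejects) split into correct diagnoses and error diagnoses, that F1 is computed over mispronunciation detection, and that the diagnosis error rate is the share of detected mispronunciations given the wrong phoneme label. The class `MddCounts`, the function `mdd_metrics`, and the counts in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MddCounts:
    """Phone-level outcome counts (hypothetical container, not from the paper)."""
    true_accept: int        # correct pronunciation, accepted
    false_reject: int       # correct pronunciation, flagged as an error
    false_accept: int       # mispronunciation, missed
    correct_diagnosis: int  # mispronunciation detected, right phoneme diagnosed
    error_diagnosis: int    # mispronunciation detected, wrong phoneme diagnosed


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def mdd_metrics(c: MddCounts) -> dict:
    """Compute detection precision, recall, F1 and diagnosis error rate."""
    # Every detected mispronunciation is either correctly or wrongly diagnosed.
    true_reject = c.correct_diagnosis + c.error_diagnosis
    precision = _ratio(true_reject, true_reject + c.false_reject)
    recall = _ratio(true_reject, true_reject + c.false_accept)
    f1 = _ratio(2 * precision * recall, precision + recall)
    diagnosis_error_rate = _ratio(c.error_diagnosis, true_reject)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "diagnosis_error_rate": diagnosis_error_rate,
    }
```

As a usage example with made-up counts, `mdd_metrics(MddCounts(true_accept=900, false_reject=40, false_accept=40, correct_diagnosis=48, error_diagnosis=12))` gives a precision and recall of 60/100 each and a diagnosis error rate of 12/60. Under this convention a higher F1 is better, while a lower diagnosis error rate is better.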