Haopeng Geng is currently pursuing a Ph.D. at The University of Tokyo. His research interests lie in addressing challenges in atypical speech, such as L2 speech, accented speech, and pathological speech, using deep learning and signal processing methods.

Before joining UTokyo, he worked as an AI software engineer at Laronix, an Australian start-up company that developed the world’s first pneumatic larynx. In this role, he was responsible for evaluating and improving the speech quality of products designed for the voice loss community. He earned his M.S. and B.S. degrees from Nagoya University and Dalian University of Technology (DLUT), respectively.

📝 Publications

Submitted to ICASSP 2026
  • IF-MDD: Indirect Fusion for Prompt-free Mispronunciation Detection and Diagnosis
    Haopeng Geng, Daisuke Saito, Nobuaki Minematsu.
    🎧 Demo, 💻 GitHub Repo
    • Prompt-free Mispronunciation Detection - Developed IF-MDD, an indirect fusion framework that leverages canonical phonemes only during training, enabling inference without text prompts.
    • Strong Diagnostic Performance - Achieved 60.67% F1 and 19.98% error diagnosis rate on L2-ARCTIC, showing competitive results even with limited training data.
    • Robust Generalization - Demonstrated reliable performance across unseen speakers from diverse L1 backgrounds, highlighting scalability for real-world CALL applications.
Interspeech 2025
  • A Perception-Based L2 Speech Intelligibility Indicator: Leveraging a Rater’s Shadowing and Sequence-to-sequence Voice Conversion
    Haopeng Geng, Daisuke Saito, Nobuaki Minematsu.
    🎧Demo
    • Customized Intelligibility Indicator – Proposed a novel metric leveraging native raters’ shadowing data, focusing on perceptual cues rather than purely native-like pronunciation.
    • Seq2Seq Voice Conversion Framework – Applied alignment and acoustic reconstruction modules to simulate how native listeners detect unintelligible segments.
    • Multi-Task Learning for Feedback – Jointly optimized speech reconstruction and disfluency detection, achieving closer alignment with native raters’ judgments than mainstream ASR and enabling more personalized CALL feedback.

📖 Education

  • 2024.04 - Present, Ph.D. student, The University of Tokyo, Japan.
  • 2020.04 - 2022.03, M.S., Nagoya University, Japan.
  • 2014.09 - 2019.06
    • B.S. in Computer Science and Technology, Dalian University of Technology, China.
    • B.A. in Japanese, Dalian University of Technology, China.

💬 Work Experience

  • 2022.07 - 2023.12, Technical Assistant, Nagoya University, Japan.
  • 2022.03 - 2024.10, AI Software Engineer, Laronix, Australia.

🎖 Honors and Awards

💻 Internships