Lei Xie

Professor · Director, Audio, Speech and Language Processing Lab (ASLP)


Lei Xie is a Professor at Northwestern Polytechnical University, where he leads the Audio, Speech and Language Processing Lab (ASLP@NPU). His research focuses on speech processing, conversational AI, and neural models for speech and language technologies, with work spanning speech enhancement, automatic speech recognition, and speech synthesis.

He is also committed to building open-source tools and data resources for the research community, including the widely used WeNet toolkit and the WenetSpeech open-data series.

Professor Xie has published over 400 papers, received more than 17,000 Google Scholar citations, and has an H-index of 62. His work has received multiple best paper awards, won international challenge championships, and has been translated into industrial applications. He currently serves as Vice Chairperson of ISCA SIG-CSLP and Senior Area Editor for IEEE/ACM TASLP and IEEE SPL.

Email: lxie@nwpu.edu.cn
Address: Room 207, School of Computer Science, Chang'an Campus, Northwestern Polytechnical University, Chang'an District, Xi'an 710129, China
Full Biography

Lei Xie is a Professor at the School of Computer Science, Northwestern Polytechnical University (NPU), where he leads the Audio, Speech and Language Processing Lab (ASLP@NPU). His research focuses on speech processing, conversational AI, advanced neural models for speech and language technologies, and large audio/speech language models, with contributions spanning speech enhancement, automatic speech recognition, speech synthesis, and spoken dialogue systems.

He is also committed to advancing open-source research infrastructure for the community, leading projects such as the widely used WeNet speech recognition toolkit and the WenetSpeech open-data series.

Dr. Xie received his Ph.D. in Computer Engineering from NPU, where his doctoral research focused on speech recognition. Before joining NPU as a faculty member, he held research positions at Vrije Universiteit Brussel, City University of Hong Kong, and The Chinese University of Hong Kong.

He has received several honors and recognitions, including the New Century Excellent Talents Program of the Ministry of Education of China, the Shaanxi Young Science and Technology Star Award, recognition as one of the World’s Top 2% Scientists (Stanford University & Elsevier), and the title of Huawei Cloud AI Distinguished Teacher.

Professor Xie has published over 400 peer-reviewed papers in audio, speech, and language processing, with more than 17,000 citations on Google Scholar and an H-index of 62. His work has received multiple best paper awards at international conferences and won several international challenge championships. A number of his research outcomes have also been successfully translated into real-world industrial applications.

At ASLP@NPU, he mentors a diverse group of students and researchers working at the intersection of speech, audio, and language intelligence. He is also an active contributor to the research community, serving in leadership and editorial roles. He currently serves as Vice Chairperson of the ISCA Special Interest Group on Chinese Spoken Language Processing (SIG-CSLP) and as Senior Area Editor for both IEEE/ACM Transactions on Audio, Speech, and Language Processing and IEEE Signal Processing Letters.


News

Apr 10, 2026 The 2026 master’s cohort graduated successfully and joined top companies such as Alibaba, Tencent, and JD.com. Congratulations!
Apr 07, 2026 WenetSpeech-Wu, the largest Wu Chinese dataset to date, accepted by ACL 2026.
Apr 07, 2026 LLM-forced Aligner, the technology behind Qwen3-Qwen/Qwen3-ForcedAligner, accepted by ACL 2026.
Mar 17, 2026 4 papers accepted by ICME 2026.
Jan 18, 2026 8 papers accepted by ICASSP 2026.
Jan 08, 2026 VoiceSculptor, a voice design model, now open-sourced.

Lab

The Audio, Speech and Language Processing Lab (ASLP@NPU), led by Prof. Lei Xie at Northwestern Polytechnical University, is widely recognized as one of the leading research groups in speech, audio, and language technologies. The lab conducts cutting-edge research spanning speech recognition, speech synthesis, speech enhancement, spoken dialogue systems, and emerging audio language models, with a strong commitment to both scientific innovation and real-world impact.

ASLP@NPU places equal emphasis on research excellence and practical deployment, and has maintained close and long-term collaborations with industry. Many of its research outcomes have been successfully translated into real applications, while its open-source platforms and data resources — including WeNet and WenetSpeech — have been widely adopted by both academia and industry.

The lab has also played an important role in cultivating talent for the broader AI and speech community, with many alumni becoming technical leaders, senior researchers, and key engineering contributors in leading technology companies and research institutions.

By combining academic depth, engineering strength, and industrial relevance, ASLP@NPU continues to advance the frontier of speech intelligence and next-generation human–machine communication.

Recent Popular Open-source Projects
  • SoulX-Podcast — Inference codebase for generating high-fidelity podcasts from text with multi-speaker multi-dialect support
  • DiffRhythm — End-to-end full-length song generation via latent diffusion
  • OSUM — Open speech understanding model developed with limited academic resources
  • SongEval — Aesthetic evaluation toolkit for generated songs
  • WenetSpeech-Yue — Large-scale Cantonese speech corpus with multi-dimensional annotation
  • MeanVC — Lightweight and streaming zero-shot voice conversion via mean flows
  • VoiceSculptor — Instruct text-to-speech solution based on LLaSA and CosyVoice2
  • WenetSpeech-Chuan — Large-scale Sichuanese dialect speech corpus
  • DiffRhythm2 — Efficient high-fidelity song generation via block flow matching
  • WenetSpeech-Wu-Repo — Large-scale Wu dialect speech corpus with multi-dimensional annotation

Recent Publications

  1. ICASSP
    Summary on The Multilingual Conversational Speech Language Model Challenge: Datasets, Tasks, Baselines, and Methods
    Bingshen Mu, Pengcheng Guo, Zhaokai Sun, Shuai Wang, Hexin Liu, Mingchen Shao, and 5 more authors
    In ICASSP, 2026
  2. ICASSP
    WenetSpeech-Chuan: A Large-Scale Sichuanese Corpus with Rich Annotation for Dialectal Speech Processing
    Yuhang Dai, Ziyu Zhang, Shuai Wang, Longhao Li, Zhao Guo, Tianlun Zuo, and 10 more authors
    In ICASSP, 2026
  3. ICASSP
    Towards Building Speech Large Language Models for Multitask Understanding in Low-Resource Languages
    Mingchen Shao, Bingshen Mu, Chengyou Wang, Hai Li, Ying Yan, Zhonghua Fu, and 1 more author
    In ICASSP, 2026
  4. ICASSP
    MeanVC: Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows
    Guobin Ma, Jixun Yao, Ziqian Ning, Yuepeng Jiang, Lingxin Xiong, Lei Xie, and 1 more author
    In ICASSP, 2026
  5. ICASSP
    S²Voice: Style-Aware Autoregressive Modeling with Enhanced Conditioning for Singing Style Conversion
    Ziqian Wang, Xianjun Xia, Chuanzeng Huang, and Lei Xie
    In ICASSP, 2026
  6. ICASSP
    The ICASSP 2026 Automatic Song Aesthetics Evaluation Challenge
    Guobin Ma, Yuxuan Xia, Jixun Yao, Huixin Xue, Hexin Liu, Shuai Wang, and 2 more authors
    In ICASSP, 2026
  7. ICASSP
    The ICASSP 2026 HumDial Challenge: Benchmarking Human-like Spoken Dialogue Systems in the LLM Era
    Zhixian Zhao, Shuiyuan Wang, Guojian Li, Hongfei Xue, Chengyou Wang, Shuai Wang, and 10 more authors
    In ICASSP, 2026
  8. ICASSP
    Easy Turn: Integrating Acoustic and Linguistic Modalities for Robust Turn-Taking in Full-Duplex Spoken Dialogue Systems
    Guojian Li, Chengyou Wang, Hongfei Xue, Shuiyuan Wang, Dehui Gao, Zihan Zhang, and 5 more authors
    In ICASSP, 2026
  9. ASRU
    DiffRhythm+: Controllable and Flexible Full-Length Song Generation with Preference Optimization
    Huakang Chen, Yuepeng Jiang, Guobin Ma, Chunbo Hao, Shuai Wang, Jixun Yao, and 4 more authors
    In ASRU, 2025
  10. AAAI
    Drop the beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation
    Ziqian Ning, Shuai Wang, Yuepeng Jiang, Jixun Yao, Lei He, Shifeng Pan, and 2 more authors
    In AAAI, 2025
  11. AAAI
    StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching
    Jixun Yao, Yang Yuguang, Yu Pan, Ziqian Ning, Jianhao Ye, Hongbin Zhou, and 1 more author
    In AAAI, 2025
  12. ICASSP
    ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training
    Xinfa Zhu, Lei He, Yujia Xiao, Xi Wang, Xu Tan, Sheng Zhao, and 1 more author
    In ICASSP, 2025
  13. ICASSP
    CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition
    He Wang, Xucheng Wan, Naijun Zheng, Kai Liu, Huan Zhou, Guojian Li, and 1 more author
    In ICASSP, 2025
  14. ICASSP
    HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models
    Bingshen Mu, Kun Wei, Qijie Shao, Yong Xu, and Lei Xie
    In ICASSP, 2025
  15. ICASSP
    DiffAttack: Diffusion-based Timbre-reserved Adversarial Attack in Speaker Identification
    Qing Wang, Jixun Yao, Zhaokai Sun, Pengcheng Guo, Lei Xie, and John H.L. Hansen
    In ICASSP, 2025
  16. ICLR
    GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling
    Jixun Yao, Hexin Liu, Chen Chen, Yuchen Hu, EngSiong Chng, and Lei Xie
    In ICLR, 2025
  17. Interspeech
    EASY: Emotion-aware Speaker Anonymization via Factorized Distillation
    Jixun Yao, Hexin Liu, Eng Siong Chng, and Lei Xie
    In Interspeech, 2025
  18. Interspeech
    Leveraging LLM and Self-Supervised Training Models for Speech Recognition in Chinese Dialects: A Comparative Analysis
    Tianyi Xu, Hongjie Chen, Wang Qing, Lv Hang, Jian Kang, Li Jie, and 3 more authors
    In Interspeech, 2025
  19. Interspeech
    Selective Invocation for Multilingual ASR: A Cost-effective Approach Adapting to Speech Recognition Difficulty
    Hongfei Xue, Yufeng Tang, Jun Zhang, Xuelong Geng, and Lei Xie
    In Interspeech, 2025
  20. Interspeech
    AISHELL-5: The First Open-Source In-Car Multi-Channel Multi-Speaker Speech Dataset for Automatic Speech Diarization and Recognition
    Yuhang Dai, He Wang, Xingchen Li, Zihan Zhang, Shuiyuan Wang, Lei Xie, and 5 more authors
    In Interspeech, 2025
Full Publications →

Awards

  • 1st Place, ICASSP 2026 Automatic Song Aesthetics Evaluation Challenge
  • 3rd Place, Single Track, Interspeech 2026 Audio Reasoning Challenge
  • 1st Place, In-Domain Singing Style Conversion Track, ASRU 2025 Singing Voice Conversion Challenge
  • 1st Place, Zero-Shot Singing Style Conversion Track, ASRU 2025 Singing Voice Conversion Challenge
  • 1st Place, General Audio Source Separation Track, NCMMSC 2025 CCF Advanced Audio Technology Competition
  • 2nd Place, Target Speaker Lipreading Track, ICME 2024 Chat-scenario Chinese Lipreading (ChatCLR) Challenge
  • 1st Place, Source Speaker Verification Against Voice Conversion Track, SLT 2024 Source Speaker Tracing Challenge (SSTC)
  • 1st Place, ICASSP 2024 Packet Loss Concealment (PLC) Challenge
  • 2nd Place, Real-time Track, ICASSP 2024 Speech Signal Improvement Challenge
  • 3rd Place, Non-real-time Track, ICASSP 2024 Speech Signal Improvement Challenge
  • 2nd Place, ICASSP 2024 Multimodal Information based Speech Processing (MISP) Challenge
  • 1st Place, 2024 Shenghua Cup Acoustic Technology Competition
  • 1st Place, Single-Speaker VSR Track, NCMMSC 2024 Chinese Continuous Visual Speech Recognition Challenge (CNVSRC)
  • 1st Place, Multi-Speaker VSR Track, NCMMSC 2024 Chinese Continuous Visual Speech Recognition Challenge (CNVSRC)
  • 1st Place, SLT 2024 Low-Resource Dysarthria Wake-Up Word Spotting Challenge (LRDWWS Challenge)
  • 1st Place, Speech-to-Speech Translation (Offline) Track, ACL 2023 Speech-to-Speech Translation (S2ST)
  • 1st Place, Any-to-one, In-domain Singing Voice Conversion Track, ASRU 2023 Singing Voice Conversion Challenge
  • 2nd Place, Any-to-one, Cross-domain Singing Voice Conversion Track, ASRU 2023 Singing Voice Conversion Challenge
  • 2nd Place, Audio-Visual Target Speaker Extraction (AVTSE) Track, ICASSP 2023 Multimodal Information based Speech Processing (MISP) Challenge
  • 1st Place, UDASE (Unsupervised Domain Adaptation for Speech Enhancement) Track, Interspeech 2023 CHiME Speech Separation and Recognition Challenge (CHiME-7)
  • 1st Place, Non-personalized AEC Track, ICASSP 2023 Acoustic Echo Cancellation Challenge (AEC Challenge)
  • 2nd Place, Personalized AEC Track, ICASSP 2023 Acoustic Echo Cancellation Challenge (AEC Challenge)
  • 2nd Place, Audio-Visual Diarization & Recognition Track, ICASSP 2023 Multimodal Information based Speech Processing (MISP) Challenge
  • 3rd Place, Audio-Visual Speaker Diarization Track, ICASSP 2023 Multimodal Information based Speech Processing (MISP) Challenge
  • 1st Place, Headset Speech Enhancement Track, ICASSP 2023 Deep Noise Suppression Challenge
  • 1st Place, Speakerphone Speech Enhancement Track, ICASSP 2023 Deep Noise Suppression Challenge
  • 1st Place, Speech Enhancement Track, 2023 Shenghua Cup Acoustic Technology Competition
  • 1st Place, ASRU 2023 Multilingual Speech Universal PERformance Benchmark (ML-SUPERB) Challenge
  • 1st Place, Single-Speaker VSR Track, NCMMSC 2023 Chinese Continuous Visual Speech Recognition Challenge (CNVSRC)
  • 1st Place, Multi-Speaker VSR Track, NCMMSC 2023 Chinese Continuous Visual Speech Recognition Challenge (CNVSRC)
  • 1st Place, Speaker Anonymization Track, Interspeech 2022 VoicePrivacy 2022 Challenge (VPC 2022)
  • 2nd Place, Fully-supervised Track, Interspeech 2022 Far-field Speaker Verification Challenge (FFSVC)
  • 2nd Place, Semi-supervised Track, Interspeech 2022 Far-field Speaker Verification Challenge (FFSVC)
  • 2nd Place, ISCSLP 2022 Magichub Code-Switching ASR Challenge
  • 3rd Place, ISCSLP 2022 Conversational Short-phrase Speaker Diarization Challenge
  • 1st Place, Constrained Track, O-COCOSDA 2022 Indic Multilingual Speaker Verification Challenge (I-MSV)
  • 3rd Place, Unconstrained Track, O-COCOSDA 2022 Indic Multilingual Speaker Verification Challenge (I-MSV)
  • 3rd Place, NCMMSC 2022 Low-resource Mongolian Text-to-Speech Challenge
  • 2nd Place, Training with VoxCeleb 1/2 Only Track, 2021 VoxCeleb Speaker Recognition Challenge (VoxSRC 2021)
  • 2nd Place, Additional Public Data Allowed (e.g., MUSAN, RIR) Track, 2021 VoxCeleb Speaker Recognition Challenge (VoxSRC 2021)
  • 3rd Place, Real-Time Wideband Speech Enhancement Track, Interspeech 2021 Deep Noise Suppression Challenge (DNS Challenge)
  • 3rd Place, Real-Time AEC & Speech Enhancement Track, Interspeech 2021 Acoustic Echo Cancellation Challenge (AEC Challenge)
  • 1st Place, Close-talking Single-channel Track, ISCSLP 2021 Personalized Voice Trigger Challenge (PVTC)
  • 1st Place, Real-Time Wideband Speech Enhancement Track, Interspeech 2020 Deep Noise Suppression Challenge (DNS Challenge)
  • 2nd Place, Non-Real-Time Wideband Speech Enhancement Track, Interspeech 2020 Deep Noise Suppression Challenge (DNS Challenge)
  • 1st Place, Closed-set Word-level Audio-Visual Speech Recognition Track, ICMI 2019 Mandarin Audio-Visual Speech Recognition Challenge
  • 3rd Place, Interspeech 2018 CHiME Speech Separation and Recognition Challenge (CHiME-5)
  • 2nd Place, Unsupervised Subword Unit Modeling Track, Interspeech 2017 Zero Resource Speech Challenge
  • 1st Place, Spoken Term Discovery Track, Interspeech 2015 Zero Resource Speech Challenge
  • 1st Place, Query-by-Example Search on Speech (QUESST) Task, MediaEval 2015 Multimedia Benchmark Workshop
  • 2nd Place, Query-by-Example Search on Speech (QUESST) Task, MediaEval 2014 Multimedia Benchmark Workshop