Biography
Dr. Zhe Li is a Postdoctoral Fellow at The University of Hong Kong (HKU). His research focuses on speech large language models (Speech LLMs) and robust speaker representation learning, with broader interests in multimodal AI for healthcare applications. He received his Ph.D. degree from the Department of Electrical and Electronic Engineering at The Hong Kong Polytechnic University (PolyU). He was a research intern at Microsoft Research Asia (MSRA) and previously conducted international collaborative research as a visiting student scholar with the Department of Electrical Engineering, Stanford University. As a key contributor, he received the 2020 Excellent Science and Technology Achievement Award from the Chinese Association for Artificial Intelligence, and his co-authored paper received the Best Student Paper Runner-Up Award at PRICAI 2024.
“You are more than what you have become!”
News
2026
- Apr. 2026 Our paper “DB-SMGA: Dual-Branch Sequential Multi-Granularity Attention for Speech Depression Detection” has been accepted for publication in IEEE Signal Processing Letters (SPL). Congratulations to Dr. Meirong Song for her excellent work!
- Apr. 2026 Our paper “Uncertainty-Aware Multi-Head Multi-Mode Knowledge Distillation for Self-Supervised Speaker Verification” has been accepted by IEEE Transactions on Audio, Speech, and Language Processing (T-ASLP)! Thanks to Dr. Jin!
- Apr. 2026 Our tutorial “Speech Large Language Models for Under-Resourced Languages” has been accepted by Interspeech 2026. See you in Sydney, Australia, September 27–October 1!
- Mar. 2026 Our paper “Towards A Unified Perspective on Parameter-Efficient Fine Tuning for Speaker Verification” has been accepted by IEEE Transactions on Audio, Speech, and Language Processing (T-ASLP)! Thanks to Prof. Mak!
- Jan. 2026 Two papers accepted to ICASSP 2026. See you in Barcelona, Spain, on May 4–8, 2026!
2025
- Dec. 2025 My first tutorial! Our tutorial “Speech Large Language Models: Architectures, Efficient Adaptation, and Applications” has been accepted by IEEE ICME 2026. See you in Bangkok, Thailand, July 5–9, 2026!
- Sep. 29, 2025 Our paper “WhisMultiNet: Advancing End-to-End Speech Topic Classification with Whisper and MultiGateGNN” has been accepted by IEEE Transactions on Audio, Speech, and Language Processing (T-ASLP)! Thanks to Xiaozhe Qi!
- Sep. 4, 2025 Our paper “Disentangling Speech Representations Learning with Latent Diffusion for Speaker Verification” has been accepted by IEEE Transactions on Audio, Speech, and Language Processing (T-ASLP)! Thanks to Prof. Mak!
- Aug. 20, 2025 One paper accepted to EMNLP 2025. See you in Suzhou, China!
- Jun. 18, 2025 One paper accepted to MICCAI 2025. See you in Daejeon, South Korea!
- Jun. 14, 2025 Our paper “Mutual Information-Enhanced Contrastive Learning with Margin for Maximal Speaker Separability” has been accepted by IEEE Transactions on Audio, Speech, and Language Processing (T-ASLP). Thanks to Prof. Mak!
- May 19, 2025 Two papers accepted to Interspeech 2025. See you in Rotterdam, the Netherlands!
- Mar. 4, 2025 Paper Sharing Session: I gave a talk on “Spectral-Aware Low-Rank Adaptation for Speaker Verification” (ICASSP 2025).
- Feb. 11, 2025 Joined Microsoft Research Asia (MSRA) as a Research Intern, focusing on multimodal large models for healthcare.
2024
- Dec. 21, 2024 Four papers accepted to ICASSP 2025. See you in Hyderabad, India!
- Dec. 4, 2024 “Enhancing Multimodal Rumor Detection with Statistical Image Features and Modal Alignment via Contrastive Learning” received the Best Student Paper Runner-Up Award at PRICAI 2024.
- Jun. 17, 2024 Paper Sharing Session: Parameter-efficient Fine-tuning of Speaker-Aware Dynamic Prompts for Speaker Verification (Interspeech 2024).
- Apr. 3, 2024 Paper Sharing Session: Dual Parameter-Efficient Fine-Tuning for Speaker Representation via Speaker Prompt Tuning and Adapters (ICASSP 2024).
2023
- Dec. 8, 2023 Presented “Maximal Speaker Separability via Robust Speaker Representation Learning” at NCMMSC 2023, Soochow, China.
- Dec. 3, 2023 Presented “Maximal Speaker Separability via Contrastive Learning with Angular Margin and Class-Aware Attention for Hard Samples” at International Doctoral Forum 2023, Hong Kong SAR.
- May 15, 2023 Paper Sharing Session: Discriminative Speaker Representation via Contrastive Learning with Class-Aware Attention in Angular Space (ICASSP 2023).
2022–2020
- Jul. 1, 2022 Participant Talk: Gave a talk on speaker verification at the Odyssey-CNSRC Workshop 2022.
- May 29, 2021 Completed Master’s oral examination.
- Nov. 14, 2020 CAAI Award: Received the Excellent Scientific and Technological Achievements Award of the Chinese Association for Artificial Intelligence.
- Oct. 29, 2020 Video: Uploaded CCL 2020 oral presentation.
- Oct. 11, 2020 Video: Uploaded CCMT 2020 oral presentation.
Research Interests
- Speech Large Language Models (Speech LLMs): efficient fine-tuning, post-training alignment, and speech-based healthcare applications
- Speech Signal Processing: speaker representation learning, accent recognition, and robust speech modeling
- Multimodal and Deep Learning: multimodal representation learning and cross-modal fusion
Research Experience
Postdoctoral Fellow, The University of Hong Kong (HKU)
Supervised by Prof. Ricky Chan
Speech-Language Models · Post-training · Multimodal Learning
Research Intern, Microsoft Research Asia (MSRA)
Supervised by Dr. Shujie Liu
LLM Fine-tuning · Speech Reasoning Models · Multilingual Adaptation
Visiting PhD Researcher, Stanford University
Supervised by Prof. Mert Pilanci
Optimization Theory · Efficient Adaptation · Spectral Methods
PhD in Electrical and Electronic Engineering, The Hong Kong Polytechnic University (PolyU)
Supervised by Prof. Man-Wai Mak
Speaker Representation · Speaker Verification
MSc in Software Engineering, Xinjiang University
Supervised by Prof. Wushour Silamu, Academician of the Chinese Academy of Engineering
Low-resource NLP · Uyghur Language Modeling · Harmful Content Detection
