Dr. Zhe LI

👨‍🎓 Biography

Dr. Zhe Li is a Postdoctoral Fellow at The University of Hong Kong (HKU). His research focuses on speech large language models (Speech LLMs) and robust speaker representation learning, with broader interests in multimodal AI for healthcare applications. He received his Ph.D. degree from the Department of Electrical and Electronic Engineering at The Hong Kong Polytechnic University (PolyU). He was a research intern at Microsoft Research Asia (MSRA) and previously conducted international collaborative research as a visiting student scholar with the Department of Electrical Engineering, Stanford University. As a key contributor, he received the 2020 Excellent Science and Technology Achievement Award from the Chinese Association for Artificial Intelligence, and his co-authored paper received the Best Student Paper Runner-Up Award at PRICAI 2024.

“You are more than what you have become!”

📰 News

🏆 2026

Apr. 2026 🎉 Our paper “DB-SMGA: Dual-Branch Sequential Multi-Granularity Attention for Speech Depression Detection” has been accepted for publication in IEEE Signal Processing Letters (SPL). Congratulations to Dr. Meirong Song for her excellent work!
Apr. 2026 🎉 Our paper “Uncertainty-Aware Multi-Head Multi-Mode Knowledge Distillation for Self-Supervised Speaker Verification” has been accepted by IEEE Transactions on Audio, Speech, and Language Processing (T-ASLP)! Thanks to Dr. Jin!
Apr. 2026 🎉 Our tutorial Speech Large Language Models for Under-Resourced Languages has been accepted by Interspeech 2026 — see you in Sydney, Australia 🇦🇺, September 27–October 1!
Mar. 2026 🎉 Our paper “Towards A Unified Perspective on Parameter-Efficient Fine Tuning for Speaker Verification” has been accepted by IEEE Transactions on Audio, Speech, and Language Processing (T-ASLP)! Thanks to Prof. Mak!
Jan. 2026 🎉 Two papers accepted to ICASSP 2026 — see you on May 4–8, 2026, in Barcelona, Spain! 🇪🇸

🏆 2025

Dec. 2025 🎉 My First Tutorial! Our tutorial Speech Large Language Models: Architectures, Efficient Adaptation, and Applications has been accepted by IEEE ICME 2026 — see you in Bangkok, Thailand 🇹🇭, July 5–9, 2026!
Sep. 29, 2025 🎉 Our paper “WhisMultiNet: Advancing End-to-End Speech Topic Classification with Whisper and MultiGateGNN” has been accepted by IEEE Transactions on Audio, Speech, and Language Processing (T-ASLP)! Thanks to Xiaozhe Qi!
Sep. 4, 2025 🎉 Our paper “Disentangling Speech Representations Learning with Latent Diffusion for Speaker Verification” has been accepted by IEEE Transactions on Audio, Speech, and Language Processing (T-ASLP)! Thanks to Prof. Mak!
Aug. 20, 2025 🎉 One paper accepted to EMNLP 2025 — see you in Suzhou, China 🇨🇳!
Jun. 18, 2025 🎉 One paper accepted to MICCAI 2025 — see you in Daejeon, South Korea 🇰🇷!
Jun. 14, 2025 🎉 Our paper “Mutual Information-Enhanced Contrastive Learning with Margin for Maximal Speaker Separability” has been accepted by IEEE Transactions on Audio, Speech, and Language Processing (T-ASLP). Thanks to Prof. Mak!
May 19, 2025 🎉 Two papers accepted to Interspeech 2025 — see you in Rotterdam, Netherlands 🇳🇱!
Mar. 4, 2025 🧑🏻‍🏫 Paper Sharing Session: I gave a talk on Spectral-Aware Low-Rank Adaptation for Speaker Verification (ICASSP 2025).
Feb. 11, 2025 🧑🏻‍💻 Joined Microsoft Research Asia (MSRA) as a Research Intern, focusing on multimodal large models for healthcare.

🏆 2024

Dec. 21, 2024 🎉 Four papers accepted to ICASSP 2025 — see you in Hyderabad, India 🇮🇳!
Dec. 4, 2024 🏅 Enhancing Multimodal Rumor Detection with Statistical Image Features and Modal Alignment via Contrastive Learning received the Best Student Paper Runner-Up Award 🥈 at PRICAI 2024.
Jun. 17, 2024 🧑🏻‍🏫 Paper Sharing Session: Parameter-efficient Fine-tuning of Speaker-Aware Dynamic Prompts for Speaker Verification (Interspeech 2024).
Apr. 3, 2024 🧑🏻‍🏫 Paper Sharing Session: Dual Parameter-Efficient Fine-Tuning for Speaker Representation via Speaker Prompt Tuning and Adapters (ICASSP 2024).

🎤 2023

Dec. 8, 2023 Presented Maximal Speaker Separability via Robust Speaker Representation Learning at NCMMSC 2023, Soochow, China 🇨🇳.
Dec. 3, 2023 Presented Maximal Speaker Separability via Contrastive Learning with Angular Margin and Class-Aware Attention for Hard Samples at International Doctoral Forum 2023, Hong Kong SAR 🇭🇰.
May 15, 2023 Paper Sharing Session: Discriminative Speaker Representation via Contrastive Learning with Class-Aware Attention in Angular Space (ICASSP 2023).

📚 2022–2020

Jul. 1, 2022 Participant Talk: Spoke on speaker verification at the Odyssey-CNSRC Workshop 2022.
May 29, 2021 🎓 Completed Master’s oral examination.
Nov. 14, 2020 🏅 CAAI Award: Received the Excellent Scientific and Technological Achievements Award of the Chinese Association for Artificial Intelligence.
Oct. 29, 2020 Video: Uploaded CCL 2020 oral presentation.
Oct. 11, 2020 Video: Uploaded CCMT 2020 oral presentation.

🔬 Research Interests

🧠 Speech Large Language Models (Speech LLMs) – efficient fine-tuning, post-training alignment, and speech-based healthcare applications
🗣️ Speech Signal Processing – speaker representation learning, accent recognition, and robust speech modeling
🩺 Multimodal and Deep Learning – multimodal representation learning, cross-modal fusion

💼 Research Experience

🎓 Postdoctoral Fellow, The University of Hong Kong (HKU) 🇭🇰
Supervised by Prof. Ricky Chan
Speech-Language Models · Post-training · Multimodal Learning
💻 Research Intern, Microsoft Research Asia (MSRA) 🇭🇰
Supervised by Dr. Shujie Liu
LLM Fine-tuning · Speech Reasoning Models · Multilingual Adaptation
🧮 Visiting PhD Researcher, Stanford University 🇺🇸
Supervised by Prof. Mert Pilanci
Optimization Theory · Efficient Adaptation · Spectral Methods
🎓 PhD in Electrical and Electronic Engineering, The Hong Kong Polytechnic University (PolyU) 🇭🇰
Supervised by Prof. Man-Wai Mak
Speaker Representation · Speaker Verification
🎓 MSc in Software Engineering, Xinjiang University 🇨🇳
Supervised by Prof. Wushour Silamu, Academician of the Chinese Academy of Engineering
Low-resource NLP · Uyghur Language Modeling · Harmful Content Detection