Zhiyong Wang
I am a postdoc at The University of Edinburgh, where I am fortunate to work with Prof. Lukasz Szpruch, Prof. Fengxiang He, and Prof. Jun Wang (University College London). I obtained my Ph.D. degree in Computer Science and Engineering from The Chinese University of Hong Kong (CUHK) in 2025, where I was fortunate to be supervised by Prof. John C.S. Lui (ACM/IEEE Fellow). I was a visiting scholar at Cornell University from March to October 2024, where I was privileged to be advised by Prof. Wen Sun.
The primary goal of my research is to design provably efficient and practical algorithms for data-driven online sequential decision-making under uncertainty. Specifically, I am interested in reinforcement learning (RL), multi-armed bandits, and their applications (e.g., conversational recommendation systems, computer networks, and video analytics). Recently, I have also become interested in combining RL (including bandits) with generative AI (e.g., diffusion models and LLMs). If you share these interests and would like to explore a collaboration or simply have a discussion, feel free to contact me.
Email / CV / GoogleScholar / OpenReview / LinkedIn / Twitter / WeChat
|
Ph.D. Thesis
|
Towards More Efficient, Robust, Instance-adaptive, and Generalizable Sequential Decision Making [arXiv]
Zhiyong Wang.
|
Preprints & Workshop Publications (* denotes equal contribution)
|
Efficient Controllable Diffusion via Optimal Classifier Guidance [arXiv]
Owen Oertell*, Shikun Sun*, Yiding Chen*, Jin Peng Zhou, Zhiyong Wang, and Wen Sun.
|
Large Language Model-Enhanced Multi-Armed Bandits [arXiv]
Jiahang Sun*, Zhiyong Wang*, Runhan Yang*, Chenjun Xiao, John C.S. Lui, and Zhongxiang Dai.
Accepted in ICLR 2025 Workshop on Reasoning and Planning for Large Language Models
|
Meta-Prompt Optimization for LLM-Based Sequential Decision Making [arXiv]
Mingze Kong, Zhiyong Wang, Yao Shu, and Zhongxiang Dai.
Accepted in ICLR 2025 Workshop on Reasoning and Planning for Large Language Models
|
Federated Linear Dueling Bandits [arXiv]
Xuhan Huang, Yan Hu, Zhiyan Li, Zhiyong Wang, Benyou Wang, and Zhongxiang Dai.
|
Cascading Bandits Robust to Adversarial Corruptions [arXiv]
Jize Xie, Cheng Chen, Zhiyong Wang, and Shuai Li.
|
Publications (* denotes equal contribution, # denotes corresponding author)
|
Provable Zero-Shot Generalization in Offline Reinforcement Learning [Workshop Version] [arXiv]
Zhiyong Wang, Chen Yang, John C.S. Lui, and Dongruo Zhou.
Accepted in the Adaptive Learning in Complex Environments TTIC Workshop, 2024
Also accepted in ICML 2024 Workshop: Aligning Reinforcement Learning Experimentalists and Theorists
Presented at TTIC Summer Workshop 2024: Data-Driven Decision Processes: From Theory to Practice
Accepted in the Forty-Second International Conference on Machine Learning (ICML), 2025.
|
Online Clustering of Dueling Bandits [arXiv]
Zhiyong Wang, Jiahang Sun, Mingze Kong, Jize Xie, Qinghua Hu, John C.S. Lui, and Zhongxiang Dai.
Accepted in the Forty-Second International Conference on Machine Learning (ICML), 2025.
|
Federated In-Context Learning: Iterative Refinement for Improved Answer Quality [arXiv]
Ruhan Wang*, Zhiyong Wang*, Chengkai Huang*, Rui Wang, Tong Yu, Lina Yao, John C.S. Lui, and Dongruo Zhou.
Accepted in the Forty-Second International Conference on Machine Learning (ICML), 2025.
|
Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds [arXiv]
Zhiyong Wang, Dongruo Zhou, John C.S. Lui, and Wen Sun.
Selected as a course reference paper for CS 6789: Foundations of Reinforcement Learning at Cornell University.
Invited talks @ John Hopcroft Center of Shanghai Jiao Tong University (SJTU), May 2025 [slides]; and @ School of Informatics of The University of Edinburgh, August 2025.
Accepted in the Thirteenth International Conference on Learning Representations (ICLR), 2025.
|
Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [arXiv]
Zhiyong Wang, Jize Xie, Yi Chen, John C.S. Lui, and Dongruo Zhou.
Accepted in the Adaptive Learning in Complex Environments TTIC Workshop, 2024
Also accepted in ICML 2024 Workshop: Foundations of Reinforcement Learning and Control -- Connections and Perspectives
Presented at the 25th International Symposium on Mathematical Programming (ISMP), 2024.
Accepted in the 28th International Conference on Artificial Intelligence and Statistics (AISTATS), 2025.
|
Online Learning and Detecting Corrupted Users for Conversational Recommendation Systems
Xiangxiang Dai*, Zhiyong Wang*#, Jize Xie, Tong Yu, and John C.S. Lui.
Accepted in the IEEE Transactions on Knowledge and Data Engineering (TKDE), 2024 
|
Conversational Recommendation with Online Learning and Clustering on Misspecified Users
Xiangxiang Dai*, Zhiyong Wang*#, Jize Xie, Xutong Liu, and John C.S. Lui.
Accepted in the IEEE Transactions on Knowledge and Data Engineering (TKDE), 2024 
|
Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond [arXiv]
Xutong Liu, Siwei Wang, Jinhang Zuo, Han Zhong, Xuchuang Wang, Zhiyong Wang, Shuai Li, Mohammad Hajiesmaili, John C.S. Lui, and Wei Chen.
Accepted in the Forty-first International Conference on Machine Learning (ICML), 2024 
|
Quantifying the Merits of Network-Assist Online Learning in Optimizing Network Protocols
Xiangxiang Dai*, Zhiyong Wang*, Jiancheng Ye, and John C.S. Lui.
Accepted in the IEEE/ACM International Symposium on Quality of Service (IWQoS), 2024 
|
Online Optimal Service Caching for Multi-Access Edge Computing: A Constrained Multi-Armed Bandit Optimization Approach
Weibo Chu, Xiaoyan Zhang, Xinming Jia, John C.S. Lui, and Zhiyong Wang.
Accepted in Computer Networks, 2024
|
Federated Contextual Cascading Bandits with Asynchronous Communication and Heterogeneous Users [arXiv]
Hantao Yang, Xutong Liu, Zhiyong Wang, Hong Xie, John C.S. Lui, Defu Lian, and Enhong Chen.
Accepted in the AAAI Conference on Artificial Intelligence (AAAI), 2024  
|
Learning Context-Aware Probabilistic Maximum Coverage Bandits: A Variance-Adaptive Approach
Xutong Liu, Jinhang Zuo, Junkai Wang, Zhiyong Wang, Yuedong Xu, and John C.S. Lui.
Accepted in IEEE International Conference on Computer Communications (INFOCOM), 2024  
|
Online Clustering of Bandits with Misspecified User Models [arXiv] [OpenReview]
Zhiyong Wang, Jize Xie, Xutong Liu, Shuai Li, and John C.S. Lui.
Accepted in Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023  
|
Online Corrupted User Detection and Regret Minimization [arXiv] [OpenReview]
Zhiyong Wang, Jize Xie, Tong Yu, Shuai Li, and John C.S. Lui.
Accepted in Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023  
|
Adversarial Attacks on Online Learning to Rank with Click Feedback [arXiv] [OpenReview]
Jinhang Zuo, Zhiyao Zhang, Zhiyong Wang, Shuai Li, Mohammad Hajiesmaili, and Adam Wierman.
Accepted in Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023  
|
Efficient Explorative Key-term Selection Strategies for Conversational Contextual Bandits [arXiv]
Zhiyong Wang, Xutong Liu, Shuai Li, and John C.S. Lui.
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2023  
|
- TTIC Summer Workshop Travel Grant for Adaptive Learning in Complex Environments, TTIC, 2024
- Reaching Out Award, HKSAR Government, 2024
- Full Postgraduate Studentship, CUHK, 2021-2025
- National Scholarship, the Ministry of Education of China, 2018-2019
- National Scholarship, the Ministry of Education of China, 2017-2018
- Outstanding Graduates of Huazhong University of Science and Technology, HUST, 2021
- Outstanding Undergraduates in terms of Academic Performance, HUST, 2017-2021
- Scholarship for excellent academic performance, HUST, 2019-2020
- Merit Student, HUST, 2018-2019
- Merit Student, HUST, 2017-2018
- Scholarship for outstanding academic performance for Freshmen, HUST, 2018
|
- Guest Lecture at CS 6789: Foundations of Reinforcement Learning, Cornell University, Fall 2024
- CSCI2040: Introduction to Python, CUHK, Fall 2021, Fall 2022, Spring 2023, Fall 2023
- CSCI1510: Computer Principles and C Programming, CUHK, Spring 2022