Zhiyong Wang

I am a fourth-year (final-year) Computer Science Ph.D. candidate at The Chinese University of Hong Kong (CUHK), fortunate to be supervised by Prof. John C.S. Lui (ACM/IEEE Fellow). I was a visiting scholar at Cornell University from March to October 2024, where I was privileged to be advised by Prof. Wen Sun. I am also fortunate to collaborate closely with many other excellent researchers, including Prof. Dongruo Zhou from Indiana University, Prof. Shuai Li from Shanghai Jiao Tong University, Prof. Zhongxiang Dai from CUHK Shenzhen, and Dr. Tong Yu from Adobe Research.

The primary goal of my research is to design provably efficient and practical algorithms for data-driven online sequential decision-making under uncertainty. Specifically, I am interested in reinforcement learning (RL), multi-armed bandits, and their applications (e.g., (conversational) recommendation systems, computer networks, and video analytics). Recently, I have also become interested in combining RL (including bandits) with generative AI (e.g., diffusion models and LLMs). If you share these interests and would like to explore a collaboration or simply have a discussion, feel free to contact me.

Email / CV / Google Scholar / OpenReview / LinkedIn / Twitter / WeChat

Preprints & Workshop Publications (* denotes equal contribution)
Towards Zero-Shot Generalization in Offline Reinforcement Learning [OpenReview]
Zhiyong Wang, Chen Yang, John C.S. Lui, and Dongruo Zhou.
Accepted in the Adaptive Learning in Complex Environments TTIC Workshop, 2024
Also accepted in ICML 2024 Workshop: Aligning Reinforcement Learning Experimentalists and Theorists
Presented at TTIC Summer Workshop 2024: Data-Driven Decision Processes: From Theory to Practice
Federated In-Context Learning: Iterative Refinement for Improved Answer Quality
Ruhan Wang*, Zhiyong Wang*, Chengkai Huang*, Rui Wang, Tong Yu, Lina Yao, John C.S. Lui, and Dongruo Zhou.
Online Clustering of Dueling Bandits [arXiv]
Zhiyong Wang, Jiahang Sun, Mingze Kong, Jize Xie, Qinghua Hu, John C.S. Lui, and Zhongxiang Dai.
Large Language Model-Enhanced Multi-Armed Bandits [arXiv]
Jiahang Sun*, Zhiyong Wang*, Runhan Yang*, Chenjun Xiao, John C.S. Lui, and Zhongxiang Dai.
Accepted in ICLR 2025 Workshop on Reasoning and Planning for Large Language Models
Meta-Prompt Optimization for LLM-Based Sequential Decision Making [arXiv]
Mingze Kong, Zhiyong Wang, Yao Shu, and Zhongxiang Dai.
Accepted in ICLR 2025 Workshop on Reasoning and Planning for Large Language Models
Federated Linear Dueling Bandits [arXiv]
Xuhan Huang, Yan Hu, Zhiyan Li, Zhiyong Wang, Benyou Wang, and Zhongxiang Dai.
Cascading Bandits Robust to Adversarial Corruptions [arXiv]
Jize Xie, Cheng Chen, Zhiyong Wang, and Shuai Li.
Publications (* denotes equal contribution, # denotes corresponding author)
Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds [arXiv]
Zhiyong Wang, Dongruo Zhou, John C.S. Lui, and Wen Sun.
Selected as a course reference paper for CS 6789: Foundations of Reinforcement Learning at Cornell University.
Accepted in the Thirteenth International Conference on Learning Representations (ICLR), 2025.
Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [arXiv]
Zhiyong Wang, Jize Xie, Yi Chen, John C.S. Lui, and Dongruo Zhou.
Accepted in the Adaptive Learning in Complex Environments TTIC Workshop, 2024
Also accepted in ICML 2024 Workshop: Foundations of Reinforcement Learning and Control -- Connections and Perspectives
Presented at the 25th International Symposium on Mathematical Programming (ISMP), 2024.
Accepted in the 28th International Conference on Artificial Intelligence and Statistics (AISTATS), 2025.
Online Learning and Detecting Corrupted Users for Conversational Recommendation Systems
Xiangxiang Dai*, Zhiyong Wang*#, Jize Xie, Tong Yu, and John C.S. Lui.
Accepted in the IEEE Transactions on Knowledge and Data Engineering (TKDE), 2024 
Conversational Recommendation with Online Learning and Clustering on Misspecified Users
Xiangxiang Dai*, Zhiyong Wang*#, Jize Xie, Xutong Liu, and John C.S. Lui.
Accepted in the IEEE Transactions on Knowledge and Data Engineering (TKDE), 2024 
Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond [arXiv]
Xutong Liu, Siwei Wang, Jinhang Zuo, Han Zhong, Xuchuang Wang, Zhiyong Wang, Shuai Li, Mohammad Hajiesmaili, John C.S. Lui, and Wei Chen.
Accepted in the Forty-first International Conference on Machine Learning (ICML), 2024 
Quantifying the Merits of Network-Assist Online Learning in Optimizing Network Protocols
Xiangxiang Dai*, Zhiyong Wang*, Jiancheng Ye, and John C.S. Lui.
Accepted in the IEEE/ACM International Symposium on Quality of Service (IWQoS), 2024 
Online Optimal Service Caching for Multi-Access Edge Computing: A Constrained Multi-Armed Bandit Optimization Approach
Weibo Chu, Xiaoyan Zhang, Xinming Jia, John C.S. Lui, and Zhiyong Wang.
Accepted in Computer Networks, 2024.
Federated Contextual Cascading Bandits with Asynchronous Communication and Heterogeneous Users [arXiv]
Hantao Yang, Xutong Liu, Zhiyong Wang, Hong Xie, John C.S. Lui, Defu Lian, and Enhong Chen.
Accepted in the AAAI Conference on Artificial Intelligence (AAAI), 2024  
Learning Context-Aware Probabilistic Maximum Coverage Bandits: A Variance-Adaptive Approach
Xutong Liu, Jinhang Zuo, Junkai Wang, Zhiyong Wang, Yuedong Xu, and John C.S. Lui.
Accepted in IEEE International Conference on Computer Communications (INFOCOM), 2024  
Online Clustering of Bandits with Misspecified User Models [arXiv] [OpenReview]
Zhiyong Wang, Jize Xie, Xutong Liu, Shuai Li, and John C.S. Lui.
Accepted in Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023  
Online Corrupted User Detection and Regret Minimization [arXiv] [OpenReview]
Zhiyong Wang, Jize Xie, Tong Yu, Shuai Li, and John C.S. Lui.
Accepted in Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023  
Adversarial Attacks on Online Learning to Rank with Click Feedback [arXiv] [OpenReview]
Jinhang Zuo, Zhiyao Zhang, Zhiyong Wang, Shuai Li, Mohammad Hajiesmaili, and Adam Wierman.
Accepted in Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023  
Efficient Explorative Key-term Selection Strategies for Conversational Contextual Bandits [arXiv]
Zhiyong Wang, Xutong Liu, Shuai Li, and John C.S. Lui.
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2023  
Education
Working Experience
Awards
  • TTIC Summer Workshop Travel Grant for Adaptive Learning in Complex Environments, TTIC, 2024
  • Reaching Out Award, HKSAR Government, 2024
  • Full Postgraduate Studentship, CUHK, 2021-2025
  • National Scholarship, the Ministry of Education of China, 2018-2019
  • National Scholarship, the Ministry of Education of China, 2017-2018
  • Outstanding Graduate of Huazhong University of Science and Technology, HUST, 2021
  • Outstanding Undergraduate in terms of Academic Performance, HUST, 2017-2021
  • Scholarship for excellent academic performance, HUST, 2019-2020
  • Merit Student, HUST, 2018-2019
  • Merit Student, HUST, 2017-2018
  • Scholarship for outstanding academic performance for Freshmen, HUST, 2018
Teaching Experience
  • Guest Lecture at CS 6789: Foundations of Reinforcement Learning, Cornell University, Fall 2024
  • CSCI2040: Introduction to Python, CUHK, Fall 2021, Fall 2022, Spring 2023, Fall 2023
  • CSCI1510: Computer Principles and C Programming, CUHK, Spring 2022