Zhiyong Wang

I am a fourth-year (final-year) Computer Science Ph.D. candidate at The Chinese University of Hong Kong (CUHK), fortunate to be supervised by Prof. John C.S. Lui (ACM/IEEE Fellow). I was a visiting scholar at Cornell University from March to October 2024, where I was privileged to be advised by Prof. Wen Sun. I am also fortunate to collaborate closely with many other excellent researchers, including Prof. Dongruo Zhou from Indiana University, Prof. Shuai Li from Shanghai Jiao Tong University, Prof. Zhongxiang Dai from CUHK-Shenzhen, and Dr. Tong Yu from Adobe Research.

The primary goal of my research is to design provably efficient and practical algorithms for data-driven online sequential decision-making under uncertainty. Specifically, I am interested in reinforcement learning (RL), multi-armed bandits, and their applications (e.g., in recommendation systems including conversational ones, computer networks, and video analytics). Recently, I have also become interested in combining RL (including bandits) with generative AI (e.g., diffusion models and LLMs). If you share similar interests and would like to explore a collaboration or simply have a discussion, feel free to contact me.

Email / CV / GoogleScholar / OpenReview / LinkedIn / Twitter / WeChat

Ph.D. Thesis

Towards More Efficient, Robust, Instance-adaptive, and Generalizable Sequential Decision Making [arXiv]
Zhiyong Wang.

Preprints & Workshop Publications (* denotes equal contribution)

Efficient Controllable Diffusion via Optimal Classifier Guidance [arXiv]
Owen Oertell*, Shikun Sun*, Yiding Chen*, Jin Peng Zhou, Zhiyong Wang, and Wen Sun.

Large Language Model-Enhanced Multi-Armed Bandits [arXiv]
Jiahang Sun*, Zhiyong Wang*, Runhan Yang*, Chenjun Xiao, John C.S. Lui, and Zhongxiang Dai.
Accepted in the ICLR 2025 Workshop on Reasoning and Planning for Large Language Models

Meta-Prompt Optimization for LLM-Based Sequential Decision Making [arXiv]
Mingze Kong, Zhiyong Wang, Yao Shu, and Zhongxiang Dai.
Accepted in the ICLR 2025 Workshop on Reasoning and Planning for Large Language Models

Federated Linear Dueling Bandits [arXiv]
Xuhan Huang, Yan Hu, Zhiyan Li, Zhiyong Wang, Benyou Wang, and Zhongxiang Dai.

Cascading Bandits Robust to Adversarial Corruptions [arXiv]
Jize Xie, Cheng Chen, Zhiyong Wang, and Shuai Li.

Publications (* denotes equal contribution, # denotes corresponding author)

Provable Zero-Shot Generalization in Offline Reinforcement Learning [Workshop Version] [arXiv]
Zhiyong Wang, Chen Yang, John C.S. Lui, and Dongruo Zhou.
Accepted in the Adaptive Learning in Complex Environments TTIC Workshop, 2024
Also accepted in the ICML 2024 Workshop: Aligning Reinforcement Learning Experimentalists and Theorists
Presented at TTIC Summer Workshop 2024: Data-Driven Decision Processes: From Theory to Practice
Accepted in the Forty-second International Conference on Machine Learning (ICML), 2025.

Online Clustering of Dueling Bandits [arXiv]
Zhiyong Wang, Jiahang Sun, Mingze Kong, Jize Xie, Qinghua Hu, John C.S. Lui, and Zhongxiang Dai.
Accepted in the Forty-second International Conference on Machine Learning (ICML), 2025.

Federated In-Context Learning: Iterative Refinement for Improved Answer Quality [arXiv]
Ruhan Wang*, Zhiyong Wang*, Chengkai Huang*, Rui Wang, Tong Yu, Lina Yao, John C.S. Lui, and Dongruo Zhou.
Accepted in the Forty-second International Conference on Machine Learning (ICML), 2025.

Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds [arXiv]
Zhiyong Wang, Dongruo Zhou, John C.S. Lui, and Wen Sun.
Selected as a course reference paper for CS 6789: Foundations of Reinforcement Learning at Cornell University.
Invited to give a talk at the John Hopcroft Center of Shanghai Jiao Tong University (SJTU), May 2025. [slides]
Accepted in the Thirteenth International Conference on Learning Representations (ICLR), 2025.

Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [arXiv]
Zhiyong Wang, Jize Xie, Yi Chen, John C.S. Lui, and Dongruo Zhou.
Accepted in the Adaptive Learning in Complex Environments TTIC Workshop, 2024
Also accepted in the ICML 2024 Workshop: Foundations of Reinforcement Learning and Control -- Connections and Perspectives
Presented at the 25th International Symposium on Mathematical Programming (ISMP), 2024.
Accepted in the 28th International Conference on Artificial Intelligence and Statistics (AISTATS), 2025.

Online Learning and Detecting Corrupted Users for Conversational Recommendation Systems
Xiangxiang Dai*, Zhiyong Wang*#, Jize Xie, Tong Yu, and John C.S. Lui.
Accepted in the IEEE Transactions on Knowledge and Data Engineering (TKDE), 2024

Conversational Recommendation with Online Learning and Clustering on Misspecified Users
Xiangxiang Dai*, Zhiyong Wang*#, Jize Xie, Xutong Liu, and John C.S. Lui.
Accepted in the IEEE Transactions on Knowledge and Data Engineering (TKDE), 2024

Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond [arXiv]
Xutong Liu, Siwei Wang, Jinhang Zuo, Han Zhong, Xuchuang Wang, Zhiyong Wang, Shuai Li, Mohammad Hajiesmaili, John C.S. Lui, and Wei Chen.
Accepted in the Forty-first International Conference on Machine Learning (ICML), 2024

Quantifying the Merits of Network-Assist Online Learning in Optimizing Network Protocols
Xiangxiang Dai*, Zhiyong Wang*, Jiancheng Ye, and John C.S. Lui.
Accepted in the IEEE/ACM International Symposium on Quality of Service (IWQoS), 2024

Online Optimal Service Caching for Multi-Access Edge Computing: A Constrained Multi-Armed Bandit Optimization Approach
Weibo Chu, Xiaoyan Zhang, Xinming Jia, John C.S. Lui, and Zhiyong Wang.
Accepted in Computer Networks, 2024

Federated Contextual Cascading Bandits with Asynchronous Communication and Heterogeneous Users [arXiv]
Hantao Yang, Xutong Liu, Zhiyong Wang, Hong Xie, John C.S. Lui, Defu Lian, and Enhong Chen.
Accepted in the AAAI Conference on Artificial Intelligence (AAAI), 2024

Learning Context-Aware Probabilistic Maximum Coverage Bandits: A Variance-Adaptive Approach
Xutong Liu, Jinhang Zuo, Junkai Wang, Zhiyong Wang, Yuedong Xu, and John C.S. Lui.
Accepted in the IEEE International Conference on Computer Communications (INFOCOM), 2024

Online Clustering of Bandits with Misspecified User Models [arXiv] [OpenReview]
Zhiyong Wang, Jize Xie, Xutong Liu, Shuai Li, and John C.S. Lui.
Accepted in the Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023

Online Corrupted User Detection and Regret Minimization [arXiv] [OpenReview]
Zhiyong Wang, Jize Xie, Tong Yu, Shuai Li, and John C.S. Lui.
Accepted in the Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023

Adversarial Attacks on Online Learning to Rank with Click Feedback [arXiv] [OpenReview]
Jinhang Zuo, Zhiyao Zhang, Zhiyong Wang, Shuai Li, Mohammad Hajiesmaili, and Adam Wierman.
Accepted in the Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023

Efficient Explorative Key-term Selection Strategies for Conversational Contextual Bandits [arXiv]
Zhiyong Wang, Xutong Liu, Shuai Li, and John C.S. Lui.
Accepted in the AAAI Conference on Artificial Intelligence (AAAI), 2023

Education

The Chinese University of Hong Kong (CUHK), Ph.D. in Computer Science and Engineering, August 2021 - July 2025 (expected)
Cornell University, Visiting Ph.D. student in Computer Science, March 2024 - October 2024
Huazhong University of Science and Technology (HUST), B.E. in Electronic Information Engineering (Advanced Class in Mathematics and Physics for Information Science), Sep. 2017 - Jun. 2021

Working Experience

Research Intern at Microsoft Research Asia Theory Center (Jun. 2023 - Sep. 2023), mentor: Dr. Wei Chen (ACM/IEEE Fellow, Director of Microsoft Research Asia Theory Center).

Awards

TTIC Summer Workshop Travel Grant for Adaptive Learning in Complex Environments, TTIC, 2024
Reaching Out Award, HKSAR Government, 2024
Full Postgraduate Studentship, CUHK, 2021-2025
National Scholarship, the Ministry of Education of China, 2018-2019
National Scholarship, the Ministry of Education of China, 2017-2018
Outstanding Graduates of Huazhong University of Science and Technology, HUST, 2021
Outstanding Undergraduates in terms of Academic Performance, HUST, 2017-2021
Scholarship for excellent academic performance, HUST, 2019-2020
Merit Student, HUST, 2018-2019
Merit Student, HUST, 2017-2018
Scholarship for outstanding academic performance for Freshmen, HUST, 2018

Invited Talks

Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds, at the John Hopcroft Center of Shanghai Jiao Tong University (SJTU), May 2025. [slides]

Teaching Experience

Guest Lecture at CS 6789: Foundations of Reinforcement Learning, Cornell University, Fall 2024
CSCI2040: Introduction to Python, CUHK, Fall 2021, Fall 2022, Spring 2023, Fall 2023
CSCI1510: Computer Principles and C Programming, CUHK, Spring 2022