Data-driven optimization for high-dimensional complex systems
Guest Speaker: Assistant Professor Ye Wei, City University of Hong Kong
Inviter: Assistant Professor Chao Yang
Date & Time: Friday, 21 Jan., 10:00-11:30
Venue: Yiucheng Lecture Hall (500), Xu Zuyao Building
Biography:
Ye Wei has been a Presidential Assistant Professor in the Department of Data Science, City University of Hong Kong, since January 2025. He earned his MSc in Physics from RWTH Aachen University in 2018 and completed his PhD in 2021 at the Max Planck Institute for Sustainable Materials and Intelligent Systems in Germany. Following his doctorate, he pursued postdoctoral research in computer science at the Institute for Interdisciplinary Information Sciences, Tsinghua University, China. From 2023 to 2024, he conducted research at the School of Bioengineering, EPFL, focusing on AI-driven computational design. His research interests include optimization, self-supervised learning, and physics-informed machine learning, with a focus on tackling high-dimensional, nonlinear challenges in complex real-world systems. His work has been published in journals including Science, Nature Communications, and Advanced Science, and has been covered by science and technology media outlets including MIT Technology Review and Chemistry World.
Abstract:
Inferring optimal solutions from limited data is considered the ultimate goal of scientific discovery. Artificial intelligence offers a promising avenue to greatly accelerate this process. Existing methods often depend on large datasets, strong assumptions about objective functions, and classic machine learning techniques, which restricts their effectiveness to low-dimensional or data-rich problems. We present a deep active learning pipeline that combines deep neural networks with a novel tree search to find superior solutions to high-dimensional complex problems with non-cumulative objectives and limited data availability. Our pipeline iteratively approaches the optimum using a neural surrogate and introduces new search mechanisms that escape local optima and minimize the number of samples needed. These contributions enable our pipeline to find superior solutions across diverse problems with up to 2,000 dimensions, whereas existing methods are limited to 100 dimensions and require up to 10 times more data points. Our pipeline demonstrates wide applicability, discovering superior solutions to problems from various scientific domains. This advancement enables data-efficient knowledge discovery and opens the path towards scalable self-driving laboratories. Although we focus on scientific problems, the advances achieved here are applicable to a broader spectrum of challenges across all quantitative disciplines.