Machine Learning in Chemistry: A Data Centred, Hands-on Introductory Machine Learning Course for Undergraduate Students

ai-ml
jupyter
python
website

Machine learning (ML) is rapidly reshaping the chemical sciences, with applications spanning molecular property prediction, chemical reaction design, molecular structure generation, and other forms of data-driven discovery. As ML becomes more deeply integrated into chemical research, undergraduate chemistry students increasingly need training that bridges traditional chemical education and ML methods. Here we present Machine Learning in Chemistry (MLChem), an undergraduate-level course designed from a chemistry-first perspective to lower the barrier to entry into ML while maintaining disciplinary relevance. The course introduces fundamental ML algorithms using chemical datasets, such as a small-molecule solubility dataset and a peptide activity dataset. It progresses from traditional ML algorithms to neural networks and is complemented by advanced modules on emerging topics such as reinforcement learning for retrosynthesis, ML-based force fields, and deep learning for predicting protein structure and dynamics. By combining chemical context with hands-on coding and exposure to frontier applications, MLChem equips undergraduate chemistry students with both conceptual foundations and practical skills, preparing them to participate in ML-driven chemical research.

Reference

Mingyi Xue, Bojun Liu, and Xuhui Huang, ChemRxiv, DOI: 10.26434/chemrxiv-2025-9zldf

MLChem documentation: https://xuhuihuang.github.io/mlchem/html/