Journal of Physical Chemistry C, volume 128, issue 50, pages 21349-21367

A Machine Learning and Explainable AI Framework Tailored for Unbalanced Experimental Catalyst Discovery

Parastoo Semnani ^{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}

Mihail Bogojeski ^{1, 2, 4, 5, 6, 8, 9, 10}

Florian Bley ^{1, 2, 4, 5, 6, 8, 9, 10}

Zizheng Zhang ^{12, 13}

Qiong Wu ^{12, 13}

Thomas Kneib ^{12, 13}

Jan Herrmann ^{14}

Christoph Weisser ^{14}

Florina Patcas ^{14, 15, 16}

Klaus-Robert Müller ^{1, 2, 4, 5, 6, 8, 9, 10, 17, 18, 19, 20, 21, 22, 23, 24}

¹ Machine Learning Group, TU Berlin, Berlin 10587, Germany

² Berlin Institute for the Foundations of Learning and Data, Berlin 10587, Germany

³ BASLEARN−TU Berlin/BASF Joint Lab for Machine Learning, TU Berlin, Berlin 10587, Germany

⁴ Machine Learning Group

⁵ TU Berlin

⁶ Berlin Institute for the Foundations of Learning and Data

⁷ BASLEARN−TU Berlin/BASF Joint Lab for Machine Learning

⁸ Machine Learning Group, Berlin, Germany

⁹ TU Berlin, Berlin, Germany

¹⁰ Berlin Institute for the Foundations of Learning and Data, Berlin, Germany

¹¹ BASLEARN−TU Berlin/BASF Joint Lab for Machine Learning, Berlin, Germany

¹² Chair of Statistics and Campus Institute Data Science, Göttingen, Germany

¹³ Georg-August-University Göttingen, Göttingen, Germany

¹⁴ BASF SE, Ludwigshafen, Germany

¹⁵ BASF SE, Ludwigshafen 67056, Germany

¹⁶ BASF SE

¹⁷ Max Planck Institute for Informatics, Saarbrücken 66123, Germany

¹⁸ Department of Artificial Intelligence, Korea University, Seoul 02841, South Korea

¹⁹ Max Planck Institute for Informatics

²⁰ Department of Artificial Intelligence

²¹ Korea University

²² Max Planck Institute for Informatics, Saarbrücken, Germany

²³ Department of Artificial Intelligence, Seoul, South Korea

²⁴ Korea University, Seoul, South Korea

Publication type: Journal Article

Publication date: 2024-12-06

Publisher: American Chemical Society (ACS)

Journal: Journal of Physical Chemistry C

Scimago: Q1

WoS: Q3

SJR: 0.914

CiteScore: 6.2

Impact factor: 3.2

ISSN: 1932-7447 (print), 1932-7455 (electronic)

DOI: 10.1021/acs.jpcc.4c05332

Abstract

The successful application of machine learning (ML) in catalyst design has been made difficult by the challenges associated with collecting high-quality and diverse data. Due to the complex interactions between catalyst components, the design of novel catalysts has long relied on trial-and-error, a costly and labor-intensive process that results in scarce data that is heavily biased toward undesired, low-yield catalysts. Such data presents a challenge for training ML models that generalize well to novel compositions, which is necessary for the success of ML-guided catalyst discovery. Despite the growing popularity of ML applications in this field, most efforts so far have not focused on dealing with the challenges presented by such experimental data. In this work, we introduce a robust ML and explainable artificial intelligence (XAI) framework that incorporates a series of well-established ML methods designed to improve model performance and provide reliable evaluations for catalytic yield classification in the context of scarce and class-imbalanced data. We apply this framework to classify the yields of different catalyst combinations in the oxidative coupling of methane reaction and use it to evaluate the performance of a range of ML models: tree-based models (such as decision trees, random forest, and gradient boosted trees), logistic regression, support vector machines, and neural networks. Our experiments demonstrate that the methods used in our framework lead to more robust performance estimates and reduce the effect of class imbalance on model training, resulting in significant improvements in the predictive capability of all but one of the evaluated models. Additionally, the XAI component of the framework analyzes the decision-making process of each ML model by identifying the most important features for predicting catalyst performance. 
Our analysis found that XAI methods that provide class-aware explanations, such as Layer-wise Relevance Propagation, managed to identify key components that contribute specifically to high-yield catalysts. These findings align with chemical intuition and existing literature, reinforcing their validity. We believe this framework can serve as a blueprint and a set of best practices for ML applications in catalysis, driving future research while delivering robust models and actionable insights that can assist chemists in designing and discovering novel catalysts with superior performance.
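To make the class-imbalance concern in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy binary labelling (0 for low yield, 1 for high yield, in a 90:10 ratio), and the helper names `stratified_folds` and `balanced_accuracy` are hypothetical. Stratified folds and balanced accuracy are swapped in here as common, well-established tools for this setting; the abstract does not state which specific methods the framework uses.

```python
from collections import defaultdict
import random


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds that keep class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so each class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy labels (illustrative only): 90 low-yield (0) and 10 high-yield (1) samples.
labels = [0] * 90 + [1] * 10
folds = stratified_folds(labels, k=5)

# A trivial baseline that always predicts the majority (low-yield) class.
always_low = [0] * len(labels)

print(accuracy(labels, always_low))           # 0.9
print(balanced_accuracy(labels, always_low))  # 0.5
print([sum(labels[i] for i in fold) for fold in folds])  # [2, 2, 2, 2, 2]
```

In this toy setting, a classifier that never predicts a high-yield catalyst still reaches 90% plain accuracy, while its balanced accuracy is no better than chance. Stratification ensures that every fold contains some of the rare high-yield samples, which is one way to obtain more stable performance estimates on scarce data.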

Top-30

Journals

Journal of Analytical and Applied Pyrolysis: 1 publication, 16.67%
Electrocatalysis: 1 publication, 16.67%
Journal of Controlled Release: 1 publication, 16.67%
International Journal of Hydrogen Energy: 1 publication, 16.67%
Journal of Power Sources: 1 publication, 16.67%
Chemical Reviews: 1 publication, 16.67%

Publishers

Elsevier: 4 publications, 66.67%
Springer Nature: 1 publication, 16.67%
American Chemical Society (ACS): 1 publication, 16.67%

We do not take into account publications without a DOI.
Statistics recalculated weekly.
