A Convolutional Neural Network-based gradient boosting framework for prediction of the band gap of photo-active catalysts

Received: 30 Jan 2023, Revised: 20 March 2023, Accepted: 24 April 2023, Available online: 14 Apr 2023, Version of Record: 30 June 2023

Avan Kumar (1), Sreedevi Upadhyayula (2), Hariprasad Kodamana (3)

(1) Department of Chemical Engineering, Indian Institute of Technology Delhi, New Delhi, 110016, India

(2) Yardi School of Artificial Intelligence, Indian Institute of Technology Delhi, New Delhi, 110016, India

(3) Department of Chemical Engineering, Indian Institute of Technology Delhi, New Delhi, 110016, India

Abstract


Photo-catalysis, a recent trend in chemical synthesis, relies on photo-active catalysts, which are semiconductor materials. A key electronic property of a semiconductor is its band gap, and the desired band gap of a photo-catalyst lies between 1.5 eV and 6.2 eV. Rational design and synthesis of photo-active catalysts therefore require knowledge of the band gap as an initial screening parameter. Herein, we propose an integrated deep learning-based framework that classifies photo-active catalysts and predicts their band gap from compositional features. To this end, we use a dataset extracted from the “catalyst hub” site by web scraping with a Python script. The data are extensively cleaned and pre-processed to make them amenable to model training. Additional features are generated by two methods: (a) one-hot encoding and (b) taking the mean of the catalyst embeddings computed by Mat2Vec, a pre-trained transformer-based model. Using this feature set, we propose a two-stage deep-learning framework for the classification and regression tasks. In the first stage, a 2D Convolutional Neural Network (CNN)-based classifier determines whether a catalyst belongs to the photo-active class. In the second stage, for catalysts that pass this screening, a 1D-VGG-based gradient boosting framework predicts the band gap using only compositional features as inputs. The 2D-CNN classifier achieves an accuracy of 0.903 on the training set and 0.886 on the test set. The integrated model, which combines the 1D convolutional layers of VGG with the XGBoostRegressor, attains a test R² of 0.750, much higher than that of baseline models reported in the literature.

Keywords: Photo-active catalyst, Band gap, Deep learning models, CNN, VGG, Gradient boosting, Feature engineering
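
The feature construction and two-stage flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several things here are assumptions made for illustration: the element vectors in `TOY_EMBEDDINGS` are made up and are not Mat2Vec embeddings, the "one-hot" feature is taken to be an element-presence vector, the mean is an unweighted mean over the distinct elements, the 1.5 eV to 6.2 eV window is treated as inclusive, and the `classifier` and `regressor` arguments are placeholder callables standing in for the 2D-CNN and the 1D-VGG plus gradient boosting models.

```python
import re

# Desired band-gap window for a photo-catalyst, as quoted in the abstract (eV).
# Treating both ends as inclusive is an assumption.
BAND_GAP_MIN_EV = 1.5
BAND_GAP_MAX_EV = 6.2

# Hypothetical 3-dimensional element vectors, made up for illustration.
# These are NOT Mat2Vec embeddings.
TOY_EMBEDDINGS = {
    "Ti": [0.5, 0.0, 1.0],
    "O": [0.0, 1.0, 0.5],
    "Zn": [1.0, 0.5, 0.0],
}


def parse_formula(formula):
    """Return {element: amount} for a simple formula such as 'TiO2' (no brackets)."""
    pattern = r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?"
    counts = {}
    for symbol, amount in re.findall(pattern, formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return counts


def multi_hot(formula, vocabulary):
    """Element-presence vector over a fixed element vocabulary (assumed encoding)."""
    present = parse_formula(formula)
    return [1 if element in present else 0 for element in vocabulary]


def mean_embedding(formula, embeddings=TOY_EMBEDDINGS):
    """Unweighted mean of the embedding vectors of the distinct elements."""
    vectors = [embeddings[element] for element in parse_formula(formula)]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def is_photo_active(band_gap_ev):
    """Labelling rule: band gap inside the desired window."""
    return BAND_GAP_MIN_EV <= band_gap_ev <= BAND_GAP_MAX_EV


def two_stage_predict(formula, classifier, regressor, vocabulary):
    """Stage 1 screens the catalyst; stage 2 predicts a band gap only if it passes."""
    features = multi_hot(formula, vocabulary) + mean_embedding(formula)
    if not classifier(features):
        return None  # screened out, so no band gap is predicted
    return regressor(features)


# Usage with placeholder models (hypothetical stand-ins for the trained networks).
vocabulary = ["O", "Ti", "Zn"]
demo_features = multi_hot("TiO2", vocabulary) + mean_embedding("TiO2")
demo_prediction = two_stage_predict("TiO2", lambda f: True, lambda f: 3.0, vocabulary)
```

Passing the models in as callables keeps the sketch independent of any deep learning library. In practice the two placeholders would be replaced by the trained classifier and regressor.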




Indexed in Scopus

https://www.scopus.com/authid/detail.uri?authorId=57209842615
      

DOI: 10.31763/DSJ.v5i1.1674

   


Conflict of interest


The authors declare no conflict of interest.


Funding Information


This research received no external funding or grants.


Peer review


Peer review under the responsibility of Defence Science Journal.


Ethics approval


Not applicable.


Consent for publication


Not applicable.


Acknowledgements


None.