Abstract: Polycyclic aromatic hydrocarbons (PAHs) play a key role in the evolution of the interstellar medium, making modeling
of their observed infrared (IR) spectra crucial for astrochemical studies that require spectra for a vast number of individual
compounds. Density Functional Theory (DFT), traditionally used for such tasks, is extremely resource-intensive, especially
for large PAHs. This has led to the application of neural networks to accelerate the prediction of PAH IR spectra. Despite
the success of neural networks in bypassing expensive DFT calculations and achieving high-quality predictions for numerous
molecules, this approach has excluded ionized molecules from training [1]. In this study, we focus on leveraging the significantly expanded PAH spectral data to enable the prediction of IR spectra for the widest possible range of molecules, with a
particular emphasis on ionized species. To achieve this, we propose the implementation of two machine learning techniques
combined with expressive representations of molecular structure and ionization state for predicting the IR spectra of PAHs. Two models are introduced: an XGBoost model trained on Morgan fingerprints and a graph neural network (GNN) that employs
molecular graph representations. Charged molecules are handled by adding a one-hot or learnable neural network (NN) encoding of the charge to the molecular representations. Both models demonstrate excellent predictive capabilities, for the first time enabling fast and accurate
prediction of the IR spectra of charged PAHs. While the XGBoost model demonstrates the highest accuracy achieved to date, the
GNN shows significant promise for future advancements due to the inherent capabilities of molecular graph representations.
Remaining challenges, such as the scarcity of data on heteroatomic PAHs, and potential approaches to addressing them are also
discussed in the manuscript.
[1] P. Kovács et al., The Astrophysical Journal, 2020, 902(2).
This work was supported by the Russian Science Foundation (Project No. 23-13-00207).
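
The one-hot treatment of charged molecules mentioned in the abstract can be illustrated with a minimal sketch. It assumes the charge is appended to a binary fingerprint vector before the vector is passed to a model. The set of charge states, the function names, and the 8-bit toy fingerprint are hypothetical placeholders chosen for illustration; they are not taken from the study, which may use a different layout.

```python
from typing import List, Sequence

# Hypothetical set of charge states (assumption; not specified in the abstract).
CHARGE_STATES = (-1, 0, 1, 2)


def one_hot_charge(charge: int, states: Sequence[int] = CHARGE_STATES) -> List[int]:
    """Return a one-hot vector marking which charge state the molecule has."""
    if charge not in states:
        raise ValueError(f"unsupported charge state: {charge}")
    return [1 if charge == s else 0 for s in states]


def featurize(fingerprint_bits: Sequence[int], charge: int) -> List[int]:
    """Concatenate a binary fingerprint with the one-hot charge encoding."""
    return list(fingerprint_bits) + one_hot_charge(charge)


# Toy 8-bit vector standing in for a real Morgan fingerprint (illustrative only).
toy_fp = [0, 1, 0, 0, 1, 1, 0, 0]

neutral = featurize(toy_fp, charge=0)
cation = featurize(toy_fp, charge=1)
```

In this sketch the neutral molecule and its cation share the same structural bits and differ only in the appended charge entries, so a model trained on such vectors can in principle learn charge-dependent changes in the spectrum.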