This study investigates how supplementing experimental spectroscopic data with spectra generated by a variational autoencoder (VAE) affects the accuracy of quantitative analysis of ions in multi-component aqueous solutions. We consider the inverse problem of spectroscopy for such solutions: determining the concentrations of individual ions from spectral data (Raman, IR, or optical absorption spectra). Complex multi-component systems often exhibit strongly nonlinear and interdependent spectral responses, which makes them well suited to more sophisticated modeling approaches such as neural networks. However, the performance of neural networks depends heavily on how representative the available training data are. We evaluate whether including VAE-generated spectra in the training data can improve the accuracy of concentration determination models. Various proportions of experimental and synthetic data are examined systematically to establish the optimal balance: too little synthetic data may fail to improve model generalization, while too much risks introducing spectral artifacts that are unrepresentative of real chemical systems. Several approaches to spectra generation are studied. A conditional VAE (cVAE) trained on experimental data can produce spectra for prescribed sets of concentrations; however, this method requires careful design of the concentration sampling strategy. A standard VAE generates unlabeled spectra, so the target ion concentrations must be determined afterwards, which can be done with an ML regression model trained on experimental data. The generated patterns can then be combined with experimental ones in various ways when training the regression neural networks that solve the inverse problem.
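As an illustration of what varying the proportion of experimental and synthetic data can look like in practice, the following minimal Python sketch builds mixed training sets for several synthetic fractions. It is not the procedure used in the study. The function names, the definition of the fraction (share of synthetic samples in the mixed set, with all experimental samples kept), and the toy data are assumptions made for this example; in a real workflow the synthetic samples would come from a trained VAE or cVAE.

```python
import random


def synthetic_count(n_experimental, synth_fraction):
    """Number of synthetic samples needed so that they make up
    synth_fraction of the mixed set (assumed definition)."""
    if not 0.0 <= synth_fraction < 1.0:
        raise ValueError("synth_fraction must be in [0, 1)")
    return round(n_experimental * synth_fraction / (1.0 - synth_fraction))


def build_mixed_set(experimental, synthetic, synth_fraction, seed=0):
    """Keep every experimental sample and add a random subset of synthetic ones.

    Each sample is a (spectrum, concentrations) pair. The returned items carry
    a third field that flags whether the sample is synthetic.
    """
    n_syn = synthetic_count(len(experimental), synth_fraction)
    if n_syn > len(synthetic):
        raise ValueError("not enough synthetic samples for this fraction")
    rng = random.Random(seed)
    chosen = rng.sample(synthetic, n_syn)
    mixed = [(x, y, False) for x, y in experimental]
    mixed += [(x, y, True) for x, y in chosen]
    rng.shuffle(mixed)
    return mixed


# Toy placeholders: two-channel "spectra" with one concentration each.
# Hypothetical data; real synthetic spectra would be VAE or cVAE output.
experimental = [([0.1 * i, 0.2 * i], [float(i)]) for i in range(60)]
synthetic = [([0.1 * i + 0.01, 0.2 * i], [float(i)]) for i in range(200)]

# One mixed training set per synthetic fraction under examination.
mixed_sets = {f: build_mixed_set(experimental, synthetic, f) for f in (0.0, 0.25, 0.5)}
for fraction, samples in mixed_sets.items():
    n_synthetic = sum(1 for _, _, is_synth in samples if is_synth)
    print(fraction, len(samples), n_synthetic)
```

Each mixed set would then be used to train a separate regression model. Evaluating all models on the same held-out experimental spectra is one plausible way to compare fractions, although the abstract does not state which evaluation protocol was used.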
This research provides practical guidelines for extending spectral datasets with VAE-generated patterns, offering a potential pathway to improved accuracy in applications where experimental data are limited. The results will inform best practices for applying modern computational techniques in spectroscopic analysis. The study was supported by grant No. 24-11-00266 from the Russian Science Foundation, https://rscf.ru/en/project/24-11-00266/.