This study explores the application of data representativity enhancement using variational autoencoders (VAEs) to the inverse problem of Raman spectroscopy of multicomponent aqueous solutions of inorganic salts. By extending our earlier work on optical absorption spectroscopy to Raman scattering, we assess whether VAE-based dataset expansion methods for solving inverse problems in spectroscopy transfer across spectroscopic techniques. The objective of the spectroscopic studies considered is to determine the concentrations of various ions in multicomponent aqueous solutions from spectral information. Compared with other spectroscopic techniques such as infrared or optical absorption spectroscopy, Raman spectroscopy provides more detailed information about the vibrational states of molecules, making it particularly sensitive to changes in the ionic composition of a solution. Raman spectra are high-dimensional, correlated, and nonlinearly dependent on the sample composition, which complicates their interpretation. To address this challenge, machine learning methods, particularly regression-based neural networks, can be employed. A critical factor influencing model performance is the representativity of the training dataset. We attempt to expand the training dataset by generating synthetic spectra using VAEs and investigate the potential of this approach to improve the representativity of the training data, which may in turn reduce concentration determination errors when solving the inverse problem. While conditional VAEs (cVAEs) offer a direct way to incorporate target analyte concentrations, we also examine alternative strategies that pair standard VAEs with auxiliary regression models to assign target concentrations to generated spectra. Although further validation is needed, the approaches considered may provide a basis for developing synthetic data generation methods that better capture the physical characteristics of Raman spectra.
This study was supported by the Russian Science Foundation, grant no. 24-11-00266.
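The strategy mentioned in the abstract, in which a standard VAE generates spectra and an auxiliary regression model assigns concentrations to them, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the names `fit_ridge`, `predict`, `expand_dataset`, and `toy_generator` are hypothetical, a closed-form ridge regression stands in for the auxiliary regression model, and `toy_generator` is only a placeholder for sampling from a trained VAE decoder. The data are synthetic toy values, not measured Raman spectra.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1e-3):
    """Closed-form ridge regression with a bias column (stand-in auxiliary model)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)

def predict(W, X):
    """Apply the fitted linear model to a batch of spectra."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ W

def expand_dataset(spectra, conc, generate, n_new):
    """Label generated spectra with the auxiliary model and append them."""
    W = fit_ridge(spectra, conc)             # trained on real data only
    synthetic = generate(n_new)              # assumed: samples from a generative model
    synthetic_conc = predict(W, synthetic)   # concentrations assigned by the regressor
    return np.vstack([spectra, synthetic]), np.vstack([conc, synthetic_conc])

# Toy data (assumption): spectra as a noisy linear mixture of ion concentrations.
rng = np.random.default_rng(0)
n_real, n_channels, n_ions = 50, 20, 3
true_conc = rng.uniform(0.0, 1.0, size=(n_real, n_ions))
mixing = rng.normal(size=(n_ions, n_channels))
real_spectra = true_conc @ mixing + 0.01 * rng.normal(size=(n_real, n_channels))

def toy_generator(n):
    # Placeholder for decoding latent samples with a trained VAE decoder.
    c = rng.uniform(0.0, 1.0, size=(n, n_ions))
    return c @ mixing

X_aug, Y_aug = expand_dataset(real_spectra, true_conc, toy_generator, n_new=30)
print(X_aug.shape, Y_aug.shape)
```

In a cVAE-based variant, the target concentrations would be supplied to the generator as a conditioning input, so the separate labeling step in `expand_dataset` would not be needed.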