The next era of cheminformatics, and of pharmaceutical research in general, will focus on mining heterogeneous big data, which is accumulating at an ever-growing pace, and this will likely require more sophisticated algorithms such as deep learning. Deep learning is increasingly used and has shown powerful advantages in learning from images and language, as well as in many other areas. However, the accessibility of this technique for cheminformatics is limited: it is not readily available to non-experts, and it is currently not part of any of the major cheminformatics tools. Our goal is therefore to develop a deep learning algorithm and toolkit that can be used standalone or integrated into new software we are developing, such as the Open Science Data Repository (OSDR). We will show how classic machine learning (CML) methods (Naïve Bayes, logistic regression, Support Vector Machines, etc.) compare to cutting-edge deep learning and discuss the challenges associated with deep neural network (DNN) models. The open-source Scikit-learn (http://scikit-learn.org/stable/) Python ML library was used for building, tuning, and validating all CML models. DNN models of varying complexity (up to 6 hidden layers) were built and tuned (varying the number of hidden units per layer, activation function, optimizer, dropout fraction, regularization parameters, and learning rate) using Keras (https://keras.io/), a deep learning library, with TensorFlow (www.tensorflow.org) as the backend. All the developed pipelines include a stratified split of the input dataset into training (80%) and test (20%) sets. The receiver operating characteristic (ROC) curve and the area under the curve (AUC) were computed for each model for ADME/Tox and other physicochemical properties. The DNN models were found to predict activities very well and can outperform most of the CML models.
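The evaluation protocol described above (stratified 80/20 split, CML models built with Scikit-learn, ROC AUC on the held-out test set) might look roughly like the following minimal sketch. It is illustrative only and not the authors' pipeline: the synthetic dataset from `make_classification` stands in for a real descriptor matrix and activity labels, and the choice of `GaussianNB`, the feature scaling step, the hyperparameters, and all variable names are assumptions made for this example. The Keras DNN models are omitted to keep the sketch dependent on a single library.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic data standing in for molecular descriptors (X)
# and binary activity labels (y); the abstract does not specify a dataset.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=8,
    weights=[0.7, 0.3], random_state=0,
)

# Stratified 80/20 split: class proportions are preserved in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0,
)

# Three classic ML models named in the abstract; settings are illustrative.
models = {
    "naive_bayes": GaussianNB(),
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "svm": make_pipeline(
        StandardScaler(), SVC(probability=True, random_state=0)
    ),
}

# Fit each model on the training set and score it by ROC AUC on the test set.
auc_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    positive_scores = model.predict_proba(X_test)[:, 1]
    auc_scores[name] = roc_auc_score(y_test, positive_scores)

for name, auc in auc_scores.items():
    print(f"{name}: test ROC AUC = {auc:.3f}")
```

A DNN built with Keras could be scored the same way by passing its predicted probabilities for the test set to `roc_auc_score`, which keeps the comparison with the CML models on identical data and an identical metric.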