Assessing deep and shallow learning methods for quantitative prediction of acute chemical toxicity.

Date
0000-00-00
Abstract
Animal-based methods for assessing chemical toxicity are struggling to meet testing demands. In silico approaches, including machine learning methods, are promising alternatives. Recently, deep neural networks (DNNs) were evaluated and reported to outperform other machine learning methods for quantitative structure-activity relationship modeling of molecular properties. However, most of the reported performance evaluations relied on global performance metrics, such as the root mean squared error (RMSE) between the predicted and experimental values of all samples, without considering the impact of sample distribution across the activity spectrum. Here, we carried out an in-depth analysis of DNN performance for quantitative prediction of acute chemical toxicity using several datasets. We found that the overall performance of DNN models on datasets of up to 30,000 compounds was similar to that of random forest (RF) models, as measured by the RMSE and correlation coefficients between the predicted and experimental results. However, our detailed analyses demonstrated that global performance metrics are inappropriate for datasets with a highly uneven sample distribution, because they show a strong bias for the most populous compounds along the toxicity spectrum. For highly toxic compounds, DNN and RF models trained on all samples performed much worse than the global performance metrics indicated. Surprisingly, our variable nearest neighbor method, which utilizes only structurally similar compounds to make predictions, performed reasonably well, suggesting that information of close near neighbors in the training sets is a key determinant of acute toxicity predictions.
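The point about global metrics can be illustrated with a small sketch that compares a single RMSE over all samples with RMSE values computed separately within bins of the experimental activity. This is not the authors' evaluation code: the function names `rmse` and `binned_rmse`, the bin edges, and the toy values are all assumptions chosen for illustration, and the activity scale is assumed to be one where larger values mean higher toxicity.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between experimental and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def binned_rmse(y_true, y_pred, edges):
    """RMSE computed separately within bins of the experimental value.

    `edges` is an increasing sequence of bin boundaries (hypothetical here).
    Values below the first edge or above the last edge are counted in the
    first or last bin, respectively.
    Returns a list of (low_edge, high_edge, n_samples, rmse) tuples.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Interior edges only, so bin indices run from 0 to len(edges) - 2.
    bin_idx = np.digitize(y_true, edges[1:-1])
    results = []
    for b in range(len(edges) - 1):
        mask = bin_idx == b
        n = int(mask.sum())
        value = rmse(y_true[mask], y_pred[mask]) if n else float("nan")
        results.append((edges[b], edges[b + 1], n, value))
    return results

# Toy data (invented): eight well-predicted compounds in the populous
# low-toxicity range and two poorly predicted highly toxic compounds.
y_true = np.array([1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 4.5, 5.0])
y_pred = y_true + np.array([0.1] * 8 + [-1.0] * 2)

global_error = rmse(y_true, y_pred)
per_bin = binned_rmse(y_true, y_pred, edges=[0.0, 3.0, 6.0])

print(f"global RMSE: {global_error:.3f}")
for low, high, n, value in per_bin:
    print(f"bin [{low}, {high}): n={n}, RMSE={value:.3f}")
```

In this toy case the global RMSE sits much closer to the error of the populous bin than to the error of the sparsely populated highly toxic bin, which is the kind of bias the abstract describes.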