Merging applicability domains for in silico assessment of chemical mutagenicity.

Liu, Ruifeng; Wallqvist, Anders

Date

2014-03-24

Authors

Liu, Ruifeng

Wallqvist, Anders

Abstract

Using a benchmark Ames mutagenicity data set, we evaluated the performance of molecular fingerprints as descriptors for developing quantitative structure-activity relationship (QSAR) models and defining applicability domains with two machine learning methods: random forest (RF) and variable nearest neighbor (v-NN). The two methods focus on complementary aspects of chemical mutagenicity and use different characteristics of the molecular fingerprints to achieve high prediction accuracy. While RF flags mutagenic compounds using the presence or absence of small molecular fragments akin to structural alerts, the v-NN method uses molecular structural similarity as measured by fingerprint-based Tanimoto distances between molecules. We showed that the extended connectivity fingerprints could intuitively be used to define and quantify an applicability domain for either method. The importance of using applicability domains in QSAR modeling cannot be overstated: compounds that are outside the applicability domain do not have any close representative in the training set, and therefore we cannot make reliable predictions for them. Using either approach, we developed highly robust models that rival the performance of a state-of-the-art proprietary software package. Importantly, based on the complementary approach used by the methods, we showed that by combining the model predictions we raised the applicability domain from roughly 80% to 90%. These results indicated that the proposed QSAR protocol constituted a highly robust chemical mutagenicity prediction model.
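
The distance-based applicability domain described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Fingerprints are represented as plain sets of on-bit indices rather than real extended connectivity fingerprints. The distance cutoff `d0`, the smoothing factor `h`, the Gaussian-style weighting, and the averaging rule in `merge_predictions` are all illustrative assumptions that the abstract does not specify. A query compound with no training neighbor within `d0` is treated as outside the domain and receives no prediction.

```python
import math
from typing import Optional, Sequence, Set, Tuple


def tanimoto_distance(a: Set[int], b: Set[int]) -> float:
    """Return 1 - |a & b| / |a | b| for fingerprints stored as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumption: two empty fingerprints count as identical
    return 1.0 - len(a & b) / union


def vnn_predict(
    query: Set[int],
    training: Sequence[Tuple[Set[int], int]],
    d0: float = 0.6,  # hypothetical distance cutoff defining the domain
    h: float = 0.3,   # hypothetical smoothing factor for the weights
) -> Optional[float]:
    """Distance-weighted average of the labels of all neighbors within d0.

    Returns None when no training compound lies within d0, i.e. the
    query is outside the applicability domain.
    """
    num = 0.0
    den = 0.0
    for fingerprint, label in training:
        d = tanimoto_distance(query, fingerprint)
        if d <= d0:
            w = math.exp(-((d / h) ** 2))  # assumed Gaussian-style weight
            num += w * label
            den += w
    return num / den if den > 0 else None


def merge_predictions(p_a: Optional[float], p_b: Optional[float]) -> Optional[float]:
    """Combine two models: covered if either model covers the compound.

    Averaging when both models respond is an assumption of this sketch.
    """
    available = [p for p in (p_a, p_b) if p is not None]
    return sum(available) / len(available) if available else None
```

For example, `vnn_predict({1, 2, 3}, [({1, 2, 3}, 1), ({7, 8, 9}, 0)])` uses only the first training compound, because the second lies at Tanimoto distance 1.0, beyond the cutoff. Merging widens coverage because a compound needs to fall inside only one model's domain to receive a prediction.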