Genome-wide enzyme annotation with precision control: catalytic families (CatFam) databases.

Date
2008-12-24
Abstract
In this article, we present a new method termed CatFam (Catalytic Families) to automatically infer the functions of catalytic proteins, which account for 20-40% of all proteins in living organisms and play a critical role in a variety of biological processes. CatFam is a sequence-based method that generates sequence profiles to represent and infer protein catalytic functions. CatFam generates profiles through a stepwise procedure that carefully controls profile quality and employs nonenzymes as negative samples to establish profile-specific thresholds associated with a predefined nominal false-positive rate (FPR) of predictions. The adjustable FPR allows for fine precision control of each profile and enables the generation of profile databases that meet different needs: function annotation with high precision, and hypothesis generation with moderate precision but better recall. Multiple tests of CatFam databases (generated with distinct nominal FPRs) against enzyme and nonenzyme datasets show that the method's predictions have consistently high precision and recall. For example, a 1% FPR database predicts protein catalytic functions for a dataset of enzymes and nonenzymes with 98.6% precision and 95.0% recall. Comparisons of CatFam databases against other established profile-based methods for the functional annotation of 13 bacterial genomes indicate that CatFam consistently achieves higher precision and (in most cases) higher recall, and that (on average) CatFam provides 21.9% additional catalytic functions not inferred by the other similarly reliable methods. These results strongly suggest that the proposed method provides a valuable contribution to the automated prediction of protein catalytic functions. The CatFam databases and the database search program are freely available at http://www.bhsai.org/downloads/catfam.tar.gz
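The idea of a profile-specific threshold tied to a nominal FPR can be illustrated with a minimal sketch. This is not the CatFam implementation: the function names (`cutoff_for_fpr`, `precision_recall`), the "score strictly above the cutoff" rule, and the toy scores are all assumptions made for illustration. Given the scores that known nonenzymes obtain against one profile, the sketch picks a cutoff that lets through at most the nominal fraction of them, then measures precision and recall on a labeled set.

```python
import math


def cutoff_for_fpr(negative_scores, nominal_fpr):
    """Pick a score cutoff from negative (nonenzyme) scores.

    A sequence is called a hit when its score is strictly greater than the
    cutoff. The cutoff is chosen so that the fraction of negatives called
    hits does not exceed nominal_fpr. (Hypothetical helper, not CatFam code.)
    """
    if not negative_scores:
        raise ValueError("need at least one negative score")
    ranked = sorted(negative_scores, reverse=True)
    # Number of negatives we can afford to let through.
    allowed = math.floor(nominal_fpr * len(ranked))
    if allowed >= len(ranked):
        return -math.inf  # every negative may pass
    # Only scores above the (allowed + 1)-th highest negative are hits,
    # so at most `allowed` negatives pass (fewer if there are ties).
    return ranked[allowed]


def precision_recall(positive_scores, negative_scores, cutoff):
    """Compute precision and recall for the rule `score > cutoff`."""
    tp = sum(s > cutoff for s in positive_scores)
    fp = sum(s > cutoff for s in negative_scores)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / len(positive_scores) if positive_scores else float("nan")
    return precision, recall


# Toy data: scores of nonenzymes and enzymes against a single profile.
nonenzyme_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
enzyme_scores = [9.5, 12, 8, 15]

cutoff = cutoff_for_fpr(nonenzyme_scores, nominal_fpr=0.1)
precision, recall = precision_recall(enzyme_scores, nonenzyme_scores, cutoff)
```

Lowering `nominal_fpr` raises the cutoff, which trades recall for precision; raising it does the opposite, which mirrors the two use cases named in the abstract (high-precision annotation versus hypothesis generation with better recall).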