Research article summary (published 17 Oct 2006):
Towards the identification of essential genes using targeted genome sequencing and comparative analysis.
Full Abstract
BACKGROUND:
The identification of genes essential for survival is of theoretical importance in the understanding of the minimal requirements for cellular life, and of practical importance in the identification of potential drug targets in novel pathogens. With the great time and expense required for experimental studies aimed at constructing a catalog of essential genes in a given organism, a computational approach which could identify essential genes with high accuracy would be of great value.
RESULTS:
We gathered numerous features which could be generated automatically from genome sequence data and assessed their relationship to essentiality, and subsequently utilized machine learning to construct an integrated classifier of essential genes in both S. cerevisiae and E. coli. When looking at single features, phyletic retention, a measure of the number of organisms an ortholog is present in, was the most predictive of essentiality. Furthermore, during construction of our phyletic retention feature we for the first time explored the evolutionary relationship among the set of organisms in which the presence of a gene is most predictive of essentiality. We found that in both E. coli and S. cerevisiae the optimal sets always contain host-associated organisms with small genomes which are closely related to the reference. Using five optimally selected organisms, we were able to improve predictive accuracy as compared to using all available sequenced organisms. We hypothesize the predictive power of these genomes is a consequence of the process of reductive evolution, by which many parasites and symbionts evolved their gene content. In addition, essentiality is measured in rich media, a condition which resembles the environments of these organisms in their hosts where many nutrients are provided. Finally, we demonstrate that integration of our most highly predictive features using a probabilistic classifier resulted in accuracies surpassing any individual feature.
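As an illustration of the phyletic retention feature described above, the following is a minimal sketch, not the authors' implementation. It assumes ortholog presence has already been determined by some upstream method and is available as a mapping from each reference gene to the set of organisms carrying an ortholog. The function name, gene names, and organism names are all hypothetical.

```python
from typing import Dict, Iterable, Set


def phyletic_retention(orthologs: Dict[str, Set[str]],
                       organisms: Iterable[str]) -> Dict[str, int]:
    """For each reference gene, count how many of the chosen organisms
    carry an ortholog of that gene."""
    chosen = set(organisms)
    return {gene: len(present & chosen) for gene, present in orthologs.items()}


# Hypothetical toy data: reference gene -> organisms with an ortholog.
orthologs = {
    "geneA": {"org1", "org2", "org3"},
    "geneB": {"org1"},
    "geneC": set(),
}

# Retention over every available organism.
scores_all = phyletic_retention(orthologs, ["org1", "org2", "org3", "org4"])

# Retention over a smaller, selected subset of organisms.
scores_subset = phyletic_retention(orthologs, ["org1", "org4"])
```

Restricting the `organisms` argument is how a selected subset of genomes, such as the five optimally chosen organisms mentioned in the abstract, would be applied in this sketch. How that subset is chosen is not shown here.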
CONCLUSION:
Using features obtainable directly from sequence data, we were able to construct a classifier which can predict essential genes with high accuracy. Furthermore, our analysis of the set of genomes in which the presence of a gene is most predictive of essentiality may suggest ways in which targeted sequencing can be used in the identification of essential genes. In summary, the methods presented here can aid in the reduction of time and money invested in essential gene identification by targeting those genes for experimentation which are predicted as being essential with a high probability.
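The idea of combining features in a probabilistic classifier and prioritizing genes by predicted probability can be sketched as follows. The abstract does not name the classifier used, so this example assumes a Bernoulli naive Bayes model with add-one smoothing purely for illustration. The binary features, the training data, and the gene names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Model = Dict[int, Tuple[float, List[float]]]


def train(X: Sequence[Sequence[int]], y: Sequence[int]) -> Model:
    """Fit a Bernoulli naive Bayes model (assumed classifier) with
    add-one smoothing. Labels: 1 = essential, 0 = non-essential."""
    n_features = len(X[0])
    model: Model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = (len(rows) + 1) / (len(X) + 2)
        # Smoothed estimate of P(feature_j = 1 | class c).
        probs = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                 for j in range(n_features)]
        model[c] = (prior, probs)
    return model


def prob_essential(model: Model, x: Sequence[int]) -> float:
    """Posterior probability that a gene with features x is essential."""
    log_scores = {}
    for c, (prior, probs) in model.items():
        s = math.log(prior)
        for xj, p in zip(x, probs):
            s += math.log(p if xj else 1.0 - p)
        log_scores[c] = s
    top = max(log_scores.values())  # subtract the max for numerical stability
    exp = {c: math.exp(s - top) for c, s in log_scores.items()}
    return exp[1] / (exp[0] + exp[1])


# Hypothetical binary features: [high phyletic retention, some other flag].
X_train = [[1, 1], [1, 0], [0, 0], [0, 1], [0, 0]]
y_train = [1, 1, 0, 0, 0]
model = train(X_train, y_train)

# Rank unlabeled candidate genes by predicted probability of essentiality.
candidates = {"g1": [1, 1], "g2": [0, 0]}
ranked = sorted(candidates,
                key=lambda g: prob_essential(model, candidates[g]),
                reverse=True)
```

Genes at the top of `ranked` would be the first candidates for experimental follow-up under this scheme.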
Author information
Author/s: Gustafson, Adam M (AM); Snitkin, Evan S (ES); Parker, Stephen C J (SC); DeLisi, Charles (C); Kasif, Simon (S);
Affiliation: Bioinformatics Graduate Program, Boston University, Boston, MA 02215 USA. gustafad(-atsign-)bu.edu
Grants: 1P20GM066401 (Agency:NIGMS NIH HHS) ; 1T32GM070409 (Agency:NIGMS NIH HHS) ; R01 HG003367-01A1 (Agency:NHGRI NIH HHS)
Journal and publication information
Publication Type: Comparative Study; Journal Article; Research Support, N.I.H., Extramural; Research Support, U.S. Gov't, Non-P.H.S.
Journal: BMC genomics (BMC Genomics), published in England. (Language: eng)
Reference: 2006; vol 7: pp 265
Dates: Created 2006/10/26; Completed 2006/11/20; Revised 2008/11/20;
PMID: 17052348, status: MEDLINE (last retrieval date: 12/26/2008)
Sourced from the National Library of Medicine. Abstract text and other information may be subject to copyright.
Related articles
These are the most closely related articles currently in the database:
- Replication stalling at unstable inverted repeats: interplay between DNA hairpins and fork stabilizing proteins. (13 Jul 2008)
- RecA-mediated SOS induction requires an extended filament conformation but no ATP hydrolysis. (2 Jul 2008)
- ruvA Mutants that resolve Holliday junctions but do not reverse replication forks. (5 Mar 2008)
- Characterization of a yjjQ mutant of avian pathogenic Escherichia coli (APEC). (30 Mar 2008)
- Sequence of conjugative plasmid pIP1206 mediating resistance to aminoglycosides by 16S rRNA methylation and to hydrophilic fluoroquinolones by efflux. (3 May 2008)
- Specific roles of the iroBCDEN genes in virulence of an avian pathogenic Escherichia coli O78 strain and in production of salmochelins. (7 Jun 2008)
- A synthetic mammalian gene circuit reveals antituberculosis compounds. (7 Jul 2008)
- ATP-induced shrinkage of DNA with MukB protein and the MukBEF complex of Escherichia coli. (5 Mar 2008)
- The CTX-M-15-producing Escherichia coli diffusing clone belongs to a highly virulent B2 phylogenetic subgroup. (9 Mar 2008)
- Activation of glucose transport under oxidative stress in Escherichia coli. (25 Mar 2008)