Research article summary (published 13 Jul 2005):
Automating genomic data mining via a sequence-based matrix format and associative rule set.
Full Abstract
There is an enormous amount of information encoded in each genome, enough to create living, responsive and adaptive organisms. Raw sequence data alone is not enough to understand function, mechanisms or interactions. Changes in a single base pair can lead to disease, such as sickle-cell anemia, while some large megabase deletions have no apparent phenotypic effect. Genomic features are varied in their data types, and annotation of these features is spread across multiple databases. Herein, we develop a method to automate exploration of genomes by iteratively exploring sequence data for correlations and building upon them. First, to integrate and compare different annotation sources, a sequence matrix (SM) is developed to contain position-dependent information. Second, a classification tree is developed for matrix row types, specifying how each data type is to be treated with respect to other data types for analysis purposes. Third, correlative analyses are developed to analyze features of each matrix row in terms of the other rows, guided by the classification tree as to which analyses are appropriate. A prototype was developed and was successful in detecting coinciding genomic features among genes, exons, repetitive elements and CpG islands.
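The three steps in the abstract can be illustrated with a small sketch. This is not the authors' prototype: the function names, the single "interval" row type, the half-open coordinates, the toy annotations, and the use of a Jaccard overlap score as the "correlative analysis" are all assumptions made here for illustration only.

```python
from itertools import combinations


def build_sequence_matrix(length, features):
    """Step 1 (assumed layout): one boolean row per annotation source,
    with one cell per sequence position."""
    matrix = {}
    for name, intervals in features.items():
        row = [False] * length
        for start, end in intervals:  # half-open [start, end), an assumption
            for pos in range(start, min(end, length)):
                row[pos] = True
        matrix[name] = row
    return matrix


def jaccard(row_a, row_b):
    """Fraction of positions covered by both rows among those covered by either."""
    both = sum(a and b for a, b in zip(row_a, row_b))
    either = sum(a or b for a, b in zip(row_a, row_b))
    return both / either if either else 0.0


# Step 2 (heavily simplified stand-in for the classification tree):
# a lookup from a pair of row types to the analysis that suits them.
ANALYSES = {("interval", "interval"): jaccard}


def correlate_rows(matrix, row_types):
    """Step 3: compare every pair of rows using the analysis chosen by type."""
    scores = {}
    for a, b in combinations(sorted(matrix), 2):
        analysis = ANALYSES.get((row_types[a], row_types[b]))
        if analysis is not None:  # skip pairs with no suitable analysis
            scores[(a, b)] = analysis(matrix[a], matrix[b])
    return scores


# Hypothetical toy annotations on a 15-base sequence.
features = {
    "gene": [(2, 12)],
    "exon": [(2, 5), (8, 12)],
    "cpg_island": [(0, 4)],
}
row_types = {name: "interval" for name in features}

sm = build_sequence_matrix(15, features)
scores = correlate_rows(sm, row_types)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

In this toy input the exon and gene rows coincide most strongly, which is the kind of coinciding-feature signal the abstract reports; a real implementation would need further row types (for example numeric or categorical rows) and statistics suited to each.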
Author information
Author/s: Wren, Jonathan D (JD); Johnson, David (D); Gruenwald, Le (L);
Affiliation: Advanced Center for Genome Technology, Department of Botany and Microbiology, 101 David L. Boren Blvd, Rm 2025. Jonathan.Wren(-atsign-)OU.edu
Journal and publication information
Publication Type: Journal Article; Research Support, U.S. Gov't, Non-P.H.S.
Journal: BMC bioinformatics (BMC Bioinformatics), published in England. (Language: eng)
Reference: 2005 Jul; vol 6 Suppl 2: pp S2
Dates: Created 2005/07/19; Completed 2006/01/19; Revised 2008/11/20;
PMID: 16026599, status: MEDLINE (last retrieval date: 2/18/2009)
Sourced from the National Library of Medicine. Abstract text and other information may be subject to copyright.
Related articles
These are the most closely related articles currently in the database:
- Automatically annotating documents with normalized gene lists. (22 May 2005)
- Evaluation of BioCreAtIvE assessment of task 2. (22 May 2005)
- Automated system for gene annotation and metabolic pathway reconstruction using general sequence databases. (30 Oct 2007)
- XSTREAM: a practical algorithm for identification and architecture modeling of tandem repeats in protein sequences. (9 Oct 2007)
- Conformational analysis of alternative protein structures. (10 Oct 2007)
- Vector-G: multi-modular SVM-based heterotrimeric G protein prediction. (30 Dec 2007)
- A fast SEQUEST cross correlation algorithm. (4 Sep 2008)
- Validation of protein models by a neural network approach. (27 Jan 2008)
- Analysis and identification of beta-turn types using multinomial logistic regression and artificial neural network. (26 Jun 2007)
- Sequence variation in ligand binding sites in proteins. (28 Sep 2005)