Generalized transition graph

2/20/2023

Motivation: The increased availability of genome sequences of closely related organisms has generated much interest in utilizing homology to improve the accuracy of gene prediction programs. Generalized pair hidden Markov models (GPHMMs) have been proposed as one means to address this need. However, all GPHMM implementations currently available are either closed-source or incompletely described in the literature, which leaves a significant hurdle for others wishing to advance the state of the art in GPHMM design.

Results: We have developed an open-source GPHMM gene finder, TWAIN, which performs very well on two related Aspergillus species, A. fumigatus and A. nidulans, finding 89% of the exons and predicting 74% of the gene models exactly correctly in a test set of 147 conserved gene pairs. We describe the implementation of this GPHMM and we explicitly address the assumptions and limitations of the system. We suggest possible ways of relaxing those assumptions to improve the utility of the system without sacrificing efficiency beyond what is practical.

Availability: Available under the open-source Artistic License.

As the amount of genomic sequence available in public archives skyrockets, our reliance on purely or largely automatic methods of identifying genes in unannotated sequence will likely also increase. Unfortunately, ab initio methods of gene prediction are far from perfect. This has prompted some researchers in the field to investigate the use of homology evidence to improve the accuracy of current gene-finding technology. Fortunately, for a growing number of species, genome centers are now generating two or more sequences from closely related organisms. The best-known (and largest) examples are the human, mouse and rat genomes, soon to be joined by the chimpanzee and other mammals.
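The two accuracy figures quoted in the Results (the share of exons found and the share of gene models predicted exactly) can be illustrated with a minimal Python sketch. It assumes that an exon counts as found only when its start and end coordinates both match the reference exactly, and that a gene model is exactly correct only when its full exon list matches; the text above does not spell out the matching criteria, so treat these as assumptions. The function names, the coordinate convention and the toy data are hypothetical and are not taken from TWAIN.

```python
from typing import Dict, List, Tuple

# Hypothetical convention: an exon is a (start, end) coordinate pair,
# and a gene model is the list of its exons.
Exon = Tuple[int, int]
GeneModel = List[Exon]


def exon_sensitivity(reference: Dict[str, GeneModel],
                     predicted: Dict[str, GeneModel]) -> float:
    """Fraction of reference exons reproduced with identical coordinates."""
    total = 0
    found = 0
    for gene_id, ref_exons in reference.items():
        pred_exons = set(predicted.get(gene_id, []))
        total += len(ref_exons)
        found += sum(1 for exon in ref_exons if exon in pred_exons)
    return found / total if total else 0.0


def exact_gene_accuracy(reference: Dict[str, GeneModel],
                        predicted: Dict[str, GeneModel]) -> float:
    """Fraction of reference genes whose predicted exon list matches exactly."""
    if not reference:
        return 0.0
    exact = sum(
        1
        for gene_id, ref_exons in reference.items()
        if sorted(predicted.get(gene_id, [])) == sorted(ref_exons)
    )
    return exact / len(reference)


# Toy data (invented for illustration): g1 has one mispredicted exon end.
reference = {"g1": [(100, 200), (300, 400)], "g2": [(50, 150)]}
predicted = {"g1": [(100, 200), (300, 410)], "g2": [(50, 150)]}

print(exon_sensitivity(reference, predicted))
print(exact_gene_accuracy(reference, predicted))
```

In the toy data, two of the three reference exons are recovered and one of the two gene models matches in full, which shows why the gene-level figure is normally the lower of the two: a single wrong exon boundary spoils the whole model.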
