Center for Bioinformatics &
AlterORF

Each gene has six potential reading frames, three on the forward DNA strand and three on the reverse strand (figure). Usually only one of these frames is translated into a protein: it is associated with a ribosome binding site (RBS) and a start codon (usually ATG in E. coli), and it has an open reading frame (ORF) that is terminated by one of the three stop codons. However, extensive ORFs (i.e. potentially encoding at least 100 amino acids) can occur in alternate reading frames, although they are generally not associated with properly positioned RBSs and are probably not translated. These alternate ORFs are surprisingly common, especially in high G+C genomes (e.g. 70% G+C) (read more).

Computational programs that automatically annotate genomes can usually identify the correct ORF because it is more likely to follow the average codon usage of the genome in question and it usually exhibits other identifiable characteristics of a gene, such as a predicted RBS and appropriate dinucleotide distributions. In addition, its computationally translated product may have a significant BLAST hit to a known protein. At times, however, alternate ORFs are misannotated as real genes, especially those in frame -1, because this frame shares some computationally identifiable features with the real gene, such as codon usage, amino acid content and percentage of predicted α-helix (read more).

Using AlterORF as a tool to detect potential genome annotation errors. Some of the alternate ORFs predicted to be genes by automatic annotation programs are subsequently culled by human curators; however, many escape even expert curation. AlterORF provides a database of such potentially mis-annotated ORFs.
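The six reading frames and the 100-amino-acid threshold described above can be illustrated with a short scan. This is only a minimal sketch, not the procedure used to build AlterORF: the function names are hypothetical, ATG is assumed to be the only start codon, and frames are numbered relative to the start of the input sequence (+1 to +3 on the given strand, -1 to -3 on its reverse complement), which may differ from the gene-relative frame numbering used in the text.

```python
# Minimal six-frame ORF scan (illustrative sketch; names are hypothetical).
# Assumptions: upper/lower-case ACGT input, ATG as the only start codon,
# standard stop codons, frames numbered relative to the input sequence.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, offset, min_codons):
    """Yield (start, end) for ATG...stop ORFs in one frame of `seq`.

    Coordinates are 0-based on `seq`; `end` is exclusive and includes
    the stop codon. `min_codons` counts coding codons (stop excluded).
    """
    start = None
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == START_CODON:
            start = i  # first in-frame start codon opens the ORF
        elif start is not None and codon in STOP_CODONS:
            if (i - start) // 3 >= min_codons:
                yield start, i + 3
            start = None  # look for the next ORF in this frame


def six_frame_orfs(seq, min_codons=100):
    """Return [(frame, start, end), ...] over all six reading frames.

    Minus-frame coordinates refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    hits = []
    for sign, strand_seq in ((1, seq), (-1, reverse_complement(seq))):
        for offset in range(3):
            for start, end in orfs_in_frame(strand_seq, offset, min_codons):
                hits.append((sign * (offset + 1), start, end))
    return hits
```

A scan of this kind only lists candidate ORFs by length. It says nothing about RBS position, codon usage or protein features, which are the properties the text relies on to separate real genes from alternate ORFs.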
The database holds all alternate ORFs in fully sequenced microbial genomes that have significant hits to one or more of the protein features described in CDD, COG, KOG, PFAM, PRK and SMART, and for which the corresponding annotated genes deposited in the databases have no such hits. In such instances, it is suggested that the alternate ORF rather than the annotated gene may be the real gene (example). It is hoped that the AlterORF database will provide a platform where such potential errors can be reviewed by expert curators to determine whether the existing annotation should be revised. If the database is successful, these potential mis-annotations should be corrected, and any such changes will be tracked in future updates as a metric of the success of AlterORF.

Using AlterORF as a tool to find new genes. Some of the alternate ORFs that have significant protein features (CDD, COG, etc.) are associated with annotated genes that also have significant protein features, leaving it unclear which is the real gene. Some of these instances are potentially dual-function genes in which both ORFs may be expressed (e.g. ref). AlterORF identifies these possibilities for further computational and experimental investigation.

Potential role of alternate ORFs in the generation of new proteins. A particularly exciting direction that can be explored using AlterORF is the search for genes that may have arisen by the capture of alternate ORFs. This could occur if an alternate ORF gains signals for its transcription and translation. It could also occur by recombination or transposition that places an alternate ORF, or a portion of one, inside an existing gene (figure).
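The two screening cases distinguished above (an alternate ORF with protein-feature hits whose annotated gene has none, versus a pair in which both have hits) can be summarised as a small decision function. This is a sketch under stated assumptions: the function name and category labels are hypothetical, and the hits are assumed to have been computed and filtered for significance beforehand, since the text does not specify the thresholds used.

```python
# Illustrative triage of an annotated gene and its overlapping alternate ORF.
# Assumption: each argument is the set of feature databases (e.g. "PFAM")
# in which that ORF's translated product has a significant hit; how
# significance is decided is not covered here.
FEATURE_DATABASES = {"CDD", "COG", "KOG", "PFAM", "PRK", "SMART"}


def triage_orf_pair(annotated_hits, alternate_hits):
    """Classify a gene/alternate-ORF pair by where feature hits occur."""
    annotated = set(annotated_hits) & FEATURE_DATABASES
    alternate = set(alternate_hits) & FEATURE_DATABASES
    if alternate and not annotated:
        # Only the alternate ORF has support: candidate annotation error.
        return "possible_misannotation"
    if alternate and annotated:
        # Both have support: unclear which is real, or both may be expressed.
        return "ambiguous_or_dual_function"
    # No feature support for the alternate ORF: nothing to flag.
    return "not_flagged"
```

For example, `triage_orf_pair(set(), {"PFAM", "COG"})` returns `"possible_misannotation"`, the case that the text proposes for review by expert curators.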
Such an insertion would allow the generation of new folds or domains within an existing protein and would leave a molecular fossil of the original gene that travelled with the alternate ORF.

An inspection of all the proteins that can be generated computationally from alternate ORFs shows that frame -1 alternate ORFs have amino acid compositions and predicted α-helix contents similar to those of real genes, and so might be expected to fold correctly and escape degradation by proteasomes. Therefore, if they were expressed as outlined above, they might survive long enough to be subjected to natural selection. In contrast, the other four alternate frames (+2, +3, -2 and -3) potentially generate proteins with unusual amino acid compositions and α-helix contents that might promote incorrect folding and make them targets for proteolytic destruction by proteasomes.

It is widely accepted that novel genetic information can be generated by gene duplication followed by divergence of the copies via mutation. However, the original sequence constrains the ways in which the copies can subsequently mutate and restricts the evolutionary space that can be explored. In contrast, “captured” alternate ORFs represent a novel source of genetic information that has not previously been subjected to direct selection at the amino acid level, although, of course, they are linked in sequence to the real gene that is subjected to such pressures. Therefore, alternate ORFs can be considered a reservoir of novel genetic information that may play an important role in gene evolution, and the AlterORF database is a useful repository of potential examples of such events.

CC - [Gustavo R. 2008] - [ Last update 5/03/2009 ]