Methods for Computational Gene Prediction


Slides: PDF / PowerPoint

IMPORTANT: You will need some extra fonts if you download the PowerPoint slides.

Overview of gene prediction
HMM's (part I)
HMM's (part II)
Feature Sensing
Comparative gene prediction
SCFG's for noncoding gene prediction
Machine learning
Conditional Random Fields (CRF's) for gene prediction
Multiple Sequence Alignment
Higher-order PhyloHMM's

coming soon


Additional exercises

[coming soon]

click here to suggest additional exercises

Data sets

Synthetic data:

G. simplicans data from chapter 5:

NOTE TO INSTRUCTORS: You can generate your own synthetic data using this script. It generates separate training and test sets that share the same codon frequencies, signal weight matrices, and GC% (these biases are randomly generated anew at each run of the program). Exon, intron, and intergenic length distributions will be similar to those of the data sets used in the book (G. simplicans, above).
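
For readers who want to see the idea without downloading the script, the following is a minimal Python sketch of the same principle: draw one set of random biases, then generate a training set and a test set from those shared biases. It is not the script linked above and does not reproduce its behavior. It assumes single-exon genes only, omits signal weight matrices and introns, and uses arbitrary uniform length ranges rather than the book's length distributions. All function names and parameter values here are hypothetical.

```python
import random

BASES = "ACGT"
STOP_CODONS = {"TAA", "TAG", "TGA"}
# The 61 codons that do not terminate translation.
SENSE_CODONS = [a + b + c for a in BASES for b in BASES for c in BASES
                if a + b + c not in STOP_CODONS]

def random_biases(rng):
    """Draw one set of biases: codon frequencies and a noncoding GC fraction."""
    weights = [rng.random() for _ in SENSE_CODONS]
    total = sum(weights)
    codon_freqs = [w / total for w in weights]
    gc = rng.uniform(0.3, 0.7)  # assumed range, chosen only for illustration
    return codon_freqs, gc

def random_noncoding(rng, gc, length):
    """Emit G or C with probability gc, otherwise A or T."""
    return "".join(rng.choice("GC") if rng.random() < gc else rng.choice("AT")
                   for _ in range(length))

def random_cds(rng, codon_freqs, n_codons):
    """Start codon, then sense codons drawn by frequency, then a stop codon."""
    body = rng.choices(SENSE_CODONS, weights=codon_freqs, k=n_codons)
    return "ATG" + "".join(body) + rng.choice(sorted(STOP_CODONS))

def make_dataset(rng, biases, n_seqs):
    """Return (sequence, cds_start, cds_end) tuples; coordinates are 1-based inclusive."""
    codon_freqs, gc = biases
    data = []
    for _ in range(n_seqs):
        left = random_noncoding(rng, gc, rng.randint(50, 200))
        cds = random_cds(rng, codon_freqs, rng.randint(30, 100))
        right = random_noncoding(rng, gc, rng.randint(50, 200))
        data.append((left + cds + right, len(left) + 1, len(left) + len(cds)))
    return data

rng = random.Random(0)            # fixed seed so the run is repeatable
biases = random_biases(rng)       # drawn once, shared by both sets
train = make_dataset(rng, biases, 5)
test = make_dataset(rng, biases, 5)
```

Because the biases are drawn once and passed to both calls, a predictor trained on `train` sees the same codon usage and GC content that it will encounter in `test`, while the sequences themselves differ.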

Real data:

FASTA and GFF files from various organisms (human, mouse, mosquito, rice, and others) can be found here.