
Repeat Identification and Annotation

We have created a pipeline to identify and annotate DNA repeats in mammalian genomes using two pre-existing tools, PALS/PILER and RepeatScout, neither of which had previously been applied to an entire mammalian genome.

The pipeline breaks the genome into manageable chunks so that PALS can be run in parallel on a computer cluster. The PALS results for the chunks are then concatenated at the chromosome level and used as input for PILER, which generates clustered consensus sequences for the repeats on each chromosome. RepeatScout is run on individual chromosomes and its output is converted to make it compatible with the PILER output. To identify redundancy across chromosomes, the PILER consensus sequences and the RepeatScout output are aligned to each other using WUBLAST. Redundancy is then minimized by clustering the consensus sequences together with the RepeatScout output on the basis of the WUBLAST alignments, producing globally alignable, non-redundant consensus sequences. In this fashion we have identified many previously known repeats and a number of previously unknown repeats, present at both low and high copy number.
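The chunking and redundancy-clustering steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline's actual code: the function names `chunk_sequence` and `cluster_by_hits`, the chunk size and overlap, and the sequence labels are all hypothetical. The choice of single-linkage clustering is also an assumption, since the clustering rule is not specified here. The alignment hits are assumed to have been filtered already (for example by identity and coverage) before clustering.

```python
from collections import defaultdict


def chunk_sequence(seq, chunk_size, overlap=0):
    """Yield (start, chunk) pieces of seq; consecutive chunks share `overlap` bases."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    for start in range(0, len(seq), step):
        yield start, seq[start:start + chunk_size]
        # Stop once a chunk has reached the end of the sequence.
        if start + chunk_size >= len(seq):
            break


def cluster_by_hits(names, hits):
    """Group names into clusters linked by pairwise hits (single linkage).

    `hits` is an iterable of (name_a, name_b) pairs, assumed to be
    pre-filtered alignment hits between sequences listed in `names`.
    """
    parent = {name: name for name in names}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(list)
    for name in names:
        clusters[find(name)].append(name)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    # Toy sequence and hypothetical labels, for illustration only.
    print(list(chunk_sequence("ACGTACGTAC", chunk_size=4, overlap=1)))
    names = ["chr1_fam0", "chr2_fam3", "chr1_rs7", "chr3_fam1"]
    hits = [("chr1_fam0", "chr2_fam3"), ("chr2_fam3", "chr1_rs7")]
    print(cluster_by_hits(names, hits))
```

Each resulting cluster would then be reduced to one representative consensus sequence. Single linkage is the simplest rule to state, but it can chain loosely related sequences together, so a stricter criterion may be preferable in practice.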

[Figure: equine repeat correlation heat map]

By analysing interspersed repeat data we have found underlying correlations in repeat numbers and insertions in mammalian genomes. An example of this type of correlation analysis is shown in the figure.
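As a rough illustration of this kind of analysis, the sketch below computes pairwise Pearson correlations between repeat families, assuming that repeat counts have been tallied per fixed-size genomic bin. The binning scheme, the family labels, the counts, and the function names `pearson` and `correlation_matrix` are all hypothetical and are not taken from the actual study. The resulting matrix is the kind of data a correlation heat map would display.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(counts):
    """Return (families, matrix) for a dict mapping family -> per-bin counts."""
    families = sorted(counts)
    matrix = [[pearson(counts[a], counts[b]) for b in families] for a in families]
    return families, matrix


if __name__ == "__main__":
    # Made-up counts per genomic bin, for illustration only.
    toy_counts = {
        "L1": [1, 2, 3, 4],
        "SINE": [2, 4, 6, 8],
        "ERV": [4, 3, 2, 1],
    }
    families, matrix = correlation_matrix(toy_counts)
    for family, row in zip(families, matrix):
        print(family, [round(value, 2) for value in row])
```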

Honours research projects are available. Contact Prof. David Adelson if you are interested.

Centre for Bioinformatics and Computational Genetics
Address

Room 2.10, The Braggs Building
The University of Adelaide

SA 5005
AUSTRALIA
North Terrace Campus

Contact

Prof. David Adelson
Room 2.10, The Braggs Building
T: +61 8 8313 7555
F: +61 8 8313 4362