Verónica Becher

Efficient computation of all perfect repeats in genomic sequences of up to half a gigabyte, with a case study on the human genome

Bioinformatics 25 (14), 1746-1753, 2009
Citations: 38

Author(s)

Verónica Becher, Alejandro Deymonnaz, and Pablo Heiber

Abstract

Motivation: There is significant ongoing research to identify the number and types of repetitive DNA sequences. As more genomes are sequenced, efficiency and scalability in computational tools become mandatory. Existing tools fail to find distant repeats because they cannot accommodate whole chromosomes, only segments. Also, a quantitative framework for repetitive elements inside a genome or across genomes is still missing. Results: We present a new efficient algorithm and its implementation as a software tool to compute all perfect repeats in inputs of up to 500 million nucleotide bases, possibly containing many genomes. Our algorithm is based on a suffix array construction and a novel procedure to extract all perfect repeats in the entire input, which can be arbitrarily distant, with no bound on the repeat length. We tested the software on the Homo sapiens DNA genome NCBI 36.49 …
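
The general idea of finding exact repeats through a suffix array can be illustrated with a minimal sketch. This is not the authors' algorithm or tool: it uses a naive quadratic construction suited only to toy inputs, it does not filter for maximality, and the function names (suffix_array, common_prefix_len, perfect_repeats) and the min_len parameter are hypothetical, chosen here for illustration.

```python
def suffix_array(text):
    # Naive construction (assumption: toy inputs only): sort the start
    # positions of all suffixes by the suffix text itself.
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_len(text, i, j):
    # Length of the longest common prefix of the suffixes starting at i and j.
    n = len(text)
    k = 0
    while i + k < n and j + k < n and text[i + k] == text[j + k]:
        k += 1
    return k


def perfect_repeats(text, min_len=2):
    # Suffixes that share a prefix are adjacent in the suffix array, so any
    # common prefix of two neighbours is a substring occurring at least twice.
    sa = suffix_array(text)
    repeats = {}
    for a, b in zip(sa, sa[1:]):
        k = common_prefix_len(text, a, b)
        if k >= min_len:
            word = text[a:a + k]
            if word not in repeats:
                # Collect every start position of the repeated substring.
                repeats[word] = sorted(p for p in sa if text.startswith(word, p))
    return repeats


if __name__ == "__main__":
    print(perfect_repeats("ACGTACGT", min_len=3))
```

Because matching suffixes sit next to each other in sorted order, occurrences are found regardless of how far apart they are in the input, which is the property the abstract highlights. Inputs at genome scale would need a linear-time or otherwise space-efficient suffix array construction in place of the naive sort used here.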
