

Universidad de los Andes, Bogotá, Colombia, September 2015. Sequence and annotation of genomes and metagenomes with Galaxy: Mapping reads. Dr. rer. nat.




1 Sequence and annotation of genomes and metagenomes with Galaxy: Mapping reads. Dr. rer. nat. Diego Mauricio Riaño Pachón, Brazilian Bioethanol Science and Technology Laboratory (CTBE), Brazilian Center for Research in Energy and Materials (CNPEM). diego.riano@bioetanol.org.br, http://bce.bioetanol.cnpem.br

2 Mapping/aligning reads to a reference: what for? Examples: RNA-Seq; ChIP-Seq (peaks).

3 Mapping/aligning reads to a reference. Langmead et al., 2009. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2690996/ Why not use BLAST? You all know that one, right? It is a widely used, ubiquitous alignment tool, but it was designed for longer sequences. You could use it, but it is too slow: you would need either a very large computational infrastructure or plenty of time to spare waiting for results.

4 Many short-read mappers available, ca. 2012. Fonseca et al., 2012. http://bioinformatics.oxfordjournals.org/content/28/24/3169.full.pdf+html

5 Mapping/aligning reads to a reference: speed, ca. 2009. The alternative is to hold an index in memory (RAM) that allows fast searches. There are several ways to build such an index, and some are more efficient than others, i.e., they need less memory or less time to do the same job.
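One simple kind of in-memory index is a hash table of k-mers ("seeds") taken from the reference. The sketch below is a hypothetical illustration (the names build_kmer_index and exact_hits are made up for this example): it looks up the read's first k-mer and then verifies the whole read at each candidate position. It handles exact matches only and is not how Bowtie works.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict:
    """Map every k-mer of the reference to the 0-based positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def exact_hits(read: str, reference: str, index: dict, k: int) -> list:
    """Look up the read's first k-mer (the seed), then verify the whole read at each candidate."""
    if len(read) < k:
        raise ValueError("read is shorter than the seed length k")
    candidates = index.get(read[:k], [])
    return [pos for pos in candidates if reference[pos:pos + len(read)] == read]


ref = "ACGTACGTTAGCACGT"  # made-up toy reference
idx = build_kmer_index(ref, k=4)
print(idx["ACGT"])
print(exact_hits("ACGTT", ref, idx, k=4))
```

Storing every position of every k-mer makes such tables memory-hungry for large genomes, which is one motivation for the compressed, BWT-based index used by Bowtie.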

6 Mapping/aligning reads to a reference: speed, ca. 2009. Langmead et al., 2009. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2690996/

7 Bowtie: how is that achieved? Trapnell & Salzberg, 2009. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2836519/pdf/nihms182376.pdf Bowtie represents the target (genome) using the Burrows-Wheeler transform, a technique borrowed from file compression (bzip), and can therefore store an index of a complete human genome in less than 2 GB of RAM. (The figure is a toy representation; the actual implementation uses additional data structures to make it more efficient.) http://www.cs.jhu.edu/~langmea/resources/bwt_fm.pdf This strategy can tolerate a few mismatches without losing too much efficiency. Why is it important to allow mismatches?
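A minimal sketch of how a BWT-based index answers an exact-match query: FM-index backward search, which reads the pattern from right to left and narrows a range of rows of the sorted rotation matrix. Everything here is a toy under stated assumptions: the suffix array is built naively, '$' is assumed absent from the text and to sort before its letters, and only exact matches are found. Bowtie additionally uses a quality-aware backtracking search to allow mismatches, plus compact data structures, none of which is shown. The function names are hypothetical.

```python
def build_fm_index(text: str):
    """Toy FM index: suffix array, BWT, C table and Occ table (naive construction)."""
    s = text + "$"  # assumes '$' does not occur in text and sorts before its letters
    sa = sorted(range(len(s)), key=lambda i: s[i:])  # naive suffix array
    bwt = "".join(s[i - 1] for i in sa)  # char preceding each suffix; s[-1] == '$' when i == 0
    alphabet = sorted(set(s))
    # C[ch]: how many characters of s are lexicographically smaller than ch
    C, total = {}, 0
    for ch in alphabet:
        C[ch] = total
        total += s.count(ch)
    # occ[ch][i]: how many times ch occurs in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, bwt, C, occ


def backward_search(pattern: str, fm) -> list:
    """Exact matching by backward search; returns sorted 0-based start positions."""
    sa, bwt, C, occ = fm
    top, bottom = 0, len(bwt)  # half-open range [top, bottom) of BW-matrix rows
    for ch in reversed(pattern):  # extend the matched suffix one character to the left
        if ch not in C:
            return []
        top = C[ch] + occ[ch][top]
        bottom = C[ch] + occ[ch][bottom]
        if top >= bottom:
            return []
    return sorted(sa[top:bottom])


fm = build_fm_index("acaacg")
print(fm[1])                       # gc$aaac
print(backward_search("ac", fm))   # [0, 3]
print(backward_search("aac", fm))  # [2]
```

Each step costs two table lookups regardless of genome size, which is why the search is fast once the index is in memory.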

8 Bowtie: Burrows-Wheeler transform. Langmead et al., 2009. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2690996/
1. Genome = acaacg
2. Add a terminator character: acaacg$
3. Make all cyclic rotations: acaacg$, caacg$a, aacg$ac, acg$aca, cg$acaa, g$acaac, $acaacg
4. Sort them lexicographically ($ goes first) to obtain the BW matrix: $acaacg, aacg$ac, acaacg$, acg$aca, caacg$a, cg$acaa, g$acaac
5. The Burrows-Wheeler transform is the rightmost column of the BW matrix: gc$aaac
Keep the first and last columns of the BW matrix. The original string can be recovered from the BW transform, so the process is reversible. http://www.cs.jhu.edu/~langmea/resources/bwt_fm.pdf
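The steps above can be sketched directly in Python. This is a naive illustration (it materializes every rotation, so it is only suitable for toy strings) and assumes '$' appears nowhere in the text and sorts before all of its characters. The inversion uses the last-to-first (LF) mapping: the i-th occurrence of a character in the last column is the same text position as its i-th occurrence in the first column. The function names are hypothetical.

```python
def bwt(text: str, terminator: str = "$") -> str:
    """Naive BWT: sort all cyclic rotations of text + terminator and keep the last column."""
    s = text + terminator
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(row[-1] for row in rotations)


def inverse_bwt(last: str) -> str:
    """Recover the original text from the last column using the LF mapping."""
    counts, ranks = {}, []
    for ch in last:
        ranks.append(counts.get(ch, 0))  # how many earlier copies of ch precede this one
        counts[ch] = counts.get(ch, 0) + 1
    # first_row[ch]: row where the block of ch begins in the (sorted) first column
    first_row, total = {}, 0
    for ch in sorted(counts):
        first_row[ch] = total
        total += counts[ch]
    row, out = 0, []  # row 0 starts with the terminator, so last[0] is the final text char
    for _ in range(len(last) - 1):
        ch = last[row]
        out.append(ch)
        row = first_row[ch] + ranks[row]  # LF mapping: jump to the row starting with ch
    return "".join(reversed(out))


print(bwt("acaacg"))           # gc$aaac
print(inverse_bwt("gc$aaac"))  # acaacg
```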

9 The problem with spliced reads. Trapnell & Salzberg, 2009. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2836519/pdf/nihms182376.pdf Kim et al., 2013. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4053844/pdf/gb-2013-14-4-r36.pdf

10 The problem with spliced reads. Trapnell & Salzberg, 2009. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2836519/pdf/nihms182376.pdf Kim et al., 2013. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4053844/pdf/gb-2013-14-4-r36.pdf
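A toy illustration of the problem: an RNA-Seq read that spans an exon-exon junction does not occur contiguously in the genome, because the intron separates its two halves. The brute-force sketch below (hypothetical names, made-up sequences) first tries a contiguous exact match and, failing that, splits the read into two anchors of at least min_anchor bases and searches for the right anchor downstream of the left one. It only conveys the idea; it is not the algorithm used by TopHat or HISAT.

```python
def split_read_alignment(read: str, genome: str, min_anchor: int = 4):
    """Return ("contiguous", pos), ("spliced", left_start, left_end, right_start) or None."""
    # 1. Ordinary unspliced exact match.
    pos = genome.find(read)
    if pos != -1:
        return ("contiguous", pos)
    # 2. Split the read into two anchors and look for the right anchor downstream.
    for cut in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:cut], read[cut:]
        i = genome.find(left)
        while i != -1:
            j = genome.find(right, i + cut)
            if j != -1:
                # genome[i + cut:j] is the candidate intron
                return ("spliced", i, i + cut, j)
            i = genome.find(left, i + 1)
    return None


# Made-up gene: exon 1, an intron starting with GT and ending with AG, exon 2.
exon1, intron, exon2 = "ATGGCCATTC", "GTAAGTTTTTCAG", "CTGAAACCC"
genome = exon1 + intron + exon2
read = "CCATTC" + "CTGAAA"  # end of exon 1 joined to the start of exon 2
hit = split_read_alignment(read, genome)
print(hit, genome[hit[2]:hit[3]])
```

Choosing the first split that works is a simplification: when the bases flanking the junction are ambiguous, several split points fit equally well, and real spliced aligners score candidates (e.g., by splice-site motifs) instead.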

11 SAM format: the header records the sort order, the reference sequences and their lengths, and program info.
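A minimal sketch of reading those three pieces of information from the header lines: @HD (sort order in the SO tag), @SQ (reference name SN and length LN) and @PG (program ID). The function name and the example header below are hypothetical; a real pipeline would normally use a dedicated library such as pysam rather than hand parsing.

```python
def parse_sam_header(lines):
    """Extract sort order (@HD SO), reference lengths (@SQ SN/LN) and program IDs (@PG ID)."""
    header = {"sort_order": None, "references": {}, "programs": []}
    for line in lines:
        if not line.startswith("@"):
            break  # the header precedes the alignments; stop at the first alignment line
        record_type, *fields = line.rstrip("\n").split("\t")
        tags = dict(f.split(":", 1) for f in fields if ":" in f)  # TAG:VALUE pairs
        if record_type == "@HD":
            header["sort_order"] = tags.get("SO")
        elif record_type == "@SQ":
            header["references"][tags["SN"]] = int(tags["LN"])
        elif record_type == "@PG":
            header["programs"].append(tags.get("ID"))
    return header


example = [  # made-up header; contig names and lengths are hypothetical
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:contig_1\tLN:5000",
    "@SQ\tSN:contig_2\tLN:1200",
    "@PG\tID:bowtie2\tPN:bowtie2",
    "read_001\t0\tcontig_1\t100\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
]
print(parse_sam_header(example))
```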

12 SAM format: fields. https://samtools.github.io/hts-specs/SAMv1.pdf
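Each alignment line has 11 mandatory tab-separated fields, in this order: QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ and QUAL, optionally followed by TAG:TYPE:VALUE fields. The sketch below (hypothetical function name, made-up example line) maps them to a dictionary and converts the integer fields; note that POS and PNEXT are 1-based.

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_sam_alignment(line: str) -> dict:
    """Split one SAM alignment line into its 11 mandatory fields plus optional tags."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(SAM_FIELDS):
        raise ValueError("a SAM alignment line needs at least 11 tab-separated fields")
    record = {name: int(value) if name in INT_FIELDS else value
              for name, value in zip(SAM_FIELDS, cols)}
    record["TAGS"] = cols[len(SAM_FIELDS):]  # optional TAG:TYPE:VALUE fields, kept as strings
    return record


# Made-up paired-end record: mate starts at 300, read is 8 bp, "=" means same reference.
line = "read_001\t99\tcontig_1\t100\t42\t8M\t=\t300\t208\tACGTACGT\tIIIIIIII\tNM:i:0"
rec = parse_sam_alignment(line)
print(rec["RNAME"], rec["POS"], rec["CIGAR"], rec["TAGS"])
```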

13 SAM format: flags. https://samtools.github.io/hts-specs/SAMv1.pdf

14 SAM format: flags. What is the meaning of the following flags: 65, 129, 81, 161, 77? You can use: https://broadinstitute.github.io/picard/explain-flags.html
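The FLAG field is a bit field: each power of two marks one property, and the value is the sum of the bits that are set. A small decoder, similar in spirit to the Picard page, can be written by testing each bit with a bitwise AND. The bit values follow the SAM specification; the short descriptions and the function name are paraphrased/hypothetical.

```python
SAM_FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def explain_flag(flag: int) -> list:
    """Return the description of every bit set in a SAM FLAG value."""
    return [meaning for bit, meaning in SAM_FLAG_BITS if flag & bit]


for flag in (65, 129, 81, 161, 77):
    print(flag, "=", " + ".join(explain_flag(flag)))
```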

15 Sequence and annotation of genomes and metagenomes with Galaxy: Digital normalization. Dr. rer. nat. Diego Mauricio Riaño Pachón, Brazilian Bioethanol Science and Technology Laboratory (CTBE), Brazilian Center for Research in Energy and Materials (CNPEM). diego.riano@bioetanol.org.br, http://bce.bioetanol.cnpem.br

16 Too much data?
1. Imagine that you have too much data for your assembly. This could be too much data for a single isolate or, more difficult, for a community whose organisms have different abundances.
2. This can create some problems. Could you enumerate a few?
3. How can uninformative/extra data be removed?

17 Digital normalization basics: errors create new k-mers. Each single-base error generates ~k new k-mers. Generally, erroneous k-mers appear only once. From Prof. Titus Brown's presentation: http://www.slideshare.net/c.titus.brown/2013-hmpassemblywebinar
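Why ~k: a substitution at one position changes every k-mer that overlaps it, and up to k k-mers overlap any position. The count is "about" k because fewer k-mers overlap positions near the read ends, and a changed k-mer may by chance already exist elsewhere. The toy check below uses a made-up read and a hypothetical helper name.

```python
def kmer_set(seq: str, k: int) -> set:
    """All distinct k-mers of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


k = 4
true_read = "GATTACAGGC"                           # made-up error-free read
error_read = true_read[:4] + "T" + true_read[5:]   # single substitution A->T at index 4
new_kmers = kmer_set(error_read, k) - kmer_set(true_read, k)
print(sorted(new_kmers))  # ['ATTT', 'TCAG', 'TTCA', 'TTTC']: exactly k new k-mers here
```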

18 K-mer abundance plots have true and false k-mers. From Prof. Titus Brown's presentation: http://www.slideshare.net/c.titus.brown/2013-hmpassemblywebinar

19 K-mer abundance plots. From Prof. Titus Brown's presentation: http://www.slideshare.net/c.titus.brown/2013-hmpassemblywebinar
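A k-mer abundance plot is a histogram: for each abundance (how many times a k-mer was seen), how many distinct k-mers have that abundance. The sketch below (hypothetical names, made-up reads) builds one from exact counts. With five error-free copies of a read plus one copy carrying a single error, the erroneous k-mers form the peak at abundance 1, separated from the true k-mers at high abundance.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def abundance_histogram(counts):
    """Map abundance -> number of distinct k-mers observed with that abundance."""
    return dict(sorted(Counter(counts.values()).items()))


# Toy data: five error-free copies of one read plus one copy with a single substitution.
reads = ["GATTACAGGC"] * 5 + ["GATTTCAGGC"]
print(abundance_histogram(kmer_counts(reads, k=4)))  # {1: 4, 5: 4, 6: 3}
```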

20 What does digital normalization do? From Prof. Titus Brown's presentation: http://www.slideshare.net/c.titus.brown/2013-hmpassemblywebinar

21 Digital normalization: the procedure. If the next read comes from a highly covered region, discard it. From Prof. Titus Brown's presentation: http://www.slideshare.net/c.titus.brown/2013-hmpassemblywebinar

22 Digital normalization: the procedure. From Prof. Titus Brown's presentation: http://www.slideshare.net/c.titus.brown/2013-hmpassemblywebinar
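A minimal sketch of the procedure, assuming coverage is estimated the way Brown and colleagues describe for digital normalization: as the median abundance, among reads kept so far, of the read's k-mers. Reads are streamed once; a read is kept, and its k-mers counted, only if that median is below a cutoff. Counts here live in an exact Counter for clarity, whereas tools such as khmer use memory-efficient probabilistic counters. The function name and parameter values are illustrative.

```python
from collections import Counter
from statistics import median


def digital_normalization(reads, k=20, cutoff=20):
    """Keep a read only if the median abundance of its k-mers, among kept reads, is below cutoff."""
    counts = Counter()
    kept = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if not kmers:
            continue  # reads shorter than k are dropped in this sketch
        if median(counts[km] for km in kmers) < cutoff:
            kept.append(read)
            counts.update(kmers)  # only kept reads contribute to the coverage estimate
    return kept


# Toy data: a highly covered read repeated 10 times, then a read from a new region.
reads = ["GATTACAGGC"] * 10 + ["CCCTTTGGGA"]
print(digital_normalization(reads, k=4, cutoff=3))
```

Once a region reaches the cutoff, further reads from it are discarded, while reads from low-coverage regions are still kept; errors in discarded reads never enter the counts.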




