Universidad de los Andes, Bogotá, Colombia, September 2015
Sequence and annotation of genomes and metagenomes with Galaxy
Mapping reads
Dr. rer. nat. Diego Mauricio Riaño Pachón
Brazilian Bioethanol Science and Technology Laboratory (CTBE)
Brazilian Center for Research in Energy and Materials (CNPEM)
diego.riano@bioetanol.org.br
http://bce.bioetanol.cnpem.br

Mapping/Aligning reads to a reference: What for?
RNASeq
Peaks: ChIPSeq

Mapping/Aligning reads to a reference
Langmead et al., 2009. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2690996/
Why not use BLAST? You all know that one, right? It is a widely used, ubiquitous alignment tool, but it was designed for longer sequences. You could use it, but it is too slow: you either need a very large computational infrastructure or time to spare waiting for results.

Many short read mappers available, ca. 2012
Fonseca et al., 2012. http://bioinformatics.oxfordjournals.org/content/28/24/3169.full.pdf+html

Mapping/Aligning reads to a reference: Speed, ca. 2009
The alternative is to use an index held in memory (RAM) that allows fast searches. There are several ways to do this; some are more efficient than others, i.e., they need less memory or take less time to do the same job.
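To make the idea of an in-memory index concrete, here is a minimal sketch in Python. It uses a plain hash table of k-mers, which is only one of the possible index types and is not how Bowtie works. The function names, the toy reference and the seed length are assumptions made for illustration.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def find_exact(read, reference, index, k):
    """Return all 0-based reference positions where the read matches exactly.

    The first k bases of the read are used as a seed: the index gives the
    candidate positions, and each candidate is then verified in full.
    """
    if len(read) < k:
        return []
    hits = []
    for pos in index.get(read[:k], []):
        if reference.startswith(read, pos):
            hits.append(pos)
    return hits


# Toy reference (hypothetical, not taken from the slides)
reference = "acaacgacaacgtt"
index = build_kmer_index(reference, 3)
print(find_exact("caacg", reference, index, 3))  # [1, 7]
```

The index is built once and reused for every read, so each lookup costs a dictionary access plus a short verification instead of a scan of the whole reference. The price is memory: a table like this grows quickly with genome size, which is why more compact indexes are attractive.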

Mapping/Aligning reads to a reference: Speed, ca. 2009
Langmead et al., 2009. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2690996/

Bowtie: How is that achieved?
Trapnell & Salzberg, 2009. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2836519/pdf/nihms182376.pdf
Bowtie represents the target (genome) using the Burrows-Wheeler transform, a technique borrowed from file compression (bzip), and can thus store a complete human genome in less than 2 GB of RAM. (This is a toy representation, as the actual implementation uses further data structures to make it more efficient.) http://www.cs.jhu.edu/~langmea/resources/bwt_fm.pdf
This type of strategy can allow for up to a few mismatches without losing too much efficiency. Why is it important to allow mismatches?

Bowtie: Burrows-Wheeler Transform
Langmead et al., 2009. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2690996/
Genome = acaacg
Add terminator character: acaacg$
Make all cyclic rotations:
acaacg$
caacg$a
aacg$ac
acg$aca
cg$acaa
g$acaac
$acaacg
Lexicographic sort, $ goes first (BW matrix):
$acaacg
aacg$ac
acaacg$
acg$aca
caacg$a
cg$acaa
g$acaac
The Burrows-Wheeler Transform is the rightmost column: gc$aaac
Keep the first and last columns from the BW matrix. You can get your original string back from the BW transform, so this is a reversible process.
http://www.cs.jhu.edu/~langmea/resources/bwt_fm.pdf
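The steps on this slide can be reproduced with a few lines of Python. This is a naive sketch that builds and sorts every rotation explicitly, which is fine for a six-letter toy genome but is not how Bowtie builds its index. The function names are assumptions made for illustration.

```python
def bwt(text, terminator="$"):
    """Burrows-Wheeler transform built from the sorted cyclic rotations."""
    if terminator in text:
        raise ValueError("the terminator must not occur in the text")
    s = text + terminator
    # All cyclic rotations, sorted lexicographically ("$" sorts before letters)
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the rightmost column of the sorted matrix
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, terminator="$"):
    """Recover the original text by repeatedly prepending the last column and sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The row that ends with the terminator is the original text plus terminator
    original = next(row for row in table if row.endswith(terminator))
    return original[:-1]


print(bwt("acaacg"))           # gc$aaac
print(inverse_bwt("gc$aaac"))  # acaacg
```

The round trip shows that no information is lost, which is what allows an aligner to keep only the transformed text plus some auxiliary tables.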

The problem with spliced-reads
Trapnell & Salzberg, 2009. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2836519/pdf/nihms182376.pdf
Kim et al., 2013. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4053844/pdf/gb-2013-14-4-r36.pdf

SAM format
Header: sort order; reference sequences and their lengths; program info

SAM format
Fields: https://samtools.github.io/hts-specs/SAMv1.pdf
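As a small illustration of the fields, the sketch below splits one alignment line into the eleven mandatory, tab-separated columns defined in the SAM specification. The example line is invented for this sketch and does not come from a real data set; the function name is also an assumption.

```python
from collections import namedtuple

# The eleven mandatory SAM columns, in order
SamRecord = namedtuple(
    "SamRecord",
    "qname flag rname pos mapq cigar rnext pnext tlen seq qual",
)


def parse_sam_line(line):
    """Parse the mandatory columns of one SAM alignment line (optional tags are ignored)."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < 11:
        raise ValueError("a SAM alignment line needs at least 11 columns")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = columns[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq),
                     cigar, rnext, int(pnext), int(tlen), seq, qual)


# Hypothetical alignment line, made up for illustration
line = "read1\t99\tchr1\t100\t60\t10M\t=\t250\t160\tACGTACGTAC\tIIIIIIIIII"
record = parse_sam_line(line)
print(record.rname, record.pos, record.cigar)  # chr1 100 10M
```

Header lines start with "@" and would need to be skipped before calling this function. For real work a dedicated library is a safer choice than hand-written parsing.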

SAM format
Flags: https://samtools.github.io/hts-specs/SAMv1.pdf

SAM format
Flags: What is the meaning of the following flags?
65, 129, 81, 161, 77
You can use: https://broadinstitute.github.io/picard/explain-flags.html
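The flag is a sum of powers of two, so it can also be decoded by hand or with a few lines of Python. The sketch below uses the bit values from the SAM specification; the short labels are informal wording chosen for this example, and the function name is an assumption. A flag that is not part of the exercise is used so that the exercise is left for you.

```python
# Bit values from the SAM specification; labels are informal
SAM_FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def explain_flag(flag):
    """Return the labels of all bits that are set in a SAM flag."""
    return [label for bit, label in SAM_FLAG_BITS if flag & bit]


# 99 = 1 + 2 + 32 + 64
for label in explain_flag(99):
    print(label)
```

Calling explain_flag on each flag of the exercise lets you check your answers against the Picard page.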

Sequence and annotation of genomes and metagenomes with Galaxy
Digital normalization
Dr. rer. nat. Diego Mauricio Riaño Pachón

Too much data?
1. Imagine that you have too much data for your assembly. That could be too much data for a single isolate or, more difficult, for a community whose organisms differ in abundance.
2. This can create some problems. Could you enumerate a few?
3. How can we remove uninformative/extra data?

Digital normalization basics: Errors create new k-mers
From Prof. Titus Brown presentation: http://www.slideshare.net/c.titus.brown/2013-hmpassemblywebinar
Each single-base error generates ~k new k-mers.
Generally, erroneous k-mers appear only once.
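The claim that one wrong base creates about k new k-mers is easy to check on a toy sequence. In the sketch below, the sequence, the error position and the function names are all made up for illustration; k = 5 is an arbitrary choice.

```python
def kmers(seq, k):
    """Set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def new_kmers_from_error(seq, pos, base, k):
    """k-mers that appear only after substituting `base` at position `pos`."""
    mutated = seq[:pos] + base + seq[pos + 1:]
    return kmers(mutated, k) - kmers(seq, k)


# Hypothetical 30-base "true" sequence with one simulated sequencing error
true_seq = "ACGTTGCATGGCCATAGCTAAGCTTCGGAT"
novel = new_kmers_from_error(true_seq, pos=15, base="C", k=5)
print(len(novel))  # 5
```

Every k-mer that overlaps the wrong base changes, so an error far from the read ends touches k k-mers. An error near the end of a read touches fewer, and occasionally a changed k-mer coincides with one that already exists, which is why the slide says "~k" rather than exactly k.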

K-mer abundance plots have true and false k-mers
From Prof. Titus Brown presentation: http://www.slideshare.net/c.titus.brown/2013-hmpassemblywebinar

K-mer abundance plots
From Prof. Titus Brown presentation: http://www.slideshare.net/c.titus.brown/2013-hmpassemblywebinar

What does digital normalization do?
From Prof. Titus Brown presentation: http://www.slideshare.net/c.titus.brown/2013-hmpassemblywebinar

Digital normalization: The procedure
From Prof. Titus Brown presentation: http://www.slideshare.net/c.titus.brown/2013-hmpassemblywebinar
If the next read is from a highly covered region, discard it.
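The rule above can be sketched in a few lines of Python. This sketch assumes that the coverage of a read is estimated as the median abundance of its k-mers among the reads kept so far, and it uses an exact in-memory counter. Real tools use more memory-efficient counting structures. The function name, the toy reads, k and the cutoff are all assumptions made for illustration.

```python
from collections import Counter
from statistics import median


def digital_normalization(reads, k=4, cutoff=2):
    """Keep a read only if its estimated coverage is still below the cutoff."""
    counts = Counter()  # k-mer abundances among the reads kept so far
    kept = []
    for read in reads:
        read_kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if not read_kmers:
            continue  # read shorter than k: no coverage estimate, skipped here
        # Assumption: coverage is estimated as the median k-mer abundance
        if median(counts[kmer] for kmer in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
        # otherwise the read comes from an already well-covered region: discard
    return kept


# Toy data: one region sequenced five times, another region sequenced once
reads = ["ACGTACGGA"] * 5 + ["TTGCATGCC"]
print(digital_normalization(reads))
# ['ACGTACGGA', 'ACGTACGGA', 'TTGCATGCC']
```

Only the reads that are kept update the counts, so the highly covered region is capped at the cutoff while the low-coverage read is retained. Using the median makes the estimate robust to a few erroneous k-mers within a read.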
