Published by Emilia Montero Cabrera. Modified 9 years ago.
1
Universidad de los Andes, Bogotá, Colombia, September 2015
Sequence and annotation of genomes and metagenomes with Galaxy
Mapping reads
Dr. rer. nat. Diego Mauricio Riaño Pachón
Brazilian Bioethanol Science and Technology Laboratory (CTBE)
Brazilian Center for Research in Energy and Materials (CNPEM)
diego.riano@bioetanol.org.br
http://bce.bioetanol.cnpem.br
2
Mapping/Aligning reads to a reference: What for?
RNASeq
Peaks: ChIPSeq
3
Mapping/Aligning reads to a reference
Why not use BLAST? You all know that one, right? It is a widely used, ubiquitous tool for alignment, but it was designed for longer sequences. You could use it, but it is too slow: you either need a very large computational infrastructure or have time to spare waiting for results.
Langmead et al., 2009. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2690996/
4
Many short read mappers available, ca. 2012
Fonseca et al., 2012. http://bioinformatics.oxfordjournals.org/content/28/24/3169.full.pdf+html
5
Mapping/Aligning reads to a reference: Speed, ca. 2009
The alternative is to use an index in memory (RAM) that allows fast searches. There are several ways to do this; some are more efficient than others, i.e., they need less memory or take less time to do the same job.
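As a minimal sketch of the in-memory index idea, the snippet below builds a hash table from every k-mer of a reference to its start positions and uses it to seed exact matches. This is only one of the several possible approaches, chosen for simplicity; it is not how Bowtie works, and the toy reference and the names `build_kmer_index` and `find_exact` are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def find_exact(read, reference, index, k):
    """Return reference positions where the whole read matches exactly.

    The first k bases of the read are looked up in the index (the "seed"),
    and each candidate position is then verified against the full read.
    """
    seed = read[:k]
    hits = []
    for pos in index.get(seed, []):
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


# Toy reference (hypothetical): the example genome "acaacg" repeated twice.
reference = "acaacgacaacg"
index = build_kmer_index(reference, k=3)
print(find_exact("caacg", reference, index, k=3))
```

The index is built once and reused for every read, which is where the speed-up over scanning the reference per read comes from. The price is memory: a plain hash table like this one grows with the reference, which is the motivation for more compact indexes.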
6
Mapping/Aligning reads to a reference: Speed, ca. 2009
Langmead et al., 2009. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2690996/
7
Bowtie: How is that achieved?
Bowtie represents the target (genome) using the Burrows-Wheeler transform, a technique borrowed from file compression (bzip), and can thus store a complete human genome in less than 2 GB of RAM. (This is a toy representation, as the actual implementation uses further data structures to make it more efficient.) http://www.cs.jhu.edu/~langmea/resources/bwt_fm.pdf
This type of strategy can allow for up to a few mismatches without losing too much efficiency. Why is it important to allow mismatches?
Trapnell & Salzberg, 2009. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2836519/pdf/nihms182376.pdf
8
Bowtie: Burrows-Wheeler Transform
Genome = acaacg
Add terminator character: acaacg$
Make all cyclic rotations:
acaacg$
caacg$a
aacg$ac
acg$aca
cg$acaa
g$acaac
$acaacg
Lexicographic sort, $ goes first (BW matrix):
$acaacg
aacg$ac
acaacg$
acg$aca
caacg$a
cg$acaa
g$acaac
The Burrows-Wheeler Transform is the rightmost column: gc$aaac
Keep the first and last columns from the BW matrix. You can get your original string from the BW transform, so this is a reversible process.
Langmead et al., 2009. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2690996/
http://www.cs.jhu.edu/~langmea/resources/bwt_fm.pdf
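The construction on this slide (append a terminator, take all cyclic rotations, sort, read off the last column) can be sketched in a few lines. This is a naive illustration that materializes the whole BW matrix and inverts it by repeated sorting; it is not Bowtie's implementation, which uses further data structures. The function names `bwt` and `inverse_bwt` are hypothetical.

```python
def bwt(text, terminator="$"):
    """Burrows-Wheeler transform via the explicit matrix of sorted rotations."""
    text = text + terminator
    # All cyclic rotations of the terminated string.
    rotations = [text[i:] + text[:i] for i in range(len(text))]
    # Lexicographic sort; "$" sorts before the letters a, c, g, t in ASCII.
    rotations.sort()
    # The transform is the rightmost column of the sorted matrix.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, terminator="$"):
    """Recover the original string by repeatedly prepending the last column and sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The row that ends with the terminator is the original terminated string.
    original = next(row for row in table if row.endswith(terminator))
    return original[:-1]


transformed = bwt("acaacg")
print(transformed)               # gc$aaac
print(inverse_bwt(transformed))  # acaacg
```

Building every rotation costs memory quadratic in the genome length, so this only works for toy inputs; it is meant to make the reversibility concrete.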
9
The problem with spliced-reads
Trapnell & Salzberg, 2009. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2836519/pdf/nihms182376.pdf
Kim et al., 2013. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4053844/pdf/gb-2013-14-4-r36.pdf
10
The problem with spliced-reads
Trapnell & Salzberg, 2009. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2836519/pdf/nihms182376.pdf
Kim et al., 2013. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4053844/pdf/gb-2013-14-4-r36.pdf
11
SAM format
Sort order
Reference sequences and length
Program info
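In the SAM specification, these three items live in header lines that start with `@`: the sort order is the `SO` tag of the `@HD` line, each reference sequence has an `@SQ` line with its name (`SN`) and length (`LN`), and program information is in `@PG` lines. The sketch below pulls them out of a few header lines; the sample values (reference names, lengths, program version) and the function name `parse_sam_header` are made up for illustration.

```python
def parse_sam_header(lines):
    """Collect sort order, reference lengths, and program names from SAM header lines."""
    header = {"sort_order": None, "references": {}, "programs": []}
    for line in lines:
        if not line.startswith("@"):
            break  # header lines come first; alignment records follow
        fields = line.rstrip("\n").split("\t")
        record_type = fields[0]
        if record_type == "@CO":
            continue  # free-text comment line, no TAG:VALUE pairs
        tags = dict(field.split(":", 1) for field in fields[1:])
        if record_type == "@HD":
            header["sort_order"] = tags.get("SO")
        elif record_type == "@SQ":
            header["references"][tags["SN"]] = int(tags["LN"])
        elif record_type == "@PG":
            header["programs"].append(tags.get("PN", tags["ID"]))
    return header


# Hypothetical header followed by one alignment record.
example_lines = [
    "@HD\tVN:1.0\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:1000",
    "@SQ\tSN:chr2\tLN:500",
    "@PG\tID:bowtie2\tPN:bowtie2\tVN:2.2.6",
    "read1\t0\tchr1\t100\t42\t5M\t*\t0\t0\tACGTA\tIIIII",
]
print(parse_sam_header(example_lines))
```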
12
SAM format
Fields: https://samtools.github.io/hts-specs/SAMv1.pdf
13
SAM format
Flags: https://samtools.github.io/hts-specs/SAMv1.pdf
14
SAM format
Flags: What is the meaning of the following flags: 65, 129, 81, 161, 77?
You can use: https://broadinstitute.github.io/picard/explain-flags.html
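The FLAG field is a bitwise sum, so it can also be decoded with a few lines of code instead of the web tool. The sketch below tests each bit defined in the SAM specification; the short descriptions are paraphrased, and `explain_flag` is a hypothetical helper name.

```python
# Bit values as defined in the SAM specification; descriptions are paraphrased.
SAM_FLAGS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read on reverse strand"),
    (0x20, "mate on reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "secondary alignment"),
    (0x200, "fails quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def explain_flag(flag):
    """Return the descriptions of all bits that are set in a SAM FLAG value."""
    return [description for bit, description in SAM_FLAGS if flag & bit]


for flag in (65, 129, 81, 161, 77):
    print(flag, explain_flag(flag))
```

For example, 65 = 64 + 1, so the bits for "first in pair" and "read paired" are set.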
15
Sequence and annotation of genomes and metagenomes with Galaxy
Digital normalization
Dr. rer. nat. Diego Mauricio Riaño Pachón
16
Too much data?
1. Imagine that you have too much data for your assembly. That could be too much for a single isolate. Or, imagine a community with different organism abundances (more difficult).
2. This can create some problems; could you enumerate a few?
3. How to remove uninformative/extra data?
17
Digital normalization basics: Errors create new k-mers
Each single base error generates ~k new k-mers.
Generally, erroneous k-mers appear only once.
From Prof. Titus Brown's presentation: http://www.slideshare.net/c.titus.brown/2013-hmpassemblywebinar
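A small experiment makes both statements concrete. The sketch below counts k-mers over ten identical copies of a made-up read plus one copy carrying a single substitution in the middle; the sequence, k = 5, and the coverage of 10 are arbitrary choices for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


k = 5
true_read = "ACGTTGCATGGCCTAAGCTA"  # hypothetical error-free read
# Same read with a single substitution (G -> A) at position 10.
error_read = true_read[:10] + "A" + true_read[11:]

# Simulate 10x coverage of the true sequence plus one erroneous read.
counts = Counter()
for read in [true_read] * 10 + [error_read]:
    counts.update(kmers(read, k))

# K-mers that exist only because of the error.
novel = set(kmers(error_read, k)) - set(kmers(true_read, k))
print(len(novel))                             # number of new k-mers
print(sorted(counts[km] for km in novel))     # abundance of each new k-mer
```

Every k-mer overlapping the erroneous base is new, which gives k of them when the error is far enough from the read ends, and each one is seen exactly once. An error within k - 1 bases of a read end creates fewer, hence the "~k".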
18
K-mer abundance plots have true and false k-mers
From Prof. Titus Brown's presentation: http://www.slideshare.net/c.titus.brown/2013-hmpassemblywebinar
19
K-mer abundance plots
From Prof. Titus Brown's presentation: http://www.slideshare.net/c.titus.brown/2013-hmpassemblywebinar
20
What does digital normalization do?
From Prof. Titus Brown's presentation: http://www.slideshare.net/c.titus.brown/2013-hmpassemblywebinar
21
Digital normalization: The procedure
If the next read is from a highly covered region, discard it.
From Prof. Titus Brown's presentation: http://www.slideshare.net/c.titus.brown/2013-hmpassemblywebinar
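The rule on this slide can be sketched as a single streaming pass over the reads. The slide does not say how "highly covered" is decided; the sketch assumes the criterion from the published digital normalization method, where a read's coverage is estimated as the median abundance of its k-mers among the reads kept so far. An exact `Counter` stands in for the memory-efficient probabilistic counting a real tool would use, and the function name, `k`, and `cutoff` values are illustrative.

```python
from collections import Counter
from statistics import median


def digital_normalization(reads, k=20, cutoff=20):
    """Keep a read only if its estimated coverage is still below the cutoff.

    Coverage is estimated as the median count of the read's k-mers among
    the reads kept so far (an assumption, see the text above).
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if not read_kmers:
            continue  # read shorter than k, nothing to estimate from
        coverage = median(counts[kmer] for kmer in read_kmers)
        if coverage < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads contribute to the counts
        # otherwise the read comes from an already well-covered region: discard
    return kept


# Toy input: one abundant sequence seen 10 times, one rare sequence seen once.
reads = ["ACGTTGCATG"] * 10 + ["TTTTGGGGCC"]
kept = digital_normalization(reads, k=4, cutoff=3)
print(len(kept), kept)
```

In this toy run the abundant read is kept until its estimated coverage reaches the cutoff, while the rare read is always kept, which is the behaviour wanted for a community with uneven organism abundances.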
22
Digital normalization: The procedure
From Prof. Titus Brown's presentation: http://www.slideshare.net/c.titus.brown/2013-hmpassemblywebinar