Modeling and analysis of biological sequences

DNA modeling
A simplified view of DNA: sequences of characters drawn from the finite alphabet {A, C, G, T}.
These sequences are structured:
- Coding regions
- Exons separated by introns
- Short start or termination fragments
- Non-coding regions
21/11/2018

Modeling the regions
Each region has distinct statistical properties. If we can capture those properties in a suitable model, we can develop tests to decide whether or not a region is coding.
The modeling will be based on training data sets whose structure is known.
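
One way such a test could look, as a minimal sketch: assume the simplest possible model (independent bases) for each region type, estimate base frequencies from training data, and compare the two models with a log-likelihood ratio. The function names, the pseudocount, and the toy training sequences are invented for illustration and do not come from the slides.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def base_frequencies(sequences, pseudocount=1.0):
    """Estimate base frequencies from training sequences (with a pseudocount)."""
    counts = Counter()
    for seq in sequences:
        counts.update(b for b in seq.upper() if b in ALPHABET)
    total = sum(counts[b] for b in ALPHABET) + pseudocount * len(ALPHABET)
    return {b: (counts[b] + pseudocount) / total for b in ALPHABET}

def log_likelihood_ratio(seq, coding_freqs, noncoding_freqs):
    """Sum over bases of log(P_coding(b) / P_noncoding(b)).

    Positive values favour the coding model, negative values the
    non-coding model. Characters outside ACGT are skipped.
    """
    return sum(
        math.log(coding_freqs[b] / noncoding_freqs[b])
        for b in seq.upper()
        if b in ALPHABET
    )

# Toy training data, invented for illustration only (hypothetical).
coding_train = ["GCGGCCGCGGGC", "CCGGCGGCGCAG"]
noncoding_train = ["ATTATAAATTTA", "TTAATATAAAGT"]

coding = base_frequencies(coding_train)
noncoding = base_frequencies(noncoding_train)

score = log_likelihood_ratio("GCGCGGCC", coding, noncoding)
print(f"log-likelihood ratio: {score:.3f}")
```

With real data the decision threshold on the score would have to be chosen from the training sets; the sign is used here only to keep the sketch short.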

Model hierarchies
As usual, we start with simple models and move on to successively more complex ones:
- Independence
- Markov dependence
- Order 1
- Order > 1
- Non-homogeneous Markov chains
- Maximal dependence decomposition
- ...
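
As an illustration of the second rung of this hierarchy, the following sketch estimates a first-order Markov chain from training sequences and evaluates the log-probability of a new sequence. It assumes sequences contain only A, C, G, T and are non-empty; the names `fit_first_order` and `log_prob`, the pseudocount, and the uniform initial distribution are assumptions made for the example, not part of the original material.

```python
import math

ALPHABET = "ACGT"

def fit_first_order(sequences, pseudocount=1.0):
    """Estimate transition probabilities P(current base | previous base)."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        seq = seq.upper()
        # Count every adjacent pair (previous base, current base).
        for prev, cur in zip(seq, seq[1:]):
            if prev in counts and cur in counts[prev]:
                counts[prev][cur] += 1
    transitions = {}
    for a in ALPHABET:
        row_total = sum(counts[a].values())
        transitions[a] = {b: counts[a][b] / row_total for b in ALPHABET}
    return transitions

def log_prob(seq, transitions, initial=None):
    """Log-probability of a non-empty ACGT sequence under the chain."""
    seq = seq.upper()
    if initial is None:
        initial = {b: 0.25 for b in ALPHABET}  # assumed uniform start
    lp = math.log(initial[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        lp += math.log(transitions[prev][cur])
    return lp

# Toy training data, invented for illustration only (hypothetical).
train = ["ACGCGCGT", "CGCGCGAA"]
transitions = fit_first_order(train)

print(transitions["C"])
print(log_prob("CGCG", transitions), log_prob("CACA", transitions))
```

The independence model is the special case in which every row of the transition table is the same; higher-order chains condition on more than one previous base.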

Models for signals
Along DNA sequences there are short signals that indicate, for example:
- where an exon or intron begins or ends
- where an RNA polymerase binds
These signals can be used to detect where a gene begins or ends.

Example data
To build models of the signals we need example data. The site http://www-biology.ucsd.edu/labs/Kadonaga/DCPD.htm is a database of Drosophila promoters.
Transcription is centrally involved in an array of biological processes, which include growth, development, and response to external stimuli. In eukaryotes, protein-coding genes are transcribed by the RNA polymerase II transcriptional machinery, which comprises RNA polymerase II and other factors that are required for basal and regulated transcription. Transcription by RNA polymerase II is directed by cis-acting DNA sequences that typically consist of a core promoter along with regulatory elements, such as enhancers, that contain binding sites for sequence-specific transcriptional activator and/or repressor proteins. Thus, the study of both the trans-acting protein factors and the cis-acting DNA elements is necessary to gain a better understanding of the fundamental mechanisms by which genes are transcribed (for recent reviews, see Björklund and Kim 1996; Burley and Roeder 1996; Orphanides et al. 1996; Roeder 1996; Verrijzer and Tjian 1996; Ptashne and Gann 1997; Sauer and Tjian 1997; Smale 1997; Tansey and Herr 1997).
The key DNA element that is essential for transcription by RNA polymerase II is the core promoter: the DNA sequences, which encompass the transcription start site (within about -40 to +40 relative to the +1 start site) and are sufficient to direct the accurate initiation of transcription. Two important core promoter motifs are the TATA box and the initiator (Inr) (Fig. 1). The TATA box is an A/T-rich sequence that is located ~25-30 nucleotides upstream of the RNA start site of many, but not all, promoters. It is recognized by the TATA box-binding polypeptide (TBP), which is a component of the multisubunit TFIID complex.
The Inr encompasses the RNA start site, and like the TATA box, it is also present in many, but not all, core promoters (Smale and Baltimore 1989; Smale 1994, 1997). Inr elements have been characterized in various TATA-less and TATA-containing promoters, and the Inr consensus sequence is Py-Py-A+1-N-T/A-Py-Py (where A+1 is the transcription start site) in mammalian genes (Smale and Baltimore 1989; Bucher 1990; Javahery et al. 1994) and T-C-A+1-G/T-T-T/C in Drosophila genes (Hultmark et al. 1986; Purnell et al. 1994; Arkhipova 1995).
Figure 1. The TATA box, Inr, and DPE are core promoter elements. The consensus sequences and locations of the TATA box, Inr, and DPE motifs are indicated. The TATA box and DPE appear to be functionally redundant, and promoters generally do not contain both elements.
Many promoters contain functionally important sequences that are downstream of the transcription start site. Such downstream promoter sequences have been found in TATA-containing promoters (see, e.g., Lewis and Manley 1985; Nakatani et al. 1990; Lee et al. 1992; Emanuel and Gilmour 1993; Purnell and Gilmour 1993), as well as in TATA-less promoters (see, e.g., Biggin and Tjian 1988; Perkins et al. 1988; Soeller et al. 1988; Smale and Baltimore 1989; Jarrell and Meselson 1991; Contursi et al. 1995; Minchiotti et al. 1997). It appears that many of these downstream promoter sequences are involved in basal transcription, but it is also important to consider that some downstream promoter sequences might be binding sites for sequence-specific transcriptional activators.
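
A consensus such as the Drosophila Inr, T-C-A+1-G/T-T-T/C, can be searched for directly as a pattern. The sketch below translates it into a regular expression and reports every occurrence, including overlapping ones; the function name `find_inr_sites` and the example sequence are hypothetical and chosen only to exercise the pattern. An exact consensus match is a much cruder criterion than the probabilistic models discussed in these slides.

```python
import re

# Drosophila Inr consensus quoted in the text: T-C-A(+1)-G/T-T-T/C.
# The lookahead makes the match zero-width, so overlapping sites are found.
INR_DROSOPHILA = re.compile(r"(?=(TCA[GT]T[TC]))")

def find_inr_sites(seq):
    """Return (start, matched_text, index_of_A_plus_1) for every match.

    Indices are 0-based positions in seq; the A(+1) base is the third
    base of the motif, hence start + 2.
    """
    seq = seq.upper()
    return [
        (m.start(), m.group(1), m.start() + 2)
        for m in INR_DROSOPHILA.finditer(seq)
    ]

# Example sequence invented for illustration only (hypothetical).
example = "GGTATAAAAGCCGATCAGTTCGATCATTCAGTC"
sites = find_inr_sites(example)
for start, motif, a_plus_1 in sites:
    print(start, motif, a_plus_1)
```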

Basic statistical model
The basic statistical model for the members of a family of signals is
Pr(sequence of n bases | Member) = f(sequence of n bases)
The goal is to obtain a model for the members of the family, based on a training data set, that can be used to classify new sequences.

Weight matrices
A simple model would be that the nucleotide at each position within the signal is independent of the nucleotides at other positions. The model for the signal would be
$$f(b_1 \dots b_n) = \prod_{k=1}^{n} f_k(b_k),$$
where $f(\cdot)$ is the probability that a putative sequence, $b_1 \dots b_n$, could be generated by the signal family; $b_k$ is the base at position $k$ in the sequence; and $f_k(b)$ is the probability of finding base $b$ at position $k$. The product $\prod$ arises from the independence assumption.
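
The product formula can be sketched in a few lines: estimate f_k(b) from aligned training sites, then multiply the per-position probabilities for a candidate sequence. The pseudocount, the uniform background used for the log-odds score, the function names, and the toy training sites are all assumptions added for the example; the log-odds comparison in particular is one common way to use such a matrix for classification, not something stated on the slide.

```python
import math

ALPHABET = "ACGT"

def position_frequencies(aligned_sites, pseudocount=1.0):
    """Estimate f_k(b) for each position k from equal-length ACGT sites."""
    n = len(aligned_sites[0])
    if any(len(site) != n for site in aligned_sites):
        raise ValueError("all training sites must have the same length")
    matrix = []
    for k in range(n):
        counts = {b: pseudocount for b in ALPHABET}
        for site in aligned_sites:
            counts[site[k].upper()] += 1
        total = sum(counts.values())
        matrix.append({b: counts[b] / total for b in ALPHABET})
    return matrix

def signal_probability(seq, matrix):
    """f(b_1 ... b_n) = product over k of f_k(b_k)."""
    if len(seq) != len(matrix):
        raise ValueError("sequence length must match the matrix length")
    return math.prod(matrix[k][b] for k, b in enumerate(seq.upper()))

def log_odds(seq, matrix, background=None):
    """Sum over k of log(f_k(b_k) / background(b_k)); uniform background assumed."""
    if background is None:
        background = {b: 0.25 for b in ALPHABET}
    return sum(
        math.log(matrix[k][b] / background[b])
        for k, b in enumerate(seq.upper())
    )

# Toy aligned training sites, invented for illustration only (hypothetical).
sites = ["TATAAA", "TATAAA", "TATATA", "TATAAG"]
pwm = position_frequencies(sites)

print(signal_probability("TATAAA", pwm))
print(log_odds("TATAAA", pwm), log_odds("GCGCGC", pwm))
```

Working with sums of logarithms rather than raw products avoids numerical underflow when signals are long.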

NOTE
This topic is temporarily incomplete. It is based on Chapter 5, Sections 5.2 and 5.3, of the book Statistical Methods in Bioinformatics. Notes on the same material can be found in the Basic Signal Analysis notes from Steve Kachman's course.

References
Chapter 5: The Analysis of One DNA Sequence