Introduction to Bioinformatics (Introducción a la Bioinformática), 2002. Universidad Nacional San Cristóbal de Huamanga, Ayacucho. Mirko Zimic.



Topics of interest in bioinformatics
- Sequence analysis
- Phylogeny and molecular evolution
- Molecular modeling
- Protein folding
- Genomics and proteomics
- Statistical genetics
- Microarrays
- Scientific programming

Let's take an example: the cysteine protease of Fasciola hepatica. In search of an immunogenic peptide.

Alignment: mammalian cysteine proteases vs. the cysteine protease of Fasciola hepatica. Legend: identical amino acids; divergent amino acids.

Discontinuous epitope: formed by distant portions of the sequence. Denaturation: the epitope is lost upon denaturation.

Continuous epitope: formed by a single contiguous portion of the sequence. Denaturation: the epitope is preserved as such.

Three-dimensional modeling by homology. 56% sequence identity with chymopapain (PDB entry 1YAL).
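
A sequence identity figure like the one on this slide is computed from a pairwise alignment. The following is a minimal sketch, assuming two already-aligned strings of equal length with `-` marking gaps. The function name and the toy sequences are hypothetical, and dividing by the number of gap-free columns is only one of several conventions in use.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free alignment columns that hold identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy example with made-up sequences (not the real proteases):
# 5 identical residues out of 7 gap-free columns.
print(round(percent_identity("ACDE-FGH", "ACDQWFGK"), 1))  # 71.4
```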

Surface analysis: atoms shown by van der Waals radius. Legend: identical amino acids; divergent amino acids.

Selection of sequences that are (1) divergent, (2) solvent-accessible, and (3) continuous: TMEGQYMKNERTSISFS, YYTVQSGSEVELK, NLIGSE, QSQTCSPLRVN, RYNKQLGVAKV.

Evaluation of the conformational stability of the peptides by energy minimization. Figure labels: H2O; "backbone". Peptide: TMEGQYMKNERTSISFSYYTVQSGSEVELKNLIGSE

Let's take another example: sensitivity of the HIV-1 aspartyl protease to the most commonly used inhibitors

"Cartoon" representation of the HIV-1 protease enzyme

HIV PROTEASE MONOMER

HIV-1 protease enzyme showing the secondary-structure elements, the flaps, and the active site

HIV-1 protease enzyme indicating the consensus residues for inhibitor-enzyme binding

INDINAVIR

RITONAVIR

Binding of indinavir to the HIV-1 protease

Mutant HIV-1 protease modeled in complex with ritonavir

COMPARISON BETWEEN A RITONAVIR-SENSITIVE AND A RITONAVIR-RESISTANT ENZYME

One more example: phylogenetic ordering and GC content in trypanosomatids

Reported %GC variation for each codon position in Trypanosomatids (Alonso et al., 1992)
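
The per-position %GC shown on this slide can be tallied directly from a coding sequence. The following is a minimal sketch, assuming an in-frame coding sequence that starts at codon position 1. The function name and the toy sequence are hypothetical and are not taken from the cited study.

```python
def gc_by_codon_position(cds: str) -> list:
    """Return %GC at codon positions 1, 2 and 3 of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    gc_counts = [0, 0, 0]
    for i in range(usable):
        if cds[i] in "GC":
            gc_counts[i % 3] += 1  # i % 3 is the position within the codon
    codons = usable // 3
    if codons == 0:
        return [0.0, 0.0, 0.0]
    return [100.0 * n / codons for n in gc_counts]


# Toy sequence ATG GCC TTG AAC: every third position is G or C.
print(gc_by_codon_position("ATGGCCTTGAAC"))  # [25.0, 25.0, 100.0]
```

The third position usually shows the widest variation because most synonymous changes fall there.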

Codon usage in Trypanosomatids: leucine

Codon usage in Trypanosomatids: serine

Phylogeny of Trypanosomatid lineage (Maslov & Simpson)

“Hole” formation by DNA replication

GC content variation over time. Restriction: amino acid family conservation and amino acid conservation

%GC variation in Trypanosomatid lineage (Nuclear coding DNA)

I. Human Genome Project
The genome sequence is almost complete! Approximately 3.5 billion (3.5 × 10^9) base pairs.

All the Genes
- Any human gene can now be found in the genome by similarity searching with over 90% certainty.
- However, the sequence still has many gaps:
  - one is unlikely to find a complete and uninterrupted genomic segment for any gene
  - pseudogenes still cannot be identified with certainty
- This will improve as more sequence data accumulates.

Raw Genome Data:

The next step is obviously to locate all of the genes and describe their functions. This will probably take years!

A few years ago, Celera maintained that there would be only 30,000 genes
- So why are there 60,000 human genes on Affymetrix GeneChips?
- Why does GenBank have 49,000 gene coding sequences and UniGene have 89,000 clusters of unique ESTs?
Clearly we are in desperate need of a theoretical framework to go with all of this data.

Implications for Biomedicine
- Physicians will use genetic information to diagnose and treat disease.
  - Virtually all medical conditions (other than trauma) have a genetic component.
- Faster drug development research
  - Individualized drugs
  - Gene therapy
- All biologists will use gene sequence information in their daily work.

II. Bioinformatics Challenges
The huge dataset
- Lots of new sequences being added
  - automated sequencers
  - Human Genome Project
  - EST sequencing
- GenBank has over 10 billion bases and is doubling every year! (the problem of exponential growth)
- How can computers keep up?
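
To see why yearly doubling is a problem, project the database size forward with N(t) = N0 · 2^(t/T). The following is a minimal sketch that takes the slide's figures at face value (10 billion bases, doubling time of one year). The function name is hypothetical, and the constant doubling time is an assumption rather than a measured trend.

```python
def projected_bases(initial_bases: float, years: float,
                    doubling_time_years: float = 1.0) -> float:
    """Size after `years` of exponential growth with a fixed doubling time."""
    return initial_bases * 2 ** (years / doubling_time_years)


# Starting from 10 billion bases and doubling every year (the slide's claim).
for year in range(6):
    print(year, f"{projected_bases(10e9, year):.3g}")
```

Under these assumptions the dataset grows 32-fold in five years, so an analysis that scales linearly with database size slows down by the same factor unless hardware or algorithms improve.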

New Types of Biological Data
- Microarrays: gene expression
- Multi-level maps: genetic, physical, sequence, annotation
- Networks of protein-protein interactions
- Cross-species relationships
  - Homologous genes
  - Chromosome organization

Similarity Searching the Databanks
- What is similar to my sequence?
- Searching gets harder as the databases get bigger, and quality degrades
- Tools: BLAST and FASTA = time-saving heuristics (approximate)
- Statistics + informed judgement of the biologist

Alignment
- Alignment is the basis for finding similarity
- Pairwise alignment = dynamic programming
- Multiple alignment: protein families and functional domains
- Multiple alignment is "impossible" for lots of sequences
- Another heuristic: progressive pairwise alignment
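
Pairwise alignment by dynamic programming can be illustrated with a global alignment in the style of Needleman and Wunsch. The following is a minimal sketch, assuming a simple match/mismatch score and a linear gap penalty. The scoring values are arbitrary illustrative choices, not parameters from the slides, and real tools use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -2):
    """Global alignment; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the table: each cell is the best of diagonal, up, and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

The table has (len(a)+1) × (len(b)+1) cells, so time and memory grow with the product of the sequence lengths. Extending the same table to many sequences at once is what makes exact multiple alignment impractical and motivates the progressive heuristic.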

Sample Multiple Alignment

Structure-Function Relationships
- Can we predict the function of protein molecules from their sequence? sequence > structure > function
- Conserved functional domains = motifs
- Prediction of some simple 3-D structures (α-helix, β-sheet, membrane-spanning, etc.)

Protein domains

DNA Sequencing
- Automated sequencers: > 40 kb per day
- 500 bp reads must be assembled into complete genes
  - errors, especially insertions and deletions
  - error rate is highest at the ends, where we want to overlap the reads
  - vector sequences must be removed from the ends
- Faster sequencing relies on better software
- Overlapping deletions vs. shotgun approaches: TIGR
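
The core operation in assembling reads is finding where the end of one read overlaps the start of another. The following is a minimal sketch, assuming error-free reads and exact matching. The function names, the minimum overlap length, and the toy reads are hypothetical. Real assemblers must tolerate the sequencing errors at read ends mentioned above, which this sketch does not attempt.

```python
def suffix_prefix_overlap(left: str, right: str, min_length: int = 3) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right`.

    Returns 0 when no exact overlap of at least `min_length` exists.
    """
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    longest_possible = min(len(left), len(right))
    # Try the longest candidate first so the first hit is the best one.
    for length in range(longest_possible, min_length - 1, -1):
        if left[-length:] == right[:length]:
            return length
    return 0


def merge_reads(left: str, right: str, min_length: int = 3):
    """Join two reads on their overlap, or return None if they do not overlap."""
    overlap = suffix_prefix_overlap(left, right, min_length)
    if overlap == 0:
        return None
    return left + right[overlap:]


# Toy reads sharing the four bases GTAC.
print(merge_reads("ATGCGTAC", "GTACCTGA"))  # ATGCGTACCTGA
```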

Finding Genes in Genome Sequence Is Not Easy
- About 2% of human DNA encodes functional genes.
- Genes are interspersed among long stretches of non-coding DNA.
- Repeats, pseudogenes, and introns confound matters.

Pattern Finding Tools
It is possible to use DNA sequence patterns to predict genes:
- promoters
- translational start and stop codons (ORFs)
- intron splice sites
- codon bias
Can also use similarity to known genes/ESTs
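
Of the patterns listed above, open reading frames are the simplest to scan for. The following is a minimal sketch, assuming the standard genetic code (start ATG; stops TAA, TAG, TGA) and the forward strand only. The function name, the `min_codons` threshold, and the toy sequence are hypothetical. A practical gene finder would also scan the reverse complement and combine ORFs with the other signals.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list:
    """Find ATG...stop ORFs in the three forward reading frames.

    Returns (start, end, frame) tuples; `end` is exclusive and includes the
    stop codon. `min_codons` counts codons before the stop, ATG included.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


seq = "CCATGAAATTTTAGGG"
for start, end, frame in find_orfs(seq):
    print(frame, seq[start:end])  # 2 ATGAAATTTTAG
```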

Phylogenetics
- Evolution = mutation of DNA (and protein) sequences
- Can we define evolutionary relationships between organisms by comparing DNA sequences?
  - Is there one molecular clock?
  - Phenetic vs. cladistic approaches
  - Lots of methods and software; what is the "correct" analysis?
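
Comparing DNA sequences for phylogenetics usually starts from a pairwise distance. The following is a minimal sketch, assuming two aligned DNA sequences of equal length. It computes the proportion of differing sites (p-distance) and applies the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3). Jukes-Cantor is a standard model chosen here for illustration, not a method named on the slide, and the function names are hypothetical.

```python
import math


def p_distance(aligned_a: str, aligned_b: str) -> float:
    """Proportion of gap-free aligned sites at which the sequences differ."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    sites = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no gap-free sites to compare")
    differences = sum(1 for x, y in sites if x != y)
    return differences / len(sites)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance (substitutions per site)."""
    if p >= 0.75:
        raise ValueError("correction is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGT", "ACGTACGA")  # one difference in eight sites
print(p, round(jukes_cantor(p), 4))  # 0.125 0.1367
```

The corrected distance exceeds the observed one because some sites change more than once. How well any such correction holds across lineages is part of the molecular clock question raised above.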

II. The Role of the Biologist in the Information Age

The Internet provides abundant biological information
- It can be overwhelming…
  - Web
- Need for new skills = locating the necessary information efficiently

Computing in the lab: everyday tasks (vs. computational biology)
- ordering supplies
- reference books
- lab notes
- literature searching

Training "computer" scientists
- Know the right tool for the job
- Get the job done with tools available
- Network connection is the lifeline of the scientist
- Jobs change, computers change, projects change; scientists need to be adaptable

The job of the biologist is changing
As more biological information becomes available…
- The biologist will spend more time using computers
- The biologist will spend more time on data analysis (and less doing lab biochemistry)
- Biology will become a more quantitative science (think how the periodic table and atomic theory affected chemistry)

III. Molecular Biology Software Tools

GCG (Wisconsin Package)
- The most popular and most comprehensive set of tools for the molecular biologist.
  - Runs on mainframe computers (UNIX)
  - Web and X-Windows (SeqLab) interfaces
  - Inexpensive for large numbers of users
  - Requires local databases (on the mainframe computer)
  - Allows for custom databases and programming

The Web
- Many of the best tools are free over the Web
  - BLAST
  - ENTREZ/PUBMED
  - Protein motif databases
- Bioinformatics "service providers"
  - DoubleTwist™, Celera, BioNavigator™
- Hodgepodge collection of other tools
  - PCR primer design
  - Pairwise and multiple alignment

Personal Computer Programs
- Macintosh and Windows applications
  - Commercial: Vector NTI™, MacVector™, OMIGA™, Sequencher™
  - Freeware: Phylip, Fasta, Clustal, etc.
- Better graphics, easier to use
- Can't access very large databases or perform demanding calculations
- Integration with web databases and computing services

Putting it all together
- The current state of the art requires the biologist to jump around from Web to mainframe to personal computer
- The trend is for integration
  - Web + personal computer will replace the text interface to the mainframe?
  - Will the Web become the ultimate interface for all computing?

IV. Genomics

Genomics Technologies
- Automated DNA sequencing
- Automated annotation of sequences
- DNA microarrays
  - gene expression (measure RNA levels)
  - single nucleotide polymorphisms (SNPs)
- Protein chips (SELDI, etc.)
- Protein-protein interactions

cDNA spotted microarrays

Affymetrix Gene Chips

Impact on Bioinformatics
Genomics produces high-throughput, high-quality data, and bioinformatics provides the analysis and interpretation of these massive data sets. It is impossible to separate genomics laboratory technologies from the computational tools required for data analysis.

Pharmacogenomics
The use of DNA sequence information to measure and predict the reaction of individuals to drugs.
- Personalized drugs
- Faster clinical trials
  - Selected trial populations
- Fewer drug side effects
  - Toxicogenomics