UNAM, Universidad Nacional Autónoma de México. Web Services with Applications in Bioinformatics. March 24, 2009.


Outline
Introduction: navigating through time in genetics.
The genomic era: the human genome.
Challenges: data explosion; integrated analyses.
Bioinformatics: what is it? Consortia and groups. Tools.
Web services: web services and workflows.

Navigating through time in genetics
1865: Mendel's peas. Gregor Mendel describes his experiments with peas, showing that heredity is transmitted in discrete units.
1869: Friedrich Miescher isolates DNA for the first time. Miescher isolated a material rich in phosphorus from the cells and called it nuclein.
1879: Mitosis observed. Walther Flemming described chromosome behavior during animal cell division.

1900s
1900: Rediscovery of Mendel's Work
1902: Orderly Inheritance of Disease Observed
1902: Chromosome Theory of Heredity
1909: The Word Gene Coined
1911: Fruit Flies Illuminate the Chromosome Theory

1940s
1941: One Gene, One Enzyme
1943: X-ray Diffraction of DNA
1944: DNA is "Transforming Principle"
1944: Jumping Genes

1950s
1952: Genes are Made of DNA
1953: DNA Double Helix
1955: 46 Human Chromosomes
1955: DNA Copying Enzyme
1956: Cause of Disease Traced to Alteration
1958: Semiconservative Replication of DNA
1959: Chromosome Abnormalities Identified

1960s
1961: mRNA Ferries Information
1961: First Screen for Metabolic Defect in Newborns
1966: Genetic Code Cracked
1968: First Restriction Enzymes Described

1970s
1972: First Recombinant DNA
1973: First Animal Gene Cloned
DNA Sequencing
1976: First Genetic Engineering Company
1977: Introns Discovered

1980s
First Transgenic Mice and Fruit Flies
1982: GenBank Database Formed
1983: First Disease Gene Mapped
1983: PCR Invented
1986: First Time Gene Positionally Cloned
1987: First Human Genetic Map
1987: YACs Developed
1989: Microsatellites, New Genetic Markers
1989: Sequence-tagged Sites, Another Marker

1990s
1990: Launch of the Human Genome Project (NIH)
1990: ELSI Founded
1990: Research on BACs
1991: ESTs, Fragments of Genes
1992: Second-generation Genetic Map of Human Genome
1992: Data Release Guidelines Established
1993: New HGP Five-year Plan
1994: FLAVR SAVR Tomato
1994: Detailed Human Genetic Map
1994: Microbial Genome Project
1995: Ban on Genetic Discrimination in Workplace
1995: Two Microbial Genomes Sequenced
1995: Physical Map of Human Genome Completed
1996: International Strategy Meeting on Human Genome Sequencing
1996: Mouse Genetic Map Completed
1996: Yeast Genome Sequenced
1996: Archaea Genome Sequenced
1996: Health Insurance Discrimination Banned
1996: 280,000 Expressed Sequence Tags (ESTs)
1996: Human Gene Map Created
1996: Human DNA Sequence Begins
1997: Bermuda Meeting Affirms Principle of Data Release
1997: E. coli Genome Sequenced
1997: Recommendations on Genetic Testing
1998: Private Company Announces Sequencing Plan
1998: M. tuberculosis Bacterium Sequenced
1998: Committee on Genetic Testing
1998: HGP Map Includes 30,000 Human Genes
1998: New HGP Goals for 2003
SNP Initiative Begins
1998: Genome of Roundworm C. elegans Sequenced
1999: Full-scale Human Genome Sequencing
1999: Chromosome 22

Free Access to Genomic Information: the President and Prime Minister Blair issued a Joint Statement in an effort to ensure that the public derives the maximum possible benefit from the sequence of the human genome.
2000: Chromosome 21
Working Draft
2000: Drosophila and Arabidopsis Genomes Sequenced
2000: Executive Order Bans Genetic Discrimination in the Federal Workplace
2000: Yeast Interactome Published
2000: Fly Model of Parkinson's Disease Reported
2001: First Draft of the Human Genome Sequence Released
2001: RNAi Shuts Off Mammalian Genes
2001: FDA Approves Genetics-based Drug to Treat Leukemia

Mouse Genome Sequenced
2002: Researchers Find Genetic Variation Associated with Prostate Cancer
2002: Rice Genome Sequenced
2002: The International HapMap Project is Announced
2002: The Genomes to Life Program is Launched
2002: Researchers Identify Gene Linked to Bipolar Disorder
2003: Human Genome Project Completed
2003: Fiftieth Anniversary of Watson and Crick's Description of the Double Helix
2003: The First National DNA Day Celebrated
2003: ENCODE Program Begins
2003: Premature Aging Gene Identified

The Future
2004: Rat and Chicken Genomes Sequenced
2004: FDA Approves First Microarray
2004: Refined Analysis of Complete Human Genome Sequence
2004: Surgeon General Stresses Importance of Family History
2005: Chimpanzee Genomes Sequenced
2005: HapMap Project Completed
2005: Trypanosomatid Genomes Sequenced
2005: Dog Genomes Sequenced
2006: The Cancer Genome Atlas (TCGA) Project Started
2006: Second Non-human Primate Genome is Sequenced
2006: Initiatives to Establish the Genetic and Environmental Causes of Common Diseases Launched

Challenges of genomics

Data explosion: the human genome
"If our strands of DNA were stretched out in a line, the 46 chromosomes making up the human genome would extend more than six feet [close to two metres]. If the... length of the 100 trillion cells could be stretched out, it would be... over 113 billion miles [182 billion kilometres]. That is enough material to reach to the sun and back 610 times." [Source: Centre for Integrated Genomics]
The Human Genome Project aims to determine the exact order of the DNA bases of the entire human genome. The human genome contains more than 3.2 billion base pairs and more than [...] genes.

How much information is there?
NCBI (National Center for Biotechnology Information): established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.

Genome: genome size and number of genes
Human genome: 3 billion DNA base pairs, with a data size of approximately 750 megabytes.
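The 750-megabyte figure is what one gets when each base is stored in 2 bits, which is enough to distinguish A, C, G and T. A minimal sketch of that arithmetic follows; the 2-bit encoding and the decimal megabyte are assumptions, since the slide does not say how the number was derived.

```python
# Back-of-the-envelope size of a genome stored at 2 bits per base.
# Assumptions (not stated on the slide): 4 possible bases -> 2 bits each,
# 1 megabyte = 1,000,000 bytes, no compression, no metadata.

BITS_PER_BASE = 2
BYTES_PER_MEGABYTE = 1_000_000


def genome_size_megabytes(base_pairs: int) -> float:
    """Return the storage needed for `base_pairs` bases, in megabytes."""
    total_bits = base_pairs * BITS_PER_BASE
    total_bytes = total_bits / 8
    return total_bytes / BYTES_PER_MEGABYTE


if __name__ == "__main__":
    print(genome_size_megabytes(3_000_000_000))
```

Stored as plain text at one byte per base, the same sequence would take four times as much space, which is one reason compact encodings matter at this scale.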

More specialized databases.

The future: integrated and applied analyses. Pillars and challenges.

I. Genomics to biology. Elucidating the structure of genomes and identifying the function of the myriad encoded elements will allow connections to be made between genomics and biology and will, in turn, accelerate the exploration of all realms of the biological sciences.
II. Genomics and health. Genomics holds the promise of individualized medicine, managed according to each genetic profile.

Bioinformatics
Recent advances in biological research are producing enormous growth in the volume and complexity of the biological information available. Information and communication technologies are crucial for storing and interpreting these data in research centers efficiently and robustly.

But what is bioinformatics?

A definition of bioinformatics
The application of information technologies to molecular biology. This includes compiling, maintaining, distributing, analyzing, and using the immense quantities of biological information available.

Main application areas (major research areas)
Sequence analysis
Genome annotation
Computational evolutionary biology
Measuring biodiversity
Analysis of gene expression
Analysis of regulation
Analysis of protein expression
Analysis of mutations in cancer
Prediction of protein structure
Comparative genomics
Modeling biological systems
High-throughput image analysis
Protein-protein docking

Major organizations
Bioinformatics Organization (Bioinformatics.Org): The Open-Access Institute
EMBnet
European Bioinformatics Institute
European Molecular Biology Laboratory
The International Society for Computational Biology
National Center for Biotechnology Information
National Institutes of Health homepage
Open Bioinformatics Foundation: umbrella non-profit organization supporting certain open-source projects in bioinformatics
Swiss Institute of Bioinformatics
Wellcome Trust Sanger Institute

Major journals
Algorithms in Molecular Biology
Bioinformatics
BMC Bioinformatics
Briefings in Bioinformatics
Evolutionary Bioinformatics
Genome Research
The International Journal of Biostatistics
Journal of Computational Biology
Cancer Informatics
Journal of the Royal Society Interface
Molecular Systems Biology
PLoS Computational Biology
Statistical Applications in Genetics and Molecular Biology
Transactions on Computational Biology and Bioinformatics - IEEE/ACM
International Journal of Bioinformatics Research and Applications
List of Bioinformatics journals at Bioinformatics.fr
EMBnet.News at EMBnet.org

EMBnet is the organisation worldwide bringing bioinformatics professionals to work together to serve the expanding fields of genetics and molecular...

Bioinformatics software
Software tools for bioinformatics range from simple command-line tools to complex graphical programs and CGI web forms. Best-known algorithms and packages:
BLAST: an algorithm for determining the similarity of arbitrary sequences against other sequences, possibly from curated databases of protein or DNA sequences.
EMBOSS: software analysis package.
RSAT: Regulatory Sequence Analysis Tools.

A bioinformatics « world » for humans

My sweet home-made bioinformatics platform. Complete datasets: download. Filtered datasets: download, SQL queries, Perl script. Web-page-only resources: parsing HTML. Tools to download and install: BLAST, BLAT, RSAT, ClustalW, MEME, … Do your analysis: scripts.

My nightmare (home-made) platform. Complete datasets: download. Filtered datasets: download, SQL queries, Perl script. Web-page-only resources: parsing HTML. Tools to download and install: BLAST, BLAT, RSAT, ClustalW, MEME, … Do your analysis: scripts. And everything keeps changing: updates, new annotation, dependencies, libraries, new database schema.

Bye bye home-made platform…

Problems
Massive data that must be processed and integrated.
The data live on different servers, in different databases, and in different formats: a data exchange problem.
There are many tools, hosted on different servers, with different access methods (CGI forms, HTML), different input and output formats, and written in different languages: an interoperability problem (communication between tools).

Solution to the data exchange problem
Exchange data through a format defined in XML. XML structures data and documents as trees of tags with attributes. The XML data model is a tree that does not distinguish between objects and relations and has no notion of class hierarchy.
If we want semantics (meaning), we need languages for defining ontologies and metadata on the web: RDF Schema (and RDF query languages) and OWL, the Web Ontology Language.
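To make the "tree of tags with attributes" concrete, here is a minimal Python sketch that writes a sequence record as XML and reads it back with the standard library. The tag and attribute names (`sequence`, `id`, `type`, `organism`, `residues`) are invented for illustration and do not come from any real exchange format.

```python
import xml.etree.ElementTree as ET


def build_record(identifier: str, organism: str, residues: str) -> ET.Element:
    """Build a small XML tree: one root tag with attributes and two child tags.

    The layout is hypothetical; a real exchange format would fix these names.
    """
    record = ET.Element("sequence", attrib={"id": identifier, "type": "protein"})
    ET.SubElement(record, "organism").text = organism
    ET.SubElement(record, "residues").text = residues
    return record


def read_record(xml_text: str) -> dict:
    """Parse the XML text back into a plain dictionary."""
    record = ET.fromstring(xml_text)
    return {
        "id": record.get("id"),
        "type": record.get("type"),
        "organism": record.findtext("organism"),
        "residues": record.findtext("residues"),
    }


if __name__ == "__main__":
    xml_text = ET.tostring(
        build_record("P_HUMAN", "Homo sapiens", "MHLEGRDGRR"), encoding="unicode"
    )
    print(xml_text)
    print(read_record(xml_text))
```

Note that nothing in the tree says what `organism` means or how it relates to other records; that missing layer is what the ontology languages are meant to supply.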

Solution to the interoperability problem
A web service is a set of protocols and standards for exchanging data between applications. Software applications written in different programming languages and running on any platform can use web services to exchange data over computer networks such as the Internet. Interoperability is achieved by adopting open standards. OASIS and the W3C are the bodies responsible for the architecture and standardization of web services.

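Programs « talking » to programs: retrieve-seq -org Saccharomyces_cerevisiae -feattype CDS -type upstream -format fasta … Figure: the same command reached by clicking on the web site, by ssh login to the RSAT server in Bruxelles, or from a Perl script (#!/usr/bin/perl -w) with anonymous access from anywhere.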


A future bioinformatics « world » for computers? I have a dream…
Run analyses remotely
Only retrieve the necessary data
Data always up to date
No need for local installation
A unified way to access data and programs
Programs interacting with programs over the internet

Web Services to the rescue ? Stein. Creating a bioinformatics nation. Nature (2002) vol. 417 (6885) pp « Although this proposal may seem a far cry from what happens now, the technology exists to make it reality. The World Wide Web consortium, with industry heavy-weights such as IBM and Microsoft, are providing an alphabet soup of standards: SOAP/XML, WSDL, UDDI and XSDL. »

What are Web Services (WS)?
Definition: a Web service is a software system designed to support interoperable machine-to-machine interaction over a network (source: W3C).
Figure: the client, a Perl script (#!/usr/bin/perl -w), calls run_BLAST() over the network (the internet); the service provider (server) runs blastall inside run_BLAST() and sends back the results.
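The client/server picture can be tried on a single machine. The sketch below is written in Python rather than the Perl of the slides, and it uses XML-RPC, a simpler relative of SOAP that ships with Python's standard library, so it illustrates the idea of a remote call rather than the SOAP stack itself. The `run_blast` body is a stand-in: a real server would invoke blastall, and the returned fields are invented.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def run_blast(sequence: str) -> dict:
    """Stand-in for the server-side wrapper; a real one would call blastall."""
    return {"query_length": len(sequence), "hits": []}


def start_server() -> SimpleXMLRPCServer:
    """Start a local server on a free port and expose run_blast as run_BLAST."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(run_blast, "run_BLAST")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_remote(sequence: str) -> dict:
    """Act as the client: call run_BLAST over HTTP and return the result."""
    server = start_server()
    host, port = server.server_address
    try:
        with ServerProxy(f"http://{host}:{port}/") as client:
            return client.run_BLAST(sequence)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(call_remote("MHLEGRDGRR"))
```

The client never imports `run_blast`; it only knows a URL and a method name, which is the property that lets the two sides be written in different languages.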

Various types of Web services: SOAP
SOAP-based Web Services. SOAP (Simple Object Access Protocol) is a W3C standard whose specifications cover messaging with XML, with HTTP for transport.
Figure: the Perl client script (#!/usr/bin/perl -w) sends the BLAST parameters ($sequence, $subst_matrix, $threshold) as XML over HTTP to run_BLAST(), which runs blastall; the BLAST result comes back as XML and ends up in $result.
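As an illustration of "messaging with XML", the following Python sketch wraps the three BLAST parameters in a SOAP 1.1 envelope and unwraps them again, which is roughly what a SOAP library does on each side. The envelope namespace is the standard SOAP 1.1 one; the service namespace `urn:example:blast` and the element layout are assumptions made for this example, not the format of any real BLAST service.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:blast"  # hypothetical service namespace


def build_request(operation: str, params: dict) -> str:
    """Serialize an operation name and its parameters into a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_request(xml_text: str) -> tuple:
    """Deserialize the envelope back into (operation, parameters)."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    call = body[0]  # the single operation element inside the body
    operation = call.tag.split("}", 1)[1]  # drop the "{namespace}" prefix
    params = {child.tag: child.text for child in call}
    return operation, params


if __name__ == "__main__":
    message = build_request(
        "run_BLAST",
        {"sequence": "MHLEGRDGRR", "subst_matrix": "BLOSUM62", "threshold": 11},
    )
    print(message)
    print(parse_request(message))
```

Every value travels as text, so the threshold comes back as the string "11"; restoring types is part of what a WSDL-aware library adds on top.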

Various types of Web services: SOAP
Request envelope (XML), sent by the Perl client script (#!/usr/bin/perl -w) to run_BLAST(), which runs blastall:
blastp SWISS MHLEGRDGRR YPGAPAVELL QTSVPSGLAE LVAGKRRLPR GAGGADPSHS

Response envelope (XML), carrying the BLAST report:
BLASTP [Mar ]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:
Reference for compositional score matrix adjustment: Altschul, Stephen F., John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis, Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches using compositionally adjusted substitution matrices", FEBS J. 272:
Query= query (50 letters)
Database: SWISS: SWISS sequence taken from the header [Last update Mar/02/2009]
405,506 sequences; 146,168,000 total letters
Searching done
Sequences producing significant alignments: Score (bits), E Value
sp|Q04671|P_HUMAN RecName: Full=P protein; AltName: Full=Melanoc e-22
>sp|Q04671|P_HUMAN RecName: Full=P protein; AltName: Full=Melanocyte-specific transporter protein; AltName: Full=Pink-eyed dilution protein homolog;
Length = 838
Score = 104 bits (260), Expect = 1e-22, Method: Compositional matrix adjust.
Identities = 50/50 (100%), Positives = 50/50 (100%)
Query: 1 MHLEGRDGRRYPGAPAVELLQTSVPSGLAELVAGKRRLPRGAGGADPSHS 50
         MHLEGRDGRRYPGAPAVELLQTSVPSGLAELVAGKRRLPRGAGGADPSHS
Sbjct: 1 MHLEGRDGRRYPGAPAVELLQTSVPSGLAELVAGKRRLPRGAGGADPSHS 50
Database: SWISS: SWISS sequence taken from the header [Last update Mar/02/2009]
Posted date: Mar 2, :30 AM
Number of letters in database: 146,168,000
Number of sequences in database: 405,506
Lambda K H
Gapped Lambda K H
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences:
Number of Hits to DB: 17,615,102
Number of extensions:
Number of successful extensions: 858
Number of sequences better than 10.0: 2
Number of HSP's gapped: 858
Number of HSP's successfully gapped: 2
Length of query: 50
Length of database: 146,168,000
Length adjustment: 23
Effective length of query: 27
Effective length of database: 136,841,362
Effective search space:
Effective search space used:
Neighboring words threshold: 11
Window for multiple hits: 40
X1: 16 ( 7.2 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 42 (21.9 bits)
S2: 62 (28.5 bits)
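The "effective length" lines in the report footer can be checked by hand. The figures shown are consistent with subtracting the length adjustment once from the query and once per database sequence; the short Python sketch below reproduces them under that assumption (the function and parameter names are mine).

```python
def effective_lengths(
    query_length: int,
    db_letters: int,
    db_sequences: int,
    length_adjustment: int,
) -> tuple:
    """Apply the length adjustment to the query and to every database sequence.

    Assumption: one adjustment is subtracted from the query and one from each
    database sequence, which matches the figures printed in the report.
    """
    effective_query = query_length - length_adjustment
    effective_db = db_letters - db_sequences * length_adjustment
    return effective_query, effective_db


if __name__ == "__main__":
    # Values copied from the report: query of 50 letters, 146,168,000 letters
    # in 405,506 sequences, length adjustment of 23.
    print(effective_lengths(50, 146_168_000, 405_506, 23))
```

With the report's inputs this yields 27 and 136,841,362, the two effective lengths printed by BLAST.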

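Various types of Web services: SOAP
Figure: any client can serialize the BLAST parameters into XML and deserialize the XML result, whatever its language: Perl (SOAP::Lite, SOAP::WSDL, XML::Compile::WSDL11), Python (ZSI, SOAPpy), Java (AXIS, METRO), PHP (PHP-SOAP). The server side remains the Perl run_BLAST() wrapper around blastall.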

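Various types of Web services: SOAP
Figure: the server behind run_BLAST() and blastall could equally be Perl (SOAP::Lite on Apache), Java (AXIS on Tomcat), or PHP (PHP-SOAP on Apache). The client only sees the XML: its request is deserialized on the server, and the BLAST result is serialized back.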

Various types of Web services: SOAP
Figure: both ends serialize and deserialize. The Perl client serializes the BLAST parameters into XML and deserializes the XML result; the Perl server deserializes the request, runs blastall through run_BLAST(), and serializes the BLAST result into XML.

Various types of Web services: SOAP-WSDL
WSDL (Web Services Description Language) is an XML document: « a machine-readable description of the operations offered by the service ». The server « introduces itself » to the clients:
Names of the available services (= methods)
Parameters of each service (name + type)
Result of each service (type)

Excerpt from the RSATWS WSDL:
<definitions name="RSATWS" targetNamespace="urn:RSATWS" xmlns:tns="urn:RSATWS" xmlns:xsd=" xmlns=" xmlns:soap=" xmlns:html=" xmlns:xsl="
Parameters for the operation retrieve_seq:
Return type. Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client), 'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client). Default is 'both'.
Organism. Words need to be underscore separated (example: Escherichia_coli_K12).
A list of query genes.
Return sequences for all the genes of the organism if value = 1. Incompatible with query.
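Because a WSDL is ordinary XML, a client can discover the operations of a service by parsing it. The Python sketch below lists the operations declared in a tiny hand-written WSDL 1.1 fragment. The fragment borrows the names RSATWS, retrieve_seq and supported_organisms from the slides, but its message names and structure are invented here and are far simpler than the real RSAT description.

```python
import xml.etree.ElementTree as ET

WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}  # WSDL 1.1 namespace

# Hypothetical, heavily simplified WSDL fragment for illustration only.
WSDL_TEXT = """<definitions name="RSATWS" targetNamespace="urn:RSATWS"
    xmlns="http://schemas.xmlsoap.org/wsdl/" xmlns:tns="urn:RSATWS">
  <portType name="RSATWSPortType">
    <operation name="retrieve_seq">
      <input message="tns:retrieve_seqRequest"/>
      <output message="tns:retrieve_seqResponse"/>
    </operation>
    <operation name="supported_organisms">
      <input message="tns:supported_organismsRequest"/>
      <output message="tns:supported_organismsResponse"/>
    </operation>
  </portType>
</definitions>"""


def list_operations(wsdl_text: str) -> dict:
    """Map each operation name to its declared input and output messages."""
    root = ET.fromstring(wsdl_text)
    operations = {}
    for operation in root.iterfind("wsdl:portType/wsdl:operation", WSDL_NS):
        operations[operation.get("name")] = {
            "input": operation.find("wsdl:input", WSDL_NS).get("message"),
            "output": operation.find("wsdl:output", WSDL_NS).get("message"),
        }
    return operations


if __name__ == "__main__":
    for name, messages in list_operations(WSDL_TEXT).items():
        print(name, messages)
```

Stub generators do essentially this, then go further and turn each operation into a callable method with typed parameters.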

Various types of Web services: SOAP-WSDL
The URL of the WSDL is necessary to « consume » a SOAP/WSDL Web Service (= write a client). It allows automatic generation of client-side libraries (« client stubs »), which reduces the amount of code you have to write: the stub serializes the parameters into XML and deserializes the XML result.
Example: to write a client for the RSAT Web Services in Perl, you need SOAP::WSDL installed and the Perl library « RSATWS », downloadable from the RSAT website and generated from the WSDL.

Various types of Web services: SOAP-WSDL
Example of code for an RSAT Perl client:

#!/usr/bin/perl -w
use strict;
use SOAP::WSDL;
use lib 'RSATWS';
use MyInterfaces::RSATWebServices::RSATWSPortType;

## New SOAP object
my $soap = MyInterfaces::RSATWebServices::RSATWSPortType->new();

## Parameters
my %args = ('format' => 'text');

## Send the request to the server
my $som = $soap->supported_organisms({'request' => \%args});

## Get the result (a fault object is expected to be false in boolean context)
unless ($som) {
    printf "A fault (%s) occurred: %s\n", $som->get_faultcode(), $som->get_faultstring();
} else {
    my $results = $som->get_response();
    my $result  = $results->get_client();
    print "Supported organism(s): \n" . $result;
}

Various types of Web services: REST
RESTful Web services use HTTP for transport but have no messaging system. They can be seen as a way to retrieve resources via their URLs. They are most often used for databases and are often not really considered « Web Services ».
Example:
>gi|540023|gb|U |AMU12345 Aepyceros melampus isolate am5 D-loop, partial sequence; mitochondrial
ACTACCGCTATCAATATACTCCCACAAATATCAAGAGCCTTCCCAGTATTAAATTTGCTAAAATTTTAAA
AATTCAATACGAACTTCACACTCCACAGCCTCACGCGAAATTAATAATACGTATTTAAATTCTAGAGTAC
ATACCATGAACTATCGTTTAGTACATGAATTTACACACGTCAGCCCGATCAAATGTTTATGTACATAACA
CATTATATATGTACATTTCAGTTTGTGTATATAGACATAACATTAATGTAATAAAGACATAATATGTATA
TAGTACATTAATTGATTGTCCTCAAGCATATAAGCAAGTACTAGACATTCACTAGCGGTACATAGTACAT
TTCATTGTTCATCGTACATAGCGCATGTCAGNCAAATCCGTTCTTGTCAACATGCATATCCCGTCCACTA
GATCAC
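A REST client is little more than a URL builder plus a parser for whatever comes back. The Python sketch below assembles a query URL in the style of NCBI's E-utilities `efetch` endpoint and parses FASTA text into records. It makes no network call, and the record identifier `EXAMPLE_ID` is a placeholder, since the URL used on the slide did not survive; treat the parameter set as my reading of E-utilities rather than a complete reference.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(database: str, record_id: str) -> str:
    """Build the URL that identifies one FASTA-formatted record."""
    query = urlencode(
        {"db": database, "id": record_id, "rettype": "fasta", "retmode": "text"}
    )
    return f"{EFETCH_URL}?{query}"


def parse_fasta(text: str) -> dict:
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


if __name__ == "__main__":
    print(efetch_url("nuccore", "EXAMPLE_ID"))  # placeholder identifier
    print(parse_fasta(">demo Aepyceros melampus\nACTACC\nGCTATC\n"))
```

Fetching the URL with any HTTP client and passing the body to `parse_fasta` would complete the round trip; there is no envelope to build or unwrap, which is the practical difference from SOAP.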

Web Services: pros and cons
Advantages
Language independence, hence interoperability
A standard for accessing and describing the services
Improved connectivity between programs
Possibility of constructing workflows
Drawbacks
Language independence is not that straightforward: it is hard to make a universal server, because each language has its own implementation of the standard
Heavy system (SOAP/WSDL) that needs maintenance by service providers
Efficiency: heavy network traffic plus serializing and deserializing

WS everywhere: Amazon, Google. Extensive search engine for Web Services (currently services) (alpha version, promising).

WS in Bioinformatics /query/static/eutils_help.html

Adding meaning…
Semantic web services propose to extend these technologies, which are still consolidating, with ontologies and semantics that enable the dynamic selection, integration, and invocation of services. They also give services the ability to reconfigure themselves dynamically to adapt to changes (e.g. a service being interrupted or a more suitable one appearing) without human intervention.

What are semantic web services?
Semantic Web Services are a new technology resulting from the combination of the Semantic Web and Web Services.
Semantic Web Services = Web Services + Semantic Web

Web Services and the Semantic Web
Web Services: a set of protocols and standards that allow data to be exchanged independently of platform and programming language.
Semantic Web: based on adding semantics to the data published on the Web, so that machines can process the information contained in documents in a way similar to how human users do.

Why do semantic web services arise?
So many services are available today that it is not feasible, in time or efficiency, for a human user to determine which service or services are needed to satisfy a specific need. Semantic Web Services therefore describe Web Services with semantic content, so that service discovery, composition, and invocation can be carried out automatically by software entities capable of processing the available semantic information.

Ontology
Represents the capabilities of the service and its usage restrictions.
Integrates the semantics of the service with its description.
It consists of the following elements:
Functional information about the service: inputs, outputs, preconditions, postconditions
Non-functional information: category, cost, quality of service
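A toy version of such a description, and of the automatic discovery it enables, can be sketched in a few lines of Python. Everything here is hypothetical: the field names, the concept labels, and the matching rule (all inputs available, all wanted outputs produced, cheapest first) are assumptions chosen to illustrate the idea, and preconditions, postconditions and quality of service are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDescription:
    """Functional (inputs, outputs) and non-functional (category, cost) facts."""

    name: str
    inputs: frozenset
    outputs: frozenset
    category: str = "uncategorized"
    cost: float = 0.0


def discover(services: list, available: set, wanted: set) -> list:
    """Return services usable with `available` data that produce `wanted`.

    A service matches when all its inputs are available and its outputs
    cover every wanted concept; matches are ordered by increasing cost.
    """
    matches = [
        service
        for service in services
        if service.inputs <= available and wanted <= service.outputs
    ]
    return sorted(matches, key=lambda service: service.cost)


# Hypothetical registry; the names and concepts are illustrative only.
REGISTRY = [
    ServiceDescription("blast_search", frozenset({"protein_sequence"}),
                       frozenset({"similar_sequences"}), "alignment", 2.0),
    ServiceDescription("fast_search", frozenset({"protein_sequence"}),
                       frozenset({"similar_sequences"}), "alignment", 1.0),
    ServiceDescription("retrieve_seq", frozenset({"organism", "gene_list"}),
                       frozenset({"dna_sequence"}), "retrieval", 0.5),
]

if __name__ == "__main__":
    found = discover(REGISTRY, {"protein_sequence"}, {"similar_sequences"})
    print([service.name for service in found])
```

Real semantic descriptions use ontology terms rather than bare strings, so that a request for "sequence" can match a service that declares "protein sequence"; set inclusion on strings is the crudest possible stand-in for that reasoning.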

Workflows: connecting tools
Find relevant genes from online databases
Find associations between frequent terms
Gene expression analysis
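At its simplest, a workflow is a list of steps in which each step consumes the output of the previous one. The Python sketch below chains three local functions over made-up records; in a real workflow each step would be a call to a remote service, and the record fields, the "stress" annotation term and the expression threshold used here are all invented for the example.

```python
def select_genes(records: list) -> list:
    """Step 1: keep records annotated with the term of interest."""
    return [record for record in records if "stress" in record["terms"]]


def keep_overexpressed(records: list) -> list:
    """Step 2: keep records whose expression ratio reaches the threshold."""
    return [record for record in records if record["expression"] >= 2.0]


def gene_names(records: list) -> list:
    """Step 3: reduce the surviving records to a sorted list of gene names."""
    return sorted(record["gene"] for record in records)


def run_workflow(data, steps: list):
    """Feed `data` through each step in order and return the final output."""
    for step in steps:
        data = step(data)
    return data


# Made-up input records, standing in for what a database service would return.
RECORDS = [
    {"gene": "geneC", "terms": ["stress", "membrane"], "expression": 3.1},
    {"gene": "geneA", "terms": ["stress"], "expression": 2.4},
    {"gene": "geneB", "terms": ["transport"], "expression": 5.0},
    {"gene": "geneD", "terms": ["stress"], "expression": 0.7},
]

if __name__ == "__main__":
    print(run_workflow(RECORDS, [select_genes, keep_overexpressed, gene_names]))
```

Because every step shares the same calling convention, steps can be reordered or swapped without touching the runner, which is the property workflow tools rely on when they wire services together graphically.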

Example of workflow: Sand et al. Nature Protocols (2008) vol. 3 (10) pp

Taverna: a workbench to design workflows

WS in bioinformatics: Utopia?
Work is on service providers
Reluctance of service providers to add or switch to WS
- It takes time and human resources to set up WS
- It is necessary to find people who are WS experts or willing to learn WS
Lack of advertisement
Lack of a global registry
Various WS: SOAP/REST + BioMOBY + SOAPLAB, all accessed in different ways
Lack of users!!!

A future bioinformatics « world » for computers ? I still have a dream…

Acknowledgements
Prof. Jacques van Helden
Dr. Morgan Thomas
Group: Luis José Muniz Rascado, Jair, Lilian, Shirley, Ale, Aura
Dr. Julio Collado Vides