22 research outputs found

    Functional profiling of genome-scale experiments: new approaches leading to a systemic analysis

    Unpublished doctoral thesis. Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología Molecular. Date of defense: 31-10-200

    GEPAS, a web-based tool for microarray data analysis and interpretation

    Gene Expression Profile Analysis Suite (GEPAS) is one of the most complete and extensively used web-based packages for microarray data analysis. During its more than 5 years of activity it has continuously been updated to keep pace with the state of the art in the changing microarray data analysis arena. GEPAS offers diverse analysis options that include well-established as well as novel algorithms for normalization, gene selection, class prediction, clustering and functional profiling of the experiment. New options for time-course (or dose-response) experiments, microarray-based class prediction, new clustering methods and new tests for differential expression have been included. The new pipeliner module allows automating the execution of sequential analysis steps by means of a simple but powerful graphic interface. An extensive re-engineering of GEPAS has been carried out, which includes the use of web services and Web 2.0 technology features, a new user interface with persistent sessions and a new extended database of gene identifiers. GEPAS is currently the most cited web tool in its field and is used extensively by researchers in many countries; its records indicate an average usage rate of 500 experiments per day. GEPAS is available at http://www.gepas.org

    Babelomics: an integrative platform for the analysis of transcriptomics, proteomics and genomic data with advanced functional profiling

    Babelomics is a response to the growing need to integrate and analyze different types of genomic data in an environment that allows an easy functional interpretation of the results. Babelomics includes a complete suite of methods for the analysis of gene expression data, including normalization (covering most commercial platforms), pre-processing, differential gene expression (case-control, multiclass, survival or continuous values), predictors and clustering, as well as methods for large-scale genotyping assays (case-control and TDTs, with population stratification analysis and correction). All these genomic data analysis facilities are integrated and connected to multiple options for the functional interpretation of the experiments. Different methods of functional enrichment or gene set enrichment can be used to understand the functional basis of the experiment analyzed. Many sources of biological information can be used for this purpose, including functional (GO, KEGG, Biocarta, Reactome, etc.), regulatory (Transfac, Jaspar, ORegAnno, miRNAs, etc.), text-mining and protein–protein interaction modules. Finally, a tool for the de novo functional annotation of sequences has been included in the system, which provides support for the functional analysis of non-model species. Mirrors of Babelomics and command-line execution of its individual components are now possible. Babelomics is available at http://www.babelomics.org

    BABELOMICS: a systems biology perspective in the functional annotation of genome-scale experiments

    We present a new version of Babelomics, a complete suite of web tools for functional analysis of genome-scale experiments, with new and improved tools. New functionally relevant terms have been included such as CisRed motifs or bioentities obtained by text-mining procedures. An improved indexing has considerably speeded up several of the modules. An improved version of the FatiScan method for studying the coordinate behaviour of groups of functionally related genes is presented, along with a similar tool, the Gene Set Enrichment Analysis. Babelomics is now more oriented to test systems biology inspired hypotheses. Babelomics can be found at

    Literature mining and network analysis in Biology

    This thesis presents OnTheFly2.0, a versatile web-based tool dedicated to the extraction and subsequent analysis of biomedical terms from individual files. More specifically, OnTheFly2.0 supports different file formats and enables simultaneous file handling. The integration of the EXTRACT tagging service allows Named Entity Recognition (NER) for genes/proteins, chemical compounds, organisms, tissues, environments, diseases, phenotypes and Gene Ontology terms, as well as the generation of popup windows that provide concise, context-related information about the identified term, accompanied by links to various databases. Once named entities such as proteins, genes and chemicals are identified, they can be further explored via functional and publication enrichment analysis, or associated with diseases and with protein domains reported in protein family databases. Finally, visualization of protein-protein and protein-chemical associations is possible through the generation of interactive networks from the STRING and STITCH services, respectively. OnTheFly2.0 currently supports 197 species and is available at http://onthefly.pavlopouloslab.info

    Literature-aided interpretation of gene expression data with the weighted global test

    Most methods for the interpretation of gene expression profiling experiments rely on the categorization of genes, as provided by the Gene Ontology (GO) and pathway databases. Due to the manual curation process, such databases are never up-to-date and tend to be limited in focus and coverage. Automated literature mining tools provide an attractive alternative approach. We review how they can be employed for the interpretation of gene expression profiling experiments. We illustrate that their comprehensive scope aids the interpretation of data from domains poorly covered by GO or alternative databases, and allows for the linking of gene expression with diseases, drugs, tissues and other types of concepts. A framework for proper statistical evaluation of the associations between gene expression values and literature concepts was lacking and is now implemented in a weighted extension of the global test. The weights are the literature association scores and reflect the importance of a gene for the concept of interest. In a direct comparison with classical GO-based gene sets, we show that the use of literature-based associations results in the identification of much more specific GO categories. We demonstrate the possibilities for linking gene expression data to patient survival in breast cancer and to the action and metabolism of drugs. Coupling with online literature mining tools ensures transparency and allows further study of the identified associations. Literature mining tools are therefore powerful additions to the toolbox for the interpretation of high-throughput genomics data.

    Aineistojen yhdistämismenetelmiä genominlaajuisten syöpäaineistojen tulkintaan (Data integration methods for the interpretation of genome-scale cancer data sets)

    The genetic alterations of cancer cells vary between individuals and during the progression of the disease. Advances in measurement techniques have enabled genome-scale profiling of mutations, transcription, and DNA methylation. These methods can be used to address the complexity of the disease but also create an acute demand for methods to analyze the high-dimensional data sets they produce. An integrative and scalable computational infrastructure is advantageous in cancer research. First, a multitude of programs and analytic steps are needed when integrating various measurement types. Efficient execution and management of such projects saves time and reduces the probability of mistakes. Second, new information and methods can be utilised with the minor effort of re-executing the workflow. Third, a formal description of the program interfaces and the workflows aids collaboration, testing, and reuse of the work done. Fourth, the number of samples available is often small in comparison with the number of unknown variables of interest, such as possibly affected genes. The interpretation of new measurements in the context of existing information may limit the number of false positives when sensitive methods are needed. We have introduced new computational methods for data integration and for the management of large and heterogeneous data sets. The suitability of the methods has been demonstrated with four cancer studies covering a wide spectrum of data, from population genetics to the details of transcriptional regulation by proteins such as androgen receptor and forkhead box protein A1. The repeatable workflows established for these colorectal cancer, glioblastoma, and prostate cancer studies have been used to maintain up-to-date registries of results for follow-up studies.