
    APPLICATION OF INFORMATION SYSTEMS AND TOOLS IN BIOINFORMATICS

    The pace at which scientific data are produced and disseminated has never been higher. Modern sequencing technologies make it possible to obtain the genome of a given organism in a few days, and that of a bacterium in less than a day, so researchers in the life sciences face huge amounts of data that need to be analyzed. As a consequence, various fields of science are converging, giving rise to new disciplines. Bioinformatics is one of these fields: a scientific discipline that has developed rapidly over the past decades and uses IT tools and methods to solve problems related to the study of biological processes. A crucial role in bioinformatics is played by the development of new algorithms and tools, the creation of new databases, and the integration of extremely large amounts of data. This rapid development has made modern biological research possible: bioinformatics can help a biologist extract valuable information from biological data by providing tools to process them. Although bioinformatics is a relatively new discipline, a wide range of web-based and desktop tools already exists, most of them freely available. This review article provides an overview of some of the tools for biological analysis available to a biologist, and describes the key role of information systems in this interdisciplinary field.

    Centralising Labels to Distribute Data: The Regulatory Role of Genomic Consortia.


    Web services for transcriptomics

    Transcriptomics is part of a family of disciplines focussing on high throughput molecular biology experiments. In the case of transcriptomics, scientists study the expression of genes resulting in transcripts. These transcripts can either perform a biological function themselves or function as messenger molecules containing a copy of the genetic code, which can be used by the ribosomes as templates to synthesise proteins. Over the past decade microarray technology has become the dominant technology for performing high throughput gene expression experiments. A microarray contains short sequences (oligos or probes), which are the reverse complement of fragments of the targets (transcripts or sequences derived thereof). When genes are expressed, their transcripts (or sequences derived thereof) can hybridise to these probes. Many thousands of copies of a probe are immobilised in a small region on a support. These regions are called spots, and a typical microarray contains thousands or sometimes even more than a million spots. When the transcripts (or sequences derived thereof) are fluorescently labelled and it is known which spots are located where on the support, a fluorescent signal in a certain region represents expression of a certain gene. For interpretation of microarray data it is essential to make sure the oligos are specific for their targets. Hence for proper probe design one needs to know all transcripts that may be expressed and how well they can hybridise with candidate oligos. Therefore oligo design requires: 1. A complete reference genome assembly. 2. Complete annotation of the genome to know which parts may be transcribed. 3. Insight into the amount of natural variation in the genomes of different individuals. 4. Knowledge of how experimental conditions influence the ability of probes to hybridise with certain transcripts. Unfortunately such complete information does not exist, but many microarrays were nevertheless designed based on incomplete data. This can lead to a variety of problems including cross-hybridisation (non-specific binding), erroneously annotated and therefore misleading probes, missing probes and orphan probes. Fortunately the amount of information on genes and their transcripts increases rapidly. Therefore it is possible to improve the reliability of microarray data analysis by regular updates of the probe annotation using updated databases for genomes and their annotation. Several tools have been developed for this purpose, but these either used simplistic annotation strategies or did not support our species and/or microarray platforms of interest. Therefore we developed OligoRAP (Oligo Re-Annotation Pipeline), which is described in chapter 2. OligoRAP was designed to take advantage of, amongst others, annotation provided by Ensembl, which is the largest genome annotation effort in the world. Thereby OligoRAP supports most of the major animal model organisms, including farm animals like chicken and cow. In addition to support for our species and array platforms of interest, OligoRAP employs a new annotation strategy combining information from genome and transcript databases in a non-redundant way to get the most complete annotation possible. In chapter 3 we compared annotation generated with three oligo annotation pipelines, including OligoRAP, and investigated the effect on functional analysis of a microarray experiment involving chickens infected with the parasite Eimeria.
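The abstract above notes that microarray probes are the reverse complement of fragments of their target transcripts and that reliable interpretation requires oligos that are specific for their targets. The short Python sketch below illustrates that idea with a naive exact-match specificity check; the toy transcripts and the uniqueness criterion are assumptions made for illustration, and real probe (re-)annotation such as OligoRAP's must also consider imperfect alignments that can cause cross-hybridisation.

```python
# Naive illustration of probe specificity: a probe is the reverse complement of a
# fragment of its target, so its reverse complement should occur in exactly one transcript.
# Exact matching only; imperfect alignments (the cross-hybridisation problem) are ignored here.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def is_specific(probe: str, transcripts: dict) -> bool:
    """True if the probe's target fragment occurs in exactly one transcript."""
    target_fragment = reverse_complement(probe)
    hits = [name for name, seq in transcripts.items() if target_fragment in seq]
    return len(hits) == 1

if __name__ == "__main__":
    transcripts = {                                   # toy transcripts, for illustration only
        "geneA": "ATGGCGTACGTTAGCTAGCTAGGCT",
        "geneB": "ATGCCCGGGTTTAAACCCGGGATAT",
    }
    probe = reverse_complement("GCGTACGTTAGC")        # a probe designed against geneA
    print(is_specific(probe, transcripts))            # True: the fragment is unique to geneA
```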
As an example of functional analysis we investigated whether up- or downregulated genes were enriched for terms from the Gene Ontology (GO). We discovered that small differences in annotation strategy could lead to alarmingly large differences in enriched GO terms. Therefore it is important to know which annotation strategy works best, but it was not possible to assess this due to the lack of a good reference or benchmark dataset. There are a few limited studies investigating the hybridisation potential of imperfect alignments of oligos with potential targets, but in general such data is scarce. In addition it is difficult to compare these studies due to differences in experimental setup, including different hybridisation temperatures and different probe lengths. As a result we cannot determine exact thresholds for the alignments of oligos with non-targets to prevent cross-hybridisation, but from these different studies we can get an idea of the range of thresholds that would be required for optimal target specificity. Note that in these studies experimental conditions were first optimised for an optimal signal-to-noise ratio for hybridisation of oligos with targets. Then these conditions were used to determine the thresholds for alignments of oligos with non-targets to prevent cross-hybridisation. Chapter 4 describes a parameter sweep using OligoRAP to explore hybridisation potential thresholds from a different perspective. Given the mouse genome, thresholds were determined that yield the largest number of gene-specific probes. Using those thresholds we then determined thresholds for optimal signal-to-noise ratios. Unfortunately the annotation-based thresholds we found did not fall within the range of experimentally determined thresholds; in fact they were not even close. Hence what was experimentally determined to be optimal for the technology was not in sync with what was determined to be optimal for the mouse genome. Further research will be required to determine whether microarray technology can be modified in such a way that it is better suited for gene expression experiments. The requirement of a priori information on possible targets and the lack of sufficient knowledge on how experimental conditions influence hybridisation potential can be considered the Achilles' heels of microarray technology. Chapter 5 is a collection of three application notes describing other tools that can aid in the analysis of transcriptomics data. Firstly, RShell, a plugin for the Taverna workbench allowing users to execute statistical computations remotely on R-servers. Secondly, MADMAX services, which provide quality control and normalisation of microarray data for Affymetrix arrays. Finally, GeneIlluminator, a tool to disambiguate gene symbols, allowing researchers to specifically retrieve literature for their genes of interest even if those gene symbols have many synonyms and homonyms.
Web services
High throughput experiments like those performed in transcriptomics usually require subsequent analysis with many different tools to make biological sense of the data. Installing all these tools on a single, local computer and making them compatible so users can build analysis pipelines can be very cumbersome. Therefore distributed analysis strategies have been explored extensively over the past decades. In a distributed system providers offer remote access to tools and data via the Internet, allowing users to create pipelines from modules from all over the globe.
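The chapter 3 analysis summarised above asks whether up- or downregulated genes are enriched for GO terms. A common way to score enrichment of a single term is a hypergeometric (one-sided Fisher) test, shown below with invented counts; this sketch illustrates the calculation only and is not necessarily the test used in the thesis, and in practice the p-values would also be corrected for testing many GO terms.

```python
from scipy.stats import hypergeom

def go_term_enrichment(n_genes_total, n_term_total, n_selected, n_term_selected):
    """P-value for seeing at least n_term_selected genes annotated with a GO term
    among n_selected differentially expressed genes, given n_term_total annotated
    genes out of n_genes_total on the array (hypergeometric upper tail)."""
    return hypergeom.sf(n_term_selected - 1, n_genes_total, n_term_total, n_selected)

# Illustrative numbers only: 12,000 genes on the array, 300 annotated with the term,
# 500 up-regulated genes of which 25 carry the term.
p = go_term_enrichment(12000, 300, 500, 25)
print(f"enrichment p-value: {p:.3g}")
```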
Chapter 1 provides an overview of the evolution of web services, which represent the latest generation of technology for creating distributed systems. The major advantage of web services over older technology is that web services are programming language independent, Internet communication protocol independent and operating system independent. Therefore web services are very flexible and most of them are firewall-proof. Web services play a major role in the remaining chapters of this thesis: OligoRAP is a workflow entirely made from web services, and the tools described in chapter 5 all provide remote programmatic access via web service interfaces. Although web services can be used to build relatively complex workflows like OligoRAP, a lack of standards (mainly de facto ones) and of user-friendly clients has limited the use of web services to bioinformaticians. A semantic web where biologists can easily link web services into complex workflows does not yet exist.
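Chapter 1, summarised above, argues that web services give language- and platform-independent programmatic access to remote tools. The fragment below shows the general shape of such a call from Python; the endpoint URL, parameters and response fields are hypothetical placeholders, not the actual interfaces of OligoRAP, RShell, MADMAX or GeneIlluminator.

```python
import json
from urllib import parse, request

# Hypothetical REST-style endpoint used only to illustrate programmatic access;
# real services (e.g. those in chapter 5) each define their own interface.
BASE_URL = "https://example.org/annotation-service/probe"

def annotate_probe(probe_id: str) -> dict:
    """Call a remote annotation service and return its JSON response as a dict."""
    url = BASE_URL + "?" + parse.urlencode({"id": probe_id, "format": "json"})
    with request.urlopen(url, timeout=30) as response:
        return json.load(response)

if __name__ == "__main__":
    result = annotate_probe("PROBE_000123")   # placeholder identifier
    print(result)
```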

    Molecular phylogenetic analysis: design and implementation of scalable and reliable algorithms and verification of phylogenetic properties

    The term bioinformatics has many meanings, a large part of them referring to molecular bioinformatics: the set of mathematical, statistical and computational methods aimed at solving biological problems using exclusively DNA, RNA and protein sequences and their associated information. Phylogenetics is the area of bioinformatics devoted to the study of the evolutionary relationships between organisms of the same or different species. As with the previous definition, the work carried out in this thesis focuses on molecular phylogenetics: the branch of phylogenetics that analyses hereditary mutations in biological sequences (mainly DNA) to establish those evolutionary relationships. The result of this analysis is captured in an evolutionary tree or phylogeny. A phylogeny is usually represented as a rooted tree, normally binary, in which the leaves represent the organisms existing today and the root their common ancestor. Each internal node represents a mutation that led to a split in the classification of the descendants. Phylogenies are built by inference processes based on the available information, which mostly belongs to present-day organisms. The complexity of this problem is reflected in the classification of most of the methods proposed to solve it as NP-hard [1-3]. The real application case of this thesis has been mitochondrial DNA. This type of biological sequence is relevant because it has a high mutation rate, so even phylogenies of evolutionarily very close organisms provide significant data for the biological community. In addition, several mutations of human mitochondrial DNA have been directly linked to diseases and pathologies, most of them fatal in unborn or very young individuals. There are currently more than 30,000 human mitochondrial DNA sequences available, which, besides their scientific usefulness, has allowed us to analyse the performance of our contributions on massive (Big Data) datasets. The recent inclusion of bioinformatics in the Big Data category is backed by the improvement of the digitisation techniques for biological sequences that took place at the beginning of the 21st century [4]. This change dramatically increased the number of available sequences. For example, the number of human mitochondrial DNA sequences went from doubling every four years to doing so in less than two. As a consequence, a large number of methods and tools used until then have become obsolete, since they are not able to process these new volumes of data efficiently. This is the reason why all the contributions of this thesis have been developed to handle large volumes of data. The main contribution of this thesis is a framework that allows phylogenetic inference workflows to be designed and executed automatically: PhyloFlow [5-7]. Its creation was motivated by the fact that most existing phylogenetic inference systems have a fixed workflow in which neither the software tools that compose them nor their parameters can be modified. This decision can negatively affect the accuracy of the result if the system's workflow, or any of its components, is not suited to the biological information to be used as input.
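The abstract above describes a phylogeny as a rooted, usually binary, tree inferred from sequence data and notes that exact inference is NP-hard, which is why heuristic and distance-based methods are widely used in practice. The following sketch builds a rooted binary tree from pairwise distances with a simple UPGMA-style agglomeration; it is a didactic toy under assumed Hamming distances and toy sequences, not any of the inference tools integrated in PhyloFlow.

```python
import itertools

def hamming(a: str, b: str) -> int:
    """Pairwise distance between two aligned sequences of equal length."""
    return sum(x != y for x, y in zip(a, b))

def average_dist(a, b, clusters, dist):
    """Average-linkage distance between two clusters of leaf names."""
    pairs = [(x, y) for x in clusters[a] for y in clusters[b]]
    return sum(dist[frozenset((x, y))] for x, y in pairs) / len(pairs)

def upgma(sequences: dict):
    """Toy UPGMA: repeatedly merge the two closest clusters into a new internal node.
    Subtrees are nested tuples, so the result is a rooted binary tree."""
    clusters = {name: (name,) for name in sequences}       # cluster label -> leaf members
    nodes = {name: name for name in sequences}             # cluster label -> subtree
    dist = {frozenset(p): hamming(sequences[p[0]], sequences[p[1]])
            for p in itertools.combinations(sequences, 2)}
    while len(nodes) > 1:
        a, b = min(itertools.combinations(nodes, 2),
                   key=lambda p: average_dist(p[0], p[1], clusters, dist))
        new_subtree = (nodes.pop(a), nodes.pop(b))
        members = clusters.pop(a) + clusters.pop(b)
        label = "+".join(members)
        clusters[label] = members
        nodes[label] = new_subtree
    return next(iter(nodes.values()))

if __name__ == "__main__":
    seqs = {"human": "ACGTACGT", "chimp": "ACGTACGA",
            "mouse": "ACGAACTT", "chicken": "TCGAACTC"}
    print(upgma(seqs))   # (('human', 'chimp'), ('mouse', 'chicken')) as nested tuples
```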
For this reason, PhyloFlow incorporates a configuration process that allows the user to select each of the processes that will form part of the final system, as well as the specific tools and methods and their parameters. Advice and default options are provided during configuration to ease its use, especially for novice users. In addition, our framework allows the unattended execution of the generated phylogenetic systems, both on desktop computers and on dedicated hardware platforms (clusters, cloud computing, etc.). Finally, the capabilities of PhyloFlow have been evaluated both in reproducing previously published phylogenetic inference systems and in creating systems aimed at intensive problems such as the inference of human mitochondrial DNA phylogenies. The results show that our framework is not only able to meet these challenges but that, in the case of replicating existing systems, the possibility of configuring each of their components greatly improves its applicability. During the implementation of PhyloFlow we discovered several important shortcomings in some current software libraries that hindered the integration and management of the phylogenetic tools. For this reason we decided to create the first Python software library for molecular phylogenetics studies: MEvoLib [8]. This library has been designed to provide a single interface for sets of software tools aimed at the same process, such as multiple alignment or phylogenetic inference. MEvoLib also includes default configurations and methods that use specific biological knowledge to improve accuracy, adapting to the needs of each type of user. As a last relevant feature, a format conversion process has been incorporated for the input and output files of each interface, so that, if the selected tool does not support a given format, the files are adapted automatically. This property facilitates the use and integration of MEvoLib in scripts and software tools. The study of the application of PhyloFlow to human mitochondrial DNA has exposed the high computational and economic costs associated with the inference of large phylogenies. For this reason, systems such as PhyloTree [9], which infers a special type of human mitochondrial DNA phylogeny, recompute their results at most once a year. However, as mentioned above, current sequencing techniques allow the incorporation of hundreds or even thousands of new biological sequences every month. This gap between producer and consumer means that such phylogenies become outdated within a few months. To solve this problem we have designed a new algorithm that updates a phylogeny through the iterative incorporation of new sequences: PHYSER [10]. In addition, the evolutionary information itself is used to detect possible mutations introduced artificially by the sequencing process and absent from the original sequence. The tests carried out with mitochondrial DNA have proven its effectiveness and efficiency, with a time cost per sequence of less than 20 seconds. The development of new tools for the analysis of phylogenies has also been an important part of this thesis.
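PHYSER, mentioned above, updates an existing phylogeny by iteratively incorporating new sequences instead of re-inferring the whole tree, and uses the evolutionary information to flag likely sequencing errors. The sketch below shows only the general flavour of such incremental placement, attaching each new sequence next to its closest existing leaf and flagging sequences that fit nowhere well; the nested-tuple tree, Hamming distance and max_dist cutoff are illustrative assumptions, not the published PHYSER algorithm.

```python
def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def attach_next_to(tree, leaf: str, new_leaf: str):
    """Return a copy of the nested-tuple tree where `leaf` is replaced by (leaf, new_leaf)."""
    if isinstance(tree, str):
        return (tree, new_leaf) if tree == leaf else tree
    return tuple(attach_next_to(child, leaf, new_leaf) for child in tree)

def place_sequences(tree, leaf_seqs: dict, new_seqs: dict, max_dist: int):
    """Iteratively place new sequences next to their closest leaf; sequences whose best
    distance exceeds max_dist are reported as suspect (e.g. possible sequencing errors)."""
    suspects = []
    for name, seq in new_seqs.items():
        closest, d = min(((leaf, hamming(seq, s)) for leaf, s in leaf_seqs.items()),
                         key=lambda pair: pair[1])
        if d > max_dist:
            suspects.append(name)
            continue
        tree = attach_next_to(tree, closest, name)
        leaf_seqs[name] = seq            # the new leaf can now attract later sequences
    return tree, suspects

if __name__ == "__main__":
    tree = (("human", "chimp"), ("mouse", "chicken"))
    leaf_seqs = {"human": "ACGTACGT", "chimp": "ACGTACGA",
                 "mouse": "ACGAACTT", "chicken": "TCGAACTC"}
    new_seqs = {"neanderthal": "ACGTACGT", "garbled": "GGGGGGGG"}
    tree, suspects = place_sequences(tree, leaf_seqs, new_seqs, max_dist=3)
    print(tree)       # 'neanderthal' is attached next to 'human'
    print(suspects)   # ['garbled'] is flagged as a possible sequencing artefact
```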
Specifically, two main contributions have been made in this respect: PhyloViewer [11] and a tool for conservation analysis [12]. PhyloViewer is a viewer for extensive phylogenies, that is, phylogenies with at least a thousand leaves. This tool provides a novel interface that displays the selected node and its child nodes, together with all the information associated with each of them: identifier, biological sequence, and so on. This design decision aims to avoid the usual "blur" produced by most visualisation tools when they display this kind of phylogeny in its entirety on screen. Furthermore, it has been developed on a client-server architecture, so the phylogeny is processed only once, on the server side. This has significantly reduced loading and access times on the client side. On the other hand, the main contribution of our conservation analysis tool is the parallelisation of the classical methods applied in this field, achieving speed-ups close to the theoretical maximum without loss of accuracy. This has been possible by implementing those methods from scratch, incorporating instruction-level parallelism, instead of parallelising existing implementations. As a result, our tool generates a report containing the conclusions of the conservation analysis performed. The user can provide a conservation threshold so that the report highlights only those positions that do not meet it. In addition, there are two types of report with different levels of detail, both designed to be understandable and useful to users. Finally, a predictor of pathogenic mutations in mitochondrial DNA based on support vector machines (SVM) has been designed and implemented: Mitoclass.1 [13]. It is the first predictor for this type of biological sequence; indeed, it was necessary to create the first repository of known pathogenic mutations, mdmv.1, in order to train and evaluate our predictor. Mitoclass.1 has been shown to improve the classification of mutations compared with the best-known and most widely used predictors, all of which target pathogenicity in nuclear DNA. This success stems from the novel combination of properties evaluated for each mutation during classification. Another noteworthy factor is the use of SVMs over other alternatives, which were tested and discarded because of their lower predictive power for our application case.
REFERENCES
[1] L. Wang and T. Jiang, "On the complexity of multiple sequence alignment," Journal of Computational Biology, vol. 1, no. 4, pp. 337-348, 1994.
[2] W. H. E. Day, D. S. Johnson, and D. Sankoff, "The computational complexity of inferring rooted phylogenies by parsimony," Mathematical Biosciences, vol. 81, no. 1, pp. 33-42, 1986.
[3] S. Roch, "A short proof that phylogenetic tree reconstruction by maximum likelihood is hard," IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), vol. 3, no. 1, p. 92, 2006.
[4] E. R. Mardis, "The impact of next-generation sequencing technology on genetics," Trends in Genetics, vol. 24, no. 3, pp. 133-141, 2008.
[5] J. Álvarez-Jarreta, G. de Miguel Casado, and E. Mayordomo, "PhyloFlow: A Fully Customizable and Automatic Workflow for Phylogeny Estimation," in ECCB 2014, 2014.
[6] J. Álvarez-Jarreta, G. de Miguel Casado, and E. Mayordomo, "PhyloFlow: A Fully Customizable and Automatic Workflow for Phylogenetic Reconstruction," in IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 1-7, IEEE, 2014.
[7] J. Álvarez, R. Blanco, and E. Mayordomo, "Workflows with Model Selection: A Multilocus Approach to Phylogenetic Analysis," in 5th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2011), vol. 93 of Advances in Intelligent and Soft Computing, pp. 39-47, Springer Berlin Heidelberg, 2011.
[8] J. Álvarez-Jarreta and E. Ruiz-Pesini, "MEvoLib v1.0: the First Molecular Evolution Library for Python," BMC Bioinformatics, vol. 17, no. 436, pp. 1-8, 2016.
[9] M. van Oven and M. Kayser, "Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation," Human Mutation, vol. 30, no. 2, pp. E386-E394, 2009.
[10] J. Álvarez-Jarreta, E. Mayordomo, and E. Ruiz-Pesini, "PHYSER: An Algorithm to Detect Sequencing Errors from Phylogenetic Information," in 6th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2012), pp. 105-112, 2012.
[11] J. Álvarez-Jarreta and G. de Miguel Casado, "PhyloViewer: A Phylogenetic Tree Viewer for Extense Phylogenies," in ECCB 2014, 2014.
[12] F. Merino-Casallo, J. Álvarez-Jarreta, and E. Mayordomo, "Conservation in mitochondrial DNA: Parallelized estimation and alignment influence," in 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2015), pp. 1434-1440, IEEE, 2015.
[13] A. Martín-Navarro, A. Gaudioso-Simón, J. Álvarez-Jarreta, J. Montoya, E. Mayordomo, and E. Ruiz-Pesini, "Machine learning classifier for identification of damaging missense mutations exclusive to human mitochondrial DNA-encoded polypeptides," BMC Bioinformatics, vol. 18, no. 158, pp. 1-11, 2017.
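The Mitoclass.1 predictor described above classifies mitochondrial missense mutations with a support vector machine trained on the mdmv.1 repository of known pathogenic mutations. The snippet below is a generic scikit-learn sketch of that kind of SVM workflow on invented feature vectors and labels; the real feature set, training data and evaluation are those reported in [13].

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Invented toy data: each row describes a mutation by a few numeric properties
# (e.g. a conservation score, a physico-chemical change, an allele frequency);
# labels mark pathogenic (1) versus neutral (0) mutations.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # synthetic decision rule, illustration only

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Feature scaling plus an RBF-kernel SVM, a common default for this kind of classifier.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```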

    eHive: An Artificial Intelligence workflow system for genomic analysis

    Background: The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle approximately two weeks are allocated to generate all the genomic alignments and the protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl and we expect the number to continue to grow in the future.
    Results: We present eHive, a new fault tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system a MySQL database serves as the central blackboard and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole genome alignments, (2) multiple whole genome alignments and (3) gene trees with protein homology inference. Finally, we show the efficiency of the system in real case scenarios.
    Conclusions: eHive allows us to produce computationally demanding results in a reliable and efficient way with minimal supervision and high throughput. Further documentation is available at: http://www.ensembl.org/info/docs/eHive/
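The abstract above describes eHive's blackboard architecture: a central MySQL database holds the jobs and autonomous Perl worker agents claim and run them according to dataflow and branching rules. The sketch below illustrates only the claim-and-run pattern, in Python with SQLite standing in for MySQL; the table layout and status values are invented for the example and are not eHive's actual schema or API.

```python
import sqlite3
import subprocess

# Minimal blackboard: a jobs table that autonomous workers poll and claim.
def init_blackboard(path="blackboard.db"):
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS job (
                      job_id INTEGER PRIMARY KEY,
                      cmd    TEXT NOT NULL,
                      status TEXT NOT NULL DEFAULT 'READY')""")
    db.commit()
    return db

def claim_job(db):
    """Claim one READY job (a real system would lock or update atomically)."""
    row = db.execute("SELECT job_id, cmd FROM job WHERE status = 'READY' LIMIT 1").fetchone()
    if row is None:
        return None
    job_id, cmd = row
    db.execute("UPDATE job SET status = 'CLAIMED' WHERE job_id = ?", (job_id,))
    db.commit()
    return job_id, cmd

def worker_loop(db):
    """Autonomous worker: keep claiming and running jobs until the blackboard is empty."""
    while True:
        job = claim_job(db)
        if job is None:
            break
        job_id, cmd = job
        result = subprocess.run(cmd, shell=True)
        status = "DONE" if result.returncode == 0 else "FAILED"
        db.execute("UPDATE job SET status = ? WHERE job_id = ?", (status, job_id))
        db.commit()

if __name__ == "__main__":
    db = init_blackboard()
    db.executemany("INSERT INTO job (cmd) VALUES (?)",
                   [("echo align genome pair A-B",), ("echo align genome pair A-C",)])
    db.commit()
    worker_loop(db)
```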

    High-throughput biodiversity analysis: Rapid assessment of species richness and ecological interactions of Chrysomelidae (Coleoptera) in the tropics

    Biodiversity assessment has been the focus of intense debate and of conceptual and methodological advances in recent years. The cultural, academic and aesthetic impulses to recognise and catalogue the diversity in our surroundings, in this case of living objects, are furthermore propelled by the urgency of understanding that we may be responsible for a dramatic reduction of biodiversity, comparable in magnitude to geological mass extinctions. One of the most important advances in this attempt to characterise biodiversity has been the incorporation of DNA-based characters and molecular taxonomy tools to achieve faster and more efficient species delimitation and identification, even in hyperdiverse tropical biomes. In this essay we advocate for a broad understanding of biodiversity as the inventory of species in a given environment, but also the diversity of their interactions, with both aspects being attainable using molecular markers and phylogenetic approaches. We exemplify the suitability and utility of this framework for large-scale biodiversity assessment with the results of our ongoing projects characterising the communities of leaf beetles and their host plants in several tropical settings. Moreover, we propose that approaches similar to ours, establishing the inventories of two ecologically inter-related and species-rich groups of organisms, such as insect herbivores and their angiosperm host plants, can serve as the foundation stone on which to anchor a comprehensive assessment of diversity, also in tropical environments, by the subsequent addition of trophic levels.
The ‘Fundación BBVA’ (Spain) has funded the bulk of this work thanks to their support for our large-scale biodiversity assessment initiative in Nicaraguan tropical dry forests (project BIOCON08-045, IP: JGZ). Our work in Nicaragua has benefited from a postdoctoral ‘Juan de la Cierva’ contract (Spanish Ministry of Science and Innovation, MICINN) to AP, and an AECID predoctoral studentship (Spanish Ministry of Foreign Affairs and Cooperation) and a SENESCYT scholarship (Secretariat of High Education, Science, Technology and Innovation, Ecuador) to GDC. The National Geographic Society supported most of our research in New Caledonia (project 8380-07, IP: JGZ) with help from a travel grant awarded by the Percy Sladen Memorial Fund of the Linnean Society of London to JGZ. The Spanish High Research Council (CSIC), in the framework of a cooperation agreement with the Vietnamese Academy of Sciences, supports our work in dry tropical forests of southern Vietnam (IP: JGZ) as well as a predoctoral studentship to DTN. Several EU Synthesys research stays (GB-TAF-1840, SE-TAF-1893, DE-TAF-4348) and a Mayr Travel Grant (Harvard University) as well as project CGL2008-00007/BOS (MICINN, IP: JGZ) have contributed to the discovery of a new tropical species of Calligrapha, and the latter also framed the predoctoral studentship to TM.

    Sustaining large-scale infrastructure to promote pre-competitive biomedical research: lessons from mouse genomics.

    Bio-repositories and databases for biomedical research enable the efficient community-wide sharing of reagents and data. These archives play an increasingly prominent role in the generation and dissemination of bioresources and data essential for fundamental and translational research. Evidence suggests, however, that current funding and governance models, generally short-term and nationally focused, do not adequately support the role of archives in long-term, transnational endeavours to make and share high-impact resources. Our qualitative case study of the International Knockout Mouse Consortium and the International Mouse Phenotyping Consortium examines new governance mechanisms for archive sustainability. Funders and archive managers highlight in interviews that archives need stable public funding and new revenue-generation models to be sustainable. Sustainability also requires archives, journal publishers, and funders to implement appropriate incentives, associated metrics, and enforcement mechanisms to ensure that researchers use archives to deposit reagents and data to make them publicly accessible for academia and industry alike.
This work was supported by the NorCOMM2 Project funded by Genome Canada [AM and TB]; the Ontario Genome Institute [TB]; and the Canadian Stem Cell Network [TB]. The authors have no competing interests. The funders of the study exerted no influence on the design and conduct of the study or on the analysis and presentation of results. We thank Lesley Dacks, Lorna Skaley and Dr Ann Flenniken for research support and project coordination. We thank the participants who took time out of their busy schedules for interviews and to review our analyses. We thank Dr Farah Huzair for feedback and the IMPC leadership for permission to use a modified version (Figure 1) of the Consortium's map of its global membership (http://www.mousephenotype.org/about-impc/impc-members). We are grateful to Drs Andy Smith and John Hancock for advice on the ELIXIR funding and governance model.
This is the final version of the article. It first appeared from Elsevier via http://dx.doi.org/10.1016/j.nbt.2015.10.00

    From genes to functional classes in the study of biological systems

    BACKGROUND: With the popularisation of high-throughput techniques, the need for procedures that help in the biological interpretation of results has increased enormously. Recently, new procedures inspired by systems biology criteria have started to be developed. RESULTS: Here we present FatiScan, a web-based program which implements a threshold-independent test for the functional interpretation of large-scale experiments that does not depend on the pre-selection of genes through the multiple application of independent tests to each gene. The test implemented aims to directly test the behaviour of blocks of functionally related genes, instead of focusing on single genes. In addition, the test does not depend on the type of data used for obtaining significance values, and consequently different types of biologically informative terms (gene ontology, pathways, functional motifs, transcription factor binding sites or regulatory sites from CisRed) can be applied to different classes of genome-scale studies. We exemplify its application in microarray gene expression, evolution and interactomics. CONCLUSION: Methods for gene set enrichment which, in addition, are independent of the original data and experimental design constitute a promising alternative for the functional profiling of genome-scale experiments. A web server that performs the test described, and other similar ones, can be found at:
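FatiScan, as described above, tests blocks of functionally related genes without first selecting genes by an arbitrary significance threshold. One simple threshold-independent strategy, sketched below with made-up scores (and not necessarily the exact procedure implemented in FatiScan), is to compare the genome-wide ranks of the genes annotated with a term against those of all remaining genes, for example with a one-sided Wilcoxon rank-sum test.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def block_enrichment(scores: dict, block: set) -> float:
    """Threshold-free test: do genes in `block` tend to rank higher (e.g. more
    up-regulated) than the remaining genes in the genome-wide score list?"""
    in_block = [s for g, s in scores.items() if g in block]
    rest = [s for g, s in scores.items() if g not in block]
    stat, p = mannwhitneyu(in_block, rest, alternative="greater")
    return p

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Made-up genome-wide scores (e.g. log fold changes) for 1,000 genes,
    # with genes g0-g49 forming a functional block shifted towards higher scores.
    scores = {f"g{i}": float(rng.normal(loc=0.8 if i < 50 else 0.0)) for i in range(1000)}
    block = {f"g{i}" for i in range(50)}
    print(f"block enrichment p-value: {block_enrichment(scores, block):.3g}")
```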