248 research outputs found

    Biclustering of gene expression data by non-smooth non-negative matrix factorization

    Get PDF
    BACKGROUND: The extended use of microarray technologies has enabled the generation and accumulation of gene expression datasets that contain expression levels of thousands of genes across tens or hundreds of different experimental conditions. One of the major challenges in the analysis of such datasets is to discover local structures composed by sets of genes that show coherent expression patterns across subsets of experimental conditions. These patterns may provide clues about the main biological processes associated to different physiological states. RESULTS: In this work we present a methodology able to cluster genes and conditions highly related in sub-portions of the data. Our approach is based on a new data mining technique, Non-smooth Non-Negative Matrix Factorization (nsNMF), able to identify localized patterns in large datasets. We assessed the potential of this methodology analyzing several synthetic datasets as well as two large and heterogeneous sets of gene expression profiles. In all cases the method was able to identify localized features related to sets of genes that show consistent expression patterns across subsets of experimental conditions. The uncovered structures showed a clear biological meaning in terms of relationships among functional annotations of genes and the phenotypes or physiological states of the associated conditions. CONCLUSION: The proposed approach can be a useful tool to analyze large and heterogeneous gene expression datasets. The method is able to identify complex relationships among genes and conditions that are difficult to identify by standard clustering algorithms

    GENECODIS: a web-based tool for finding significant concurrent annotations in gene lists

    Get PDF
    We present GENECODIS, a web-based tool that integrates different sources of information to search for annotations that frequently co-occur in a set of genes and rank them by statistical significance. The analysis of concurrent annotations provides significant information for the biologic interpretation of high-throughput experiments and may outperform the results of standard methods for the functional analysis of gene lists. GENECODIS is publicly available at

    Discovering semantic features in the literature: a foundation for building functional associations

    Get PDF
    BACKGROUND: Experimental techniques such as DNA microarray, serial analysis of gene expression (SAGE) and mass spectrometry proteomics, among others, are generating large amounts of data related to genes and proteins at different levels. As in any other experimental approach, it is necessary to analyze these data in the context of previously known information about the biological entities under study. The literature is a particularly valuable source of information for experiment validation and interpretation. Therefore, the development of automated text mining tools to assist in such interpretation is one of the main challenges in current bioinformatics research. RESULTS: We present a method to create literature profiles for large sets of genes or proteins based on common semantic features extracted from a corpus of relevant documents. These profiles can be used to establish pair-wise similarities among genes, utilized in gene/protein classification or can be even combined with experimental measurements. Semantic features can be used by researchers to facilitate the understanding of the commonalities indicated by experimental results. Our approach is based on non-negative matrix factorization (NMF), a machine-learning algorithm for data analysis, capable of identifying local patterns that characterize a subset of the data. The literature is thus used to establish putative relationships among subsets of genes or proteins and to provide coherent justification for this clustering into subsets. We demonstrate the utility of the method by applying it to two independent and vastly different sets of genes. CONCLUSION: The presented method can create literature profiles from documents relevant to sets of genes. The representation of genes as additive linear combinations of semantic features allows for the exploration of functional associations as well as for clustering, suggesting a valuable methodology for the validation and interpretation of high-throughput experimental data

    GeneCodis: interpreting gene lists through enrichment analysis and integration of diverse biological information

    Get PDF
    GeneCodis is a web server application for functional analysis of gene lists that integrates different sources of information and finds modular patterns of interrelated annotations. This integrative approach has proved to be useful for the interpretation of high-throughput experiments and therefore a new version of the system has been developed to expand its functionality and scope. GeneCodis now expands the functional information with regulatory patterns and user-defined annotations, offering the possibility of integrating all sources of information in the same analysis. Traditional singular enrichment is now permitted and more organisms and gene identifiers have been added to the database. The application has been re-engineered to improve performance, accessibility and scalability. In addition, GeneCodis can now be accessed through a public SOAP web services interface, enabling users to perform analysis from their own scripts and workflows. The application is freely available at http://genecodis.dacya.ucm.e

    Integrated analysis of gene expression by association rules discovery

    Get PDF
    BACKGROUND: Microarray technology is generating huge amounts of data about the expression level of thousands of genes, or even whole genomes, across different experimental conditions. To extract biological knowledge, and to fully understand such datasets, it is essential to include external biological information about genes and gene products to the analysis of expression data. However, most of the current approaches to analyze microarray datasets are mainly focused on the analysis of experimental data, and external biological information is incorporated as a posterior process. RESULTS: In this study we present a method for the integrative analysis of microarray data based on the Association Rules Discovery data mining technique. The approach integrates gene annotations and expression data to discover intrinsic associations among both data sources based on co-occurrence patterns. We applied the proposed methodology to the analysis of gene expression datasets in which genes were annotated with metabolic pathways, transcriptional regulators and Gene Ontology categories. Automatically extracted associations revealed significant relationships among these gene attributes and expression patterns, where many of them are clearly supported by recently reported work. CONCLUSION: The integration of external biological information and gene expression data can provide insights about the biological processes associated to gene expression programs. In this paper we show that the proposed methodology is able to integrate multiple gene annotations and expression data in the same analytic framework and extract meaningful associations among heterogeneous sources of data. An implementation of the method is included in the Engene software package

    SENT: semantic features in text

    Get PDF
    We present SENT (semantic features in text), a functional interpretation tool based on literature analysis. SENT uses Non-negative Matrix Factorization to identify topics in the scientific articles related to a collection of genes or their products, and use them to group and summarize these genes. In addition, the application allows users to rank and explore the articles that best relate to the topics found, helping put the analysis results into context. This approach is useful as an exploratory step in the workflow of interpreting and understanding experimental data, shedding some light into the complex underlying biological mechanisms. This tool provides a user-friendly interface via a web site, and a programmatic access via a SOAP web server. SENT is freely accessible at http://sent.dacya.ucm.es

    MARQ: an online tool to mine GEO for experiments with similar or opposite gene expression signatures

    Get PDF
    The enormous amount of data available in public gene expression repositories such as Gene Expression Omnibus (GEO) offers an inestimable resource to explore gene expression programs across several organisms and conditions. This information can be used to discover experiments that induce similar or opposite gene expression patterns to a given query, which in turn may lead to the discovery of new relationships among diseases, drugs or pathways, as well as the generation of new hypotheses. In this work, we present MARQ, a web-based application that allows researchers to compare a query set of genes, e.g. a set of over- and under-expressed genes, against a signature database built from GEO datasets for different organisms and platforms. MARQ offers an easy-to-use and integrated environment to mine GEO, in order to identify conditions that induce similar or opposite gene expression patterns to a given experimental condition. MARQ also includes additional functionalities for the exploration of the results, including a meta-analysis pipeline to find genes that are differentially expressed across different experiments. The application is freely available at http://marq.dacya.ucm.es

    bioNMF: a versatile tool for non-negative matrix factorization in biology

    Get PDF
    BACKGROUND: In the Bioinformatics field, a great deal of interest has been given to Non-negative matrix factorization technique (NMF), due to its capability of providing new insights and relevant information about the complex latent relationships in experimental data sets. This method, and some of its variants, has been successfully applied to gene expression, sequence analysis, functional characterization of genes and text mining. Even if the interest on this technique by the bioinformatics community has been increased during the last few years, there are not many available simple standalone tools to specifically perform these types of data analysis in an integrated environment. RESULTS: In this work we propose a versatile and user-friendly tool that implements the NMF methodology in different analysis contexts to support some of the most important reported applications of this new methodology. This includes clustering and biclustering gene expression data, protein sequence analysis, text mining of biomedical literature and sample classification using gene expression. The tool, which is named bioNMF, also contains a user-friendly graphical interface to explore results in an interactive manner and facilitate in this way the exploratory data analysis process. CONCLUSION: bioNMF is a standalone versatile application which does not require any special installation or libraries. It can be used for most of the multiple applications proposed in the bioinformatics field or to support new research using this method. This tool is publicly available at

    Simultaneous non-negative matrix factorization for multiple large scale gene expression datasets in toxicology

    Get PDF
    Non-negative matrix factorization is a useful tool for reducing the dimension of large datasets. This work considers simultaneous non-negative matrix factorization of multiple sources of data. In particular, we perform the first study that involves more than two datasets. We discuss the algorithmic issues required to convert the approach into a practical computational tool and apply the technique to new gene expression data quantifying the molecular changes in four tissue types due to different dosages of an experimental panPPAR agonist in mouse. This study is of interest in toxicology because, whilst PPARs form potential therapeutic targets for diabetes, it is known that they can induce serious side-effects. Our results show that the practical simultaneous non-negative matrix factorization developed here can add value to the data analysis. In particular, we find that factorizing the data as a single object allows us to distinguish between the four tissue types, but does not correctly reproduce the known dosage level groups. Applying our new approach, which treats the four tissue types as providing distinct, but related, datasets, we find that the dosage level groups are respected. The new algorithm then provides separate gene list orderings that can be studied for each tissue type, and compared with the ordering arising from the single factorization. We find that many of our conclusions can be corroborated with known biological behaviour, and others offer new insights into the toxicological effects. Overall, the algorithm shows promise for early detection of toxicity in the drug discovery process

    Effect of having private health insurance on the use of health care services: the case of Spain

    Get PDF
    Background: Several stakeholders have undertaken initiatives to propose solutions towards a more sustainable health system and Spain, as an example of a European country affected by austerity measures, is looking for ways to cut healthcare budgets. Methods: The aim of this paper is to study the effect of private health insurance on health care utilization using the latest micro-data from the European Community Household Panel (ECHP), the Spanish National Health Survey (SNHS) and the European Union Statistics on Income and Living Conditions (EU-SILC). We use matching techniques based on propensity score methods: single match, four matches, bias-adjustment and allowing for heteroskedasticity. Results: The results demonstrate that people with a private health insurance, use the public health system less than individuals without double health insurance coverage. Conclusions: Our conclusions are useful when policy makers design public-private partnership policie
    corecore