55,398 research outputs found

    AnnotCompute: annotation-based exploration and meta-analysis of genomics experiments

    Get PDF
    The ever-increasing scale of biological data sets, particularly those arising in the context of high-throughput technologies, requires the development of rich data exploration tools. In this article, we present AnnotCompute, an information discovery platform for repositories of functional genomics experiments such as ArrayExpress. Our system leverages semantic annotations of functional genomics experiments with controlled vocabulary and ontology terms, such as those from the MGED Ontology, to compute conceptual dissimilarities between pairs of experiments. These dissimilarities are then used to support two types of exploratory analysis—clustering and query-by-example. We show that our proposed dissimilarity measures correspond to a user's intuition about conceptual dissimilarity, and can be used to support effective query-by-example. We also evaluate the quality of clustering based on these measures. While AnnotCompute can support a richer data exploration experience, its effectiveness is limited in some cases, due to the quality of available annotations. Nonetheless, tools such as AnnotCompute may provide an incentive for richer annotations of experiments. Code is available for download at http://www.cbil.upenn.edu/downloads/AnnotCompute

    Visual and interactive exploration of point data

    Get PDF
    Point data, such as Unit Postcodes (UPC), can provide very detailed information at fine scales of resolution. For instance, socio-economic attributes are commonly assigned to UPC. Hence, they can be represented as points and observable at the postcode level. Using UPC as a common field allows the concatenation of variables from disparate data sources that can potentially support sophisticated spatial analysis. However, visualising UPC in urban areas has at least three limitations. First, at small scales UPC occurrences can be very dense making their visualisation as points difficult. On the other hand, patterns in the associated attribute values are often hardly recognisable at large scales. Secondly, UPC can be used as a common field to allow the concatenation of highly multivariate data sets with an associated postcode. Finally, socio-economic variables assigned to UPC (such as the ones used here) can be non-Normal in their distributions as a result of a large presence of zero values and high variances which constrain their analysis using traditional statistics. This paper discusses a Point Visualisation Tool (PVT), a proof-of-concept system developed to visually explore point data. Various well-known visualisation techniques were implemented to enable their interactive and dynamic interrogation. PVT provides multiple representations of point data to facilitate the understanding of the relations between attributes or variables as well as their spatial characteristics. Brushing between alternative views is used to link several representations of a single attribute, as well as to simultaneously explore more than one variable. PVT’s functionality shows how the use of visual techniques embedded in an interactive environment enable the exploration of large amounts of multivariate point data

    Conceptual biology, hypothesis discovery, and text mining: Swanson's legacy

    Get PDF
    Innovative biomedical librarians and information specialists who want to expand their roles as expert searchers need to know about profound changes in biology and parallel trends in text mining. In recent years, conceptual biology has emerged as a complement to empirical biology. This is partly in response to the availability of massive digital resources such as the network of databases for molecular biologists at the National Center for Biotechnology Information. Developments in text mining and hypothesis discovery systems based on the early work of Swanson, a mathematician and information scientist, are coincident with the emergence of conceptual biology. Very little has been written to introduce biomedical digital librarians to these new trends. In this paper, background for data and text mining, as well as for knowledge discovery in databases (KDD) and in text (KDT) is presented, then a brief review of Swanson's ideas, followed by a discussion of recent approaches to hypothesis discovery and testing. 'Testing' in the context of text mining involves partially automated methods for finding evidence in the literature to support hypothetical relationships. Concluding remarks follow regarding (a) the limits of current strategies for evaluation of hypothesis discovery systems and (b) the role of literature-based discovery in concert with empirical research. Report of an informatics-driven literature review for biomarkers of systemic lupus erythematosus is mentioned. Swanson's vision of the hidden value in the literature of science and, by extension, in biomedical digital databases, is still remarkably generative for information scientists, biologists, and physicians. © 2006Bekhuis; licensee BioMed Central Ltd

    Techniques for clustering gene expression data

    Get PDF
    Many clustering techniques have been proposed for the analysis of gene expression data obtained from microarray experiments. However, choice of suitable method(s) for a given experimental dataset is not straightforward. Common approaches do not translate well and fail to take account of the data profile. This review paper surveys state of the art applications which recognises these limitations and implements procedures to overcome them. It provides a framework for the evaluation of clustering in gene expression analyses. The nature of microarray data is discussed briefly. Selected examples are presented for the clustering methods considered

    A Role-Based Taxonomy of Human Resource Organizations

    Get PDF
    [Excerpt] An empirically-derived classification (taxonomy) of human resource departments , based on a few fundamental roles played in organizations, was developed as an alternative to the mostly speculative existing typologies. Four types emerged: the strategic partner, the strategic advisor, the operational partner, and the operational administrator. The stability of the solution and the relationships with variables not used to generate it were found satisfactory. The types show some similarities with those identified in the literature
    • 

    corecore