67,300 research outputs found

    Adaptive content mapping for internet navigation

    Get PDF
    The Internet as the biggest human library ever assembled keeps on growing. Although all kinds of information carriers (e.g. audio/video/hybrid file formats) are available, text based documents dominate. It is estimated that about 80% of all information worldwide stored electronically exists in (or can be converted into) text form. More and more, all kinds of documents are generated by means of a text processing system and are therefore available electronically. Nowadays, many printed journals are also published online and may even discontinue to appear in print form tomorrow. This development has many convincing advantages: the documents are both available faster (cf. prepress services) and cheaper, they can be searched more easily, the physical storage only needs a fraction of the space previously necessary and the medium will not age. For most people, fast and easy access is the most interesting feature of the new age; computer-aided search for specific documents or Web pages becomes the basic tool for information-oriented work. But this tool has problems. The current keyword based search machines available on the Internet are not really appropriate for such a task; either there are (way) too many documents matching the specified keywords are presented or none at all. The problem lies in the fact that it is often very difficult to choose appropriate terms describing the desired topic in the first place. This contribution discusses the current state-of-the-art techniques in content-based searching (along with common visualization/browsing approaches) and proposes a particular adaptive solution for intuitive Internet document navigation, which not only enables the user to provide full texts instead of manually selected keywords (if available), but also allows him/her to explore the whole database

    SUBIC: A Supervised Bi-Clustering Approach for Precision Medicine

    Full text link
    Traditional medicine typically applies one-size-fits-all treatment for the entire patient population whereas precision medicine develops tailored treatment schemes for different patient subgroups. The fact that some factors may be more significant for a specific patient subgroup motivates clinicians and medical researchers to develop new approaches to subgroup detection and analysis, which is an effective strategy to personalize treatment. In this study, we propose a novel patient subgroup detection method, called Supervised Biclustring (SUBIC) using convex optimization and apply our approach to detect patient subgroups and prioritize risk factors for hypertension (HTN) in a vulnerable demographic subgroup (African-American). Our approach not only finds patient subgroups with guidance of a clinically relevant target variable but also identifies and prioritizes risk factors by pursuing sparsity of the input variables and encouraging similarity among the input variables and between the input and target variable

    High-speed detection of emergent market clustering via an unsupervised parallel genetic algorithm

    Full text link
    We implement a master-slave parallel genetic algorithm (PGA) with a bespoke log-likelihood fitness function to identify emergent clusters within price evolutions. We use graphics processing units (GPUs) to implement a PGA and visualise the results using disjoint minimal spanning trees (MSTs). We demonstrate that our GPU PGA, implemented on a commercially available general purpose GPU, is able to recover stock clusters in sub-second speed, based on a subset of stocks in the South African market. This represents a pragmatic choice for low-cost, scalable parallel computing and is significantly faster than a prototype serial implementation in an optimised C-based fourth-generation programming language, although the results are not directly comparable due to compiler differences. Combined with fast online intraday correlation matrix estimation from high frequency data for cluster identification, the proposed implementation offers cost-effective, near-real-time risk assessment for financial practitioners.Comment: 10 pages, 5 figures, 4 tables, More thorough discussion of implementatio

    Unifying Sparsest Cut, Cluster Deletion, and Modularity Clustering Objectives with Correlation Clustering

    Get PDF
    Graph clustering, or community detection, is the task of identifying groups of closely related objects in a large network. In this paper we introduce a new community-detection framework called LambdaCC that is based on a specially weighted version of correlation clustering. A key component in our methodology is a clustering resolution parameter, λ\lambda, which implicitly controls the size and structure of clusters formed by our framework. We show that, by increasing this parameter, our objective effectively interpolates between two different strategies in graph clustering: finding a sparse cut and forming dense subgraphs. Our methodology unifies and generalizes a number of other important clustering quality functions including modularity, sparsest cut, and cluster deletion, and places them all within the context of an optimization problem that has been well studied from the perspective of approximation algorithms. Our approach is particularly relevant in the regime of finding dense clusters, as it leads to a 2-approximation for the cluster deletion problem. We use our approach to cluster several graphs, including large collaboration networks and social networks

    The ROS wheel: refining ROS transcriptional footprints

    Get PDF
    In the last decade, microarray studies have delivered extensive inventories of transcriptome-wide changes in messenger RNA levels provoked by various types of oxidative stress in Arabidopsis (Arabidopsis thaliana). Previous cross-study comparisons indicated how different types of reactive oxygen species (ROS) and their subcellular accumulation sites are able to reshape the transcriptome in specific manners. However, these analyses often employed simplistic statistical frameworks that are not compatible with large-scale analyses. Here, we reanalyzed a total of 79 Affymetrix ATH1 microarray studies of redox homeostasis perturbation experiments. To create hierarchy in such a high number of transcriptomic data sets, all transcriptional profiles were clustered on the overlap extent of their differentially expressed transcripts. Subsequently, meta-analysis determined a single magnitude of differential expression across studies and identified common transcriptional footprints per cluster. The resulting transcriptional footprints revealed the regulation of various metabolic pathways and gene families. The RESPIRATORY BURST OXIDASE HOMOLOG F-mediated respiratory burst had a major impact and was a converging point among several studies. Conversely, the timing of the oxidative stress response was a determining factor in shaping different transcriptome footprints. Our study emphasizes the need to interpret transcriptomic data sets in a systematic context, where initial, specific stress triggers can converge to common, aspecific transcriptional changes. We believe that these refined transcriptional footprints provide a valuable resource for assessing the involvement of ROS in biological processes in plants
    • …
    corecore