DutchHatTrick: semantic query modeling, ConText, section detection, and match score maximization
This report discusses the collaborative work of the ErasmusMC, the University of Twente, and the University of Amsterdam on the TREC 2011 Medical track, where the task is to retrieve patient visits from the University of Pittsburgh NLP Repository for 35 topics. The repository consists of 101,711 patient reports, and each patient visit is recorded in one or more reports.
Literature-based priors for gene regulatory networks
Motivation: The use of prior knowledge to improve gene regulatory network modelling has often been proposed. In this paper we present the first research on the massive incorporation of prior knowledge from literature for Bayesian network learning of gene networks. As the publication rate of scientific papers grows, updating online databases, which have been proposed as a potential source of prior knowledge in past research, becomes increasingly challenging. The novelty of our approach lies in the use of gene-pair association scores that describe the overlap in the contexts in which the genes are mentioned, generated from a large database of scientific literature, harnessing the information contained in a huge number of documents into a simple, clear format. Results: We present a method to transform such literature-based gene association scores into network prior probabilities, and apply it to learn gene sub-networks for yeast, E. coli and human. We also investigate the effect of weighting the influence of the prior knowledge. Our findings show that literature-based priors can improve both the number of true regulatory interactions present in the network and the accuracy of expression value prediction on genes, in comparison to a network learnt solely from expression data. Networks learnt with priors also show an improved biological interpretation, with identified subnetworks that coincide with known biological pathways.
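The score-to-prior transformation and the prior-weighting idea described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the interpolation form and the `weight` and `baseline` parameters are assumptions introduced here.

```python
def score_to_prior(score, weight=1.0, baseline=0.1):
    """Map a literature-based gene-pair association score in [0, 1]
    to an edge prior probability for Bayesian network structure learning.

    `weight` controls how strongly the literature evidence shifts the
    prior away from an uninformative `baseline` edge probability
    (both parameters are hypothetical, for illustration only).
    """
    # Clamp the weight, then interpolate between the uninformative
    # baseline and the literature-derived score.
    w = max(0.0, min(1.0, weight))
    return (1 - w) * baseline + w * score

# A strongly associated gene pair receives a high edge prior,
# while weight=0 falls back to the uninformative baseline.
high = score_to_prior(0.9, weight=0.8)
none = score_to_prior(0.9, weight=0.0)
```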
Associative conceptual space-based information retrieval systems
In this 'Information Era', with the availability of large collections of books, articles, journals, CD-ROMs, video films and so on, there is an increasing need for intelligent information retrieval systems that enable users to find the desired information easily. Many attempts have been made to construct such retrieval systems, including the electronic systems used in libraries and the search engines for the World Wide Web. In many cases, however, the so-called 'precision' and 'recall' of these systems leave much to be desired.
In this paper, a new AI-based retrieval system is proposed, inspired by, among other things, the WEBSOM algorithm. However, contrary to that approach, where domain knowledge is extracted from the full text of all books, we propose a system in which certain specific meta-information is automatically assembled using only the index of every document. This knowledge extraction process results in a new type of concept space, the so-called Associative Conceptual Space, in which the 'concepts' found in all documents are clustered using a Hebbian type of learning algorithm. Each document can then be characterised by comparing the concepts occurring in it to those present in the associative conceptual space. Using these characterisations, all documents can be clustered such that semantically similar documents lie close together on a Self-Organising Map, which can easily be inspected by its user.
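The Hebbian-style clustering of index terms described above can be illustrated with a small sketch: concepts that co-occur in a document's index have their association strengthened. This is a hypothetical simplification of the associative-conceptual-space construction; the update rule and `rate` parameter are assumptions, not the paper's algorithm.

```python
from collections import defaultdict
from itertools import combinations

def hebbian_concept_links(document_indexes, rate=0.1):
    """Strengthen links between concepts that co-occur in a document's
    index (a Hebbian-style update: co-activation increases association).

    `document_indexes` is a list of per-document index-term lists;
    `rate` is a hypothetical learning rate. Weights saturate toward 1.
    """
    weights = defaultdict(float)
    for index_terms in document_indexes:
        for a, b in combinations(sorted(set(index_terms)), 2):
            # Concepts indexed together grow more strongly associated,
            # with diminishing returns as the weight approaches 1.
            weights[(a, b)] += rate * (1.0 - weights[(a, b)])
    return dict(weights)
```

Documents could then be characterised by comparing their own index terms against the strongest links in this space, e.g. before placing them on a Self-Organising Map.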
Efficient GPU-accelerated fitting of observational health-scaled stratified and time-varying Cox models
The Cox proportional hazards model stands as a widely-used semi-parametric
approach for survival analysis in medical research and many other fields.
Numerous extensions of the Cox model have further expanded its versatility.
Statistical computing challenges arise, however, when applying many of these
extensions with the increasing complexity and volume of modern observational
health datasets. To address these challenges, we demonstrate how to employ
massive parallelization through graphics processing units (GPU) to enhance the
scalability of the stratified Cox model, the Cox model with time-varying
covariates, and the Cox model with time-varying coefficients. First we
establish how the Cox model with time-varying coefficients can be transformed
into the Cox model with time-varying covariates when using discrete
time-to-event data. We then demonstrate how to recast both of these into a
stratified Cox model and identify their shared computational bottleneck that
results when evaluating the now segmented partial likelihood and its gradient
with respect to regression coefficients at scale. These computations mirror a
highly transformed segmented scan operation. While this bottleneck is not an
immediately obvious target for multi-core parallelization, we convert it into
an un-segmented operation to leverage the efficient many-core parallel scan
algorithm. Our massively parallel implementation significantly accelerates
model fitting on large-scale and high-dimensional Cox models with
stratification or time-varying effects, delivering an order-of-magnitude speedup
over traditional central processing unit-based implementations.
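The segmented computational bottleneck described above can be illustrated with a simple CPU sketch: within each stratum, the partial-likelihood denominators are suffix sums of exp(eta) over the risk sets, which is exactly the per-segment scan that the GPU implementation parallelizes. The data layout here (sorted by event time within stratum) is an assumption for illustration, not the authors' implementation.

```python
import numpy as np

def risk_set_denominators(eta, strata):
    """Compute Cox partial-likelihood denominators
    sum_{j in R_i} exp(eta_j) within each stratum via a segmented
    (per-stratum) suffix scan.

    `eta` holds linear predictors sorted by event time within each
    stratum; `strata` gives each subject's stratum id (hypothetical
    layout, for illustration only).
    """
    denoms = np.empty_like(eta, dtype=float)
    for s in np.unique(strata):
        idx = np.where(strata == s)[0]
        w = np.exp(eta[idx])
        # Suffix (reverse cumulative) sum: subject i's risk set is
        # everyone with an equal or later event time in its stratum.
        denoms[idx] = np.cumsum(w[::-1])[::-1]
    return denoms
```

On a GPU, the per-stratum loop disappears: the same result comes from one scan over the concatenated array with segment boundaries at stratum changes, which is the "un-segmented" reformulation the abstract refers to.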
Massive Parallelization of Massive Sample-size Survival Analysis
Large-scale observational health databases are increasingly popular for
conducting comparative effectiveness and safety studies of medical products.
However, the increasing number of patients poses computational challenges when
fitting survival regression models in such studies. In this paper, we use
graphics processing units (GPUs) to parallelize the computational bottlenecks
of massive sample-size survival analyses. Specifically, we develop and apply
time- and memory-efficient single-pass parallel scan algorithms for Cox
proportional hazards models and forward-backward parallel scan algorithms for
Fine-Gray models for analysis with and without a competing risk using a cyclic
coordinate descent optimization approach. We demonstrate that GPUs accelerate
the computation of fitting these complex models in large databases by
orders-of-magnitude as compared to traditional multi-core CPU parallelism. Our
implementation enables efficient large-scale observational studies involving
millions of patients and thousands of patient characteristics.
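The combination of scan-based risk-set sums with cyclic coordinate descent can be sketched as a one-dimensional Newton step for a single Cox coefficient, where the gradient and Hessian contributions are assembled from three suffix scans. This is a sketch of the scan-based formulation, not the authors' GPU implementation; the sorted-by-event-time layout and absence of ties are simplifying assumptions.

```python
import numpy as np

def cox_coordinate_update(x_j, eta, events):
    """One Newton step for a single coefficient in cyclic coordinate
    descent for the Cox proportional hazards model.

    `x_j` is one covariate column, `eta` the current linear predictors,
    `events` an event indicator (1 = event, 0 = censored); subjects are
    assumed sorted by event time with no ties (illustrative sketch).
    """
    w = np.exp(eta)
    # Three suffix scans give the risk-set sums of w, x*w, and x^2*w.
    s0 = np.cumsum(w[::-1])[::-1]
    s1 = np.cumsum((x_j * w)[::-1])[::-1]
    s2 = np.cumsum((x_j ** 2 * w)[::-1])[::-1]
    mu = s1 / s0                 # risk-set weighted mean of x_j
    var = s2 / s0 - mu ** 2      # risk-set weighted variance of x_j
    # Score and observed information for this coordinate, summed over events.
    grad = np.sum(events * (x_j - mu))
    hess = np.sum(events * var)
    return grad / hess if hess > 0 else 0.0
```

In cyclic coordinate descent, this update is applied to each coefficient in turn, refreshing `eta` after every step; the scans are the part that parallelizes across subjects on the GPU.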