DutchHatTrick: semantic query modeling, ConText, section detection, and match score maximization
This report discusses the collaborative work of Erasmus MC, the University of Twente, and the University of Amsterdam on the TREC 2011 Medical track. The task is to retrieve patient visits from the University of Pittsburgh NLP Repository for 35 topics. The repository consists of 101,711 patient reports, and each patient visit is recorded in one or more reports.
Associative conceptual space-based information retrieval systems
In this `Information Era', with the availability of large collections of books, articles, journals, CD-ROMs, video films, and so on, there is an increasing need for intelligent information retrieval systems that enable users to find the desired information easily. Many attempts have been made to construct such retrieval systems, including the electronic systems used in libraries and the search engines for the World Wide Web. In many cases, however, the so-called `precision' and `recall' of these systems leave much to be desired.
In this paper, a new AI-based retrieval system is proposed, inspired by, among other things, the WEBSOM algorithm. However, contrary to that approach, where domain knowledge is extracted from the full text of all books, we propose a system where certain specific meta-information is automatically assembled using only the index of every document. This knowledge extraction process results in a new type of concept space, the so-called Associative Conceptual Space, where the `concepts' found in all documents are clustered using a Hebbian-type learning algorithm. Each document can then be characterised by comparing the concepts occurring in it to those present in the associative conceptual space. Applying these characterisations, all documents can be clustered such that semantically similar documents lie close together on a Self-Organising Map. This map can easily be inspected by its user.
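The final step above, mapping document characterisations onto a Self-Organising Map so that semantically similar documents land close together, can be sketched in miniature. The toy concept vectors, node count, and decay schedules below are hypothetical illustrations, not the paper's actual system:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_som(docs, n_nodes=4, epochs=50, lr0=0.5, sigma0=1.5):
    """Fit a minimal 1-D self-organising map; returns node weight vectors."""
    dim = docs.shape[1]
    weights = rng.random((n_nodes, dim))
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)               # decaying learning rate
        sigma = sigma0 * (1 - t / epochs) + 1e-3  # shrinking neighbourhood
        for x in docs:
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching node
            dist = np.abs(np.arange(n_nodes) - bmu)            # distance on the 1-D grid
            h = np.exp(-(dist ** 2) / (2 * sigma ** 2))        # neighbourhood function
            weights += lr * h[:, None] * (x - weights)         # pull nodes toward the doc
    return weights

# Two clusters of "semantically similar" toy document vectors (hypothetical data).
docs = np.vstack([
    rng.normal(0.0, 0.05, (5, 3)),   # cluster around the origin
    rng.normal(1.0, 0.05, (5, 3)),   # cluster around all-ones
])
w = train_som(docs)
mapped = [int(np.argmin(((w - d) ** 2).sum(axis=1))) for d in docs]
print(mapped)  # similar documents should land on nearby SOM nodes
```

The same mechanics scale to real concept-occurrence vectors; only the input features and the (usually 2-D) grid change.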
Efficient GPU-accelerated fitting of observational health-scaled stratified and time-varying Cox models
The Cox proportional hazards model stands as a widely used semi-parametric approach for survival analysis in medical research and many other fields. Numerous extensions of the Cox model have further expanded its versatility. Statistical computing challenges arise, however, when applying many of these extensions to the increasing complexity and volume of modern observational health datasets. To address these challenges, we demonstrate how to employ massive parallelization through graphics processing units (GPUs) to enhance the scalability of the stratified Cox model, the Cox model with time-varying covariates, and the Cox model with time-varying coefficients. First, we establish how the Cox model with time-varying coefficients can be transformed into the Cox model with time-varying covariates when using discrete time-to-event data. We then demonstrate how to recast both of these into a stratified Cox model and identify their shared computational bottleneck: evaluating the now segmented partial likelihood and its gradient with respect to regression coefficients at scale. These computations mirror a highly transformed segmented scan operation. While this bottleneck is not an immediately obvious target for multi-core parallelization, we convert it into an un-segmented operation to leverage the efficient many-core parallel scan algorithm. Our massively parallel implementation significantly accelerates the fitting of large-scale and high-dimensional Cox models with stratification or time-varying effects, delivering an order-of-magnitude speedup over traditional central processing unit-based implementations.
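The conversion of a segmented scan into an un-segmented one can be illustrated with a small CPU sketch. This is an assumed reconstruction of the general trick in NumPy, not the authors' GPU code; the stratum layout and toy values are hypothetical:

```python
import numpy as np

def segmented_cumsum(values, stratum_starts):
    """Per-stratum inclusive prefix sums via ONE global (un-segmented) scan.

    Works when the global running total is nondecreasing at the stratum
    boundaries, which holds for positive terms such as exp(x_i * beta)
    in a Cox partial likelihood.
    """
    total = np.cumsum(values)                        # single un-segmented scan
    offset = np.zeros_like(total)
    offset[stratum_starts[1:]] = total[stratum_starts[1:] - 1]
    offset = np.maximum.accumulate(offset)           # broadcast each stratum's offset
    return total - offset                            # subtract what came before the stratum

theta = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # toy exp(x_i * beta) values
starts = np.array([0, 2])                    # two strata: rows [0,1] and [2,3,4]
print(segmented_cumsum(theta, starts))       # per-stratum sums: 1, 3 | 3, 7, 12
```

On a GPU, the point is that the single global scan maps directly onto a highly efficient many-core parallel-scan primitive, whereas a per-stratum loop does not.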
Massive Parallelization of Massive Sample-size Survival Analysis
Large-scale observational health databases are increasingly popular for conducting comparative effectiveness and safety studies of medical products. However, the increasing number of patients poses computational challenges when fitting survival regression models in such studies. In this paper, we use graphics processing units (GPUs) to parallelize the computational bottlenecks of massive sample-size survival analyses. Specifically, we develop and apply time- and memory-efficient single-pass parallel scan algorithms for Cox proportional hazards models and forward-backward parallel scan algorithms for Fine-Gray models, for analyses with and without a competing risk, using a cyclic coordinate descent optimization approach. We demonstrate that GPUs accelerate the fitting of these complex models on large databases by orders of magnitude compared to traditional multi-core CPU parallelism. Our implementation enables efficient large-scale observational studies involving millions of patients and thousands of patient characteristics.
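The single-pass scan idea can be illustrated on the core Cox bottleneck: with subjects sorted by event time, the risk-set sums needed by the partial likelihood are suffix sums, computable in one reverse scan instead of a nested O(n^2) loop. This NumPy sketch shows the general technique under that assumption, not the authors' GPU implementation:

```python
import numpy as np

def cox_risk_sums(x, beta):
    """Suffix sums of exp(x * beta), i.e. sum over subjects still at risk.

    Assumes x is sorted by increasing event time, so the risk set for
    subject i is exactly subjects i..n-1.
    """
    theta = np.exp(x * beta)
    return np.cumsum(theta[::-1])[::-1]   # one single-pass reverse scan

x = np.array([0.5, -1.0, 2.0, 0.0])       # toy covariate, sorted by event time
beta = 0.3
fast = cox_risk_sums(x, beta)
naive = np.array([np.exp(x[i:] * beta).sum() for i in range(len(x))])  # O(n^2) check
print(np.allclose(fast, naive))
```

The same quantity feeds both the partial log-likelihood and its gradient inside each cyclic coordinate descent update, which is why replacing the quadratic loop with a scan dominates the overall speedup.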
Adjusting for indirectly measured confounding using large-scale propensity scores
Confounding remains one of the major challenges to causal inference with observational data. This problem is paramount in medicine, where we would like to answer causal questions from large observational datasets like electronic health records (EHRs). Modern medical data (such as EHRs) typically contain tens of thousands of covariates. Such a large set carries hope that many of the confounders are directly measured, and further hope that others are indirectly measured through their correlation with measured covariates. How can we exploit these large sets of covariates for causal inference? To help answer this question, this paper examines the performance of the large-scale propensity score (LSPS) approach on causal analysis of medical data. We demonstrate that LSPS may adjust for indirectly measured confounders by including tens of thousands of covariates that may be correlated with them. We present conditions under which LSPS removes bias due to indirectly measured confounders, and we show that LSPS may avoid bias when inadvertently adjusting for variables (like colliders) that otherwise can induce bias. We demonstrate the performance of LSPS with both simulated medical data and real medical data.
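The propensity-score idea underlying LSPS can be sketched on simulated data: fit a regularized logistic regression for treatment on the covariates, then compare a naive outcome difference with an inverse-probability-weighted one. The data-generating process, ridge penalty, and gradient-descent fit below are illustrative assumptions (LSPS work typically uses large-scale L1-regularized regression over tens of thousands of covariates):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 2000, 20
X = rng.normal(size=(n, p))
# Treatment assignment depends on X[:, 0], a measured confounder.
pt = 1 / (1 + np.exp(-1.5 * X[:, 0]))
T = rng.binomial(1, pt)
# Outcome: true treatment effect 1.0, confounded through X[:, 0].
Y = 1.0 * T + 2.0 * X[:, 0] + rng.normal(size=n)

# Ridge-penalized logistic regression propensity model, fit by gradient descent.
w = np.zeros(p)
for _ in range(500):
    ps = 1 / (1 + np.exp(-X @ w))
    grad = X.T @ (ps - T) / n + 1e-2 * w
    w -= 0.5 * grad
ps = 1 / (1 + np.exp(-X @ w))

naive = Y[T == 1].mean() - Y[T == 0].mean()          # confounded estimate
wts = T / ps + (1 - T) / (1 - ps)                    # inverse-probability weights
ipw = (np.sum(wts * T * Y) / np.sum(wts * T)
       - np.sum(wts * (1 - T) * Y) / np.sum(wts * (1 - T)))
print(round(naive, 2), round(ipw, 2))  # IPW lands nearer the true effect of 1.0
```

The paper's point is the large-scale version of this: with tens of thousands of covariates in the propensity model, even confounders that are only indirectly measured can be adjusted for through their correlates.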