DutchHatTrick: semantic query modeling, ConText, section detection, and match score maximization
This report discusses the collaborative work of Erasmus MC, the University of Twente, and the University of Amsterdam on the TREC 2011 Medical track. The task is to retrieve patient visits from the University of Pittsburgh NLP Repository for 35 topics. The repository consists of 101,711 patient reports, and each patient visit is recorded in one or more reports.
Associative conceptual space-based information retrieval systems
In this `Information Era', with the availability of large collections of books, articles, journals, CD-ROMs, video films, and so on, there is an increasing need for intelligent information retrieval systems that enable users to find the desired information easily. Many attempts have been made to construct such retrieval systems, including the electronic systems used in libraries and the search engines for the World Wide Web. In many cases, however, the so-called `precision' and `recall' of these systems leave much to be desired.
In this paper, a new AI-based retrieval system is proposed, inspired by, among other things, the WEBSOM algorithm. However, contrary to that approach, where domain knowledge is extracted from the full text of all books, we propose a system where certain specific meta-information is automatically assembled using only the index of every document. This knowledge extraction process results in a new type of concept space, the so-called Associative Conceptual Space, where the `concepts' found in all documents are clustered using a Hebbian-type learning algorithm. Each document can then be characterised by comparing the concepts occurring in it to those present in the associative conceptual space. Applying these characterisations, all documents can be clustered such that semantically similar documents lie close together on a Self-Organising Map. This map can easily be inspected by its user.
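The final step above, mapping document characterisations onto a Self-Organising Map so that semantically similar documents land close together, can be sketched in miniature. The toy concept vectors, node count, and decay schedules below are hypothetical illustrations, not the paper's actual system:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_som(docs, n_nodes=4, epochs=50, lr0=0.5, sigma0=1.5):
    """Fit a minimal 1-D self-organising map; returns node weight vectors."""
    dim = docs.shape[1]
    weights = rng.random((n_nodes, dim))
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)               # decaying learning rate
        sigma = sigma0 * (1 - t / epochs) + 1e-3  # shrinking neighbourhood
        for x in docs:
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching node
            dist = np.abs(np.arange(n_nodes) - bmu)            # distance on the 1-D grid
            h = np.exp(-(dist ** 2) / (2 * sigma ** 2))        # neighbourhood function
            weights += lr * h[:, None] * (x - weights)         # pull nodes toward the doc
    return weights

# Two clusters of "semantically similar" toy document vectors (hypothetical data).
docs = np.vstack([
    rng.normal(0.0, 0.05, (5, 3)),   # cluster around the origin
    rng.normal(1.0, 0.05, (5, 3)),   # cluster around all-ones
])
w = train_som(docs)
mapped = [int(np.argmin(((w - d) ** 2).sum(axis=1))) for d in docs]
print(mapped)  # similar documents should land on nearby SOM nodes
```

The same mechanics scale to real concept-occurrence vectors; only the input features and the (usually 2-D) grid change.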
Efficient GPU-accelerated fitting of observational health-scaled stratified and time-varying Cox models
The Cox proportional hazards model stands as a widely used semi-parametric approach for survival analysis in medical research and many other fields. Numerous extensions of the Cox model have further expanded its versatility. Statistical computing challenges arise, however, when applying many of these extensions to the increasing complexity and volume of modern observational health datasets. To address these challenges, we demonstrate how to employ massive parallelization through graphics processing units (GPUs) to enhance the scalability of the stratified Cox model, the Cox model with time-varying covariates, and the Cox model with time-varying coefficients. First, we establish how the Cox model with time-varying coefficients can be transformed into the Cox model with time-varying covariates when using discrete time-to-event data. We then demonstrate how to recast both of these into a stratified Cox model and identify their shared computational bottleneck: evaluating the now segmented partial likelihood and its gradient with respect to regression coefficients at scale. These computations mirror a highly transformed segmented scan operation. While this bottleneck is not an immediately obvious target for multi-core parallelization, we convert it into an un-segmented operation to leverage the efficient many-core parallel scan algorithm. Our massively parallel implementation significantly accelerates the fitting of large-scale and high-dimensional Cox models with stratification or time-varying effects, delivering an order-of-magnitude speedup over traditional central processing unit-based implementations.
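The conversion of a segmented scan into an un-segmented one can be illustrated with a small CPU sketch. This is an assumed reconstruction of the general trick in NumPy, not the authors' GPU code; the stratum layout and toy values are hypothetical:

```python
import numpy as np

def segmented_cumsum(values, stratum_starts):
    """Per-stratum inclusive prefix sums via ONE global (un-segmented) scan.

    Works when the global running total is nondecreasing at the stratum
    boundaries, which holds for positive terms such as exp(x_i * beta)
    in a Cox partial likelihood.
    """
    total = np.cumsum(values)                        # single un-segmented scan
    offset = np.zeros_like(total)
    offset[stratum_starts[1:]] = total[stratum_starts[1:] - 1]
    offset = np.maximum.accumulate(offset)           # broadcast each stratum's offset
    return total - offset                            # subtract what came before the stratum

theta = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # toy exp(x_i * beta) values
starts = np.array([0, 2])                    # two strata: rows [0,1] and [2,3,4]
print(segmented_cumsum(theta, starts))       # per-stratum sums: 1, 3 | 3, 7, 12
```

On a GPU, the point is that the single global scan maps directly onto a highly efficient many-core parallel-scan primitive, whereas a per-stratum loop does not.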
Massive Parallelization of Massive Sample-size Survival Analysis
Large-scale observational health databases are increasingly popular for conducting comparative effectiveness and safety studies of medical products. However, the increasing number of patients poses computational challenges when fitting survival regression models in such studies. In this paper, we use graphics processing units (GPUs) to parallelize the computational bottlenecks of massive sample-size survival analyses. Specifically, we develop and apply time- and memory-efficient single-pass parallel scan algorithms for Cox proportional hazards models and forward-backward parallel scan algorithms for Fine-Gray models, for analyses with and without a competing risk, using a cyclic coordinate descent optimization approach. We demonstrate that GPUs accelerate the fitting of these complex models on large databases by orders of magnitude compared to traditional multi-core CPU parallelism. Our implementation enables efficient large-scale observational studies involving millions of patients and thousands of patient characteristics.
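The single-pass scan idea can be illustrated on the core Cox bottleneck: with subjects sorted by event time, the risk-set sums needed by the partial likelihood are suffix sums, computable in one reverse scan instead of a nested O(n^2) loop. This NumPy sketch shows the general technique under that assumption, not the authors' GPU implementation:

```python
import numpy as np

def cox_risk_sums(x, beta):
    """Suffix sums of exp(x * beta), i.e. sum over subjects still at risk.

    Assumes x is sorted by increasing event time, so the risk set for
    subject i is exactly subjects i..n-1.
    """
    theta = np.exp(x * beta)
    return np.cumsum(theta[::-1])[::-1]   # one single-pass reverse scan

x = np.array([0.5, -1.0, 2.0, 0.0])       # toy covariate, sorted by event time
beta = 0.3
fast = cox_risk_sums(x, beta)
naive = np.array([np.exp(x[i:] * beta).sum() for i in range(len(x))])  # O(n^2) check
print(np.allclose(fast, naive))
```

The same quantity feeds both the partial log-likelihood and its gradient inside each cyclic coordinate descent update, which is why replacing the quadratic loop with a scan dominates the overall speedup.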
Adjusting for indirectly measured confounding using large-scale propensity scores
Confounding remains one of the major challenges to causal inference with observational data. This problem is paramount in medicine, where we would like to answer causal questions from large observational datasets like electronic health records (EHRs). Modern medical data (such as EHRs) typically contain tens of thousands of covariates. Such a large set carries hope that many of the confounders are directly measured, and further hope that others are indirectly measured through their correlation with measured covariates. How can we exploit these large sets of covariates for causal inference? To help answer this question, this paper examines the performance of the large-scale propensity score (LSPS) approach on causal analysis of medical data. We demonstrate that LSPS may adjust for indirectly measured confounders by including tens of thousands of covariates that may be correlated with them. We present conditions under which LSPS removes bias due to indirectly measured confounders, and we show that LSPS may avoid bias when inadvertently adjusting for variables (like colliders) that otherwise can induce bias. We demonstrate the performance of LSPS with both simulated medical data and real medical data.
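The propensity-score idea underlying LSPS can be sketched on simulated data: fit a regularized logistic regression for treatment on the covariates, then compare a naive outcome difference with an inverse-probability-weighted one. The data-generating process, ridge penalty, and gradient-descent fit below are illustrative assumptions (LSPS work typically uses large-scale L1-regularized regression over tens of thousands of covariates):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 2000, 20
X = rng.normal(size=(n, p))
# Treatment assignment depends on X[:, 0], a measured confounder.
pt = 1 / (1 + np.exp(-1.5 * X[:, 0]))
T = rng.binomial(1, pt)
# Outcome: true treatment effect 1.0, confounded through X[:, 0].
Y = 1.0 * T + 2.0 * X[:, 0] + rng.normal(size=n)

# Ridge-penalized logistic regression propensity model, fit by gradient descent.
w = np.zeros(p)
for _ in range(500):
    ps = 1 / (1 + np.exp(-X @ w))
    grad = X.T @ (ps - T) / n + 1e-2 * w
    w -= 0.5 * grad
ps = 1 / (1 + np.exp(-X @ w))

naive = Y[T == 1].mean() - Y[T == 0].mean()          # confounded estimate
wts = T / ps + (1 - T) / (1 - ps)                    # inverse-probability weights
ipw = (np.sum(wts * T * Y) / np.sum(wts * T)
       - np.sum(wts * (1 - T) * Y) / np.sum(wts * (1 - T)))
print(round(naive, 2), round(ipw, 2))  # IPW lands nearer the true effect of 1.0
```

The paper's point is the large-scale version of this: with tens of thousands of covariates in the propensity model, even confounders that are only indirectly measured can be adjusted for through their correlates.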