Search CORE

181 research outputs found

Knowledge-based Biomedical Data Science 2019

Author: Callahan Tiffany J.
Hunter Lawrence E.
Pielke-Lombardo Harrison
Tripodi Ignacio J.
Publication date: 08/10/2019

Knowledge-based biomedical data science (KBDS) involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on formally represented knowledge in computer systems, often in the form of knowledge graphs. Here we survey the progress in the last year in systems that use formally represented knowledge to address data science problems in both clinical and biological domains, as well as on approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing, and the expansion of knowledge-based approaches to novel domains, such as Chinese Traditional Medicine and biodiversity. Comment: Manuscript 43 pages with 3 tables; Supplemental material 43 pages with 3 tables

arXiv.org e-Print Archive

Report on methods of safety signal generation in paediatrics from pharmacovigilance databases

Author: Caitlin Dodd
Carmen Ferrajolo
Cassandra Nan
Federica Fregonese
Florentia Kaguelidou
Jan Bonhoeffer
Miriam Sturkenboom
Osemeke Osokogu
Thomas Verstraeten
Yolanda Brauchli
Publication date: 01/01/2014

This deliverable is based on the need to develop and test methods for safety signal detection in children. Signal detection is the mainstay of detecting safety issues, but so far very few groups have specifically looked at children. We developed reference sets of positive and negative drug-event combinations and vaccine-event combinations through a systematic literature review of all combinations. We retrieved the FDA AERS database, the CDC VAERS database and the EUDRAVIGILANCE database. To analyse the datasets, we took a stepwise approach: extraction of data, cleaning (e.g. mapping MedDRA and ATC codes) and transformation into a common data model that we defined for the spontaneous reporting databases. A statistical analysis plan was created for the testing of methods and we provided some descriptive analyses of the FAERS data. Next steps will be to complete the analyses.

Archivio Istituzionale della Ricerca - Università degli Studi della Campania "Luigi Vanvitelli"

Twitter Mining for Syndromic Surveillance

Author: Edo-Osagie Osagioduwa
Publication date: 01/09/2019

Enormous amounts of personalised data are generated daily from social media platforms. Twitter, in particular, generates vast textual streams in real time, accompanied by personal information. This big social media data offers a potential avenue for inferring public and social patterns. This PhD thesis investigates the use of Twitter data to deliver signals for syndromic surveillance in order to assess its ability to augment existing syndromic surveillance efforts and give a better understanding of symptomatic people who do not seek healthcare advice directly. We focus on a specific syndrome: asthma/difficulty breathing. We seek to develop means of extracting reliable signals from Twitter data, to be used for syndromic surveillance purposes. We begin by outlining our data collection and preprocessing methods. However, we observe that even with keyword-based data collection, many of the collected tweets are not relevant because they represent chatter, or talk of awareness instead of an individual suffering from a particular condition. In light of this, we set out to identify relevant tweets to collect a strong and reliable signal. We first develop novel features based on the emoji content of tweets and apply semi-supervised learning techniques to filter tweets. Next, we investigate the effectiveness of deep learning at this task. We propose a novel classification algorithm based on neural language models, and compare it to existing successful and popular deep learning algorithms. Following this, we go on to propose an attentive bi-directional Recurrent Neural Network architecture for filtering tweets which also offers additional syndromic surveillance utility by identifying keywords among syndromic tweets. In doing so, we are not only able to detect alarms, but also gain some clues as to what the alarm involves. Lastly, we look towards optimising the Twitter syndromic surveillance pipeline by selecting the best possible keywords to be supplied to the Twitter API.
We developed algorithms to intelligently and automatically select keywords such that the quality, in terms of relevance, and quantity of tweets collected are maximised.

University of East Anglia digital repository

Fast, Scalable, and Accurate Algorithms for Time-Series Analysis

Author: Paparrizos Ioannis
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2018

Time is a critical element for the understanding of natural processes (e.g., earthquakes and weather) or human-made artifacts (e.g., stock market and speech signals). The analysis of time series, the result of sequentially collecting observations of such processes and artifacts, is becoming increasingly prevalent across scientific and industrial applications. The extraction of non-trivial features (e.g., patterns, correlations, and trends) in time series is a critical step for devising effective time-series mining methods for real-world problems and the subject of active research for decades. In this dissertation, we address this fundamental problem by studying and presenting computational methods for efficient unsupervised learning of robust feature representations from time series. Our objective is to (i) simplify and unify the design of scalable and accurate time-series mining algorithms; and (ii) provide a set of readily available tools for effective time-series analysis. We focus on applications operating solely over time-series collections and on applications where the analysis of time series complements the analysis of other types of data, such as text and graphs. For applications operating solely over time-series collections, we propose a generic computational framework, GRAIL, to learn low-dimensional representations that natively preserve the invariances offered by a given time-series comparison method. GRAIL represents a departure from classic approaches in the time-series literature where representation methods are agnostic to the similarity function used in subsequent learning processes. GRAIL relies on the attractive idea that once we construct the data-to-data similarity matrix most time-series mining tasks can be trivially solved. 
To overcome scalability issues associated with approaches relying on such matrices, GRAIL exploits time-series clustering to construct a small set of landmark time series and learns representations to reduce the data-to-data matrix to a data-to-landmark points matrix. To demonstrate the effectiveness of GRAIL, we first present domain-independent, highly accurate, and scalable time-series clustering methods to facilitate exploration and summarization of time-series collections. Then, we show that GRAIL representations, when combined with suitable methods, significantly outperform, in terms of efficiency and accuracy, state-of-the-art methods in major time-series mining tasks, such as querying, clustering, classification, sampling, and visualization. Overall, GRAIL rises as a new primitive for highly accurate, yet scalable, time-series analysis. For applications where the analysis of time series complements the analysis of other types of data, such as text and graphs, we propose generic, simple, and lightweight methodologies to learn features from time-varying measurements. Such applications often organize operations over different types of data in a pipeline such that one operation provides input, in the form of feature vectors, to subsequent operations. To reason about the temporal patterns and trends in the underlying features, we need to (i) track the evolution of features over different time periods; and (ii) transform these time-varying features into actionable knowledge (e.g., forecasting an outcome). To address this challenging problem, we propose principled approaches to model time-varying features and study two large-scale, real-world applications. Specifically, we first study the problem of predicting the impact of scientific concepts through temporal analysis of characteristics extracted from the metadata and full text of scientific articles.
Then, we explore the promise of harnessing temporal patterns in behavioral signals extracted from web search engine logs for early detection of devastating diseases. In both applications, combinations of features with time-series relevant features had a greater impact than any other indicator considered in our analysis. We believe that our simple methodology, along with the interesting domain-specific findings that our work revealed, will motivate new studies across different scientific and industrial settings.

Columbia University Academic Commons

Front-Line Physicians' Satisfaction with Information Systems in Hospitals

Author: Junttila Kristiina
Peltonen Laura-Maria
Salanterä Sanna
Publication venue: 'IOS Press'
Publication date: 01/01/2018

Day-to-day operations management in hospital units is difficult due to continuously varying situations, the several actors involved and the vast number of information systems in use. The aim of this study was to describe front-line physicians' satisfaction with existing information systems needed to support the day-to-day operations management in hospitals. A cross-sectional survey was used and data chosen with stratified random sampling were collected in nine hospitals. Data were analyzed with descriptive and inferential statistical methods. The response rate was 65% (n = 111). The physicians reported that information systems support their decision making to some extent, but they do not improve access to information nor are they tailored for physicians. The respondents also reported that they need to use several information systems to support decision making and that they would prefer one information system to access important information. Improved information access would better support physicians' decision making and has the potential to improve the quality of decisions and speed up the decision making process. Peer reviewed

Helsingin yliopiston digitaalinen arkisto

The Corpus Expansion Toolkit: finding what we want on the web

Author: Pay Jack Frederick
Publication date: 13/08/2020

This thesis presents the Corpus Expansion Toolkit (CET), a generally applicable toolkit that allows researchers to build domain-specific corpora from the web. The main purpose of the work presented in this thesis and the development of the CET is to provide a solution to discovering desired content on the web from possibly unknown locations or a poorly defined domain. Using an iterative process, the CET is able to solve the problem of discovering domain-specific online content and expand a corpus using only a very small number of example documents or characteristic phrases taken from the target domain. Using a human-in-the-loop strategy and a chain of discrete software components, the CET also allows the concept of a domain to be iteratively defined using the very online resources used to expand the original corpus. The CET combines feature extraction, search, web crawling and machine learning methods to collect, store, filter and perform information extraction on documents. Using a small number of example ‘seed’ documents, the CET is able to expand the original corpus by finding more relevant documents from the web and provide a number of tools to support their analysis. This thesis presents a case study-based methodology that introduces the various contributions and components of the CET through the discussion of five case studies covering a wide variety of domains and requirements to which the CET has been applied. These case studies aim to illustrate three main use cases, listed below, where the CET is applicable:
1. Domain known – source known
2. Domain known – source unknown
3. Domain unknown – source unknown
First, use cases where the sites for document collection are known and the topic of research is clearly defined. Second, instances where the topic of research is clearly defined but where to find relevant documents on the web is unknown.
Third, the most extreme use case, where the domain is poorly defined or unknown to the researcher and the location of the information is also unknown. This thesis presents a solution that allows researchers to begin with very little information on a specific topic, iteratively build a clear conception of a domain and translate that into a computational system.

Sussex Research Online

Using spontaneously generated online patient experiences to improve healthcare : A case study using Modafinil

Author: Walsh Julia

Background: Acknowledged issues with the RCT focus of EBM and recognition of the value of patient input have created a need for new methods of knowledge generation that can give the depth of qualitative studies but on a much larger scale. Almost half of the global population uses social media regularly, with increasing numbers of people using online spaces as either a first- or second-line health information and exchange resource. Estimates suggest the volume of online health-related data grew by 300% between 2017 and 2020. As a data source, this unstructured freeform textual data is a form of patient-generated health data, containing a mass of patient-centred, contextually grounded detail about the perceptions and health concerns of those who post online. Methods for analysing it are at an early stage of development, but it is seen as having potential to add to clinical understanding, either by augmenting existing knowledge, or in aiding understanding of real-world usage of healthcare interventions and services.
Objectives: To explore how large-scale analysis of spontaneously generated online patient experiences (SGOPE) can help with understanding patient perspectives of their conditions, symptoms, and self-management behaviours, assess the effectiveness of interventions, contribute to the process of knowledge and evidence creation, and consequently help healthcare systems improve outcomes in the most efficient manner. A secondary aim is to contribute to the development of methods that can be generalised across other interventions or services.
Methods: Using Modafinil as a case study, a multistage approach was taken. First, an exploratory study comparing qualitative and basic NLP techniques was undertaken on a small sample of 260 posts to identify topics, evaluate effectiveness and identify perceived causal text. An umbrella scoping review was then undertaken exploring how and for what purposes SGOPE data is currently being used within healthcare research. Findings from both then guided the main study, which used a variety of unsupervised NLP tools to explore the main dataset of over 69k posts. Individual methods were compared against each other. Results from both studies were compared for evaluation.
Results: In contrast to the existing inconclusive systematic review evidence for Modafinil for anything other than narcolepsy, both studies found that Modafinil is seen by posters as effective in treating fatigue and cognition symptoms in a wide range of conditions. Both identified the topics mentioned in the data, although more work needs to be done to develop the NLP methods to achieve a greater depth of understanding. The first study identified eight themes within the posts: reason for taking, impact of symptoms, acquisition, dosage, side-effects, comparison with other interventions, effectiveness, and quality of life outcomes. Effectiveness of Modafinil was found to be 68% positive, 12% mixed and 18% negative. Expressions of causal belief were identified. In the main study, effectiveness was measured with sentiment analysis, with all methods showing strong positive sentiment. Topic modelling identified groups of themes. Linguistic techniques extracted phrases indicating causality. Various analysis methods were compared to develop a method that could be generalised across other health topics.

Warwick Research Archives Portal Repository