181 research outputs found
Knowledge-based Biomedical Data Science 2019
Knowledge-based biomedical data science (KBDS) involves the design and
implementation of computer systems that act as if they knew about biomedicine.
Such systems depend on formally represented knowledge in computer systems,
often in the form of knowledge graphs. Here we survey the progress in the last
year in systems that use formally represented knowledge to address data science
problems in both clinical and biological domains, as well as on approaches for
creating knowledge graphs. Major themes include the relationships between
knowledge graphs and machine learning, the use of natural language processing,
and the expansion of knowledge-based approaches to novel domains, such as
Chinese Traditional Medicine and biodiversity.
Comment: Manuscript 43 pages with 3 tables; Supplemental material 43 pages
with 3 tables
Report on methods of safety signal generation in paediatrics from pharmacovigilance databases
This deliverable is based on the need to develop and test methods for safety signal detection in
children. Signal detection is the mainstay of detecting safety issues, but so far very few groups
have specifically looked at children. We developed reference sets for positive and negative drug-event
combinations and vaccine-event combinations by a systematic literature review on all
combinations. We retrieved the FDA AERS database, the CDC VAERS database and the
EUDRAVIGILANCE database. To analyse the datasets we followed a stepwise approach:
extraction of data, cleaning (e.g. mapping MedDRA and ATC codes) and transformation into a
common data model that we defined for the spontaneous reporting databases. A statistical
analysis plan was created for the testing of methods and we provided some descriptive analyses
of the FAERS data. Next steps will be to complete the analyses.
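The abstract above does not name the signal detection methods that were tested. As a minimal sketch of what disproportionality-based signal detection on spontaneous reports can look like, the following computes a proportional reporting ratio (PRR) for one drug-event combination. The choice of PRR, the function names, and the report counts are all assumptions made for illustration; the data are synthetic and do not come from FAERS, VAERS or EudraVigilance.

```python
def contingency(reports, drug, event):
    """Count the 2x2 table for one drug-event combination.

    a: target drug with target event    b: target drug, other events
    c: other drugs with target event    d: other drugs, other events
    """
    a = b = c = d = 0
    for r_drug, r_event in reports:
        if r_drug == drug and r_event == event:
            a += 1
        elif r_drug == drug:
            b += 1
        elif r_event == event:
            c += 1
        else:
            d += 1
    return a, b, c, d


def prr(a, b, c, d):
    """Proportional reporting ratio; None when a denominator would be zero."""
    if a + b == 0 or c + d == 0 or c == 0:
        return None
    return (a / (a + b)) / (c / (c + d))


# Synthetic reports (hypothetical drug and event names).
reports = (
    [("drugA", "rash")] * 20
    + [("drugA", "nausea")] * 80
    + [("drugB", "rash")] * 100
    + [("drugB", "nausea")] * 9800
)

table = contingency(reports, "drugA", "rash")
score = prr(*table)
```

A ratio well above 1 means the event is reported proportionally more often with the target drug than with all other drugs. Whether that counts as a signal depends on thresholds and reference sets such as the ones the deliverable describes.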
Twitter Mining for Syndromic Surveillance
Enormous amounts of personalised data are generated daily from social media platforms. Twitter, in particular, generates vast textual streams in real time, accompanied by personal information. This big social media data offers a potential avenue for inferring public and social patterns. This PhD thesis investigates the use of Twitter data to deliver signals for syndromic surveillance in order to assess its ability to augment existing syndromic surveillance efforts and give a better understanding of symptomatic people who do not seek healthcare advice directly. We focus on a specific syndrome - asthma/difficulty breathing. We seek to develop means of extracting reliable signals from the Twitter stream, to be used for syndromic surveillance purposes. We begin by outlining our data collection and preprocessing methods. However, we observe that even with keyword-based data collection, many of the collected tweets are not relevant because they represent chatter, or talk of awareness instead of an individual suffering a particular condition. In light of this, we set out to identify relevant tweets to collect a strong and reliable signal. We first develop novel features based on the emoji content of Tweets and apply semi-supervised learning techniques to filter Tweets. Next, we investigate the effectiveness of deep learning at this task. We propose a novel classification algorithm based on neural language models, and compare it to existing successful and popular deep learning algorithms. Following this, we go on to propose an attentive bi-directional Recurrent Neural Network architecture for filtering Tweets which also offers additional syndromic surveillance utility by identifying keywords among syndromic Tweets. In doing so, we are not only able to detect alarms, but also have some clues into what the alarm involves.
Lastly, we look towards optimizing the Twitter syndromic surveillance pipeline by selecting the best possible keywords to be supplied to the Twitter API. We developed algorithms to intelligently and automatically select keywords such that the quality, in terms of relevance, and quantity of Tweets collected are maximised.
Fast, Scalable, and Accurate Algorithms for Time-Series Analysis
Time is a critical element for the understanding of natural processes (e.g., earthquakes and weather) or human-made artifacts (e.g., stock market and speech signals). The analysis of time series, the result of sequentially collecting observations of such processes and artifacts, is becoming increasingly prevalent across scientific and industrial applications. The extraction of non-trivial features (e.g., patterns, correlations, and trends) in time series is a critical step for devising effective time-series mining methods for real-world problems and the subject of active research for decades. In this dissertation, we address this fundamental problem by studying and presenting computational methods for efficient unsupervised learning of robust feature representations from time series. Our objective is to (i) simplify and unify the design of scalable and accurate time-series mining algorithms; and (ii) provide a set of readily available tools for effective time-series analysis. We focus on applications operating solely over time-series collections and on applications where the analysis of time series complements the analysis of other types of data, such as text and graphs.
For applications operating solely over time-series collections, we propose a generic computational framework, GRAIL, to learn low-dimensional representations that natively preserve the invariances offered by a given time-series comparison method. GRAIL represents a departure from classic approaches in the time-series literature where representation methods are agnostic to the similarity function used in subsequent learning processes. GRAIL relies on the attractive idea that once we construct the data-to-data similarity matrix most time-series mining tasks can be trivially solved. To overcome scalability issues associated with approaches relying on such matrices, GRAIL exploits time-series clustering to construct a small set of landmark time series and learns representations to reduce the data-to-data matrix to a data-to-landmark points matrix. To demonstrate the effectiveness of GRAIL, we first present domain-independent, highly accurate, and scalable time-series clustering methods to facilitate exploration and summarization of time-series collections. Then, we show that GRAIL representations, when combined with suitable methods, significantly outperform, in terms of efficiency and accuracy, state-of-the-art methods in major time-series mining tasks, such as querying, clustering, classification, sampling, and visualization. Overall, GRAIL rises as a new primitive for highly accurate, yet scalable, time-series analysis.
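The reduction from a data-to-data matrix to a data-to-landmark matrix described above can be illustrated with a minimal sketch. This is not GRAIL's implementation: landmarks are picked at evenly spaced positions instead of through time-series clustering, no representation learning step is included, and plain Pearson correlation stands in for the time-series comparison method. All function names are hypothetical.

```python
import math


def znorm(series):
    """Scale a series to zero mean and unit variance (constant series map to zeros)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    if var == 0:
        return [0.0] * n
    std = math.sqrt(var)
    return [(x - mean) / std for x in series]


def similarity(a, b):
    """Stand-in similarity: Pearson correlation of two equal-length series."""
    za, zb = znorm(a), znorm(b)
    return sum(x * y for x, y in zip(za, zb)) / len(a)


def pick_landmarks(collection, k):
    """Assumption: evenly spaced picks; the abstract uses clustering instead."""
    step = len(collection) / k
    return [collection[int(i * step)] for i in range(k)]


def landmark_representation(collection, landmarks):
    """Build the n-by-k data-to-landmark matrix instead of the n-by-n matrix."""
    return [[similarity(s, lm) for lm in landmarks] for s in collection]


collection = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [5, 4, 3, 2, 1],
    [1, 3, 2, 4, 3],
]
landmarks = pick_landmarks(collection, 2)
features = landmark_representation(collection, landmarks)
```

Each series is now described by k similarity values instead of n, so the cost of building the matrix grows with n times k instead of n squared. Downstream tasks such as clustering or classification can then operate on the rows of `features`.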
For applications where the analysis of time series complements the analysis of other types of data, such as text and graphs, we propose generic, simple, and lightweight methodologies to learn features from time-varying measurements. Such applications often organize operations over different types of data in a pipeline such that one operation provides input, in the form of feature vectors, to subsequent operations. To reason about the temporal patterns and trends in the underlying features, we need to (i) track the evolution of features over different time periods; and (ii) transform these time-varying features into actionable knowledge (e.g., forecasting an outcome). To address this challenging problem, we propose principled approaches to model time-varying features and study two large-scale, real-world applications. Specifically, we first study the problem of predicting the impact of scientific concepts through temporal analysis of characteristics extracted from the metadata and full text of scientific articles. Then, we explore the promise of harnessing temporal patterns in behavioral signals extracted from web search engine logs for early detection of devastating diseases. In both applications, combinations of features with time-series relevant features yielded greater impact than any other indicator considered in our analysis. We believe that our simple methodology, along with the interesting domain-specific findings that our work revealed, will motivate new studies across different scientific and industrial settings.
Front-Line Physicians' Satisfaction with Information Systems in Hospitals
Day-to-day operations management in hospital units is difficult due to continuously varying situations, several actors involved and a vast number of information systems in use. The aim of this study was to describe front-line physicians' satisfaction with existing information systems needed to support the day-to-day operations management in hospitals. A cross-sectional survey was used and data chosen with stratified random sampling were collected in nine hospitals. Data were analyzed with descriptive and inferential statistical methods. The response rate was 65 % (n = 111). The physicians reported that information systems support their decision making to some extent, but they do not improve access to information nor are they tailored for physicians. The respondents also reported that they need to use several information systems to support decision making and that they would prefer one information system to access important information. Improved information access would better support physicians' decision making and has the potential to improve the quality of decisions and speed up the decision making process. Peer reviewed
The Corpus Expansion Toolkit: finding what we want on the web
This thesis presents the Corpus Expansion Toolkit (CET), a generally applicable toolkit that allows researchers to build domain-specific corpora from the web. The main purpose of the work presented in this thesis and the development of the CET is to provide a solution to discovering desired content on the web from possibly unknown locations or a poorly defined domain. Using an iterative process, the CET is able to solve the problem of discovering domain-specific online content and expand a corpus using only a very small number of example documents or characteristic phrases taken from the target domain. Using a human-in-the-loop strategy and a chain of discrete software components, the CET also allows the concept of a domain to be iteratively defined using the very online resources used to expand the original corpus. The CET combines feature extraction, search, web crawling and machine learning methods to collect, store, filter and perform information extraction on collected documents. Using a small number of example "seed" documents, the CET is able to expand the original corpus by finding more relevant documents from the web and provides a number of tools to support their analysis. This thesis presents a case study-based methodology that introduces the various contributions and components of the CET through the discussion of five case studies covering a wide variety of domains and requirements to which the CET has been applied. These case studies illustrate three main use cases, listed below, where the CET is applicable:
1. Domain known - source known
2. Domain known - source unknown
3. Domain unknown - source unknown
First, use cases where the sites for document collection are known and the topic of research is clearly defined. Second, instances where the topic of research is clearly defined but where to find relevant documents on the web is unknown. Third, the most extreme use case, where the domain is poorly defined or unknown to the researcher and the location of the information is also unknown. This thesis presents a solution that allows researchers to begin with very little information on a specific topic and iteratively build a clear conception of a domain and translate that to a computational system.
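The iterative expansion loop described in this abstract can be sketched as follows. This is a toy illustration, not the CET itself: the "web" is a small in-memory list, search is a keyword match, and a fixed term-overlap threshold stands in for both the machine learning filter and the human-in-the-loop review. Every function name and parameter is hypothetical.

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "for", "on"}


def top_terms(corpus, n):
    """Feature extraction stand-in: the n most frequent non-stopword terms."""
    counts = Counter(
        w for doc in corpus for w in doc.lower().split() if w not in STOPWORDS
    )
    return [w for w, _ in counts.most_common(n)]


def search(web, terms):
    """Toy search: return documents containing any query term."""
    return [d for d in web if any(t in d.lower().split() for t in terms)]


def is_relevant(doc, terms, min_hits=2):
    """Filter stand-in: keep a document that shares at least min_hits terms."""
    words = set(doc.lower().split())
    return sum(t in words for t in terms) >= min_hits


def expand(seed, web, rounds=3, n_terms=5):
    """Grow the corpus from seed documents until no new relevant document is found."""
    corpus = list(seed)
    for _ in range(rounds):
        terms = top_terms(corpus, n_terms)
        found = [
            d for d in search(web, terms)
            if d not in corpus and is_relevant(d, terms)
        ]
        if not found:
            break
        corpus.extend(found)
    return corpus


seed = ["asthma inhaler dosage advice"]
web = [
    "asthma inhaler side effects",
    "football match results",
    "inhaler dosage for children with asthma",
    "side effects of steroid treatment",
]
result = expand(seed, web)
```

Because the query terms are recomputed from the growing corpus on every round, the working definition of the domain shifts as documents are added, which is the behaviour the abstract attributes to the iterative process.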
Using spontaneously generated online patient experiences to improve healthcare: A case study using Modafinil
Background
Acknowledged issues with the RCT focus of EBM and recognition of the value of patient input have created a need for new methods of knowledge generation that can give the depth of qualitative studies but on a much larger scale. Almost half of the global population uses social media regularly, with increasing numbers of people using online spaces as either a first- or second-line health information and exchange resource. Estimates suggest the volume of online health related data grew by 300% between 2017 and 2020. As a data source, this unstructured freeform textual data is a form of patient generated health data, containing a mass of patient centred, contextually grounded detail about the perceptions and health concerns of those who post online. Methods for analysing it are at an early stage of development, but it is seen as having potential to add to clinical understanding, either by augmenting existing knowledge, or in aiding understanding of real-world usage of healthcare interventions and services.
Objectives
To explore how large-scale analysis of spontaneously generated online patient experiences (SGOPE) can help with understanding patient perspectives of their conditions, symptoms, and self-management behaviours, assess the effectiveness of interventions, contribute to the process of knowledge and evidence creation, and consequently help healthcare systems improve outcomes in the most efficient manner. A secondary aim is to contribute to the development of methods that can be generalised across other interventions or services.
Methods
Using Modafinil as a case study, a multistage approach was taken. First, an exploratory study comparing qualitative and basic NLP techniques was undertaken on a small sample of 260 posts to identify topics, evaluate effectiveness and identify perceived causal text. An umbrella scoping review was then undertaken exploring how and for what purposes SGOPE data is currently being used within healthcare research. Findings from both then guided the main study, which used a variety of unsupervised NLP tools to explore the main dataset of over 69k posts. Individual methods were compared against each other. Results from both studies were compared for evaluation.
Results
In contrast to the existing inconclusive systematic review evidence for Modafinil for anything other than narcolepsy, both studies found that Modafinil is seen by posters as effective in treating fatigue and cognition symptoms in a wide range of conditions. Both identified the topics mentioned in the data, although more work needs to be done to develop the NLP methods to achieve a greater depth of understanding. The first study identified eight themes within the posts: reason for taking, impact of symptoms, acquisition, dosage, side-effects, comparison with other interventions, effectiveness, and quality of life outcomes. Effectiveness of Modafinil was found to be 68% positive, 12% mixed and 18% negative. Expressions of causal belief were identified. In the main study, effectiveness was measured with sentiment analysis, with all methods showing strong positive sentiment. Topic modelling identified groups of themes. Linguistic techniques extracted phrases indicating causality. Various analysis methods were compared to develop a method that could be generalised across other health topics.
International Society for Disease Surveillance Conference 2011: Building the Future of Public Health Surveillance
Daniel Reidpath - ORCID: 0000-0002-8796-0420 https://orcid.org/0000-0002-8796-0420