Big data analytics for preventive medicine
© 2019, Springer-Verlag London Ltd., part of Springer Nature. Medical data are among the most rewarding, yet most complicated, data to analyze. How can healthcare providers use modern data analytics tools and technologies to analyze and create value from complex data? Data analytics promises to efficiently discover valuable patterns by analyzing large amounts of unstructured, heterogeneous, non-standard and incomplete healthcare data. It not only forecasts but also supports decision making, and it is increasingly recognized as a breakthrough in ongoing efforts to improve the quality of patient care and reduce healthcare costs. The aim of this study is to provide a comprehensive and structured overview of extensive research on the advancement of data analytics methods for disease prevention. This review first introduces disease prevention and its challenges, followed by traditional prevention methodologies. We summarize state-of-the-art data analytics algorithms used for disease classification, clustering (detecting an unusually high incidence of a particular disease), anomaly detection (detecting disease) and association, together with their respective advantages, drawbacks and guidelines for selecting a specific model, followed by a discussion of recent developments and successful applications of disease prevention methods. The article concludes with open research challenges and recommendations.
Discovering Higher-order SNP Interactions in High-dimensional Genomic Data
In this thesis, a method based on multifactor dimensionality reduction (MDR) and associative classification is employed to identify higher-order SNP interactions, enhancing the understanding of the genetic architecture of complex diseases. The thesis further explores the application of deep learning techniques, which provide new clues for interaction analysis. The performance of the deep learning method is maximized by combining deep neural networks with a random forest to obtain reliable interactions in the presence of noise.
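The core MDR step can be sketched as follows: for a candidate set of SNPs, each multilocus genotype cell is labelled high risk if its case:control ratio reaches the overall ratio, and the SNP combination whose high/low-risk pooling best separates cases from controls (by balanced accuracy) wins. This is a minimal illustration assuming genotypes coded 0/1/2 and binary case/control labels; it omits the cross-validation used in standard MDR, and it is not the thesis's associative-classification or deep learning method. The function names and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def genotype_cell(row, snps):
    """Multilocus genotype for the chosen SNPs, e.g. (0, 2) for two SNPs coded 0/1/2."""
    return tuple(row[s] for s in snps)

def high_risk_cells(genotypes, labels, snps, threshold):
    """Pool genotype cells into high risk (case:control ratio >= threshold) or low risk."""
    cases, controls = Counter(), Counter()
    for row, y in zip(genotypes, labels):
        if y == 1:
            cases[genotype_cell(row, snps)] += 1
        else:
            controls[genotype_cell(row, snps)] += 1
    high = set()
    for cell in set(cases) | set(controls):
        # Cells seen only in cases have no controls, so treat them as high risk.
        if controls[cell] == 0 or cases[cell] / controls[cell] >= threshold:
            high.add(cell)
    return high

def balanced_accuracy(genotypes, labels, snps, high):
    """Mean of sensitivity and specificity when 'high-risk cell' predicts 'case'."""
    tp = fn = tn = fp = 0
    for row, y in zip(genotypes, labels):
        predicted_case = genotype_cell(row, snps) in high
        if y == 1 and predicted_case:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_case:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return (sensitivity + specificity) / 2

def mdr_search(genotypes, labels, order=2):
    """Exhaustively score every SNP combination of the given order; return the best."""
    n_cases = sum(1 for y in labels if y == 1)
    threshold = n_cases / (len(labels) - n_cases)  # overall case:control ratio
    best_snps, best_score = None, -1.0
    for snps in combinations(range(len(genotypes[0])), order):
        high = high_risk_cells(genotypes, labels, snps, threshold)
        score = balanced_accuracy(genotypes, labels, snps, high)
        if score > best_score:
            best_snps, best_score = snps, score
    return best_snps, best_score

# Toy data (hypothetical): disease status depends only on an SNP0 x SNP1 interaction.
genotypes = [[g0, g1, g2] for g0 in range(3) for g1 in range(3) for g2 in range(3)]
labels = [1 if (g0 + g1) % 2 == 1 else 0 for g0, g1, _ in genotypes]
best_snps, best_score = mdr_search(genotypes, labels)
print(best_snps, best_score)  # (0, 1) 1.0
```

Because the toy label is a parity-style interaction, neither SNP0 nor SNP1 separates cases well on its own, yet their joint genotype cells separate them perfectly; detecting that kind of non-additive pattern is what MDR-style searches are designed for.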
Clinical Text Classification with Rule-based Features and Knowledge-guided Convolutional Neural Networks
Clinical text classification is an important problem in medical natural language processing. Existing studies have conventionally focused on feature engineering based on rules or knowledge sources, but only a few have exploited the effective feature learning capability of deep learning methods. In this study, we propose a novel approach that combines rule-based features and knowledge-guided deep learning techniques for effective disease classification. Critical steps of our method include identifying trigger phrases, predicting classes with very few examples using trigger phrases, and training a convolutional neural network with word embeddings and Unified Medical Language System (UMLS) entity embeddings. We evaluated our method on the 2008 Integrating Informatics with Biology and the Bedside (i2b2) obesity challenge. The results show that our method outperforms state-of-the-art methods.
Comment: arXiv admin note: text overlap with arXiv:1806.04820 by other authors
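The idea of using trigger phrases to handle classes with very few examples, while leaving the remaining cases to a learned model, can be sketched as a rule layer in front of a classifier. This is a minimal sketch under stated assumptions: the class names, the trigger phrase lists and the stand-in classifier are hypothetical and are not the paper's rules or its CNN.

```python
import re
from typing import Callable, Dict, List

# Hypothetical trigger phrases for two rare classes; not the paper's lists.
TRIGGERS: Dict[str, List[str]] = {
    "questionable": ["possible", "rule out", "cannot exclude"],
    "absent": ["denies", "no history of", "negative for"],
}

def classify(text: str, fallback: Callable[[str], str],
             triggers: Dict[str, List[str]] = TRIGGERS) -> str:
    """Return the first class whose trigger phrase occurs in text, else defer to fallback."""
    lowered = text.lower()
    for label, phrases in triggers.items():  # dict insertion order sets rule priority
        for phrase in phrases:
            # Word boundaries stop a phrase from matching inside a longer word.
            if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
                return label
    return fallback(text)

def cnn_stand_in(text: str) -> str:
    """Hypothetical placeholder for a trained classifier; always answers 'present'."""
    return "present"

for note in ["Patient denies chest pain.", "Possible sleep apnea.", "Obese, BMI 41."]:
    print(note, "->", classify(note, cnn_stand_in))
```

The first matching rule wins, so rule order matters; a real system would also need to handle negation scope and conflicting triggers, which this sketch ignores.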
Time-Series Embedded Feature Selection Using Deep Learning: Data Mining Electronic Health Records for Novel Biomarkers
As health information technologies continue to advance, the routine collection and digitisation of patient health records in the form of electronic health records present an ideal opportunity for data mining and exploratory analysis of biomarkers and risk factors indicative of a potentially diverse range of patient outcomes. Patient records have become increasingly available through initiatives that enable open access while maintaining critical patient privacy. Despite this progress, health records are still not widely adopted in clinical statistical analysis because of the challenges posed by such “big data”.
Deep-learning-based temporal modelling approaches offer an ideal solution to these challenges through automated, self-optimising representation learning: they can compose the high-dimensional domain of patient records into manageable data representations that model complex data associations. Such representations can condense the data and reduce its dimensionality, emphasising feature sparsity and importance through novel embedded feature selection approaches. Applied to patient records, these approaches enable complex modelling and analysis of the full range of clinical features to select biomarkers of predictive relevance.
Firstly, we propose a novel entropy-regularised neural network ensemble that highlights risk factors associated with hospitalisation risk in individuals with dementia. Applied to this task, the ensemble reduced a large domain of unique medical events to a small set of relevant risk factors while maintaining hospitalisation discrimination.
Following on, we continue our work on ensemble architectures with a novel cascading LSTM ensemble to predict severe sepsis onset in critically ill patients in an intensive care unit (ICU). We demonstrate state-of-the-art performance that outperforms current related literature.
Finally, we propose a novel embedded feature selection method dubbed 1D convolution feature selection using sparsity regularisation. This method was evaluated on both the dementia and the sepsis prediction objectives to highlight model capability and generalisability. We further report a selection of potential biomarkers for these case studies, highlighting their clinical relevance and potential novelty for future clinical analysis.
Overall, we demonstrate the effectiveness of embedded feature selection approaches built on temporal deep learning architectures for discovering biomarkers across a variety of challenging clinical applications.
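The abstract does not give the details of its 1D convolution method, so the sketch below swaps in a simpler, well-known instance of embedded feature selection with sparsity regularisation: L1-penalised logistic regression trained by proximal gradient descent (ISTA). Features whose weights are shrunk to exactly zero during training are deselected. The hyperparameters (`lam`, `lr`, `steps`) and the toy data are illustrative assumptions, not values from the thesis.

```python
import math

def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t*|v|: shrink v toward zero and clamp small values to exactly 0."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def l1_logistic(X, y, lam=0.05, lr=0.5, steps=2000):
    """Proximal gradient descent (ISTA) on mean logistic loss + lam * ||w||_1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            r = 1.0 / (1.0 + math.exp(-z)) - yi  # sigmoid(z) - y
            grad_b += r
            for j in range(d):
                grad_w[j] += r * xi[j]
        b -= lr * grad_b / n  # the bias is not penalised
        # Gradient step on the smooth loss, then the L1 proximal (shrinkage) step.
        w = [soft_threshold(wj - lr * gj / n, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b

# Toy data (hypothetical): feature 0 decides the label; features 1 and 2 are
# mirrored noise, so each row has a twin with the same label and negated noise.
X, y = [], []
for x0 in (-2.0, -1.0, 1.0, 2.0):
    for sign in (1.0, -1.0):
        X.append([x0, 0.5 * sign, 1.0 * sign])
        y.append(1 if x0 > 0 else 0)

w, b = l1_logistic(X, y)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print(selected)  # [0]
```

The soft-thresholding step is what makes the selection "embedded": sparsity emerges during training rather than from a separate filtering pass. The same penalty can be attached to the input-side weights of a deeper temporal model.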
Global analysis of SNPs, proteins and protein-protein interactions: approaches for the prioritisation of candidate disease genes.
PhD
Understanding the etiology of complex disease remains a challenge in biology. Biological data have grown explosively in recent years; this study investigates machine learning and network analysis methods as tools to aid candidate disease gene prioritisation, specifically for hypertension and cardiovascular disease.
This thesis comprises four sets of analyses. Firstly, non-synonymous single nucleotide polymorphisms (nsSNPs) were analysed in terms of sequence- and structure-based properties, using a classifier to provide a model for predicting deleterious nsSNPs. The degree of sequence conservation at the nsSNP position was found to be the single best attribute, but other sequence and structural attributes were also useful in combination. Predictions for nsSNPs within Ensembl have been made publicly available.
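One simple way to make "single best attribute" concrete is to score each attribute on its own with a one-threshold rule (a decision stump) and rank the attributes by that score. The abstract does not say which classifier or ranking criterion the thesis used; the stump criterion, the attribute names and the values below are hypothetical.

```python
from typing import Dict, List, Tuple

def stump_accuracy(values: List[float], labels: List[int]) -> float:
    """Best accuracy of a one-threshold rule on a single attribute (either direction)."""
    n = len(values)
    best = 0.0
    for t in sorted(set(values)):
        # Rule: predict deleterious (1) when value >= t.
        acc = sum((v >= t) == (lab == 1) for v, lab in zip(values, labels)) / n
        # 1 - acc is the accuracy of the reversed rule (value < t means deleterious).
        best = max(best, acc, 1.0 - acc)
    return best

def rank_attributes(attrs: Dict[str, List[float]], labels: List[int]) -> List[Tuple[str, float]]:
    """Rank attributes by single-attribute stump accuracy, best first."""
    scores = [(name, stump_accuracy(vals, labels)) for name, vals in attrs.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Toy values (hypothetical): 1 = deleterious nsSNP, 0 = neutral.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
attrs = {
    "conservation": [0.95, 0.90, 0.85, 0.80, 0.30, 0.20, 0.40, 0.10],
    "accessibility": [0.10, 0.60, 0.30, 0.80, 0.50, 0.20, 0.70, 0.40],
}
print(rank_attributes(attrs, labels))
# [('conservation', 1.0), ('accessibility', 0.625)]
```

A single-attribute ranking like this cannot capture the finding that weaker attributes help in combination; that requires a multivariate classifier.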
Secondly, the problem of predicting the function of proteins that lack experimental data or clear similarity to a sequence of known function was addressed. Protein domain attributes based on physicochemical and predicted structural characteristics of the sequence were used as input to classifiers for predicting membership of large and diverse protein superfamilies from the SCOP database. An enrichment method was investigated that involved adding domains that are currently absent from SCOP to the training dataset. This analysis improved classifier accuracy: optimised classifiers achieved 66.3% for single-domain proteins and 55.6% when domains from multi-domain proteins were included. Domains from superfamilies with low sequence similarity share global sequence properties, enabling the development of applications that complement profile methods for detecting distant sequence relationships.
Thirdly, a topological analysis of the human protein interactome was performed. The results were combined with functional annotation and sequence-based properties to build models for predicting hypertension-associated proteins. The study found that predicted hypertension-related proteins are not generally associated with network hubs and do not exhibit high clustering coefficients. Despite this, they tend to be closer and better connected to other hypertension proteins on the interaction network than would be expected by chance. Classifiers that combined protein-protein interaction (PPI) network, amino acid sequence and functional properties produced a range of precision and recall scores depending on the applied weights.
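The network measurements in this analysis (hubs via node degree, local clustering coefficients, and whether disease proteins lie closer together than chance) can be sketched on a toy graph. This is an illustrative assumption, not the thesis's pipeline: the graph is invented, and the null model (random node sets of the same size) is the simplest choice; degree-preserving randomisation is a common, stricter alternative.

```python
import random
from collections import deque
from itertools import combinations

def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def clustering_coefficient(adj, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

def bfs_distances(adj, source):
    """Shortest-path hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def mean_pairwise_distance(adj, nodes):
    """Average shortest-path distance over connected pairs in a node set."""
    nodes = list(nodes)
    total, pairs = 0, 0
    for i, a in enumerate(nodes):
        dist = bfs_distances(adj, a)
        for b in nodes[i + 1:]:
            if b in dist:
                total += dist[b]
                pairs += 1
    return total / pairs if pairs else float("inf")

def closeness_test(adj, disease, n_perm=1000, seed=0):
    """Permutation p-value: are disease proteins closer than random sets of equal size?"""
    rng = random.Random(seed)
    observed = mean_pairwise_distance(adj, disease)
    all_nodes = sorted(adj)
    hits = sum(
        mean_pairwise_distance(adj, rng.sample(all_nodes, len(disease))) <= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)

# Toy interactome (hypothetical): a triangle A-B-C attached to a chain D..H.
adj = build_graph([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"),
                   ("D", "E"), ("E", "F"), ("F", "G"), ("G", "H")])
degrees = {node: len(adj[node]) for node in adj}  # hubs = highest-degree nodes
print(degrees["C"], clustering_coefficient(adj, "C"))  # 3 0.3333333333333333
observed, p = closeness_test(adj, ["A", "B", "C"], n_perm=200)
print(observed, p)
```

The `(hits + 1) / (n_perm + 1)` form keeps the permutation p-value strictly positive, since the observed set itself is one possible draw.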
Finally, interactome properties of proteins implicated in cardiovascular disease and cancer were studied. The analysis quantified the influential (central) nature of each protein and defined characteristics of the functional modules and pathways in which the disease proteins reside. Such proteins were found to be enriched two-fold among proteins that are influential (p < 0.05) in the interactome. Additionally, they cluster in large, complex, highly connected communities, acting as interfaces between multiple processes more often than expected. An approach to prioritising disease candidates based on this analysis was proposed.
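The abstract reports a two-fold enrichment of disease proteins among influential proteins but does not name the statistical test behind it. A common way to quantify such an overlap is a fold-enrichment ratio together with a one-sided hypergeometric test; the sketch below assumes that choice, and its counts are hypothetical rather than taken from the thesis. It needs Python 3.8+ for `math.comb`.

```python
from math import comb

def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """(k/n) / (K/N): share of disease proteins among n influential ones vs. the background share."""
    return (k * N) / (n * K)

def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): chance that n proteins drawn at random from N include k or more of the K disease proteins."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)

# Hypothetical counts: 1000 proteins, 50 disease proteins, 100 influential, 10 in both.
N, K, n, k = 1000, 50, 100, 10
print(fold_enrichment(k, n, K, N))  # 2.0
print(hypergeom_upper_tail(k, n, K, N))
```

Here 5 overlapping proteins would be expected by chance (100 × 50 / 1000), so observing 10 is a two-fold enrichment; the hypergeometric tail then says how surprising that excess is under random selection.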
Each analysis provides some new insights into the effort to identify novel disease-related proteins for cardiovascular disease.