Modeling Big Medical Survival Data Using Decision Tree Analysis with Apache Spark
In many medical studies, the outcome of interest is not only whether an event occurred but when it occurred; Alzheimer’s disease (AD) is one such example. Identifying patients with Mild Cognitive Impairment (MCI) who are likely to develop AD is highly important for AD treatment, and previous studies suggest that not all MCI patients convert to AD. Massive amounts of data from longitudinal and extensive studies on thousands of Alzheimer’s patients have been generated. A computational model that can predict conversion from MCI to AD can be highly beneficial for early intervention and treatment planning. This work presents a big data model that combines machine-learning techniques to determine the level of AD in a participant and predict the time of conversion to AD. The proposed framework uses one of the most widely used screening assessments for detecting cognitive impairment, the Montreal Cognitive Assessment (MoCA). The MoCA data set was collected from different centers and integrated into our large data storage framework using the Hadoop Distributed File System (HDFS); the data was then analyzed using the Apache Spark framework. The accuracy of the proposed framework was compared with a semi-parametric Cox survival analysis model.
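To make the tree-based prediction concrete, here is a deliberately tiny sketch of a single decision-tree split (a "stump") on a hypothetical MoCA score feature. The scores, labels, and cutoff below are invented for illustration; the actual framework trains full decision trees on Spark over HDFS-resident data.

```python
# Illustrative sketch only: one decision-tree split on a made-up MoCA
# feature, standing in for the paper's Spark-based decision tree analysis.
def best_threshold(scores, converted):
    """Find the MoCA cutoff that best separates converters from non-converters."""
    best_t, best_acc = None, 0.0
    for t in sorted(set(scores)):
        # Predict "converts to AD" when the score falls below the cutoff.
        preds = [s < t for s in scores]
        acc = sum(p == c for p, c in zip(preds, converted)) / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

# Hypothetical cohort: lower MoCA scores tend to convert.
scores    = [18, 20, 22, 25, 27, 29]
converted = [True, True, True, False, False, False]
print(best_threshold(scores, converted))  # -> (25, 1.0)
```

A full tree repeats this split search recursively on each partition; Spark parallelizes the search over candidate features and thresholds.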
An Investigation on Disease Diagnosis and Prediction by Using Modified K-Mean clustering and Combined CNN and ELM Classification Techniques
Data analysis is important for managing the large volume of knowledge in the healthcare industry. Earlier medical studies favored prediction over processing and assimilating the massive volume of hospital data. With the tremendous expansion of knowledge in the biological and healthcare fields, precise analysis of health data becomes advantageous for early disease identification and patient treatment. However, accuracy suffers when there are gaps in the medical data. The K-means algorithm is simple and efficient to apply and is appropriate for processing vast quantities of continuous, high-dimensional numerical data. However, the number of clusters in the given dataset must be predetermined for this technique, and choosing the right K is frequently challenging; the cluster centers chosen in the first phase also affect the clustering results. To overcome this drawback, the initialization and centroid-selection steps of K-means are modified, and a classification technique combining a convolutional neural network (CNN) and an extreme learning machine (ELM) is used. To extend this work, disease risk prediction using a repository dataset is proposed, applying different types of machine learning algorithms to predict disease from structured data. The prediction accuracy of the proposed hybrid model is 99.8%, which is higher than SVM (support vector machine), KNN (k-nearest neighbors), AB (AdaBoost), and CKN-CNN (consensus K-nearest neighbor with convolutional neural network)
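The initialization weakness described above can be sketched in a few lines. The following toy uses k-means++-style seeding (spread-out initial centers chosen proportional to squared distance) as one common fix; the 1-D data, k, and seed are illustrative assumptions, not the paper's actual modification or dataset.

```python
import random

# Minimal sketch: seed k-means with spread-out centers (k-means++ style)
# instead of arbitrary ones, addressing the initialization sensitivity above.
def kmeanspp_init(points, k, rng):
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Next center is picked proportional to squared distance
        # from the nearest existing center.
        d2 = [min((p - c) ** 2 for c in centers) for p in points]
        r, acc = rng.random() * sum(d2), 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
    return centers

def kmeans(points, k, seed=0, iters=20):
    rng = random.Random(seed)
    centers = kmeanspp_init(points, k, rng)
    for _ in range(iters):  # standard Lloyd iterations
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: (p - centers[i]) ** 2)].append(p)
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return sorted(centers)

# Two well-separated 1-D clusters converge to their means.
print([round(c, 6) for c in kmeans([1.0, 1.2, 0.8, 10.0, 10.2, 9.8], k=2)])
```

With good seeding the centers land in distinct clusters immediately; poor random seeding can leave both initial centers in one cluster and converge to a worse partition, which is exactly the sensitivity the abstract targets.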
Computing resources sensitive parallelization of neural networks for large scale diabetes data modelling, diagnosis and prediction
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Diabetes has become one of the most severe diseases due to an increasing number of diabetes patients globally. A large amount of digital data on diabetes has been collected through various channels. How to utilize these data sets to help doctors make decisions on the diagnosis, treatment and prediction of diabetic patients poses many challenges to the research community. The thesis investigates mathematical models, with a focus on neural networks, for large scale diabetes data modelling and analysis by utilizing modern computing technologies such as grid computing and cloud computing. These computing technologies provide users with an inexpensive way to access extensive computing resources over the Internet for solving data- and computationally intensive problems. This thesis evaluates the performance of seven representative machine learning techniques in the classification of diabetes data; the results show that neural networks produce the best accuracy in classification but incur high overhead in data training. As a result, the thesis develops MRNN, a parallel neural network model based on the MapReduce programming model, which has become an enabling technology in support of data-intensive applications in the clouds.
By partitioning the diabetic data set into a number of equally sized data blocks, the training workload is distributed among a number of computing nodes for speedup in data training. MRNN is first evaluated in small scale experimental environments using 12 mappers and subsequently in large scale simulated environments using up to 1000 mappers. Both the experimental and simulation results show the effectiveness of MRNN in classification, and its high scalability in data training.
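The partition/map/reduce shape described above can be sketched as a toy, with assumed details: each "mapper" trains a one-weight model on its equally sized block, and the reducer averages the learned weights. The real MRNN trains neural networks on MapReduce infrastructure; this only illustrates how equal partitioning distributes the training workload.

```python
# Toy MRNN-shaped pipeline: equal-size partition -> parallel "training" -> merge.
def partition(data, n_blocks):
    size = len(data) // n_blocks
    return [data[i * size:(i + 1) * size] for i in range(n_blocks)]

def map_train(block):
    # Stand-in for per-block training: least-squares slope for y = w * x.
    num = sum(x * y for x, y in block)
    den = sum(x * x for x, _ in block)
    return num / den

def reduce_avg(weights):
    # Reducer merges the per-block models by averaging their weights.
    return sum(weights) / len(weights)

# Synthetic data with true slope 2.0, split across 4 "mappers".
data = [(x, 2.0 * x) for x in range(1, 17)]
weights = [map_train(b) for b in partition(data, 4)]
print(reduce_avg(weights))  # -> 2.0
```

Because each block is the same size, every mapper does comparable work on homogeneous nodes, which is the assumption the load balancing scheme below relaxes.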
MapReduce does not have a sophisticated job scheduling scheme for heterogeneous computing environments in which the computing nodes may have varied computing capabilities. For this purpose, this thesis develops a load balancing scheme based on genetic algorithms with the aim of balancing the training workload among heterogeneous computing nodes. The nodes with more computing capacity receive more MapReduce jobs for execution. Divisible load theory is employed to guide the evolutionary process of the genetic algorithm toward fast convergence. The proposed load balancing scheme is evaluated in large scale simulated MapReduce environments with varied levels of heterogeneity using different sizes of data sets. All the results show that the genetic algorithm based load balancing scheme significantly reduces the makespan of job execution in comparison with the time consumed without load balancing. This work is funded by the EPSRC and China Market Association
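A minimal sketch of what capacity-aware balancing buys: blocks assigned in proportion to each node's speed (the allocation a divisible-load-guided search converges toward) versus a naive equal split, compared by makespan. The node speeds and block counts are invented; the thesis uses a genetic algorithm to search for such allocations rather than computing them directly.

```python
# Capacity-proportional block assignment vs. equal split, compared by makespan.
def proportional_assign(total_blocks, capacities):
    total_cap = sum(capacities)
    shares = [round(total_blocks * c / total_cap) for c in capacities]
    # Give any rounding remainder to the fastest node.
    shares[capacities.index(max(capacities))] += total_blocks - sum(shares)
    return shares

def makespan(shares, capacities):
    # Time until the slowest node finishes its share.
    return max(s / c for s, c in zip(shares, capacities))

caps = [1.0, 2.0, 3.0]          # hypothetical node speeds (blocks/unit time)
even = [40, 40, 40]             # naive equal split of 120 blocks
balanced = proportional_assign(120, caps)
print(balanced)                  # -> [20, 40, 60]
print(makespan(even, caps))      # -> 40.0 (slowest node dominates)
print(makespan(balanced, caps))  # -> 20.0
```

Proportional allocation halves the makespan here because all nodes finish simultaneously, which is the optimality condition divisible load theory supplies to steer the genetic search.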
Enhancing health risk prediction with deep learning on big data and revised fusion node paradigm
With recent advances in health systems, the amount of health data is expanding rapidly in various formats. This data originates from many new sources including digital records, mobile devices, and wearable health devices. Big health data offers more opportunities for health data analysis and the enhancement of health services via innovative approaches. The objective of this research is to develop a framework to enhance health prediction with the revised fusion node and deep learning paradigms. The fusion node is an information fusion model for constructing prediction systems. Deep learning involves the complex application of machine-learning algorithms, such as Bayesian fusion and neural networks, for data extraction and logical inference. Deep learning, combined with information fusion paradigms, can be utilized to provide more comprehensive and reliable predictions from big health data. Based on the proposed framework, an experimental system is developed as an illustration of the framework implementation
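The fusion-node idea can be illustrated with invented numbers: two component models each emit a risk probability, and a fusion node combines them with reliability weights, a simplified stand-in for the Bayesian fusion the abstract mentions.

```python
# Simplified fusion node: reliability-weighted average of model outputs.
def fuse(probs, weights):
    total = sum(weights)
    return sum(p * w for p, w in zip(probs, weights)) / total

# Hypothetical inputs: a wearable-sensor model (less reliable, weight 0.3)
# and a clinical-record model (weight 0.7), each predicting health risk.
print(round(fuse([0.9, 0.6], [0.3, 0.7]), 3))  # -> 0.69
```

The fused estimate leans toward the more reliable source; in the paper's framework such nodes are composed into a larger prediction system rather than used in isolation.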
Big data analytics for preventive medicine
© 2019, Springer-Verlag London Ltd., part of Springer Nature. Medical data is among the most rewarding and yet most complicated data to analyze. How can healthcare providers use modern data analytics tools and technologies to analyze and create value from complex data? Data analytics promises to efficiently discover valuable patterns by analyzing large amounts of unstructured, heterogeneous, non-standard and incomplete healthcare data. It does not only forecast but also helps in decision making, and is increasingly seen as a breakthrough in the ongoing advancement toward improving the quality of patient care and reducing healthcare costs. The aim of this study is to provide a comprehensive and structured overview of extensive research on the advancement of data analytics methods for disease prevention. This review first introduces disease prevention and its challenges, followed by traditional prevention methodologies. We summarize state-of-the-art data analytics algorithms used for classification of disease, clustering (detecting unusually high incidence of a particular disease), anomaly detection (detection of disease) and association, along with their respective advantages, drawbacks and guidelines for selecting a specific model, followed by a discussion of recent developments and successful applications of disease prevention methods. The article concludes with open research challenges and recommendations
Clinical Prediction on ML based Internet of Things for E-Health Care System
Machine learning (ML) is a powerful method for uncovering hidden patterns in data from the Internet of Things (IoT). These hybrid solutions intelligently improve decision-making in a variety of fields, including education, security, business, and healthcare. IoT uses machine learning to uncover hidden patterns in bulk data, allowing for better forecasting and referral systems. IoT and machine learning have been embraced in healthcare so that automated computers may generate medical records, anticipate diagnoses, and, most critically, monitor patients in real time. Different ML algorithms perform differently on different databases, and this variance in anticipated results may influence the overall outcomes. In the clinical decision-making process, there is a lot of variation in prognostic results. As a result, it is critical to understand the various machine learning methods used to handle IoT data in the healthcare industry. In this proposed work, an adaptive neuro-fuzzy inference system (ANFIS) machine learning algorithm is used to monitor human health. The UCI database is used for initial training and validation of the machine learning system. Using the IoT system, the test phase collects the person's heart rate, blood pressure, and temperature, and assesses whether the sensor data obtained by the IoT framework can predict any irregularities in the health state. To evaluate the forecast accuracy percentage, statistical analysis is performed on cloud data acquired from the IoT. Other routines are derived from k-nearest-neighbour results
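A much-simplified stand-in for the ANFIS classifier can show the shape of the vital-sign check: triangular fuzzy membership in a "normal" range for each reading, with an alert when the combined membership drops below a cutoff. The ranges and the 0.5 threshold are illustrative assumptions, not values from the paper; a real ANFIS learns its membership functions and rule weights from the training data.

```python
# Toy fuzzy health check: triangular "normal" membership per vital sign.
def tri_membership(x, lo, mid, hi):
    """Degree (0..1) to which x is 'normal', peaking at mid."""
    if x <= lo or x >= hi:
        return 0.0
    return (x - lo) / (mid - lo) if x < mid else (hi - x) / (hi - mid)

def health_alert(heart_rate, systolic_bp, temp_c):
    degrees = [
        tri_membership(heart_rate, 40, 70, 120),   # beats per minute
        tri_membership(systolic_bp, 90, 115, 160), # mmHg
        tri_membership(temp_c, 35.0, 36.8, 39.0),  # degrees Celsius
    ]
    return min(degrees) < 0.5  # fuzzy AND over the vitals

print(health_alert(72, 118, 36.9))   # -> False (all readings near normal)
print(health_alert(135, 118, 36.9))  # -> True  (elevated heart rate)
```

ANFIS replaces the hand-set triangles and the fixed min-rule with parameters tuned by neural-network training, which is what the UCI-based training phase provides.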
Business Analytics Using Predictive Algorithms
In today's data-driven business landscape, organizations strive to extract actionable insights and make informed decisions from their vast data. Business analytics, combining data analysis, statistical modeling, and predictive algorithms, is crucial for transforming raw data into meaningful information. However, there are gaps in the field, such as limited industry focus, insufficient algorithm comparison, and data quality challenges. This work aims to address these gaps by demonstrating how predictive algorithms can be applied across business domains for pattern identification, trend forecasting, and accurate predictions. The report focuses on sales forecasting and topic modeling, comparing the performance of various algorithms including Linear Regression, Random Forest Regression, XGBoost, LSTMs, and ARIMA. It emphasizes the importance of data preprocessing, feature selection, and model evaluation for reliable sales forecasts, while utilizing the unsupervised algorithms S-BERT, UMAP, and HDBSCAN to extract valuable insights from unstructured textual data
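The simplest of the compared forecasters, Linear Regression, reduces to ordinary least squares on a time index. The sketch below uses a fabricated monthly series to show the fit-then-extrapolate step; it is a baseline illustration, not the report's pipeline, which adds preprocessing, feature selection, and comparison against the tree-based and sequence models.

```python
# Baseline sales forecast: ordinary least squares trend on a time index.
def fit_line(ys):
    n = len(ys)
    xs = range(n)
    mx, my = (n - 1) / 2, sum(ys) / n
    # Closed-form OLS slope and intercept for y = a + b * x.
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

def forecast(ys, steps):
    a, b = fit_line(ys)
    return [a + b * (len(ys) + s) for s in range(steps)]

sales = [100, 110, 120, 130, 140]  # fabricated, perfectly linear toy series
print(forecast(sales, 2))          # -> [150.0, 160.0]
```

Real sales series are rarely this clean, which is why the report benchmarks Random Forest, XGBoost, LSTMs, and ARIMA against such linear baselines under proper model evaluation.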