4,644 research outputs found

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Get PDF
    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues

    Big Data Analytics for Complex Systems

    Get PDF
    The evolution of technology in all fields led to the generation of vast amounts of data by modern systems. Using data to extract information, make predictions, and make decisions is the current trend in artificial intelligence. The advancement of big data analytics tools made accessing and storing data easier and faster than ever, and machine learning algorithms help to identify patterns in and extract information from data. The current tools and machines in health, computer technologies, and manufacturing can generate massive raw data about their products or samples. The author of this work proposes a modern integrative system that can utilize big data analytics, machine learning, super-computer resources, and industrial health machines’ measurements to build a smart system that can mimic the human intelligence skills of observations, detection, prediction, and decision-making. The applications of the proposed smart systems are included as case studies to highlight the contributions of each system. The first contribution is the ability to utilize big data revolutionary and deep learning technologies on production lines to diagnose incidents and take proper action. In the current digital transformational industrial era, Industry 4.0 has been receiving researcher attention because it can be used to automate production-line decisions. Reconfigurable manufacturing systems (RMS) have been widely used to reduce the setup cost of restructuring production lines. However, the current RMS modules are not linked to the cloud for online decision-making to take the proper decision; these modules must connect to an online server (super-computer) that has big data analytics and machine learning capabilities. The online means that data is centralized on cloud (supercomputer) and accessible in real-time. In this study, deep neural networks are utilized to detect the decisive features of a product and build a prediction model in which the iFactory will make the necessary decision for the defective products. The Spark ecosystem is used to manage the access, processing, and storing of the big data streaming. This contribution is implemented as a closed cycle, which for the best of our knowledge, no one in the literature has introduced big data analysis using deep learning on real-time applications in the manufacturing system. The code shows a high accuracy of 97% for classifying the normal versus defective items. The second contribution, which is in Bioinformatics, is the ability to build supervised machine learning approaches based on the gene expression of patients to predict proper treatment for breast cancer. In the trial, to personalize treatment, the machine learns the genes that are active in the patient cohort with a five-year survival period. The initial condition here is that each group must only undergo one specific treatment. After learning about each group (or class), the machine can personalize the treatment of a new patient by diagnosing the patients’ gene expression. The proposed model will help in the diagnosis and treatment of the patient. The future work in this area involves building a protein-protein interaction network with the selected genes for each treatment to first analyze the motives of the genes and target them with the proper drug molecules. In the learning phase, a couple of feature-selection techniques and supervised standard classifiers are used to build the prediction model. Most of the nodes show a high-performance measurement where accuracy, sensitivity, specificity, and F-measure ranges around 100%. The third contribution is the ability to build semi-supervised learning for the breast cancer survival treatment that advances the second contribution. By understanding the relations between the classes, we can design the machine learning phase based on the similarities between classes. In the proposed research, the researcher used the Euclidean matrix distance among each survival treatment class to build the hierarchical learning model. The distance information that is learned through a non-supervised approach can help the prediction model to select the classes that are away from each other to maximize the distance between classes and gain wider class groups. The performance measurement of this approach shows a slight improvement from the second model. However, this model reduced the number of discriminative genes from 47 to 37. The model in the second contribution studies each class individually while this model focuses on the relationships between the classes and uses this information in the learning phase. Hierarchical clustering is completed to draw the borders between groups of classes before building the classification models. Several distance measurements are tested to identify the best linkages between classes. Most of the nodes show a high-performance measurement where accuracy, sensitivity, specificity, and F-measure ranges from 90% to 100%. All the case study models showed high-performance measurements in the prediction phase. These modern models can be replicated for different problems within different domains. The comprehensive models of the newer technologies are reconfigurable and modular; any newer learning phase can be plugged-in at both ends of the learning phase. Therefore, the output of the system can be an input for another learning system, and a newer feature can be added to the input to be considered for the learning phase

    Computational models and approaches for lung cancer diagnosis

    Full text link
    The success of treatment of patients with cancer depends on establishing an accurate diagnosis. To this end, the aim of this study is to developed novel lung cancer diagnostic models. New algorithms are proposed to analyse the biological data and extract knowledge that assists in achieving accurate diagnosis results

    Big data analytics for preventive medicine

    Get PDF
    © 2019, Springer-Verlag London Ltd., part of Springer Nature. Medical data is one of the most rewarding and yet most complicated data to analyze. How can healthcare providers use modern data analytics tools and technologies to analyze and create value from complex data? Data analytics, with its promise to efficiently discover valuable pattern by analyzing large amount of unstructured, heterogeneous, non-standard and incomplete healthcare data. It does not only forecast but also helps in decision making and is increasingly noticed as breakthrough in ongoing advancement with the goal is to improve the quality of patient care and reduces the healthcare cost. The aim of this study is to provide a comprehensive and structured overview of extensive research on the advancement of data analytics methods for disease prevention. This review first introduces disease prevention and its challenges followed by traditional prevention methodologies. We summarize state-of-the-art data analytics algorithms used for classification of disease, clustering (unusually high incidence of a particular disease), anomalies detection (detection of disease) and association as well as their respective advantages, drawbacks and guidelines for selection of specific model followed by discussion on recent development and successful application of disease prevention methods. The article concludes with open research challenges and recommendations

    Recent advancement in Disease Diagnostic using machine learning: Systematic survey of decades, comparisons, and challenges

    Full text link
    Computer-aided diagnosis (CAD), a vibrant medical imaging research field, is expanding quickly. Because errors in medical diagnostic systems might lead to seriously misleading medical treatments, major efforts have been made in recent years to improve computer-aided diagnostics applications. The use of machine learning in computer-aided diagnosis is crucial. A simple equation may result in a false indication of items like organs. Therefore, learning from examples is a vital component of pattern recognition. Pattern recognition and machine learning in the biomedical area promise to increase the precision of disease detection and diagnosis. They also support the decision-making process's objectivity. Machine learning provides a practical method for creating elegant and autonomous algorithms to analyze high-dimensional and multimodal bio-medical data. This review article examines machine-learning algorithms for detecting diseases, including hepatitis, diabetes, liver disease, dengue fever, and heart disease. It draws attention to the collection of machine learning techniques and algorithms employed in studying conditions and the ensuing decision-making process

    Machine-Learning-Powered Cyber-Physical Systems

    Get PDF
    In the last few years, we witnessed the revolution of the Internet of Things (IoT) paradigm and the consequent growth of Cyber-Physical Systems (CPSs). IoT devices, which include a plethora of smart interconnected sensors, actuators, and microcontrollers, have the ability to sense physical phenomena occurring in an environment and provide copious amounts of heterogeneous data about the functioning of a system. As a consequence, the large amounts of generated data represent an opportunity to adopt artificial intelligence and machine learning techniques that can be used to make informed decisions aimed at the optimization of such systems, thus enabling a variety of services and applications across multiple domains. Machine learning processes and analyses such data to generate a feedback, which represents a status the environment is in. A feedback given to the user in order to make an informed decision is called an open-loop feedback. Thus, an open-loop CPS is characterized by the lack of an actuation directed at improving the system itself. A feedback used by the system itself to actuate a change aimed at optimizing the system itself is called a closed-loop feedback. Thus, a closed-loop CPS pairs feedback based on sensing data with an actuation that impacts the system directly. In this dissertation, we propose several applications in the context of CPS. We propose open-loop CPSs designed for the early prediction, diagnosis, and persistency detection of Bovine Respiratory Disease (BRD) in dairy calves, and for gait activity recognition in horses.These works use sensor data, such as pedometers and automated feeders, to perform valuable real-field data collection. Data are then processed by a mix of state-of-the-art approaches as well as novel techniques, before being fed to machine learning algorithms for classification, which informs the user on the status of their animals. Our work further evaluates a variety of trade-offs. In the context of BRD, we adopt optimization techniques to explore the trade-offs of using sensor data as opposed to manual examination performed by domain experts. Similarly, we carry out an extensive analysis on the cost-accuracy trade-offs, which farmers can adopt to make informed decisions on their barn investments. In the context of horse gait recognition we evaluate the benefits of lighter classifications algorithms to improve energy and storage usage, and their impact on classification accuracy. With respect to closed-loop CPS we proposes an incentive-based demand response approach for Heating Ventilation and Air Conditioning (HVAC) designed for peak load reduction in the context of smart grids. Specifically, our approach uses machine learning to process power data from smart thermostats deployed in user homes, along with their personal temperature preferences. Our machine learning models predict power savings due to thermostat changes, which are then plugged into our optimization problem that uses auction theory coupled with behavioral science. This framework selects the set of users who fulfill the power saving requirement, while minimizing financial incentives paid to the users, and, as a consequence, their discomfort. Our work on BRD has been published on IEEE DCOSS 2022 and Frontiers in Animal Science. Our work on gait recognition has been published on IEEE SMARTCOMP 2019 and Elsevier PMC 2020, and our work on energy management and energy prediction has been published on IEEE PerCom 2022 and IEEE SMARTCOMP 2022. Several other works are under submission when this thesis was written, and are included in this document as well

    Effective and Secure Healthcare Machine Learning System with Explanations Based on High Quality Crowdsourcing Data

    Get PDF
    Affordable cloud computing technologies allow users to efficiently outsource, store, and manage their Personal Health Records (PHRs) and share with their caregivers or physicians. With this exponential growth of the stored large scale clinical data and the growing need for personalized care, researchers are keen on developing data mining methodologies to learn efficient hidden patterns in such data. While studies have shown that those progresses can significantly improve the performance of various healthcare applications for clinical decision making and personalized medicine, the collected medical datasets are highly ambiguous and noisy. Thus, it is essential to develop a better tool for disease progression and survival rate predictions, where dataset needs to be cleaned before it is used for predictions and useful feature selection techniques need to be employed before prediction models can be constructed. In addition, having predictions without explanations prevent medical personnel and patients from adopting such healthcare deep learning models. Thus, any prediction models must come with some explanations. Finally, despite the efficiency of machine learning systems and their outstanding prediction performance, it is still a risk to reuse pre-trained models since most machine learning modules that are contributed and maintained by third parties lack proper checking to ensure that they are robust to various adversarial attacks. We need to design mechanisms for detection such attacks. In this thesis, we focus on addressing all the above issues: (i) Privacy Preserving Disease Treatment & Complication Prediction System (PDTCPS): A privacy-preserving disease treatment, complication prediction scheme (PDTCPS) is proposed, which allows authorized users to conduct searches for disease diagnosis, personalized treatments, and prediction of potential complications. (ii) Incentivizing High Quality Crowdsourcing Data For Disease Prediction: A new incentive model with individual rationality and platform profitability features is developed to encourage different hospitals to share high quality data so that better prediction models can be constructed. We also explore how data cleaning and feature selection techniques affect the performance of the prediction models. (iii) Explainable Deep Learning Based Medical Diagnostic System: A deep learning based medical diagnosis system (DL-MDS) is present which integrates heterogeneous medical data sources to produce better disease diagnosis with explanations for authorized users who submit their personalized health related queries. (iv) Attacks on RNN based Healthcare Learning Systems and Their Detection & Defense Mechanisms: Potential attacks on Recurrent Neural Network (RNN) based ML systems are identified and low-cost detection & defense schemes are designed to prevent such adversarial attacks. Finally, we conduct extensive experiments using both synthetic and real-world datasets to validate the feasibility and practicality of our proposed systems

    New Fuzzy Support Vector Machine for the Class Imbalance Problem in Medical Datasets Classification

    Get PDF
    In medical datasets classification, support vector machine (SVM) is considered to be one of the most successful methods. However, most of the real-world medical datasets usually contain some outliers/noise and data often have class imbalance problems. In this paper, a fuzzy support machine (FSVM) for the class imbalance problem (called FSVM-CIP) is presented, which can be seen as a modified class of FSVM by extending manifold regularization and assigning two misclassification costs for two classes. The proposed FSVM-CIP can be used to handle the class imbalance problem in the presence of outliers/noise, and enhance the locality maximum margin. Five real-world medical datasets, breast, heart, hepatitis, BUPA liver, and pima diabetes, from the UCI medical database are employed to illustrate the method presented in this paper. Experimental results on these datasets show the outperformed or comparable effectiveness of FSVM-CIP
    • …
    corecore