Data Mining
The availability of big data due to computerization and automation has generated an urgent need for new techniques to analyze and convert big data into useful information and knowledge. Data mining is a promising and leading-edge technology for mining large volumes of data, looking for hidden information, and aiding knowledge discovery. It can be used for characterization, classification, discrimination, anomaly detection, association, clustering, trend or evolution prediction, and much more in fields such as science, medicine, economics, engineering, computers, and even business analytics. This book presents basic concepts, ideas, and research in data mining.
Spatial-Temporal Data Mining for Ocean Science: Data, Methodologies, and Opportunities
With the increasing amount of spatial-temporal (ST) ocean data, numerous
spatial-temporal data mining (STDM) studies have been conducted to address
various oceanic issues, e.g., climate forecasting and disaster warning.
Compared with typical ST data (e.g., traffic data), ST ocean data is more
complicated with some unique characteristics, e.g., diverse regionality and
high sparsity. These characteristics make it difficult to design and train STDM
models. Unfortunately, an overview of these studies is still missing, which hinders
computer scientists from identifying research issues in ocean science and discourages
researchers in ocean science from applying advanced STDM techniques. To remedy
this situation, we provide a comprehensive survey to summarize existing STDM
studies in ocean. Concretely, we first summarize the widely-used ST ocean
datasets and identify their unique characteristics. Then, typical ST ocean data
quality enhancement techniques are discussed. Next, we classify existing STDM
studies for ocean into four types of tasks, i.e., prediction, event detection,
pattern mining, and anomaly detection, and elaborate the techniques for these
tasks. Finally, promising research opportunities are highlighted. This survey
will help scientists from the fields of both computer science and ocean science
have a better understanding of the fundamental concepts, key techniques, and
open challenges of STDM in ocean science.
Machine Learning Methods To Identify Hidden Phenotypes In The Electronic Health Record
The widespread adoption of Electronic Health Records (EHRs) means an unprecedented amount of patient treatment and outcome data is available to researchers. Research is a tertiary priority in the EHR, where the priorities are patient care and billing. Because of this, the data is not standardized or formatted in a manner easily adapted to machine learning approaches. Data may be missing for a large variety of reasons ranging from individual input styles to differences in clinical decision making, for example, which lab tests to issue. Few patients are annotated at a research quality, limiting sample size and presenting a moving gold standard. Patient progression over time is key to understanding many diseases but many machine learning algorithms require a snapshot, at a single time point, to create a usable vector form. In this dissertation, we develop new machine learning methods and computational workflows to extract hidden phenotypes from the Electronic Health Record (EHR). In Part 1, we use a semi-supervised deep learning approach to compensate for the low number of research quality labels present in the EHR. In Part 2, we examine and provide recommendations for characterizing and managing the large amount of missing data inherent to EHR data. In Part 3, we present an adversarial approach to generate synthetic data that closely resembles the original data while protecting subject privacy. We also introduce a workflow to enable reproducible research even when data cannot be shared. In Part 4, we introduce a novel strategy to first extract sequential data from the EHR and then demonstrate the ability to model these sequences with deep learning.
A Comprehensive Survey on Rare Event Prediction
Rare event prediction involves identifying and forecasting events with a low
probability using machine learning and data analysis. Due to the imbalanced
data distributions, where the frequency of common events vastly outweighs that
of rare events, it requires using specialized methods within each step of the
machine learning pipeline, i.e., from data processing to algorithms to
evaluation protocols. Predicting the occurrences of rare events is important
for real-world applications, such as Industry 4.0, and is an active research
area in statistical and machine learning. This paper comprehensively reviews
the current approaches for rare event prediction along four dimensions: rare
event data, data processing, algorithmic approaches, and evaluation approaches.
Specifically, we consider 73 datasets from different modalities (i.e.,
numerical, image, text, and audio), four major categories of data processing,
five major algorithmic groupings, and two broader evaluation approaches. This
paper aims to identify gaps in the current literature and highlight the
challenges of predicting rare events. It also suggests potential research
directions, which can help guide practitioners and researchers.
Comment: 44 page
Knowledge-infused Deep Learning Enables Interpretable Landslide Forecasting
Forecasting how landslides will evolve over time or whether they will fail is
a challenging task due to a variety of factors, both internal and external.
Despite their considerable potential to address these challenges, deep learning
techniques lack interpretability, undermining the credibility of the forecasts
they produce. The recent development of transformer-based deep learning offers
untapped possibilities for forecasting landslides with unprecedented
interpretability and nonlinear feature learning capabilities. Here, we present
a deep learning pipeline that is capable of predicting landslide behavior
holistically, which employs a transformer-based network called LFIT to learn
complex nonlinear relationships from prior knowledge and multiple source data,
identifying the most relevant variables, and demonstrating a comprehensive
understanding of landslide evolution and temporal patterns. By integrating
prior knowledge, we improve holistic landslide forecasting,
enabling us to capture diverse responses to various influencing factors in
different local landslide areas. Using deformation observations as proxies for
measuring the kinetics of landslides, we validate our approach by training
models to forecast reservoir landslides in the Three Gorges Reservoir and
creeping landslides on the Tibetan Plateau. When prior knowledge is
incorporated, we show that interpretable landslide forecasting effectively
identifies influential factors across various landslides. It further elucidates
how local areas respond to these factors, making landslide behavior and trends
more interpretable and predictable. The findings from this study will
contribute to understanding landslide behavior in a new way and make the
proposed approach applicable to other complex disasters influenced by internal
and external factors in the future.
Machine Learning in Manufacturing towards Industry 4.0: From ‘For Now’ to ‘Four-Know’
While attracting increasing research attention in science and technology, Machine Learning (ML) is playing a critical role in the digitalization of manufacturing operations towards Industry 4.0. Recently, ML has been applied in several fields of production engineering to solve a variety of tasks with different levels of complexity and performance. However, in spite of the enormous number of ML use cases, there is no guidance or standard for developing ML solutions from ideation to deployment. This paper aims to address this problem by proposing an ML application roadmap for the manufacturing industry based on the state-of-the-art published research on the topic. First, this paper presents two dimensions for formulating ML tasks, namely, ‘Four-Know’ (Know-what, Know-why, Know-when, Know-how) and ‘Four-Level’ (Product, Process, Machine, System). These are used to analyze ML development trends in manufacturing. Then, the paper provides an implementation pipeline starting from the very early stages of ML solution development and summarizes the available ML methods, including supervised learning methods, semi-supervised methods, unsupervised methods, and reinforcement methods, along with their typical applications. Finally, the paper discusses the current challenges during ML applications and provides an outline of possible directions for future developments.
A Review on Brain Tumor Segmentation Based on Deep Learning Methods with Federated Learning Techniques
Brain tumors have become a severe medical complication in recent years due to their high fatality rate. Radiologists segment the tumor manually, which is time-consuming, error-prone, and expensive. In recent years, automated segmentation based on deep learning has demonstrated promising results in solving computer vision problems such as image classification and segmentation. Brain tumor segmentation has recently become a prevalent task in medical imaging to determine the tumor location, size, and shape using automated methods. Many researchers have worked on various machine learning and deep learning approaches to determine the optimal solution using convolutional methods. In this review paper, we discuss the most effective segmentation techniques based on the datasets that are widely used and publicly available. We also present a survey of federated learning methodologies to enhance global segmentation performance and ensure privacy. A comprehensive literature review, based on more than 100 papers, generalizes the most recent techniques in segmentation and multi-modality information. Finally, we concentrate on unsolved problems in brain tumor segmentation and a client-based federated model training strategy. Based on this review, future researchers will understand the optimal solution path to solve these issues.
Advances in Computational Intelligence Applications in the Mining Industry
This book captures advancements in the applications of computational intelligence (artificial intelligence, machine learning, etc.) to problems in the mineral and mining industries. The papers present the state of the art in four broad categories: mine operations, mine planning, mine safety, and advances in the sciences, primarily in image processing applications. Authors in the book include both researchers and industry practitioners.
Building Machines That Learn and Think Like People
Recent progress in artificial intelligence (AI) has renewed interest in
building systems that learn and think like people. Many advances have come from
using deep neural networks trained end-to-end in tasks such as object
recognition, video games, and board games, achieving performance that equals or
even beats humans in some respects. Despite their biological inspiration and
performance achievements, these systems differ from human intelligence in
crucial ways. We review progress in cognitive science suggesting that truly
human-like learning and thinking machines will have to reach beyond current
engineering trends in both what they learn, and how they learn it.
Specifically, we argue that these machines should (a) build causal models of
the world that support explanation and understanding, rather than merely
solving pattern recognition problems; (b) ground learning in intuitive theories
of physics and psychology, to support and enrich the knowledge that is learned;
and (c) harness compositionality and learning-to-learn to rapidly acquire and
generalize knowledge to new tasks and situations. We suggest concrete
challenges and promising routes towards these goals that can combine the
strengths of recent neural network advances with more structured cognitive
models.
Comment: In press at Behavioral and Brain Sciences. Open call for commentary
proposals (until Nov. 22, 2016).
https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/information/calls-for-commentary/open-calls-for-commentar