2,610 research outputs found
Towards automatic pulmonary nodule management in lung cancer screening with deep learning
The introduction of lung cancer screening programs will produce an
unprecedented amount of chest CT scans in the near future, which radiologists
will have to read in order to decide on a patient follow-up strategy. According
to the current guidelines, the workup of screen-detected nodules strongly
relies on nodule size and nodule type. In this paper, we present a deep
learning system based on multi-stream multi-scale convolutional networks, which
automatically classifies all nodule types relevant for nodule workup. The
system processes raw CT data containing a nodule without the need for any
additional information such as nodule segmentation or nodule size and learns a
representation of 3D data by analyzing an arbitrary number of 2D views of a
given nodule. The deep learning system was trained with data from the Italian
MILD screening trial and validated on an independent set of data from the
Danish DLCST screening trial. We analyze the advantage of processing nodules at
multiple scales with a multi-stream convolutional network architecture, and we
show that the proposed deep learning system achieves performance at classifying
nodule type that surpasses the one of classical machine learning approaches and
is within the inter-observer variability among four experienced human
observers.Comment: Published on Scientific Report
Separation of pulsar signals from noise with supervised machine learning algorithms
We evaluate the performance of four different machine learning (ML)
algorithms: an Artificial Neural Network Multi-Layer Perceptron (ANN MLP ),
Adaboost, Gradient Boosting Classifier (GBC), XGBoost, for the separation of
pulsars from radio frequency interference (RFI) and other sources of noise,
using a dataset obtained from the post-processing of a pulsar search pi peline.
This dataset was previously used for cross-validation of the SPINN-based
machine learning engine, used for the reprocessing of HTRU-S survey data
arXiv:1406.3627. We have used Synthetic Minority Over-sampling Technique
(SMOTE) to deal with high class imbalance in the dataset. We report a variety
of quality scores from all four of these algorithms on both the non-SMOTE and
SMOTE datasets. For all the above ML methods, we report high accuracy and
G-mean in both the non-SMOTE and SMOTE cases. We study the feature importances
using Adaboost, GBC, and XGBoost and also from the minimum Redundancy Maximum
Relevance approach to report algorithm-agnostic feature ranking. From these
methods, we find that the signal to noise of the folded profile to be the best
feature. We find that all the ML algorithms report FPRs about an order of
magnitude lower than the corresponding FPRs obtained in arXiv:1406.3627, for
the same recall value.Comment: 14 pages, 2 figures. Accepted for publication in Astronomy and
Computin
Skewed Evolving Data Streams Classification with Actionable Knowledge Extraction using Data Approximation and Adaptive Classification Framework
Skewed evolving data stream (SEDS) classification is a challenging research problem for online streaming data applications. The fundamental challenges in streaming data classification are class imbalance and concept drift. However, recently, either independently or together, the two topics have received enough attention; the data redundancy while performing stream data mining and classification remains unexplored. Moreover, the existing solutions for the classification of SEDSs have focused on solving concept drift and/or class imbalance problems using the sliding window mechanism, which leads to higher computational complexity and data redundancy problems. To end this, we propose a novel Adaptive Data Stream Classification (ADSC) framework for solving the concept drift, class imbalance, and data redundancy problems with higher computational and classification efficiency. Data approximation, adaptive clustering, classification, and actionable knowledge extraction are the major phases of ADSC. For the purpose of approximating unique items in the data stream with data pre-processing during the data approximation phase, we develop the Flajolet Martin (FM) algorithm. The periodically approximated tuples are grouped into distinct classes using an adaptive clustering algorithm to address the problem of concept drift and class imbalance. In the classification phase, the supervised classifiers are employed to classify the unknown incoming data streams into either of the classes discovered by the adaptive clustering algorithm. We then extract the actionable knowledge using classified skewed evolved data stream information for the end user decision-making process. The ADSC framework is empirically assessed utilizing two streaming datasets regarding classification and computing efficiency factors. The experimental results shows the better efficiency of the proposed ADSC framework as compared with existing classification methods
Dynamic Data Mining: Methodology and Algorithms
Supervised data stream mining has become an important and challenging data mining task in modern
organizations. The key challenges are threefold: (1) a possibly infinite number of streaming examples
and time-critical analysis constraints; (2) concept drift; and (3) skewed data distributions.
To address these three challenges, this thesis proposes the novel dynamic data mining (DDM)
methodology by effectively applying supervised ensemble models to data stream mining. DDM can be
loosely defined as categorization-organization-selection of supervised ensemble models. It is inspired
by the idea that although the underlying concepts in a data stream are time-varying, their distinctions
can be identified. Therefore, the models trained on the distinct concepts can be dynamically selected in
order to classify incoming examples of similar concepts.
First, following the general paradigm of DDM, we examine the different concept-drifting stream
mining scenarios and propose corresponding effective and efficient data mining algorithms.
• To address concept drift caused merely by changes of variable distributions, which we term
pseudo concept drift, base models built on categorized streaming data are organized and
selected in line with their corresponding variable distribution characteristics.
• To address concept drift caused by changes of variable and class joint distributions, which we
term true concept drift, an effective data categorization scheme is introduced. A group of
working models is dynamically organized and selected for reacting to the drifting concept.
Secondly, we introduce an integration stream mining framework, enabling the paradigm advocated by
DDM to be widely applicable for other stream mining problems. Therefore, we are able to introduce
easily six effective algorithms for mining data streams with skewed class distributions.
In addition, we also introduce a new ensemble model approach for batch learning, following the same
methodology. Both theoretical and empirical studies demonstrate its effectiveness.
Future work would be targeted at improving the effectiveness and efficiency of the proposed
algorithms. Meantime, we would explore the possibilities of using the integration framework to solve
other open stream mining research problems
Stochasticity from function -- why the Bayesian brain may need no noise
An increasing body of evidence suggests that the trial-to-trial variability
of spiking activity in the brain is not mere noise, but rather the reflection
of a sampling-based encoding scheme for probabilistic computing. Since the
precise statistical properties of neural activity are important in this
context, many models assume an ad-hoc source of well-behaved, explicit noise,
either on the input or on the output side of single neuron dynamics, most often
assuming an independent Poisson process in either case. However, these
assumptions are somewhat problematic: neighboring neurons tend to share
receptive fields, rendering both their input and their output correlated; at
the same time, neurons are known to behave largely deterministically, as a
function of their membrane potential and conductance. We suggest that spiking
neural networks may, in fact, have no need for noise to perform sampling-based
Bayesian inference. We study analytically the effect of auto- and
cross-correlations in functionally Bayesian spiking networks and demonstrate
how their effect translates to synaptic interaction strengths, rendering them
controllable through synaptic plasticity. This allows even small ensembles of
interconnected deterministic spiking networks to simultaneously and
co-dependently shape their output activity through learning, enabling them to
perform complex Bayesian computation without any need for noise, which we
demonstrate in silico, both in classical simulation and in neuromorphic
emulation. These results close a gap between the abstract models and the
biology of functionally Bayesian spiking networks, effectively reducing the
architectural constraints imposed on physical neural substrates required to
perform probabilistic computing, be they biological or artificial
Adaptive Online Sequential ELM for Concept Drift Tackling
A machine learning method needs to adapt to over time changes in the
environment. Such changes are known as concept drift. In this paper, we propose
concept drift tackling method as an enhancement of Online Sequential Extreme
Learning Machine (OS-ELM) and Constructive Enhancement OS-ELM (CEOS-ELM) by
adding adaptive capability for classification and regression problem. The
scheme is named as adaptive OS-ELM (AOS-ELM). It is a single classifier scheme
that works well to handle real drift, virtual drift, and hybrid drift. The
AOS-ELM also works well for sudden drift and recurrent context change type. The
scheme is a simple unified method implemented in simple lines of code. We
evaluated AOS-ELM on regression and classification problem by using concept
drift public data set (SEA and STAGGER) and other public data sets such as
MNIST, USPS, and IDS. Experiments show that our method gives higher kappa value
compared to the multiclassifier ELM ensemble. Even though AOS-ELM in practice
does not need hidden nodes increase, we address some issues related to the
increasing of the hidden nodes such as error condition and rank values. We
propose taking the rank of the pseudoinverse matrix as an indicator parameter
to detect underfitting condition.Comment: Hindawi Publishing. Computational Intelligence and Neuroscience
Volume 2016 (2016), Article ID 8091267, 17 pages Received 29 January 2016,
Accepted 17 May 2016. Special Issue on "Advances in Neural Networks and
Hybrid-Metaheuristics: Theory, Algorithms, and Novel Engineering
Applications". Academic Editor: Stefan Hauf
On the performance of deep learning models for time series classification in streaming
Processing data streams arriving at high speed requires the development of
models that can provide fast and accurate predictions. Although deep neural
networks are the state-of-the-art for many machine learning tasks, their
performance in real-time data streaming scenarios is a research area that has
not yet been fully addressed. Nevertheless, there have been recent efforts to
adapt complex deep learning models for streaming tasks by reducing their
processing rate. The design of the asynchronous dual-pipeline deep learning
framework allows to predict over incoming instances and update the model
simultaneously using two separate layers. The aim of this work is to assess the
performance of different types of deep architectures for data streaming
classification using this framework. We evaluate models such as multi-layer
perceptrons, recurrent, convolutional and temporal convolutional neural
networks over several time-series datasets that are simulated as streams. The
obtained results indicate that convolutional architectures achieve a higher
performance in terms of accuracy and efficiency.Comment: Paper submitted to the 15th International Conference on Soft
Computing Models in Industrial and Environmental Applications (SOCO 2020
- …