A hybrid algorithm for Bayesian network structure learning with application to multi-label learning
We present a novel hybrid algorithm for Bayesian network structure learning,
called H2PC. It first reconstructs the skeleton of a Bayesian network and then
performs a Bayesian-scoring greedy hill-climbing search to orient the edges.
The algorithm is based on divide-and-conquer constraint-based subroutines to
learn the local structure around a target variable. We conduct two series of
experimental comparisons of H2PC against Max-Min Hill-Climbing (MMHC), which is
currently the most powerful state-of-the-art algorithm for Bayesian network
structure learning. First, we use eight well-known Bayesian network benchmarks
with various data sizes to assess the quality of the learned structure returned
by the algorithms. Our extensive experiments show that H2PC outperforms MMHC in
terms of goodness of fit to new data and quality of the network structure with
respect to the true dependence structure of the data. Second, we investigate
H2PC's ability to solve the multi-label learning problem. We provide
theoretical results to characterize and identify graphically the so-called
minimal label powersets that appear as irreducible factors in the joint
distribution under the faithfulness condition. The multi-label learning problem
is then decomposed into a series of multi-class classification problems, where
each multi-class variable encodes a label powerset. H2PC is shown to compare
favorably to MMHC in terms of global classification accuracy over ten
multi-label data sets covering different application domains. Overall, our
experiments support the conclusions that local structural learning with H2PC in
the form of local neighborhood induction is a theoretically well-motivated and
empirically effective learning framework that is well suited to multi-label
learning. The source code (in R) of H2PC as well as all data sets used for the
empirical tests are publicly available.
Comment: arXiv admin note: text overlap with arXiv:1101.5184 by other author
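The decomposition described above, where each multi-class variable encodes a label powerset, can be illustrated with a minimal sketch. It assumes the group of labels forming one powerset is already known; identifying the minimal label powersets from the learned graph is the paper's contribution and is not shown. The function names `label_powerset_encode` and `label_powerset_decode` and the toy matrix `Y` are hypothetical, not taken from the H2PC source code.

```python
from typing import Dict, List, Sequence, Tuple


def label_powerset_encode(
    label_rows: Sequence[Sequence[int]],
) -> Tuple[List[int], Dict[Tuple[int, ...], int]]:
    """Map each distinct binary label vector to a single multi-class id."""
    class_of: Dict[Tuple[int, ...], int] = {}
    encoded: List[int] = []
    for row in label_rows:
        key = tuple(row)
        if key not in class_of:
            # Ids are assigned in order of first appearance.
            class_of[key] = len(class_of)
        encoded.append(class_of[key])
    return encoded, class_of


def label_powerset_decode(
    class_ids: Sequence[int], class_of: Dict[Tuple[int, ...], int]
) -> List[List[int]]:
    """Recover the binary label vectors from multi-class ids."""
    inverse = {cid: key for key, cid in class_of.items()}
    return [list(inverse[c]) for c in class_ids]


# Hypothetical toy data: four samples, three binary labels.
Y = [[1, 0, 1], [0, 1, 0], [1, 0, 1], [0, 0, 0]]
encoded, mapping = label_powerset_encode(Y)
decoded = label_powerset_decode(encoded, mapping)
```

Any ordinary multi-class classifier can then be trained on `encoded`, and its predictions mapped back to label vectors with `label_powerset_decode`.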
Machine Learning Classification of SDSS Transient Survey Images
We show that multiple machine learning algorithms can match human performance
in classifying transient imaging data from the Sloan Digital Sky Survey (SDSS)
supernova survey into real objects and artefacts. This is a first step in any
transient science pipeline and is currently still done by humans, but future
surveys such as the Large Synoptic Survey Telescope (LSST) will necessitate
fully machine-enabled solutions. Using features trained from eigenimage
analysis (principal component analysis, PCA) of single-epoch g, r and
i-difference images, we can reach a completeness (recall) of 96 per cent, while
only incorrectly classifying at most 18 per cent of artefacts as real objects,
corresponding to a precision (purity) of 84 per cent. In general, random
forests performed best, followed by the k-nearest neighbour and the SkyNet
artificial neural net algorithms, compared to other methods such as naïve
Bayes and kernel support vector machine. Our results show that PCA-based
machine learning can match human success levels and can naturally be extended
by including multiple epochs of data, transient colours and host galaxy
information which should allow for significant further improvements, especially
at low signal-to-noise.
Comment: 14 pages, 8 figures. In this version extremely minor adjustments to
the paper were made - e.g. Figure 5 is now easier to view in greyscal
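The relationship between completeness (recall) and purity (precision) quoted in the abstract can be checked with a short worked example. The counts below are hypothetical: they assume a balanced sample of 100 real objects and 100 artefacts, which the abstract does not state. Under that assumption, 96 recovered real objects and 18 misclassified artefacts give the quoted figures.

```python
from typing import Tuple


def completeness_and_purity(tp: int, fn: int, fp: int) -> Tuple[float, float]:
    """Return (recall, precision) from confusion-matrix counts.

    tp: real objects classified as real
    fn: real objects classified as artefacts
    fp: artefacts classified as real
    """
    if tp + fn == 0 or tp + fp == 0:
        raise ValueError("recall or precision is undefined for these counts")
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return recall, precision


# Hypothetical balanced sample: 100 real objects, 100 artefacts.
recall, precision = completeness_and_purity(tp=96, fn=4, fp=18)
print(f"completeness = {recall:.2f}, purity = {precision:.2f}")
```

With an unbalanced sample the same misclassification rates would give a different purity, so the 84 per cent figure depends on the class mix.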
Myoelectric forearm prostheses: State of the art from a user-centered perspective
User acceptance of myoelectric forearm prostheses is currently low. Awkward control, lack of feedback, and difficult training are cited as primary reasons. Recently, researchers have focused on exploiting the new possibilities offered by advancements in prosthetic technology. Alternatively, researchers could focus on prosthesis acceptance by developing functional requirements based on activities users are likely to perform. In this article, we describe the process of determining such requirements and then the application of these requirements to evaluating the state of the art in myoelectric forearm prosthesis research. As part of a needs assessment, a workshop was organized involving clinicians (representing end users), academics, and engineers. The resulting needs included an increased number of functions, lower reaction and execution times, and intuitiveness of both control and feedback systems. Reviewing the state of the art of research in the main prosthetic subsystems (electromyographic [EMG] sensing, control, and feedback) showed that modern research prototypes only partly fulfill the requirements. We found that focus should be on validating EMG-sensing results with patients, improving simultaneous control of wrist movements and grasps, deriving optimal parameters for force and position feedback, and taking into account the psychophysical aspects of feedback, such as intensity perception and spatial acuity
Rate-distortion Balanced Data Compression for Wireless Sensor Networks
This paper presents a data compression algorithm with error bound guarantee
for wireless sensor networks (WSNs) using compressing neural networks. The
proposed algorithm minimizes data congestion and reduces energy consumption by
exploring spatio-temporal correlations among data samples. The adaptive
rate-distortion feature balances the compressed data size (data rate) with the
required error bound guarantee (distortion level). This compression relieves
the strain on energy and bandwidth resources while collecting WSN data within
tolerable error margins, thereby increasing the scale of WSNs. The algorithm is
evaluated using real-world datasets and compared with conventional methods for
temporal and spatial data compression. The experimental validation reveals that
the proposed algorithm outperforms several existing WSN data compression
methods in terms of compression efficiency and signal reconstruction. Moreover,
an energy analysis shows that compressing the data can reduce the energy
expenditure, and hence extend the service lifespan severalfold.
Comment: arXiv admin note: text overlap with arXiv:1408.294
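The trade-off between data rate and a guaranteed error bound can be illustrated with a much simpler stand-in technique. The sketch below uses uniform scalar quantization, not the neural-network compressor the paper proposes: a step of twice the error bound guarantees that every reconstructed sample lies within the bound, and a looser bound yields fewer distinct codes to transmit. All names and the sample readings are hypothetical.

```python
from typing import List, Sequence


def quantize(samples: Sequence[float], error_bound: float) -> List[int]:
    """Encode samples as integer codes with step 2 * error_bound."""
    if error_bound <= 0:
        raise ValueError("error_bound must be positive")
    step = 2.0 * error_bound
    return [round(x / step) for x in samples]


def dequantize(codes: Sequence[int], error_bound: float) -> List[float]:
    """Reconstruct samples; each is within error_bound of the original."""
    step = 2.0 * error_bound
    return [c * step for c in codes]


def max_abs_error(original: Sequence[float], restored: Sequence[float]) -> float:
    """Largest absolute reconstruction error (the distortion)."""
    return max(abs(a - b) for a, b in zip(original, restored))


# Hypothetical temperature readings from one sensor node.
readings = [20.13, 20.18, 20.41, 21.02, 20.97]

for eps in (0.05, 0.5):
    codes = quantize(readings, eps)
    restored = dequantize(codes, eps)
    # Number of distinct codes is a crude proxy for the data rate.
    print(eps, len(set(codes)), max_abs_error(readings, restored))
```

The number of distinct codes is only a rough proxy for the compressed size; a real scheme would follow quantization with entropy coding or, as in the paper, learn the representation.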
Automated X-ray image analysis for cargo security: Critical review and future promise
We review the relatively immature field of automated image analysis for X-ray cargo imagery. There is increasing demand for automated analysis methods that can assist in the inspection and selection of containers, due to the ever-growing volumes of traded cargo and the increasing concerns that customs- and security-related threats are being smuggled across borders by organised crime and terrorist networks. We split the field into the classical pipeline of image preprocessing and image understanding. Preprocessing includes: image manipulation; quality improvement; Threat Image Projection (TIP); and material discrimination and segmentation. Image understanding includes: Automated Threat Detection (ATD); and Automated Contents Verification (ACV). We identify several gaps in the literature that need to be addressed and propose ideas for future research. Where the current literature is sparse we borrow from the single-view, multi-view, and CT X-ray baggage domains, which have some characteristics in common with X-ray cargo
APPLICATIONS OF MACHINE LEARNING IN MICROBIAL FORENSICS
Microbial ecosystems are complex, with hundreds of members interacting with each other and the environment. The intricate and hidden behaviors underlying these interactions make research questions challenging – but can be better understood through machine learning. However, most machine learning that is used in microbiome work is a black box form of investigation, where accurate predictions can be made, but the inner logic behind what is driving prediction is hidden behind nontransparent layers of complexity.
Accordingly, the goal of this dissertation is to provide an interpretable and in-depth machine learning approach to investigate microbial biogeography and to use micro-organisms as novel tools to detect geospatial location and object provenance (previously known origin). These contributions are followed by a framework that allows extraction of interpretable metrics and actionable insights from microbiome-based machine learning models. The first part of this work provides an overview of machine learning in the context of microbial ecology, human microbiome studies and environmental monitoring – outlining common practice and shortcomings. The second part presents a field study demonstrating how machine learning can be used to characterize patterns in microbial biogeography globally, using microbes from ports located around the world. The third part studies the persistence and stability of natural microbial communities from the environment that have colonized objects (vessels) and stay attached as they travel through the water. Finally, the last part of this dissertation provides a robust framework for investigating the microbiome. This framework provides a reasonable understanding of the data being used in microbiome-based machine learning and allows researchers to better understand and interpret results.
Together, these extensive experiments support an understanding of how to carry an in-silico design that characterizes candidate microbial biomarkers from real-world settings to a rapid, field-deployable diagnostic assay. The work presented here provides evidence for the use of microbial forensics as a toolkit to expand our basic understanding of microbial biogeography, microbial community stability and persistence in complex systems, and the ability of machine learning to be applied to downstream molecular detection platforms for rapid and accurate detection
Deep Learning Approaches for Seagrass Detection in Multispectral Imagery
Seagrass forms the basis for critically important marine ecosystems. Seagrass is an important factor in balancing marine ecological systems, and it is of great interest to monitor its distribution in different parts of the world. Remote sensing imagery is considered an effective data modality with which seagrass monitoring and quantification can be performed remotely. Traditionally, researchers utilized multispectral satellite images to map seagrass manually. Automatic machine learning techniques, especially deep learning algorithms, recently achieved state-of-the-art performance in many computer vision applications. This dissertation presents a set of deep learning models for seagrass detection in multispectral satellite images. It also introduces novel domain adaptation approaches to adapt the models for new locations and for temporal image series. In Chapter 3, I compare a deep capsule network (DCN) with a deep convolutional neural network (DCNN) for seagrass detection in high-resolution multispectral satellite images. These methods are tested on three satellite images in Florida coastal areas and obtain comparable performance. In addition, I propose a few-shot deep learning strategy to transfer knowledge learned by the DCN from one location to the others for seagrass detection. In Chapter 4, I develop a semi-supervised domain adaptation method to generalize a trained DCNN model to multiple locations for seagrass detection. First, the model utilizes a generative adversarial network (GAN) to align the marginal distribution of data in the source domain to that in the target domain using unlabeled data from both domains. Second, it uses a few labeled samples from the target domain to align class-specific data distributions between the two. The model achieves the best results in 28 out of 36 scenarios as compared to other state-of-the-art domain adaptation methods.
In Chapter 5, I develop a semantic segmentation method for seagrass detection in multispectral time-series images. First, I train a state-of-the-art image segmentation method using an active learning approach where I use the DCNN classifier in the loop. Then, I develop an unsupervised domain adaptation (UDA) algorithm to detect seagrass across temporal images. I also extend this unsupervised domain adaptation work to seagrass detection across locations. In Chapter 6, I present an automated bathymetry estimation model based on multispectral satellite images. Bathymetry refers to the depth of the ocean floor and plays a predominant role in identifying marine species in seawater. Accurate bathymetry information for coastal areas will facilitate seagrass detection by reducing false positives because seagrass usually does not grow beyond a certain depth. However, bathymetry information for most parts of the world is obsolete or missing. Traditional bathymetry measurement systems require extensive labor. I utilize an ensemble machine learning-based approach to estimate bathymetry based on a few in-situ sonar measurements and evaluate the proposed model in three coastal locations in Florida
On the use of pairwise distance learning for brain signal classification with limited observations
The increasing access to brain signal data using electroencephalography creates new opportunities to study electrophysiological brain activity and perform ambulatory diagnoses of neurological disorders. This work proposes a pairwise distance learning approach for schizophrenia classification relying on the spectral properties of the signal. To be able to handle clinical trials with a limited number of observations (i.e. case and/or control individuals), we propose a Siamese neural network architecture to learn a discriminative feature space from pairwise combinations of observations per channel. In this way, the multivariate order of the signal is used as a form of data augmentation, further supporting the network's generalization ability. Convolutional layers with parameters learned under a cosine contrastive loss are proposed to adequately explore spectral images derived from the brain signal. The proposed approach for schizophrenia diagnosis was tested on reference clinical trial data under a resting-state protocol, achieving 0.95 ± 0.05 accuracy, 0.98 ± 0.02 sensitivity and 0.92 ± 0.07 specificity. Results show that the features extracted using the proposed neural network are remarkably superior to the baselines for diagnosing schizophrenia (+20pp in accuracy and sensitivity), suggesting the existence of non-trivial electrophysiological brain patterns able to capture discriminative neuroplasticity profiles among individuals. The code is available on GitHub: https://github.com/DCalhas/siamese_schizophrenia_eeg
Peer Reviewed. Postprint (author's final draft
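The pairwise training idea can be sketched without a neural network. The example below assumes a loss of the same form as PyTorch's `CosineEmbeddingLoss` (one minus the cosine similarity for same-class pairs, the similarity above a margin for different-class pairs); the paper's exact cosine contrastive loss may differ. The per-channel pairing and the convolutional feature extractor are omitted, and the two-dimensional feature vectors are hypothetical stand-ins for learned embeddings.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Observation = Tuple[Sequence[float], str]  # (feature vector, class label)


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def make_pairs(
    observations: Sequence[Observation],
) -> List[Tuple[Sequence[float], Sequence[float], bool]]:
    """All unordered pairs, flagged True when both share a class label."""
    return [(a[0], b[0], a[1] == b[1]) for a, b in combinations(observations, 2)]


def cosine_contrastive_loss(
    u: Sequence[float], v: Sequence[float], same: bool, margin: float = 0.0
) -> float:
    """Assumed loss form: pull same-class pairs together, push others apart."""
    cos = cosine_similarity(u, v)
    return 1.0 - cos if same else max(0.0, cos - margin)


# Hypothetical embeddings for two case and two control individuals.
observations: List[Observation] = [
    ([1.0, 0.0], "case"),
    ([0.9, 0.1], "case"),
    ([0.0, 1.0], "control"),
    ([0.1, 0.9], "control"),
]
pairs = make_pairs(observations)
mean_loss = sum(cosine_contrastive_loss(u, v, s) for u, v, s in pairs) / len(pairs)
```

Four observations already yield six training pairs, which is the sense in which pairing acts as data augmentation when individuals are scarce.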