Correlating neural and symbolic representations of language
Analysis methods which enable us to better understand the representations and
functioning of neural models of language are increasingly needed as deep
learning becomes the dominant approach in NLP. Here we present two methods
based on Representational Similarity Analysis (RSA) and Tree Kernels (TK) which
allow us to directly quantify how strongly the information encoded in neural
activation patterns corresponds to information represented by symbolic
structures such as syntax trees. We first validate our methods on the case of a
simple synthetic language for arithmetic expressions with clearly defined
syntax and semantics, and show that they exhibit the expected pattern of
results. We then apply our methods to correlate neural representations of
English sentences with their constituency parse trees.
Comment: ACL 201
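The core RSA computation this abstract builds on is compact enough to sketch: build a pairwise similarity matrix for each representation space, then correlate the two matrices' off-diagonal entries. This is a generic illustration; cosine similarity and Pearson correlation are my assumptions, not details taken from the paper:

```python
import numpy as np

def rsa_correlation(reprs_a, reprs_b):
    """Representational Similarity Analysis: correlate the pairwise
    similarity structure of two representation spaces.

    reprs_a, reprs_b: (n_items, dim) arrays of vector representations
    (e.g. neural activations vs. tree-kernel feature vectors).
    """
    def sim_matrix(x):
        # cosine similarity between all pairs of items
        x = x / np.linalg.norm(x, axis=1, keepdims=True)
        return x @ x.T

    sa, sb = sim_matrix(reprs_a), sim_matrix(reprs_b)
    # compare only the upper triangles (each item pair counted once)
    iu = np.triu_indices(len(reprs_a), k=1)
    return np.corrcoef(sa[iu], sb[iu])[0, 1]

rng = np.random.default_rng(0)
x = rng.normal(size=(20, 8))
print(round(rsa_correlation(x, x), 3))  # identical spaces correlate perfectly: 1.0
```

In the paper's setting, one of the two spaces would be spanned by tree-kernel similarities between syntax trees rather than by explicit vectors.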
ATLAAS: an automatic decision tree-based learning algorithm for advanced image segmentation in positron emission tomography
Accurate and reliable tumour delineation on positron emission tomography (PET) is crucial for radiotherapy treatment planning. PET automatic segmentation (PET-AS) eliminates intra- and interobserver variability, but there is currently no consensus on the optimal method to use, as different algorithms appear to perform better for different types of tumours. This work aimed to develop a predictive segmentation model, trained to automatically select and apply the best PET-AS method, according to the tumour characteristics.
ATLAAS, the automatic decision tree-based learning algorithm for advanced segmentation, is based on supervised machine learning using decision trees. The model includes nine PET-AS methods and was trained on 100 PET scans with known true contours. A decision tree was built for each PET-AS algorithm to predict its accuracy, quantified using the Dice similarity coefficient (DSC), from the tumour volume, tumour peak-to-background SUV ratio and a regional texture metric. The performance of ATLAAS was evaluated on 85 PET scans obtained from fillable and printed subresolution sandwich phantoms.
ATLAAS showed excellent accuracy across a wide range of phantom data and predicted the best or near-best segmentation algorithm in 93% of cases. ATLAAS outperformed all single PET-AS methods on fillable phantom data with a DSC of 0.881, while the DSC for H&N phantom data was 0.819. DSCs higher than 0.650 were achieved in all cases.
ATLAAS is an advanced automatic image segmentation algorithm based on decision tree predictive modelling, which can be trained on images with a known true contour to predict the best PET-AS method when the true contour is unknown. ATLAAS provides robust and accurate image segmentation with potential applications in radiation oncology.
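The Dice similarity coefficient used throughout this evaluation is simple to state: for binary masks A and B, DSC = 2|A ∩ B| / (|A| + |B|). A minimal sketch (the masks below are illustrative toy data, not from the study):

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary segmentation masks."""
    a, b = np.asarray(mask_a, bool), np.asarray(mask_b, bool)
    # 2 * |intersection| / (|A| + |B|); 1.0 means perfect overlap
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

auto = np.array([[1, 1, 0], [0, 1, 0]])  # an automatic segmentation
true = np.array([[1, 1, 0], [0, 0, 1]])  # the known true contour
print(dice(auto, true))  # 2*2 / (3 + 3) = 0.666...
```

ATLAAS's selection step then amounts to choosing, per image, the PET-AS method whose decision tree predicts the highest DSC.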
Learning discriminative tree edit similarities for linear classification — Application to melody recognition
Similarity functions are a fundamental component of many learning algorithms. When dealing with string or tree-structured data, measures based on the edit distance are widely used, and there exist a few methods for learning them from data. In this context, we recently proposed GESL (Bellet et al., 2012 [3]), an approach to string edit similarity learning based on loss minimization which offers theoretical guarantees as to the generalization ability and discriminative power of the learned similarities. In this paper, we argue that GESL, originally designed for strings, can be extended to trees and leads to powerful and competitive similarities. We illustrate this claim on a music recognition task, namely melody classification, where each piece is represented as a tree modeling its structure as well as rhythm and pitch information. The results show that GESL outperforms standard as well as probabilistically-learned edit distances and that it consistently captures the underlying melodic similarity model.
This work was supported by a grant from CPER Nord-Pas de Calais/FEDER DATA Advanced data science and technologies 2015-2020 and the Spanish Ministerio de Economía y Competitividad project TIMuL (No. TIN2013-48152-C2-1-R, supported by UE FEDER funds).
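As background, the sketch below shows the standard (unlearned) Levenshtein edit distance and the "similarity to landmark examples as features for a linear classifier" setting that GESL builds on. The exp(-d) mapping and the landmark choice are illustrative assumptions, not the learned GESL similarity:

```python
import numpy as np

def edit_distance(s, t):
    """Standard Levenshtein distance via dynamic programming (one row)."""
    d = np.arange(len(t) + 1)
    for i, cs in enumerate(s, 1):
        prev, d[0] = d[0], i           # prev holds dp[i-1][j-1]
        for j, ct in enumerate(t, 1):
            prev, d[j] = d[j], min(d[j] + 1,        # deletion
                                   d[j - 1] + 1,    # insertion
                                   prev + (cs != ct))  # (mis)match
    return int(d[-1])

def similarity_features(x, landmarks):
    """Map an example to a vector of edit similarities to landmark
    examples; a linear classifier is then trained on these features
    (the similarity-as-features setting GESL relies on)."""
    return np.array([np.exp(-edit_distance(x, l)) for l in landmarks])

print(edit_distance("kitten", "sitting"))  # 3
```

GESL's contribution is to *learn* the edit cost matrix (and, in this paper, its tree analogue) by loss minimization rather than using the uniform costs above.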
ForestHash: Semantic Hashing With Shallow Random Forests and Tiny Convolutional Networks
Hash codes are efficient data representations for coping with the
ever-growing amounts of data. In this paper, we introduce a random forest semantic
hashing scheme that embeds tiny convolutional neural networks (CNN) into
shallow random forests, with near-optimal information-theoretic code
aggregation among trees. We start with a simple hashing scheme, where random
trees in a forest act as hashing functions by setting `1' for the visited tree
leaf, and `0' for the rest. We show that traditional random forests fail to
generate hashes that preserve the underlying similarity between the trees,
rendering the random forests approach to hashing challenging. To address this,
we propose to first randomly group arriving classes at each tree split node
into two groups, obtaining a significantly simplified two-class classification
problem, which can be handled using a light-weight CNN weak learner. Such a
random class grouping scheme enables code uniqueness by forcing each class to
share its code with different classes in different trees. A non-conventional
low-rank loss is further adopted for the CNN weak learners to encourage code
consistency by minimizing intra-class variations and maximizing inter-class
distance for the two random class groups. Finally, we introduce an
information-theoretic approach for aggregating codes of individual trees into a
single hash code, producing a near-optimal unique hash for each class. The
proposed approach significantly outperforms state-of-the-art hashing methods
for image retrieval tasks on large-scale public datasets, while performing at
the level of state-of-the-art image classification techniques with a more
compact, efficient and scalable representation. This work proposes a
principled and robust procedure to train and deploy in parallel an ensemble of
light-weight CNNs, instead of simply going deeper.
Comment: Accepted to ECCV 201
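The simple hashing scheme the paper starts from, where each tree sets `1` for its visited leaf and `0` for the rest, can be sketched directly. The leaf indices and tree sizes below are illustrative, not from the paper:

```python
import numpy as np

def forest_hash(leaf_ids, leaves_per_tree):
    """Naive forest hashing: each tree contributes a one-hot block,
    '1' for the leaf the sample lands in and '0' elsewhere; the
    per-tree blocks are concatenated into one binary code."""
    code = []
    for leaf in leaf_ids:  # one visited-leaf index per tree
        block = np.zeros(leaves_per_tree, dtype=int)
        block[leaf] = 1
        code.append(block)
    return np.concatenate(code)

# 3 trees with 4 leaves each; the sample lands in leaves 2, 0 and 3
print(forest_hash([2, 0, 3], 4))  # [0 0 1 0 1 0 0 0 0 0 0 1]
```

The rest of the paper addresses the weaknesses of this naive code: random class grouping for uniqueness, a low-rank loss for consistency, and information-theoretic aggregation across trees.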
Tracking users and managing incentives in intelligent transport systems
A system for offering incentives for ecological modes of transport is presented. The main focus is on the verification of claims of having taken a trip on such a mode of transport. Three components are presented for the task of travel mode identification: A system to select features, a means to measure a GPS (Global Positioning System) trace's similarity to a bus route, and finally a machine-learning approach to the actual identification.
Feature selection is carried out by sorting the features according to statistical significance and eliminating correlated features. The novel features considered are skewnesses, kurtoses, auto- and cross-correlations, and spectral components of speed and acceleration. Of these, only the spectral components are found to be particularly useful in classification.
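A spectral-component feature of the kind found useful here can be sketched as the magnitudes of the low-frequency FFT bins of the speed signal. The sampling setup and component count below are illustrative assumptions, not the thesis's exact feature definition:

```python
import numpy as np

def spectral_features(speed, n_components=4):
    """Magnitudes of the lowest non-DC frequency components of a speed
    signal -- transport modes differ in how periodic their speed is
    (e.g. walking gait vs. a bus's stop-and-go pattern)."""
    # remove the mean so the DC component does not dominate
    spectrum = np.abs(np.fft.rfft(speed - np.mean(speed)))
    return spectrum[1:1 + n_components]

t = np.linspace(0, 10, 256)                    # 10 s of samples
walk = 1.4 + 0.3 * np.sin(2 * np.pi * 2 * t)   # gait-like periodic speed
print(spectral_features(walk).round(2))
```

Such a feature vector would then be fed, alongside the others, into the classifiers compared in the next paragraph.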
Bus route similarity is measured by using a novel indexing structure called MBR-tree, short for "Multiple Bounding Rectangle", to find the most similar bus traces. The MBR-tree is an expansion of the R-tree for sequences of bounding rectangles, based on an estimation method for longest common subsequence that uses such sequences. A second option of decomposing traces to sequences of direction-distance-duration-triples and indexing them in an M-tree using edit distance with real penalty is considered but shown to perform poorly.
For machine learning, the methods considered are Bayes classification, random forest, and feedforward neural networks with and without autoencoders. Autoencoder neural networks are shown to perform perplexingly poorly, but the other methods perform close to the state-of-the-art.
Methods for obfuscating the user's location, and for constructing secure electronic coupons, are also discussed.
k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)
Perhaps the most straightforward classifier in the arsenal of machine
learning techniques is the Nearest Neighbour Classifier -- classification is
achieved by identifying the nearest neighbours to a query example and using
those neighbours to determine the class of the query. This approach to
classification is of particular importance today because issues of poor
run-time performance are no longer such a problem given the computational
power that is now available. This paper presents an overview of techniques for
Nearest Neighbour classification, focusing on: mechanisms for assessing
similarity (distance), computational issues in identifying nearest neighbours,
and mechanisms for reducing the dimension of the data.
This paper is the second edition of a paper previously published as a
technical report. Sections on similarity measures for time-series, retrieval
speed-up and intrinsic dimensionality have been added. An Appendix is included
providing access to Python code for the key methods.
Comment: 22 pages, 15 figures: An updated edition of an older tutorial on kN
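The basic Nearest Neighbour procedure the paper surveys fits in a few lines; Euclidean distance and majority voting are one common choice among the similarity and decision mechanisms it discusses:

```python
import numpy as np

def knn_predict(X_train, y_train, query, k=3):
    """k-Nearest Neighbour classification: find the k training examples
    closest to the query (Euclidean distance) and return the majority
    class among them."""
    dists = np.linalg.norm(X_train - query, axis=1)  # distance to each example
    nearest = np.argsort(dists)[:k]                  # indices of k closest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                 # majority vote

X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.95, 1.0]), k=3))  # 1
```

The brute-force distance scan above is exactly the run-time cost the paper's sections on retrieval speed-up (e.g. index structures) aim to reduce.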