788 research outputs found
Further results on dissimilarity spaces for hyperspectral images RF-CBIR
Content-Based Image Retrieval (CBIR) systems are powerful search tools in
image databases that have been little applied to hyperspectral images.
Relevance feedback (RF) is an iterative process that uses machine learning
techniques and the user's feedback to improve the performance of CBIR systems.
We set out to expand previous research on hyperspectral CBIR systems built on
dissimilarity functions defined either on spectral and spatial features
extracted by spectral unmixing techniques, or on dictionaries extracted by
dictionary-based compressors. These dissimilarity functions were not suitable
for direct application of common machine learning techniques. We propose a
general RF approach based on dissimilarity spaces, which is more appropriate
for the application of machine learning algorithms to hyperspectral RF-CBIR.
We validate the proposed RF method for hyperspectral CBIR systems over a real
hyperspectral dataset.
Comment: In Pattern Recognition Letters (2013)
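The dissimilarity-space idea behind this approach can be sketched briefly: each image is embedded as its vector of dissimilarities to a small set of prototype objects, after which ordinary classifiers can drive the relevance feedback. A minimal sketch; the spectral-angle dissimilarity and the toy "spectra" below are illustrative assumptions, not the paper's actual features:

```python
import math

def dissimilarity_space(objects, prototypes, d):
    """Embed each object as its vector of dissimilarities to the prototypes."""
    return [[d(x, p) for p in prototypes] for x in objects]

def spectral_angle(a, b):
    """A common non-Euclidean dissimilarity between two spectra (vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))

# Toy data: four "pixels" with three bands; the first two act as prototypes.
data = [[0.2, 0.8, 0.1], [0.9, 0.1, 0.3], [0.21, 0.79, 0.12], [0.5, 0.5, 0.5]]
protos = data[:2]
X = dissimilarity_space(data, protos, spectral_angle)
print([[round(v, 2) for v in row] for row in X])
```

Each row of `X` is now a fixed-length feature vector, so any standard learner can be trained on the user's relevance labels, even though the original dissimilarity was not a kernel or a metric.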
High-throughput variable-to-fixed entropy codec using selective, stochastic code forests
Efficient high-throughput (HT) compression algorithms are paramount to meet the stringent constraints of present and upcoming data storage, processing, and transmission systems. In particular, latency, bandwidth, and energy requirements are critical for those systems. Most HT codecs are designed to maximize compression speed and, secondarily, to minimize compressed lengths. On the other hand, decompression speed is often equally or more critical than compression speed, especially in scenarios where decompression is performed multiple times and/or at critical parts of a system. In this work, an algorithm to design variable-to-fixed (VF) codes is proposed that prioritizes decompression speed. Stationary Markov analysis is employed to generate multiple, jointly optimized codes (denoted code forests). Their average compression efficiency is on par with the state of the art in VF codes, e.g., within 1% of Yamamoto et al.'s algorithm. The proposed code forest structure enables the implementation of highly efficient codecs, with decompression speeds 3.8 times faster than other state-of-the-art HT entropy codecs with equal or better compression ratios for natural data sources. Compared to these HT codecs, the proposed forests yield similar compression efficiency and speeds.
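As context for VF coding, the classic Tunstall construction illustrates the core idea: the source is parsed into variable-length words, each emitted as a fixed-length codeword. The paper's code forests generalize this with stationary Markov analysis; the memoryless sketch below does not attempt that and is only a baseline illustration:

```python
import heapq

def tunstall(probs, num_words):
    """Build a variable-to-fixed (Tunstall) parse dictionary for a
    memoryless source by repeatedly expanding the most probable word."""
    heap = [(-p, word) for word, p in probs.items()]  # max-heap via negation
    heapq.heapify(heap)
    # Expanding one leaf removes it and adds |alphabet| children.
    while len(heap) + len(probs) - 1 <= num_words:
        neg_p, word = heapq.heappop(heap)
        for sym, p in probs.items():
            heapq.heappush(heap, (neg_p * p, word + sym))
    return sorted(word for _, word in heap)

# Four parse words: every parsed word maps to a fixed 2-bit index,
# so decoding is a single table lookup per codeword.
print(tunstall({"a": 0.7, "b": 0.3}, 4))  # → ['aaa', 'aab', 'ab', 'b']
```

The fixed-length output is what makes VF decoders fast: each codeword is a direct index into the dictionary, with no bit-by-bit tree traversal.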
How good are detection proposals, really?
Current top performing Pascal VOC object detectors employ detection proposals
to guide the search for objects thereby avoiding exhaustive sliding window
search across images. Despite the popularity of detection proposals, it is
unclear which trade-offs are made when using them during object detection. We
provide an in-depth analysis of ten object proposal methods along with four
baselines regarding ground truth annotation recall (on Pascal VOC 2007 and
ImageNet 2013), repeatability, and impact on DPM detector performance. Our
findings show common weaknesses of existing methods, and provide insights to
choose the most suitable method for different settings.
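Ground-truth recall, one of the metrics analyzed here, reduces to checking whether each annotated box is covered by at least one proposal at a given intersection-over-union (IoU) threshold. A minimal sketch with axis-aligned `[x1, y1, x2, y2]` boxes (the toy boxes are illustrative):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def recall(gt_boxes, proposals, thresh=0.5):
    """Fraction of ground-truth boxes covered by >= 1 proposal at IoU >= thresh."""
    hits = sum(any(iou(g, p) >= thresh for p in proposals) for g in gt_boxes)
    return hits / len(gt_boxes)

gt = [[0, 0, 10, 10], [50, 50, 60, 60]]
proposals = [[1, 1, 10, 10], [100, 100, 110, 110]]
print(recall(gt, proposals))  # → 0.5: first box matched (IoU 0.81), second missed
```

Sweeping `thresh` from 0.5 to 1.0 yields the recall-versus-overlap curves that the paper uses to compare proposal methods.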
Triangle mesh compression and homological spanning forests
Three-dimensional triangle meshes have been widely used to represent 3D objects in many applications. These meshes are usually surfaces that require a huge amount of resources when they are stored, processed, or transmitted. Therefore, many algorithms for efficient compression of these meshes have been developed since the early 1990s. In this paper we propose a lossless method that compresses the connectivity of the mesh using a valence-driven approach. Our algorithm improves on the currently available valence-driven methods, being able to deal with triangular surfaces of arbitrary topology while encoding, at the same time, the topological information of the mesh using Homological Spanning Forests. We plan to develop, in the future, (geo-topological) image analysis and processing algorithms that work directly on the compressed data.
SENATUS: An Approach to Joint Traffic Anomaly Detection and Root Cause Analysis
In this paper, we propose a novel approach, called SENATUS, for joint traffic
anomaly detection and root-cause analysis. Inspired by the concept of a
senate, the key idea of the proposed approach is divided into three stages:
election, voting and decision. At the election stage, a small number of
senator flows are chosen to approximately represent the total (usually huge) set of
traffic flows. In the voting stage, anomaly detection is applied on the senator
flows and the detected anomalies are correlated to identify the most likely
anomalous time bins. Finally, in the decision stage, a machine learning
technique is applied to the senator flows of each anomalous time bin to find
the root cause of the anomalies. We evaluate SENATUS using traffic traces
collected from the pan-European network GEANT, and compare against another
approach which detects anomalies using lossless compression of traffic
histograms. We show the effectiveness of SENATUS in diagnosing anomaly types:
network scans and DoS/DDoS attacks.
Towards the text compression based feature extraction in high impedance fault detection
High impedance faults of medium voltage overhead lines with covered conductors can be identified by the presence of partial discharges. Although this has been a subject of research for more than 60 years, online partial discharge detection remains a challenge, especially in environments with heavy background noise. In this paper, a new approach for partial discharge pattern recognition is presented. All results were obtained on data acquired from a real 22 kV medium voltage overhead power line with covered conductors. The proposed method is based on a text compression algorithm and serves as a signal similarity estimator, applied for the first time to partial discharge patterns. Its relevance is examined with three different variations of a classification model. The improvement gained on an already deployed model proves its quality.
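The compression-based similarity idea can be illustrated with the normalized compression distance (NCD): two signals that share structure compress better together than apart. The abstract does not name the exact text compressor, so using zlib here is an assumption for illustration only:

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: smaller when x and y share structure."""
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

periodic = b"101010101010" * 40                         # a repetitive "signal"
same = b"101010101010" * 40                             # identical structure
other = bytes((i * 97 + 13) % 256 for i in range(480))  # unrelated byte pattern
print(ncd(periodic, same) < ncd(periodic, other))  # the similar pair scores lower
```

Because the compressor supplies the model, no hand-crafted features are needed; the distances can feed directly into a classifier, which matches the feature-extraction role described in the abstract.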
Permutation Decision Trees
The decision tree is a well-understood machine learning model based on
minimizing impurity in its internal nodes. The most common impurity measures
are Shannon entropy and Gini impurity. These impurity measures are insensitive
to the order of training data and hence the final tree obtained is invariant to
any permutation of the data. This leads to a serious limitation in modeling
data instances that have order dependencies. In this work, we propose, for the
first time, the use of Effort-To-Compress (ETC), a complexity measure, as an
impurity measure. Unlike Shannon entropy and Gini impurity, structural impurity
based on ETC is able to capture order dependencies in the data, thus obtaining
potentially different decision trees for different permutations of the same
data instances (Permutation Decision Trees). We then introduce the notion of
Permutation Bagging achieved using permutation decision trees without the need
for random feature selection and sub-sampling. We compare the performance of
the proposed permutation bagged decision trees with Random Forests. Our model
does not assume that the data instances are independent and identically
distributed. Potential applications include scenarios where a temporal order
present in the data instances is to be respected.
Comment: 12 pages, 10 figures
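The ETC measure mentioned above is typically computed with Non-Sequential Recursive Pair Substitution (NSRPS): repeatedly replace the most frequent adjacent pair with a new symbol and count the iterations until the sequence becomes constant. A small sketch over integer-coded sequences; the tie-breaking and pair-counting details here are simplifying assumptions:

```python
from collections import Counter

def etc(seq):
    """Effort-To-Compress: NSRPS iterations until the sequence is
    constant or has length 1."""
    seq = list(seq)
    steps = 0
    next_sym = max(seq) + 1 if seq else 0  # fresh symbol for substitutions
    while len(seq) > 1 and len(set(seq)) > 1:
        target = Counter(zip(seq, seq[1:])).most_common(1)[0][0]
        out, i = [], 0
        while i < len(seq):  # non-overlapping left-to-right replacement
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == target:
                out.append(next_sym)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq, next_sym, steps = out, next_sym + 1, steps + 1
    return steps

print(etc([0, 0, 0, 0]))        # → 0: already constant
print(etc([0, 1, 0, 1, 0, 1]))  # → 1: (0, 1) collapses in one pass
```

Because the substitutions depend on adjacency, reorderings of the same multiset of values can yield different ETC values, which is exactly the order sensitivity that permutation decision trees exploit as an impurity measure.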