Fast & Confident Probabilistic Categorization
We describe NRC's submission to the Anomaly Detection/Text Mining competition organised at the Text Mining Workshop 2007. This submission relies on a straightforward implementation of the probabilistic categoriser described in (Gaussier et al., ECIR'02). The categoriser is adapted to handle multiple labelling, and a piecewise-linear confidence estimation layer is added to provide an estimate of the labelling confidence. This technique achieves a score of 1.689 on the test data.
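The submission's code is not reproduced here; as a rough illustration, a minimal sketch of a piecewise-linear confidence layer, assuming raw categoriser scores are calibrated against held-out accuracy by linear interpolation between binned estimates (the function names and binning scheme are illustrative, not the paper's):

    import numpy as np

    def fit_confidence_map(scores, correct, n_bins=10):
        """Estimate labelling accuracy per score bin on held-out data."""
        edges = np.quantile(scores, np.linspace(0.0, 1.0, n_bins + 1))
        centers, accuracy = [], []
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (scores >= lo) & (scores <= hi)
            if mask.any():
                centers.append(scores[mask].mean())
                accuracy.append(correct[mask].mean())
        return np.array(centers), np.array(accuracy)

    def confidence(raw_score, centers, accuracy):
        """Piecewise-linear interpolation between binned accuracy estimates."""
        return np.interp(raw_score, centers, accuracy)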
Lag space estimation in time series modelling
The purpose of this contribution is to investigate some techniques for finding the relevant lag-space, i.e. the input information, for time series modelling. This is an important aspect of time series modelling, as it conditions the design of the model through the regressor vector, a.k.a. the input layer in a neural network. We give a rough description of the problem, emphasise the concept of generalisation, and propose a generalisation-based method. We compare it to a non-parametric test, and carry out experiments, both on the well-known Hénon map and on a real data set.

1. INTRODUCTION
Let us assume that a time series is obtained from a mapping $X_t = f(X_{t-u_1}, X_{t-u_2}, \ldots, X_{t-u_m})$. The $m$ delays can include long-term dependencies, in order to take into account e.g. some seasonality. The $(u_i)$ are the primary dependencies, the smallest set of sufficient, not necessarily consecutive delays. All other dependencies are obtained through a combination of mappings and a..
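To make the setting concrete, a minimal sketch of building the regressor vector for an arbitrary delay set $(u_i)$, with the Hénon map (standard parameters $a = 1.4$, $b = 0.3$ assumed) as the test series:

    import numpy as np

    def henon(n, a=1.4, b=0.3):
        """Generate n points of the Hénon map."""
        x = np.empty(n)
        x[0], x[1] = 0.1, 0.1
        for t in range(2, n):
            x[t] = 1.0 - a * x[t - 1] ** 2 + b * x[t - 2]
        return x

    def lag_matrix(x, delays):
        """Regressor matrix with one column x[t-u] per delay u, target x[t]."""
        u_max = max(delays)
        X = np.column_stack([x[u_max - u:len(x) - u] for u in delays])
        y = x[u_max:]
        return X, y

    # Example: the Hénon map's true primary dependencies are delays 1 and 2.
    X, y = lag_matrix(henon(1000), delays=[1, 2])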
Extracting the relevant delays in time series modelling
In this contribution, we suggest a convenient way to use generalisation error to extract the relevant delays from a time-varying process, i.e. the delays that lead to the best prediction performance. We design a generalisation-based algorithm that takes its inspiration from traditional variable selection, and more precisely stepwise forward selection. The method is compared to other forward selection schemes, as well as to a non-parametric test aimed at estimating the embedding dimension of time series. The final application extends these results to the efficient estimation of FIR filters on some real data.

OVERVIEW
In system identification as well as in time series modelling, the choice of the inputs to our model plays a crucial role. In order to obtain good performance, one must model future behaviour from a set of relevant past measurements. An insufficient number of inputs will prevent the model from capturing the underlying mapping. On the other hand, including irrelevant inpu..
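A minimal sketch of the generalisation-based forward selection idea, using a linear model and a single train/validation split as stand-ins (the paper's actual model and error estimator may differ), and reusing lag_matrix from the previous sketch:

    import numpy as np
    from numpy.linalg import lstsq

    def val_error(X_tr, y_tr, X_va, y_va):
        """Mean squared validation error of a linear fit (stand-in model)."""
        w, *_ = lstsq(np.column_stack([X_tr, np.ones(len(X_tr))]), y_tr, rcond=None)
        pred = np.column_stack([X_va, np.ones(len(X_va))]) @ w
        return np.mean((pred - y_va) ** 2)

    def forward_select(x, candidates, split=0.7):
        """Greedily add the delay that most reduces validation error."""
        chosen, best = [], np.inf
        while True:
            trial = None
            for u in candidates:
                if u in chosen:
                    continue
                X, y = lag_matrix(x, chosen + [u])  # from the previous sketch
                n = int(split * len(y))
                e = val_error(X[:n], y[:n], X[n:], y[n:])
                if e < best:
                    best, trial = e, u
            if trial is None:        # no candidate improves: stop
                return chosen
            chosen.append(trial)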
On the use of a pruning prior for neural networks
We address the problem of using a regularization prior that prunes unnecessary weights in a neural network architecture. This prior provides a convenient alternative to traditional weight decay. Two examples are studied to support this method and illustrate its use. First, we use the sunspots benchmark problem as an example of time series processing. Then we address the problem of system identification on a small artificial system.

OVERVIEW
It is well known that the use of a regularization term during optimization improves the general accuracy of the model obtained. In the case of neural networks, regularization is most often applied through the addition of a weight-decay term to the cost function, in order to improve the generalization abilities of the solution [5]. Other methods for improving these abilities include pruning, along the lines of OBD [6]. These techniques have been applied to a wide variety of problems, including time series and system identification. In this paper, we ana..
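The abstract does not spell out the prior's form; as one concrete possibility, a sketch of a weight-elimination style penalty (in the spirit of Weigend et al.), which saturates for large weights and therefore penalises many small weights more than a few large ones, unlike plain weight decay. Whether this matches the paper's exact prior is an assumption:

    import numpy as np

    def weight_decay(w, lam):
        """Classical weight decay: lam * sum(w_i^2)."""
        return lam * np.sum(w ** 2)

    def pruning_penalty(w, lam, w0=1.0):
        """Weight-elimination penalty: saturates for |w| >> w0, so weights
        that are not needed get driven towards exactly zero."""
        return lam * np.sum(w ** 2 / (w0 ** 2 + w ** 2))

    def pruning_penalty_grad(w, lam, w0=1.0):
        """Gradient term to add to the loss gradient during training."""
        return lam * 2.0 * w * w0 ** 2 / (w0 ** 2 + w ** 2) ** 2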
Data Cube Approximation and Mining using Probabilistic Modeling
On-line Analytical Processing (OLAP) techniques commonly used in data warehouses allow the exploration of data cubes according to different analysis axes (dimensions) and under different abstraction levels in a dimension hierarchy. However, such techniques are not aimed at mining multidimensional data.
Since data cubes are nothing but multi-way tables, we propose to analyze the potential of two probabilistic modeling techniques, namely non-negative multi-way array factorization and log-linear modeling, with the ultimate objective of compressing and mining aggregate and multidimensional values. With the first technique, we compute the set of components that best fits the initial data set and whose superposition coincides with the original data; with the second, we identify a parsimonious model (i.e., one with a reduced set of parameters), highlight strong associations among dimensions, and discover possible outliers in data cells. A real-life example will be used to (i) discuss the potential benefits of the modeling output on cube exploration and mining, (ii) show how OLAP queries can be answered in an approximate way, and (iii) illustrate the strengths and limitations of these modeling approaches.
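As a sketch of the log-linear side, the simplest case is a mutual-independence model for a 3-way cube: its fitted values answer aggregate queries approximately, and its residuals flag candidate outlier cells. The full paper presumably fits richer models; this is illustrative only:

    import numpy as np

    def independence_fit(cube):
        """Fit the simplest log-linear model (mutual independence) to a
        3-way contingency cube and score each cell's deviation from it."""
        N = cube.sum()
        # One-way marginals play the role of the model's parameters.
        p_i = cube.sum(axis=(1, 2)) / N
        p_j = cube.sum(axis=(0, 2)) / N
        p_k = cube.sum(axis=(0, 1)) / N
        expected = N * np.einsum('i,j,k->ijk', p_i, p_j, p_k)
        # Pearson residuals: large absolute values hint at outlier cells.
        residuals = (cube - expected) / np.sqrt(np.maximum(expected, 1e-12))
        return expected, residuals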
Adaptive Regularization of Neural Networks Using Conjugate Gradient
Recently we suggested a regularization scheme which iteratively adapts regularization parameters by minimizing the validation error using simple gradient descent. In this contribution we present an improved algorithm based on the conjugate gradient technique. Numerical experiments with feed-forward neural networks demonstrate improved generalization ability and lower computational cost.

1. INTRODUCTION
Neural networks are flexible tools for regression, time-series modeling and pattern recognition, whose power finds expression in universal approximation theorems [6]. The risk of over-fitting on noisy data is a major concern in neural network design, as exemplified by the bias-variance dilemma, see e.g. [5]. Regularization serves two purposes: first, it remedies numerical instabilities during training by imposing smoothness on the cost function; secondly, it is a tool for reducing variance by introducing extra bias. The overall goal is to minimize the generalization..
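A minimal sketch of the idea with a ridge-regression stand-in for the network: treat the validation error as a function of the (log) regularization parameter and minimize it with SciPy's conjugate-gradient routine. The ridge model and numerical gradients are simplifications; the paper works with feed-forward neural networks:

    import numpy as np
    from scipy.optimize import minimize

    def val_error_of_lambda(log_lam, X_tr, y_tr, X_va, y_va):
        """Validation MSE of a ridge fit, as a function of log-lambda."""
        lam = np.exp(log_lam[0])
        d = X_tr.shape[1]
        w = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d), X_tr.T @ y_tr)
        return np.mean((X_va @ w - y_va) ** 2)

    def adapt_lambda(X_tr, y_tr, X_va, y_va):
        """Conjugate-gradient search for the regularization level."""
        res = minimize(val_error_of_lambda, x0=[0.0], method='CG',
                       args=(X_tr, y_tr, X_va, y_va))
        return np.exp(res.x[0])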
Co-occurrence Models in Music Genre Classification
Music genre classification has been investigated using many different methods, but most of them build on probabilistic models of feature vectors $x_r$ which only represent the short time segment with index $r$ of the song. Here, three different co-occurrence models are proposed which instead consider the whole song as an integrated part of the probabilistic model. This was achieved by considering a song as a set of independent co-occurrences $(s, x_r)$ (where $s$ is the song index) instead of just a set of independent $(x_r)$'s. The models were tested against two baseline classification methods on a difficult 11-genre data set covering a variety of modern music. The basis was a so-called AR feature representation of the music. Besides the benefit of having proper probabilistic models of the whole song, the lowest classification test errors were found using one of the proposed models.
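As a rough illustration of scoring a whole song rather than isolated segments, a sketch that fits one Gaussian per genre over frame features and sums frame log-likelihoods per song. This is closer to the baselines than to the three proposed co-occurrence models, whose exact form the abstract does not give:

    import numpy as np
    from scipy.stats import multivariate_normal

    def fit_genre_models(frames_by_genre):
        """One Gaussian per genre over frame feature vectors."""
        models = {}
        for genre, X in frames_by_genre.items():
            # Small ridge on the covariance keeps it well-conditioned.
            cov = np.cov(X.T) + 1e-6 * np.eye(X.shape[1])
            models[genre] = multivariate_normal(X.mean(axis=0), cov)
        return models

    def classify_song(frames, models):
        """Whole-song decision: sum frame log-likelihoods per genre."""
        scores = {g: m.logpdf(frames).sum() for g, m in models.items()}
        return max(scores, key=scores.get)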