
    Multiple-instance and one-class rule-based algorithms

    In this work we developed rule-based algorithms for multiple-instance learning and one-class learning problems, namely the mi-DS and OneClass-DS algorithms. Multiple-Instance Learning (MIL) is a variation of classical supervised learning in which bags (collections) of instances, rather than single instances, must be classified. A bag is labeled positive if at least one of its instances is positive; otherwise it is negative. The one-class learning problem is also known as the outlier or novelty detection problem. One-class classifiers are trained on data describing only one class and are used when data from other classes are not available, as well as for highly imbalanced data sets. Extensive comparisons and statistical testing of the two algorithms show that they generate models that perform on par with other state-of-the-art algorithms.
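    The bag-labeling convention stated above (a bag is positive iff at least one of its instances is positive) is the defining feature of MIL. The sketch below only illustrates that convention with a toy instance-level scorer standing in for mi-DS; the scoring rule and data are assumptions for illustration, not the algorithm from the abstract.

```python
# Minimal sketch of the MIL bag-labeling convention, assuming a generic
# instance-level scorer; it does NOT reproduce the mi-DS rule induction itself.
import numpy as np

def instance_score(x):
    # Hypothetical instance-level classifier: here, a toy linear rule.
    w = np.array([1.0, -0.5])
    return float(x @ w) > 0.0

def bag_label(bag):
    """A bag is positive iff at least one of its instances is positive."""
    return any(instance_score(x) for x in bag)

bags = [
    np.array([[0.2, 0.1], [1.5, 0.3]]),    # contains a positive instance
    np.array([[-1.0, 0.4], [-0.3, 0.9]]),  # all instances negative
]
print([bag_label(b) for b in bags])  # [True, False]
```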

    Sample selection via clustering to construct support vector-like classifiers

    This paper explores the possibility of constructing RBF classifiers which, somewhat like support vector machines, use a reduced number of samples as centroids by selecting those samples directly. Because sample selection is a hard computational problem, the selection is performed after a preliminary vector quantization; in this way, similar machines are also obtained whose centroids are selected from those learned in a supervised manner. Several ways of designing these machines are considered, in particular with respect to sample selection, as well as different criteria to train them. Simulation results for well-known classification problems show very good performance of the corresponding designs, improving on that of support vector machines while substantially reducing the number of units. This shows that our interest in selecting samples (or centroids) efficiently is justified. Many new research avenues emerge from these experiments and discussions, as suggested in our conclusions.
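    As a rough illustration of the general idea (not the paper's specific designs or training criteria), the sketch below picks centroids by clustering the training data, maps inputs through Gaussian RBF activations around those centroids, and fits a linear classifier on top. The use of k-means, logistic regression, and the kernel-width heuristic are all assumptions.

```python
# Sketch: RBF classifier whose centroids come from clustering rather than
# from support-vector optimization. Assumed pipeline, not the paper's exact one.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

def rbf_features(X, centroids, gamma):
    # Gaussian activation of each sample around each centroid.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
gamma = 1.0 / X.shape[1]  # simple width heuristic, assumed
centroids = KMeans(n_clusters=20, n_init=10, random_state=0).fit(X).cluster_centers_
clf = LogisticRegression(max_iter=1000).fit(rbf_features(X, centroids, gamma), y)
print("train accuracy:", clf.score(rbf_features(X, centroids, gamma), y))
```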

    A random forest algorithm to improve the Lee–Carter mortality forecasting: impact on q-forward

    Increased life expectancy in developed countries has led researchers to pay more attention to mortality projection in order to anticipate changes in mortality rates. Following the scheme proposed in Deprez et al. (Eur Actuar J 7(2):337–352, 2017) and extended by Levantesi and Pizzorusso (Risks 7(1):26, 2019), we propose a novel approach based on the combination of random forests and two-dimensional P-splines, allowing for accurate mortality forecasting. This approach first diagnoses the limits of the Lee–Carter mortality model by applying a random forest estimator to the ratio between the observed deaths and the deaths estimated by the model, while two-dimensional P-splines are used to smooth and project the random forest estimator in the forecasting phase. Further considerations are devoted to assessing the demographic consistency of the results. The model accuracy is evaluated by an out-of-sample test. Finally, we analyze the impact of our model on the pricing of q-forward contracts. All the analyses have been carried out on several countries using data from the Human Mortality Database and considering the Lee–Carter model.
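    A minimal sketch of the diagnostic step described above: fit a random forest to the ratio of observed to model-estimated deaths over an (age, year) grid. The toy arrays, feature choice, and hyperparameters are assumptions; the two-dimensional P-spline smoothing/projection and the q-forward pricing are not shown.

```python
# Sketch of the random-forest diagnostic on the Lee-Carter residual ratio
# psi(age, year) = observed deaths / estimated deaths. Toy data, assumed setup.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

ages = np.arange(0, 100)
years = np.arange(1950, 2020)
age_grid, year_grid = np.meshgrid(ages, years, indexing="ij")

rng = np.random.default_rng(0)
observed = rng.poisson(lam=100, size=age_grid.shape).astype(float) + 1.0
estimated = observed * rng.normal(1.0, 0.05, size=age_grid.shape)  # stand-in for a Lee-Carter fit
psi = observed / estimated

X = np.column_stack([age_grid.ravel(), year_grid.ravel()])
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, psi.ravel())
# Predictions far from 1 flag (age, year) regions the Lee-Carter model fits
# poorly; in the paper these are then smoothed and projected with P-splines.
print(rf.predict([[65, 2019]]))
```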

    Leveraging Label Information for Multimodal Emotion Recognition

    Multimodal emotion recognition (MER) aims to detect the emotional status of a given expression by combining speech and text information. Intuitively, label information should be able to help the model locate the salient tokens/frames relevant to a specific emotion, which in turn facilitates the MER task. Inspired by this, we propose a novel approach for MER that leverages label information. Specifically, we first obtain representative label embeddings for both the text and speech modalities, then learn label-enhanced text/speech representations for each utterance via label-token and label-frame interactions. Finally, we devise a novel label-guided attentive fusion module to fuse the label-aware text and speech representations for emotion classification. Extensive experiments were conducted on the public IEMOCAP dataset, and the results demonstrate that our proposed approach outperforms existing baselines and achieves new state-of-the-art performance.
    Comment: Accepted by Interspeech 202
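    A minimal PyTorch sketch of the label-token (or label-frame) interaction idea: label embeddings attend over a sequence of token/frame features to produce label-aware representations, which are then pooled for classification. The dimensions, the use of nn.MultiheadAttention, and the pooling step are assumptions, not the paper's exact architecture (which fuses the text and speech branches).

```python
# Sketch (assumed, not the paper's exact module): label embeddings query
# token/frame features via attention to build label-aware representations.
import torch
import torch.nn as nn

class LabelAwareEncoder(nn.Module):
    def __init__(self, num_labels=4, dim=256, heads=4):
        super().__init__()
        self.label_emb = nn.Embedding(num_labels, dim)   # one embedding per emotion label
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(dim, num_labels)

    def forward(self, tokens):                # tokens: (batch, seq_len, dim)
        b = tokens.size(0)
        labels = self.label_emb.weight.unsqueeze(0).expand(b, -1, -1)  # (batch, num_labels, dim)
        # Each label attends over the token/frame sequence.
        label_aware, _ = self.attn(query=labels, key=tokens, value=tokens)
        pooled = label_aware.mean(dim=1)      # simple pooling; the paper fuses modalities instead
        return self.classifier(pooled)        # emotion logits

logits = LabelAwareEncoder()(torch.randn(2, 50, 256))
print(logits.shape)  # torch.Size([2, 4])
```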

    Undersampling is a Minimax Optimal Robustness Intervention in Nonparametric Classification

    While a broad range of techniques have been proposed to tackle distribution shift, the simple baseline of training on an undersampled, balanced dataset often achieves close to state-of-the-art accuracy across several popular benchmarks. This is rather surprising, since undersampling algorithms discard excess majority-group data. To understand this phenomenon, we ask whether learning is fundamentally constrained by a lack of minority-group samples. We prove that this is indeed the case in the setting of nonparametric binary classification. Our results show that in the worst case, an algorithm cannot outperform undersampling unless there is a high degree of overlap between the train and test distributions (which is unlikely to be the case in real-world datasets), or the algorithm leverages additional structure about the distribution shift. In particular, in the case of label shift we show that there is always an undersampling algorithm that is minimax optimal. In the case of group-covariate shift we show that there is an undersampling algorithm that is minimax optimal when the overlap between the group distributions is small. We also perform an experimental case study on a label shift dataset and find that, in line with our theory, the test accuracy of robust neural network classifiers is constrained by the number of minority samples.
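    The undersampling baseline itself is simple to state: subsample each group down to the size of the smallest group and train as usual on the balanced subset. The sketch below shows that step with numpy; the toy data, group labels, and seed are assumptions.

```python
# Sketch of the undersampling baseline: balance groups by discarding excess
# majority-group samples, then train on the balanced subset. Assumed toy data.
import numpy as np

def undersample(X, y, groups, seed=0):
    rng = np.random.default_rng(seed)
    group_ids, counts = np.unique(groups, return_counts=True)
    n_min = counts.min()                      # size of the smallest group
    keep = []
    for g in group_ids:
        idx = np.flatnonzero(groups == g)
        keep.append(rng.choice(idx, size=n_min, replace=False))
    keep = np.concatenate(keep)
    return X[keep], y[keep]

X = np.random.randn(1000, 5)
y = np.random.randint(0, 2, size=1000)
groups = (np.arange(1000) < 900).astype(int)  # 900 majority vs 100 minority samples
Xb, yb = undersample(X, y, groups)
print(Xb.shape)  # (200, 5): both groups downsampled to 100 samples
```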

    A supervised learning algorithm for learning precise timing of multiple spikes in multilayer spiking neural networks

    There is biological evidence that information in the brain is coded through the precise timing of spikes. However, training a population of spiking neurons in a multilayer network to fire at multiple precise times remains a challenging task. Delay learning, and the effect of delays on weight learning in a spiking neural network (SNN), have not been investigated thoroughly. This paper proposes a novel, biologically plausible supervised learning algorithm for learning precisely timed multiple spikes in multilayer SNNs. Based on the spike-timing-dependent plasticity (STDP) learning rule, the proposed method trains an SNN through the synergy between weight and delay learning. The weights of the hidden and output neurons are adjusted in parallel. The proposed method captures the contribution of synaptic delays to the learning of synaptic weights. Interaction between different layers of the network is realized through biofeedback signals sent by the output neurons. The trained SNN is used for the classification of spatiotemporal input patterns. The proposed method also trains the spiking network not to fire spikes at undesired times, which would contribute to misclassification. Experimental evaluation on benchmark data sets from the UCI machine learning repository shows that the proposed method achieves results comparable to classical rate-based methods such as deep belief networks and autoencoder models. Moreover, it can achieve higher classification accuracies than a single-layer SNN and a similar multilayer SNN.
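    The abstract does not give the update equations, so the toy sketch below only conveys the general idea of jointly nudging a synaptic weight and a delay toward a desired output-spike time using an exponential STDP-like kernel. The kernel, learning rates, and the crude spike-time model are assumptions and do not reflect the paper's actual rule.

```python
# Toy sketch (assumed, not the paper's actual rule): jointly adjust a synaptic
# weight and delay so an output spike moves toward its desired firing time.
import math

TAU = 10.0                 # STDP time constant (ms), assumed
ETA_W, ETA_D = 0.05, 0.1   # learning rates for weight and delay, assumed

def update(weight, delay, t_pre, t_actual, t_desired):
    """One STDP-like step driven by the output timing error."""
    error = t_desired - t_actual                 # >0: the output fired too early
    arrival = t_pre + delay                      # presynaptic spike arrival time
    kernel = math.exp(-abs(t_actual - arrival) / TAU)
    weight -= ETA_W * error * kernel             # weaken to delay / strengthen to advance the spike
    delay += ETA_D * error * kernel              # shift arrival toward the desired time
    return weight, max(delay, 0.0)               # delays stay non-negative

w, d = 0.5, 2.0
for _ in range(20):
    t_actual = 12.0 - 2.0 * w + d                # crude stand-in for the neuron's spike time
    w, d = update(w, d, t_pre=5.0, t_actual=t_actual, t_desired=15.0)
print(round(w, 3), round(d, 3))
```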

    Analyzing rare event, anomaly, novelty and outlier detection terms under the supervised classification framework

    In recent years, a variety of research areas have contributed to a set of related problems in which the terms rare event, anomaly, novelty and outlier detection are the main actors. These multiple research areas have created a mix-up between terminology and problems: in some works similar problems have been named differently, while in others the same term has been used to describe different problems. This confusion between terms and problems causes the repetition of research and hinders the advance of the field, so standardization is imperative. The goal of this paper is to underline the differences between the terms and to organize the area by looking at all of them under the umbrella of supervised classification. To this end, a one-to-one assignment of terms to learning scenarios is proposed, in which each learning scenario is associated with the term most frequently used for it in the literature. To validate this proposal, a set of experiments retrieving papers from Google Scholar, the ACM Digital Library and IEEE Xplore has been carried out.