
    Classifier Fusion for SVM-Based Multimedia Semantic Indexing

    Concept indexing in multimedia libraries is very useful for users searching and browsing, but it is also a challenging research problem. Combining several modalities, features or concepts is one of the key issues for bridging the gap between signal and semantics. In this paper, we present three fusion schemes inspired by the classical early and late fusion schemes. First, we present a kernel-based fusion scheme which takes advantage of the kernel basis of classifiers such as SVMs. Second, we integrate a new normalization process into the early fusion scheme. Third, we present a contextual late fusion scheme to merge classification scores of several concepts. We conducted experiments in the framework of the official TRECVID'06 evaluation campaign and obtained significant improvements with the proposed fusion schemes relative to the usual fusion schemes.
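    As context for the kernel-based and late fusion ideas mentioned above, the following is a minimal, hypothetical sketch using scikit-learn and synthetic data. The modality names, fusion weight and kernels are illustrative assumptions, and the paper's normalization and contextual fusion steps are not reproduced; it only contrasts fusing kernels before training one SVM against merging the scores of per-modality SVMs.

```python
# Minimal sketch: kernel-level (early) fusion vs. score-level (late) fusion for SVMs,
# on synthetic stand-ins for two modalities describing the same video shots.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_visual = rng.normal(size=(200, 64))    # e.g. colour/texture descriptors (illustrative)
X_text = rng.normal(size=(200, 300))     # e.g. ASR term features (illustrative)
y = rng.integers(0, 2, size=200)         # concept present / absent

# Kernel-based fusion: a convex combination of per-modality kernels is itself
# a valid kernel, so a single SVM can be trained on the fused Gram matrix.
K_visual = rbf_kernel(X_visual)
K_text = rbf_kernel(X_text)
alpha = 0.5                              # fusion weight, normally tuned on validation data
K_fused = alpha * K_visual + (1 - alpha) * K_text
svm_fused = SVC(kernel="precomputed").fit(K_fused, y)

# Late fusion: train one SVM per modality and merge their output scores;
# in practice the per-classifier scores are normalised before merging.
svm_v = SVC(kernel="precomputed").fit(K_visual, y)
svm_t = SVC(kernel="precomputed").fit(K_text, y)
late_scores = 0.5 * (svm_v.decision_function(K_visual) + svm_t.decision_function(K_text))
print(late_scores[:5])
```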

    Regularized Regression Problem in hyper-RKHS for Learning Kernels

    This paper generalizes the two-stage kernel learning framework, illustrates its utility for kernel learning and out-of-sample extensions, and proves asymptotic convergence results for the introduced kernel learning model. Algorithmically, we extend target alignment by hyper-kernels in the two-stage kernel learning framework. The associated kernel learning task is formulated as a regression problem in a hyper-reproducing kernel Hilbert space (hyper-RKHS), i.e., learning on the space of kernels itself. To solve this problem, we present two regression models with bivariate forms in this space: kernel ridge regression (KRR) and support vector regression (SVR) in the hyper-RKHS. This provides significant model flexibility for kernel learning and strong performance in real-world applications. In particular, our kernel learning framework is general: the learned underlying kernel can be positive definite or indefinite, which adapts to various requirements in kernel learning. Theoretically, we study the convergence behavior of these learning algorithms in the hyper-RKHS and derive the learning rates. Different from the traditional approximation analysis in RKHS, our analyses need to consider the non-trivial independence of pairwise samples and the characterisation of hyper-RKHS. To the best of our knowledge, this is the first work in learning theory to study the approximation performance of the regularized regression problem in hyper-RKHS. Comment: 25 pages, 3 figures
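    A rough sketch of the two-stage idea in the spirit of target alignment is given below. It is not the paper's hyper-kernel method: plain linear ridge regression over base-kernel entries stands in for the regression in hyper-RKHS, and all data, kernels and regularization constants are illustrative assumptions. It only shows the structure of the problem: stage one regresses entries of the ideal kernel (a bivariate problem over pairs of samples), stage two plugs the learned kernel into ordinary KRR.

```python
# Two-stage kernel learning sketch: (1) fit entries of the ideal kernel y y^T from
# entries of base kernels, one sample per pair (i, j); (2) use the learned kernel in KRR.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 5))
y = np.sign(X[:, 0] * X[:, 1])                   # toy, non-linearly separable target

# Stage 1: regression on the space of kernel values.
bases = [linear_kernel(X), polynomial_kernel(X, degree=2), rbf_kernel(X)]
pair_features = np.stack([K.ravel() for K in bases], axis=1)
ideal = np.outer(y, y).ravel()                   # target alignment: K_ij should match y_i * y_j
stage1 = Ridge(alpha=1.0, fit_intercept=False).fit(pair_features, ideal)

# The learned combination may be indefinite if some weights come out negative,
# echoing the point that the learned kernel need not be positive definite.
K_learned = sum(w * K for w, K in zip(stage1.coef_, bases))

# Stage 2: standard KRR with the learned kernel (dual solution in closed form).
lam = 1.0
dual_coef = np.linalg.solve(K_learned + lam * np.eye(len(y)), y)
print((K_learned @ dual_coef)[:5])               # in-sample predictions
```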

    Kernel Discriminant Analysis Using Triangular Kernel for Semantic Scene Classification

    Semantic scene classification is a challenging research problem that aims to categorise images into semantic classes such as beaches, sunsets or mountains. This problem can be formulated as a multi-label classification problem where an image can belong to more than one conceptual class, such as sunsets and beaches, at the same time. Recently, Kernel Discriminant Analysis combined with spectral regression (SR-KDA) has been successfully used for face, text and spoken letter recognition. However, the SR-KDA method works only with positive definite symmetric matrices. In this paper, we modify the method to support both positive definite and indefinite symmetric matrices. The main idea is to use an LDL^T decomposition instead of a Cholesky decomposition. The modified SR-KDA is applied to a scene database involving 6 concepts. We validate the advocated approach and demonstrate that it yields significant performance gains when the conditionally positive definite triangular kernel is used instead of positive definite symmetric kernels such as linear, polynomial or RBF. The results also indicate performance gains when compared with state-of-the-art multi-label methods for semantic scene classification.
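    The following sketch illustrates only the decomposition swap that motivates the modification, not the full SR-KDA algorithm. It assumes the triangular kernel k(x, z) = -||x - z|| (conditionally positive definite, so its Gram matrix is symmetric but generally indefinite), random stand-in data, and a regularised linear system of the kind solved in the spectral-regression step; SciPy's ldl routine provides the LDL^T factorisation.

```python
# Cholesky fails on the indefinite (regularised) triangular-kernel Gram matrix,
# while an LDL^T factorisation of the symmetric indefinite matrix still solves the system.
import numpy as np
from scipy.linalg import ldl, solve_triangular
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
t = rng.normal(size=50)                 # stand-in targets from the spectral-regression step

K = -cdist(X, X)                        # triangular kernel: k(x, z) = -||x - z||
delta = 0.01
A = K + delta * np.eye(len(X))          # regularised system matrix, symmetric but indefinite

try:
    np.linalg.cholesky(A)
except np.linalg.LinAlgError:
    print("Cholesky failed: matrix is not positive definite")

# LDL^T (Bunch-Kaufman) factorisation: A = lu @ d @ lu.T, with lu[perm] lower triangular.
lu, d, perm = ldl(A, lower=True)
L = lu[perm]                            # unit lower triangular after the row permutation
y1 = solve_triangular(L, t[perm], lower=True)
y2 = np.linalg.solve(d, y1)             # d is block diagonal (1x1 / 2x2 blocks)
coeffs = np.empty_like(t)
coeffs[perm] = solve_triangular(L.T, y2, lower=False)
print(np.allclose(A @ coeffs, t))       # the LDL^T solve recovers the solution
```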

    A pipeline and comparative study of 12 machine learning models for text classification

    Text-based communication is highly favoured as a communication method, especially in business environments. As a result, it is often abused by sending malicious messages, e.g., spam emails, to deceive users into relaying personal information, including online account credentials or banking details. For this reason, many machine learning methods for text classification have been proposed and incorporated into the services of most email providers. However, optimising text classification algorithms and finding the right trade-off in their aggressiveness is still a major research problem. We present an updated survey of 12 machine learning text classifiers applied to a public spam corpus. A new pipeline is proposed to optimise hyperparameter selection and improve the models' performance by applying specific methods (based on natural language processing) in the preprocessing stage. Our study aims to provide a new methodology to investigate and optimise the effect of different feature sizes and hyperparameters in machine learning classifiers that are widely used in text classification problems. The classifiers are tested and evaluated on different metrics, including F-score (accuracy), precision, recall, and run time. By analysing all these aspects, we show how the proposed pipeline can be used to achieve good accuracy in spam filtering on the Enron dataset, a widely used public email corpus. Statistical tests and explainability techniques are applied to provide a robust analysis of the proposed pipeline and to interpret the classification outcomes of the 12 machine learning models, also identifying the words that drive the classification results. Our analysis shows that it is possible to identify an effective machine learning model to classify the Enron dataset with an F-score of 94%. Comment: This article has been accepted for publication in Expert Systems with Applications, April 2022. Published by Elsevier. All data, models, and code used in this work are available on GitHub at https://github.com/Angione-Lab/12-machine-learning-models-for-text-classificatio
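    The authors' full pipeline and models are in the linked GitHub repository; the sketch below is only a generic illustration of the kind of pipeline described, with an assumed toy corpus in place of the Enron emails and logistic regression standing in for one of the 12 classifiers: TF-IDF preprocessing, a joint grid search over feature size and classifier hyperparameters, and F-score-based evaluation.

```python
# Generic text-classification pipeline: preprocessing + hyperparameter search + F-score.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

texts = [
    "win a free prize now, click here",
    "limited offer, claim your reward today",
    "meeting moved to 3pm, see agenda attached",
    "please review the quarterly report draft",
    "urgent: verify your account credentials",
    "lunch tomorrow with the project team?",
] * 5                                  # repeat the toy corpus so cross-validation has enough samples
labels = [1, 1, 0, 0, 1, 0] * 5        # 1 = spam, 0 = ham

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Search jointly over feature size and classifier regularisation strength, scored by F1.
grid = GridSearchCV(
    pipeline,
    param_grid={
        "tfidf__max_features": [50, 200, 1000],
        "clf__C": [0.1, 1.0, 10.0],
    },
    scoring="f1",
    cv=5,
)
grid.fit(texts, labels)
print(grid.best_params_, round(grid.best_score_, 3))
```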

    Speeding up active relevance feedback with approximate kNN retrieval for hyperplane queries

    In content-based image retrieval, relevance feedback (RF) is a prominent method for reducing the semantic gap between the low-level features describing the content and the usually higher-level meaning of the user's target. Recent RF methods are able to identify complex target classes after relatively few feedback iterations. However, because the computational complexity of such methods is linear in the size of the database, retrieval can be quite slow on very large databases. To address this scalability issue for active learning-based RF, we put forward a method that consists in constructing an index in the feature space associated with a kernel function and performing approximate kNN hyperplane queries with this feature-space index. The experimental evaluation performed on two image databases shows that a significant speedup can be achieved at the expense of a limited increase in the number of feedback rounds.
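    For context, the sketch below shows the baseline selection step that such indexing accelerates, on synthetic stand-in descriptors: at each feedback round, active-learning RF presents the images whose kernel SVM score is closest to zero, i.e. closest to the separating hyperplane. The linear scan over the whole database is the O(n) bottleneck; the paper's feature-space index and approximate kNN hyperplane queries, which replace it, are not reproduced here.

```python
# Brute-force "most ambiguous images" selection for active relevance feedback.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
database = rng.normal(size=(10_000, 32))               # stand-in for image descriptors

# A small set of labelled examples from earlier feedback rounds (toy target class).
labelled_idx = rng.choice(len(database), size=40, replace=False)
labels = (database[labelled_idx, 0] > 0).astype(int)

svm = SVC(kernel="rbf", gamma="scale").fit(database[labelled_idx], labels)

# Exact hyperplane query: rank the entire database by |decision value| (linear in its size).
scores = np.abs(svm.decision_function(database))
most_ambiguous = np.argsort(scores)[:20]               # next images to show the user
print(most_ambiguous)
```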