A Hybrid Classification Method for Database Contents Analysis

Author: Al Shehabi Shadi
Lamirel Jean-Charles
Toussaint Yannick
Publication venue: HAL CCSD
Publication date: 01/01/2003

Conference with proceedings and peer review; international audience. The hybridisation of different classification and mining techniques coming from different areas, such as the numeric and the symbolic worlds, can produce a significant enhancement of the overall classification and retrieval performance in a Data Mining or Information Retrieval context. This paper introduces an experimental methodology to match an explanatory structure derived from a symbolic classification to a numerical classification. The classification models used in the experiment are a boolean lattice on the symbolic side and a Kohonen Self Organising Map model (SOM) on the numerical side.

INRIA a CCSD electronic archive server

Beyond Classification: Latent User Interests Profiling from Visual Contents Analysis

Author: Estrin Deborah
Hsieh Cheng-Kang
Yang Longqi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 21/12/2015

User preference profiling is an important task in modern online social networks (OSN). With the proliferation of image-centric social platforms, such as Pinterest, visual contents have become one of the most informative data streams for understanding user preferences. Traditional approaches usually treat visual content analysis as a general classification problem where one or more labels are assigned to each image. Although such an approach simplifies the process of image analysis, it misses the rich context and visual cues that play an important role in people's perception of images. In this paper, we explore the possibilities of learning a user's latent visual preferences directly from image contents. We propose a distance metric learning method based on Deep Convolutional Neural Networks (CNN) to directly extract similarity information from visual contents and use the derived distance metric to mine individual users' fine-grained visual preferences. Through our preliminary experiments using data from 5,790 Pinterest users, we show that even for the images within the same category, each user possesses distinct and individually-identifiable visual preferences that are consistent over their lifetime. Our results underscore the untapped potential of finer-grained visual preference profiling in understanding users' preferences. Comment: 2015 IEEE 15th International Conference on Data Mining Workshop.

arXiv.org e-Print Archive

Crossref

Information Theoretic Approach Based on Entropy for Classification of Bioacoustics Signals

Author: Abdul Hamid Ahmad
Ho Chong Mun
Jedol Dayou
Mohd. Noh Dalimin
Ng Chee Han
Sithi V. Muniandy
Publication venue: 'AIP Publishing'
Publication date: 01/01/2010

A new hybrid method for automated frog sound identification that incorporates entropy and the spectral centroid concept is proposed. Entropy has important physical implications as a measure of the amount of “disorder” in a system. This study explores the use of various definitions of entropy, such as the Shannon entropy, Kolmogorov‐Rényi entropy and Tsallis entropy, as measures of information content or complexity for the purpose of pattern recognition of bioacoustic signals. Each of these definitions of entropy characterizes different aspects of the signal. The entropies are combined with other standard pattern recognition tools, such as Fourier spectral analysis, to form a hybrid spectral‐entropic classification scheme. The efficiency of the system is tested using a database of sound syllables obtained from a number of species of Microhylidae frogs. A nonparametric k‐NN classifier is used to recognize the frog species based on the spectral‐entropic features. The results showed that the k‐NN classifier based on the selected features is able to identify the frog species with relatively good accuracy compared to features relying on spectral content alone. The robustness of the developed system is also tested at different noise levels.
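The spectral-entropic features named in this abstract can be illustrated with a minimal sketch. It assumes the textbook definitions of the Shannon, Rényi and Tsallis entropies applied to a normalized power spectrum; the abstract does not give the authors' exact definitions, parameters or preprocessing, so the function names, the orders `alpha` and `q`, and the sample spectrum below are all hypothetical.

```python
import math


def normalize(power):
    """Turn a non-negative power spectrum into a probability distribution."""
    total = sum(power)
    if total <= 0:
        raise ValueError("power spectrum must have positive total energy")
    return [p / total for p in power]


def shannon_entropy(probs):
    """Shannon entropy in nats: -sum(p * ln p)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def renyi_entropy(probs, alpha):
    """Renyi entropy of order alpha (textbook form, assumed here)."""
    if alpha == 1:
        return shannon_entropy(probs)  # limiting case alpha -> 1
    return math.log(sum(p ** alpha for p in probs if p > 0)) / (1 - alpha)


def tsallis_entropy(probs, q):
    """Tsallis entropy of index q (textbook form, assumed here)."""
    if q == 1:
        return shannon_entropy(probs)  # limiting case q -> 1
    return (1 - sum(p ** q for p in probs if p > 0)) / (q - 1)


def spectral_centroid(freqs, power):
    """Power-weighted mean frequency of the spectrum."""
    probs = normalize(power)
    return sum(f * p for f, p in zip(freqs, probs))


def feature_vector(freqs, power, alpha=2.0, q=2.0):
    """Hypothetical feature vector: three entropies plus the centroid."""
    probs = normalize(power)
    return [
        shannon_entropy(probs),
        renyi_entropy(probs, alpha),
        tsallis_entropy(probs, q),
        spectral_centroid(freqs, power),
    ]


# Made-up flat spectrum over four frequency bins (Hz), for illustration only.
example_freqs = [100.0, 200.0, 300.0, 400.0]
example_power = [1.0, 1.0, 1.0, 1.0]
example_features = feature_vector(example_freqs, example_power)
```

A vector like `example_features` could then be passed to any k-NN implementation; how the original study scaled or selected its features is not stated in the abstract.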

UMS Institutional Repository

An agent-driven semantical identifier using radial basis neural networks and reinforcement learning

Author: Napoli Christian
Pappalardo Giuseppe
Tramontana Emiliano
Publication date: 01/01/2014

Due to the huge availability of documents in digital form, and the increased possibility of deception inherent in the nature of digital documents and the way they are spread, the authorship attribution problem has constantly increased in relevance. Nowadays, authorship attribution, for both information retrieval and analysis, has gained great importance in the context of security, trust and copyright preservation. This work proposes an innovative multi-agent driven machine learning technique that has been developed for authorship attribution. By means of preprocessing for word-grouping and time-period related analysis of the common lexicon, we determine a bias reference level for the recurrence frequency of the words within analysed texts, and then train a Radial Basis Neural Networks (RBPNN)-based classifier to identify the correct author. The main advantage of the proposed approach lies in the generality of the semantic analysis, which can be applied to different contexts and lexical domains without requiring any modification. Moreover, the proposed system is able to incorporate an external input, meant to tune the classifier, and then self-adjust by means of continuous learning reinforcement. Comment: Published in: Proceedings of the XV Workshop "Dagli Oggetti agli Agenti" (WOA 2014), Catania, Italy, September 25-26, 201

arXiv.org e-Print Archive

Archivio della ricerca- Università di Roma La Sapienza

Structure-semantics interplay in complex networks and its effects on the predictability of similarity in texts

Author: Amancio Diego R.
Costa Luciano da F.
Oliveira Jr. Osvaldo N.
Publication venue: 'Elsevier BV'
Publication date: 01/03/2013

There are different ways to define similarity for grouping similar texts into clusters, as the concept of similarity may depend on the purpose of the task. For instance, in topic extraction similar texts mean those within the same semantic field, whereas in author recognition stylistic features should be considered. In this study, we introduce ways to classify texts employing concepts of complex networks, which may be able to capture syntactic, semantic and even pragmatic features. The interplay between the various metrics of the complex networks is analyzed with three applications, namely identification of machine translation (MT) systems, evaluation of quality of machine translated texts and authorship recognition. We shall show that topological features of the networks representing texts can enhance the ability to identify MT systems in particular cases. For evaluating the quality of MT texts, on the other hand, high correlation was obtained with methods capable of capturing the semantics. This was expected because the gold standards used are themselves based on word co-occurrence. Notwithstanding, the Katz similarity, which involves semantics and structure in the comparison of texts, achieved the highest correlation with the NIST measurement, indicating that in some cases the combination of both approaches can improve the ability to quantify quality in MT. In authorship recognition, again the topological features were relevant in some contexts, though for the books and authors analyzed good results were obtained with semantic features as well. Because hybrid approaches encompassing semantic and topological features have not been extensively used, we believe that the methodology proposed here may be useful to enhance text classification considerably, as it combines well-established strategies.

arXiv.org e-Print Archive

Elsevier - Publisher Connector

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Universidade de São Paulo

A Survey of Location Prediction on Twitter

Author: Han Jialong
Sun Aixin
Zheng Xin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

Locations, e.g., countries, states, cities, and points of interest, are central to news, emergency events, and people's daily lives. Automatic identification of locations associated with or mentioned in documents has been explored for decades. As one of the most popular online social network platforms, Twitter has attracted a large number of users who send millions of tweets on a daily basis. Due to the world-wide coverage of its users and real-time freshness of tweets, location prediction on Twitter has gained significant attention in recent years. Research efforts are spent on dealing with new challenges and opportunities brought by the noisy, short, and context-rich nature of tweets. In this survey, we aim at offering an overall picture of location prediction on Twitter. Specifically, we concentrate on the prediction of user home locations, tweet locations, and mentioned locations. We first define the three tasks and review the evaluation metrics. By summarizing Twitter network, tweet content, and tweet context as potential inputs, we then structurally highlight how the problems depend on these inputs. Each dependency is illustrated by a comprehensive review of the corresponding strategies adopted in state-of-the-art approaches. In addition, we also briefly review two related problems, i.e., semantic location prediction and point-of-interest recommendation. Finally, we list future research directions. Comment: Accepted to TKDE. 30 pages, 1 figur

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)