Search CORE

579 research outputs found

The Ideal Candidate. Analysis of Professional Competences through Text Mining of Job Offers

Author: E. DI MEGLIO
GRASSIA MARIA GABRIELLA
M. MISURACA
Publication venue: place:HEIDELBERG
Publication date: 01/01/2007
Field of study

The aim of this paper is to propose analytical tools for identifying peculiar aspects of job market for graduates. We propose a strategy for dealing with daa tat have different source and nature

Archivio della ricerca - Università degli studi di Napoli Federico II

On the Application of Data Mining to Official Data

Author: Gheitanchi Shahin
Hassani Hossein
Yeganegi Mohammad Reza
Publication venue
Publication date
Field of study

Retrieving valuable knowledge and statistical patterns from official data has a great potential in supporting strategic policy making. Data Mining (DM) techniques are well-known for providing flexible and efficient analytical tools for data processing. In this paper, we provide an introduction to applications of DM to official statistics and flag the important issues and challenges. Considering recent advancements in software projects for DM, we propose intelligent data control system design and specifications as an example of DM application in official data processing.Data mining, Official data, Intelligent data control system

Research Papers in Economics

3rd Workshop in Symbolic Data Analysis: book of abstracts

Author: Arroyo Gallardo Javier, ed.
Brito Paula, ed.
Maté Carlos, ed.
Noirhomme-Fraiture Monique, ed.
Publication venue
Publication date: 01/01/2012
Field of study

This workshop is the third regular meeting of researchers interested in Symbolic Data Analysis. The main aim of the event is to favor the meeting of people and the exchange of ideas from different fields - Mathematics, Statistics, Computer Science, Engineering, Economics, among others - that contribute to Symbolic Data Analysis

Docta Complutense

Data Mining and Official Statistics: The Past, the Present and the Future

Author: Hassani H.
Saporta G.
Silva E.S.
Publication venue: 'Mary Ann Liebert Inc'
Publication date: 01/01/2014
Field of study

Along with the increasing availability of large databases under the purview of National Statistical Institutes, the application of data mining techniques to official statistics is now a hot topic that is far more important at present than it was ever before. Presented in this article is a thorough review of published work to date on the application of data mining in official statistics, and on identification of the techniques that have been explored. In addition, the importance of data mining to official statistics is flagged and a summary of the challenges that have hindered its development over the course of the last two decades is presented

UAL Research Online

HAL Descartes

Hal-Diderot

On clustering interval data with different scales of measures : experimental results

Author: Bacelar-Nicolau Helena
Nicolau Fernando C.
Silva Osvaldo
Sousa Áurea
Publication venue: 'ABC Journals'
Publication date: 01/02/2015
Field of study

This article is is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Attribution-NonCommercial (CC BY-NC) license lets others remix, tweak, and build upon work non-commercially, and although the new works must also acknowledge & be non-commercial.Symbolic Data Analysis can be defined as the extension of standard data analysis to more complex data tables. We illustrate the application of the Ascendant Hierarchical Cluster Analysis (AHCA) to a symbolic data set (with a known structure) in the field of the automobile industry (car data set), in which objects are described by variables whose values are intervals of the real data set (interval variables). The AHCA of thirty-three car models, described by eight interval variables (with different scales of measure), was based on the standardized weighted generalized affinity coefficient, by the method of Wald and Wolfowitz. We applied three probabilistic aggregation criteria in the scope of the VL methodology (V for Validity, L for Linkage). Moreover, we compare the achieved results with those obtained by other authors, and with a priori partition into four clusters defined by the category (Utilitarian, Berlina, Sporting and Luxury) to which the car belong. We used the global statistics of levels (STAT) to evaluate the obtained partitions

Repositório da Universidade dos Açores

Linear regression for numeric symbolic variables: an ordinary least squares approach based on Wasserstein Distance

Author: A Irpino
Antonio Irpino
B Efron
CL Lawson
CL Mallows
E Diday
EAL Neto
EAL Neto
G Dall’Aglio
H Bock
J Arroyo
L Billard
L Kantorovich
L Wasserstein
M Noirhomme-Fraiture
P Bertrand
P Bickel
R Tibshirani
Rosanna Verde
WG Gilchrist
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 20/07/2012
Field of study

In this paper we present a linear regression model for modal symbolic data. The observed variables are histogram variables according to the definition given in the framework of Symbolic Data Analysis and the parameters of the model are estimated using the classic Least Squares method. An appropriate metric is introduced in order to measure the error between the observed and the predicted distributions. In particular, the Wasserstein distance is proposed. Some properties of such metric are exploited to predict the response variable as direct linear combination of other independent histogram variables. Measures of goodness of fit are discussed. An application on real data corroborates the proposed method

arXiv.org e-Print Archive

CiteSeerX

Crossref

Archivio Istituzionale della Ricerca - Università degli Studi della Campania "Luigi Vanvitelli"

Hierarchical Clustering of Complex Symbolic Data and Application for Emitter Identification

Author: Lu Jiaheng
Wang Wei
Xu Xin
Publication venue
Publication date: 01/07/2018
Field of study

It is well-known that the values of symbolic variables may take various forms such as an interval, a set of stochastic measurements of some underlying patterns or qualitative multi-values and so on. However, the majority of existing work in symbolic data analysis still focuses on interval values. Although some pioneering work in stochastic pattern based symbolic data and mixture of symbolic variables has been explored, it still lacks flexibility and computation efficiency to make full use of the distinctive individual symbolic variables. Therefore, we bring forward a novel hierarchical clustering method with weighted general Jaccard distance and effective global pruning strategy for complex symbolic data and apply it to emitter identification. Extensive experiments indicate that our method has outperformed its peers in both computational efficiency and emitter identification accuracy.Peer reviewe

Helsingin yliopiston digitaalinen arkisto