579 research outputs found

    The Ideal Candidate. Analysis of Professional Competences through Text Mining of Job Offers

    Get PDF
    The aim of this paper is to propose analytical tools for identifying peculiar aspects of job market for graduates. We propose a strategy for dealing with daa tat have different source and nature

    On the Application of Data Mining to Official Data

    Get PDF
    Retrieving valuable knowledge and statistical patterns from official data has a great potential in supporting strategic policy making. Data Mining (DM) techniques are well-known for providing flexible and efficient analytical tools for data processing. In this paper, we provide an introduction to applications of DM to official statistics and flag the important issues and challenges. Considering recent advancements in software projects for DM, we propose intelligent data control system design and specifications as an example of DM application in official data processing.Data mining, Official data, Intelligent data control system

    3rd Workshop in Symbolic Data Analysis: book of abstracts

    Get PDF
    This workshop is the third regular meeting of researchers interested in Symbolic Data Analysis. The main aim of the event is to favor the meeting of people and the exchange of ideas from different fields - Mathematics, Statistics, Computer Science, Engineering, Economics, among others - that contribute to Symbolic Data Analysis

    Data Mining and Official Statistics: The Past, the Present and the Future

    Full text link
    Along with the increasing availability of large databases under the purview of National Statistical Institutes, the application of data mining techniques to official statistics is now a hot topic that is far more important at present than it was ever before. Presented in this article is a thorough review of published work to date on the application of data mining in official statistics, and on identification of the techniques that have been explored. In addition, the importance of data mining to official statistics is flagged and a summary of the challenges that have hindered its development over the course of the last two decades is presented

    On clustering interval data with different scales of measures : experimental results

    Get PDF
    This article is is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Attribution-NonCommercial (CC BY-NC) license lets others remix, tweak, and build upon work non-commercially, and although the new works must also acknowledge & be non-commercial.Symbolic Data Analysis can be defined as the extension of standard data analysis to more complex data tables. We illustrate the application of the Ascendant Hierarchical Cluster Analysis (AHCA) to a symbolic data set (with a known structure) in the field of the automobile industry (car data set), in which objects are described by variables whose values are intervals of the real data set (interval variables). The AHCA of thirty-three car models, described by eight interval variables (with different scales of measure), was based on the standardized weighted generalized affinity coefficient, by the method of Wald and Wolfowitz. We applied three probabilistic aggregation criteria in the scope of the VL methodology (V for Validity, L for Linkage). Moreover, we compare the achieved results with those obtained by other authors, and with a priori partition into four clusters defined by the category (Utilitarian, Berlina, Sporting and Luxury) to which the car belong. We used the global statistics of levels (STAT) to evaluate the obtained partitions

    Linear regression for numeric symbolic variables: an ordinary least squares approach based on Wasserstein Distance

    Full text link
    In this paper we present a linear regression model for modal symbolic data. The observed variables are histogram variables according to the definition given in the framework of Symbolic Data Analysis and the parameters of the model are estimated using the classic Least Squares method. An appropriate metric is introduced in order to measure the error between the observed and the predicted distributions. In particular, the Wasserstein distance is proposed. Some properties of such metric are exploited to predict the response variable as direct linear combination of other independent histogram variables. Measures of goodness of fit are discussed. An application on real data corroborates the proposed method

    Hierarchical Clustering of Complex Symbolic Data and Application for Emitter Identification

    Get PDF
    It is well-known that the values of symbolic variables may take various forms such as an interval, a set of stochastic measurements of some underlying patterns or qualitative multi-values and so on. However, the majority of existing work in symbolic data analysis still focuses on interval values. Although some pioneering work in stochastic pattern based symbolic data and mixture of symbolic variables has been explored, it still lacks flexibility and computation efficiency to make full use of the distinctive individual symbolic variables. Therefore, we bring forward a novel hierarchical clustering method with weighted general Jaccard distance and effective global pruning strategy for complex symbolic data and apply it to emitter identification. Extensive experiments indicate that our method has outperformed its peers in both computational efficiency and emitter identification accuracy.Peer reviewe
    corecore