3,167 research outputs found

    On the role of pre and post-processing in environmental data mining

    Get PDF
    The quality of discovered knowledge is highly depending on data quality. Unfortunately real data use to contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex is the reality to be analyzed, the higher the risk of getting low quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results, depend not only on the quality of the results themselves, but on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex and environmental users particularly require clarity in their results. In this paper some details about how this can be achieved are provided. The role of the pre and post processing in the whole process of Knowledge Discovery in environmental systems is discussed

    Re-mining item associations: methodology and a case study in apparel retailing

    Get PDF
    Association mining is the conventional data mining technique for analyzing market basket data and it reveals the positive and negative associations between items. While being an integral part of transaction data, pricing and time information have not been integrated into market basket analysis in earlier studies. This paper proposes a new approach to mine price, time and domain related attributes through re-mining of association mining results. The underlying factors behind positive and negative relationships can be characterized and described through this second data mining stage. The applicability of the methodology is demonstrated through the analysis of data coming from a large apparel retail chain, and its algorithmic complexity is analyzed in comparison to the existing techniques

    Extracting user spatio-temporal profiles from location based social networks

    Get PDF
    Report de RecercaLocation Based Social Networks (LBSN) like Twitter or Instagram are a good source for user spatio-temporal behavior. These social network provide a low rate sampling of user's location information during large intervals of time that can be used to discover complex behaviors, including mobility profiles, points of interest or unusual events. This information is important for different domains like mobility route planning, touristic recommendation systems or city planning. Other approaches have used the data from LSBN to categorize areas of a city depending on the categories of the places that people visit or to discover user behavioral patterns from their visits. The aim of this paper is to analyze how the spatio-temporal behavior of a large number of users in a well limited geographical area can be segmented in different profiles. These behavioral profiles are obtained by means of clustering algorithms that show the different behaviors that people have when living and visiting a city. The data analyzed was obtained from the public data feeds of Twitter and Instagram inside the area of the city of Barcelona for a period of several months. The analysis of these data shows that these kind of algorithms can be successfully applied to data from any city (or any general area) to discover useful profiles that can be described on terms of the city singular places and areas and their temporal relationships. These profiles can be used as a basis for making decisions in different application domains, specially those related with mobility inside and outside a city.Preprin

    PRESISTANT: Learning based assistant for data pre-processing

    Get PDF
    Data pre-processing is one of the most time consuming and relevant steps in a data analysis process (e.g., classification task). A given data pre-processing operator (e.g., transformation) can have positive, negative or zero impact on the final result of the analysis. Expert users have the required knowledge to find the right pre-processing operators. However, when it comes to non-experts, they are overwhelmed by the amount of pre-processing operators and it is challenging for them to find operators that would positively impact their analysis (e.g., increase the predictive accuracy of a classifier). Existing solutions either assume that users have expert knowledge, or they recommend pre-processing operators that are only "syntactically" applicable to a dataset, without taking into account their impact on the final analysis. In this work, we aim at providing assistance to non-expert users by recommending data pre-processing operators that are ranked according to their impact on the final analysis. We developed a tool PRESISTANT, that uses Random Forests to learn the impact of pre-processing operators on the performance (e.g., predictive accuracy) of 5 different classification algorithms, such as J48, Naive Bayes, PART, Logistic Regression, and Nearest Neighbor. Extensive evaluations on the recommendations provided by our tool, show that PRESISTANT can effectively help non-experts in order to achieve improved results in their analytical tasks

    Visualising computational intelligence through converting data into formal concepts

    Get PDF

    DGD Gallery: Storage, sharing, and publication of digital research data

    Get PDF
    We describe a project, called the "Discretization in Geometry and Dynamics Gallery", or DGD Gallery for short, whose goal is to store geometric data and to make it publicly available. The DGD Gallery offers an online web service for the storage, sharing, and publication of digital research data.Comment: 19 pages, 8 figures, to appear in "Advances in Discrete Differential Geometry", ed. A. I. Bobenko, Springer, 201
    corecore