
    PRESISTANT: Learning based assistant for data pre-processing

    Data pre-processing is one of the most time-consuming and relevant steps in a data analysis process (e.g., a classification task). A given data pre-processing operator (e.g., a transformation) can have a positive, negative, or zero impact on the final result of the analysis. Expert users have the knowledge required to find the right pre-processing operators. Non-experts, however, are overwhelmed by the number of pre-processing operators, and it is challenging for them to find operators that would positively impact their analysis (e.g., increase the predictive accuracy of a classifier). Existing solutions either assume that users have expert knowledge, or they recommend pre-processing operators that are only "syntactically" applicable to a dataset, without taking into account their impact on the final analysis. In this work, we aim to assist non-expert users by recommending data pre-processing operators ranked according to their impact on the final analysis. We developed a tool, PRESISTANT, that uses Random Forests to learn the impact of pre-processing operators on the performance (e.g., predictive accuracy) of five classification algorithms: J48, Naive Bayes, PART, Logistic Regression, and Nearest Neighbor. Extensive evaluations of the recommendations provided by our tool show that PRESISTANT can effectively help non-experts achieve improved results in their analytical tasks.

    WSCDL to WSBPEL: A Case Study of ATL-based Transformation

    The ATLAS Transformation Language (ATL) is a hybrid transformation language that combines declarative and imperative programming elements and provides means to define model transformations. Most transformations using ATL reported in the literature show a simplified use of ATL and often involve a single transformation. In more realistic situations, however, multiple transformations may be necessary, especially when the original input/output models are not represented in the metametamodeling representation expected by the transformation engine. In this paper, we discuss a model transformation from service choreography (WSCDL) to service orchestration (WSBPEL), which cannot be performed in a single ATL transformation because of the mismatch between the concrete XML syntax of these languages and the metametamodeling representation expected by the ATL transformation engine. This requires auxiliary transformations in which the mismatch is resolved. In principle, the required auxiliary transformations can be implemented using XSLT or a general-purpose programming language like Java. In our case study, however, we evaluate the use of ATL to perform these transformations. We exploit ATL's XML injection and XML extraction mechanisms to perform the overall transformation as a transformation chain.

    Representation Independent Analytics Over Structured Data

    Database analytics algorithms leverage quantifiable structural properties of the data to predict interesting concepts and relationships. The same information, however, can be represented using many different structures, and the structural properties observed over particular representations do not necessarily hold for alternative structures. Thus, there is no guarantee that current database analytics algorithms will still provide the correct insights regardless of which structures are chosen to organize the database. Because these algorithms tend to be highly effective over some choices of structure, such as that of the databases used to validate them, but not so effective with others, database analytics has largely remained the province of experts who can find the desired forms for these algorithms. We argue that in order to make database analytics usable, we should use or develop algorithms that are effective over a wide range of choices of structural organization. We introduce the notion of representation independence, study its fundamental properties for a wide range of data analytics algorithms, and empirically analyze the amount of representation independence of some popular database analytics algorithms. Our results indicate that most algorithms are not generally representation independent, and we identify the characteristics of more representation-independent heuristics under certain representational shifts.

    Data mining based cyber-attack detection
