37,969 research outputs found
PRESISTANT: Learning based assistant for data pre-processing
Data pre-processing is one of the most time consuming and relevant steps in a
data analysis process (e.g., classification task). A given data pre-processing
operator (e.g., transformation) can have positive, negative or zero impact on
the final result of the analysis. Expert users have the required knowledge to
find the right pre-processing operators. However, when it comes to non-experts,
they are overwhelmed by the amount of pre-processing operators and it is
challenging for them to find operators that would positively impact their
analysis (e.g., increase the predictive accuracy of a classifier). Existing
solutions either assume that users have expert knowledge, or they recommend
pre-processing operators that are only "syntactically" applicable to a dataset,
without taking into account their impact on the final analysis. In this work,
we aim at providing assistance to non-expert users by recommending data
pre-processing operators that are ranked according to their impact on the final
analysis. We developed a tool PRESISTANT, that uses Random Forests to learn the
impact of pre-processing operators on the performance (e.g., predictive
accuracy) of 5 different classification algorithms, such as J48, Naive Bayes,
PART, Logistic Regression, and Nearest Neighbor. Extensive evaluations on the
recommendations provided by our tool, show that PRESISTANT can effectively help
non-experts in order to achieve improved results in their analytical tasks
WSCDL to WSBPEL: A Case Study of ATL-based Transformation
The ATLAS Transformation Language (ATL) is a hybrid transformation language that combines declarative and imperative programming elements and provides means to define model transformations. Most transformations using ATL reported in the literature show a simplified use of ATL, and often involve a single transformation. However, in more realistic situations, multiple transformations may be necessary, especially in case the original input/output models are not represented in the metametamodeling representation expected by the transformation engine. In this paper, we discuss a model transformation from service choreography (WSCDL) to service orchestration (WSBPEL), which cannot be performed in a single ATL transformation due to the mismatch between the concrete XML syntax of these languages and the metametamodeling representation expected by the ATL transformation engine. This requires auxiliary transformations in which this mismatch is resolved. In principle, the required auxiliary transformations can be implemented using XSLT or a general-purpose programming language like Java. However, in our case study, we evaluate the use of ATL to perform these transformations. We exploit ATL by leveraging the ATL's XML\ud
injection and the XML extraction mechanisms to perform the overall transformation in terms of a transformation chain
Representation Independent Analytics Over Structured Data
Database analytics algorithms leverage quantifiable structural properties of
the data to predict interesting concepts and relationships. The same
information, however, can be represented using many different structures and
the structural properties observed over particular representations do not
necessarily hold for alternative structures. Thus, there is no guarantee that
current database analytics algorithms will still provide the correct insights,
no matter what structures are chosen to organize the database. Because these
algorithms tend to be highly effective over some choices of structure, such as
that of the databases used to validate them, but not so effective with others,
database analytics has largely remained the province of experts who can find
the desired forms for these algorithms. We argue that in order to make database
analytics usable, we should use or develop algorithms that are effective over a
wide range of choices of structural organizations. We introduce the notion of
representation independence, study its fundamental properties for a wide range
of data analytics algorithms, and empirically analyze the amount of
representation independence of some popular database analytics algorithms. Our
results indicate that most algorithms are not generally representation
independent and find the characteristics of more representation independent
heuristics under certain representational shifts
- …