Automated data pre-processing via meta-learning

Author: A Guazzelli
A Kalousis
D Pyle
F Serban
J Vanschoren
J-U Kietz
M Hall
MA Munson
SF Crone
T Dasu
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

The final publication is available at link.springer.com. A data mining algorithm may perform differently on datasets with different characteristics; for example, it might perform better on a dataset with continuous attributes than on one with categorical attributes, or the other way around. In practice, a dataset usually needs to be pre-processed. When all possible pre-processing operators are taken into account, the number of alternatives is staggeringly large, and inexperienced users become overwhelmed. We show that this problem can be addressed by an automated approach that leverages ideas from meta-learning. Specifically, we consider a wide range of data pre-processing techniques and a set of data mining algorithms. For each data mining algorithm and selected dataset, we are able to predict the transformations that improve the result of the algorithm on that dataset. Our approach will help non-expert users more effectively identify the transformations appropriate to their applications, and hence achieve improved results. Peer Reviewed. Postprint (published version).

Crossref

UPCommons. Portal del coneixement obert de la UPC

Creating nationally-consistent health information: engaging with the national health information committees

Publication venue: Australian Institute of Health and Welfare

This document provides guidance on engaging with the national processes responsible for health information and data standards. It has been developed to ensure that the data collected are consistent, accurate and useful for policy, planning and program management. Summary: In a health system dispersed across the states and territories and the Australian Government, strong governance arrangements are needed to ensure that health information collected under different health administrations is consistent, and therefore accurate and useful for policy, planning and program management. The National Health Information Agreement, signed in 2011 by all jurisdictions and the national health agencies associated with health information, provides the overarching framework for the governance of national data collections. Governance mechanisms for many aspects of health information are established under the Standing Council on Health (SCoH), with particular committees vested with delegated authority to endorse national standards and definitions. In particular, the document describes how those developing data about some aspect of health can obtain assistance with, and/or endorsement of, their data development work.

Analysis and Policy Observatory (APO)

PRESISTANT: Learning based assistant for data pre-processing

Author: Abelló Alberto
Aluja-Banet Tomàs
Bilalli Besim
Wrembel Robert
Publication date: 02/03/2018

Data pre-processing is one of the most time-consuming and relevant steps in a data analysis process (e.g., a classification task). A given data pre-processing operator (e.g., a transformation) can have a positive, negative or zero impact on the final result of the analysis. Expert users have the knowledge required to find the right pre-processing operators. Non-experts, however, are overwhelmed by the number of pre-processing operators, and it is challenging for them to find operators that would positively impact their analysis (e.g., increase the predictive accuracy of a classifier). Existing solutions either assume that users have expert knowledge, or they recommend pre-processing operators that are only "syntactically" applicable to a dataset, without taking into account their impact on the final analysis. In this work, we aim to assist non-expert users by recommending data pre-processing operators ranked according to their impact on the final analysis. We developed a tool, PRESISTANT, that uses Random Forests to learn the impact of pre-processing operators on the performance (e.g., predictive accuracy) of five classification algorithms: J48, Naive Bayes, PART, Logistic Regression, and Nearest Neighbor. Extensive evaluations of the recommendations provided by our tool show that PRESISTANT can effectively help non-experts achieve improved results in their analytical tasks.

arXiv.org e-Print Archive

UPCommons. Portal del coneixement obert de la UPC

An event distribution platform for recommending cultural activities

Author: Coppens Sam
De Pessemier Toon
Dooms Simon
Geebelen Kristof
Mannens Erik
Martens Luc
Publication venue: Ghent University, Department of Information technology
Publication date: 01/01/2011

Ghent University Academic Bibliography

Business intelligence systems and user's parameters: an application to a documents' database

Author: Afolabi Babajide
Thiery Odile
Publication date: 30/07/2005

This article presents early results of our research on modeling Business Intelligence systems. The basic idea of this research area is presented first. We then show why certain user parameters need to be included in the information systems used by Business Intelligence systems, in order to obtain better responses from such systems. We identify two main types of attributes that can be missing from a database and show why they need to be included. A user model based on cognitive user evolution is presented. When used together with a good definition of the information needs of the user (the decision maker), this model will accelerate the user's decision-making process.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Recommendation, collaboration and social search

Author: Nichols David M.
Twidale Michael B.
Publication venue: Facet Publishing
Publication date: 01/01/2011

This chapter considers the social component of interactive information retrieval: what is the role of other people in searching and browsing? For simplicity, we begin by considering situations without computers. After all, you can interactively retrieve information without a computer; you just have to interact with someone or something else. Such an analysis can then help us think about the new forms of collaborative interaction, made possible by the growth of networked ubiquitous computing technology, that extend our conceptions of information search. Information searching and browsing have often been conceptualized as solitary activities; however, they always have a social component. We may talk about 'the' searcher or 'the' user of a database or information resource. Our focus may be on individual uses, and our research may look at individual users. Our experiments may be designed to observe the behaviors of individual subjects. The models and theories derived from our empirical analyses may focus substantially or exclusively on an individual's evolving goals, thoughts, beliefs, emotions and actions. Nevertheless, social aspects of information seeking and use are always present, both implicitly and explicitly. We start by summarizing some of the history of information access, with an emphasis on social and collaborative interactions. Then we look at the nature of recommendations, social search, and interfaces that support collaboration between information seekers. Following this, we consider how the design of interactive information systems is influenced by their social elements.

Research Commons@Waikato