PRESISTANT: Learning based assistant for data pre-processing

Authors: Abelló Alberto; Aluja-Banet Tomàs; Bilalli Besim; Wrembel Robert
Publication date: 02/03/2018

Data pre-processing is one of the most time-consuming and relevant steps in a data analysis process (e.g., a classification task). A given data pre-processing operator (e.g., a transformation) can have a positive, negative, or zero impact on the final result of the analysis. Expert users have the knowledge required to find the right pre-processing operators. Non-experts, however, are overwhelmed by the number of pre-processing operators, and it is challenging for them to find operators that would positively impact their analysis (e.g., increase the predictive accuracy of a classifier). Existing solutions either assume that users have expert knowledge, or they recommend pre-processing operators that are only "syntactically" applicable to a dataset, without taking into account their impact on the final analysis. In this work, we aim to assist non-expert users by recommending data pre-processing operators that are ranked according to their impact on the final analysis. We developed a tool, PRESISTANT, that uses Random Forests to learn the impact of pre-processing operators on the performance (e.g., predictive accuracy) of five different classification algorithms: J48, Naive Bayes, PART, Logistic Regression, and Nearest Neighbor. Extensive evaluations of the recommendations provided by our tool show that PRESISTANT can effectively help non-experts achieve improved results in their analytical tasks.

arXiv.org e-Print Archive

UPCommons. Portal del coneixement obert de la UPC
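
The ranking idea in the PRESISTANT abstract above can be illustrated with a minimal sketch. PRESISTANT itself predicts the impact of an operator with Random Forests; the sketch below swaps that for a simpler technique and measures the impact directly, as the change in leave-one-out accuracy of a 1-nearest-neighbour classifier. The operator names, the helper functions, and the toy dataset are all hypothetical and are not taken from the tool.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[float, ...]
Dataset = List[Tuple[Row, str]]  # (feature vector, class label)


def loo_1nn_accuracy(data: Dataset) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i, (x, label) in enumerate(data):
        # Closest other row by squared Euclidean distance.
        _, predicted = min(
            (sum((a - b) ** 2 for a, b in zip(x, other)), other_label)
            for j, (other, other_label) in enumerate(data)
            if j != i
        )
        correct += predicted == label
    return correct / len(data)


def min_max_scale(data: Dataset) -> Dataset:
    """Hypothetical operator: rescale every feature to [0, 1]."""
    columns = list(zip(*(x for x, _ in data)))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [
        (tuple((v - lo) / s for v, lo, s in zip(x, lows, spans)), label)
        for x, label in data
    ]


def drop_first_feature(data: Dataset) -> Dataset:
    """Hypothetical operator: remove the first feature."""
    return [(x[1:], label) for x, label in data]


def rank_operators(
    data: Dataset, operators: Dict[str, Callable[[Dataset], Dataset]]
) -> List[Tuple[str, float]]:
    """Rank operators by measured accuracy change, best first."""
    baseline = loo_1nn_accuracy(data)
    impacts = [
        (name, loo_1nn_accuracy(op(data)) - baseline)
        for name, op in operators.items()
    ]
    return sorted(impacts, key=lambda pair: pair[1], reverse=True)


# Toy data (assumption): feature 0 separates the classes, feature 1 is
# large-scale noise that dominates unscaled distances.
toy: Dataset = [
    ((0.0, 0.0), "a"),
    ((0.1, 300.0), "a"),
    ((0.9, 100.0), "b"),
    ((1.0, 400.0), "b"),
]

ranking = rank_operators(
    toy,
    {"min_max_scale": min_max_scale, "drop_first_feature": drop_first_feature},
)
print(ranking)
```

On this toy data the noisy second feature dominates the raw distances, so `min_max_scale` ranks first with a positive impact and `drop_first_feature` shows zero impact. A learned recommender in the spirit of the abstract would replace the direct measurement with a model that predicts these impacts for unseen datasets.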

Classification of information systems research revisited: A keyword analysis approach

Authors: Dwivedi YK; Lal B; Mustafee N; Williams MD
Publication venue: 'Association for Information Systems'
Publication date: 01/01/2009

A number of studies have previously been conducted on keyword analysis in order to provide a comprehensive scheme to classify information systems (IS) research. However, these studies appeared prior to 1994, and IS research has clearly developed substantially since then with the emergence of areas such as electronic commerce, electronic government, electronic health and numerous others. Furthermore, the majority of European IS outlets, such as the European Journal of Information Systems and Information Systems Journal, were founded in the early 1990s, and keywords from these journals were not included in any previous work. Given that a number of studies have raised the issue of differences in European and North American IS research topics and approaches, it is arguable that any such analysis must consider sources from both locations to provide a representative and balanced view of IS classification. Moreover, it has also been argued that there is a need for further work in order to create a comprehensive keyword classification scheme reflecting the current state of the art. Consequently, the aim of this paper is to present the results of a keyword analysis utilizing keywords that appeared in major peer-reviewed IS publications after 1990 and up to 2007. This aim is realized through the following two objectives: (1) collect all keywords appearing in 24 peer-reviewed IS journals after 1990; and (2) identify keywords not included in the previous IS keyword classification scheme. This paper also describes further research required in order to place new keywords in appropriate IS research categories. The paper makes an incremental contribution toward a contemporary means of classifying IS research. This work is important and useful for researchers in understanding the area and evolution of the IS field and also has implications for improving information search and retrieval activities.

Nottingham Trent Institutional Repository (IRep)

Brunel University Research Archive

AIS Electronic Library (AISeL)

Adapting Visual Question Answering Models for Enhancing Multimodal Community Q&A Platforms

Authors: Ma Lin; Malinowski Mateusz; Roberts Kirk; Stanley Clayton
Publication date: 25/05/2019

Question categorization and expert retrieval methods have been crucial for information organization and accessibility in community question & answering (CQA) platforms. Research in this area, however, has dealt with only the text modality. With the increasing multimodal nature of web content, we focus on extending these methods for CQA questions accompanied by images. Specifically, we leverage the success of representation learning for text and images in the visual question answering (VQA) domain, and adapt the underlying concept and architecture for automated category classification and expert retrieval on image-based questions posted on Yahoo! Chiebukuro, the Japanese counterpart of Yahoo! Answers. To the best of our knowledge, this is the first work to tackle the multimodality challenge in CQA, and to adapt VQA models for tasks on a more ecologically valid source of visual questions. Our analysis of the differences between visual QA and community QA data drives our proposal of novel augmentations of an attention method tailored for CQA, and use of auxiliary tasks for learning better grounding features. Our final model markedly outperforms the text-only and VQA model baselines for both tasks of classification and expert retrieval on real-world multimodal CQA data.
Comment: Submitted for review at CIKM 201

arXiv.org e-Print Archive

Crossref

Tailored retrieval of health information from the web for facilitating communication and empowerment of elderly people

Authors: Alfano Marco; Helfert Markus; Lenzitti Biagio; Taibi Davide
Publication venue: 'Scitepress'
Publication date: 01/01/2020

Nowadays, a patient acquires health information from the Web mainly through a “human-to-machine” communication process with a generic search engine. This, in turn, affects, positively or negatively, his/her empowerment level and the “human-to-human” communication process that occurs between a patient and a healthcare professional such as a doctor. A generic communication process can be modelled by considering its syntactic-technical, semantic-meaning, and pragmatic-effectiveness levels, and efficacious communication occurs when all the communication levels are fully addressed. In the case of retrieval of health information from the Web, although a generic search engine is able to work at the syntactic-technical level, the semantic and pragmatic aspects are left to the user, and this can be challenging, especially for elderly people. This work presents a custom search engine, FACILE, that works at the three communication levels and makes it possible to overcome the challenges encountered during the search process. A patient can specify his/her information requirements in a simple way and FACILE will retrieve the “right” amount of Web content in a language that he/she can easily understand. This facilitates the comprehension of the found information and positively affects the empowerment process and communication with healthcare professionals.

Crossref

Irish Universities

DCU Online Research Access Service

A Novel Approach for Learning How to Automatically Match Job Offers and Candidate Profiles

Authors: Martinez-Gil Jorge; Paoletti Alejandra Lorena; Pichler Mario
Publication date: 07/09/2017

Automatic matching of job offers and job candidates is a major problem for a number of organizations and job applicants that, if it were successfully addressed, could have a positive impact in many countries around the world. In this context, it is widely accepted that semi-automatic matching algorithms between job and candidate profiles would provide a vital technology for making recruitment processes faster, more accurate, and more transparent. In this work, we present our research towards achieving a realistic matching approach for satisfactorily addressing this challenge. This novel approach relies on a matching learning solution aiming to learn from past solved cases in order to accurately predict the results in new situations. An empirical study shows that our approach is able to beat solutions with no learning capabilities by a wide margin.
Comment: 15 pages, 6 figures

arXiv.org e-Print Archive