PRESISTANT: Learning based assistant for data pre-processing
Data pre-processing is one of the most time consuming and relevant steps in a
data analysis process (e.g., classification task). A given data pre-processing
operator (e.g., transformation) can have positive, negative or zero impact on
the final result of the analysis. Expert users have the required knowledge to
find the right pre-processing operators. Non-experts, however, are overwhelmed
by the number of pre-processing operators, and it is challenging for them to
find operators that would positively impact their analysis (e.g., increase the
predictive accuracy of a classifier). Existing
solutions either assume that users have expert knowledge, or they recommend
pre-processing operators that are only "syntactically" applicable to a dataset,
without taking into account their impact on the final analysis. In this work,
we aim at providing assistance to non-expert users by recommending data
pre-processing operators that are ranked according to their impact on the final
analysis. We developed a tool, PRESISTANT, that uses Random Forests to learn the
impact of pre-processing operators on the performance (e.g., predictive
accuracy) of five classification algorithms: J48, Naive Bayes, PART, Logistic
Regression, and Nearest Neighbor. Extensive evaluations of the recommendations
provided by our tool show that PRESISTANT can effectively help non-experts
achieve improved results in their analytical tasks.
Classification of information systems research revisited: A keyword analysis approach
A number of studies have previously been conducted on keyword analysis in order to provide a comprehensive scheme to classify information systems (IS) research. However, these studies appeared prior to 1994, and IS research has clearly developed substantially since then with the emergence of areas such as electronic commerce, electronic government, electronic health and numerous others. Furthermore, the majority of European IS outlets, such as the European Journal of Information Systems and Information Systems Journal, were founded in the early 1990s, and keywords from these journals were not included in any previous work. Given that a number of studies have raised the issue of differences in European and North American IS research topics and approaches, it is arguable that any such analysis must consider sources from both locations to provide a representative and balanced view of IS classification. Moreover, it has also been argued that there is a need for further work in order to create a comprehensive keyword classification scheme reflecting the current state of the art. Consequently, the aim of this paper is to present the results of a keyword analysis utilizing keywords appearing in major peer-reviewed IS publications after the year 1990 through to 2007. This aim is realized by means of the following two objectives: (1) collect all keywords appearing in 24 peer-reviewed IS journals after 1990; and (2) identify keywords not included in the previous IS keyword classification scheme. This paper also describes further research required in order to place new keywords in appropriate IS research categories. The paper makes an incremental contribution toward a contemporary means of classifying IS research. This work is important and useful for researchers in understanding the area and evolution of the IS field and also has implications for improving information search and retrieval activities.
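Objective (2) above, identifying keywords absent from an earlier classification scheme, can be pictured as a set difference. The following is only a minimal sketch under stated assumptions, not the paper's procedure: the `normalize` and `new_keywords` helpers and the sample keyword lists are hypothetical, and the abstract does not say how keyword variants were reconciled.

```python
def normalize(keyword: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(keyword.lower().split())


def new_keywords(collected, previous_scheme):
    """Return, sorted, the collected keywords that are absent from the previous scheme."""
    known = {normalize(k) for k in previous_scheme}
    return sorted({normalize(k) for k in collected} - known)


# Hypothetical sample data, for illustration only (not taken from the paper).
previous_scheme = ["Decision Support Systems", "Database Management"]
collected = ["Electronic commerce", "decision support  systems", "Electronic Government"]

print(new_keywords(collected, previous_scheme))
```

Normalizing before comparison is a design choice of this sketch; a real analysis would likely also need to handle synonyms and spelling variants, which simple string matching does not capture.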
Adapting Visual Question Answering Models for Enhancing Multimodal Community Q&A Platforms
Question categorization and expert retrieval methods have been crucial for
information organization and accessibility in community question & answering
(CQA) platforms. Research in this area, however, has dealt with only the text
modality. With the increasing multimodal nature of web content, we focus on
extending these methods for CQA questions accompanied by images. Specifically,
we leverage the success of representation learning for text and images in the
visual question answering (VQA) domain, and adapt the underlying concept and
architecture for automated category classification and expert retrieval on
image-based questions posted on Yahoo! Chiebukuro, the Japanese counterpart of
Yahoo! Answers.
To the best of our knowledge, this is the first work to tackle the
multimodality challenge in CQA, and to adapt VQA models for tasks on a more
ecologically valid source of visual questions. Our analysis of the differences
between visual QA and community QA data drives our proposal of novel
augmentations of an attention method tailored for CQA, and use of auxiliary
tasks for learning better grounding features. Our final model markedly
outperforms the text-only and VQA model baselines for both tasks of
classification and expert retrieval on real-world multimodal CQA data.
Comment: Submitted for review at CIKM 201
Tailored retrieval of health information from the web for facilitating communication and empowerment of elderly people
A patient, nowadays, acquires health information from the Web mainly through a “human-to-machine”
communication process with a generic search engine. This, in turn, affects, positively or negatively, his/her
empowerment level and the “human-to-human” communication process that occurs between a patient and a
healthcare professional such as a doctor. A generic communication process can be modelled by considering
its syntactic-technical, semantic-meaning, and pragmatic-effectiveness levels and an efficacious
communication occurs when all the communication levels are fully addressed. In the case of retrieval of health
information from the Web, although a generic search engine is able to work at the syntactic-technical level,
the semantic and pragmatic aspects are left to the user and this can be challenging, especially for elderly
people. This work presents a custom search engine, FACILE, that works at all three communication levels
and makes it possible to overcome the challenges encountered during the search process. A patient can specify his/her
information requirements in a simple way and FACILE will retrieve the “right” amount of Web content in a
language that he/she can easily understand. This facilitates the comprehension of the retrieved information and
positively affects the empowerment process and communication with healthcare professionals.
A Novel Approach for Learning How to Automatically Match Job Offers and Candidate Profiles
Automatic matching of job offers and job candidates is a major problem for a
number of organizations and job applicants; if successfully addressed, it
could have a positive impact in many countries around the world. In
this context, it is widely accepted that semi-automatic matching algorithms
between job and candidate profiles would provide a vital technology for making
the recruitment processes faster, more accurate and transparent. In this work,
we present our research towards achieving a realistic matching approach for
satisfactorily addressing this challenge. This novel approach relies on a
matching learning solution aiming to learn from past solved cases in order to
accurately predict the results in new situations. An empirical study shows us
that our approach is able to beat solutions with no learning capabilities by a
wide margin.
Comment: 15 pages, 6 figures