12,981 research outputs found
Crowdsourcing in Computer Vision
Computer vision systems require large amounts of manually annotated data to
properly learn challenging visual concepts. Crowdsourcing platforms offer an
inexpensive method to capture human knowledge and understanding, for a vast
number of visual perception tasks. In this survey, we describe the types of
annotations computer vision researchers have collected using crowdsourcing, and
how they have ensured that this data is of high quality while annotation effort
is minimized. We begin by discussing data collection on both classic (e.g.,
object recognition) and recent (e.g., visual story-telling) vision tasks. We
then summarize key design decisions for creating effective data collection
interfaces and workflows, and present strategies for intelligently selecting
the most important data instances to annotate. Finally, we conclude with some
thoughts on the future of crowdsourcing in computer vision.Comment: A 69-page meta review of the field, Foundations and Trends in
Computer Graphics and Vision, 201
EGO: a personalised multimedia management tool
The problems of Content-Based Image Retrieval (CBIR) sys- tems can be attributed to the semantic gap between the low-level data representation and the high-level concepts the user associates with images, on the one hand, and the time-varying and often vague nature of the underlying information need, on the other. These problems can be addressed by improving the interaction between the user and the system. In this paper, we sketch the development of CBIR interfaces, and introduce our view on how to solve some of the problems of the studied interfaces. To address the semantic gap and long-term multifaceted information needs, we propose a "retrieval in context" system. EGO is a tool for the management of image collections, supporting the user through personalisation and adaptation. We will describe how it learns from the user's personal organisation, allowing it to recommend relevant images to the user. The recommendation algorithm is detailed, which is based on relevance feedback techniques
CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap
After addressing the state-of-the-art during the first year of Chorus and establishing the existing landscape in
multimedia search engines, we have identified and analyzed gaps within European research effort during our second year.
In this period we focused on three directions, notably technological issues, user-centred issues and use-cases and socio-
economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of functional breakdown
of generic multimedia search engine, and secondly, a representative use-cases descriptions with the related discussion on
requirement for technological challenges. Both studies have been carried out in cooperation and consultation with the
community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our
Think-Tank, presentations in international conferences, and surveys addressed to EU projects coordinators as well as
National initiatives coordinators. Based on the obtained feedback we identified two types of gaps, namely core
technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research
challenges, but have impact on innovation progress. New socio-economic trends are presented as well as emerging legal
challenges
Towards memory supporting personal information management tools
In this article we discuss re-retrieving personal information objects and relate the task to recovering from lapse(s) in memory. We propose that fundamentally it is lapses in memory that impede users from successfully re-finding the information they need. Our hypothesis is that by learning more about memory lapses in non-computing contexts and how people cope and recover from these lapses, we can better inform the design of PIM tools and improve the user's ability to re-access and re-use objects. We describe a diary study that investigates the everyday memory problems of 25 people from a wide range of backgrounds. Based on the findings, we present a series of principles that we hypothesize will improve the design of personal information management tools. This hypothesis is validated by an evaluation of a tool for managing personal photographs, which was designed with respect to our findings. The evaluation suggests that users' performance when re-finding objects can be improved by building personal information management tools to support characteristics of human memory
On Using Active Learning and Self-Training when Mining Performance Discussions on Stack Overflow
Abundant data is the key to successful machine learning. However, supervised
learning requires annotated data that are often hard to obtain. In a
classification task with limited resources, Active Learning (AL) promises to
guide annotators to examples that bring the most value for a classifier. AL can
be successfully combined with self-training, i.e., extending a training set
with the unlabelled examples for which a classifier is the most certain. We
report our experiences on using AL in a systematic manner to train an SVM
classifier for Stack Overflow posts discussing performance of software
components. We show that the training examples deemed as the most valuable to
the classifier are also the most difficult for humans to annotate. Despite
carefully evolved annotation criteria, we report low inter-rater agreement, but
we also propose mitigation strategies. Finally, based on one annotator's work,
we show that self-training can improve the classification accuracy. We conclude
the paper by discussing implication for future text miners aspiring to use AL
and self-training.Comment: Preprint of paper accepted for the Proc. of the 21st International
Conference on Evaluation and Assessment in Software Engineering, 201
A study of existing Ontologies in the IoT-domain
Several domains have adopted the increasing use of IoT-based devices to
collect sensor data for generating abstractions and perceptions of the real
world. This sensor data is multi-modal and heterogeneous in nature. This
heterogeneity induces interoperability issues while developing cross-domain
applications, thereby restricting the possibility of reusing sensor data to
develop new applications. As a solution to this, semantic approaches have been
proposed in the literature to tackle problems related to interoperability of
sensor data. Several ontologies have been proposed to handle different aspects
of IoT-based sensor data collection, ranging from discovering the IoT sensors
for data collection to applying reasoning on the collected sensor data for
drawing inferences. In this paper, we survey these existing semantic ontologies
to provide an overview of the recent developments in this field. We highlight
the fundamental ontological concepts (e.g., sensor-capabilities and
context-awareness) required for an IoT-based application, and survey the
existing ontologies which include these concepts. Based on our study, we also
identify the shortcomings of currently available ontologies, which serves as a
stepping stone to state the need for a common unified ontology for the IoT
domain.Comment: Submitted to Elsevier JWS SI on Web semantics for the Internet/Web of
Thing
Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims at providing a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provided a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
at the Social Web level. At the Enterprise level, Web Data Extraction
techniques emerge as a key tool to perform data analysis in Business and
Competitive Intelligence systems as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques allow
to gather a large amount of structured data continuously generated and
disseminated by Web 2.0, Social Media and Online Social Network users and this
offers unprecedented opportunities to analyze human behavior at a very large
scale. We discuss also the potential of cross-fertilization, i.e., on the
possibility of re-using Web Data Extraction techniques originally designed to
work in a given domain, in other domains.Comment: Knowledge-based System
- …