Head-tracked stereo viewing with two-handed 3D interaction for animated character construction
In this paper, we demonstrate a new interactive 3D desktop metaphor based on two-handed 3D direct manipulation registered with head-tracked stereo viewing. In our configuration, a six-degree-of-freedom head-tracker and CrystalEyes shutter glasses are used to produce stereo images that dynamically follow the user's head motion. 3D virtual objects can be made to appear at a fixed location in physical space, which the user may view from different angles by moving their head. The user interacts with the simulated 3D environment using both hands simultaneously. The left hand, controlling a Spaceball, is used for 3D navigation and object movement, while the right hand, holding a 3D mouse, is used to manipulate, through a virtual tool metaphor, the objects that appear in front of the screen because of negative parallax. In this way, both incremental and absolute interactive input techniques are provided by the system. Hand-eye coordination is made possible by registration between virtual and physical space, allowing a variety of complex 3D tasks to be performed more easily and more rapidly than is possible using traditional interactive techniques. The system has been tested using both Polhemus Fastrak and Logitech ultrasonic input devices for tracking the head and 3D mouse.
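To make the distinction between the two input styles concrete, here is a minimal Python sketch, with all device readings given as hypothetical values rather than a real device API: the Spaceball's deflection is treated as a velocity and integrated each frame (incremental, rate control), while the 3D mouse's tracked position maps one-to-one into the registered virtual space (absolute, position control).

```python
import numpy as np

def integrate_incremental(position, velocity, dt):
    # Left hand (Spaceball): deflection is read as a velocity and
    # integrated over the frame, i.e. incremental, rate-control input.
    return position + np.asarray(velocity) * dt

def apply_absolute(tracker_position):
    # Right hand (3D mouse): the tracked pose is mapped one-to-one into
    # the registered virtual space, i.e. absolute, position-control input.
    return np.asarray(tracker_position)

# One simulated frame: the viewpoint drifts with the Spaceball deflection,
# while the virtual tool jumps to wherever the hand actually is.
camera_pos = integrate_incremental(np.zeros(3), velocity=[0.1, 0.0, 0.0], dt=1/60)
tool_pos = apply_absolute([0.25, -0.10, 0.40])
```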
TREC Incident Streams: Finding Actionable Information on Social Media
The Text Retrieval Conference (TREC) Incident Streams track is a new initiative that aims to mature social
media-based emergency response technology. This initiative advances the state of the art in this area through an
evaluation challenge, which attracts researchers and developers from across the globe. The 2018 edition of the track
provides a standardized evaluation methodology and an ontology of emergency-relevant social media information types, proposes a scale for information criticality, and releases a dataset containing fifteen test events and approximately
20,000 labeled tweets. Analysis of this dataset reveals a significant amount of actionable information on social
media during emergencies (> 10%). While this data is valuable for emergency response efforts, analysis of the
39 state-of-the-art systems demonstrates a performance gap in identifying this data. We therefore find that the current state of the art is insufficient for emergency responders' requirements, particularly for rare actionable information for which there is little prior training data available.
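As a sketch of the headline analysis, the snippet below computes the fraction of actionable tweets from labeled data; the record format and the set of "actionable" information types are illustrative assumptions, not the track's exact ontology.

```python
from collections import Counter

ACTIONABLE_TYPES = {"Request-SearchAndRescue", "Request-GoodsServices",
                    "Report-EmergingThreats"}  # illustrative subset

def actionable_fraction(labeled_tweets):
    # labeled_tweets: iterable of dicts like {"id": ..., "info_type": ...}
    counts = Counter(t["info_type"] for t in labeled_tweets)
    actionable = sum(n for itype, n in counts.items()
                     if itype in ACTIONABLE_TYPES)
    return actionable / sum(counts.values())

# A corpus where 2 of 10 labeled tweets carry actionable types yields 0.2,
# i.e. above the "> 10%" level reported in the abstract.
```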
Incident Streams 2019: Actionable Insights and How to Find Them
The ubiquity of mobile internet-enabled devices combined with widespread social media use during emergencies is posing new challenges for response personnel. In particular, service operators are now expected to monitor these online channels to extract actionable insights and answer questions from the public. A lack of adequate tools makes this monitoring impractical at the scale of many emergencies. The TREC Incident Streams (TREC-IS) track drives research into closing this technology gap by bringing together academia and industry to develop techniques for extracting actionable insights from social media streams during emergencies. This paper covers the second year of TREC-IS, hosted in 2019 with two editions, 2019-A and 2019-B, contributing 12 new events and approximately 20,000 new tweets across 25 information categories, with 15 research groups participating across the world. We provide an overview of these new editions, actionable insights from data labelling, and the automated techniques employed by participant systems that appear most effective.
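For orientation, a minimal multi-label baseline of the kind participant systems typically build on might look as follows; the feature choice, classifier, and category names are assumptions for illustration, not the track's official baseline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Toy training data: each tweet may carry several information categories.
tweets = ["Power lines down on Main St, send crews now",
          "Thoughts are with everyone affected by the flood"]
labels = [["Report-EmergingThreats"], ["Other-Sentiment"]]

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(labels)

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    OneVsRestClassifier(LogisticRegression(max_iter=1000)))
clf.fit(tweets, y)
predicted = mlb.inverse_transform(clf.predict(["Road blocked, need rescue"]))
```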
HLVU: A New Challenge to Test Deep Understanding of Movies the Way Humans Do
In this paper we propose a new evaluation challenge and direction in the area
of High-level Video Understanding. The challenge we are proposing is designed
to test automatic video analysis and understanding, and how accurately systems
can comprehend a movie in terms of actors, entities, events and their
relationships to each other. A pilot High-Level Video Understanding (HLVU) dataset of open-source movies was collected for human assessors to build a
knowledge graph representing each of them. A set of queries will be derived
from the knowledge graph to test systems on retrieving relationships among
actors, as well as reasoning and retrieving non-visual concepts. The objective
is to benchmark whether a computer system can "understand" non-explicit but obvious relationships the same way humans do when they watch the same movies. This is a long-standing problem that is being addressed in the text domain, and this
project moves similar research to the video domain. Work of this nature is
foundational to future video analytics and video understanding technologies.
This work can be of interest to streaming services and broadcasters hoping to
provide more intuitive ways for their customers to interact with and consume
video content.
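The kind of assessor-built knowledge graph and derived relationship query described above can be sketched as follows; the characters, relations, and schema are toy examples, not the HLVU data format.

```python
import networkx as nx

kg = nx.MultiDiGraph()
kg.add_edge("Alice", "Bob", relation="married_to")
kg.add_edge("Bob", "Northfield Bank", relation="works_for")
kg.add_edge("Alice", "Bob", relation="suspects")  # a non-visual relationship

def relations_between(graph, src, dst):
    # Answer queries such as "how is Alice related to Bob?"
    return [data["relation"]
            for _, v, data in graph.out_edges(src, data=True) if v == dst]

print(relations_between(kg, "Alice", "Bob"))  # ['married_to', 'suspects']
```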
Knowledge-aware Complementary Product Representation Learning
Learning product representations that reflect complementary relationships plays a central role in e-commerce recommender systems. In the absence of a product relationship graph, which existing methods rely on, there is a need to
detect the complementary relationships directly from noisy and sparse customer
purchase activities. Furthermore, unlike simple relationships such as
similarity, complementariness is asymmetric and non-transitive. Standard usage
of representation learning emphasizes only one set of embeddings, which is
problematic for modelling such properties of complementariness. We propose
using knowledge-aware learning with dual product embedding to solve the above
challenges. We encode contextual knowledge into product representations via multi-task learning to alleviate the sparsity issue. By explicitly modelling
with user bias terms, we separate the noise of customer-specific preferences
from the complementariness. Furthermore, we adopt the dual embedding framework
to capture the intrinsic properties of complementariness and provide geometric
interpretation motivated by the classic separating hyperplane theory. Finally,
we propose a Bayesian network structure that unifies all the components, which
also includes several popular models as special cases. The proposed method compares favourably to state-of-the-art methods in downstream classification and recommendation tasks. We also develop an implementation that scales efficiently to a dataset with millions of items and customers.
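A worked sketch of the dual-embedding idea, with dimensions and initialization as assumptions: each product i receives a "source" vector U[i] and a distinct "target" vector V[i], so the score for "j complements i" need not equal the score for "i complements j", and a per-user bias absorbs customer-specific preference.

```python
import numpy as np

rng = np.random.default_rng(0)
n_products, n_users, d = 1000, 500, 32

U = rng.normal(size=(n_products, d))  # "source" embeddings (query product)
V = rng.normal(size=(n_products, d))  # "target" embeddings (candidate product)
b_user = rng.normal(size=n_users)     # per-user bias soaking up preference noise

def complement_score(i, j, user):
    # Asymmetric by construction: U and V are separate tables, so
    # complement_score(i, j, u) != complement_score(j, i, u) in general.
    return U[i] @ V[j] + b_user[user]

# Geometrically, U[i] @ V[j] + b = 0 is a separating hyperplane in the
# target-embedding space, dividing complements of i from non-complements.
```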
Separate and Attend in Personal Email Search
In personal email search, user queries often impose different requirements on
different aspects of the retrieved emails. For example, the query "my recent
flight to the US" requires emails to be ranked based on both textual contents
and recency of the email documents, while other queries such as "medical
history" do not impose any constraints on the recency of the email. Recent deep
learning-to-rank models for personal email search often directly concatenate
dense numerical features (e.g., document age) with embedded sparse features
(e.g., n-gram embeddings). In this paper, we first show with a set of
experiments on synthetic datasets that direct concatenation of dense and sparse
features does not lead to the optimal search performance of deep neural ranking
models. To effectively incorporate both sparse and dense email features into
personal email search ranking, we propose a novel neural model, SepAttn.
SepAttn first builds two separate neural models to learn from sparse and dense
features respectively, and then applies an attention mechanism at the
prediction level to derive the final prediction from these two models. We
conduct a comprehensive set of experiments on a large-scale email search
dataset, and demonstrate that our SepAttn model consistently improves the
search quality over the baseline models.
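A minimal PyTorch sketch of the prediction-level attention idea follows; the tower sizes and attention parameterization are assumptions, and the published architecture may differ in detail.

```python
import torch
import torch.nn as nn

class SepAttnSketch(nn.Module):
    # Two separate towers score the sparse and dense features independently;
    # an attention weight then mixes the two predictions per example.
    def __init__(self, sparse_dim, dense_dim, hidden=64):
        super().__init__()
        self.sparse_tower = nn.Sequential(
            nn.Linear(sparse_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.dense_tower = nn.Sequential(
            nn.Linear(dense_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.attn = nn.Linear(sparse_dim + dense_dim, 2)

    def forward(self, sparse_x, dense_x):
        scores = torch.cat([self.sparse_tower(sparse_x),
                            self.dense_tower(dense_x)], dim=-1)      # (B, 2)
        weights = torch.softmax(
            self.attn(torch.cat([sparse_x, dense_x], dim=-1)), dim=-1)
        return (weights * scores).sum(dim=-1)  # final ranking score, (B,)
```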
On enhancing the robustness of timeline summarization test collections
Timeline generation systems are a class of algorithms that produce a sequence of time-ordered sentences or text snippets extracted in real-time from high-volume streams of digital documents (e.g. news articles), focusing on retaining relevant and informative content for a particular information need (e.g. topic or event). These systems have a range of uses, such as producing concise overviews of events for end-users (human or artificial agents). To advance the field of automatic timeline generation, robust and reproducible evaluation methodologies are needed. To this end, several evaluation metrics and labeling methodologies have recently been developed, focusing on information-nugget and cluster-based ground truth representations, respectively. These methodologies rely on human assessors manually mapping timeline items (e.g. sentences) to an explicit representation of what information a "good" summary should contain. However, while these evaluation methodologies produce reusable ground truth labels, prior works have reported cases where such evaluations fail to accurately estimate the performance of new timeline generation systems due to label incompleteness. In this paper, we first quantify the extent to which timeline summarization test collections fail to generalize to new summarization systems, and then propose, evaluate and analyze new automatic solutions to this issue. In particular, using a depooling methodology over 19 systems and across three high-volume datasets, we quantify the degree of system ranking error caused by excluding those systems when labeling. We show that when considering lower-effectiveness systems, the test collections are robust (the likelihood of systems being mis-ranked is low). However, we show that the risk of systems being mis-ranked increases as the effectiveness of systems held out from the pool increases. To reduce this risk, we also propose a range of automatic ground truth label expansion techniques. Our results show that the proposed expansion techniques can be effective at increasing the robustness of the TREC-TS test collections, as they are able to generate large numbers of missing matches with high accuracy, markedly reducing the number of mis-rankings by up to 50%.
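The depooling methodology can be sketched as below; `build_labels` and `evaluate` are placeholders for the collection's labeling and scoring procedures, and Kendall's tau is one reasonable choice of rank-correlation measure.

```python
from scipy.stats import kendalltau

def depooling_rank_error(systems, build_labels, evaluate):
    # Hold each system out of the label pool, rebuild the ground truth from
    # the remaining systems, and measure how far the induced system ranking
    # drifts from the full-pool ranking (low tau => mis-ranking risk).
    full_labels = build_labels(systems)
    full_scores = [evaluate(s, full_labels) for s in systems]
    drifts = []
    for held_out in systems:
        pool = [s for s in systems if s is not held_out]
        depooled_labels = build_labels(pool)
        scores = [evaluate(s, depooled_labels) for s in systems]
        tau, _ = kendalltau(full_scores, scores)
        drifts.append((held_out, tau))
    return drifts
```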
Evaluating epistemic uncertainty under incomplete assessments
This study proposes an extended methodology for laboratory-based Information Retrieval evaluation under incomplete relevance assessments. This new methodology aims to identify potential uncertainty during system comparison that may result from incompleteness. Its adoption is advantageous because the detection of epistemic uncertainty (the amount of knowledge, or ignorance, we have about the estimate of a system's performance) during the evaluation process can guide and direct researchers when evaluating new systems over existing and future test collections. Across a series of experiments we demonstrate how this methodology can lead towards a finer-grained analysis of systems. In particular, we show through experimentation how the current practice in Information Retrieval evaluation of using a measurement depth larger than the pooling depth increases uncertainty during system comparison.
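One concrete way to expose this epistemic uncertainty is to bound a metric between the worst and best cases for the unjudged documents; the sketch below does this for precision at k, under the assumption that any unjudged document could turn out either relevant or non-relevant.

```python
def precision_at_k_bounds(ranking, judgments, k=10):
    # judgments maps doc_id -> True/False; unjudged documents are absent.
    # The gap between the two bounds is the epistemic uncertainty that
    # incomplete assessments introduce into the system comparison.
    top_k = ranking[:k]
    relevant = sum(1 for d in top_k if judgments.get(d) is True)
    unjudged = sum(1 for d in top_k if d not in judgments)
    return relevant / k, (relevant + unjudged) / k

low, high = precision_at_k_bounds(
    ["d1", "d2", "d3"], {"d1": True, "d2": False}, k=3)
# low == 1/3 (d3 assumed non-relevant), high == 2/3 (d3 assumed relevant)
```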