5,357 research outputs found
Computational Approaches to Measuring the Similarity of Short Contexts : A Review of Applications and Methods
Measuring the similarity of short written contexts is a fundamental problem
in Natural Language Processing. This article provides a unifying framework by
which short context problems can be categorized both by their intended
application and proposed solution. The goal is to show that various problems
and methodologies that appear quite different on the surface are in fact very
closely related. The axes by which these categorizations are made include the
format of the contexts (headed versus headless), the way in which the contexts
are to be measured (first-order versus second-order similarity), and the
information used to represent the features in the contexts (micro versus macro
views). The unifying thread that binds together many short context applications
and methods is the fact that similarity decisions must be made between contexts
that share few (if any) words in common.Comment: 23 page
Automated construction and analysis of political networks via open government and media sources
We present a tool to generate real world political networks from user provided lists of politicians and news sites. Additional output includes visualizations, interactive tools and maps that allow a user to better understand the politicians and their surrounding environments as portrayed by the media. As a case study, we construct a comprehensive list of current Texas politicians, select news sites that convey a spectrum of political viewpoints covering Texas politics, and examine the results. We propose a ”Combined” co-occurrence distance metric to better reflect the relationship between two entities. A topic modeling technique is also proposed as a novel, automated way of labeling communities that exist within a politician’s ”extended” network.Peer ReviewedPostprint (author's final draft
EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets
This article introduces a new language-independent approach for creating a
large-scale high-quality test collection of tweets that supports multiple
information retrieval (IR) tasks without running a shared-task campaign. The
adopted approach (demonstrated over Arabic tweets) designs the collection
around significant (i.e., popular) events, which enables the development of
topics that represent frequent information needs of Twitter users for which
rich content exists. That inherently facilitates the support of multiple tasks
that generally revolve around events, namely event detection, ad-hoc search,
timeline generation, and real-time summarization. The key highlights of the
approach include diversifying the judgment pool via interactive search and
multiple manually-crafted queries per topic, collecting high-quality
annotations via crowd-workers for relevancy and in-house annotators for
novelty, filtering out low-agreement topics and inaccessible tweets, and
providing multiple subsets of the collection for better availability. Applying
our methodology on Arabic tweets resulted in EveTAR , the first
freely-available tweet test collection for multiple IR tasks. EveTAR includes a
crawl of 355M Arabic tweets and covers 50 significant events for which about
62K tweets were judged with substantial average inter-annotator agreement
(Kappa value of 0.71). We demonstrate the usability of EveTAR by evaluating
existing algorithms in the respective tasks. Results indicate that the new
collection can support reliable ranking of IR systems that is comparable to
similar TREC collections, while providing strong baseline results for future
studies over Arabic tweets
Visual object category discovery in images and videos
textThe current trend in visual recognition research is to place a strict division between the supervised and unsupervised learning paradigms, which is problematic for two main reasons. On the one hand, supervised methods require training data for each and every category that the system learns; training data may not always be available and is expensive to obtain. On the other hand, unsupervised methods must determine the optimal visual cues and distance metrics that distinguish one category from another to group images into semantically meaningful categories; however, for unlabeled data, these are unknown a priori.
I propose a visual category discovery framework that transcends the two paradigms and learns accurate models with few labeled exemplars. The main insight is to automatically focus on the prevalent objects in images and videos, and learn models from them for category grouping, segmentation, and summarization.
To implement this idea, I first present a context-aware category discovery framework that discovers novel categories by leveraging context from previously learned categories. I devise a novel object-graph descriptor to model the interaction between a set of known categories and the unknown to-be-discovered categories, and group regions that have similar appearance and similar object-graphs. I then present a collective segmentation framework that simultaneously discovers the segmentations and groupings of objects by leveraging the shared patterns in the unlabeled image collection. It discovers an ensemble of representative instances for each unknown category, and builds top-down models from them to refine the segmentation of the remaining instances. Finally, building on these techniques, I show how to produce compact visual summaries for first-person egocentric videos that focus on the important people and objects. The system leverages novel egocentric and high-level saliency features to predict important regions in the video, and produces a concise visual summary that is driven by those regions.
I compare against existing state-of-the-art methods for category discovery and segmentation on several challenging benchmark datasets. I demonstrate that we can discover visual concepts more accurately by focusing on the prevalent objects in images and videos, and show clear advantages of departing from the status quo division between the supervised and unsupervised learning paradigms. The main impact of my thesis is that it lays the groundwork for building large-scale visual discovery systems that can automatically discover visual concepts with minimal human supervision.Electrical and Computer Engineerin
- …