14,828 research outputs found
How Part-of-Speech Tags Affect Text Retrieval and Filtering Performance
Natural language processing (NLP) applied to information retrieval (IR) and
filtering problems may assign part-of-speech tags to terms and, more generally,
modify queries and documents. Analytic models can predict the performance of a
text filtering system as it incorporates changes suggested by NLP, allowing us
to make precise statements about the average effect of NLP operations on IR.
Here we provide a model of retrieval and tagging that allows us to both compute
the performance change due to syntactic parsing and to allow us to understand
what factors affect performance and how. In addition to a prediction of
performance with tags, upper and lower bounds for retrieval performance are
derived, giving the best and worst effects of including part-of-speech tags.
Empirical grounds for selecting sets of tags are considered.Comment: uuencoded and compressed postscrip
Interactive Search and Exploration in Online Discussion Forums Using Multimodal Embeddings
In this paper we present a novel interactive multimodal learning system,
which facilitates search and exploration in large networks of social multimedia
users. It allows the analyst to identify and select users of interest, and to
find similar users in an interactive learning setting. Our approach is based on
novel multimodal representations of users, words and concepts, which we
simultaneously learn by deploying a general-purpose neural embedding model. We
show these representations to be useful not only for categorizing users, but
also for automatically generating user and community profiles. Inspired by
traditional summarization approaches, we create the profiles by selecting
diverse and representative content from all available modalities, i.e. the
text, image and user modality. The usefulness of the approach is evaluated
using artificial actors, which simulate user behavior in a relevance feedback
scenario. Multiple experiments were conducted in order to evaluate the quality
of our multimodal representations, to compare different embedding strategies,
and to determine the importance of different modalities. We demonstrate the
capabilities of the proposed approach on two different multimedia collections
originating from the violent online extremism forum Stormfront and the
microblogging platform Twitter, which are particularly interesting due to the
high semantic level of the discussions they feature
Exploratory Analysis of Highly Heterogeneous Document Collections
We present an effective multifaceted system for exploratory analysis of
highly heterogeneous document collections. Our system is based on intelligently
tagging individual documents in a purely automated fashion and exploiting these
tags in a powerful faceted browsing framework. Tagging strategies employed
include both unsupervised and supervised approaches based on machine learning
and natural language processing. As one of our key tagging strategies, we
introduce the KERA algorithm (Keyword Extraction for Reports and Articles).
KERA extracts topic-representative terms from individual documents in a purely
unsupervised fashion and is revealed to be significantly more effective than
state-of-the-art methods. Finally, we evaluate our system in its ability to
help users locate documents pertaining to military critical technologies buried
deep in a large heterogeneous sea of information.Comment: 9 pages; KDD 2013: 19th ACM SIGKDD Conference on Knowledge Discovery
and Data Minin
Data-driven Job Search Engine Using Skills and Company Attribute Filters
According to a report online, more than 200 million unique users search for
jobs online every month. This incredibly large and fast growing demand has
enticed software giants such as Google and Facebook to enter this space, which
was previously dominated by companies such as LinkedIn, Indeed and
CareerBuilder. Recently, Google released their "AI-powered Jobs Search Engine",
"Google For Jobs" while Facebook released "Facebook Jobs" within their
platform. These current job search engines and platforms allow users to search
for jobs based on general narrow filters such as job title, date posted,
experience level, company and salary. However, they have severely limited
filters relating to skill sets such as C++, Python, and Java and company
related attributes such as employee size, revenue, technographics and
micro-industries. These specialized filters can help applicants and companies
connect at a very personalized, relevant and deeper level. In this paper we
present a framework that provides an end-to-end "Data-driven Jobs Search
Engine". In addition, users can also receive potential contacts of recruiters
and senior positions for connection and networking opportunities. The high
level implementation of the framework is described as follows: 1) Collect job
postings data in the United States, 2) Extract meaningful tokens from the
postings data using ETL pipelines, 3) Normalize the data set to link company
names to their specific company websites, 4) Extract and ranking the skill
sets, 5) Link the company names and websites to their respective company level
attributes with the EVERSTRING Company API, 6) Run user-specific search queries
on the database to identify relevant job postings and 7) Rank the job search
results. This framework offers a highly customizable and highly targeted search
experience for end users.Comment: 8 pages, 10 figures, ICDM 201
Recommendation, collaboration and social search
This chapter considers the social component of interactive information retrieval: what is the role of other people in searching and browsing? For simplicity we begin by considering situations without computers. After all, you can interactively retrieve information without a computer; you just have to interact with someone or something else. Such an analysis can then help us think about the new forms of collaborative interactions that extend our conceptions of information search, made possible by the growth of networked ubiquitous computing technology.
Information searching and browsing have often been conceptualized as a solitary activity, however they always have a social component. We may talk about 'the' searcher or 'the' user of a database or information resource. Our focus may be on individual uses and our research may look at individual users. Our experiments may be designed to observe the behaviors of individual subjects. Our models and theories derived from our empirical analyses may focus substantially or exclusively on an individual's evolving goals, thoughts, beliefs, emotions and actions. Nevertheless there are always social aspects of information seeking and use present, both implicitly and explicitly.
We start by summarizing some of the history of information access with an emphasis on social and collaborative interactions. Then we look at the nature of recommendations, social search and interfaces to support collaboration between information seekers. Following this we consider how the design of interactive information systems is influenced by their social elements
A Bayesian Approach toward Active Learning for Collaborative Filtering
Collaborative filtering is a useful technique for exploiting the preference
patterns of a group of users to predict the utility of items for the active
user. In general, the performance of collaborative filtering depends on the
number of rated examples given by the active user. The more the number of rated
examples given by the active user, the more accurate the predicted ratings will
be. Active learning provides an effective way to acquire the most informative
rated examples from active users. Previous work on active learning for
collaborative filtering only considers the expected loss function based on the
estimated model, which can be misleading when the estimated model is
inaccurate. This paper takes one step further by taking into account of the
posterior distribution of the estimated model, which results in more robust
active learning algorithm. Empirical studies with datasets of movie ratings
show that when the number of ratings from the active user is restricted to be
small, active learning methods only based on the estimated model don't perform
well while the active learning method using the model distribution achieves
substantially better performance.Comment: Appears in Proceedings of the Twentieth Conference on Uncertainty in
Artificial Intelligence (UAI2004
- …