16,771 research outputs found
Learning Visual Features from Snapshots for Web Search
When applying learning to rank algorithms to Web search, a large number of
features are usually designed to capture the relevance signals. Most of these
features are computed based on the extracted textual elements, link analysis,
and user logs. However, Web pages are not solely linked texts, but have
structured layout organizing a large variety of elements in different styles.
Such layout itself can convey useful visual information, indicating the
relevance of a Web page. For example, the query-independent layout (i.e., raw
page layout) can help identify the page quality, while the query-dependent
layout (i.e., page rendered with matched query words) can further tell rich
structural information (e.g., size, position and proximity) of the matching
signals. However, such visual information of layout has been seldom utilized in
Web search in the past. In this work, we propose to learn rich visual features
automatically from the layout of Web pages (i.e., Web page snapshots) for
relevance ranking. Both query-independent and query-dependent snapshots are
considered as the new inputs. We then propose a novel visual perception model
inspired by human's visual search behaviors on page viewing to extract the
visual features. This model can be learned end-to-end together with traditional
human-crafted features. We also show that such visual features can be
efficiently acquired in the online setting with an extended inverted indexing
scheme. Experiments on benchmark collections demonstrate that learning visual
features from Web page snapshots can significantly improve the performance of
relevance ranking in ad-hoc Web retrieval tasks.Comment: CIKM 201
Predicting Rising Follower Counts on Twitter Using Profile Information
When evaluating the cause of one's popularity on Twitter, one thing is
considered to be the main driver: Many tweets. There is debate about the kind
of tweet one should publish, but little beyond tweets. Of particular interest
is the information provided by each Twitter user's profile page. One of the
features are the given names on those profiles. Studies on psychology and
economics identified correlations of the first name to, e.g., one's school
marks or chances of getting a job interview in the US. Therefore, we are
interested in the influence of those profile information on the follower count.
We addressed this question by analyzing the profiles of about 6 Million Twitter
users. All profiles are separated into three groups: Users that have a first
name, English words, or neither of both in their name field. The assumption is
that names and words influence the discoverability of a user and subsequently
his/her follower count. We propose a classifier that labels users who will
increase their follower count within a month by applying different models based
on the user's group. The classifiers are evaluated with the area under the
receiver operator curve score and achieves a score above 0.800.Comment: 10 pages, 3 figures, 8 tables, WebSci '17, June 25--28, 2017, Troy,
NY, US
Deep Multi-view Learning to Rank
We study the problem of learning to rank from multiple information sources.
Though multi-view learning and learning to rank have been studied extensively
leading to a wide range of applications, multi-view learning to rank as a
synergy of both topics has received little attention. The aim of the paper is
to propose a composite ranking method while keeping a close correlation with
the individual rankings simultaneously. We present a generic framework for
multi-view subspace learning to rank (MvSL2R), and two novel solutions are
introduced under the framework. The first solution captures information of
feature mappings from within each view as well as across views using
autoencoder-like networks. Novel feature embedding methods are formulated in
the optimization of multi-view unsupervised and discriminant autoencoders.
Moreover, we introduce an end-to-end solution to learning towards both the
joint ranking objective and the individual rankings. The proposed solution
enhances the joint ranking with minimum view-specific ranking loss, so that it
can achieve the maximum global view agreements in a single optimization
process. The proposed method is evaluated on three different ranking problems,
i.e. university ranking, multi-view lingual text ranking and image data
ranking, providing superior results compared to related methods.Comment: Published at IEEE TKD
Learning Object Categories From Internet Image Searches
In this paper, we describe a simple approach to learning models of visual object categories from images gathered from Internet image search engines. The images for a given keyword are typically highly variable, with a large fraction being unrelated to the query term, and thus pose a challenging environment from which to learn. By training our models directly from Internet images, we remove the need to laboriously compile training data sets, required by most other recognition approaches-this opens up the possibility of learning object category models “on-the-fly.” We describe two simple approaches, derived from the probabilistic latent semantic analysis (pLSA) technique for text document analysis, that can be used to automatically learn object models from these data. We show two applications of the learned model: first, to rerank the images returned by the search engine, thus improving the quality of the search engine; and second, to recognize objects in other image data sets
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting in the manual
definition of a classifier by domain experts) are a very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation.Comment: Accepted for publication on ACM Computing Survey
- …