Ranking News-Quality Multimedia
News editors need to find the photos that best illustrate a news piece and
fulfill news-media quality standards, while being pressed to also find the most
recent photos of live events. Recently, it became common to use social-media
content in the context of news media for its unique value in terms of immediacy
and quality. Consequently, the number of images to be considered and filtered
is now too large for one person to handle. To aid the news editor in
this process, we propose a framework designed to deliver high-quality,
news-press type photos to the user. The framework, composed of two parts, is
based on a ranking algorithm tuned to rank professional media highly and a
visual SPAM detection module designed to filter out low-quality media. The core
ranking algorithm is leveraged by aesthetic, social and deep-learning semantic
features. Evaluation showed that the proposed framework is effective at finding
high-quality photos (true-positive rate), achieving a retrieval MAP of 64.5% and
a classification precision of 70%.
Comment: To appear in ICMR'1
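For readers unfamiliar with the MAP figure quoted above, the following is a minimal sketch of how mean average precision is commonly computed from ranked relevance judgments. It is not the authors' evaluation code: the function names and the toy relevance lists are hypothetical, and the sketch assumes every relevant photo for a query appears somewhere in its ranked list.

```python
def average_precision(ranked_relevance):
    """Average precision for one query.

    ranked_relevance: list of booleans in rank order (True = relevant).
    Assumption: all relevant items appear in the list, so we normalise
    by the number of relevant items found.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs):
    """Mean of the per-query average precision values."""
    if not runs:
        return 0.0
    return sum(average_precision(r) for r in runs) / len(runs)


# Hypothetical judgments for two queries (not data from the paper).
toy_runs = [[True, False, True], [False, True]]
toy_map = mean_average_precision(toy_runs)
```

Under this definition, a ranking that places relevant photos earlier scores higher than one that retrieves the same photos lower down.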
Taxonomy Induction using Hypernym Subsequences
We propose a novel, semi-supervised approach towards domain taxonomy
induction from an input vocabulary of seed terms. Unlike all previous
approaches, which typically extract direct hypernym edges for terms, our
approach utilizes a novel probabilistic framework to extract hypernym
subsequences. Taxonomy induction from extracted subsequences is cast as an
instance of the minimum-cost flow problem on a carefully designed directed
graph. Through experiments, we demonstrate that our approach outperforms
state-of-the-art taxonomy induction approaches across four languages.
Importantly, we also show that our approach is robust to the presence of noise
in the input vocabulary. To the best of our knowledge, no previous approaches
have been empirically proven to manifest noise-robustness in the input
vocabulary.
Next Generation of Product Search and Discovery
Online shopping has become an important part of people’s daily life with the rapid development of e-commerce. In some domains such as books, electronics, and CD/DVDs, online shopping has surpassed or even replaced the traditional shopping method. Compared with traditional retailing, e-commerce is information intensive. One of the key factors for success in e-business is how well it helps consumers discover a product. Conventionally, a product search engine based on keyword search or a category browser is provided to help users find the product information they need. The general goal of a product search system is to enable users to quickly locate information of interest and to minimize users’ efforts in search and navigation. In this process human factors play a significant role. Finding product information can be a tricky task and may require intelligent use of search engines and non-trivial navigation of multilayer categories. Searching for useful product information can be frustrating for many users, especially inexperienced ones.
This dissertation focuses on developing a new visual product search system that effectively extracts the properties of unstructured products and presents the items most likely to attract users, so that they can quickly locate the ones they are interested in. We designed and developed a feature extraction algorithm that retains product color and local pattern features, and the experimental evaluation on the benchmark dataset demonstrated that it is robust against common geometric and photometric visual distortions. In addition, rather than ignoring product text information, we investigated and developed a ranking model learned via a unified probabilistic hypergraph that is capable of capturing correlations among product visual content and textual content. Moreover, we proposed and designed a fuzzy hierarchical co-clustering algorithm for collaborative filtering product recommendation. Via this method, users can be automatically grouped into different interest communities based on their behaviors. Then, a customized recommendation can be performed according to these implicitly detected relations. In summary, the developed search system performs much better in visual unstructured product search than state-of-the-art approaches. With the comprehensive ranking scheme and the collaborative filtering recommendation module, the user’s overhead in locating the information of value is reduced, and the user’s experience of seeking useful product information is optimized.
Search Behavior Prediction: A Hypergraph Perspective
Although the bipartite shopping graphs are straightforward to model search
behavior, they suffer from two challenges: 1) The majority of items are
sporadically searched and hence have noisy/sparse query associations, leading
to a long-tail distribution. 2) Infrequent queries are more likely to
link to popular items, leading to another hurdle known as
disassortative mixing. To address these two challenges, we go beyond
the bipartite graph to take a hypergraph perspective, introducing a new
paradigm that leverages auxiliary information from anonymized
customer engagement sessions to assist the main task of query-item
link prediction. This auxiliary information is available at web scale in the
form of search logs. We treat all items appearing in the same customer session
as a single hyperedge. The hypothesis is that items in a customer session are
unified by a common shopping interest. With these hyperedges, we augment the
original bipartite graph into a new hypergraph. We develop a
Dual-Channel Attention-Based
Hypergraph Neural Network (DCAH), which synergizes
information from two potentially noisy sources (original query-item edges and
item-item hyperedges). In this way, items on the tail are better connected due
to the extra hyperedges, thereby enhancing their link prediction performance.
We further integrate DCAH with self-supervised graph pre-training and/or
DropEdge training, both of which effectively alleviate disassortative mixing.
Extensive experiments on three proprietary E-Commerce datasets show that DCAH
yields significant improvements of up to 24.6% in mean reciprocal rank
(MRR) and 48.3% in recall compared to GNN-based baselines. Our
source code is available at
https://github.com/amazon-science/dual-channel-hypergraph-neural-network.
Comment: WSDM 202
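The graph construction described in this abstract, where all items in one customer session form a single hyperedge added on top of the query-item bipartite graph, can be illustrated with a small sketch. This is not the DCAH implementation from the linked repository: the function names, data layout, and toy sessions below are hypothetical, and the attention-based neural network itself is omitted.

```python
from collections import defaultdict


def build_hypergraph(query_item_edges, sessions):
    """Index query-item edges and session hyperedges by item.

    query_item_edges: iterable of (query, item) pairs (bipartite graph).
    sessions: list of item lists; each session becomes one hyperedge.
    """
    item_to_queries = defaultdict(set)
    for query, item in query_item_edges:
        item_to_queries[item].add(query)

    item_to_hyperedges = defaultdict(set)
    for edge_id, session in enumerate(sessions):
        for item in set(session):
            item_to_hyperedges[item].add(edge_id)
    return item_to_queries, item_to_hyperedges


def session_neighbors(item, sessions, item_to_hyperedges):
    """Items that share at least one session hyperedge with `item`."""
    neighbors = set()
    for edge_id in item_to_hyperedges.get(item, ()):
        neighbors.update(sessions[edge_id])
    neighbors.discard(item)
    return neighbors


# Hypothetical toy data: "shoe_b" is a tail item with no query edges.
toy_edges = [("running shoes", "shoe_a")]
toy_sessions = [["shoe_a", "shoe_b"], ["shoe_b", "sock_c"]]
queries_by_item, hyperedges_by_item = build_hypergraph(toy_edges, toy_sessions)
tail_neighbors = session_neighbors("shoe_b", toy_sessions, hyperedges_by_item)
```

In the toy data, `shoe_b` has no query association of its own, yet the session hyperedges connect it to `shoe_a`, which does. This is the sense in which tail items become better connected.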
Variational Deep Semantic Hashing for Text Documents
As the amount of textual data has been rapidly increasing over the past
decade, efficient similarity search methods have become a crucial component of
large-scale information retrieval systems. A popular strategy is to represent
original data samples by compact binary codes through hashing. A spectrum of
machine learning methods have been utilized, but they often lack expressiveness
and flexibility in modeling to learn effective representations. The recent
advances of deep learning in a wide range of applications have demonstrated its
capability to learn robust and powerful feature representations for complex
data. In particular, deep generative models naturally combine the expressiveness
of probabilistic generative models with the high capacity of deep neural
networks, which is very suitable for text modeling. However, little work has
leveraged the recent progress in deep learning for text hashing.
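To make the appeal of compact binary codes concrete, here is a minimal sketch of similarity search by Hamming distance. It is a generic illustration, not code from the paper: the codes are stored as Python integers, and the function names, document ids, and 4-bit codes are all hypothetical.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def nearest_codes(query_code, code_table, k=2):
    """Return the ids of the k documents closest to the query code.

    code_table: dict mapping document id -> integer hash code.
    Ties are broken by document id so the result is deterministic.
    """
    ranked = sorted(
        code_table.items(),
        key=lambda kv: (hamming_distance(query_code, kv[1]), kv[0]),
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Hypothetical 4-bit codes for three documents.
toy_codes = {"doc_a": 0b1011, "doc_b": 0b1000, "doc_c": 0b0100}
toy_result = nearest_codes(0b1010, toy_codes, k=2)
```

Comparing codes needs only an XOR and a bit count, which is why hashing scales to large collections.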
In this paper, we propose a series of novel deep document generative models
for text hashing. The first proposed model is unsupervised while the second one
is supervised by utilizing document labels/tags for hashing. The third model
further considers document-specific factors that affect the generation of
words. The probabilistic generative formulation of the proposed models provides
a principled framework for model extension, uncertainty estimation, simulation,
and interpretability. Based on variational inference and reparameterization,
the proposed models can be interpreted as encoder-decoder deep neural networks
and thus they are capable of learning complex nonlinear distributed
representations of the original documents. We conduct a comprehensive set of
experiments on four public testbeds. The experimental results have demonstrated
the effectiveness of the proposed supervised learning models for text hashing.
Comment: 11 pages, 4 figure
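The reparameterization step mentioned in this abstract can be sketched in a few lines. This is a generic illustration of the trick for a diagonal Gaussian latent variable, not the authors' model. The function names are hypothetical, and binarizing the latent vector against per-dimension thresholds is an assumption made here to show how a hash code could be derived.

```python
import math
import random


def sample_latent(mean, log_var, rng):
    """Reparameterized sample: z = mean + sigma * eps, eps ~ N(0, 1).

    mean, log_var: per-dimension parameters an encoder would output.
    rng: a random.Random instance, so sampling is reproducible.
    """
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mean, log_var)
    ]


def binarize(latent, thresholds):
    """Assumed hashing step: one bit per dimension, 1 if above threshold."""
    return [1 if z > t else 0 for z, t in zip(latent, thresholds)]


# Hypothetical encoder outputs for a 3-dimensional latent space.
rng = random.Random(0)
toy_z = sample_latent([0.5, -0.5, 1.0], [-1.0, -1.0, -1.0], rng)
toy_bits = binarize(toy_z, [0.0, 0.0, 0.0])
```

Because the randomness is isolated in `eps`, the sample is a deterministic function of the mean and log-variance, which is what allows gradients to flow through the sampling step in an encoder-decoder network.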
Deep Learning based Recommender System: A Survey and New Perspectives
With the ever-growing volume of online information, recommender systems have
been an effective strategy to overcome such information overload. The utility
of recommender systems cannot be overstated, given their widespread adoption in
many web applications, along with their potential to ameliorate many
problems related to over-choice. In recent years, deep learning has garnered
considerable interest in many research fields such as computer vision and
natural language processing, owing not only to stellar performance but also the
attractive property of learning feature representations from scratch. The
influence of deep learning is also pervasive, recently demonstrating its
effectiveness when applied to information retrieval and recommender systems
research. Evidently, the field of deep learning in recommender systems is
flourishing. This article aims to provide a comprehensive review of recent
research efforts on deep learning based recommender systems. More concretely,
we devise a taxonomy of deep learning based recommendation models
and provide a comprehensive summary of the state-of-the-art. Finally,
we expand on current trends and provide new perspectives pertaining to this new
exciting development of the field.
Comment: The paper has been accepted by ACM Computing Surveys.
https://doi.acm.org/10.1145/328502