3,282 research outputs found
Document Summarization Using NMF and Pseudo Relevance Feedback Based on K-Means Clustering
According to the increment of accessible text data source on the internet, it has increased the necessity of the automatic text document summarization. However, the performance of the automatic methods might be poor because the semantic gap between high level user's summary requirement and low level vector representation of machine exists. In this paper, to overcome that problem, we propose a new document summarization method using a pseudo relevance feedback based on clustering method and NMF (non-negative matrix factorization). Relevance feedback is effective technique to minimize the semantic gap of information processing, but the general relevance feedback needs an intervention of a user. Additionally, the refined query without user interference by pseudo relevance feedback may be biased. The proposed method provides an automatic relevance judgment to reformulate query using the clustering method for minimizing a bias of query expansion. The method also can improve the quality of document summarization since the summarized documents are influenced by the semantic features of documents and the expanded query. The experimental results demonstrate that the proposed method achieves better performance than the other document summarization methods
Relevance-based Word Embedding
Learning a high-dimensional dense representation for vocabulary terms, also
known as a word embedding, has recently attracted much attention in natural
language processing and information retrieval tasks. The embedding vectors are
typically learned based on term proximity in a large corpus. This means that
the objective in well-known word embedding algorithms, e.g., word2vec, is to
accurately predict adjacent word(s) for a given word or context. However, this
objective is not necessarily equivalent to the goal of many information
retrieval (IR) tasks. The primary objective in various IR tasks is to capture
relevance instead of term proximity, syntactic, or even semantic similarity.
This is the motivation for developing unsupervised relevance-based word
embedding models that learn word representations based on query-document
relevance information. In this paper, we propose two learning models with
different objective functions; one learns a relevance distribution over the
vocabulary set for each query, and the other classifies each term as belonging
to the relevant or non-relevant class for each query. To train our models, we
used over six million unique queries and the top ranked documents retrieved in
response to each query, which are assumed to be relevant to the query. We
extrinsically evaluate our learned word representation models using two IR
tasks: query expansion and query classification. Both query expansion
experiments on four TREC collections and query classification experiments on
the KDD Cup 2005 dataset suggest that the relevance-based word embedding models
significantly outperform state-of-the-art proximity-based embedding models,
such as word2vec and GloVe.Comment: to appear in the proceedings of The 40th International ACM SIGIR
Conference on Research and Development in Information Retrieval (SIGIR '17
IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models
This paper provides a unified account of two schools of thinking in
information retrieval modelling: the generative retrieval focusing on
predicting relevant documents given a query, and the discriminative retrieval
focusing on predicting relevancy given a query-document pair. We propose a game
theoretical minimax game to iteratively optimise both models. On one hand, the
discriminative model, aiming to mine signals from labelled and unlabelled data,
provides guidance to train the generative model towards fitting the underlying
relevance distribution over documents given the query. On the other hand, the
generative model, acting as an attacker to the current discriminative model,
generates difficult examples for the discriminative model in an adversarial way
by minimising its discrimination objective. With the competition between these
two models, we show that the unified framework takes advantage of both schools
of thinking: (i) the generative model learns to fit the relevance distribution
over documents via the signals from the discriminative model, and (ii) the
discriminative model is able to exploit the unlabelled data selected by the
generative model to achieve a better estimation for document ranking. Our
experimental results have demonstrated significant performance gains as much as
23.96% on Precision@5 and 15.50% on MAP over strong baselines in a variety of
applications including web search, item recommendation, and question answering.Comment: 12 pages; appendix adde
Sparse Transfer Learning for Interactive Video Search Reranking
Visual reranking is effective to improve the performance of the text-based
video search. However, existing reranking algorithms can only achieve limited
improvement because of the well-known semantic gap between low level visual
features and high level semantic concepts. In this paper, we adopt interactive
video search reranking to bridge the semantic gap by introducing user's
labeling effort. We propose a novel dimension reduction tool, termed sparse
transfer learning (STL), to effectively and efficiently encode user's labeling
information. STL is particularly designed for interactive video search
reranking. Technically, it a) considers the pair-wise discriminative
information to maximally separate labeled query relevant samples from labeled
query irrelevant ones, b) achieves a sparse representation for the subspace to
encodes user's intention by applying the elastic net penalty, and c) propagates
user's labeling information from labeled samples to unlabeled samples by using
the data distribution knowledge. We conducted extensive experiments on the
TRECVID 2005, 2006 and 2007 benchmark datasets and compared STL with popular
dimension reduction algorithms. We report superior performance by using the
proposed STL based interactive video search reranking.Comment: 17 page
Frequency response modeling and control of flexible structures: Computational methods
The dynamics of vibrations in flexible structures can be conventiently modeled in terms of frequency response models. For structural control such models capture the distributed parameter dynamics of the elastic structural response as an irrational transfer function. For most flexible structures arising in aerospace applications the irrational transfer functions which arise are of a special class of pseudo-meromorphic functions which have only a finite number of right half place poles. Computational algorithms are demonstrated for design of multiloop control laws for such models based on optimal Wiener-Hopf control of the frequency responses. The algorithms employ a sampled-data representation of irrational transfer functions which is particularly attractive for numerical computation. One key algorithm for the solution of the optimal control problem is the spectral factorization of an irrational transfer function. The basis for the spectral factorization algorithm is highlighted together with associated computational issues arising in optimal regulator design. Options for implementation of wide band vibration control for flexible structures based on the sampled-data frequency response models is also highlighted. A simple flexible structure control example is considered to demonstrate the combined frequency response modeling and control algorithms
Joint Topic-Semantic-aware Social Recommendation for Online Voting
Online voting is an emerging feature in social networks, in which users can
express their attitudes toward various issues and show their unique interest.
Online voting imposes new challenges on recommendation, because the propagation
of votings heavily depends on the structure of social networks as well as the
content of votings. In this paper, we investigate how to utilize these two
factors in a comprehensive manner when doing voting recommendation. First, due
to the fact that existing text mining methods such as topic model and semantic
model cannot well process the content of votings that is typically short and
ambiguous, we propose a novel Topic-Enhanced Word Embedding (TEWE) method to
learn word and document representation by jointly considering their topics and
semantics. Then we propose our Joint Topic-Semantic-aware social Matrix
Factorization (JTS-MF) model for voting recommendation. JTS-MF model calculates
similarity among users and votings by combining their TEWE representation and
structural information of social networks, and preserves this
topic-semantic-social similarity during matrix factorization. To evaluate the
performance of TEWE representation and JTS-MF model, we conduct extensive
experiments on real online voting dataset. The results prove the efficacy of
our approach against several state-of-the-art baselines.Comment: The 26th ACM International Conference on Information and Knowledge
Management (CIKM 2017
- …