Ranking-based Deep Cross-modal Hashing
Cross-modal hashing has been receiving increasing interest for its low
storage cost and fast query speed in multi-modal data retrievals. However, most
existing hashing methods are based on hand-crafted or raw level features of
objects, which may not be optimally compatible with the coding process.
Besides, these hashing methods are mainly designed to handle simple pairwise
similarity. The complex multilevel ranking semantic structure of instances
associated with multiple labels has not been well explored yet. In this paper,
we propose a ranking-based deep cross-modal hashing approach (RDCMH). RDCMH
firstly uses the feature and label information of data to derive a
semi-supervised semantic ranking list. Next, to expand the semantic
representation power of hand-crafted features, RDCMH integrates the semantic
ranking information into deep cross-modal hashing and jointly optimizes the
compatible parameters of deep feature representations and of hashing functions.
Experiments on real multi-modal datasets show that RDCMH outperforms other
competitive baselines and achieves the state-of-the-art performance in
cross-modal retrieval applications.
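The low storage cost and fast query speed mentioned above come from comparing compact binary codes by Hamming distance. A minimal sketch of that retrieval step follows, assuming 8-bit codes that some hashing functions have already produced for an image query and for a small text database. The codes, the item names, and the helper functions are hypothetical and are not taken from RDCMH.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code: int, database: dict) -> list:
    """Return database keys ordered by Hamming distance to the query.

    Ties are broken by key so the ordering is deterministic.
    """
    return sorted(database, key=lambda key: (hamming(query_code, database[key]), key))


# Hypothetical 8-bit codes: one image query, three text items.
image_query = 0b10110010
text_codes = {
    "t1": 0b10110011,  # differs from the query in 1 bit
    "t2": 0b01001101,  # differs from the query in all 8 bits
    "t3": 0b10010110,  # differs from the query in 2 bits
}

ranking = rank_by_hamming(image_query, text_codes)
print(ranking)
```

Each comparison is one XOR plus a bit count, which is why retrieval over binary codes is cheap compared with distances over real-valued features.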
Active Sampling of Pairs and Points for Large-scale Linear Bipartite Ranking
Bipartite ranking is a fundamental ranking problem that learns to order
relevant instances ahead of irrelevant ones. The pair-wise approach for
bipartite ranking constructs a quadratic number of pairs to solve the problem,
which is infeasible for large-scale data sets. The point-wise approach, albeit
more efficient, often results in inferior performance. That is, it is difficult
to conduct bipartite ranking accurately and efficiently at the same time. In
this paper, we develop a novel active sampling scheme within the pair-wise
approach to conduct bipartite ranking efficiently. The scheme is inspired by
active learning and can reach a competitive ranking performance while focusing
only on a small subset of the many pairs during training. Moreover, we propose
a general Combined Ranking and Classification (CRC) framework to accurately
conduct bipartite ranking. The framework unifies point-wise and pair-wise
approaches and is simply based on the idea of treating each instance point as a
pseudo-pair. Experiments on 14 real-world large-scale data sets demonstrate that
the proposed algorithm of Active Sampling within CRC, when coupled with a
linear Support Vector Machine, usually outperforms state-of-the-art point-wise
and pair-wise ranking approaches in terms of both accuracy and efficiency.
Comment: a shorter version was presented in ACML 201
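To illustrate why the pair-wise approach scales quadratically, the sketch below builds one difference vector per (relevant, irrelevant) pair and measures the fraction of pairs a linear scorer orders correctly. It covers only the basic pair construction, not the active sampling scheme or the CRC framework from the abstract. The toy data and function names are assumptions made for illustration.

```python
from itertools import product


def pairwise_differences(X, y):
    """Build one difference vector per (relevant, irrelevant) pair.

    With P relevant and N irrelevant instances this yields P * N vectors,
    which is the quadratic growth that makes the approach costly.
    """
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    return [[a - b for a, b in zip(p, n)] for p, n in product(pos, neg)]


def pairwise_accuracy(w, X, y):
    """Fraction of pairs ranked correctly by the linear scorer w.

    A tie counts as half a correct pair, so the value matches AUC.
    """
    diffs = pairwise_differences(X, y)
    total = 0.0
    for d in diffs:
        margin = sum(wi * di for wi, di in zip(w, d))
        total += 1.0 if margin > 0 else 0.5 if margin == 0 else 0.0
    return total / len(diffs)


# Toy data: 2 relevant and 3 irrelevant instances give 2 * 3 = 6 pairs.
X = [[2.0, 1.0], [1.5, 0.5], [0.5, 1.0], [0.0, 0.0], [1.0, 2.0]]
y = [1, 1, 0, 0, 0]

pairs = pairwise_differences(X, y)
print(len(pairs))
print(pairwise_accuracy([1.0, 0.0], X, y))
```

A pair-wise learner would train a linear classifier on these difference vectors. Sampling only a subset of them, as the abstract describes, avoids materializing all P * N pairs.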
Knowledge Graph Embedding with Iterative Guidance from Soft Rules
Embedding knowledge graphs (KGs) into continuous vector spaces is a focus of
current research. Combining such an embedding model with logic rules has
recently attracted increasing attention. Most previous attempts made a one-time
injection of logic rules, ignoring the interactive nature between embedding
learning and logical inference. They also focused only on hard rules, which
always hold with no exception and usually require extensive manual effort to
create or validate. In this paper, we propose Rule-Guided Embedding (RUGE), a
novel paradigm of KG embedding with iterative guidance from soft rules. RUGE
enables an embedding model to learn simultaneously from 1) labeled triples that
have been directly observed in a given KG, 2) unlabeled triples whose labels
are going to be predicted iteratively, and 3) soft rules with various
confidence levels extracted automatically from the KG. In the learning process,
RUGE iteratively queries rules to obtain soft labels for unlabeled triples, and
integrates such newly labeled triples to update the embedding model. Through
this iterative procedure, knowledge embodied in logic rules may be better
transferred into the learned embeddings. We evaluate RUGE in link prediction on
Freebase and YAGO. Experimental results show that: 1) with rule knowledge
injected iteratively, RUGE achieves significant and consistent improvements
over state-of-the-art baselines; and 2) despite their uncertainties,
automatically extracted soft rules are highly beneficial to KG embedding, even
those with moderate confidence levels. The code and data used for this paper
can be obtained from https://github.com/iieir-km/RUGE.
Comment: To appear in AAAI 201
ResumeNet: A Learning-based Framework for Automatic Resume Quality Assessment
Recruitment of appropriate people for certain positions is critical for any
company or organization. Manually screening large numbers of resumes to select
appropriate candidates can be exhausting and time-consuming. However,
there is no public tool that can be directly used for automatic resume quality
assessment (RQA). This motivates us to develop a method for automatic RQA.
Since there is also no public dataset for model training and evaluation, we
build a dataset for RQA by collecting around 10K resumes, which are provided by
a private resume management company. By investigating the dataset, we identify
some factors or features that could be useful to discriminate good resumes from
bad ones, e.g., the consistency between different parts of a resume. Then a
neural-network model is designed to predict the quality of each resume, where
some text processing techniques are incorporated. To deal with the label
deficiency issue in the dataset, we propose several variants of the model by
either utilizing the pair/triplet-based loss, or introducing some
semi-supervised learning technique to make use of the abundant unlabeled data.
Both the presented baseline model and its variants are general and easy to
implement. Various popular criteria including the receiver operating
characteristic (ROC) curve, F-measure and ranking-based average precision (AP)
are adopted for model evaluation. We compare the different variants with our
baseline model. Since there is no public algorithm for RQA, we further compare
our results with those obtained from a website that can score a resume.
Experimental results in terms of different criteria demonstrate the
effectiveness of the proposed method. We foresee that our approach would
transform future human resources management.
Comment: ICD
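Two of the evaluation criteria named above, ranking-based average precision (AP) and the F-measure, can be computed as in the following sketch. It uses the standard textbook definitions rather than the authors' evaluation code. The scores, labels, and the 0.75 decision threshold are made up for illustration.

```python
def average_precision(scores, labels):
    """Mean of precision@k taken at each rank k where a positive item appears."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def f_measure(predicted, labels, beta=1.0):
    """F-beta score from binary predictions; beta=1 gives the usual F1."""
    tp = sum(1 for p, l in zip(predicted, labels) if p == 1 and l == 1)
    fp = sum(1 for p, l in zip(predicted, labels) if p == 1 and l == 0)
    fn = sum(1 for p, l in zip(predicted, labels) if p == 0 and l == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical quality scores for four resumes; label 1 marks a good resume.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
predicted = [1 if s >= 0.75 else 0 for s in scores]  # assumed threshold

print(average_precision(scores, labels))
print(f_measure(predicted, labels))
```

AP depends only on the ordering of the scores, while the F-measure depends on the chosen threshold, which is one reason to report both.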
Cross-domain sentiment classification using a sentiment sensitive thesaurus
Automatic classification of sentiment is important for numerous applications such as opinion mining, opinion summarization, contextual advertising, and market analysis. However, sentiment is expressed differently in different domains, and annotating corpora for every possible domain of interest is costly. Applying a sentiment classifier trained using labeled data for a particular domain to classify sentiment of user reviews on a different domain often results in poor performance. We propose a method to overcome this problem in cross-domain sentiment classification. First, we create a sentiment sensitive distributional thesaurus using labeled data for the source domains and unlabeled data for both source and target domains. Sentiment sensitivity is achieved in the thesaurus by incorporating document level sentiment labels in the context vectors used as the basis for measuring the distributional similarity between words. Next, we use the created thesaurus to expand feature vectors at training and test time in a binary classifier. The proposed method significantly outperforms numerous baselines and returns results that are comparable with previously proposed cross-domain sentiment classification methods. We conduct an extensive empirical analysis of the proposed method on single and multi-source domain adaptation, unsupervised and supervised domain adaptation, and numerous similarity measures for creating the sentiment sensitive thesaurus.
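The feature expansion step can be pictured with the simplified sketch below: each word in a review contributes its most similar thesaurus entries as extra, similarity-weighted features. This is an assumed, reduced form of the idea and not the paper's exact scoring. The thesaurus contents, the `EXP=` prefix, and the `top_k` parameter are hypothetical.

```python
from collections import Counter


def expand_features(tokens, thesaurus, top_k=2):
    """Append similarity-weighted related words to a bag-of-words vector.

    thesaurus maps a word to a list of (related_word, similarity) tuples,
    sorted by decreasing similarity. Expansion features get an "EXP="
    prefix so they stay distinct from words observed in the document.
    """
    counts = Counter(tokens)
    expanded = {word: float(count) for word, count in counts.items()}
    for word, count in counts.items():
        for related, similarity in thesaurus.get(word, [])[:top_k]:
            key = "EXP=" + related
            expanded[key] = expanded.get(key, 0.0) + count * similarity
    return expanded


# Hypothetical thesaurus entries, assumed to be built from source and target data.
thesaurus = {
    "excellent": [("great", 0.8), ("superb", 0.7), ("fine", 0.3)],
    "broken": [("defective", 0.9)],
}

features = expand_features(["excellent", "excellent", "camera"], thesaurus)
print(features)
```

Because the same expansion is applied at training and test time, a word seen only in the target domain can still activate features the classifier learned from the source domain.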