9,409 research outputs found
Preparing, restructuring, and augmenting a French treebank: lexicalised parsers or coherent treebanks?
We present the Modified French Treebank (MFT), a completely revamped French Treebank, derived from the Paris 7 Treebank
(P7T), which is cleaner, more coherent, has several transformed structures, and introduces new linguistic analyses. To determine the effect of these changes, we
investigate how the MFT fares in statistical parsing. Probabilistic parsers trained on the MFT training set (currently 3800 trees) already perform better than their counterparts trained on five times as much P7T data (18,548 trees), providing an extreme example of the importance of data quality over quantity in statistical parsing. Moreover,
regression analysis on the learning curve of parsers trained on the MFT leads to the prediction that parsers trained on the full projected 18,548-tree MFT training set
will far outscore their counterparts trained on the full P7T. These analyses also show how problematic data can lead to problematic conclusions: in particular, we find that
lexicalisation in the probabilistic parsing of French is probably not as crucial as was once thought (Arun and Keller, 2005).
Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement and Retrieval
Where previous reviews on content-based image retrieval emphasize what can
be seen in an image to bridge the semantic gap, this survey considers what
people tag about an image. A comprehensive treatise of three closely linked
problems, i.e., image tag assignment, refinement, and tag-based image retrieval
is presented. While existing works vary in terms of their targeted tasks and
methodology, they rely on the key functionality of tag relevance, i.e.
estimating the relevance of a specific tag with respect to the visual content
of a given image and its social context. By analyzing what information a
specific method exploits to construct its tag relevance function and how such
information is exploited, this paper introduces a taxonomy to structure the
growing literature, understand the ingredients of the main works, clarify their
connections and differences, and recognize their merits and limitations. For a
head-to-head comparison of state-of-the-art methods, a new experimental
protocol is presented, with training sets containing 10k, 100k and 1m images
and an evaluation on three test sets, contributed by various research groups.
Eleven representative works are implemented and evaluated. Putting all this
together, the survey aims to provide an overview of the past and foster
progress for the near future.
Comment: to appear in ACM Computing Surveys
DocTag2Vec: An Embedding Based Multi-label Learning Approach for Document Tagging
Tagging news articles or blog posts with relevant tags from a collection of
predefined ones is termed document tagging in this work. Accurate tagging of
articles can benefit several downstream applications such as recommendation and
search. In this work, we propose a novel yet simple approach called DocTag2Vec
to accomplish this task. We substantially extend Word2Vec and Doc2Vec, two
popular models for learning distributed representations of words and documents.
In DocTag2Vec, we simultaneously learn the representations of words, documents,
and tags in a joint vector space during training, and employ a simple
k-nearest neighbor search to predict tags for unseen documents. In contrast
to previous multi-label learning methods, DocTag2Vec deals directly with raw
text instead of a provided feature vector and, in addition, enjoys advantages
such as learning tag representations and the ability to handle newly
created tags. To demonstrate the effectiveness of our approach, we conduct
experiments on several datasets and show promising results against
state-of-the-art methods.
Comment: 10 pages
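The prediction step described in the abstract above, a nearest-neighbor search over tags in the joint vector space, can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the function names `cosine` and `predict_tags`, the toy two-dimensional vectors, and the use of cosine similarity as the distance measure are all assumptions, and the document vector is assumed to have been inferred already.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict_tags(doc_vec, tag_vecs, k=2):
    """Return the k tags whose embeddings are most similar to the document embedding."""
    ranked = sorted(tag_vecs, key=lambda tag: cosine(doc_vec, tag_vecs[tag]), reverse=True)
    return ranked[:k]

# Hypothetical embeddings in a shared 2-D space (made up for illustration).
tag_vecs = {
    "sports": [1.0, 0.0],
    "politics": [0.0, 1.0],
    "football": [0.9, 0.1],
}
doc_vec = [0.8, 0.2]  # assumed to be the inferred embedding of an unseen document

predicted = predict_tags(doc_vec, tag_vecs, k=2)
print(predicted)
```

Because tags live in the same space as documents, a newly created tag would only need its own embedding added to `tag_vecs` to become predictable in this sketch.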
Towards Understanding User Preferences from User Tagging Behavior for Personalization
Personalizing image tags is a relatively new and growing area of research,
and in order to advance this research community, we must review and challenge
the de facto standard of defining tag importance. We believe that for greater
progress to be made, we must go beyond tags that merely describe objects that
are visually represented in the image, towards more user-centric and subjective
notions such as emotion, sentiment, and preferences.
We focus on the notion of user preferences and show that the order in which users
list tags on images is correlated with their order of preference over the tags that
they provided for the image. While this observation is not completely
surprising, to our knowledge, we are the first to explore this aspect of user
tagging behavior systematically and report empirical results to support this
observation. We argue that this observation can be exploited to help advance
the image tagging (and related) communities.
Our contributions include: (1) conducting a user study demonstrating this
observation, and (2) collecting a dataset in which user tag preferences are
explicitly recorded.
Comment: 6 pages
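One way to quantify the agreement between the order in which tags are listed and the order in which they are preferred is a rank correlation. The sketch below computes Kendall's tau for two orderings of the same tags. The abstract does not say which correlation measure the authors used, so the choice of Kendall's tau, the function name `kendall_tau`, and the example tags are all assumptions made for illustration.

```python
from itertools import combinations

def kendall_tau(order_a, order_b):
    """Kendall's tau between two orderings of the same distinct items (range -1 to 1)."""
    pos_a = {tag: i for i, tag in enumerate(order_a)}
    pos_b = {tag: i for i, tag in enumerate(order_b)}
    concordant = discordant = 0
    for x, y in combinations(order_a, 2):
        # A pair is concordant when both orderings rank x and y the same way round.
        sign = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(order_a) * (len(order_a) - 1) // 2
    return (concordant - discordant) / n_pairs if n_pairs else 0.0

# Hypothetical data: the order a user listed tags, and the same user's stated preference order.
listed = ["beach", "sunset", "family", "vacation"]
preferred = ["beach", "family", "sunset", "vacation"]

tau = kendall_tau(listed, preferred)
print(round(tau, 3))
```

A value near 1 would indicate that listing order closely tracks preference order, while a value near 0 would indicate no relationship.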