11,156 research outputs found
A Taxonomy for In-depth Evaluation of Normalization for User Generated Content
In this work we present a taxonomy of error categories for lexical normalization, which is the task of translating user generated content to canonical language. We annotate a recent normalization dataset to test the practical use of the taxonomy and read a near-perfect agreement. This annotated dataset is then used to evaluate how an existing normalization model performs on the different categories of the taxonomy. The results of this evaluation reveal that some of the problematic categories only include minor transformations, whereas most regular transformations are solved quite well
Generating ordered list of Recommended Items: a Hybrid Recommender System of Microblog
Precise recommendation of followers helps in improving the user experience
and maintaining the prosperity of twitter and microblog platforms. In this
paper, we design a hybrid recommender system of microblog as a solution of KDD
Cup 2012, track 1 task, which requires predicting users a user might follow in
Tencent Microblog. We describe the background of the problem and present the
algorithm consisting of keyword analysis, user taxonomy, (potential)interests
extraction and item recommendation. Experimental result shows the high
performance of our algorithm. Some possible improvements are discussed, which
leads to further study.Comment: 7 page
Growing a Tree in the Forest: Constructing Folksonomies by Integrating Structured Metadata
Many social Web sites allow users to annotate the content with descriptive
metadata, such as tags, and more recently to organize content hierarchically.
These types of structured metadata provide valuable evidence for learning how a
community organizes knowledge. For instance, we can aggregate many personal
hierarchies into a common taxonomy, also known as a folksonomy, that will aid
users in visualizing and browsing social content, and also to help them in
organizing their own content. However, learning from social metadata presents
several challenges, since it is sparse, shallow, ambiguous, noisy, and
inconsistent. We describe an approach to folksonomy learning based on
relational clustering, which exploits structured metadata contained in personal
hierarchies. Our approach clusters similar hierarchies using their structure
and tag statistics, then incrementally weaves them into a deeper, bushier tree.
We study folksonomy learning using social metadata extracted from the
photo-sharing site Flickr, and demonstrate that the proposed approach addresses
the challenges. Moreover, comparing to previous work, the approach produces
larger, more accurate folksonomies, and in addition, scales better.Comment: 10 pages, To appear in the Proceedings of ACM SIGKDD Conference on
Knowledge Discovery and Data Mining(KDD) 201
Exploring The Value Of Folksonomies For Creating Semantic Metadata
Finding good keywords to describe resources is an on-going problem: typically we select such words manually from a thesaurus of terms, or they are created using automatic keyword extraction techniques. Folksonomies are an increasingly well populated source of unstructured tags describing web resources. This paper explores the value of the folksonomy tags as potential source of keyword metadata by examining the relationship between folksonomies, community produced annotations, and keywords extracted by machines. The experiment has been carried-out in two ways: subjectively, by asking two human indexers to evaluate the quality of the generated keywords from both systems; and automatically, by measuring the percentage of overlap between the folksonomy set and machine generated keywords set. The results of this experiment show that the folksonomy tags agree more closely with the human generated keywords than those automatically generated. The results also showed that the trained indexers preferred the semantics of folksonomy tags compared to keywords extracted automatically. These results can be considered as evidence for the strong relationship of folksonomies to the human indexer’s mindset, demonstrating that folksonomies used in the del.icio.us bookmarking service are a potential source for generating semantic metadata to annotate web resources
A Survey on Retrieval of Mathematical Knowledge
We present a short survey of the literature on indexing and retrieval of
mathematical knowledge, with pointers to 72 papers and tentative taxonomies of
both retrieval problems and recurring techniques.Comment: CICM 2015, 20 page
Learning Graph Embeddings from WordNet-based Similarity Measures
We present path2vec, a new approach for learning graph embeddings that relies
on structural measures of pairwise node similarities. The model learns
representations for nodes in a dense space that approximate a given
user-defined graph distance measure, such as e.g. the shortest path distance or
distance measures that take information beyond the graph structure into
account. Evaluation of the proposed model on semantic similarity and word sense
disambiguation tasks, using various WordNet-based similarity measures, show
that our approach yields competitive results, outperforming strong graph
embedding baselines. The model is computationally efficient, being orders of
magnitude faster than the direct computation of graph-based distances.Comment: Accepted to StarSem 201
- …