IBVS — Novel Features of a Small OA Astronomical Journal
The Information Bulletin on Variable Stars (IBVS) is a small, specialized astronomical journal. It has served the variable star community since 1961. An Open Access electronic version was started in 1994. This electronic version offers innovative services to the reader: third-party tools for visualization (Aladin) and third-party name resolution services for search (SIMBAD or GCVS for objects, and ADS for author names). Considerable efforts have been made to interconnect the journal with other electronic resources such as publications, databases, and archives, like CDS, ADS, GCVS, NED, WFPDB and WEBDA. Additional aspects of this small electronic journal to be discussed are archiving policies, copyrights and the use of OAI-PMH.
Unsupervised Named-Entity Recognition: Generating Gazetteers and Resolving Ambiguity
In this paper, we propose a named-entity recognition (NER) system that addresses two major limitations frequently discussed in the field. First, the system requires no human intervention such as manually labeling training data or creating gazetteers. Second, the system can handle more than the three classical named-entity types (person, location, and organization). We describe the system’s architecture and compare its performance with a supervised system. We experimentally evaluate the system on a standard corpus, with the three classical named-entity types, and also on a new corpus, with a new named-entity type (car brands).
Automatic Discovery and Ranking of Synonyms for Search Keywords in the Web
Search engines are an indispensable part of a web user's life. A vast majority of these web users experience difficulties caused by keyword-based search engines, such as inaccurate results for queries and irrelevant URLs even though the given keyword is present in them. Also, relevant URLs may be missed because they contain a synonym of the keyword rather than the keyword itself. This condition is known as the polysemy problem. To alleviate these problems, we propose an algorithm called automatic discovery and ranking of synonyms for search keywords in the web (ADRS). The proposed method generates a list of candidate synonyms for individual keywords by employing the relevance factor of the URLs associated with the synonyms. These candidate synonyms are then ranked using co-occurrence frequencies and various page count-based measures. One of the major advantages of our algorithm is that it is highly scalable, which makes it applicable to online data on the dynamic, domain-independent and unstructured World Wide Web. The experimental results show that the best results are obtained using the proposed algorithm with WebJaccard.
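The abstract does not define WebJaccard, so the following is only a minimal sketch, assuming the commonly used page-count form of the Jaccard coefficient: the number of pages containing both terms divided by the number containing either. The function names, the `min_cooccurrence` cutoff, and the page counts in the usage note are hypothetical illustrations, not the authors' ADRS implementation.

```python
def web_jaccard(count_p: int, count_q: int, count_pq: int,
                min_cooccurrence: int = 0) -> float:
    """Page-count Jaccard score for two terms P and Q.

    count_p, count_q: pages containing P and Q respectively.
    count_pq: pages containing both P and Q.
    min_cooccurrence: assumed cutoff; joint counts at or below it score 0.
    """
    if count_pq <= min_cooccurrence:
        return 0.0
    union = count_p + count_q - count_pq  # pages containing P or Q
    return count_pq / union if union > 0 else 0.0


def rank_synonyms(keyword_count: int,
                  candidates: dict[str, tuple[int, int]]) -> list[str]:
    """Order candidate synonyms by descending WebJaccard score.

    candidates maps each candidate to (candidate page count, joint page count).
    """
    scores = {
        term: web_jaccard(keyword_count, count_q, count_pq)
        for term, (count_q, count_pq) in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

With made-up counts, `rank_synonyms(1000, {"car": (800, 400), "banana": (900, 10)})` places "car" first, because it co-occurs with the keyword on a much larger share of the pages that mention either term.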
Entity Query Feature Expansion Using Knowledge Base Links
Recent advances in automatic entity linking and knowledge base
construction have resulted in entity annotations for document and
query collections, for example annotations of entities from large
general-purpose knowledge bases such as Freebase and the Google
Knowledge Graph. Understanding how to leverage these entity
annotations of text to improve ad hoc document retrieval is an open
research area. Query expansion is a commonly used technique to
improve retrieval effectiveness. Most previous query expansion
approaches focus on text, mainly using unigram concepts. In this
paper, we propose a new technique, called entity query feature
expansion (EQFE), which enriches the query with features from
entities and their links to knowledge bases, including structured
attributes and text. We experiment using both explicit query entity
annotations and latent entities. We evaluate our technique on TREC
text collections automatically annotated with knowledge base entity
links, including the Google Freebase Annotations (FACC1) data.
We find that entity-based feature expansion results in significant
improvements in retrieval effectiveness over state-of-the-art text
expansion approaches.
An Army of Me: Sockpuppets in Online Discussion Communities
In online discussion communities, users can interact and share information
and opinions on a wide variety of topics. However, some users may create
multiple identities, or sockpuppets, and engage in undesired behavior by
deceiving others or manipulating discussions. In this work, we study
sockpuppetry across nine discussion communities, and show that sockpuppets
differ from ordinary users in terms of their posting behavior, linguistic
traits, as well as social network structure. Sockpuppets tend to start fewer
discussions, write shorter posts, use more personal pronouns such as "I", and
have more clustered ego-networks. Further, pairs of sockpuppets controlled by
the same individual are more likely to interact on the same discussion at the
same time than pairs of ordinary users. Our analysis suggests a taxonomy of
deceptive behavior in discussion communities. Pairs of sockpuppets can vary in
their deceptiveness, i.e., whether they pretend to be different users, or their
supportiveness, i.e., if they support arguments of other sockpuppets controlled
by the same user. We apply these findings to a series of prediction tasks,
notably, to identify whether a pair of accounts belongs to the same underlying
user or not. Altogether, this work presents a data-driven view of deception in
online discussion communities and paves the way towards the automatic detection
of sockpuppets. Comment: 26th International World Wide Web conference 2017 (WWW 2017
- …
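The observation that sockpuppets have "more clustered ego-networks" can be made concrete with the local clustering coefficient: the fraction of pairs of a user's neighbors that are themselves connected. The sketch below assumes an undirected interaction graph stored as a dictionary of neighbor sets; the graph representation and the function name are hypothetical, and the abstract does not say how the authors measured clustering.

```python
from itertools import combinations


def local_clustering(graph: dict[str, set[str]], user: str) -> float:
    """Fraction of neighbor pairs of `user` that are directly connected.

    graph: assumed undirected adjacency mapping (user -> set of neighbors).
    Returns 0.0 when the user has fewer than two neighbors.
    """
    neighbors = graph.get(user, set()) - {user}
    k = len(neighbors)
    if k < 2:
        return 0.0
    # Count edges that exist among the user's neighbors.
    links = sum(
        1
        for a, b in combinations(sorted(neighbors), 2)
        if b in graph.get(a, set())
    )
    return 2 * links / (k * (k - 1))
```

A value near 1 means that most of the accounts a user interacts with also interact with each other; a value near 0 means the user's contacts are largely unconnected.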