169,649 research outputs found
An Army of Me: Sockpuppets in Online Discussion Communities
In online discussion communities, users can interact and share information
and opinions on a wide variety of topics. However, some users may create
multiple identities, or sockpuppets, and engage in undesired behavior by
deceiving others or manipulating discussions. In this work, we study
sockpuppetry across nine discussion communities, and show that sockpuppets
differ from ordinary users in terms of their posting behavior, linguistic
traits, as well as social network structure. Sockpuppets tend to start fewer
discussions, write shorter posts, use more personal pronouns such as "I", and
have more clustered ego-networks. Further, pairs of sockpuppets controlled by
the same individual are more likely to interact on the same discussion at the
same time than pairs of ordinary users. Our analysis suggests a taxonomy of
deceptive behavior in discussion communities. Pairs of sockpuppets can vary in
their deceptiveness, i.e., whether they pretend to be different users, or their
supportiveness, i.e., if they support arguments of other sockpuppets controlled
by the same user. We apply these findings to a series of prediction tasks,
notably, to identify whether a pair of accounts belongs to the same underlying
user or not. Altogether, this work presents a data-driven view of deception in
online discussion communities and paves the way towards the automatic detection
of sockpuppets.Comment: 26th International World Wide Web conference 2017 (WWW 2017
Search and Discovery Tools for Astronomical On-line Resources and Services
A growing number of astronomical resources and data or information services
are made available through the Internet. However valuable information is
frequently hidden in a deluge of non-pertinent or non up-to-date documents. At
a first level, compilations of astronomical resources provide help for
selecting relevant sites. Combining yellow-page services and meta-databases of
active pointers may be an efficient solution to the data retrieval problem.
Responses generated by submission of queries to a set of heterogeneous
resources are difficult to merge or cross-match, because different data
providers generally use different data formats: new endeavors are under way to
tackle this problem. We review the technical challenges involved in trying to
provide general search and discovery tools, and to integrate them through upper
level interfaces.Comment: 7 pages, 2 Postscript figures; to be published in A&A
The Analysis of Online Communities using Interactive Content-based Social Networks
published or submitted for publicationis peer reviewe
Automatically attaching web pages to an ontology
This paper describes a proposed system for automatically attaching material from the world wide web to concepts in an ontology. The motivation for this research stems from the Diogene project, which requires the project's own databases of learning objects to be augmented with additional resources from the web. Two main approaches to this problem are being taken: one using ontology mapping, and another based on the conventional text search facilities of the web, covered in this paper. By generating queries based on the concepts in the ontology, the aim is to retrieve material from the web, and then filter it to ensure its proper correspondence with a concept. The Diogene system will be briefly outlined, before the query-generation system is described. A small pilot experiment, designed to provide some initial results and insight into the problem, is then presented
Relation Discovery from Web Data for Competency Management
This paper describes a technique for automatically discovering associations between people and expertise from an analysis of very large data sources (including web pages, blogs and emails), using a family of algorithms that perform accurate named-entity recognition, assign different weights to terms according to an analysis of document structure, and access distances between terms in a document. My contribution is to add a social networking approach called BuddyFinder which relies on associations within a large enterprise-wide "buddy list" to help delimit the search space and also to provide a form of 'social triangulation' whereby the system can discover documents from your colleagues that contain pertinent information about you. This work has been influential in the information retrieval community generally, as it is the basis of a landmark system that achieved overall first place in every category in the Enterprise Search Track of TREC2006
A Linked Data Approach to Sharing Workflows and Workflow Results
A bioinformatics analysis pipeline is often highly elaborate, due to the inherent complexity of biological systems and the variety and size of datasets. A digital equivalent of the ‘Materials and Methods’ section in wet laboratory publications would be highly beneficial to bioinformatics, for evaluating evidence and examining data across related experiments, while introducing the potential to find associated resources and integrate them as data and services. We present initial steps towards preserving bioinformatics ‘materials and methods’ by exploiting the workflow paradigm for capturing the design of a data analysis pipeline, and RDF to link the workflow, its component services, run-time provenance, and a personalized biological interpretation of the results. An example shows the reproduction of the unique graph of an analysis procedure, its results, provenance, and personal interpretation of a text mining experiment. It links data from Taverna, myExperiment.org, BioCatalogue.org, and ConceptWiki.org. The approach is relatively ‘light-weight’ and unobtrusive to bioinformatics users
Report of the Stanford Linked Data Workshop
The Stanford University Libraries and Academic Information Resources (SULAIR) with the Council on Library and Information Resources (CLIR) conducted at week-long workshop on the prospects for a large scale, multi-national, multi-institutional prototype of a Linked Data environment for discovery of and navigation among the rapidly, chaotically expanding array of academic information resources. As preparation for the workshop, CLIR sponsored a survey by Jerry Persons, Chief Information Architect emeritus of SULAIR that was published originally for workshop participants as background to the workshop and is now publicly available. The original intention of the workshop was to devise a plan for such a prototype. However, such was the diversity of knowledge, experience, and views of the potential of Linked Data approaches that the workshop participants turned to two more fundamental goals: building common understanding and enthusiasm on the one hand and identifying opportunities and challenges to be confronted in the preparation of the intended prototype and its operation on the other. In pursuit of those objectives, the workshop participants produced:1. a value statement addressing the question of why a Linked Data approach is worth prototyping;2. a manifesto for Linked Libraries (and Museums and Archives and …);3. an outline of the phases in a life cycle of Linked Data approaches;4. a prioritized list of known issues in generating, harvesting & using Linked Data;5. a workflow with notes for converting library bibliographic records and other academic metadata to URIs;6. examples of potential “killer apps” using Linked Data: and7. a list of next steps and potential projects.This report includes a summary of the workshop agenda, a chart showing the use of Linked Data in cultural heritage venues, and short biographies and statements from each of the participants
Social Search with Missing Data: Which Ranking Algorithm?
Online social networking tools are extremely popular, but can miss potential discoveries latent in the social 'fabric'. Matchmaking services which can do naive profile matching with old database technology are too brittle in the absence of key data, and even modern ontological markup, though powerful, can be onerous at data-input time. In this paper, we present a system called BuddyFinder which can automatically identify buddies who can best match a user's search requirements specified in a term-based query, even in the absence of stored user-profiles. We deploy and compare five statistical measures, namely, our own CORDER, mutual information (MI), phi-squared, improved MI and Z score, and two TF/IDF based baseline methods to find online users who best match the search requirements based on 'inferred profiles' of these users in the form of scavenged web pages. These measures identify statistically significant relationships between online users and a term-based query. Our user evaluation on two groups of users shows that BuddyFinder can find users highly relevant to search queries, and that CORDER achieved the best average ranking correlations among all seven algorithms and improved the performance of both baseline methods
OCRIS : online catalogue and repository interoperability study. Final report
The aims and objectives of OCRIS were to: • Survey the extent to which repository content is in scope for institutional library OPACs, and the extent to which it is already recorded there; • Examine the interoperability of OPAC and repository software for the exchange of metadata and other information; • List the various services to institutional managers, researchers, teachers and learners offered respectively by OPACs and repositories; • Identify the potential for improvements in the links (e.g. using link resolver technology) from repositories and/or OPACs to other institutional services, such as finance or research administration; • Make recommendations for the development of possible further links between library OPACs and institutional repositories, identifying the benefits to relevant stakeholder groups
- …