90,190 research outputs found
Computational Approaches to Measuring the Similarity of Short Contexts : A Review of Applications and Methods
Measuring the similarity of short written contexts is a fundamental problem
in Natural Language Processing. This article provides a unifying framework by
which short context problems can be categorized both by their intended
application and proposed solution. The goal is to show that various problems
and methodologies that appear quite different on the surface are in fact very
closely related. The axes by which these categorizations are made include the
format of the contexts (headed versus headless), the way in which the contexts
are to be measured (first-order versus second-order similarity), and the
information used to represent the features in the contexts (micro versus macro
views). The unifying thread that binds together many short context applications
and methods is the fact that similarity decisions must be made between contexts
that share few (if any) words in common.Comment: 23 page
On Web User Tracking: How Third-Party Http Requests Track Users' Browsing Patterns for Personalised Advertising
On today's Web, users trade access to their private data for content and
services. Advertising sustains the business model of many websites and
applications. Efficient and successful advertising relies on predicting users'
actions and tastes to suggest a range of products to buy. It follows that,
while surfing the Web users leave traces regarding their identity in the form
of activity patterns and unstructured data. We analyse how advertising networks
build user footprints and how the suggested advertising reacts to changes in
the user behaviour.Comment: arXiv admin note: substantial text overlap with arXiv:1605.0653
You never surf alone. Ubiquitous tracking of users' browsing habits
In the early age of the internet users enjoyed a large level of anonymity. At
the time web pages were just hypertext documents; almost no personalisation of
the user experience was o ered. The Web today has evolved as a world wide
distributed system following specific architectural paradigms. On the web now,
an enormous quantity of user generated data is shared and consumed by a network
of applications and services, reasoning upon users expressed preferences and
their social and physical connections. Advertising networks follow users'
browsing habits while they surf the web, continuously collecting their traces
and surfing patterns. We analyse how users tracking happens on the web by
measuring their online footprint and estimating how quickly advertising
networks are able to pro le users by their browsing habits
Recommended from our members
Application of Natural Language Processing and Evidential Analysis to Web-Based Intelligence Information Acquisition
The quality of decisions made in business and government relates directly to the quality of the information used to formulate the decision. This information may be retrieved from an organization's knowledge base (Intranet) or from the World Wide Web. Intelligence services Intranet held information can be efficiently manipulated by technologies based upon either semantics such as ontologies, or statistics such as meaning-based computing. These technologies require complex processing of large amount of textual information. However, they cannot currently be effectively applied to Web-based search due to various obstacles, such as lack of semantic tagging. A new approach proposed in this paper supports Web-based search for intelligence information utilizing evidence-based natural language processing (NLP). This approach combines traditional NLP methods for filtering of Web-search results, Grounded Theory to test the completeness of the evidence, and Evidential Analysis to test the quality of gathered information. The enriched information derived from the Web-search will be transferred to the intelligence services knowledge base for handling by an effective Intranet search system thus increasing substantially the information for intelligence analysis. The paper will show that the quality of retrieved information is significantly enhanced by the discovery of previously unknown facts derived from known facts
Growing a Tree in the Forest: Constructing Folksonomies by Integrating Structured Metadata
Many social Web sites allow users to annotate the content with descriptive
metadata, such as tags, and more recently to organize content hierarchically.
These types of structured metadata provide valuable evidence for learning how a
community organizes knowledge. For instance, we can aggregate many personal
hierarchies into a common taxonomy, also known as a folksonomy, that will aid
users in visualizing and browsing social content, and also to help them in
organizing their own content. However, learning from social metadata presents
several challenges, since it is sparse, shallow, ambiguous, noisy, and
inconsistent. We describe an approach to folksonomy learning based on
relational clustering, which exploits structured metadata contained in personal
hierarchies. Our approach clusters similar hierarchies using their structure
and tag statistics, then incrementally weaves them into a deeper, bushier tree.
We study folksonomy learning using social metadata extracted from the
photo-sharing site Flickr, and demonstrate that the proposed approach addresses
the challenges. Moreover, comparing to previous work, the approach produces
larger, more accurate folksonomies, and in addition, scales better.Comment: 10 pages, To appear in the Proceedings of ACM SIGKDD Conference on
Knowledge Discovery and Data Mining(KDD) 201
- …