KERT: Automatic Extraction and Ranking of Topical Keyphrases from Content-Representative Document Titles
We introduce KERT (Keyphrase Extraction and Ranking by Topic), a framework
for topical keyphrase generation and ranking. By shifting from the
unigram-centric traditional methods of unsupervised keyphrase extraction to a
phrase-centric approach, we are able to directly compare and rank phrases of
different lengths. We construct a topical keyphrase ranking function that
implements the four criteria that characterize high-quality topical keyphrases
(coverage, purity, phraseness, and completeness). The effectiveness of our
approach is demonstrated on two collections of content-representative titles in
the domains of Computer Science and Physics. Comment: 9 pages
Information-theoretic measures of music listening behaviour
We present an information-theoretic approach to the
measurement of users' music listening behaviour and selection of music features. Existing
ethnographic studies of music use have guided the design of music retrieval systems, but are
typically qualitative and exploratory in nature. We introduce the SPUD dataset, comprising 10,000
hand-made playlists, with user and audio stream metadata. With this, we illustrate the use of
entropy for analysing music listening behaviour, e.g. identifying when a user switched music
retrieval systems. We then develop an approach to identifying music features that reflect users'
criteria for playlist curation, rejecting features that are independent of user behaviour. The
dataset and the code used to produce it are made available. The techniques described support a
quantitative yet user-centred approach to the evaluation of music features and retrieval systems,
without assuming objective ground-truth labels.
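As a rough illustration of entropy applied to listening behaviour, the sketch below computes the Shannon entropy (in bits) of a user's listening history over fixed, non-overlapping windows; a sharp jump between windows is the kind of signal that might accompany a change of retrieval system. This is a minimal sketch under assumptions: the choice of attribute (artist per play), the window size, and the function names `shannon_entropy` and `windowed_entropy` are hypothetical and not taken from the paper or the SPUD dataset.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, List


def shannon_entropy(items: Iterable[Hashable]) -> float:
    """Shannon entropy (bits) of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def windowed_entropy(items: List[Hashable], window: int) -> List[float]:
    """Entropy of each consecutive, non-overlapping window of a listening history.

    Hypothetical helper: any trailing plays that do not fill a window are ignored.
    """
    return [
        shannon_entropy(items[i:i + window])
        for i in range(0, len(items) - window + 1, window)
    ]


if __name__ == "__main__":
    # Hypothetical history: the artist of each track a user played, in order.
    history = ["A", "A", "B", "A", "C", "D", "E", "F"]
    # Low entropy in the first window (repetitive listening), high in the second.
    print(windowed_entropy(history, 4))
```

Non-overlapping windows keep the example simple; a sliding window would give a smoother entropy curve at the cost of correlated estimates.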
Enhancing multi-source content delivery in content-centric networks with fountain coding
Fountain coding is considered especially suitable for lossy environments, such as wireless networks, because it provides redundancy while reducing coordination overhead between sender(s) and receiver(s). This makes it attractive for multi-source and/or multicast communication. In this paper we investigate improving the efficiency of multi-source content delivery in Content-Centric Networking (CCN) using fountain codes. In particular, we examine whether combining fountain coding with CCN's in-network caching capabilities can further improve performance. We also present an enhancement to CCN's Interest forwarding mechanism that aims to minimize the duplicate transmissions that can occur in a multi-source scenario, where all available content providers and caches holding matching (cached) content transmit data packets simultaneously. Our simulations indicate that using fountain coding in CCN is a valid approach that further improves network performance compared to traditional schemes.
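To make the multi-source property concrete, the sketch below implements a toy LT-style fountain code: each encoded packet is the XOR of a random subset of source symbols, and a peeling decoder recovers the content from whichever packets arrive, regardless of which source sent them. This is an illustrative sketch, not the paper's scheme: it draws packet degrees uniformly rather than from the robust soliton distribution used by practical LT codes, it does not model CCN Interests, caching, or duplicate suppression, and all function names are hypothetical.

```python
import random
from typing import List, Optional, Set, Tuple

# An encoded packet: (indices of the source symbols XORed together, payload).
Packet = Tuple[Set[int], int]


def encode_packet(source: List[int], rng: random.Random) -> Packet:
    """XOR a random non-empty subset of source symbols into one packet.

    Simplification: the degree is uniform on [1, k], not robust soliton.
    """
    k = len(source)
    degree = rng.randint(1, k)
    indices = set(rng.sample(range(k), degree))
    payload = 0
    for i in indices:
        payload ^= source[i]
    return indices, payload


def peel_decode(packets: List[Packet], k: int) -> Optional[List[int]]:
    """Peeling decoder: returns the k source symbols, or None if not yet decodable."""
    recovered: List[Optional[int]] = [None] * k
    # Mutable working copies: [unresolved indices, running payload].
    work = [[set(idx), val] for idx, val in packets]
    progress = True
    while progress:
        progress = False
        for entry in work:
            idx, val = entry
            # Strip symbols that are already known out of this packet.
            for i in [i for i in idx if recovered[i] is not None]:
                idx.discard(i)
                val ^= recovered[i]
            entry[1] = val
            # A packet reduced to degree one directly reveals a symbol.
            if len(idx) == 1:
                (i,) = idx
                recovered[i] = val
                idx.clear()
                progress = True
    if any(r is None for r in recovered):
        return None
    return recovered


def receive_from_sources(source: List[int], seeds: List[int], max_packets: int = 1000):
    """Collect packets round-robin from independent encoders until decodable.

    Each seed stands in for a separate content provider or cache; because
    encoders are independent, the receiver never coordinates which packets
    each one sends.
    """
    rngs = [random.Random(s) for s in seeds]
    received: List[Packet] = []
    for n in range(max_packets):
        received.append(encode_packet(source, rngs[n % len(rngs)]))
        decoded = peel_decode(received, len(source))
        if decoded is not None:
            return decoded, len(received)
    raise RuntimeError("content not decodable within max_packets")


if __name__ == "__main__":
    content = [3, 141, 59, 26, 53, 58, 97, 93]  # hypothetical source symbols
    decoded, used = receive_from_sources(content, seeds=[1, 2])
    print(decoded == content, used)
```

Re-running the decoder after every packet is quadratic but keeps the example short; a real receiver would update the decoding graph incrementally.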
A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and points of interest, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on a daily basis. Due to the worldwide coverage of its users and
the real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts have focused on the
new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim to offer an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions. Comment: Accepted to TKDE. 30 pages, 1 figure
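Distance-based metrics are a common way to evaluate home-location and tweet-location predictors: the great-circle error between predicted and true coordinates, summarized as mean and median error and as accuracy within a distance threshold (161 km, i.e. 100 miles, appears often in the Twitter geolocation literature). The sketch below computes these; whether the survey defines its metrics exactly this way is an assumption, and the function names and the 161 km default are illustrative.

```python
import math
from statistics import mean, median
from typing import Dict, List, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two points via the haversine formula."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def evaluate(predicted: List[LatLon], gold: List[LatLon],
             threshold_km: float = 161.0) -> Dict[str, float]:
    """Mean/median error distance and accuracy within `threshold_km`."""
    errors = [haversine_km(p, g) for p, g in zip(predicted, gold)]
    return {
        "mean_error_km": mean(errors),
        "median_error_km": median(errors),
        "acc_at_threshold": sum(e <= threshold_km for e in errors) / len(errors),
    }


if __name__ == "__main__":
    # Hypothetical predictions for two users: one exact, one about 1,100 km off.
    print(evaluate([(0.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 10.0)]))
```

Reporting the median alongside the mean is useful because a few predictions on the wrong continent can dominate the mean error.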
Current Challenges and Visions in Music Recommender Systems Research
Music recommender systems (MRS) have experienced a boom in recent years,
thanks to the emergence and success of online streaming services, which
nowadays make almost all the world's music available at the user's fingertips.
While today's MRS considerably help users find interesting music in these
huge catalogs, MRS research still faces substantial challenges. In
particular, when it comes to building, incorporating, and evaluating recommendation
strategies that integrate information beyond simple user-item interactions or
content-based descriptors and dig deep into the very essence of listener
needs, preferences, and intentions, MRS research becomes a big endeavor and
related publications are quite sparse.
The purpose of this trends and survey article is twofold. We first identify
and shed light on what we believe are the most pressing challenges MRS research
is facing, from both academic and industry perspectives. We review the state of
the art towards solving these challenges and discuss its limitations. Second,
we detail possible future directions and visions we contemplate for the further
evolution of the field. The article should therefore serve two purposes: giving
the interested reader an overview of current challenges in MRS research and
providing guidance for young researchers by identifying interesting, yet
under-researched, directions in the field.
The aceToolbox: low-level audiovisual feature extraction for retrieval and classification
In this paper we present an overview of a software platform,
termed the aceToolbox, that has been developed within the aceMedia project
and provides global and local low-level feature extraction from audio-visual content. The toolbox is based on the MPEG-7 eXperimental Model (XM),
with extensions to provide descriptor extraction from arbitrarily shaped image segments, thereby supporting local descriptors that reflect real image content. We describe the architecture of the toolbox and provide an overview of the descriptors supported to date. We also briefly describe the segmentation algorithm provided. We then demonstrate the usefulness of the toolbox in two different content processing scenarios: similarity-based retrieval in large collections and scene-level classification of still images.
InSPeCT: Integrated Surveillance for Port Container Traffic
This paper describes a fully operational content-indexing and management system designed for monitoring and profiling freight-based vehicular traffic in a seaport environment. The 'InSPeCT' system captures video footage of passing vehicles and uses tailored OCR to index the footage according to vehicle license plates and freight codes. In addition to real-time functionality such as alerting, the system provides advanced search techniques for the efficient retrieval of records, where each vehicle is profiled according to multi-angled video, context information, and links to external information sources. The system is currently being piloted at a busy national seaport, and feedback from port officials indicates that it is extremely useful in supplementing their existing transportation-security structures.