Modeling Concept Dynamics for Large Scale Music Search
10.1145/2348283.2348346. SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 455-46
Analysis and Forecasting of Trending Topics in Online Media Streams
Among the vast information available on the web, social media streams capture
what people currently pay attention to and how they feel about certain topics.
Awareness of such trending topics plays a crucial role in multimedia systems
such as trend aware recommendation and automatic vocabulary selection for video
concept detection systems.
Correctly utilizing trending topics requires a better understanding of their
various characteristics in different social media streams. To this end, we
present the first comprehensive study across three major online and social
media streams, Twitter, Google, and Wikipedia, covering thousands of trending
topics during an observation period of an entire year. Our results indicate
that, depending on one's requirements, one does not necessarily have to turn to
Twitter for information about current events, and that some media streams
strongly emphasize content of specific categories. As our second key
contribution, we further present a novel approach for the challenging task of
forecasting the life cycle of trending topics in the very moment they emerge.
Our fully automated approach is based on a nearest neighbor forecasting
technique exploiting our assumption that semantically similar topics exhibit
similar behavior.
We demonstrate on a large-scale dataset of Wikipedia page view statistics
that forecasts by the proposed approach are about 9-48k views closer to the
actual viewing statistics than those of the baseline methods, and achieve a
mean absolute percentage error of 45-19% for time periods of up to 14 days.
Comment: ACM Multimedia 201
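The nearest-neighbor forecasting idea above, that semantically similar topics exhibit similar life cycles, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name, the vector representation of topic similarity, and the choice of cosine similarity and plain averaging are all assumptions.

```python
import numpy as np

def forecast_life_cycle(new_topic_vec, known_topic_vecs, known_series, k=5):
    """Estimate a new topic's view curve as the mean of the curves of
    its k semantically most similar known topics."""
    # cosine similarity between the new topic and every known topic
    norms = np.linalg.norm(known_topic_vecs, axis=1) * np.linalg.norm(new_topic_vec)
    sims = known_topic_vecs @ new_topic_vec / np.where(norms == 0, 1, norms)
    nearest = np.argsort(sims)[-k:]            # indices of the k nearest neighbors
    return known_series[nearest].mean(axis=0)  # element-wise mean forecast

# toy data: 3 known topics with 14-day page-view curves
vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
series = np.array([[100.0] * 14, [120.0] * 14, [5.0] * 14])
pred = forecast_life_cycle(np.array([1.0, 0.05]), vecs, series, k=2)
```

Here the new topic is close to the first two known topics, so its forecast is the average of their curves; in practice the similarity space and neighbor weighting would be learned from data.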
Methodological considerations concerning manual annotation of musical audio in function of algorithm development
In research on musical audio-mining, annotated music databases are needed which allow the development of computational tools that extract from the musical audio stream the kind of high-level content that users can deal with in Music Information Retrieval (MIR) contexts. The notion of musical content, and therefore the notion of annotation, is ill-defined, however, both in the syntactic and semantic sense. As a consequence, annotation has been approached from a variety of perspectives (but mainly linguistic-symbolic oriented), and a general methodology is lacking. This paper is a step towards the definition of a general framework for manual annotation of musical audio in function of a computational approach to musical audio-mining that is based on algorithms that learn from annotated data.
Moving from Data-Constrained to Data-Enabled Research: Experiences and Challenges in Collecting, Validating and Analyzing Large-Scale e-Commerce Data
Widespread e-commerce activity on the Internet has led to new opportunities
to collect vast amounts of micro-level market and nonmarket data. In this paper
we share our experiences in collecting, validating, storing and analyzing large
Internet-based data sets in the area of online auctions, music file sharing and
online retailer pricing. We demonstrate how such data can advance knowledge by
facilitating sharper and more extensive tests of existing theories and by
offering observational underpinnings for the development of new theories. Just
as experimental economics pushed the frontiers of economic thought by enabling
the testing of numerous theories of economic behavior in the environment of a
controlled laboratory, we believe that observing, often over extended periods
of time, real-world agents participating in market and nonmarket activity on
the Internet can lead us to develop and test a variety of new theories.
Internet data gathering is not controlled experimentation. We cannot randomly
assign participants to treatments or determine event orderings. Internet data
gathering does offer potentially large data sets with repeated observation of
individual choices and action. In addition, the automated data collection holds
promise for greatly reduced cost per observation. Our methods rely on
technological advances in automated data collection agents. Significant
challenges remain in developing appropriate sampling techniques, integrating
data from heterogeneous sources in a variety of formats, constructing
generalizable processes, and understanding legal constraints. Despite these
challenges, the early evidence from those who have harvested and analyzed large
amounts of e-commerce data points toward a significant leap in our ability to
understand the functioning of electronic commerce.
Comment: Published at http://dx.doi.org/10.1214/088342306000000231 in
Statistical Science (http://www.imstat.org/sts/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
Predicting Audio Advertisement Quality
Online audio advertising is a particular form of advertising used abundantly
in online music streaming services. In these platforms, which tend to host tens
of thousands of unique audio advertisements (ads), providing high quality ads
ensures a better user experience and results in longer user engagement.
Therefore, the automatic assessment of these ads is an important step toward
audio ads ranking and better audio ads creation. In this paper we propose one
way to measure the quality of the audio ads using a proxy metric called Long
Click Rate (LCR), which is defined by the amount of time a user engages with
the follow-up display ad (shown while the audio ad is playing) divided by the
number of impressions. We later focus on predicting the audio ad quality using
only acoustic features such as harmony, rhythm, and timbre of the audio,
extracted from the raw waveform. We discuss how the characteristics of the
sound can be connected to concepts such as the clarity of the audio ad message,
its trustworthiness, etc. Finally, we propose a new deep learning model for
audio ad quality prediction, which outperforms the other discussed models
trained on hand-crafted features. To the best of our knowledge, this is the
first large-scale audio ad quality prediction study.
Comment: WSDM '18, Proceedings of the Eleventh ACM International Conference on
Web Search and Data Mining, 9 pages
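The Long Click Rate (LCR) proxy described above is a simple ratio: total user engagement time with the follow-up display ad divided by the number of impressions. A minimal sketch, with illustrative argument names (the paper does not specify this exact interface):

```python
def long_click_rate(engagement_seconds, impressions):
    """LCR proxy: total time users engaged with the follow-up display ad
    divided by the number of impressions of the audio ad."""
    if impressions == 0:
        return 0.0  # no impressions yields an undefined rate; report 0
    return sum(engagement_seconds) / impressions

# toy example: 4 impressions; per-impression engagement times in seconds
# (0.0 for impressions where the user never opened the display ad)
lcr = long_click_rate([12.0, 3.0, 0.0, 9.0], impressions=4)
```

Ads with a higher LCR under this proxy would be treated as higher quality; the prediction task in the paper is then to estimate this quantity from acoustic features alone.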
Complexity Measures of Music
We present a technique to search for the presence of crucial events in music,
based on the analysis of the music volume. Earlier work on this issue was based
on the assumption that crucial events correspond to the change of music notes,
with the interesting result that the complexity index of the crucial events is
μ ≈ 2, the same inverse power-law index found in the dynamics of the brain.
The search technique analyzes music volume and confirms the results of the
earlier work, thereby contributing to the explanation as to why the brain is
sensitive to music, through the phenomenon of complexity matching. Complexity
matching has recently been interpreted as the transfer of multifractality from
one complex network to another. For this reason we also examine the
multifractality of music, with the observation that the multifractal spectrum of
a computer performance is significantly narrower than the multifractal spectrum
of a human performance of the same musical score. We conjecture that although
crucial events are demonstrably important for information transmission, they
alone are not sufficient to define musicality, which is more adequately measured
by the multifractal spectrum.
Comparing Probabilistic Models for Melodic Sequences
Modelling the real world complexity of music is a challenge for machine
learning. We address the task of modeling melodic sequences from the same music
genre. We perform a comparative analysis of two probabilistic models: a
Dirichlet Variable Length Markov Model (Dirichlet-VMM) and a Time Convolutional
Restricted Boltzmann Machine (TC-RBM). We show that the TC-RBM learns
descriptive music features, such as underlying chords and typical melody
transitions and dynamics. We assess the models for future prediction and
compare their performance to a VMM, which is the current state of the art in
melody generation. We show that both models perform significantly better than
the VMM, with the Dirichlet-VMM marginally outperforming the TC-RBM. Finally,
we evaluate the short order statistics of the models, using the
Kullback-Leibler divergence between test sequences and model samples, and show
that our proposed methods match the statistics of the music genre significantly
better than the VMM.
Comment: in Proceedings of the ECML-PKDD 2011. Lecture Notes in Computer
Science, vol. 6913, pp. 289-304. Springer (2011)
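The short-order-statistics evaluation mentioned above can be sketched as follows: estimate a smoothed bigram distribution over note pairs for a test melody and for model samples, then compare them with the Kullback-Leibler divergence. The bigram order, the Laplace smoothing, and all names here are illustrative assumptions, not the paper's exact procedure:

```python
import math
from collections import Counter

def bigram_dist(seq, vocab_pairs, alpha=1.0):
    """Laplace-smoothed bigram distribution over a fixed set of pitch pairs."""
    counts = Counter(zip(seq, seq[1:]))
    total = sum(counts.values()) + alpha * len(vocab_pairs)
    return {p: (counts.get(p, 0) + alpha) / total for p in vocab_pairs}

def kl_divergence(p, q):
    """D_KL(p || q) for two distributions over the same support."""
    return sum(p[x] * math.log(p[x] / q[x]) for x in p)

notes = [60, 62, 64, 62, 60, 62, 64]    # a test melody (MIDI pitch numbers)
sample = [60, 62, 64, 62, 60, 64, 62]   # a model-generated melody
pairs = {(a, b) for a in {60, 62, 64} for b in {60, 62, 64}}
d = kl_divergence(bigram_dist(notes, pairs), bigram_dist(sample, pairs))
```

A model whose samples yield a lower divergence from held-out test sequences matches the genre's short-order statistics more closely, which is the sense in which the Dirichlet-VMM and TC-RBM outperform the VMM.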
Predictive Analysis for Social Processes II: Predictability and Warning Analysis
This two-part paper presents a new approach to predictive analysis for social
processes. Part I identifies a class of social processes, called positive
externality processes, which are both important and difficult to predict, and
introduces a multi-scale, stochastic hybrid system modeling framework for these
systems. In Part II of the paper we develop a systems theory-based,
computationally tractable approach to predictive analysis for these systems.
Among other capabilities, this analytic methodology enables assessment of
process predictability, identification of measurables which have predictive
power, discovery of reliable early indicators for events of interest, and
robust, scalable prediction. The potential of the proposed approach is
illustrated through case studies involving online markets, social movements,
and protest behavior.
Topic Similarity Networks: Visual Analytics for Large Document Sets
We investigate ways in which to improve the interpretability of LDA topic
models by better analyzing and visualizing their outputs. We focus on examining
what we refer to as topic similarity networks: graphs in which nodes represent
latent topics in text collections and links represent similarity among topics.
We describe efficient and effective approaches to both building and labeling
such networks. Visualizations of topic models based on these networks are shown
to be a powerful means of exploring, characterizing, and summarizing large
collections of unstructured text documents. They help to "tease out"
non-obvious connections among different sets of documents and provide insights
into how topics form larger themes. We demonstrate the efficacy and
practicality of these approaches through two case studies: 1) NSF grants for
basic research spanning a 14 year period and 2) the entire English portion of
Wikipedia.
Comment: 9 pages; 2014 IEEE International Conference on Big Data (IEEE BigData
2014)
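A topic similarity network as described above can be built directly from a topic-word matrix: nodes are latent topics (rows of topic-word probabilities) and links connect pairs whose similarity exceeds a threshold. This is a minimal sketch; the choice of cosine similarity, the threshold value, and the function name are assumptions rather than the paper's specific method:

```python
import numpy as np

def topic_similarity_network(topic_word, threshold=0.5):
    """Return an edge list (i, j, sim) linking topic pairs whose cosine
    similarity is at least `threshold`."""
    normed = topic_word / np.linalg.norm(topic_word, axis=1, keepdims=True)
    sims = normed @ normed.T          # pairwise cosine similarity matrix
    edges = []
    n = len(topic_word)
    for i in range(n):
        for j in range(i + 1, n):
            if sims[i, j] >= threshold:
                edges.append((i, j, float(sims[i, j])))
    return edges

# toy topic-word matrix: 3 topics over a 4-word vocabulary
topics = np.array([
    [0.70, 0.20, 0.05, 0.05],   # topic 0: close to topic 1
    [0.60, 0.30, 0.05, 0.05],   # topic 1
    [0.05, 0.05, 0.20, 0.70],   # topic 2: distinct
])
edges = topic_similarity_network(topics, threshold=0.8)
```

Only the two related topics end up linked, so the resulting graph groups topics into the larger themes that the visualizations in the paper are designed to surface.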