59 research outputs found
A Generative Product-of-Filters Model of Audio
We propose the product-of-filters (PoF) model, a generative model that
decomposes audio spectra as sparse linear combinations of "filters" in the
log-spectral domain. PoF makes similar assumptions to those used in the classic
homomorphic filtering approach to signal processing, but replaces hand-designed
decompositions built of basic signal processing operations with a learned
decomposition based on statistical inference. This paper formulates the PoF
model and derives a mean-field method for posterior inference and a variational
EM algorithm to estimate the model's free parameters. We demonstrate PoF's
potential for audio processing on a bandwidth expansion task, and show that PoF
can serve as an effective unsupervised feature extractor for a speaker
identification task.
Comment: ICLR 2014 conference-track submission. Added link to the source code
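The phrase "linear combinations of filters in the log-spectral domain" can be made concrete with a small numerical sketch. It is not the PoF model itself (which the abstract describes as a generative model with mean-field inference and variational EM); it only illustrates, under assumed shapes and the hypothetical names `filters` and `activations`, why a sum in the log domain corresponds to a product of filters in the linear domain.

```python
import numpy as np

def log_domain_combination(filters, activations):
    """Combine filters linearly in the log-spectral domain.

    filters:     (F, L) array, one log-domain filter per column (hypothetical).
    activations: (L,) array of weights; sparse weights switch most filters off.
    Returns the resulting linear-domain spectrum of shape (F,).
    """
    log_spectrum = filters @ activations
    return np.exp(log_spectrum)

rng = np.random.default_rng(0)
num_bins, num_filters = 8, 3
filters = rng.normal(size=(num_bins, num_filters))
activations = np.array([0.5, 0.0, 2.0])  # sparse: the second filter is unused

spectrum = log_domain_combination(filters, activations)

# The same spectrum written as a product of exponentiated filters,
# each raised to the power of its activation.
product_form = np.prod(np.exp(filters) ** activations, axis=1)
```

Both expressions give the same spectrum, which is the sense in which the decomposition is a "product of filters".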
Understanding Music Semantics and User Behavior with Probabilistic Latent Variable Models
Bayesian probabilistic modeling provides a powerful framework for building flexible models that incorporate latent structure through the likelihood model and the prior. When we specify a model, we make certain assumptions about the underlying data-generating process with respect to these latent structures. For example, the latent Dirichlet allocation (LDA) model assumes that when generating a document, we first select a latent topic and then select a word that often appears in the selected topic. We can uncover the latent structures conditioned on the observed data via posterior inference. In this dissertation, we apply the tools of probabilistic latent variable models to understand complex real-world data about music semantics and user behavior.
We first look into the problem of automatic music tagging: inferring the semantic tags (e.g., "jazz", "piano", "happy", etc.) from the audio features. We treat music tagging as a matrix completion problem and apply the Poisson matrix factorization model jointly on the vector-quantized audio features and a "bag-of-tags" representation. This approach exploits the shared latent structure between semantic tags and acoustic codewords. We present experimental results on the Million Song Dataset for both annotation and retrieval tasks, illustrating the steady improvement in performance as more data is used.
We then move to the intersection between music semantics and user behavior: music recommendation. The leading performance in music recommendation is achieved by collaborative filtering methods, which exploit the similarity patterns in users' listening histories. We address the fundamental cold-start problem of collaborative filtering: it cannot recommend new songs that no one has listened to. We train a neural network on semantic tagging information as a content model and use it as a prior in a collaborative filtering model. The proposed system is evaluated on the Million Song Dataset and shows comparably better results than the collaborative filtering approaches, in addition to favorable performance in the cold-start case.
Finally, we focus on general recommender systems. We examine two different types of data, implicit and explicit feedback, and introduce the notion of user exposure (whether or not a user is exposed to an item) as part of the data-generating process; exposure is latent for implicit data and observed for explicit data. For implicit data, we propose a probabilistic matrix factorization model and infer the user exposure from data. In the language of causal analysis (Imbens and Rubin, 2015), user exposure has a close connection to the assignment mechanism. We leverage this connection more directly for explicit data and develop a causal inference approach to recommender systems. We demonstrate that causal inference for recommender systems leads to improved generalization to new data.
Exact posterior inference is generally intractable for latent variable models. Throughout this thesis, we design specific inference procedures to tractably analyze the large-scale data encountered in each scenario.
Biodegradation of polycyclic aromatic hydrocarbons (PAHs) by white rot-fungus Pseudotrametes gibbosa isolated from the boreal forest in Northeast China
This study compared laccase production and the degradation of polycyclic aromatic hydrocarbons (PAHs) by the indigenous white-rot fungus Pseudotrametes gibbosa (found in the northeast forest area of China) and by Pleurotus ostreatus (which has been studied both in China and overseas). The results showed that the laccase activity of P. gibbosa was 2841.3 U/l, which was 6 times higher than that of P. ostreatus under the same culture conditions. The degradation rates of anthracene and pyrene by P. gibbosa were 43.43% and 24.26%, while the removal efficiencies achieved by P. ostreatus were only 30.12% and 18.76%. The results also showed a positive correlation between PAH degradation and laccase activity, and P. gibbosa showed significant potential due to its higher laccase production and more potent degradation of PAHs. This study provides technical support for pollution amelioration using indigenous white-rot fungi.
Key words: White-rot fungus, laccase, polycyclic aromatic hydrocarbons, degradation
Off-Policy Evaluation for Large Action Spaces via Policy Convolution
Developing accurate off-policy estimators is crucial for both evaluating and
optimizing for new policies. The main challenge in off-policy estimation is the
distribution shift between the logging policy that generates data and the
target policy that we aim to evaluate. Typically, techniques for correcting
distribution shift involve some form of importance sampling. This approach
results in unbiased value estimation but often comes with the trade-off of high
variance, even in the simpler case of one-step contextual bandits. Furthermore,
importance sampling relies on the common support assumption, which becomes
impractical when the action space is large. To address these challenges, we
introduce the Policy Convolution (PC) family of estimators. These methods
leverage latent structure within actions -- made available through action
embeddings -- to strategically convolve the logging and target policies. This
convolution introduces a unique bias-variance trade-off, which can be
controlled by adjusting the amount of convolution. Our experiments on synthetic
and benchmark datasets demonstrate remarkable mean squared error (MSE)
improvements when using PC, especially when either the action space or policy
mismatch becomes large, with gains of up to 5-6 orders of magnitude over
existing estimators.
Comment: Under review. 36 pages, 31 figures
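For readers unfamiliar with the baseline this abstract starts from, the following is a minimal sketch of the standard importance sampling (inverse propensity scoring) estimator for one-step contextual bandits. It is not the Policy Convolution estimator; the function name `ips_estimate` and the toy numbers are assumptions made for illustration only.

```python
import numpy as np

def ips_estimate(rewards, logging_probs, target_probs):
    """Importance sampling estimate of the target policy's value.

    rewards:       observed reward for each logged (context, action) pair.
    logging_probs: probability the logging policy gave to the logged action.
    target_probs:  probability the target policy gives to the same action.
    """
    rewards = np.asarray(rewards, dtype=float)
    logging_probs = np.asarray(logging_probs, dtype=float)
    target_probs = np.asarray(target_probs, dtype=float)
    # Common support: every logged action must have nonzero logging probability.
    if np.any(logging_probs <= 0):
        raise ValueError("logging policy must assign positive probability")
    weights = target_probs / logging_probs
    return float(np.mean(weights * rewards))

# Toy logged data (hypothetical values).
value = ips_estimate(
    rewards=[1.0, 0.0, 1.0, 1.0],
    logging_probs=[0.5, 0.5, 0.25, 0.25],
    target_probs=[0.25, 0.5, 0.5, 0.5],
)
```

The ratio `target_probs / logging_probs` grows without bound as the logging probability shrinks, which is one way to see why variance becomes a problem when the action space is large.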
Debiased offline evaluation of recommender systems: A weighted-sampling approach
Offline evaluation of recommender systems mostly relies on historical data, which is often biased by many confounders. In such data, user-item interactions are Missing Not At Random (MNAR). Measures of recommender system performance on MNAR test data are unlikely to be reliable indicators of real-world performance unless something is done to mitigate the bias. One way that researchers try to obtain less biased offline evaluation is by designing new, supposedly unbiased performance estimators for use on MNAR test data. We investigate an alternative solution, a sampling approach. The general idea is to use a sampling strategy on MNAR data to generate an intervened test set with less bias: one in which interactions are Missing At Random (MAR) or, at least, one that is more MAR-like. An example of this is SKEW, a sampling strategy that aims to adjust for the confounding effect that an item's popularity has on its likelihood of being observed. In this paper, we propose a novel formulation for the sampling approach. We compare our solution to SKEW and to two baselines which perform a random intervention on MNAR data (and hence are equivalent to no intervention in practice). We empirically validate for the first time the effectiveness of SKEW, and we show our approach to be a better estimator of the performance one would obtain on (unbiased) MAR test data. Our strategy is highly general (e.g. it can also be employed for training a recommender) and has low overhead (e.g. it does not require any learning).
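The idea of sampling a test set to counteract popularity can be sketched as follows. This is an illustrative assumption about one simple form such a strategy could take (sampling each interaction with probability inversely proportional to its item's popularity); it is neither the paper's proposed formulation nor a verified reproduction of SKEW, and the function names are hypothetical.

```python
import numpy as np

def inverse_popularity_probs(item_ids):
    """Sampling probability per interaction, proportional to 1 / item popularity."""
    item_ids = np.asarray(item_ids).ravel()
    _, inverse, counts = np.unique(item_ids, return_inverse=True, return_counts=True)
    weights = 1.0 / counts[inverse]
    return weights / weights.sum()

def sample_intervened_test_set(item_ids, sample_size, seed=0):
    """Return indices of interactions drawn without replacement."""
    probs = inverse_popularity_probs(item_ids)
    rng = np.random.default_rng(seed)
    return rng.choice(len(probs), size=sample_size, replace=False, p=probs)

# Toy data: item 0 appears in 90 interactions, item 1 in only 10.
item_ids = np.array([0] * 90 + [1] * 10)
probs = inverse_popularity_probs(item_ids)
chosen = sample_intervened_test_set(item_ids, sample_size=10)
```

Under this weighting, each item receives the same total sampling mass regardless of how often it was observed, so popular items no longer dominate the sampled test set.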
Attentive Neural Architecture Incorporating Song Features For Music Recommendation
Recommender systems are an integral part of music sharing platforms. Often
the aim of these systems is to increase the time the user spends on the
platform, which gives them high commercial value. Systems that aim at
increasing the average time a user spends on the platform often need to
recommend songs which the user might want to listen to next at each point in
time. This is different from recommendation systems which try to predict the
item which might be of interest to the user at some point in the user lifetime
but not necessarily in the very near future. Prediction of the next song the
user might like requires some kind of modeling of the user interests at the
given point of time. Attentive neural networks have been exploiting the
sequence in which the items were selected by the user to model the implicit
short-term interests of the user for the task of next-item prediction. However,
we believe that the features of the songs occurring in the sequence could also
convey important information about short-term user interest that the
items alone cannot. In this direction, we propose a novel attentive neural
architecture which in addition to the sequence of items selected by the user,
uses the features of these items to better learn the user short-term
preferences and recommend the next song to the user.
Comment: Accepted as a paper at the 12th ACM Conference on Recommender Systems
(RecSys 18)