12 research outputs found
Beyond Labels: Leveraging Deep Learning and LLMs for Content Metadata
Content metadata plays an important role in movie recommender systems, as
it provides valuable information about various aspects of a movie such as
genre, cast, plot synopsis, and box office summary. Analyzing the metadata can
help us understand user preferences, generate personalized recommendations,
and address the item cold-start problem. In this talk, we focus on one
particular type of metadata: genre labels. Genre labels associated with a
movie or a TV series help categorize a collection of titles into different
themes and set audience expectations accordingly. We present some of the
challenges associated with using genre label information and propose a new way
of examining genre information that we call the Genre Spectrum.
The Genre Spectrum helps capture the various nuanced genres in a title and our
offline and online experiments corroborate the effectiveness of the approach.
Furthermore, we also talk about applications of LLMs in augmenting content
metadata, which could eventually be used to organize recommendations
effectively in the user's 2-D home grid.
Recommendations and User Agency: The Reachability of Collaboratively-Filtered Information
Recommender systems often rely on models which are trained to maximize
accuracy in predicting user preferences. When the systems are deployed, these
models determine the availability of content and information to different
users. The gap between these objectives gives rise to a potential for
unintended consequences, contributing to phenomena such as filter bubbles and
polarization. In this work, we consider directly the information availability
problem through the lens of user recourse. Using ideas of reachability, we
propose a computationally efficient audit for top-N linear recommender
models. Furthermore, we describe the relationship between model complexity and
the effort necessary for users to exert control over their recommendations. We
use this insight to provide a novel perspective on the user cold-start problem.
Finally, we demonstrate these concepts with an empirical investigation of a
state-of-the-art model trained on a widely used movie ratings dataset.
Comment: appeared at FAccT '2
Mining relationships in spatio-temporal datasets
University of Minnesota Ph.D. dissertation. January 2013. Major: Computer science. Advisor: Vipin Kumar. 1 computer file (PDF); xi, 140 pages
Constrained Spectral Clustering using L1 Regularization
Constrained spectral clustering is a semi-supervised learning problem that aims to incorporate user-defined constraints into spectral clustering. Typically, there are two kinds of constraints: (i) must-link, and (ii) cannot-link. These constraints represent prior knowledge indicating whether two data objects should be in the same cluster or not, thereby aiding clustering. In this paper, we propose a novel approach that uses convex subproblems to incorporate constraints in spectral clustering and co-clustering. Compared with prior state-of-the-art approaches, our approach presents a more natural way to incorporate constraints in spectral methods and allows us to trade off the number of satisfied constraints against the quality of partitions on the original graph. To control the number of constraints satisfied, we use an L1 regularizer analogous to LASSO, which is often used in the literature to induce sparsity. Our approach can handle both must-link and cannot-link constraints, unlike a large number of previous approaches that mainly work on the former. Further, our formulation is based on a reduction to a convex subproblem that is relatively easy to solve using existing solvers. We test our proposed approach on real-world datasets and show its effectiveness for both spectral clustering and co-clustering over the prior state of the art.
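To make the two constraint types concrete, the following is a minimal sketch that counts how many must-link and cannot-link constraints a given cluster assignment satisfies. It does not implement the convex formulation or the L1 regularizer described in the abstract; the function name `satisfied_constraints` and the toy data are assumptions made for illustration only.

```python
def satisfied_constraints(labels, must_link, cannot_link):
    """Count constraints satisfied by a cluster assignment.

    labels      -- list where labels[i] is the cluster of object i
    must_link   -- pairs (i, j) that should share a cluster
    cannot_link -- pairs (i, j) that should be in different clusters
    """
    # A must-link pair is satisfied when both objects share a label.
    ml_ok = sum(labels[i] == labels[j] for i, j in must_link)
    # A cannot-link pair is satisfied when the labels differ.
    cl_ok = sum(labels[i] != labels[j] for i, j in cannot_link)
    return ml_ok + cl_ok


# Hypothetical toy example: four objects split into two clusters.
labels = [0, 0, 1, 1]
must_link = [(0, 1), (1, 2)]
cannot_link = [(0, 3), (2, 3)]
n_satisfied = satisfied_constraints(labels, must_link, cannot_link)
print(n_satisfied, "of", len(must_link) + len(cannot_link), "constraints satisfied")
```

A count of this kind is one side of the trade-off the abstract mentions; the other side, partition quality on the original graph, is not modeled here.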
Connecting mutually influencing bloggers
The blogosphere shows the characteristics of a power law distribution, in which a small set of bloggers (influentials) gets the majority of the readership and the vast majority receives little traffic. Blogger recommendation algorithms aim at finding influentials to recommend, putting bloggers with limited readership at a further disadvantage. These bloggers could benefit from mutually endorsing each other, with the eventual goal of forming strong local communities with broader readership. In this paper, we propose a recommendation algorithm that connects blogger pairs with the intent that, once connected, the bloggers will share a mutually influencing relationship. In particular, we compute each blogger's influence profile based on how much she influences her blog friends and recommend bloggers with similar influence profiles. We characterize bloggers into four different groups: {global leaders, connectors, local leaders, isolates}. Our results show marginal benefit for isolates and significant benefit for local leaders. Our approach can be instructive in building an intelligent recommendation engine that helps bloggers with limited readership build strong local communities.
Churn Prediction in MMORPGs: A Social Influence Based Approach
Massively Multiplayer Online Role Playing Games (MMORPGs) are computer-based games in which players interact with one another in a virtual world. Worldwide revenues for MMORPGs have grown rapidly in the last few years, and by current estimates the industry is worth more than 2 billion dollars. This large revenue potential has attracted several gaming companies to launch online role-playing games. One of the major problems these companies face, apart from fierce competition, is erosion of their customer base. Churn is a big problem for gaming companies because churners negatively affect "word-of-mouth" reports for potential and existing customers, leading to further erosion of the user base. We study the problem of player churn in the popular MMORPG EverQuest II. The problem of churn prediction has been studied extensively in the past in various domains and social network analysis has recently been applied to the proble
Discovering Dynamic Dipoles in Climate Data
Pressure dipoles are important long-distance climate phenomena (teleconnections) characterized by pressure anomalies of opposite polarity appearing at two different locations at the same time. Such dipoles have proven important for understanding and explaining climate variability in many regions of the world; for example, the El Niño climate phenomenon is known to be responsible for precipitation and temperature anomalies worldwide. This paper presents a novel approach for dipole discovery that outperforms existing state-of-the-art algorithms. Our approach is based on a climate anomaly network that is constructed using the correlation of time series of climate variables at all locations on the Earth. One novel aspect of our approach to the analysis of such networks is a careful treatment of negative correlations, whose proper consideration is critical for finding dipoles. Another key insight provided by our work is the importance of modeling the time-dependent patterns of the dipoles in order to better capture the impact of important climate phenomena on land. The results presented in this paper show that these innovations allow our approach to produce better results than previous approaches in terms of matching existing climate indices with high correlation and capturing the impact of climate indices on land.
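The role of negative correlations can be illustrated with a small sketch: given anomaly time series at a few locations, compute pairwise Pearson correlations and pick the most strongly anti-correlated pair as a dipole candidate. This is a simplified illustration, not the paper's algorithm; it ignores the network construction and the time-dependent modeling, and the location names and values below are invented for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def strongest_dipole(series):
    """Return (loc_a, loc_b, r) for the most negatively correlated pair."""
    best = None
    for a, b in combinations(sorted(series), 2):
        r = pearson(series[a], series[b])
        if best is None or r < best[2]:
            best = (a, b, r)
    return best


# Hypothetical pressure anomalies at three locations (made-up values).
anomalies = {
    "A": [1.0, -1.0, 2.0, -2.0, 0.5],
    "B": [-1.1, 0.9, -2.1, 2.0, -0.4],
    "C": [0.2, 0.1, -0.3, 0.4, 0.0],
}
loc_a, loc_b, r = strongest_dipole(anomalies)
print(f"dipole candidate: {loc_a}-{loc_b}, r = {r:.3f}")
```

An exhaustive pairwise scan like this grows quadratically with the number of locations, so it is only meant to show why edges with negative weight carry the signal of interest.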
Tracking Spatio-Temporal Diffusion in Climate Data
A forest canopy forms a critical platform for complex interactions between the vegetation and the atmospheric boundary layer, and environmental scientists consider it crucial to understanding the ecosystem and its response to climate change. Microfronts represent a class of these interactions characterized by a moving mass of air that introduces fluctuations in ambient temperature and humidity on small spatial and temporal scales. In this paper, we present a joint spatio-temporal hidden Markov model that simultaneously incorporates neighborhood dependencies in space and time. We show that our approach can trace the diffusion of microfronts more effectively than several baseline methods on sensor data from a Brazilian rainforest and on a synthetically generated dataset.
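For readers unfamiliar with hidden Markov models, the sketch below shows standard Viterbi decoding for a single sensor, that is, a purely temporal HMM. It is not the joint spatio-temporal model from the abstract, which also couples neighboring sensors; the state names, observation symbols, and probabilities are all assumptions chosen for illustration.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               predecessor state on that path)
    best = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for o in obs[1:]:
        prev = best[-1]
        layer = {}
        for s in states:
            # Pick the predecessor that maximizes the path probability.
            p, src = max((prev[r][0] * trans_p[r][s], r) for r in states)
            layer[s] = (p * emit_p[s][o], src)
        best.append(layer)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for layer in reversed(best[1:]):
        state = layer[state][1]
        path.append(state)
    return path[::-1]


# Hypothetical two-state model: is a microfront passing the sensor or not?
states = ["calm", "front"]
start_p = {"calm": 0.8, "front": 0.2}
trans_p = {
    "calm": {"calm": 0.9, "front": 0.1},
    "front": {"calm": 0.3, "front": 0.7},
}
emit_p = {
    "calm": {"stable": 0.9, "fluctuating": 0.1},
    "front": {"stable": 0.2, "fluctuating": 0.8},
}
readings = ["stable", "fluctuating", "fluctuating", "stable"]
path = viterbi(readings, states, start_p, trans_p, emit_p)
print(path)
```

Multiplying raw probabilities is fine for a sequence this short; longer sequences would normally use log probabilities to avoid underflow.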