11 research outputs found
Unsupervised Extraction of Representative Concepts from Scientific Literature
This paper studies the automated categorization and extraction of scientific concepts from the titles of scientific articles, in order to gain a deeper understanding of their key contributions and to facilitate the construction of a generic academic knowledge base. Toward this goal, we propose an unsupervised, domain-independent, and scalable two-phase algorithm to type and extract key concept mentions into aspects of interest (e.g., Techniques, Applications). In the first phase, we propose PhraseType, a probabilistic generative model that exploits textual features and limited POS tags to broadly segment text snippets into aspect-typed phrases. We extend this model to simultaneously learn aspect-specific features and identify academic domains in multi-domain corpora, since the two tasks mutually enhance each other. In the second phase, we propose an approach based on adaptor grammars to extract fine-grained concept mentions from the aspect-typed phrases in a purely data-driven manner, without the need for external resources or human effort. We apply our technique to literature from diverse scientific domains and show significant gains over state-of-the-art concept extraction techniques. We also present a qualitative analysis of the results obtained.
Comment: Published as a conference paper at CIKM 201
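As a rough illustration of the phase-one idea (segment, then type), the Python sketch below chunks a POS-tagged title into candidate phrases and assigns each a coarse aspect via seed-term features. The seed lists, plural stripping, and scoring are hypothetical illustrations, not the paper's PhraseType model, which learns aspect-specific features generatively.

```python
# Illustrative sketch: segment a POS-tagged title into noun phrases and
# assign each phrase a coarse aspect type via seed-feature scores.
# Seed lists and scoring are hypothetical, not the paper's PhraseType model.

ASPECT_SEEDS = {
    "Technique": {"model", "algorithm", "network", "grammar", "learning"},
    "Application": {"extraction", "recommendation", "classification", "retrieval"},
}

def chunk_noun_phrases(tagged_tokens):
    """Greedily group maximal runs of adjectives/nouns into candidate phrases."""
    phrases, current = [], []
    for word, tag in tagged_tokens:
        if tag in {"JJ", "NN", "NNS", "NNP", "NNPS"}:
            current.append(word)
        elif current:
            phrases.append(current)
            current = []
    if current:
        phrases.append(current)
    return phrases

def type_phrase(phrase):
    """Score each aspect by seed-term overlap (with crude plural stripping)."""
    scores = {
        aspect: sum(w.lower().rstrip("s") in seeds for w in phrase)
        for aspect, seeds in ASPECT_SEEDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Other"

title = [("Unsupervised", "JJ"), ("Concept", "NN"), ("Extraction", "NN"),
         ("with", "IN"), ("Adaptor", "NNP"), ("Grammars", "NNS")]
for phrase in chunk_noun_phrases(title):
    print(" ".join(phrase), "->", type_phrase(phrase))
```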
Neural recommender models for sparse and skewed behavioral data
Modern online platforms offer recommendations and personalized search and services to a large and diverse user base, while still aiming to acquaint users with the broader community on the platform. Prior work backed by large volumes of user data has shown that user retention relies on catering to users' specific, eccentric tastes, in addition to providing them with popular services or content.
Long-tailed distributions are a fundamental characteristic of human activity, owing to the bursty nature of human attention. As a result, we often observe skew in data facets that involve human interaction. While there are superficial similarities to Zipf's law in textual data and other domains, the challenges with user data extend further. Individual words may have skewed frequencies in a corpus, but long-tail words by themselves do not significantly impact downstream text-mining tasks. In contrast, while sparse users (a majority on most online platforms) contribute little to the training data, they are equally crucial at inference time, and perhaps more so, since they are the most likely to churn.
In this thesis, we study platforms and applications that elicit user participation in rich social settings incorporating user-generated content, user-user interaction, and other modalities of user participation and data generation. For instance, users on the Yelp review platform participate in a follower-followee network and also create and interact with review text (two modalities of user data). Similarly, community question-answer (CQA) platforms incorporate user interaction and collaboratively authored content over diverse domains and discussion threads. Since user participation is multimodal, we develop generalizable abstractions beyond any single data modality.
Specifically, we aim to address the distributional mismatch that occurs with user data independent of dataset specifics: while a minority of the users generates most training samples, it is insufficient to learn only the preferences of this subset of users. The data's overall skew and individual users' sparsity are thus closely interlinked: sparse users with uncommon preferences are under-represented. We therefore propose to treat these problems jointly with a skew-aware grouping mechanism that iteratively sharpens the identification of preference groups within the user population. As a result, we improve user characterization, content recommendation, and activity prediction (+6-22% AUC, +6-43% AUC, and +12-25% RMSE, respectively, over state-of-the-art baselines), primarily for users with sparse activity.
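A schematic sketch of the grouping idea follows: cluster user embeddings while up-weighting sparse users, so that preference groups are not dominated by the few heavy contributors. The weighting scheme and dimensions are illustrative assumptions, not the thesis's exact mechanism.

```python
import numpy as np

# Schematic sketch of skew-aware preference grouping (hypothetical details):
# cluster user embeddings, but weight sparse users up so groups are not
# dominated by the few heavy contributors.

rng = np.random.default_rng(0)
n_users, dim, k = 500, 16, 8
user_emb = rng.normal(size=(n_users, dim))
activity = rng.zipf(2.0, size=n_users).astype(float)   # long-tailed activity counts
weights = 1.0 / np.sqrt(activity)                      # down-weight heavy users

centroids = user_emb[rng.choice(n_users, k, replace=False)]
for _ in range(10):  # iteratively sharpen the preference groups
    dists = ((user_emb[:, None, :] - centroids[None]) ** 2).sum(-1)
    assign = dists.argmin(1)
    for c in range(k):
        mask = assign == c
        if mask.any():
            w = weights[mask]
            centroids[c] = (user_emb[mask] * w[:, None]).sum(0) / w.sum()

print(np.bincount(assign, minlength=k))  # group sizes after refinement
```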
The size of the item or content inventory compounds the skew problem. Recommendation models can achieve very high aggregate performance while recommending only a tiny proportion of the inventory (as little as 5%) to users. We propose a data-driven solution guided by the aggregate co-occurrence information across items in the dataset. We specifically note that different co-occurrences are not equally significant: for example, some co-occurring items are easily substituted while others are not. We develop a self-supervised learning framework where the aggregate co-occurrences guide the recommendation problem while providing room to learn these variations among the item associations. As a result, we improve coverage to ~100% of the inventory (up from 5%) and increase long-tail item recall by up to 25%.
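The sketch below conveys the flavor of this idea: aggregate co-occurrence counts serve as a self-supervised target for item-embedding similarity, with a free per-pair parameter absorbing the variation in how significant each co-occurrence is. The loss, weights, and update rules here are hypothetical, not the thesis's framework.

```python
import numpy as np

# Sketch: fit item-embedding similarity sim(i,j) to aggregate co-occurrence
# C[i,j] plus a learnable per-pair slack S[i,j], which absorbs the fact
# that not all co-occurrences are equally significant.

baskets = [[0, 1, 2], [1, 2], [0, 2, 3], [3, 4], [2, 3, 4]]
n_items, dim, lr = 5, 8, 0.05
rng = np.random.default_rng(1)
E = rng.normal(scale=0.1, size=(n_items, dim))   # item embeddings
S = np.zeros((n_items, n_items))                 # learnable pair significance

C = np.zeros((n_items, n_items))
for b in baskets:                                # aggregate co-occurrence counts
    for i in b:
        for j in b:
            if i != j:
                C[i, j] += 1
C /= C.max()

for step in range(200):                          # minimize 0.5*||sim - (C+S)||^2
    sim = E @ E.T
    resid = sim - (C + S)
    np.fill_diagonal(resid, 0.0)
    E -= lr * (resid @ E) / n_items              # gradient step on embeddings
    S -= lr * 0.1 * (-resid + 0.1 * S)           # S tracks residual, L2-penalized

print(np.round(E @ E.T, 2))
```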
We also note that the skew and sparsity problems repeat across data modalities. For instance, social interactions and review content both exhibit aggregate skew, although individual users who actively generate reviews may not participate socially, and vice versa. In such cases, it is necessary to differentially weight and merge the different data sources for each user for inference tasks. We show that the problem is inherently adversarial, since the user participation modalities compete to describe a user accurately. We develop a framework to unify these representations while algorithmically tackling mode collapse, a well-known pitfall of adversarial models.
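A minimal adversarial-unification sketch follows, assuming toy feature sizes: a discriminator tries to identify the source modality of each unified representation, the encoders try to fool it, and a simple variance penalty stands in for the thesis's anti-collapse machinery.

```python
import torch
import torch.nn as nn

# Sketch of adversarially unifying two user-data modalities (e.g., social
# and review-text features). A variance penalty is our stand-in for the
# thesis's mode-collapse handling; all sizes are illustrative.

dim = 32
enc_social = nn.Linear(64, dim)     # modality encoders (hypothetical sizes)
enc_review = nn.Linear(128, dim)
disc = nn.Sequential(nn.Linear(dim, 16), nn.ReLU(), nn.Linear(16, 1))

opt_enc = torch.optim.Adam(
    list(enc_social.parameters()) + list(enc_review.parameters()), lr=1e-3)
opt_disc = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(100):
    social = torch.randn(256, 64)    # toy batches; real features in practice
    review = torch.randn(256, 128)
    z_s, z_r = enc_social(social), enc_review(review)

    # 1) Train the discriminator to separate the two modalities.
    d_loss = bce(disc(z_s.detach()), torch.ones(256, 1)) + \
             bce(disc(z_r.detach()), torch.zeros(256, 1))
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

    # 2) Train the encoders to fool it, with an anti-collapse variance term.
    g_loss = bce(disc(z_s), torch.zeros(256, 1)) + bce(disc(z_r), torch.ones(256, 1))
    g_loss = g_loss - 0.1 * (z_s.var(0).mean() + z_r.var(0).mean())
    opt_enc.zero_grad(); g_loss.backward(); opt_enc.step()
```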
A more challenging but important instantiation of sparsity is the few-shot or cross-domain setting, where we may have only one or a few interactions for users or items in the sparse domains or partitions. We show that contextualizing user-item interactions helps us infer behavioral invariants in the dense domain, allowing us to correlate sparse participants with their active counterparts (resulting in 3x faster training and ~19% recall gains in multi-domain settings).
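One way to picture the contextual approach (wiring and sizes are assumed for illustration, not taken from the thesis): a shared context pathway modulates the user-item match, so that the context module can carry cross-domain invariants while embeddings remain domain-specific.

```python
import torch
import torch.nn as nn

# Sketch: context-conditioned scoring. The context pathway is the candidate
# carrier of behavioral invariants shared across domains; user/item
# embeddings stay domain-specific. Hypothetical architecture.

class ContextualScorer(nn.Module):
    def __init__(self, n_users, n_items, ctx_dim, dim=32):
        super().__init__()
        self.user = nn.Embedding(n_users, dim)   # domain-specific
        self.item = nn.Embedding(n_items, dim)   # domain-specific
        self.ctx = nn.Sequential(                # shared, domain-invariant
            nn.Linear(ctx_dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, u, i, context):
        gate = torch.sigmoid(self.ctx(context))  # context modulates the match
        return (self.user(u) * gate * self.item(i)).sum(-1)

model = ContextualScorer(n_users=1000, n_items=500, ctx_dim=12)
score = model(torch.tensor([3]), torch.tensor([42]), torch.randn(1, 12))
print(score.item())
```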
Finally, we consider the multi-task setting, where the platform incorporates multiple distinct recommendation and prediction tasks for each user. A single user representation is insufficient for users who exhibit different preferences along each dimension; at the same time, it is counter-productive to handle correlated prediction or inference tasks in isolation. We develop a multi-faceted representation approach grounded in residual learning with heterogeneous knowledge graph representations, which provides an expressive data representation for specialized domains and applications with multimodal user data. We achieve knowledge sharing by combining task-independent and task-specific representations of each entity within a unified knowledge graph framework.
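A compact sketch of the residual idea, with hypothetical task names and sizes: each entity gets a shared base embedding plus a small per-task residual, so correlated tasks share knowledge without collapsing into one flat representation.

```python
import torch
import torch.nn as nn

# Sketch of multi-faceted entity representations via residual learning:
# task-specific view = shared (task-independent) base + small task residual.
# Task names, dimensions, and the 0.1 scale are illustrative assumptions.

n_entities, dim, tasks = 1000, 32, ["recommend", "predict_activity"]

shared = nn.Embedding(n_entities, dim)
residual = nn.ModuleDict({t: nn.Embedding(n_entities, dim) for t in tasks})

def entity_repr(ids, task):
    """Task-specific facet of an entity: shared base plus a small residual."""
    return shared(ids) + 0.1 * residual[task](ids)

ids = torch.tensor([7, 7])
views = [entity_repr(ids, t) for t in tasks]
print(torch.norm(views[0] - views[1], dim=-1))  # same entity, two facets
```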
In each chapter, we also discuss and demonstrate how the proposed frameworks directly incorporate a wide range of gradient-optimizable recommendation and behavior models, maximizing their applicability and pertinence to user-centered inference tasks and platforms.
Transfer Learning via Contextual Invariants for One-to-Many Cross-Domain Recommendation
The rapid proliferation of new users and items on the social web has aggravated the gray-sheep user/long-tail item challenge in recommender systems. Historically, cross-domain co-clustering methods have successfully leveraged shared users and items across dense and sparse domains to improve inference quality. However, they rely on shared rating data and cannot scale to multiple sparse target domains (i.e., the one-to-many transfer setting). This, combined with the increasing adoption of neural recommender architectures, motivates us to develop scalable neural layer-transfer approaches for cross-domain learning. Our key intuition is to guide neural collaborative filtering with domain-invariant components shared across the dense and sparse domains, improving the user and item representations learned in the sparse domains. We leverage contextual invariances across domains to develop these shared modules, and demonstrate that with user-item interaction context, we can learn-to-learn informative representation spaces even with sparse interaction data. We show the effectiveness and scalability of our approach on two public datasets and a massive transaction dataset from Visa, a global payments technology company (19% item recall gains, 3x faster training vs. separate models for each domain). Our approach is applicable to both implicit and explicit feedback settings.
Comment: SIGIR 202
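A sketch of how such a layer transfer might look in practice, under assumed toy shapes: pretrain on the dense source domain, freeze the shared context module, then retrain only the (cheap) user/item embeddings for each sparse target domain.

```python
import torch
import torch.nn as nn

# Sketch of the one-to-many layer-transfer recipe. Shapes and modules are
# toy assumptions, not the paper's architecture.

def make_model(n_users, n_items, shared_ctx, dim=16):
    return nn.ModuleDict({
        "user": nn.Embedding(n_users, dim),
        "item": nn.Embedding(n_items, dim),
        "ctx": shared_ctx,                       # domain-invariant module
    })

shared_ctx = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 16))
source = make_model(10000, 5000, shared_ctx)     # 1) pretrain on dense domain
# ... pretraining loop over dense-domain interactions would go here ...

for p in shared_ctx.parameters():                # 2) freeze invariant layers
    p.requires_grad = False

target = make_model(300, 200, shared_ctx)        # 3) new sparse target domain
trainable = [p for p in target.parameters() if p.requires_grad]
opt = torch.optim.Adam(trainable, lr=1e-2)       # only embeddings update
print(sum(p.numel() for p in trainable), "trainable parameters in target")
```

Because only embedding tables are retrained per target domain, adding a new sparse domain is cheap, which is what makes the one-to-many setting tractable.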
Pre-trained Neural Recommenders: A Transferable Zero-Shot Framework for Recommendation Systems
Modern neural collaborative filtering (NCF) techniques are critical to the success of e-commerce, social media, and content-sharing platforms. However, despite technical advances, we still need to train an NCF model from scratch for every new application domain. In contrast, pre-trained vision and language models are routinely applied to diverse applications directly (zero-shot) or with limited fine-tuning. Inspired by the impact of pre-trained models, we explore the possibility of pre-trained recommender models that support building recommender systems in new domains, with minimal or no retraining, without the use of any auxiliary user or item information. Zero-shot recommendation without auxiliary information is challenging because we cannot form associations between users and items across datasets when there are no overlapping users or items. Our fundamental insight is that the statistical characteristics of the user-item interaction matrix are universally available across different domains and datasets. Thus, we use these statistical characteristics to identify dataset-independent representations for users and items. We show how to learn universal (i.e., supporting zero-shot adaptation without user or item auxiliary information) representations for nodes and edges from the bipartite user-item interaction graph. We learn representations by exploiting the statistical properties of the interaction data, including user and item marginals and the size and density distributions of their clusters.
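To make the idea concrete, here is a hedged sketch of one such dataset-independent feature: quantile-binned user and item marginals, which stay comparable across datasets of different scales. The binning scheme is an assumption for illustration; the paper's representations are richer (e.g., cluster size and density distributions).

```python
import numpy as np

# Sketch: dataset-independent user/item features from interaction-matrix
# statistics alone. Quantile bins normalize away dataset scale, so a model
# that embeds bucket ids (rather than raw ids) can transfer zero-shot to a
# dataset with no overlapping users or items. Illustrative only.

rng = np.random.default_rng(2)
# toy implicit-feedback matrix: rows = users, cols = items
X = (rng.random((400, 300)) < 0.02).astype(float)

def quantile_bin(values, n_bins=10):
    """Map raw counts to within-dataset quantile ranks (0..n_bins-1)."""
    ranks = values.argsort().argsort()           # rank of each value
    return (ranks * n_bins // len(values)).astype(int)

user_deg = X.sum(1)                              # user marginals
item_deg = X.sum(0)                              # item marginals
user_feat = quantile_bin(user_deg)               # transferable bucket ids
item_feat = quantile_bin(item_deg)

print(np.bincount(user_feat, minlength=10))      # roughly uniform buckets
```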
Audience-Centric Natural Language Generation via Style Infusion
Adopting contextually appropriate, audience-tailored linguistic styles is critical to the success of user-centric language generation systems (e.g., chatbots, computer-aided writing, dialog systems). While existing approaches demonstrate textual style transfer with large volumes of parallel or non-parallel data, we argue that grounding style on audience-independent external factors is innately limiting, for two reasons. First, it is difficult to collect large volumes of audience-specific stylistic data. Second, some stylistic objectives (e.g., persuasiveness, memorability, empathy) are hard to define without audience feedback.
In this paper, we propose the novel task of style infusion: infusing the stylistic preferences of audiences into pretrained language generation models. Since humans are better at pairwise comparisons than at direct scoring (i.e., is Sample-A more persuasive/polite/empathic than Sample-B?), we leverage limited pairwise human judgments to bootstrap a style analysis model and augment our seed set of judgments. We then infuse the learned textual style into a GPT-2-based text generator while balancing fluency and style adoption. With quantitative and qualitative assessments, we show that our infusion approach can generate compelling stylized examples with generic text prompts. The code and data are accessible at https://github.com/CrowdDynamicsLab/StyleInfusion.
Comment: 14 pages, 3 figures, Accepted in Findings of EMNLP 202