433 research outputs found
LIPIcs, Volume 251, ITCS 2023, Complete Volume
"It is currently hodgepodge": Examining AI/ML Practitioners' Challenges during Co-production of Responsible AI Values
Recently, the AI/ML research community has indicated an urgent need to
establish Responsible AI (RAI) values and practices as part of the AI/ML
lifecycle. Several organizations and communities are responding to this call by
sharing RAI guidelines. However, there are gaps in awareness, deliberation, and
execution of such practices for multi-disciplinary ML practitioners. This work
contributes to the discussion by unpacking co-production challenges faced by
practitioners as they align their RAI values. We interviewed 23 individuals,
across 10 organizations, tasked with shipping AI/ML-based products while upholding
RAI norms and found that both top-down and bottom-up institutional structures
create burdens for different roles, preventing them from upholding RAI values, a
challenge that is further exacerbated when executing conflicted values. We
share multiple value levers used as strategies by the practitioners to resolve
their challenges. We end our paper with recommendations for inclusive and
equitable RAI value-practices, creating supportive organizational structures
and opportunities to further aid practitioners.
User-oriented recommender systems in retail
User satisfaction is considered a key objective for all service provider platforms, regardless of the nature of the service, encompassing domains such as media, entertainment, retail, and information. While the goal of satisfying users is the same across different domains and services, considering domain-specific characteristics is of paramount importance to ensure users have a positive experience with a given system. User interaction data with a system is one of the main sources of data that facilitates achieving this goal. In this thesis, we investigate how to learn from domain-specific user interactions. We focus on recommendation as our main task, and retail as our main domain. We further explore the finance domain and the demand forecasting task as additional directions to understand whether our methodology and findings generalize to other tasks and domains. The research in this thesis is organized around the following dimensions: 1) Characteristics of multi-channel retail: we consider a retail setting where interaction data comes from both digital (i.e., online) and in-store (i.e., offline) shopping; 2) From user behavior to recommendation: we conduct extensive descriptive studies on user interaction log datasets that inform the design of recommender systems in two domains, retail and finance. Our key contributions in characterizing multi-channel retail are two-fold. First, we propose a neural model that makes use of sales in multiple shopping channels in order to improve the performance of demand forecasting in a target channel. Second, we provide the first study of user behavior in a multi-channel retail setting, which results in insights about the channel-specific properties of user behavior, and their effects on the performance of recommender systems. We make three main contributions in designing user-oriented recommender systems. 
First, we provide a large-scale user behavior study in the finance domain, targeted at understanding financial information-seeking behavior in user interactions with company filings. We then propose domain-specific user-oriented filing recommender systems that are informed by the findings of the user behavior analysis. Second, we analyze repurchasing behavior in retail, specifically in the grocery shopping domain. We then propose a repeat consumption-aware neural recommender for this domain. Third, we focus on scalable recommendation in retail and propose an efficient recommender system that explicitly models users' personal preferences as reflected in their purchasing history.
Efficient Continual Pre-training for Building Domain Specific Large Language Models
Large language models (LLMs) have demonstrated remarkable open-domain
capabilities. Traditionally, LLMs tailored for a domain are trained from
scratch to excel at handling domain-specific tasks. In this work, we explore an
alternative strategy of continual pre-training as a means to develop
domain-specific LLMs. We introduce FinPythia-6.9B, developed through
domain-adaptive continual pre-training on the financial domain. Continual
pre-trained FinPythia showcases consistent improvements on financial tasks over
the original foundational model. We further explore simple but effective data
selection strategies for continual pre-training. Our data selection strategies
outperform vanilla continual pre-training with just 10% of the
corpus size and cost, without any degradation on open-domain standard tasks.
Our work proposes an alternative solution to building domain-specific LLMs from
scratch in a cost-effective manner.
Supervised Adversarial Contrastive Learning for Emotion Recognition in Conversations
Extracting generalized and robust representations is a major challenge in
emotion recognition in conversations (ERC). To address this, we propose a
supervised adversarial contrastive learning (SACL) framework for learning
class-spread structured representations. The framework applies contrast-aware
adversarial training to generate worst-case samples and uses a joint
class-spread contrastive learning objective on both original and adversarial
samples. It can effectively utilize label-level feature consistency and retain
fine-grained intra-class features. To avoid the negative impact of adversarial
perturbations on context-dependent data, we design a contextual adversarial
training strategy to learn more diverse features from context and enhance the
model's context robustness. We develop a sequence-based method SACL-LSTM under
this framework, to learn label-consistent and context-robust emotional features
for ERC. Experiments on three datasets demonstrate that SACL-LSTM achieves
state-of-the-art performance on ERC. Extended experiments prove the
effectiveness of the SACL framework.
Comment: 16 pages, accepted by ACL 202
Towards learning mechanistic models at the right level of abstraction
The human brain is able to make predictions, to plan, and to imagine counterfactual situations through mental simulation. Artificial neural networks, although already very capable in certain domains, still seem to lack a mechanistic understanding of the world. In this thesis, we examine several approaches by which neural networks can better capture the underlying mechanisms of the modeled system. We look at Adaptive Skip Intervals (ASI), a method that allows dynamical models to choose their own temporal coarsening at every point. This makes long-term predictions both easier and computationally more efficient. Next, we consider alternative ways of aggregating gradients across different environments, which leads to the notion of Invariant Learning Consistency (ILC) and to the AND-mask method for a modified stochastic gradient descent. Filtering out inconsistent training signals from different environments preserves the mechanisms they share. Finally, we show that meta-gradient-based learning can transform trajectories of dynamical systems to construct useful learning signals toward an underlying objective, such as reward in reinforcement learning. This allows the internal model to incorporate both temporal and state abstraction.
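The gradient-filtering idea behind the AND-mask can be illustrated with a minimal sketch. It assumes one gradient vector per environment and keeps a component of the averaged gradient only when the per-environment signs agree strongly enough. The function name `and_mask_gradient` and the parameter `agreement_threshold` are illustrative choices, not taken from the thesis.

```python
def and_mask_gradient(env_grads, agreement_threshold=1.0):
    """Average per-environment gradients, zeroing components whose signs disagree.

    env_grads: list of gradient vectors (lists of floats), one per environment.
    agreement_threshold: required |mean of signs| in [0, 1]; 1.0 means every
    environment must agree on the sign (a strict logical AND). Hypothetical name.
    """
    n_envs = len(env_grads)
    n_params = len(env_grads[0])
    masked = []
    for j in range(n_params):
        column = [g[j] for g in env_grads]
        # sign(x) as -1, 0, or +1
        signs = [(x > 0) - (x < 0) for x in column]
        agreement = abs(sum(signs)) / n_envs
        mean_grad = sum(column) / n_envs
        # Keep the averaged component only if the environments agree on its sign.
        masked.append(mean_grad if agreement >= agreement_threshold else 0.0)
    return masked


# Two environments, three parameters: the second component has conflicting signs.
grads = [[0.5, -1.0, 0.25], [0.25, 1.0, 0.75]]
update = and_mask_gradient(grads)
```

In this example the second component is zeroed because the two environments pull in opposite directions, while the first and third are plain averages.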
GripRank: Bridging the Gap between Retrieval and Generation via the Generative Knowledge Improved Passage Ranking
Retrieval-enhanced text generation, which aims to leverage passages retrieved
from a large passage corpus for delivering a proper answer given the input
query, has shown remarkable progress on knowledge-intensive language tasks such
as open-domain question answering and knowledge-enhanced dialogue generation.
However, the retrieved passages are not ideal for guiding answer generation
because of the discrepancy between retrieval and generation, i.e., the
candidate passages are all treated equally during the retrieval procedure
without considering their potential to generate the proper answers. This
discrepancy makes a passage retriever deliver a sub-optimal collection of
candidate passages to generate answers. In this paper, we propose the
GeneRative Knowledge Improved Passage Ranking (GripRank) approach, addressing
the above challenge by distilling knowledge from a generative passage estimator
(GPE) to a passage ranker, where the GPE is a generative language model used to
measure how likely the candidate passages are to generate the proper answer. We
realize the distillation procedure by teaching the passage ranker to
rank the passages as ordered by the GPE. Furthermore, we improve the distillation
quality by devising a curriculum knowledge distillation mechanism, which allows
the knowledge provided by the GPE to be progressively distilled to the ranker
through an easy-to-hard curriculum, enabling the passage ranker to correctly
recognize the provenance of the answer from many plausible candidates. We
conduct extensive experiments on four datasets across three knowledge-intensive
language tasks. Experimental results show advantages over the state-of-the-art
methods for both passage ranking and answer generation on the KILT benchmark.
Comment: 11 pages, 4 figures
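The abstract does not state the loss used to distill the GPE ordering into the ranker. One common way to make a student ranker follow a teacher's scores is a listwise KL divergence between softmax-normalized score lists. The sketch below shows that generic technique only. It is an assumption for illustration and not the GripRank objective, and the names `softmax` and `listwise_distillation_loss` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_distillation_loss(teacher_scores, student_scores):
    """KL(teacher || student) over softmax-normalized passage scores.

    teacher_scores: scores from the teacher (here standing in for the GPE).
    student_scores: scores from the passage ranker being trained.
    """
    p = softmax(teacher_scores)
    q = softmax(student_scores)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# The loss is zero when the student reproduces the teacher's distribution
# and grows as the student's ranking diverges from it.
teacher = [2.0, 1.0, 0.0]
loss_same = listwise_distillation_loss(teacher, [2.0, 1.0, 0.0])
loss_reversed = listwise_distillation_loss(teacher, [0.0, 1.0, 2.0])
```

A curriculum in the spirit of the abstract could start with candidate lists that are easy to tell apart and move to harder ones, but how the paper schedules this is not specified here.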
LIPIcs, Volume 261, ICALP 2023, Complete Volume
Modeling Events and Interactions through Temporal Processes -- A Survey
In real-world scenarios, many phenomena produce a collection of events that
occur in continuous time. Point Processes provide a natural mathematical
framework for modeling these sequences of events. In this survey, we
investigate probabilistic models for modeling event sequences through temporal
processes. We revisit the notion of event modeling and provide the mathematical
foundations that characterize the literature on the topic. We define an
ontology to categorize the existing approaches in terms of three families:
simple, marked, and spatio-temporal point processes. For each family, we
systematically review the existing approaches based on deep learning.
Finally, we analyze the scenarios where the proposed techniques can be used for
addressing prediction and modeling aspects.
Comment: Image replacement
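As a concrete instance of a simple temporal point process, the sketch below simulates a homogeneous Poisson process by drawing exponential inter-arrival times. This is the textbook baseline rather than any model from the survey, and the function name `simulate_poisson_process` and its parameters are illustrative assumptions.

```python
import random


def simulate_poisson_process(rate, horizon, seed=0):
    """Return sorted event times of a homogeneous Poisson process on (0, horizon].

    rate: expected number of events per unit time (must be > 0).
    horizon: end of the observation window.
    seed: seed for a private random generator, so runs are reproducible.
    """
    rng = random.Random(seed)
    t = 0.0
    events = []
    while True:
        # Inter-arrival times of a Poisson process are Exponential(rate).
        t += rng.expovariate(rate)
        if t > horizon:
            return events
        events.append(t)


events = simulate_poisson_process(rate=2.0, horizon=1000.0)
```

Marked and spatio-temporal variants attach a label or a location to each event time, and models with history-dependent rates replace the constant `rate` with a conditional intensity.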