8 research outputs found
Malicious Selling Strategies During Livestream Shopping: A Case Study of Alibaba's Taobao and ByteDance's TikTok
Due to the limitations imposed by the COVID-19 pandemic, many users have
shifted their shopping patterns from offline to online. Livestream shopping has
become popular as one of the online shopping media. However, many streamers'
malicious selling behaviors have been reported. In this research, we sought to
explore streamers' malicious selling strategies and understand how viewers
perceive these strategies. First, we recorded 40 livestream shopping sessions
from two popular livestream platforms in China -- Taobao and TikTok (or
"Douyin" in Chinese). We identified four categories of malicious selling
strategies (i.e., Restrictive, Deceptive, Covert, and Asymmetric) and found
that platform designs enhanced these malicious selling strategies. Second,
through an interview study with 13 viewers, we provide a rich description of
viewers' awareness of malicious selling strategies and the challenges they
encountered while trying to overcome malicious selling. We conclude by
discussing the policy and design implications of countering malicious selling.
Few-Shot Character Understanding in Movies as an Assessment to Meta-Learning of Theory-of-Mind
When reading a story, humans can rapidly understand new fictional characters
with a few observations, mainly by drawing analogy to fictional and real people
they met before in their lives. This reflects the few-shot and meta-learning
essence of humans' inference of characters' mental states, i.e., humans'
theory-of-mind (ToM), which is largely ignored in existing research. We fill
this gap with TOM-IN-AMC, a novel NLP benchmark and the first assessment of
models' ability to meta-learn ToM in a realistic narrative-understanding
scenario. Our benchmark consists of 1,000 parsed movie scripts, each
corresponding to a few-shot character understanding task, and requires models
to mimic humans' ability to quickly grasp characters from a few opening
scenes of a new movie. Our human study verified that humans can solve the
task by inferring characters' mental states based on movies they have seen
before, while the state-of-the-art metric-learning and meta-learning
approaches adapted to our task lag 30% behind.
FLEEK: Factual Error Detection and Correction with Evidence Retrieved from External Knowledge
Detecting factual errors in textual information, whether generated by large
language models (LLM) or curated by humans, is crucial for making informed
decisions. LLMs' inability to attribute their claims to external knowledge and
their tendency to hallucinate makes it difficult to rely on their responses.
Humans, too, are prone to factual errors in their writing. Since manual
detection and correction of factual errors is labor-intensive, developing an
automatic approach can greatly reduce human effort. We present FLEEK, a
prototype tool that automatically extracts factual claims from text, gathers
evidence from external knowledge sources, evaluates the factuality of each
claim, and suggests revisions for identified errors using the collected
evidence. Initial empirical evaluation on fact error detection (77-85% F1)
shows the potential of FLEEK. A video demo of FLEEK can be found at
https://youtu.be/NapJFUlkPdQ.
Comment: EMNLP 2023 (Demonstration Track).
Open Domain Knowledge Extraction for Knowledge Graphs
The quality of a knowledge graph directly impacts the quality of downstream
applications (e.g. the number of answerable questions using the graph). One
ongoing challenge when building a knowledge graph is to ensure completeness and
freshness of the graph's entities and facts. In this paper, we introduce ODKE,
a scalable and extensible framework that sources high-quality entities and
facts from the open web at scale. ODKE utilizes a wide range of extraction
models and supports both streaming and batch processing at different
latencies. We
reflect on the challenges and design decisions made and share lessons learned
when building and deploying ODKE to grow an industry-scale open domain
knowledge graph.
Comment: 7 pages, 7 figures, 5 tables, preprint technical report; no code or
data is released.
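The streaming-versus-batch split the abstract mentions can be illustrated with a toy dispatcher. Everything here is hypothetical (ODKE's code is not released): the "extractor" just parses pipe-delimited triples, standing in for the paper's fleet of extraction models.

```python
# Illustrative sketch of an ODKE-style framework with a low-latency
# streaming path and a high-throughput batch path; names are invented.
from typing import Iterable, Iterator

Fact = tuple[str, str, str]  # (subject, predicate, object)

def extract_facts(doc: str) -> list[Fact]:
    # Stand-in extractor: parse "subject|predicate|object" lines.
    facts = []
    for line in doc.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            facts.append((parts[0], parts[1], parts[2]))
    return facts

def stream_extract(docs: Iterable[str]) -> Iterator[Fact]:
    # Streaming path: yield facts as soon as each document arrives.
    for doc in docs:
        yield from extract_facts(doc)

def batch_extract(docs: list[str]) -> list[Fact]:
    # Batch path: process a whole corpus, deduplicating facts
    # (dict preserves first-seen order).
    seen: dict[Fact, None] = {}
    for fact in stream_extract(docs):
        seen[fact] = None
    return list(seen)
```

The same extractor serves both paths; only the scheduling and deduplication differ, which is one way to read the paper's "different latencies" claim.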
GEMv2 : Multilingual NLG benchmarking in a single line of code
Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables comparison on an equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation, which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims. To make following best model-evaluation practices easier, we introduce GEMv2. The new version of the Generation, Evaluation, and Metrics Benchmark introduces a modular infrastructure for dataset, model, and metric developers to benefit from each other's work. GEMv2 supports 40 documented datasets in 51 languages. Models for all datasets can be evaluated online, and our interactive data card creation and rendering tools make it easier to add new datasets to the living benchmark.
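A "single line of code" evaluation entry point of the kind GEMv2 describes might look like the following sketch. `run_benchmark`, the toy dataset, and the exact-match metric are all invented for illustration; they are not the real GEM API.

```python
# Hypothetical one-call benchmark runner in the spirit of GEMv2:
# one model function evaluated against every dataset with every metric.

def run_benchmark(predict, datasets, metrics):
    """Return {dataset: {metric: score}} for one prediction function."""
    results = {}
    for name, examples in datasets.items():
        outputs = [predict(x) for x, _ in examples]
        refs = [y for _, y in examples]
        results[name] = {m_name: m(outputs, refs)
                         for m_name, m in metrics.items()}
    return results

# Toy dataset of (input, reference) pairs and an exact-match "metric".
datasets = {"toy_summaries": [("hello world", "hello"), ("foo bar", "foo")]}
metrics = {"exact_match":
           lambda out, ref: sum(o == r for o, r in zip(out, ref)) / len(ref)}

# The benchmarked "model": take the first token of the input.
scores = run_benchmark(lambda x: x.split()[0], datasets, metrics)
```

The modularity the abstract emphasizes corresponds to the three pluggable arguments: new datasets and metrics slot in without touching the runner.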
Assessing Topical Homogeneity with Word Embedding and Distance Matrices
Researchers from many fields have used statistical tools to make sense of large bodies of text. Many tools support quantitative analysis of documents within a corpus, but relatively few studies have examined statistical characteristics of whole corpora. Statistical summaries of whole corpora, and comparisons between corpora, have potential application in the analysis of topically organized applications such as social media platforms. In this study, we created matrix representations of several corpora and examined several statistical tests for comparing pairs of corpora with respect to the topical homogeneity of the documents within each corpus. Results of three experiments suggested that a matrix of cosine distances calculated from vector summaries of short phrases contains useful information about how closely the documents within a corpus relate to one another. Both the tested summarization method and a non-parametric test for comparing cosine distance matrices appear to have utility for examining and comparing corpora containing brief texts.
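The core measurement described above can be sketched directly: embed each document, build the pairwise cosine-distance matrix, and compare corpora by a summary of that matrix. The bag-of-words embedding here is a toy stand-in for the paper's phrase-vector summaries, and the mean pairwise distance stands in for its non-parametric matrix comparison.

```python
# Toy homogeneity measure: mean pairwise cosine distance between
# bag-of-words document vectors. Lower mean distance = more homogeneous.
import math
from collections import Counter

def embed(doc: str) -> Counter:
    # Stand-in embedding: lowercase bag-of-words counts.
    return Counter(doc.lower().split())

def cosine_distance(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    # Documents with no overlap (or empty vectors) get maximal distance.
    return 1.0 - dot / (na * nb) if na and nb else 1.0

def homogeneity(corpus: list[str]) -> float:
    vecs = [embed(d) for d in corpus]
    # Upper triangle of the distance matrix, summarized by its mean.
    dists = [cosine_distance(vecs[i], vecs[j])
             for i in range(len(vecs)) for j in range(i + 1, len(vecs))]
    return sum(dists) / len(dists)

tight = ["cats like fish", "cats like milk", "cats like naps"]
loose = ["cats like fish", "stock prices fell", "rain is forecast"]
```

A topically tight corpus should score lower than one whose documents share no vocabulary, which is the intuition behind comparing corpora via their distance matrices.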