8 research outputs found

    Malicious Selling Strategies During Livestream Shopping: A Case Study of Alibaba's Taobao and ByteDance's TikTok

    Due to the limitations imposed by the COVID-19 pandemic, many users have shifted their shopping from offline to online, and livestream shopping has become a popular online shopping medium. However, many streamers' malicious selling behaviors have been reported. In this research, we sought to explore streamers' malicious selling strategies and understand how viewers perceive these strategies. First, we recorded 40 livestream shopping sessions from two popular livestream platforms in China -- Taobao and TikTok (or "Douyin" in Chinese). We identified four categories of malicious selling strategies (i.e., Restrictive, Deceptive, Covert, and Asymmetric) and found that platform designs enhanced these strategies. Second, through an interview study with 13 viewers, we provide a rich description of viewers' awareness of malicious selling strategies and the challenges they encountered while trying to overcome malicious selling. We conclude by discussing the policy and design implications of countering malicious selling.

    Few-Shot Character Understanding in Movies as an Assessment to Meta-Learning of Theory-of-Mind

    When reading a story, humans can rapidly understand new fictional characters from just a few observations, mainly by drawing analogies to fictional and real people they have met before. This reflects the few-shot, meta-learning essence of humans' inference of characters' mental states, i.e., humans' theory-of-mind (ToM), which is largely ignored in existing research. We fill this gap with a novel NLP benchmark, TOM-IN-AMC, the first assessment of models' ability to meta-learn ToM in a realistic narrative-understanding scenario. Our benchmark consists of ~1,000 parsed movie scripts, each corresponding to a few-shot character understanding task, and requires models to mimic humans' ability to quickly digest characters from a few starting scenes of a new movie. Our human study verified that humans can solve our task by inferring characters' mental states based on movies they have previously seen, while state-of-the-art metric-learning and meta-learning approaches adapted to our task lag 30% behind.
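    The metric-learning baseline mentioned above can be illustrated with a toy sketch. All names and vectors here are hypothetical stand-ins (the benchmark's real features are learned): each known character is represented by the centroid of their scene embeddings, and a new scene is attributed to the character with the most similar centroid.

```python
import math

def mean_vec(vectors):
    # Centroid of a character's scene embeddings.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest_character(query, support):
    # Few-shot character matching: pick the known character whose
    # centroid is most similar to the query scene embedding.
    centroids = {name: mean_vec(scenes) for name, scenes in support.items()}
    return max(centroids, key=lambda name: cosine(query, centroids[name]))

# Hypothetical 2-D "scene embeddings" for two character archetypes.
support = {"mentor": [[0.9, 0.1], [0.8, 0.2]],
           "trickster": [[0.1, 0.9], [0.2, 0.8]]}
print(nearest_character([0.85, 0.15], support))  # prints "mentor"
```

    A learned metric would replace both the hand-set vectors and plain cosine similarity, but the nearest-centroid decision rule is the same shape as the metric-learning baselines the abstract evaluates.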

    FLEEK: Factual Error Detection and Correction with Evidence Retrieved from External Knowledge

    Detecting factual errors in textual information, whether generated by large language models (LLMs) or curated by humans, is crucial for making informed decisions. LLMs' inability to attribute their claims to external knowledge and their tendency to hallucinate make it difficult to rely on their responses. Humans, too, are prone to factual errors in their writing. Since manual detection and correction of factual errors is labor-intensive, developing an automatic approach can greatly reduce human effort. We present FLEEK, a prototype tool that automatically extracts factual claims from text, gathers evidence from external knowledge sources, evaluates the factuality of each claim, and suggests revisions for identified errors using the collected evidence. Initial empirical evaluation on fact error detection (77-85% F1) shows the potential of FLEEK. A video demo of FLEEK can be found at https://youtu.be/NapJFUlkPdQ. Comment: EMNLP 2023 (Demonstration Track).
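    The extract-verify-correct loop the abstract describes can be sketched with a toy example. This is not FLEEK's implementation: the real system uses learned claim extractors and retrieval from external sources, which are replaced here by pre-structured claims and a hand-written lookup table for illustration.

```python
# Toy sketch of a FLEEK-style pipeline: check (subject, relation, value)
# claims against a "knowledge source" and suggest corrections.

KNOWLEDGE = {  # hypothetical external knowledge source
    ("Paris", "capital_of"): "France",
    ("Python", "created_by"): "Guido van Rossum",
}

def verify(claims):
    """Return (claim, verdict, suggested_correction) triples."""
    results = []
    for subject, relation, value in claims:
        evidence = KNOWLEDGE.get((subject, relation))
        if evidence is None:
            # No retrievable evidence: the claim cannot be judged.
            results.append(((subject, relation, value), "no evidence", None))
        elif evidence == value:
            results.append(((subject, relation, value), "supported", None))
        else:
            # Evidence contradicts the claim: propose the evidenced value.
            results.append(((subject, relation, value), "refuted", evidence))
    return results

claims = [("Paris", "capital_of", "Germany"),
          ("Python", "created_by", "Guido van Rossum")]
for claim, verdict, fix in verify(claims):
    print(claim, verdict, fix or "")
```

    The first claim is refuted with "France" suggested as the revision; the second is supported. In the real tool, the evidence lookup is a retrieval step and the verdict comes from a factuality model rather than string equality.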

    Open Domain Knowledge Extraction for Knowledge Graphs

    The quality of a knowledge graph directly impacts the quality of downstream applications (e.g. the number of answerable questions using the graph). One ongoing challenge when building a knowledge graph is ensuring the completeness and freshness of the graph's entities and facts. In this paper, we introduce ODKE, a scalable and extensible framework that sources high-quality entities and facts from the open web at scale. ODKE utilizes a wide range of extraction models and supports both streaming and batch processing at different latencies. We reflect on the challenges and design decisions made, and share lessons learned from building and deploying ODKE to grow an industry-scale open domain knowledge graph. Comment: 7 pages, 7 figures, 5 tables, preprint technical report; no code or data is released.

    GEMv2: Multilingual NLG benchmarking in a single line of code

    Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables comparison on an equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation, which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims. To make following best model evaluation practices easier, we introduce GEMv2. The new version of the Generation, Evaluation, and Metrics Benchmark introduces a modular infrastructure for dataset, model, and metric developers to benefit from each other's work. GEMv2 supports 40 documented datasets in 51 languages. Models for all datasets can be evaluated online, and our interactive data card creation and rendering tools make it easier to add new datasets to the living benchmark. Peer reviewed.

    Assessing Topical Homogeneity with Word Embedding and Distance Matrices

    Researchers from many fields have used statistical tools to make sense of large bodies of text. Many tools support quantitative analysis of documents within a corpus, but relatively few studies have examined statistical characteristics of whole corpora. Statistical summaries of whole corpora, and comparisons between corpora, have potential application in the analysis of topically organized applications such as social media platforms. In this study, we created matrix representations of several corpora and examined several statistical tests to compare pairs of corpora with respect to the topical homogeneity of documents within each corpus. Results of three experiments suggested that a matrix of cosine distances calculated from vector summaries of short phrases contains useful information about how closely the documents within a corpus relate to one another. Both the tested summarization method and a non-parametric test for comparing cosine distance matrices appear to have utility for examining and comparing corpora containing brief texts.
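    The core measurement the abstract describes can be sketched in a few lines. The embedding step here is a plain bag-of-words stand-in (the paper's actual phrase summarization method is not reproduced), but the cosine-distance matrix and its use as a homogeneity signal follow the same idea: documents on one topic sit closer together than documents on unrelated topics.

```python
import math
from itertools import combinations

def bow_vectors(corpus):
    # Bag-of-words vectors over the corpus's own vocabulary,
    # L2-normalized so a dot product equals cosine similarity.
    vocab = sorted({tok for doc in corpus for tok in doc.lower().split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    vecs = []
    for doc in corpus:
        v = [0.0] * len(vocab)
        for tok in doc.lower().split():
            v[index[tok]] += 1.0
        norm = math.sqrt(sum(x * x for x in v))
        vecs.append([x / norm for x in v])
    return vecs

def mean_pairwise_cosine_distance(corpus):
    # Average off-diagonal entry of the cosine-distance matrix:
    # lower values indicate a more topically homogeneous corpus.
    vecs = bow_vectors(corpus)
    dists = [1.0 - sum(a * b for a, b in zip(u, v))
             for u, v in combinations(vecs, 2)]
    return sum(dists) / len(dists)

homogeneous = ["the cat sat", "the cat slept", "the cat ate"]
mixed = ["the cat sat", "stock prices fell", "heavy rain tomorrow"]
assert mean_pairwise_cosine_distance(homogeneous) < mean_pairwise_cosine_distance(mixed)
```

    The study's non-parametric test operates on the full distance matrix rather than its mean; the mean is used here only as the simplest single-number summary of the same matrix.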