34 research outputs found

    Mitigating Negative Transfer with Task Awareness for Sexism, Hate Speech, and Toxic Language Detection

    Full text link
    This paper proposes a novel approach to mitigating the negative transfer problem. In machine learning, the common strategy is Single-Task Learning: training a supervised model to solve one specific task. Training a robust model requires a lot of data and significant computational resources, making this solution unfeasible when data are unavailable or expensive to gather. Therefore, another solution, based on sharing information between tasks, has been developed: Multi-Task Learning (MTL). Despite recent developments in MTL, the problem of negative transfer remains unsolved. Negative transfer is a phenomenon that occurs when noisy information is shared between tasks, resulting in a drop in performance. This paper proposes a new approach to mitigating negative transfer based on the concept of task awareness. The proposed approach reduces negative transfer while improving performance over classic MTL solutions. Moreover, it has been implemented in two unified architectures to detect Sexism, Hate Speech, and Toxic Language in text comments. The proposed architectures set a new state of the art on both the EXIST-2021 and HatEval-2019 benchmarks. Comment: 8 pages, 2 figures, 5 tables, IJCNN 2023 conference
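    The task-awareness idea can be illustrated with a minimal sketch (all names and numbers below are illustrative, not taken from the paper): the shared encoder receives a task-specific embedding alongside the input, so each task conditions the representation it gets back instead of all tasks sharing one undifferentiated signal.

```python
import random

random.seed(0)

# Illustrative sketch of task-aware multi-task learning: each input is
# tagged with a per-task embedding so the shared encoder can modulate its
# representation per task, one way to limit negative transfer between
# tasks. TASKS, DIM, and the encoder are invented stand-ins.

TASKS = ["sexism", "hate_speech", "toxicity"]
DIM = 4

# one trainable embedding vector per task (randomly initialised here)
task_emb = {t: [random.uniform(-0.1, 0.1) for _ in range(DIM)] for t in TASKS}

def shared_encoder(x):
    # stand-in for a shared transformer encoder: a fixed nonlinearity
    return [max(0.0, v) for v in x]

def task_aware_forward(x, task):
    # condition the shared representation on the task identity
    conditioned = [xi + ei for xi, ei in zip(x, task_emb[task])]
    h = shared_encoder(conditioned)
    return sum(h)  # stand-in for a task-specific classification head

x = [0.5, -0.2, 0.1, 0.3]
scores = {t: task_aware_forward(x, t) for t in TASKS}
```

    In training, the task embeddings would be updated by backpropagation along with the shared encoder, so each task learns how much of the shared signal to use.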

    AI-UPV at EXIST 2023 -- Sexism Characterization Using Large Language Models Under The Learning with Disagreements Regime

    Full text link
    With the increasing influence of social media platforms, it has become crucial to develop automated systems capable of detecting instances of sexism and other disrespectful and hateful behaviors, to promote a more inclusive and respectful online environment. Nevertheless, these tasks are considerably challenging given the different hate categories and the author's intentions, especially under the learning with disagreements regime. This paper describes the AI-UPV team's participation in the EXIST (sEXism Identification in Social neTworks) Lab at CLEF 2023. The proposed approach addresses the task of sexism identification and characterization under the learning with disagreements paradigm by training directly from the data with disagreements, without using any aggregated label; performance under both soft and hard evaluations is reported. The proposed system uses large language models (i.e., mBERT and XLM-RoBERTa) and ensemble strategies for sexism identification and classification in English and Spanish. In particular, our system is articulated in three different pipelines. The ensemble approach outperformed the individual large language models, obtaining the best performance under both soft- and hard-label evaluation. This work describes the participation in all three EXIST tasks; under soft evaluation, the system obtained fourth place in Task 2 and first place in Task 3, with the highest ICM-Soft of -2.32 and a normalized ICM-Soft of 0.79. The source code of our approaches is publicly available at https://github.com/AngelFelipeMP/Sexism-LLM-Learning-With-Disagreement. Comment: 15 pages, 9 tables, 1 figure, conference
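    A minimal sketch of the two ingredients named above, training against soft labels derived from annotator disagreement and a probability-averaging ensemble, with invented annotator counts and model outputs (the paper's actual pipelines are not reproduced here):

```python
import math

# Sketch of "learning with disagreements" plus ensembling. The annotator
# votes and the three model probabilities below are invented examples.

def soft_label(votes, n_annotators):
    # e.g. 4 of 6 annotators labelled the text sexist -> soft target 0.667
    return votes / n_annotators

def soft_cross_entropy(p_model, p_target, eps=1e-9):
    # binary cross-entropy against a soft target instead of an
    # aggregated hard label
    return -(p_target * math.log(p_model + eps)
             + (1 - p_target) * math.log(1 - p_model + eps))

def ensemble(probabilities):
    # simple probability averaging across models (e.g. mBERT and
    # XLM-RoBERTa runs)
    return sum(probabilities) / len(probabilities)

target = soft_label(votes=4, n_annotators=6)   # 0.667 soft target
model_probs = [0.72, 0.58, 0.65]               # invented model outputs
p = ensemble(model_probs)
loss = soft_cross_entropy(p, target)
```

    Training on the soft target keeps the disagreement signal in the loss, which is what soft-evaluation metrics such as ICM-Soft reward.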

    Examining the Impact of Uncontrolled Variables on Physiological Signals in User Studies for Information Processing Activities

    Full text link
    Physiological signals can potentially be applied as objective measures to understand the behavior and engagement of users interacting with information access systems. However, the signals are highly sensitive, and many controls are required in laboratory user studies. To investigate the extent to which controlled or uncontrolled (i.e., confounding) variables such as task sequence or duration influence the observed signals, we conducted a pilot study in which each participant completed four types of information-processing activities (READ, LISTEN, SPEAK, and WRITE). Meanwhile, we collected data on blood volume pulse, electrodermal activity, and pupil responses. We then used machine learning approaches as a mechanism to examine the influence of controlled and uncontrolled variables that commonly arise in user studies. Task duration was found to have a substantial effect on model performance, suggesting it captures individual differences rather than giving insight into the target variables. This work contributes to our understanding of such variables when using physiological signals in information retrieval user studies. Comment: Accepted to the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '23)
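    One way to read the duration finding is as a confound check: if a trivial classifier can separate the activities using task duration alone, duration is acting as a shortcut feature. A synthetic sketch of that check follows (the data, class means, and classifier are invented, not from the study):

```python
import random
from statistics import mean

random.seed(1)

# Sketch of a confound check: predict the activity label from task
# duration alone. Accuracy well above chance would flag duration as a
# confounding variable. All durations here are synthetic.

ACTIVITIES = ["READ", "LISTEN", "SPEAK", "WRITE"]

def make_session(activity):
    # synthetic durations: WRITE tends to run longer (a built-in confound)
    base = {"READ": 60, "LISTEN": 55, "SPEAK": 50, "WRITE": 90}[activity]
    return base + random.gauss(0, 5)

train = [(make_session(a), a) for a in ACTIVITIES for _ in range(30)]
test = [(make_session(a), a) for a in ACTIVITIES for _ in range(10)]

# nearest-centroid classifier on the duration feature only
centroids = {a: mean(d for d, lab in train if lab == a) for a in ACTIVITIES}

def predict(duration):
    return min(centroids, key=lambda a: abs(duration - centroids[a]))

accuracy = mean(predict(d) == lab for d, lab in test)
# chance level is 0.25 for four balanced classes
```

    The same probe can be run for any suspect variable (task order, time of day) before trusting a model trained on the physiological signals themselves.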

    SpeakerLDA: Discovering Topics in Transcribed Multi-Speaker Audio Contents

    Get PDF
    Topic models such as Latent Dirichlet Allocation (LDA) …

    Designing and Evaluating Presentation Strategies for Fact-Checked Content

    Full text link
    With the rapid growth of online misinformation, it is crucial to have reliable fact-checking methods. Recent research on finding check-worthy claims and on automated fact-checking has made significant advancements. However, limited guidance exists regarding the presentation of fact-checked content to effectively convey verified information to users. We address this research gap by exploring the critical design elements in fact-checking reports and investigating whether credibility- and presentation-based design improvements can enhance users' ability to interpret the report accurately. We co-developed potential content presentation strategies through a workshop involving fact-checking professionals, communication experts, and researchers. The workshop examined the significance and utility of elements such as veracity indicators and explored the feasibility of incorporating interactive components for enhanced information disclosure. Building on the workshop outcomes, we conducted an online experiment involving 76 crowd workers to assess the efficacy of different design strategies. The results indicate that the proposed strategies significantly improve users' ability to accurately interpret the verdict of fact-checking articles. Our findings underscore the critical role of effective presentation of fact-checking reports in addressing the spread of misinformation. By adopting appropriate design enhancements, the effectiveness of fact-checking reports can be maximized, enabling users to make informed judgments. Comment: Accepted to the 32nd ACM International Conference on Information and Knowledge Management (CIKM '23)

    Report on the future conversations workshop at CHIIR 2021

    Get PDF
    The Future Conversations workshop at CHIIR’21 looked to the future of search, recommendation, and information interaction to ask: where are the opportunities for conversational interactions? What do we need to do to get there? Furthermore, who stands to benefit? The workshop was hands-on and interactive. Rather than a series of technical talks, we solicited position statements on opportunities, problems, and solutions in conversational search in all modalities (written, spoken, or multimodal). This paper – co-authored by the organisers and participants of the workshop – summarises the submitted statements and the discussions we had during the two sessions of the workshop. Statements discussed during the workshop are available at https://bit.ly/FutureConversations2021Statements

    How future surgery will benefit from SARS-COV-2-related measures: a SPIGC survey conveying the perspective of Italian surgeons

    Get PDF
    COVID-19 negatively affected surgical activity, but the potential benefits resulting from the adopted measures remain unclear. The aim of this study was to evaluate the change in surgical activity and the potential benefits of COVID-19 measures from the perspective of Italian surgeons, on behalf of SPIGC. A nationwide online survey on surgical practice before, during, and after the COVID-19 pandemic was conducted in March-April 2022 (NCT:05323851). The effects of COVID-19 hospital-related measures on the management of surgical patients and on personal professional development across surgical specialties were explored. Data on demographics, pre-operative/peri-operative/post-operative management, and professional development were collected, and outcomes were matched with the corresponding surgical volume. Four hundred and seventy-three respondents across 14 surgical specialties were included in the final analysis. Since the SARS-CoV-2 pandemic, the application of telematic consultations (4.1% vs. 21.6%; p < 0.0001) and diagnostic evaluations (16.4% vs. 42.2%; p < 0.0001) has increased. Elective surgical activity was significantly reduced, and surgeons opted more frequently for conservative management of conditions with a possible indication for elective (26.3% vs. 35.7%; p < 0.0001) or urgent (20.4% vs. 38.5%; p < 0.0001) surgery. All new COVID-related measures are expected to be maintained in the future. Surgeons' online personal education increased from 12.6% (pre-COVID) to 86.6% (post-COVID; p < 0.0001), and online educational activities are considered a beneficial effect of the COVID pandemic (56.4%). COVID-19 had a great impact on surgical specialties, with a significant reduction in operation volume. However, some forced changes turned out to be benefits: isolation measures pushed the use of telemedicine and telemetric devices in outpatient practice and favored communication for educational purposes and surgeon-patient/family communication. From the Italian surgeons' perspective, COVID-related measures will continue to influence future surgical clinical practice.
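    Comparisons such as "4.1% vs. 21.6%; p < 0.0001" are the kind of result a two-proportion z-test produces. A sketch using the standard pooled-variance formula follows; note the abstract does not state which test was actually used, and applying the 473-respondent total to both groups is a simplifying assumption:

```python
import math

# Two-proportion z-test, pooled-variance form. The proportions come
# from the abstract (telematic consultations before vs. after the
# pandemic onset); equal group sizes of 473 are assumed.

def two_proportion_z(p1, p2, n1, n2):
    # pooled proportion under the null hypothesis of equal rates
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # two-sided p-value from the normal tail, via the complementary
    # error function (no external stats library needed)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z, p = two_proportion_z(0.041, 0.216, 473, 473)
# p comes out far below 0.0001, consistent with the reported significance
```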

    Reducing the environmental impact of surgery on a global scale: systematic review and co-prioritization with healthcare workers in 132 countries

    Get PDF
    Background: Healthcare cannot achieve net-zero carbon without addressing operating theatres. The aim of this study was to prioritize feasible interventions to reduce the environmental impact of operating theatres. Methods: This study adopted a four-phase Delphi consensus co-prioritization methodology. In phase 1, a systematic review of published interventions and a global consultation of perioperative healthcare professionals were used to longlist interventions. In phase 2, iterative thematic analysis consolidated comparable interventions into a shortlist. In phase 3, the shortlist was co-prioritized based on patient and clinician views on acceptability, feasibility, and safety. In phase 4, ranked lists of interventions were presented by their relevance to high-income and low-middle-income countries. Results: In phase 1, 43 interventions were identified, which had low uptake in practice according to 3042 professionals globally. In phase 2, a shortlist of 15 intervention domains was generated. In phase 3, interventions were deemed acceptable for more than 90 per cent of patients, except for reducing general anaesthesia (84 per cent) and re-sterilization of ‘single-use’ consumables (86 per cent). In phase 4, the top three shortlisted interventions for high-income countries were: introducing recycling; reducing use of anaesthetic gases; and appropriate clinical waste processing. The top three shortlisted interventions for low-middle-income countries were: introducing reusable surgical devices; reducing use of consumables; and reducing the use of general anaesthesia. Conclusion: This is a step toward environmentally sustainable operating environments, with actionable interventions applicable to both high-income and low-middle-income countries.

    Entity-Based Filtering and Topic Detection for Online Reputation Monitoring in Twitter

    No full text
    With the rise of social media channels such as Twitter -- the most popular microblogging service -- the control of what is said online about entities -- companies, people, or products -- has shifted from the entities themselves to users and consumers. This has generated the need to monitor the reputation of those entities online. In this context, it is only natural to witness significant growth in demand for text-mining software for Online Reputation Monitoring: automatic tools that help process, understand, and aggregate large streams of facts and opinions about a company or individual. Despite the variety of Online Reputation Monitoring tools on the market, there is no standard evaluation framework yet -- a widely accepted set of task definitions, evaluation measures, and reusable test collections for this problem. In fact, there is not even consensus on which tasks make up the Online Reputation Monitoring process, for which a system should minimize the effort of the user.

    In the context of a collective effort to identify and formalize the main challenges of Online Reputation Monitoring in Twitter, we have participated in the definition of tasks and the subsequent creation of suitable test collections (the WePS-3, RepLab 2012, and RepLab 2013 evaluation campaigns), and we have studied two of the identified challenges in depth: 'filtering' (Is a tweet related to a given entity of interest?), modeled as a binary classification task, and 'topic detection' (What is being said about an entity in a given tweet stream?), which consists of clustering tweets by topic. Compared to previous studies on Twitter, our problem lies in its 'long tail': with few exceptions, the volume of information related to a specific entity (organization or company) at a given time is orders of magnitude smaller than Twitter trending topics, making the problem much more challenging than identifying Twitter trends.

    We rely on three building blocks to propose different approaches to these two tasks: 'filter keywords', 'external resources' (such as Wikipedia, representative pages of the entity of interest, etc.), and 'entity-specific training data' when available.

    We have found that the notion of 'filter keywords' -- expressions that, if present in a tweet, indicate a high probability that it is either related or unrelated to the entity of interest -- can be effectively used for the filtering task. Here, (i) the specificity of a term to the entity's tweet stream is a useful feature for identifying keywords, and (ii) the association between a term and the entity's Wikipedia page is useful for differentiating positive from negative filter keywords, especially when averaged over its most co-occurring terms. Exploring the nature of filter keywords also led us to the conclusion that there is a gap between the vocabulary that characterizes a company on Twitter and the vocabulary associated with the company on its homepage, in Wikipedia, and even on the Web at large.

    We have also found that, when entity-specific training data is available -- as in the known-entity scenario -- it is more cost-effective to use a simple Bag-of-Words classifier. When enough training data is available (around 700 tweets per entity), Bag-of-Words classifiers can be used effectively for the filtering task. Moreover, they can be used effectively in an active learning scenario, where the system updates its classification model with the stream of annotations and interactions made by the reputation expert during the monitoring process. In this context, we found that by selecting the tweets to be labeled as those on which the classifier is least confident (margin sampling), the cost of creating a bulk training set can be reduced by 90% after inspecting 10% of the test data. Unlike in many other applications of active learning to Natural Language Processing tasks, margin sampling works better than random sampling.

    As for the topic detection problem, we considered two main strategies: the first is inspired by the notion of filter keywords and works by clustering terms as an intermediate step towards document clustering. The second -- and most successful -- learns a pairwise tweet similarity function from previously annotated data, using content-based and Twitter-based features, and then applies a clustering algorithm on top of the learned similarity function. Our experiments indicate that (i) Twitter signals can be used to improve the topic detection process with respect to using content signals only, and (ii) learning a similarity function is a flexible and efficient way of introducing supervision into the topic detection clustering process. The performance of our best system is substantially better than state-of-the-art approaches and gets close to the inter-annotator agreement rate of the topic detection annotations in the RepLab 2013 dataset -- to our knowledge, the largest dataset available for Online Reputation Monitoring. A detailed qualitative inspection of the data further reveals two types of topics detected by reputation experts: reputation alerts/issues (which usually spike in time) and organizational topics (which are usually stable across time).

    Along with our contribution to building a standard evaluation framework to study the Online Reputation Monitoring problem from a scientific perspective, we believe that the outcome of our research has practical implications and may help the development of semi-automatic tools to assist reputation experts in their daily work.
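    The margin-sampling step described above can be sketched as follows. The toy scorer here assigns random probabilities; in the thesis, the probabilities would come from the Bag-of-Words filtering classifier over the entity's tweet stream, and the loop would retrain after each batch of expert labels:

```python
import random

random.seed(2)

# Sketch of margin sampling for the filtering task: from a pool of
# unlabelled tweets, request expert labels for those on which the
# current classifier is least confident (probability closest to 0.5).
# Tweet IDs and probabilities are invented placeholders.

pool = [f"tweet_{i}" for i in range(100)]
prob_related = {t: random.random() for t in pool}  # P(related to entity)

def margin_sample(pool, probs, budget):
    # margin for a binary classifier: distance of P(related) from 0.5;
    # the smallest margins are the most informative labels to request
    return sorted(pool, key=lambda t: abs(probs[t] - 0.5))[:budget]

to_label = margin_sample(pool, prob_related, budget=10)
# the expert labels these 10 tweets, the classifier is retrained, and
# the cycle repeats until the labelling budget is spent
```

    Selecting by smallest margin rather than at random is what yields the reported saving: most of a bulk training set's cost can be avoided by concentrating expert effort near the decision boundary.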