10 research outputs found

    When to stop making relevance judgments? A study of stopping methods for building information retrieval test collections

    This is the peer-reviewed version of the following article: David E. Losada, Javier Parapar and Alvaro Barreiro (2019) When to Stop Making Relevance Judgments? A Study of Stopping Methods for Building Information Retrieval Test Collections. Journal of the Association for Information Science and Technology, 70 (1), 49-60, which has been published in final form at https://doi.org/10.1002/asi.24077. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions.

    In information retrieval evaluation, pooling is a well-known technique to extract a sample of documents to be assessed for relevance. Given the pooled documents, a number of studies have proposed different prioritization methods to adjudicate documents for judgment. These methods follow different strategies to reduce the assessment effort. However, there is no clear guidance on how many relevance judgments are required to create a reliable test collection. In this article we investigate and further develop methods to determine when to stop making relevance judgments. We propose a highly diversified set of stopping methods and provide a comprehensive analysis of the usefulness of the resulting test collections. Some of the stopping methods introduced here combine innovative estimates of recall with time-series models used in financial trading. Experimental results on several representative collections show that some stopping methods can reduce the assessment effort by up to 95% and still produce a robust test collection. We demonstrate that the reduced set of judgments can be reliably employed to compare search systems using disparate effectiveness metrics such as Average Precision, NDCG, P@100, and Rank-Biased Precision. With all these measures, the correlations found between full-pool rankings and reduced-pool rankings are very high.

    This work received financial support from (i) the "Ministerio de Economía y Competitividad" of the Government of Spain and FEDER funds under research project TIN2015-64282-R, (ii) Xunta de Galicia (project GPC 2016/035), and (iii) Xunta de Galicia "Consellería de Cultura, Educación e Ordenación Universitaria" and the European Regional Development Fund (ERDF) through the following 2016–2019 accreditations: ED431G/01 ("Centro singular de investigación de Galicia") and ED431G/08.
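The core idea of a recall-based stopping rule can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function names are hypothetical, and the estimate of the total number of relevant documents is taken as an external input, whereas the paper's actual methods build far more sophisticated recall estimates and time-series models.

```python
# Hypothetical sketch of a recall-based stopping rule for relevance
# assessment. `estimated_total_relevant` stands in for an external
# estimator; this is NOT the authors' method.

def judge_until_recall(ranked_docs, judge, estimated_total_relevant,
                       target_recall=0.95):
    """Judge pooled documents in priority order and stop once the
    estimated recall reaches `target_recall`."""
    found_relevant = 0
    judgments = {}
    for doc_id in ranked_docs:
        judgments[doc_id] = judge(doc_id)   # True if judged relevant
        if judgments[doc_id]:
            found_relevant += 1
        est_recall = found_relevant / max(estimated_total_relevant, 1)
        if est_recall >= target_recall:
            break                            # stop: enough recall covered
    return judgments

# Toy pool: d3 and d5 are the only relevant documents.
relevant = {"d3", "d5"}
out = judge_until_recall(["d3", "d1", "d5", "d2", "d4"],
                         judge=lambda d: d in relevant,
                         estimated_total_relevant=2,
                         target_recall=1.0)
# Judging stops after d3, d1, d5: three judgments instead of five.
```

The saving grows with the pool size: the earlier the prioritization surfaces the relevant documents, the sooner the estimated recall crosses the threshold.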

    Real-time focused extraction of social media users

    In this paper, we explore a real-time automation challenge: the problem of focused extraction of Social Media users. This challenge can be seen as a special form of focused crawling where the main target is to detect users with certain patterns. Given a specific user profile, the task consists of rapidly ingesting Social Media data and detecting target users early. This is a real-time intelligent automation task that has numerous applications in domains such as safety, health or marketing. The volume and dynamics of Social Media contents demand efficient real-time solutions able to predict which users are worth exploring. To meet this aim, we propose and evaluate several methods that effectively allow us to harvest relevant users. Even with little contextual information (e.g., a single user submission), our methods quickly focus on the most promising users. We also developed a distributed microservice architecture that supports real-time parallel extraction of Social Media users. This modular architecture scales up in clusters of computers and can be easily adapted for user extraction in multiple domains and Social Media sources. Our experiments suggest that some of the proposed prioritisation methods, which work with minimal user context, are effective at rapidly focusing on the most relevant users. These methods perform satisfactorily with huge volumes of users and interactions and lead to harvest ratios 2 to 9 times higher than those achieved by random prioritisation.

    This work was supported in part by the Ministerio de Ciencia e Innovación (MICINN) under Grant RTI2018-093336-B-C21 and Grant PLEC2021-007662; in part by Xunta de Galicia under Grant ED431G/08, Grant ED431G-2019/04, Grant ED431C 2018/19, and Grant ED431F 2020/08; and in part by the European Regional Development Fund (ERDF).
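The prioritise-then-harvest loop described above can be sketched in a few lines. This is an illustrative toy, not the paper's distributed system: the scoring function, user ids and budget are all hypothetical, and the harvest ratio is simply the fraction of explored users that turn out to be targets.

```python
import heapq

# Illustrative sketch (not the paper's architecture): keep a max-heap of
# candidate users keyed by a relevance score, explore the most promising
# first, and track the harvest ratio (target users found / users explored).

def focused_extraction(candidates, score, is_target, budget):
    heap = [(-score(u), u) for u in candidates]  # negate scores: max-heap
    heapq.heapify(heap)
    explored, hits = 0, 0
    while heap and explored < budget:
        _, user = heapq.heappop(heap)            # most promising user next
        explored += 1
        if is_target(user):
            hits += 1
    return hits / explored if explored else 0.0  # harvest ratio

# Toy run: the score function (an assumption here) ranks targets first.
targets = {"u2", "u4"}
ratio = focused_extraction(
    ["u1", "u2", "u3", "u4"],
    score=lambda u: 1.0 if u in targets else 0.0,
    is_target=lambda u: u in targets,
    budget=2,
)
# Both explored users are targets, so the harvest ratio is 1.0.
```

A random prioritisation in the same toy setting would, on average, explore one target in two attempts, which is where the reported harvest-ratio gains over random ordering come from.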

    A Big Data Platform for Real Time Analysis of Signs of Depression in Social Media

    In this paper we propose a scalable platform for real-time processing of Social Media data. The platform ingests huge amounts of content, such as Social Media posts or comments, and can support Public Health surveillance tasks. The processing and analytical needs of multiple screening tasks can easily be handled by incorporating user-defined execution graphs. The design is modular and supports different processing elements, such as crawlers to extract relevant content or classifiers to categorise Social Media. We describe here an implementation of a use case built on the platform that monitors Social Media users and detects early signs of depression.

    This work was funded by FEDER/Ministerio de Ciencia, Innovación y Universidades/Agencia Estatal de Investigación, Project RTI2018-093336-B-C21. Our research also receives financial support from the Consellería de Educación, Universidade e Formación Profesional (accreditations 2019–2022 ED431G-2019/04, ED431C 2018/29, ED431C 2018/19) and the European Regional Development Fund (ERDF), which acknowledges the CiTIUS Research Center in Intelligent Technologies of the University of Santiago de Compostela as a Research Center of the Galician University System.

    Methodology for integration of fisher's ecological knowledge in fisheries biology and management using knowledge representation (artificial intelligence)

    Presented at the International Conference "Putting Fisher's Knowledge to Work", Vancouver, Canada, 27–30 August 2001.

    [Abstract] The fisheries crisis of the last decades and the overexploitation of a great number of stocks (FAO 1995) have been due mainly to the inadequacy of scientific knowledge, uncertainties in assessments and/or failures of the management systems. These problems are critical when the management of coastal ecosystems and artisanal fisheries is involved. These systems possess great complexity due to the high number of human factors that influence their functioning and the fishing activity. Small-scale coastal fisheries have a much greater social significance than offshore industrial fisheries, despite the larger economic importance of the latter (in macro-economic terms only). The artisanal coastal fisheries in Galicia (NW Spain) are in a general state of overexploitation derived from the mismatch between management (derived implicitly from models designed for industrial finfish fisheries) and the biological and socioeconomic context. Freire and García-Allut (2000) proposed a new management policy (based on the establishment of territorial users' rights, the involvement of fishers in the assessment and management process in collaboration with the government agencies, and the use of protected areas and minimum landing sizes as key regulations) to solve the above problems. As well as a new management system, research should pay special attention to the design and use of inexpensive and rapid methodologies to obtain relevant scientific data, and to the introduction of local or traditional ecological knowledge of the fishers into the assessment and management process. In this paper, we analyze the values and characteristics of fishers' ecological knowledge (FEK). Using the artisanal coastal fisheries of Galicia as a case study, we present the objectives of the integration of FEK in fisheries biology and management and propose a methodology for that goal. The use of Artificial Intelligence (AI) as a tool for the analysis and integration of FEK is discussed, and the role of Knowledge Representation, a branch of AI, is described to show the epistemological and technological adequacy of the chosen languages and tools in a non-computer-science forum.

    Using description logics to integrate fishers' ecological knowledge in the research of artisanal fisheries

    [Abstract] The aim of this paper is to show the role that Knowledge Representation can play in the research of artisanal fisheries. In particular, we concentrate on the epistemological and technological adequacy of implementations of Description Logics to represent fishers' ecological knowledge, thereby contributing to addressing some open methodological questions about its collection and use.

    Personality trait analysis during the COVID-19 pandemic: a comparative study on social media

    The COVID-19 pandemic, a global contagion of coronavirus infection caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), has triggered severe social and economic disruption around the world and provoked changes in people's behavior. Given the extreme societal impact of COVID-19, it becomes crucial to understand the emotional response of the people and the impact of COVID-19 on personality traits and psychological dimensions. In this study, we contribute to this goal by thoroughly analyzing the evolution of personality and psychological aspects in a large-scale collection of tweets extracted during the COVID-19 pandemic. The objectives of this research are: i) to provide evidence that helps to understand the estimated impact of the pandemic on people's temperament, ii) to find associations and trends between specific events (e.g., stages of harsh confinement) and people's reactions, and iii) to study the evolution of multiple personality aspects, such as the degree of introversion or the level of neuroticism. We also examine the development of emotions, as a natural complement to the automatic analysis of the personality dimensions. To achieve our goals, we created two large collections of tweets (geotagged in the United States and Spain, respectively), collected during the pandemic. Our work reveals interesting trends in personality dimensions, emotions, and events. For example, during the pandemic period, we found increasing traces of introversion and neuroticism. Another interesting insight from our study is that the most frequent signs of personality disorders are those related to depression, schizophrenia, and narcissism. We also found some peaks of negative/positive emotions related to specific events.

    Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. The authors thank the support obtained from: i) project PLEC2021-007662 (MCIN/AEI/10.13039/501100011033, Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación, Plan de Recuperación, Transformación y Resiliencia, Unión Europea-NextGenerationEU), ii) project PID2022-137061OB-C22 (Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación, Proyectos de Generación de Conocimiento; supported by the European Regional Development Fund) and iii) Consellería de Educación, Universidade e Formación Profesional (accreditation 2019-2022 ED431G-2019/04, ED431C 2022/19) and the European Regional Development Fund, which acknowledges the CiTIUS Research Center in Intelligent Technologies of the University of Santiago de Compostela as a Research Center of the Galician University System.

    An unsupervised perplexity-based method for boilerplate removal

    The availability of large web-based corpora has led to significant advances in a wide range of technologies, including massive retrieval systems and deep neural networks. However, leveraging this data is challenging, since web content is plagued by so-called boilerplate: ads, incomplete or noisy text, and remnants of the navigation structure, such as menus or navigation bars. In this work, we present a novel and efficient approach to extract useful and well-formed content from web-scraped data. Our approach takes advantage of Language Models and their implicit knowledge about correctly formed text, and we demonstrate here that perplexity is a valuable signal that contributes in terms of both effectiveness and efficiency. In fact, the removal of noisy parts leads to lighter AI or search solutions that are effective and entail important reductions in resources spent. We exemplify the usefulness of our method with two downstream tasks, search and classification, and a cleaning task. We also provide a Python package with pre-trained models and a web demo demonstrating the capabilities of our approach.
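The underlying intuition can be sketched under strong simplifying assumptions: here a character-level unigram model stands in for the pre-trained language models the approach actually uses, and the threshold is chosen by hand. Well-formed text receives low perplexity; navigation debris does not.

```python
import math
from collections import Counter

# Toy illustration of perplexity-based boilerplate filtering (assumed
# details, not the authors' implementation or Python package).

def train_unigram(corpus):
    """Character-level unigram probabilities from a clean reference text."""
    counts = Counter(corpus)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def perplexity(text, char_probs, floor=1e-6):
    """Per-character perplexity; unseen characters get a tiny floor prob."""
    logp = sum(math.log(char_probs.get(c, floor)) for c in text)
    return math.exp(-logp / max(len(text), 1))

def drop_boilerplate(blocks, char_probs, threshold):
    """Keep only blocks whose perplexity stays under the threshold."""
    return [b for b in blocks if perplexity(b, char_probs) <= threshold]

probs = train_unigram("this is ordinary well formed english text")
clean = "this is text"
noisy = ">>|menu|<<"
# The noisy block scores far higher perplexity than the clean one and
# is filtered out while the clean block survives.
```

With a real language model the same mechanism applies: blocks of menus, ads and truncated fragments sit far from the model's learned distribution, so a single perplexity threshold separates them from article text cheaply.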

    TWebS: an application of terminological logics in web searching

    No full text