74 research outputs found

    ChatGPT outperforms crowd workers for text-annotation tasks

    Many NLP applications require manual text annotations for a variety of tasks, notably to train classifiers or evaluate the performance of unsupervised models. Depending on the size and degree of complexity, the tasks may be conducted by crowd workers on platforms such as MTurk as well as trained annotators, such as research assistants. Using four samples of tweets and news articles (n = 6,183), we show that ChatGPT outperforms crowd workers for several annotation tasks, including relevance, stance, topics, and frame detection. Across the four datasets, the zero-shot accuracy of ChatGPT exceeds that of crowd workers by about 25 percentage points on average, while ChatGPT’s intercoder agreement exceeds that of both crowd workers and trained annotators for all tasks. Moreover, the per-annotation cost of ChatGPT is less than $0.003—about thirty times cheaper than MTurk. These results demonstrate the potential of large language models to drastically increase the efficiency of text classification
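The two quantities compared in this abstract, accuracy against a reference labelling and intercoder agreement, can be illustrated with a minimal sketch. It assumes that agreement is measured as the share of items on which two annotation runs assign the same label; the abstract does not spell out the exact measure. The function names and the toy labels are hypothetical and do not come from the paper.

```python
from itertools import combinations


def accuracy(predicted, gold):
    """Share of items whose predicted label equals the reference label."""
    if len(predicted) != len(gold):
        raise ValueError("label lists must have the same length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def pairwise_agreement(*annotators):
    """Mean share of identical labels over every pair of annotation runs.

    Assumption: simple percent agreement, not a chance-corrected statistic.
    """
    pairs = list(combinations(annotators, 2))
    return sum(accuracy(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy data: four items, one reference labelling, two runs.
gold = ["relevant", "irrelevant", "relevant", "relevant"]
run_1 = ["relevant", "irrelevant", "relevant", "irrelevant"]
run_2 = ["relevant", "irrelevant", "relevant", "relevant"]

print(accuracy(run_1, gold))             # 0.75
print(pairwise_agreement(run_1, run_2))  # 0.75
```

Accuracy needs a trusted reference labelling, whereas agreement only needs repeated annotation runs, which is why the two can diverge.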

    ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks

    Many NLP applications require manual data annotations for a variety of tasks, notably to train classifiers or evaluate the performance of unsupervised models. Depending on the size and degree of complexity, the tasks may be conducted by crowd-workers on platforms such as MTurk as well as trained annotators, such as research assistants. Using a sample of 2,382 tweets, we demonstrate that ChatGPT outperforms crowd-workers for several annotation tasks, including relevance, stance, topics, and frames detection. Specifically, the zero-shot accuracy of ChatGPT exceeds that of crowd-workers for four out of five tasks, while ChatGPT's intercoder agreement exceeds that of both crowd-workers and trained annotators for all tasks. Moreover, the per-annotation cost of ChatGPT is less than $0.003 -- about twenty times cheaper than MTurk. These results show the potential of large language models to drastically increase the efficiency of text classification

    Conspiracy theories on Twitter: emerging motifs and temporal dynamics during the COVID-19 pandemic

    The COVID-19 pandemic resulted in an upsurge in the spread of diverse conspiracy theories (CTs) with real-life impact. However, the dynamics of user engagement remain under-researched. In the present study, we leverage Twitter data across 11 months in 2020 from the timelines of 109 CT posters and a comparison group (non-CT group) of equal size. Within this approach, we used word embeddings to distinguish non-CT content from CT-related content as well as analysed which element of CT content emerged in the pandemic. Subsequently, we applied time series analyses on the aggregate and individual level to investigate whether there is a difference between CT posters and non-CT posters in non-CT tweets as well as the temporal dynamics of CT tweets. In this regard, we provide a description of the aggregate and individual series, conducted a STL decomposition in trends, seasons, and errors, as well as an autocorrelation analysis, and applied generalised additive mixed models to analyse nonlinear trends and their differences across users. The narrative motifs, characterised by word embeddings, address pandemic-specific motifs alongside broader motifs and can be related to several psychological needs (epistemic, existential, or social). Overall, the comparison of the CT group and non-CT group showed a substantially higher level of overall COVID-19-related tweets in the non-CT group and higher level of random fluctuations. Focussing on conspiracy tweets, we found a slight positive trend but, more importantly, an increase in users in 2020. Moreover, the aggregate series of CT content revealed two breaks in 2020 and a significant albeit weak positive trend since June. On the individual level, the series showed strong differences in temporal dynamics and a high degree of randomness and day-specific sensitivity. The results stress the importance of Twitter as a means of communication during the pandemic and illustrate that these beliefs travel very fast and are quickly endorsed

    Content Moderation As a Political Issue: The Twitter Discourse Around Trump's Ban

    Content moderation — the regulation of the material that users create and disseminate online — is an important activity for all social media platforms. While routine, this practice raises significant questions linked to democratic accountability and civil liberties. Following the decision of many platforms to ban Donald J. Trump in the aftermath of the attack on the U.S. Capitol in January 2021, content moderation has increasingly become a politically contested issue. This paper studies that process with a focus on the public discourse on Twitter. The analysis includes over 9 million tweets and retweets posted by over 3 million unique users between January 2020 and April 2021. First, the salience of content moderation was driven by left-leaning users, and "Section 230" was the most important topic across the ideological spectrum. Second, stance towards Section 230 was relatively volatile and increasingly polarized. These findings highlight relevant elements of the ongoing process of political contestation surrounding this issue, and provide a descriptive foundation to understand the politics of content moderation

    Mapping Groundwater Resource using Multispectral Sentinel 2 and Fuzzy Logic method, Case Study: Salafchegan, Qom, Iran

    Groundwater is one of the essential freshwater sources for human consumption, with the highest reserves of fresh water on earth after glaciers. Conservation and maintenance of groundwater quality in a large area require an overview of the status and potential of groundwater resources in that area, which remote sensing technology can provide. In this study, after extracting the factors influencing the formation of groundwater aquifers from the Sentinel satellite image, appropriate information layers were prepared and integrated in ArcGIS using different fuzzy operators, and the resulting potential maps were validated against the locations of groundwater wells in the area. The results of combining the layers for slope, slope direction, lithology, drainage length density, lineament length density, lineament buffer, drainage buffer, and vegetation showed that the fuzzy multiplication and gamma operators are suitable for combining information layers to identify groundwater potential in the area. Also, a gamma value of 0.1 gave better results than larger gamma values. In the map produced with the fuzzy gamma operator (gamma = 0.1), 15.9% of the studied area has good or very good potential for the presence of groundwater. Also, this map was validated by 70.1% of water wells in the region. The normalized ratio of accuracy to validity in the final production model with this method was estimated to be 54%, which is entirely acceptable compared to other methods
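The fuzzy operators named in this abstract can be sketched for a single map cell. The sketch assumes the textbook definition of the fuzzy gamma operator, (fuzzy algebraic sum)^gamma × (fuzzy algebraic product)^(1 − gamma); the abstract does not state the formula, and the membership values below are hypothetical.

```python
from math import prod


def fuzzy_product(memberships):
    """Fuzzy algebraic product: multiply all membership values."""
    return prod(memberships)


def fuzzy_sum(memberships):
    """Fuzzy algebraic sum: 1 minus the product of the complements."""
    return 1.0 - prod(1.0 - m for m in memberships)


def fuzzy_gamma(memberships, gamma):
    """Textbook fuzzy gamma operator (assumed, not taken from the paper).

    gamma = 0 reduces to the fuzzy product, gamma = 1 to the fuzzy sum.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return fuzzy_sum(memberships) ** gamma * fuzzy_product(memberships) ** (1.0 - gamma)


# Hypothetical membership values of one cell in three layers
# (for example slope, lithology, lineament density).
cell = [0.8, 0.6, 0.9]
print(fuzzy_gamma(cell, 0.1))
```

A small gamma such as 0.1 keeps the result close to the fuzzy product, so a cell scores high only when most layers agree that it is favourable.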

    Effects of quercetin on bisphenol A-induced mitochondrial toxicity in rat liver

    Objective(s): Recognized as a prominent global environmental toxicant, bisphenol A (BPA) affects the liver, a vital organ, by inducing oxidative stress. The present study was designed to investigate the protective effect of quercetin against BPA-induced hepatotoxicity in Wistar rats and to evaluate the activity of mitochondrial enzymes. Materials and Methods: To this end, 32 male Wistar rats were divided into four groups (six rats per group), including control, BPA (250 mg/kg), BPA + quercetin (75 mg/kg), and quercetin (75 mg/kg). Results: The BPA-induced alterations in concentrations of alanine aminotransferase (ALT), alkaline phosphatase (ALP), lactate dehydrogenase (LDH), and aspartate aminotransferase (AST) were restored by the quercetin treatment (75 mg/kg) (all

    Population and fertility by age and sex for 195 countries and territories, 1950–2017: a systematic analysis for the Global Burden of Disease Study 2017

    Background Population estimates underpin demographic and epidemiological research and are used to track progress on numerous international indicators of health and development. To date, internationally available estimates of population and fertility, although useful, have not been produced with transparent and replicable methods and do not use standardised estimates of mortality. We present single-calendar year and single-year of age estimates of fertility and population by sex with standardised and replicable methods. Methods We estimated population in 195 locations by single year of age and single calendar year from 1950 to 2017 with standardised and replicable methods. We based the estimates on the demographic balancing equation, with inputs of fertility, mortality, population, and migration data. Fertility data came from 7817 location-years of vital registration data, 429 surveys reporting complete birth histories, and 977 surveys and censuses reporting summary birth histories. We estimated age-specific fertility rates (ASFRs; the annual number of livebirths to women of a specified age group per 1000 women in that age group) by use of spatiotemporal Gaussian process regression and used the ASFRs to estimate total fertility rates (TFRs; the average number of children a woman would bear if she survived through the end of the reproductive age span [age 10–54 years] and experienced at each age a particular set of ASFRs observed in the year of interest). Because of sparse data, fertility at ages 10–14 years and 50–54 years was estimated from data on fertility in women aged 15–19 years and 45–49 years, through use of linear regression. Age-specific mortality data came from the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2017 estimates. Data on population came from 1257 censuses and 761 population registry location-years and were adjusted for underenumeration and age misreporting with standard demographic methods. 
Migration was estimated with the GBD Bayesian demographic balancing model, after incorporating information about refugee migration into the model prior. Final population estimates used the cohort-component method of population projection, with inputs of fertility, mortality, and migration data. Population uncertainty was estimated by use of out-of-sample predictive validity testing. With these data, we estimated the trends in population by age and sex and in fertility by age between 1950 and 2017 in 195 countries and territories
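Two definitions from this abstract lend themselves to a short worked example: the total fertility rate as a sum of age-specific fertility rates, and the demographic balancing equation. The sketch below assumes five-year age groups spanning ages 10–54 and the usual accounting form of the balancing equation (population plus births, minus deaths, plus net migration). It does not reproduce the GBD estimation pipeline, and all numbers are hypothetical.

```python
def total_fertility_rate(asfr_per_1000, age_group_width=5):
    """TFR in children per woman from ASFRs given per 1000 women.

    Each rate is assumed to apply for every year of its age group,
    hence the multiplication by the group width.
    """
    return age_group_width * sum(asfr_per_1000) / 1000.0


def balance_population(population, births, deaths, net_migration):
    """Demographic balancing equation for one period (accounting form)."""
    return population + births - deaths + net_migration


# Hypothetical ASFRs for the nine five-year age groups 10-14, ..., 50-54.
asfr = [1, 40, 120, 130, 90, 45, 12, 2, 0]
print(total_fertility_rate(asfr))              # 2.2

# Hypothetical counts in thousands for a single year.
print(balance_population(1000, 25, 10, -5))    # 1010
```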

    Global, regional, and national age-sex-specific mortality and life expectancy, 1950–2017: a systematic analysis for the Global Burden of Disease Study 2017

    Background Assessments of age-specific mortality and life expectancy have been done by the UN Population Division, Department of Economics and Social Affairs (UNPOP), the United States Census Bureau, WHO, and as part of previous iterations of the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD). Previous iterations of the GBD used population estimates from UNPOP, which were not derived in a way that was internally consistent with the estimates of the numbers of deaths in the GBD. The present iteration of the GBD, GBD 2017, improves on previous assessments and provides timely estimates of the mortality experience of populations globally. Methods The GBD uses all available data to produce estimates of mortality rates between 1950 and 2017 for 23 age groups, both sexes, and 918 locations, including 195 countries and territories and subnational locations for 16 countries. Data used include vital registration systems, sample registration systems, household surveys (complete birth histories, summary birth histories, sibling histories), censuses (summary birth histories, household deaths), and Demographic Surveillance Sites. In total, this analysis used 8259 data sources. Estimates of the probability of death between birth and the age of 5 years and between ages 15 and 60 years are generated and then input into a model life table system to produce complete life tables for all locations and years. Fatal discontinuities and mortality due to HIV/AIDS are analysed separately and then incorporated into the estimation. We analyse the relationship between age-specific mortality and development status using the Socio-demographic Index, a composite measure based on fertility under the age of 25 years, education, and income. 
There are four main methodological improvements in GBD 2017 compared with GBD 2016: 622 additional data sources have been incorporated; new estimates of population, generated by the GBD study, are used; statistical methods used in different components of the analysis have been further standardised and improved; and the analysis has been extended backwards in time by two decades to start in 1950
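The summary measures mentioned in this abstract, the probability of death between birth and age 5 and between ages 15 and 60, follow from chaining survival over consecutive age intervals. The sketch below shows only that arithmetic; it is not the GBD model life table system, and the interval probabilities are hypothetical.

```python
def probability_of_death(interval_probs):
    """Probability of dying somewhere within a run of consecutive age intervals.

    interval_probs holds, for each interval, the probability of dying in it
    conditional on being alive at its start. Survival probabilities multiply
    across intervals, so the overall probability of death is 1 minus their product.
    """
    survival = 1.0
    for q in interval_probs:
        if not 0.0 <= q <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        survival *= 1.0 - q
    return 1.0 - survival


# Hypothetical conditional probabilities for the nine five-year intervals
# between ages 15 and 60.
q_15_to_60 = [0.005, 0.007, 0.008, 0.010, 0.013, 0.018, 0.026, 0.038, 0.055]
print(probability_of_death(q_15_to_60))
```

Summing the interval probabilities would overstate the result, because only people who survive one interval are exposed to the next.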