24 research outputs found
M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection
Large language models (LLMs) have demonstrated remarkable capability to
generate fluent responses to a wide variety of user queries, but this has also
resulted in concerns regarding the potential misuse of such texts in
journalistic, educational, and academic contexts. In this work, we aim to develop
automatic systems to identify machine-generated text and to detect potential
misuse. We first introduce a large-scale benchmark, M4, which is a
multi-generator, multi-domain, and multi-lingual corpus for machine-generated
text detection. Using the dataset, we experiment with a number of methods and
we show that it is challenging for detectors to generalize well on unseen
examples if they are either from different domains or are generated by
different large language models. In such cases, detectors tend to misclassify
machine-generated text as human-written. These results show that the problem is
far from solved and there is a lot of room for improvement. We believe that our
dataset M4, which covers different generators, domains and languages, will
enable future research towards more robust approaches for this pressing
societal problem. The M4 dataset is available at
https://github.com/mbzuai-nlp/M4.
Comment: 11 pages
On smart gaze based annotation of histopathology images for training of deep convolutional neural networks
The unavailability of large training datasets is a bottleneck that must be overcome to realize the true potential of deep learning in histopathology applications. Although slide digitization via whole slide imaging scanners has increased the speed of data acquisition, labeling of virtual slides requires a substantial time investment from pathologists. Eye gaze annotations have the potential to speed up the slide labeling process. This work explores the viability of eye gaze labeling and compares its timing with that of conventional manual labeling for training object detectors. Challenges associated with gaze-based labeling and methods to refine the coarse data annotations for subsequent object detection are also discussed. Results demonstrate that gaze-tracking-based labeling can save valuable pathologist time and delivers good performance when employed for training a deep object detector. Using the task of localization of Keratin Pearls in cases of oral squamous cell carcinoma as a test case, we compare the performance gap between deep object detectors trained using hand-labelled and gaze-labelled data. On average, compared to 'Bounding-box' based hand-labeling, gaze-labeling required 57.6% less time per label, and compared to 'Freehand' labeling, gaze-labeling required on average 85% less time per label.
Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models
We introduce Jais and Jais-chat, new state-of-the-art Arabic-centric
foundation and instruction-tuned open generative large language models (LLMs).
The models are based on the GPT-3 decoder-only architecture and are pretrained
on a mixture of Arabic and English texts, including source code in various
programming languages. With 13 billion parameters, they demonstrate better
knowledge and reasoning capabilities in Arabic than any existing open Arabic
and multilingual models by a sizable margin, based on extensive evaluation.
Moreover, the models are competitive in English compared to English-centric
open models of similar size, despite being trained on much less English data.
We provide a detailed description of the training, the tuning, the safety
alignment, and the evaluation of the models. We release two open versions of
the model -- the foundation Jais model, and an instruction-tuned Jais-chat
variant -- with the aim of promoting research on Arabic LLMs. Available at
https://huggingface.co/inception-mbzuai/jais-13b-chat
Comment: Arabic-centric, foundation model, large-language model, LLM,
generative model, instruction-tuned, Jais, Jais-chat
Effects of hospital facilities on patient outcomes after cancer surgery: an international, prospective, observational study
Background: Early death after cancer surgery is higher in low-income and middle-income countries (LMICs) compared with in high-income countries, yet the impact of facility characteristics on early postoperative outcomes is unknown. The aim of this study was to examine the association between hospital infrastructure, resource availability, and processes on early outcomes after cancer surgery worldwide. Methods: A multimethods analysis was performed as part of the GlobalSurg 3 study, a multicentre, international, prospective cohort study of patients who had surgery for breast, colorectal, or gastric cancer. The primary outcomes were 30-day mortality and 30-day major complication rates. Potentially beneficial hospital facilities were identified by variable selection to select those associated with 30-day mortality. Adjusted outcomes were determined using generalised estimating equations to account for patient characteristics and country-income group, with population stratification by hospital. Findings: Between April 1, 2018, and April 23, 2019, facility-level data were collected for 9685 patients across 238 hospitals in 66 countries (91 hospitals in 20 high-income countries; 57 hospitals in 19 upper-middle-income countries; and 90 hospitals in 27 low-income to lower-middle-income countries). The availability of five hospital facilities was inversely associated with mortality: ultrasound, CT scanner, critical care unit, opioid analgesia, and oncologist. After adjustment for case-mix and country income group, hospitals with three or fewer of these facilities (62 hospitals, 1294 patients) had higher mortality compared with those with four or five (adjusted odds ratio [OR] 3.85 [95% CI 2.58-5.75]; p<0.0001), with excess mortality predominantly explained by a limited capacity to rescue following the development of major complications (63.0% vs 82.7%; OR 0.35 [0.23-0.53]; p<0.0001).
Across LMICs, improvements in hospital facilities would prevent one to three deaths for every 100 patients undergoing surgery for cancer. Interpretation: Hospitals with higher levels of infrastructure and resources have better outcomes after cancer surgery, independent of country income. Without urgent strengthening of hospital infrastructure and resources, the reductions in cancer-associated mortality associated with improved access will not be realised.
Mapping age- and sex-specific HIV prevalence in adults in sub-Saharan Africa, 2000-2018
BACKGROUND: Human immunodeficiency virus and acquired immune deficiency syndrome (HIV/AIDS) is still among the leading causes of disease burden and mortality in sub-Saharan Africa (SSA), and the world is not on track to meet targets set for ending the epidemic by the Joint United Nations Programme on HIV/AIDS (UNAIDS) and the United Nations Sustainable Development Goals (SDGs). Precise HIV burden information is critical for effective geographic and epidemiological targeting of prevention and treatment interventions. Age- and sex-specific HIV prevalence estimates are widely available at the national level, and region-wide local estimates were recently published for adults overall. We add further dimensionality to previous analyses by estimating HIV prevalence at local scales, stratified into sex-specific 5-year age groups for adults ages 15-59 years across SSA. METHODS: We analyzed data from 91 seroprevalence surveys and sentinel surveillance among antenatal care clinic (ANC) attendees using model-based geostatistical methods to produce estimates of HIV prevalence across 43 countries in SSA, from years 2000 to 2018, at a 5 × 5-km resolution and presented among second administrative level (typically districts or counties) units. RESULTS: We found substantial variation in HIV prevalence across localities, ages, and sexes that has been masked in earlier analyses. Within-country variation in prevalence in 2018 was a median of 3.5 times greater across ages and sexes than for all adults combined. We note large within-district prevalence differences between age groups: for men, 50% of districts displayed at least a 14-fold difference between age groups with the highest and lowest prevalence, and at least a 9-fold difference for women. Prevalence trends also varied over time; between 2000 and 2018, 70% of all districts saw a reduction in prevalence greater than five percentage points in at least one sex and age group.
Meanwhile, over 30% of all districts saw at least a five percentage point prevalence increase in one or more sex and age groups. CONCLUSIONS: As the HIV epidemic persists and evolves in SSA, geographic and demographic shifts in prevention and treatment efforts are necessary. These estimates offer epidemiologically informative detail to better guide more targeted interventions, vital for combating HIV in SSA.
Reducing the environmental impact of surgery on a global scale: systematic review and co-prioritization with healthcare workers in 132 countries
Abstract
Background
Healthcare cannot achieve net-zero carbon without addressing operating theatres. The aim of this study was to prioritize feasible interventions to reduce the environmental impact of operating theatres.
Methods
This study adopted a four-phase Delphi consensus co-prioritization methodology. In phase 1, a systematic review of published interventions and global consultation of perioperative healthcare professionals were used to longlist interventions. In phase 2, iterative thematic analysis consolidated comparable interventions into a shortlist. In phase 3, the shortlist was co-prioritized based on patient and clinician views on acceptability, feasibility, and safety. In phase 4, ranked lists of interventions were presented by their relevance to high-income countries and low–middle-income countries.
Results
In phase 1, 43 interventions were identified, which had low uptake in practice according to 3042 professionals globally. In phase 2, a shortlist of 15 intervention domains was generated. In phase 3, interventions were deemed acceptable for more than 90 per cent of patients except for reducing general anaesthesia (84 per cent) and re-sterilization of ‘single-use’ consumables (86 per cent). In phase 4, the top three shortlisted interventions for high-income countries were: introducing recycling; reducing use of anaesthetic gases; and appropriate clinical waste processing. In phase 4, the top three shortlisted interventions for low–middle-income countries were: introducing reusable surgical devices; reducing use of consumables; and reducing the use of general anaesthesia.
Conclusion
This is a step toward environmentally sustainable operating environments with actionable interventions applicable to both high-income and low–middle-income countries.
Global burden of 288 causes of death and life expectancy decomposition in 204 countries and territories and 811 subnational locations, 1990–2021: a systematic analysis for the Global Burden of Disease Study 2021
BACKGROUND Regular, detailed reporting on population health by underlying cause of death is fundamental for public health decision making. Cause-specific estimates of mortality and the subsequent effects on life expectancy worldwide are valuable metrics to gauge progress in reducing mortality rates. These estimates are particularly important following large-scale mortality spikes, such as the COVID-19 pandemic. When systematically analysed, mortality rates and life expectancy allow comparisons of the consequences of causes of death globally and over time, providing a nuanced understanding of the effect of these causes on global populations. METHODS The Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2021 cause-of-death analysis estimated mortality and years of life lost (YLLs) from 288 causes of death by age-sex-location-year in 204 countries and territories and 811 subnational locations for each year from 1990 until 2021. The analysis used 56 604 data sources, including data from vital registration and verbal autopsy as well as surveys, censuses, surveillance systems, and cancer registries, among others. As with previous GBD rounds, cause-specific death rates for most causes were estimated using the Cause of Death Ensemble model (a modelling tool developed for GBD to assess the out-of-sample predictive validity of different statistical models and covariate permutations and combine those results to produce cause-specific mortality estimates), with alternative strategies adapted to model causes with insufficient data, substantial changes in reporting over the study period, or unusual epidemiology. YLLs were computed as the product of the number of deaths for each cause-age-sex-location-year and the standard life expectancy at each age. As part of the modelling process, uncertainty intervals (UIs) were generated using the 2·5th and 97·5th percentiles from a 1000-draw distribution for each metric.
We decomposed life expectancy by cause of death, location, and year to show cause-specific effects on life expectancy from 1990 to 2021. We also used the coefficient of variation and the fraction of population affected by 90% of deaths to highlight concentrations of mortality. Findings are reported in counts and age-standardised rates. Methodological improvements for cause-of-death estimates in GBD 2021 include the expansion of under-5-years age group to include four new age groups, enhanced methods to account for stochastic variation of sparse data, and the inclusion of COVID-19 and other pandemic-related mortality (which includes excess mortality associated with the pandemic, excluding COVID-19, lower respiratory infections, measles, malaria, and pertussis). For this analysis, 199 new country-years of vital registration cause-of-death data, 5 country-years of surveillance data, 21 country-years of verbal autopsy data, and 94 country-years of other data types were added to those used in previous GBD rounds. FINDINGS The leading causes of age-standardised deaths globally were the same in 2019 as they were in 1990; in descending order, these were ischaemic heart disease, stroke, chronic obstructive pulmonary disease, and lower respiratory infections. In 2021, however, COVID-19 replaced stroke as the second-leading age-standardised cause of death, with 94·0 deaths (95% UI 89·2-100·0) per 100 000 population. The COVID-19 pandemic shifted the rankings of the leading five causes, lowering stroke to the third-leading and chronic obstructive pulmonary disease to the fourth-leading position. In 2021, the highest age-standardised death rates from COVID-19 occurred in sub-Saharan Africa (271·0 deaths [250·1-290·7] per 100 000 population) and Latin America and the Caribbean (195·4 deaths [182·1-211·4] per 100 000 population).
The lowest age-standardised death rates from COVID-19 were in the high-income super-region (48·1 deaths [47·4-48·8] per 100 000 population) and southeast Asia, east Asia, and Oceania (23·2 deaths [16·3-37·2] per 100 000 population). Globally, life expectancy steadily improved between 1990 and 2019 for 18 of the 22 investigated causes. Decomposition of global and regional life expectancy showed the positive effect that reductions in deaths from enteric infections, lower respiratory infections, stroke, and neonatal deaths, among others have contributed to improved survival over the study period. However, a net reduction of 1·6 years occurred in global life expectancy between 2019 and 2021, primarily due to increased death rates from COVID-19 and other pandemic-related mortality. Life expectancy was highly variable between super-regions over the study period, with southeast Asia, east Asia, and Oceania gaining 8·3 years (6·7-9·9) overall, while having the smallest reduction in life expectancy due to COVID-19 (0·4 years). The largest reduction in life expectancy due to COVID-19 occurred in Latin America and the Caribbean (3·6 years). Additionally, 53 of the 288 causes of death were highly concentrated in locations with less than 50% of the global population as of 2021, and these causes of death became progressively more concentrated since 1990, when only 44 causes showed this pattern. The concentration phenomenon is discussed heuristically with respect to enteric and lower respiratory infections, malaria, HIV/AIDS, neonatal disorders, tuberculosis, and measles. INTERPRETATION Long-standing gains in life expectancy and reductions in many of the leading causes of death have been disrupted by the COVID-19 pandemic, the adverse effects of which were spread unevenly among populations. Despite the pandemic, there has been continued progress in combatting several notable causes of death, leading to improved global life expectancy over the study period. 
Each of the seven GBD super-regions showed an overall improvement between 1990 and 2021, obscuring the negative effect in the years of the pandemic. Additionally, our findings regarding regional variation in causes of death driving increases in life expectancy hold clear policy utility. Analyses of shifting mortality trends reveal that several causes, once widespread globally, are now increasingly concentrated geographically. These changes in mortality concentration, alongside further investigation of changing risks, interventions, and relevant policy, present an important opportunity to deepen our understanding of mortality-reduction strategies. Examining patterns in mortality concentration might reveal areas where successful public health interventions have been implemented. Translating these successes to locations where certain causes of death remain entrenched can inform policies that work to improve life expectancy for people everywhere. FUNDING Bill & Melinda Gates Foundation.
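The two computations described in the methods above (YLLs as deaths multiplied by standard life expectancy, and uncertainty intervals taken from the 2.5th and 97.5th percentiles of 1000 draws) can be illustrated with a minimal sketch. This is not the GBD pipeline: the function names, the linear-interpolation percentile rule, the simulated draws, and the life expectancy value of 30.0 years are all assumptions made for illustration only.

```python
import random


def years_of_life_lost(deaths, standard_life_expectancy):
    """YLLs for one cause-age-sex-location-year cell:
    number of deaths times standard life expectancy at that age."""
    return deaths * standard_life_expectancy


def uncertainty_interval(draws):
    """Return the (2.5th, 97.5th) percentiles of a list of draws.
    Percentiles use linear interpolation (an assumed convention)."""
    ordered = sorted(draws)

    def percentile(p):
        pos = (len(ordered) - 1) * p / 100
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

    return percentile(2.5), percentile(97.5)


# Hypothetical input: 1000 simulated draws of deaths in one cell.
rng = random.Random(0)
death_draws = [rng.gauss(1000, 50) for _ in range(1000)]

# Hypothetical standard life expectancy at the age of death (years).
yll_draws = [years_of_life_lost(d, 30.0) for d in death_draws]
lower, upper = uncertainty_interval(yll_draws)
print(f"95% UI for YLLs: {lower:.0f} to {upper:.0f}")
```

Propagating every draw through the YLL formula before taking percentiles keeps the interval consistent with the draw-level uncertainty in deaths.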
Mortality of emergency abdominal surgery in high-, middle- and low-income countries
Background: Surgical mortality data are collected routinely in high-income countries, yet virtually no low- or middle-income countries have outcome surveillance in place. The aim was prospectively to collect worldwide mortality data following emergency abdominal surgery, comparing findings across countries with a low, middle or high Human Development Index (HDI).
Methods: This was a prospective, multicentre, cohort study. Self-selected hospitals performing emergency surgery submitted prespecified data for consecutive patients from at least one 2-week interval during July to December 2014. Postoperative mortality was analysed by hierarchical multivariable logistic regression.
Results: Data were obtained for 10 745 patients from 357 centres in 58 countries; 6538 were from high-, 2889 from middle- and 1318 from low-HDI settings. The overall mortality rate was 1⋅6 per cent at 24 h (high 1⋅1 per cent, middle 1⋅9 per cent, low 3⋅4 per cent; P < 0⋅001), increasing to 5⋅4 per cent by 30 days (high 4⋅5 per cent, middle 6⋅0 per cent, low 8⋅6 per cent; P < 0⋅001). Of the 578 patients who died, 404 (69⋅9 per cent) did so between 24 h and 30 days following surgery (high 74⋅2 per cent, middle 68⋅8 per cent, low 60⋅5 per cent). After adjustment, 30-day mortality remained higher in middle-income (odds ratio (OR) 2⋅78, 95 per cent c.i. 1⋅84 to 4⋅20) and low-income (OR 2⋅97, 1⋅84 to 4⋅81) countries. Surgical safety checklist use was less frequent in low- and middle-income countries, but when used was associated with reduced mortality at 30 days.
Conclusion: Mortality is three times higher in low- compared with high-HDI countries even when adjusted for prognostic factors. Patient safety factors may have an important role. Registration number: NCT02179112 (http://www.clinicaltrials.gov)
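The headline percentages in the results above can be checked from the reported counts. The sketch below is only an arithmetic cross-check; it assumes that all 578 reported deaths occurred within 30 days, so that deaths within 24 h equal 578 minus 404. The variable and function names are illustrative.

```python
# Patient counts by Human Development Index (HDI) group, as reported.
patients = {"high": 6538, "middle": 2889, "low": 1318}
total_patients = sum(patients.values())

deaths_30d = 578          # all reported deaths (assumed to be within 30 days)
deaths_24h_to_30d = 404   # deaths between 24 h and 30 days, as reported
# Assumption: the remaining deaths occurred within the first 24 h.
deaths_24h = deaths_30d - deaths_24h_to_30d


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(total_patients)                      # total cohort size
print(pct(deaths_24h, total_patients))     # 24-h mortality, per cent
print(pct(deaths_30d, total_patients))     # 30-day mortality, per cent
print(pct(deaths_24h_to_30d, deaths_30d))  # share of deaths after 24 h
```

Under that assumption the computed values agree with the abstract: 10 745 patients, 1.6 per cent mortality at 24 h, 5.4 per cent by 30 days, and 69.9 per cent of deaths occurring after the first 24 h.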
M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection
The advent of Large Language Models (LLMs) has brought an unprecedented surge in machine-generated text (MGT) across diverse channels. This raises legitimate concerns about its potential misuse and societal implications. The need to identify and differentiate such content from genuine human-generated text is critical in combating disinformation, preserving the integrity of education and scientific fields, and maintaining trust in communication. In this work, we address this problem by introducing M4GT-Bench, a new benchmark based on a multilingual, multi-domain and multi-generator corpus of MGTs. The benchmark comprises three tasks: (1) mono-lingual and multi-lingual binary MGT detection; (2) multi-way detection, where one needs to identify which particular model generated the text; and (3) mixed human-machine text detection, where a word boundary delimiting MGT from human-written content should be determined. On the developed benchmark, we have tested several MGT detection baselines and also conducted an evaluation of human performance. We see that obtaining good performance in MGT detection usually requires access to training data from the same domain and generators. The benchmark is available at https://github.com/mbzuai-nlp/M4GT-Bench.