7 research outputs found

    Tokenizer Choice For LLM Training: Negligible or Crucial?

    The recent success of LLMs has been predominantly driven by curating the training dataset composition, scaling model architectures and dataset sizes, and advancing pretraining objectives, leaving tokenizer influence as a blind spot. Shedding light on this underexplored area, we conduct a comprehensive study on the influence of tokenizer choice on LLM downstream performance by training 24 mono- and multilingual LLMs at a 2.6B parameter scale, ablating different tokenizer algorithms and parameterizations. Our studies highlight that the tokenizer choice can significantly impact the model's downstream performance as well as its training and inference costs. In particular, we find that the common tokenizer evaluation metrics fertility and parity are not always predictive of model downstream performance, rendering these metrics a questionable proxy for it. Furthermore, we show that multilingual tokenizers trained on the five most frequent European languages require vocabulary size increases of a factor of three in comparison to English. While English-only tokenizers have been applied to the training of multilingual LLMs, we find that this approach results in severe downstream performance degradation and additional training costs of up to 68%, due to an inefficient tokenization vocabulary.
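The abstract above names fertility and parity without defining them. As a rough illustration only, the sketch below uses definitions that are common in the tokenizer literature and are assumed here rather than taken from the abstract: fertility as the average number of tokens per whitespace-separated word, and parity as the ratio of token counts on parallel texts in two languages. The `toy_tokenizer` function is a hypothetical stand-in, not a tokenizer used in the study.

```python
from typing import Callable, List

Tokenizer = Callable[[str], List[str]]


def toy_tokenizer(text: str, chunk: int = 4) -> List[str]:
    """Hypothetical stand-in: split each whitespace word into fixed-size character chunks."""
    tokens: List[str] = []
    for word in text.split():
        tokens.extend(word[i:i + chunk] for i in range(0, len(word), chunk))
    return tokens


def fertility(tokenize: Tokenizer, texts: List[str]) -> float:
    """Average number of tokens per whitespace-separated word (assumed definition)."""
    n_tokens = sum(len(tokenize(t)) for t in texts)
    n_words = sum(len(t.split()) for t in texts)
    return n_tokens / n_words


def parity(tokenize: Tokenizer, source: List[str], target: List[str]) -> float:
    """Token-count ratio on parallel texts (assumed definition); 1.0 means equal length."""
    n_source = sum(len(tokenize(t)) for t in source)
    n_target = sum(len(tokenize(t)) for t in target)
    return n_target / n_source


english = ["the cat sat on the mat"]
german = ["die Katze saß auf der Matte"]  # rough parallel sentence, for illustration

print("fertility (en):", fertility(toy_tokenizer, english))
print("fertility (de):", fertility(toy_tokenizer, german))
print("parity (de/en):", parity(toy_tokenizer, english, german))
```

With a real subword tokenizer in place of the toy function, a fertility near 1 and a parity near 1 would suggest compact and evenly balanced tokenization across languages, although the abstract cautions that neither metric reliably predicts downstream performance.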

    The global, regional, and national burden of oesophageal cancer and its attributable risk factors in 195 countries and territories, 1990-2017: A systematic analysis for the global burden of disease study 2017

    © 2020 The Author(s).
    Background: Oesophageal cancer is a common and often fatal cancer that has two main histological subtypes: oesophageal squamous cell carcinoma and oesophageal adenocarcinoma. Updated statistics on the incidence and mortality of oesophageal cancer, and on the disability-adjusted life-years (DALYs) caused by the disease, can assist policy makers in allocating resources for prevention, treatment, and care of oesophageal cancer. We report the latest estimates of these statistics for 195 countries and territories between 1990 and 2017, by age, sex, and Socio-demographic Index (SDI), using data from the Global Burden of Diseases, Injuries, and Risk Factors Study 2017 (GBD).
    Methods: We used data from vital registration systems, vital registration-samples, verbal autopsy records, and cancer registries, combined with relevant modelling, to estimate the mortality, incidence, and burden of oesophageal cancer from 1990 to 2017. Mortality-to-incidence ratios (MIRs) were estimated and fed into a Cause of Death Ensemble model (CODEm) including risk factors. MIRs were used for mortality and non-fatal modelling. Estimates of DALYs attributable to the main risk factors of oesophageal cancer available in GBD were also calculated. The proportion of oesophageal squamous cell carcinoma to all oesophageal cancers was extracted by use of publicly available data, and its variation was examined against SDI, the Healthcare Access and Quality (HAQ) Index, and available risk factors in GBD that are specific for oesophageal squamous cell carcinoma (eg, unimproved water source and indoor air pollution) and for oesophageal adenocarcinoma (gastro-oesophageal reflux disease).
    Findings: There were 473 000 (95% uncertainty interval [95% UI] 459 000-485 000) new cases of oesophageal cancer and 436 000 (425 000-448 000) deaths due to oesophageal cancer in 2017. Age-standardised incidence was 5.9 (5.7-6.1) per 100 000 population and age-standardised mortality was 5.5 (5.3-5.6) per 100 000. Oesophageal cancer caused 9.78 million (9.53-10.03) DALYs, with an age-standardised rate of 120 (117-123) per 100 000 population. Between 1990 and 2017, age-standardised incidence decreased by 22.0% (18.6-25.2), mortality decreased by 29.0% (25.8-32.0), and DALYs decreased by 33.4% (30.4-36.1) globally. However, as a result of population growth and ageing, the total number of new cases increased by 52.3% (45.9-58.9), from 310 000 (300 000-322 000) to 473 000 (459 000-485 000); the number of deaths increased by 40.0% (34.1-46.3), from 311 000 (301 000-323 000) to 436 000 (425 000-448 000); and total DALYs increased by 27.4% (22.1-33.1), from 7.68 million (7.42-7.97) to 9.78 million (9.53-10.03). At the national level, China had the highest number of incident cases (235 000 [223 000-246 000]), deaths (213 000 [203 000-223 000]), and DALYs (4.46 million [4.25-4.69]) in 2017. The highest national-level age-standardised incidence rates in 2017 were observed in Malawi (23.0 [19.4-26.5] per 100 000 population) and Mongolia (18.5 [16.4-20.8] per 100 000). In 2017, age-standardised incidence was 2.7 times higher, mortality 2.9 times higher, and DALYs 3.0 times higher in males than in females. In 2017, a substantial proportion of oesophageal cancer DALYs were attributable to known risk factors: tobacco smoking (39.0% [35.5-42.2]), alcohol consumption (33.8% [27.3-39.9]), high BMI (19.5% [6.3-36.0]), a diet low in fruits (19.1% [4.2-34.6]), and use of chewing tobacco (7.5% [5.2-9.6]). Countries with a low SDI and HAQ Index and high levels of indoor air pollution had a higher proportion of oesophageal squamous cell carcinoma to all oesophageal cancer cases than did countries with a high SDI and HAQ Index and with low levels of indoor air pollution.
    Interpretation: Despite reductions in age-standardised incidence and mortality rates, oesophageal cancer remains a major cause of cancer mortality and burden across the world. Oesophageal cancer is a highly fatal disease, requiring increased primary prevention efforts and, possibly, screening in some high-risk areas. Substantial variation exists in age-standardised incidence rates across regions and countries, for reasons that are unclear.

    Understanding Burnout in Indian Housewives Amidst COVID-19 Pandemic

    The COVID-19 pandemic has overwhelmed the world, and people everywhere were affected. The focus during this period was mostly on patients and frontline workers, with some attention also given to working adults. One cohort that has received little attention during this pandemic is housewives. Housewives had to manage household chores along with family relations, especially in India, where society expects women to care for family members and manage the household. Dealing with uncertainty, decreased availability of personal space, increased presence of and interaction with people in the household due to work-from-home arrangements, shifting to the online world and adapting to the change, economic disturbances, absence of domestic help, managing parental responsibility, increased stress about one's own and family members' health, and lack of social interaction have all added to their difficulties. Existing evidence indicates that housewives have been experiencing burnout in their homes. This qualitative study was conducted to see how the added pressure of COVID-19 and social isolation has affected housewives mentally, leading to burnout. This narrative study includes participants of Indian origin between the ages of 34 and 50 years. Participants were shortlisted on the basis of their scores on the COVID-19 Burnout Scale, designed by Murat Yıldırım and Fatma Solmaz. The themes generated through this research relate to understanding the impact of burnout on the mental health of housewives across the areas of physical health, financial well-being, digitization, uncertainty regarding COVID-19, parental responsibilities, social and emotional health, relationship management, and coping mechanisms. The findings of this study suggest that the mental health of housewives has significantly worsened during the COVID-19 pandemic due to constant exposure to certain stressors.

    Covid-19: A pandemic here to stay!

    Since December 2019, SARS-CoV-2 has spread to more than 200 countries and has become a global pandemic. As of 1 November 2020, there had been more than 49 million confirmed cases of Covid-19, with over 1.2 million case fatalities worldwide. The current review paper gives an update on the epidemiology, investigation modalities, and treatment options, including the various current treatment protocols, vaccines in development, and experimental drugs in research.

    The global, regional, and national burden of oesophageal cancer and its attributable risk factors in 195 countries and territories, 1990-2017 : a systematic analysis for the Global Burden of Disease Study 2017
