65 research outputs found

    What's In My Big Data?

    Large text corpora are the backbone of language models. However, we have a limited understanding of the content of these corpora, including general statistics, quality, social factors, and inclusion of evaluation data (contamination). In this work, we propose What's In My Big Data? (WIMBD), a platform and a set of sixteen analyses that allow us to reveal and compare the contents of large text corpora. WIMBD builds on two basic capabilities -- count and search -- at scale, which allows us to analyze more than 35 terabytes on a standard compute node. We apply WIMBD to ten different corpora used to train popular language models, including C4, The Pile, and RedPajama. Our analysis uncovers several surprising and previously undocumented findings about these corpora, including the high prevalence of duplicate, synthetic, and low-quality content, personally identifiable information, toxic language, and benchmark contamination. For instance, we find that about 50% of the documents in RedPajama and LAION-2B-en are duplicates. In addition, several datasets used for benchmarking models trained on such corpora are contaminated with respect to important benchmarks, including the Winograd Schema Challenge and parts of GLUE and SuperGLUE. We open-source WIMBD's code and artifacts to provide a standard set of evaluations for new text-based corpora and to encourage more analyses and transparency around them: github.com/allenai/wimbd

    Findings of the WMT'22 Shared Task on Large-Scale Machine Translation Evaluation for African Languages

    We present the results of the WMT'22 Shared Task on Large-Scale Machine Translation Evaluation for African Languages. The shared task included both a data and a systems track, along with additional innovations, such as a focus on African languages and extensive human evaluation of submitted systems. We received 14 system submissions from 8 teams, as well as 6 data track contributions. We report substantial progress in translation quality for African languages since the last iteration of this shared task: there is an increase of about 7.5 BLEU points across 72 language pairs, and the average BLEU score went from 15.09 to 22.60.

    Mentoring at the University of Pennsylvania: Results of a Faculty Survey

    BACKGROUND: Research suggests mentoring is related to career satisfaction and success. Most studies have focused on junior faculty. OBJECTIVE: To explore multiple aspects of mentoring at an academic medical center in relation to faculty rank, track, and gender. DESIGN: Cross-sectional mail survey in mid-2003. PARTICIPANTS: 1,432 faculty members at the University of Pennsylvania School of Medicine. MEASUREMENTS: Self-administered survey developed from existing instruments and stakeholders. RESULTS: Response rate was 73% (n = 1,046). Most (92%) assistant and half (48%) of associate professors had a mentor. Assistant professors in the tenure track were most likely to have a mentor (98%). At both ranks, faculty were given more types of advice than types of opportunities. Satisfaction with mentoring was correlated with the number of types of mentoring received (r = .48 and .53, P < .0001), job satisfaction (r = .44 and .31, P < .0001), meeting frequency (r = .53 and .61, P < .0001), and expectation of leaving the University within 5 years (Spearman r = −.19 and −.18, P < .0001), at the assistant and associate rank, respectively. Significant predictors of higher overall job satisfaction were associate rank [Odds ratio (OR) = 2.04, CI = 1.29–3.21], the 10-point mentoring satisfaction rating (OR = 1.27, CI = 1.17–1.35), and number of mentors (OR = 1.60, CI = 1.20–2.07). CONCLUSIONS: Having a mentor, or preferably multiple mentors, is strongly related to satisfaction with mentoring and overall job satisfaction. Surprisingly, few differences were related to gender. Mentoring of clinician–educators, research track faculty, and senior faculty, and the use of multiple mentors, require specific attention from academic leadership and further study.

    T cell-inflamed gene expression profile and PD-L1 expression and pembrolizumab efficacy in advanced esophageal cancer

    Aim: Investigate the relationship between response to pembrolizumab and expression of the 18-gene T cell-inflamed gene expression profile (TcellinfGEP) or PD-L1 combined positive score (CPS) in esophageal cancer. Materials & methods: This analysis included heavily pretreated patients with advanced/metastatic esophageal/gastroesophageal junction adenocarcinoma or squamous cell carcinoma who received pembrolizumab in the single-arm, phase II study KEYNOTE-180. PD-L1 CPS was evaluated with PD-L1 IHC 22C3 pharmDx. Results: In patients with squamous cell carcinoma, trends toward enrichment for responders were observed for patients with PD-L1 CPS ≥10 tumors. In patients with adenocarcinoma, a trend was observed for TcellinfGEP but not for PD-L1. Conclusion: TcellinfGEP and PD-L1 CPS may enrich for responders to pembrolizumab in patients with esophageal cancer. Clinical Trial Registration: NCT02559687 (ClinicalTrials.gov).

    Proximal major limb amputations – a retrospective analysis of 45 oncological cases

    Background: Proximal major limb amputations due to malignant tumors have become rare but are still a valuable treatment option in palliation and in some cases can even be curative. The aim of this retrospective study was to analyse outcome in those patients, including the postoperative course, survival, pain, quality of life, and prosthesis usage. Methods: Data on 45 consecutive patients were acquired from patients' charts and through contact with patients and general practitioners. Patients with interscapulothoracic amputation (n = 14), shoulder disarticulation (n = 13), hemipelvectomy (n = 3) or hip disarticulation (n = 15) were included. Results: The rate of proximal major limb amputations in patients treated for sarcoma was 2.3% (37 out of 1597). Survival for all patients was 42.9% after one year and 12.7% after five years. Survival was significantly better in patients with complete tumor resections. Postoperative chemotherapy and radiation did not prolong survival. Eighteen percent of the patients with malignant disease developed local recurrence. In 44%, postoperative complications were observed. Different modalities of postoperative pain management and the site of the amputation had no significant influence on long-term pain assessment and quality of life. Eighty-seven percent suffered from phantom pain, and 15.6% considered their quality of life worse than before the operation. Thirty-two percent of the patients who received a prosthesis used it regularly. Conclusion: Proximal major limb amputations severely interfere with patients' body function and are the last, albeit valuable, option within the treatment concept of extremity malignancies or severe infections. Despite short survival, high complication rates, and postoperative pain, patients' quality of life can be improved for the time they have remaining.

    Effect of Recombinant Growth Hormone Replacement in a Growth Hormone Deficient Subject Recovering from Mild Traumatic Brain Injury: A Case Report

    Objective: To assess the effects of growth hormone (GH) replacement in an individual who sustained mild traumatic brain injury (mTBI) as an adult and was found to have GH deficiency by glucagon stimulation testing. Participant: A 43-year-old woman who sustained a mild TBI at age 37 years. She was 6.8 years post-injury when she began supplementation. Intervention: Recombinant human GH (rhGH) administered subcutaneously daily for 1 year. Main outcome measures: Single fibre muscle function was evaluated from muscle biopsies. Body composition, muscle strength and peak aerobic capacity were also measured. In addition, neuropsychological tests of memory, processing speed and motor dexterity and speed, as well as a self-report depression inventory, were administered. All assessments were performed at baseline and after 6 and 12 months of rhGH replacement therapy. Results: Single muscle fibre changes were greatest at 6 months. Body composition showed continuous improvement. Muscle strength improved for knee extension. Peak oxygen consumption increased at 6 months, and total work and ventilatory equivalents continued to improve at 12 months. Significant improvements in neuropsychological test performance were not found, with the exception of performance on a test of motor dexterity and speed. Conclusion: rhGH replacement in a subject with GH deficiency after mild TBI improves muscle force production, body composition and aerobic capacity. Reliable improvements on tests of cognition were not found in this subject.