19 research outputs found

    Detectable clonal mosaicism and its relationship to aging and cancer

    In an analysis of 31,717 cancer cases and 26,136 cancer-free controls from 13 genome-wide association studies, we observed large chromosomal abnormalities in a subset of clones in DNA obtained from blood or buccal samples. We observed mosaic abnormalities, either aneuploidy or copy-neutral loss of heterozygosity, of >2 Mb in size in autosomes of 517 individuals (0.89%), with abnormal cell proportions of between 7% and 95%. In cancer-free individuals, frequency increased with age, from 0.23% under 50 years to 1.91% between 75 and 79 years (P = 4.8 × 10^-8). Mosaic abnormalities were more frequent in individuals with solid tumors (0.97% versus 0.74% in cancer-free individuals; odds ratio (OR) = 1.25; P = 0.016), with stronger association with cases who had DNA collected before diagnosis or treatment (OR = 1.45; P = 0.0005). Detectable mosaicism was also more common in individuals for whom DNA was collected at least 1 year before diagnosis with leukemia compared to cancer-free individuals (OR = 35.4; P = 3.8 × 10^-11). These findings underscore the time-dependent nature of somatic events in the etiology of cancer and potentially other late-onset diseases.
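    As a rough arithmetic check on the frequencies quoted above, the sketch below computes a crude (unadjusted) odds ratio from the two reported percentages. The function name is hypothetical, and the abstract does not state how its OR of 1.25 was estimated, so the crude figure need not match it.

```python
def crude_odds_ratio(p_group: float, p_reference: float) -> float:
    """Odds ratio from two proportions, with no covariate adjustment.

    Hypothetical helper for illustration; not the method used in the study.
    """
    odds_group = p_group / (1 - p_group)
    odds_reference = p_reference / (1 - p_reference)
    return odds_group / odds_reference


# Frequencies quoted in the abstract:
# 0.97% in individuals with solid tumors vs 0.74% in cancer-free individuals.
crude = crude_odds_ratio(0.0097, 0.0074)
print(f"crude OR = {crude:.2f}")
```

    Any gap between this crude value and a published estimate would typically come from covariate adjustment or pooling across studies, neither of which this sketch attempts.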

    Investigating model performance in language identification: beyond simple error statistics

    Language development experts need tools that can automatically identify languages from fluent, conversational speech and provide reliable estimates of usage rates at the level of an individual recording. However, language identification (LID) systems are typically evaluated on metrics such as equal error rate and balanced accuracy, applied at the level of an entire speech corpus. These overview metrics do not provide information about model performance at the level of individual speakers, recordings, or units of speech with different linguistic characteristics. Overview statistics may mask systematic errors in model performance for some subsets of the data; as a consequence, systems may perform worse on data derived from some subsets of human speakers, creating a kind of algorithmic bias. Here, we investigate how well a number of LID systems perform on individual recordings and speech units with different linguistic properties in the MERLIon CCS Challenge, which features accented, code-switched, child-directed speech.
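    The contrast between corpus-level and per-recording evaluation can be sketched as follows. This is a minimal illustration on invented toy data, not the evaluation code used in the paper; the recording IDs, labels, and function name are all hypothetical.

```python
from collections import defaultdict


def balanced_accuracy(pairs):
    """Mean per-class recall over the classes present in the references."""
    hits, totals = defaultdict(int), defaultdict(int)
    for ref, hyp in pairs:
        totals[ref] += 1
        hits[ref] += int(ref == hyp)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical toy data: (recording_id, reference_language, predicted_language)
segments = [
    ("rec_a", "en", "en"), ("rec_a", "en", "en"),
    ("rec_a", "zh", "zh"), ("rec_a", "zh", "zh"),
    ("rec_b", "en", "en"), ("rec_b", "en", "en"),
    ("rec_b", "zh", "en"), ("rec_b", "zh", "en"),
]

# Corpus-level score: pool every segment regardless of recording.
corpus_score = balanced_accuracy([(ref, hyp) for _, ref, hyp in segments])

# Per-recording scores: group segments by recording, then score each group.
by_recording = defaultdict(list)
for rec, ref, hyp in segments:
    by_recording[rec].append((ref, hyp))
per_recording = {rec: balanced_accuracy(p) for rec, p in by_recording.items()}

print(f"corpus: {corpus_score:.2f}")
for rec, score in sorted(per_recording.items()):
    print(f"{rec}: {score:.2f}")
```

    In this toy case the pooled score sits between a perfect recording and one where every Mandarin segment is mislabeled, which is the kind of masking the abstract describes.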

    MERLIon CCS Challenge: A English-Mandarin code-switching child-directed speech corpus for language identification and diarization

    To enhance the reliability and robustness of language identification (LID) and language diarization (LD) systems for heterogeneous populations and scenarios, speech processing models need to be trained on datasets that feature diverse language registers and speech patterns. We present the MERLIon CCS challenge, featuring a first-of-its-kind Zoom video call dataset of parent-child shared book reading, comprising over 30 hours of audio across more than 300 recordings, annotated by multilingual transcribers using a high-fidelity linguistic transcription protocol. The audio corpus features spontaneous, in-the-wild English-Mandarin code-switching and child-directed speech in non-standard accents, with diverse language-mixing patterns, recorded in a variety of home environments. This report describes the corpus, as well as LID and LD results for our baseline and several systems submitted to the MERLIon CCS challenge using the corpus.