    Understanding Task Design Trade-offs in Crowdsourced Paraphrase Collection

    Linguistically diverse datasets are critical for training and evaluating robust machine learning systems, but data collection is a costly process that often requires experts. Crowdsourcing the process of paraphrase generation is an effective means of expanding natural language datasets, but there has been limited analysis of the trade-offs that arise when designing tasks. In this paper, we present the first systematic study of the key factors in crowdsourcing paraphrase collection. We consider variations in instructions, incentives, data domains, and workflows. We manually analyzed paraphrases for correctness, grammaticality, and linguistic diversity. Our observations provide new insight into the trade-offs between accuracy and diversity in crowd responses that arise as a result of task design, providing guidance for future paraphrase generation procedures. Comment: Published at ACL 201

    Community Genetics screening in a pandemic: solutions for pre-test education, informed consent, and specimen collection

    A Community Genetics carrier screening program for the Jewish community has operated on-site in high schools in Sydney (Australia) for 25 years. During 2020, in response to the COVID-19 pandemic, government-mandated social-distancing, ‘lock-down’ public health orders, and laboratory supply-chain shortages prevented the usual operation and delivery of the annual testing program. We describe development of three responses to overcome these challenges: (1) pivoting to online education sufficient to ensure informed consent for both genetic and genomic testing; (2) development of contactless telehealth with remote training and supervision for collecting genetic samples using buccal swabs; and (3) a novel patient and specimen identification ‘GeneTrustee’ protocol enabling fully identified clinical-grade specimens to be collected and DNA extracted by a research laboratory while maintaining full participant confidentiality and privacy. These telehealth strategies for education, consent, specimen collection and sample processing enabled uninterrupted delivery and operation of complex genetic testing and screening programs even amid pandemic restrictions. These tools remain available for future operation and can be adapted to other programs.

    CODA: Accurate Detection of Functional Associations between Proteins in Eukaryotic Genomes Using Domain Fusion

    Background: In order to understand how biological systems function it is necessary to determine the interactions and associations between proteins. Gene fusion prediction is one approach to detection of such functional relationships. Its use is, however, known to be problematic in higher eukaryotic genomes due to the presence of large homologous domain families. Here we introduce CODA (Co-Occurrence of Domains Analysis), a method to predict functional associations based on the gene fusion idiom. Methodology/Principal Findings: We apply a novel scoring scheme which takes account of the genome-specific size of homologous domain families involved in fusion to improve accuracy in predicting functional associations. We show that CODA is able to accurately predict functional similarities in human in comparison with state-of-the-art methods and show that different methods can be complementary. CODA is used to produce evidence that a currently uncharacterised human protein may be involved in pathways related to depression and that another is involved in DNA replication. Conclusions/Significance: The relative performance of different gene fusion methodologies has not previously been explored. We find that they are largely complementary, with different methods being more or less appropriate in different genomes. Our method is the only one currently available for download and can be run on an arbitrary dataset by the user. The CODA software and datasets are freely available from ftp://ftp.biochem.ucl.ac.uk/pub/gene3d_data/v6.1.0/CODA/. Predictions are also available via web services from http://funcnet.eu/

    AGEMAP: A Gene Expression Database for Aging in Mice

    We present the AGEMAP (Atlas of Gene Expression in Mouse Aging Project) gene expression database, which is a resource that catalogs changes in gene expression as a function of age in mice. The AGEMAP database includes expression changes for 8,932 genes in 16 tissues as a function of age. We found great heterogeneity in the amount of transcriptional changes with age in different tissues. Some tissues displayed large transcriptional differences in old mice, suggesting that these tissues may contribute strongly to organismal decline. Other tissues showed few or no changes in expression with age, indicating strong levels of homeostasis throughout life. Based on the pattern of age-related transcriptional changes, we found that tissues could be classified into one of three aging processes: (1) a pattern common to neural tissues, (2) a pattern for vascular tissues, and (3) a pattern for steroid-responsive tissues. We observed that different tissues age in a coordinated fashion in individual mice, such that certain mice exhibit rapid aging, whereas others exhibit slow aging for multiple tissues. Finally, we compared the transcriptional profiles for aging in mice to those from humans, flies, and worms. We found that genes involved in the electron transport chain show common age regulation in all four species, indicating that these genes may be exceptionally good markers of aging. However, we saw no overall correlation of age regulation between mice and humans, suggesting that aging processes in mice and humans may be fundamentally different.

    Whole genome sequencing for the genetic diagnosis of heterogenous dystonia phenotypes

    Introduction: Dystonia is a clinically and genetically heterogeneous disorder and a genetic cause is often difficult to elucidate. This is the first study to use whole genome sequencing (WGS) to investigate dystonia in a large sample of affected individuals. Methods: WGS was performed on 111 probands with heterogenous dystonia phenotypes. We performed analysis for coding and non-coding variants, copy number variants (CNVs), and structural variants (SVs). We assessed for an association between dystonia and 10 known dystonia risk variants. Results: A genetic diagnosis was obtained for 11.7% (13/111) of individuals. We found that a genetic diagnosis was more likely in those with an earlier age at onset, younger age at testing, and a combined dystonia phenotype. We identified pathogenic/likely-pathogenic variants in ADCY5 (n = 1), ATM (n = 1), GNAL (n = 2), GLB1 (n = 1), KMT2B (n = 2), PRKN (n = 2), PRRT2 (n = 1), SGCE (n = 2), and THAP1 (n = 1). CNVs were detected in 3 individuals. We found an association between the known risk variant ARSG rs11655081 and dystonia (p = 0.003). Conclusion: A genetic diagnosis was found in 11.7% of individuals with dystonia. The diagnostic yield was higher in those with an earlier age of onset, younger age at testing, and a combined dystonia phenotype. WGS may be particularly relevant for dystonia given that it allows for the detection of CNVs, which accounted for 23% of the genetically diagnosed cases. © 2019 The Author