
    A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data

    Deep shotgun sequencing and analysis of genomes, transcriptomes, amplified single-cell genomes, and metagenomes has enabled investigation of a wide range of organisms and ecosystems. However, sampling variation in short-read data sets and high sequencing error rates of modern sequencers present many new computational challenges in data interpretation. These challenges have led to the development of new classes of mapping tools and de novo assemblers. These algorithms are challenged by the continued improvement in sequencing throughput. We here describe digital normalization, a single-pass computational algorithm that systematizes coverage in shotgun sequencing data sets, thereby decreasing sampling variation, discarding redundant data, and removing the majority of errors. Digital normalization substantially reduces the size of shotgun data sets and decreases the memory and time requirements for de novo sequence assembly, all without significantly impacting content of the generated contigs. We apply digital normalization to the assembly of microbial genomic data, amplified single-cell genomic data, and transcriptomic data. Our implementation is freely available for use and modification.
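
The single-pass idea in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that a read's coverage is estimated by the median count of its k-mers among the reads kept so far, and that a read is discarded once that estimate reaches a cutoff; the abstract states neither detail. The names `kmers`, `digital_normalize`, `k`, and `cutoff` are hypothetical, and an exact in-memory counter stands in for whatever memory-efficient counting structure a real tool would use.

```python
from collections import Counter
from statistics import median


def kmers(read, k):
    """Return all overlapping k-mers of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def digital_normalize(reads, k=4, cutoff=2):
    """Make one pass over reads, keeping a read only while its estimated
    coverage is still below cutoff.

    Assumption (not stated in the abstract): coverage is estimated as the
    median count of the read's k-mers among the reads kept so far.
    """
    counts = Counter()  # exact counts, for illustration only
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: no estimate possible, so skip it
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads contribute to counts
    return kept


if __name__ == "__main__":
    reads = ["ACGTTGCA"] * 5 + ["GGGGCCCC"]
    print(digital_normalize(reads, k=4, cutoff=2))
```

Because only retained reads update the counts, redundant copies of an already well-covered region are dropped while reads from low-coverage regions pass through, which is the coverage-flattening effect the abstract describes.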

    Representation Bias of Adolescents in AI: A Bilingual, Bicultural Study

    Popular and news media often portray teenagers with sensationalism, as both a risk to society and at risk from society. As AI begins to absorb some of the epistemic functions of traditional media, we study how teenagers in two countries speaking two languages: 1) are depicted by AI, and 2) how they would prefer to be depicted. Specifically, we study the biases about teenagers learned by static word embeddings (SWEs) and generative language models (GLMs), comparing these with the perspectives of adolescents living in the U.S. and Nepal. We find English-language SWEs associate teenagers with societal problems, and more than 50% of the 1,000 words most associated with teenagers in the pretrained GloVe SWE reflect such problems. Given prompts about teenagers, 30% of outputs from GPT2-XL and 29% from LLaMA-2-7B GLMs discuss societal problems, most commonly violence, but also drug use, mental illness, and sexual taboo. Nepali models, while not free of such associations, are less dominated by social problems. Data from workshops with N=13 U.S. adolescents and N=18 Nepalese adolescents show that AI presentations are disconnected from teenage life, which revolves around activities like school and friendship. Participant ratings of how well 20 trait words describe teens are decorrelated from SWE associations, with Pearson's r=.02, n.s. in English FastText and r=.06, n.s. in GloVe; and r=.06, n.s. in Nepali FastText and r=-.23, n.s. in GloVe. U.S. participants suggested AI could fairly present teens by highlighting diversity, while Nepalese participants centered positivity. Participants were optimistic that, if it learned from adolescents, rather than media sources, AI could help mitigate stereotypes. Our work offers an understanding of the ways SWEs and GLMs misrepresent a developmentally vulnerable group and provides a template for less sensationalized characterization. Comment: Accepted at Artificial Intelligence, Ethics, and Society 202

    ML-EAT: A Multilevel Embedding Association Test for Interpretable and Transparent Social Science

    This research introduces the Multilevel Embedding Association Test (ML-EAT), a method designed for interpretable and transparent measurement of intrinsic bias in language technologies. The ML-EAT addresses issues of ambiguity and difficulty in interpreting the traditional EAT measurement by quantifying bias at three levels of increasing granularity: the differential association between two target concepts with two attribute concepts; the individual effect size of each target concept with two attribute concepts; and the association between each individual target concept and each individual attribute concept. Using the ML-EAT, this research defines a taxonomy of EAT patterns describing the nine possible outcomes of an embedding association test, each of which is associated with a unique EAT-Map, a novel four-quadrant visualization for interpreting the ML-EAT. Empirical analysis of static and diachronic word embeddings, GPT-2 language models, and a CLIP language-and-image model shows that EAT patterns add otherwise unobservable information about the component biases that make up an EAT; reveal the effects of prompting in zero-shot models; and can also identify situations when cosine similarity is an ineffective metric, rendering an EAT unreliable. Our work contributes a method for rendering bias more observable and interpretable, improving the transparency of computational investigations into human minds and societies. Accepted at Artificial Intelligence, Ethics, and Society 202
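
The three levels of granularity named in the abstract above can be illustrated with a small sketch. This is not the paper's method: the abstract gives no formulas, so the code assumes cosine similarity as the association measure and reports plain (unstandardized) mean differences at every level rather than the effect sizes the paper computes. The function names and the dictionary layout are hypothetical.

```python
import math
from statistics import mean


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def set_association(targets, attributes):
    """Mean cosine similarity over all target-attribute pairs."""
    return mean(cosine(t, a) for t in targets for a in attributes)


def three_level_association(X, Y, A, B):
    """Decompose an association test into three nested levels.

    Assumption: unstandardized mean differences stand in for the paper's
    statistics, which the abstract does not specify.
    """
    # Level 3: each target concept with each attribute concept.
    level3 = {
        ("X", "A"): set_association(X, A),
        ("X", "B"): set_association(X, B),
        ("Y", "A"): set_association(Y, A),
        ("Y", "B"): set_association(Y, B),
    }
    # Level 2: each target concept's preference for A over B.
    level2 = {
        "X": level3[("X", "A")] - level3[("X", "B")],
        "Y": level3[("Y", "A")] - level3[("Y", "B")],
    }
    # Level 1: differential association between the two target concepts.
    level1 = level2["X"] - level2["Y"]
    return {"level1": level1, "level2": level2, "level3": level3}


if __name__ == "__main__":
    # Toy 2-D embeddings: X aligns with A, Y aligns with B.
    X, Y = [(1.0, 0.0)], [(0.0, 1.0)]
    A, B = [(1.0, 0.0)], [(0.0, 1.0)]
    print(three_level_association(X, Y, A, B))
```

Keeping all three levels makes it visible whether a large level-1 difference comes from one target concept, from both, or from a single target-attribute pairing, which is the kind of component information the abstract says a single aggregate score hides.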

    Dataset Scale and Societal Consistency Mediate Facial Impression Bias in Vision-Language AI

    Multimodal AI models capable of associating images and text hold promise for numerous domains, ranging from automated image captioning to accessibility applications for blind and low-vision users. However, uncertainty about bias has in some cases limited their adoption and availability. In the present work, we study 43 CLIP vision-language models to determine whether they learn human-like facial impression biases, and we find evidence that such biases are reflected across three distinct CLIP model families. We show for the first time that the degree to which a bias is shared across a society predicts the degree to which it is reflected in a CLIP model. Human-like impressions of visually unobservable attributes, like trustworthiness and sexuality, emerge only in models trained on the largest dataset, indicating that a better fit to uncurated cultural data results in the reproduction of increasingly subtle social biases. Moreover, we use a hierarchical clustering approach to show that dataset size predicts the extent to which the underlying structure of facial impression bias resembles that of facial impression bias in humans. Finally, we show that Stable Diffusion models employing CLIP as a text encoder learn facial impression biases, and that these biases intersect with racial biases in Stable Diffusion XL-Turbo. While pretrained CLIP models may prove useful for scientific studies of bias, they will also require significant dataset curation when intended for use as general-purpose models in a zero-shot setting. Accepted at Artificial Intelligence, Ethics, and Society 202

    Sensory Communication

    Contains table of contents for Section 2, an introduction and reports on fourteen research projects. National Institutes of Health Grant RO1 DC00117; National Institutes of Health Grant RO1 DC02032; National Institutes of Health/National Institute on Deafness and Other Communication Disorders Grant R01 DC00126; National Institutes of Health Grant R01 DC00270; National Institutes of Health Contract N01 DC52107; U.S. Navy - Office of Naval Research/Naval Air Warfare Center Contract N61339-95-K-0014; U.S. Navy - Office of Naval Research/Naval Air Warfare Center Contract N61339-96-K-0003; U.S. Navy - Office of Naval Research Grant N00014-96-1-0379; U.S. Air Force - Office of Scientific Research Grant F49620-95-1-0176; U.S. Air Force - Office of Scientific Research Grant F49620-96-1-0202; U.S. Navy - Office of Naval Research Subcontract 40167; U.S. Navy - Office of Naval Research/Naval Air Warfare Center Contract N61339-96-K-0002; National Institutes of Health Grant R01-NS33778; U.S. Navy - Office of Naval Research Grant N00014-92-J-184

    Meta-analysis of SHANK Mutations in Autism Spectrum Disorders: A Gradient of Severity in Cognitive Impairments.

    SHANK genes code for scaffold proteins located at the post-synaptic density of glutamatergic synapses. In neurons, SHANK2 and SHANK3 have a positive effect on the induction and maturation of dendritic spines, whereas SHANK1 induces the enlargement of spine heads. Mutations in SHANK genes have been associated with autism spectrum disorders (ASD), but their prevalence and clinical relevance remain to be determined. Here, we performed a new screen and a meta-analysis of SHANK copy-number and coding-sequence variants in ASD. Copy-number variants were analyzed in 5,657 patients and 19,163 controls, coding-sequence variants were ascertained in 760 to 2,147 patients and 492 to 1,090 controls (depending on the gene), and individuals carrying de novo or truncating SHANK mutations underwent an extensive clinical investigation. Copy-number variants and truncating mutations in SHANK genes were present in ∼1% of patients with ASD: mutations in SHANK1 were rare (0.04%) and present in males with normal IQ and autism; mutations in SHANK2 were present in 0.17% of patients with ASD and mild intellectual disability; mutations in SHANK3 were present in 0.69% of patients with ASD and up to 2.12% of the cases with moderate to profound intellectual disability. In summary, mutations of the SHANK genes were detected in the whole spectrum of autism with a gradient of severity in cognitive impairment. Given the rare frequency of SHANK1 and SHANK2 deleterious mutations, the clinical relevance of these genes remains to be ascertained. In contrast, the frequency and the penetrance of SHANK3 mutations in individuals with ASD and intellectual disability (more than 1 in 50) warrant its consideration for mutation screening in clinical practice.

    Ripe to be Heard: Worker Voice in the Fair Food Programme

    The Fair Food Program (FFP) provides a mechanism through which agricultural workers’ collective voice is expressed, heard and responded to within global value chains. The FFP's model of worker-driven social responsibility presents an alternative to traditional corporate social responsibility. This article identifies the FFP's key components and demonstrates its resilience by identifying the ways in which the issues faced by a new group of migrant workers – recruited through a “guest-worker” scheme – were incorporated and dealt with. This case study highlights the important potential presented by the programme to address labour abuses across transnationalized labour markets while considering early replication possibilities.

    Morphological Diversity between Culture Strains of a Chlorarachniophyte, Lotharella globosa

    Chlorarachniophytes are marine unicellular algae that possess secondary plastids of green algal origin. Although chlorarachniophytes are a small group (the phylum of Chlorarachniophyta contains 14 species in 8 genera), they have variable and complex life cycles that include amoeboid, coccoid, and/or flagellate cells. The majority of chlorarachniophytes possess two or more cell types in their life cycles, and which cell types are found is one of the principal morphological criteria used for species descriptions. Here we describe an unidentified chlorarachniophyte, isolated from an artificial coral reef, that calls this criterion into question. The life cycle of the new strain includes all three major cell types, but DNA barcoding based on the established nucleomorph ITS sequences showed it to share 100% sequence identity with Lotharella globosa. The type strain of L. globosa was also isolated from a coral reef, but is defined as completely lacking an amoeboid stage throughout its life cycle. We conclude that L. globosa possesses morphological diversity between culture strains, and that the new strain is a variety of L. globosa, which we describe as Lotharella globosa var. fortis var. nov. to include the amoeboid stage in the formal description of L. globosa. This intraspecies variation suggests that gross morphological stages may be lost rather rapidly, and specifically that the type strain of L. globosa has lost the ability to form the amoeboid stage, perhaps recently. This in turn suggests that even major morphological characters used for taxonomy of this group may be variable in natural populations, and therefore misleading.

    A New Chicken Genome Assembly Provides Insight into Avian Genome Structure

    The importance of Gallus gallus (chicken) as a model organism and agricultural animal merits a continuation of sequence assembly improvement efforts. We present a new version of the chicken genome assembly (Gallus_gallus-5.0; GCA_000002315.3), built from combined long single molecule sequencing technology, finished BACs, and improved physical maps. In overall assembled bases, we see a gain of 183 Mb, including 16.4 Mb in placed chromosomes with a corresponding gain in the percentage of intact repeat elements characterized. Of the 1.21 Gb genome, we include three previously missing autosomes, GGA30, 31, and 33, and improve sequence contig length 10-fold over the previous Gallus_gallus-4.0. Despite the significant base representation improvements made, 138 Mb of sequence is not yet located to chromosomes. When annotated for gene content, Gallus_gallus-5.0 shows an increase of 4679 annotated genes (2768 noncoding and 1911 protein-coding) over those in Gallus_gallus-4.0. We also revisited the question of what genes are missing in the avian lineage, as assessed by the highest quality avian genome assembly to date, and found that a large fraction of the original set of missing genes are still absent in sequenced bird species. Finally, our new data support a detailed map of MHC-B, encompassing two segments: one with a highly stable gene copy number and another in which the gene copy number is highly variable. The chicken model has been a critical resource for many other fields of study, and this new reference assembly will substantially further these efforts.