    Exposing Bias in Online Communities through Large-Scale Language Models

    Progress in natural language generation research has been shaped by the ever-growing size of language models. While large language models pre-trained on web data can generate human-sounding text, they also reproduce social biases and contribute to the propagation of harmful stereotypes. This work exploits this flaw of language models to explore the biases of six different online communities. To gain insight into the communities' viewpoints, we fine-tune GPT-Neo 1.3B on six social media datasets. The bias of the resulting models is evaluated by prompting them with different demographics and comparing the sentiment and toxicity of the generated text. Together, these methods reveal that bias differs in type and intensity across the models. This work not only affirms how easily bias is absorbed from training data but also presents a scalable method to identify and compare the bias of different datasets or communities. Additionally, the examples generated for this work demonstrate the limitations of using automated sentiment and toxicity classifiers in bias research.
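
    The comparison step described above (sentiment and toxicity of generations grouped by demographic prompt) can be illustrated with a minimal sketch. It assumes the scores have already been produced by some external sentiment and toxicity classifiers. The record layout and the function names `summarize_by_demographic` and `max_gap` are hypothetical, and this is not the authors' code.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_demographic(records):
    """Average sentiment and toxicity per demographic prompt group.

    `records` is a hypothetical list of (demographic, sentiment, toxicity)
    tuples; the scores are assumed to come from external classifiers.
    """
    grouped = defaultdict(lambda: {"sentiment": [], "toxicity": []})
    for demographic, sentiment, toxicity in records:
        grouped[demographic]["sentiment"].append(sentiment)
        grouped[demographic]["toxicity"].append(toxicity)
    return {
        demo: {metric: mean(values) for metric, values in scores.items()}
        for demo, scores in grouped.items()
    }


def max_gap(summary, metric):
    """Spread between the highest- and lowest-scoring group for one metric."""
    values = [scores[metric] for scores in summary.values()]
    return max(values) - min(values)


# Illustrative numbers only, not results from the paper.
records = [
    ("group_a", 0.5, 0.25),
    ("group_a", 0.25, 0.75),
    ("group_b", -0.5, 1.0),
]
summary = summarize_by_demographic(records)
print(max_gap(summary, "sentiment"))  # 0.875
```

    A larger gap for a given fine-tuned model would suggest that its generations treat the demographic groups more unevenly on that metric. This is subject to the classifier limitations the abstract mentions.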

    Ruled surfaces with isotropic generators

    Variant detection and runs of homozygosity in next generation sequencing data elucidate the genetic background of Lundehund syndrome

    Runs of homozygosity (ROH) for Lundehund-specific regions in 500-SNP windows. The chromosomal position of each ROH region, the number of SNPs in the region (n), its size in base pairs (size_bp), the canine genes (gene) and their human orthologues (human gene) are shown. (XLSX, 88 kB)
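
    The table above reports ROH found in 500-SNP windows, but the detection software and criteria are not given here. What follows is therefore only a simplified sliding-window sketch of the general idea, not the authors' pipeline. Genotypes are assumed to be coded 0/1/2 (alternate-allele count, so 1 means heterozygous). A window counts as homozygous when it contains at most `max_het` heterozygous calls, and overlapping homozygous windows are merged into runs. The functions `homozygous_window_starts` and `runs_of_homozygosity` and their parameters are hypothetical.

```python
def homozygous_window_starts(genotypes, window=500, max_het=0):
    """Start indices of windows of `window` consecutive SNPs that contain
    at most `max_het` heterozygous calls (genotype code 1, assumed)."""
    het = [1 if g == 1 else 0 for g in genotypes]
    starts = []
    count = sum(het[:window])  # heterozygous calls in the first window
    for start in range(len(genotypes) - window + 1):
        if start > 0:
            # Slide the window by one SNP: add the new SNP, drop the old one.
            count += het[start + window - 1] - het[start - 1]
        if count <= max_het:
            starts.append(start)
    return starts


def runs_of_homozygosity(genotypes, window=500, max_het=0):
    """Merge overlapping homozygous windows into (first_snp, last_snp) runs."""
    covered = [False] * len(genotypes)
    for start in homozygous_window_starts(genotypes, window, max_het):
        for i in range(start, start + window):
            covered[i] = True
    runs, run_start = [], None
    # A trailing False sentinel closes a run that reaches the last SNP.
    for i, flag in enumerate(covered + [False]):
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            runs.append((run_start, i - 1))
            run_start = None
    return runs


# Toy example with a 3-SNP window instead of 500.
genotypes = [0, 2, 0, 1, 2, 2, 0, 2]
print(runs_of_homozygosity(genotypes, window=3))  # [(0, 2), (4, 7)]
```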

    A dual isotopic approach using radioactive phosphorus and the isotopic composition of oxygen associated to phosphorus to understand plant reaction to a change in P nutrition

    Background: Changing phosphorus (P) nutrition leads to changes in plant metabolism. The aim of this study was to investigate how these changes are reflected in the distribution of ³³P and in the isotopic composition of oxygen associated with P (δ¹⁸O_P) in different plant parts of soybean (Glycine max cv. Toliman). Two P pools were extracted sequentially with 0.3 M trichloroacetic acid (TCA P) and 10 M nitric acid (HNO₃; residual P).
    Results: The δ¹⁸O_P of TCA P in the old leaves of the −P plants (23.8‰) was significantly lower than in the +P plants (27.4‰). The ³³P data point to an enhanced mobilisation of P from residual P in the old leaves of the −P plants compared with the +P plants.
    Conclusions: Omitting P for 10 days led to a translocation of P from source to sink organs in soybean. This was accompanied by a significant lowering of the δ¹⁸O_P of TCA P in the source organs due to the enzymatic hydrolysis of organic P. Combining ³³P and δ¹⁸O_P can provide useful insights into plant responses to P omission at an early stage.
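
    For reference, δ values express an isotope ratio relative to a reference standard, in per mil (‰). The standard definition appears below. Oxygen isotope values of phosphate are conventionally reported relative to VSMOW, although the abstract itself does not name the standard. The second line works out the treatment difference in old leaves directly from the reported values, also in ‰.

```latex
\delta^{18}\mathrm{O}_\mathrm{P}
  = \left( \frac{R_\mathrm{sample}}{R_\mathrm{VSMOW}} - 1 \right) \times 1000,
\qquad R = \frac{^{18}\mathrm{O}}{^{16}\mathrm{O}}

\Delta = \delta^{18}\mathrm{O}_\mathrm{P}(+\mathrm{P}) - \delta^{18}\mathrm{O}_\mathrm{P}(-\mathrm{P})
       = 27.4 - 23.8 = 3.6
```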

    Reddit financial image post sentiment dataset

    The dataset presented in this paper consists of sentiment information extracted from the image and text data of posts in financial subreddits. Members of these subreddits post about their trading behavior, express their opinions, and discuss capital market trends. Their posts contain sentiment information on financial topics as well as signals about trading decisions. Members frequently post screenshots of their portfolios from their mobile broker apps. We collected the posts, processed them to extract sentiment scores using various methods, and anonymized them. The dataset therefore contains no content from the posts and no information about their authors, only the sentiment information extracted from each post. In addition, the financial tickers mentioned in the posts are tracked, so that the sentiment in the posts can be attributed to specific financial products and used for financial forecasting. The posts were collected using the Reddit [2] and Pushshift APIs [3] and processed on an Amazon Web Services architecture. A fine-tuned MobileNets artificial neural network [4] was used to classify the images into four categories, which had been determined in a preliminary analysis: classical memes, number posts (e.g. screenshots of mobile broker portfolios), text posts (e.g. screenshots from Twitter) and chart posts (e.g. other financial screenshots, such as charts). The images were sorted into these categories because they differ so fundamentally that each category required its own extraction method. OCR methods [5] were used to extract text from the images, and custom methods were then applied to extract sentiment and other information from the resulting text. The data [1] are available on a 20-minute basis and can be used in many areas, such as financial forecasting and analyzing sentiment dynamics in social media posts.
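
    Because the data are provided on a 20-minute basis and linked to tickers, a minimal sketch can show how per-post sentiment could be bucketed into 20-minute intervals per ticker. It assumes a hypothetical (timestamp, ticker, sentiment) record layout with naive UTC datetimes. This is neither the dataset's actual schema nor the authors' AWS pipeline, and the names `floor_to_bucket` and `sentiment_by_interval` are illustrative.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

BUCKET = timedelta(minutes=20)
EPOCH = datetime(1970, 1, 1)


def floor_to_bucket(ts, bucket=BUCKET):
    """Round a naive datetime down to the start of its interval."""
    return EPOCH + ((ts - EPOCH) // bucket) * bucket


def sentiment_by_interval(posts, bucket=BUCKET):
    """Mean sentiment per (ticker, interval start).

    `posts` is a hypothetical iterable of (timestamp, ticker, sentiment)
    tuples; the published dataset's real schema may differ.
    """
    grouped = defaultdict(list)
    for ts, ticker, sentiment in posts:
        grouped[(ticker, floor_to_bucket(ts, bucket))].append(sentiment)
    return {key: mean(values) for key, values in grouped.items()}


# Illustrative posts for a made-up ticker "XYZ", not taken from the dataset.
posts = [
    (datetime(2021, 1, 28, 14, 5), "XYZ", 0.5),
    (datetime(2021, 1, 28, 14, 19), "XYZ", 0.25),
    (datetime(2021, 1, 28, 14, 20), "XYZ", -0.5),
]
for (ticker, start), score in sorted(sentiment_by_interval(posts).items()):
    print(ticker, start.strftime("%H:%M"), score)
```

    Flooring against a fixed epoch keeps interval boundaries aligned at :00, :20 and :40 regardless of when the first post arrives. That alignment makes the series easier to join with price data sampled on the same grid.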