Using machine learning to predict pathogenicity of genomic variants throughout the human genome
More than 6,000 diseases are estimated to be caused by genomic variants. This can happen in many ways: a variant may stop the translation of a protein, interfere with gene regulation, or shift splicing of the transcribed mRNA toward an unwanted isoform. All of these processes must be investigated to determine which variant may be causal for the deleterious phenotype. Variant effect scores are a great help in this regard. Implemented as machine learning classifiers, they integrate annotations from different resources to rank genomic variants by pathogenicity.
Developing a variant effect score requires multiple steps: annotation of the training data, feature selection, model training, benchmarking, and finally deployment of the model. Here, I present a generalized workflow for this process. It makes it simple to configure how information is converted into model features, enabling rapid exploration of different annotations. The workflow further implements hyperparameter optimization, model validation, and ultimately deployment of a selected model via genome-wide scoring of genomic variants.
The workflow is applied to train Combined Annotation Dependent Depletion (CADD), a variant effect model that scores SNVs and InDels genome-wide. I show that the workflow can be quickly adapted to novel annotations by porting CADD to the genome reference GRCh38. Further, I demonstrate the integration of deep neural network scores as features into a new CADD model, improving the annotation of RNA splicing events. Finally, I apply the workflow to train multiple variant effect models from training data based on variants selected by allele frequency.
In conclusion, the developed workflow presents a flexible and scalable method to train variant effect scores. All software and developed scores are freely available from cadd.gs.washington.edu and cadd.bihealth.org.
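The workflow steps named in this abstract (annotated training data, model training, hyperparameter optimization, validation, and scoring of new variants) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the two features (`conservation`, `splice_score`), the synthetic data generator, the use of logistic regression as a stand-in classifier, and the L2 grid are hypothetical and are not taken from the CADD software.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(X, y, l2, epochs=200, lr=0.5):
    """Fit L2-penalised logistic regression by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def score(model, x):
    """Return a score in [0, 1]; higher means more likely deleterious."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def accuracy(model, X, y):
    hits = sum((score(model, xi) >= 0.5) == (yi == 1) for xi, yi in zip(X, y))
    return hits / len(y)

def make_toy_variants(n, seed):
    """Hypothetical annotated variants: [conservation, splice_score] plus a 0/1 label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        conservation = rng.gauss(1.0 if label else -1.0, 1.0)
        splice_score = rng.gauss(0.5 if label else -0.5, 1.0)
        X.append([conservation, splice_score])
        y.append(label)
    return X, y

def select_model(X_tr, y_tr, X_val, y_val, l2_grid):
    """Hyperparameter search: keep the model with the best validation accuracy."""
    best = None
    for l2 in l2_grid:
        model = train_logreg(X_tr, y_tr, l2)
        acc = accuracy(model, X_val, y_val)
        if best is None or acc > best[0]:
            best = (acc, l2, model)
    return best

X_train, y_train = make_toy_variants(300, seed=0)
X_valid, y_valid = make_toy_variants(150, seed=1)
best_acc, best_l2, best_model = select_model(
    X_train, y_train, X_valid, y_valid, l2_grid=[0.0, 0.01, 0.1]
)

# "Deployment": score previously unseen (hypothetical) variants.
new_variant_scores = [score(best_model, x) for x in ([2.0, 1.0], [-2.0, -1.0])]
```

In a real setting the scoring step would run over every possible variant in the genome rather than two hand-written feature vectors, and the held-out validation set would be replaced by curated benchmark data.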
2023-2024 Undergraduate Catalog
2023-2024 undergraduate catalog for Morehead State University
Expectations and expertise in artificial intelligence: specialist views and historical perspectives on conceptualisation, promise, and funding
Artificial intelligence’s (AI) distinctiveness as a technoscientific field that imitates the ability to think went through a resurgence of interest post-2010, attracting a flood of scientific and popular expectations as to its utopian or dystopian transformative consequences. This thesis offers observations about the formation and dynamics of expectations based on documentary material from the previous periods of perceived AI hype (1960-1975 and 1980-1990, including in-between periods of perceived dormancy), and 25 interviews with UK-based AI specialists, directly involved with its development, who commented on the issues during the crucial period of uncertainty (2017-2019) and intense negotiation through which AI gained momentum prior to its regulation and relatively stabilised new rounds of long-term investment (2020-2021). This examination applies and contributes to longitudinal studies in the sociology of expectations (SoE) and studies of experience and expertise (SEE) frameworks, proposing a historical sociology of expertise and expectations framework. The research questions, focusing on the interplay between hype mobilisation and governance, are: (1) What is the relationship between AI practical development and the broader expectational environment, in terms of funding and conceptualisation of AI? (2) To what extent does informal and non-developer assessment of expectations influence formal articulations of foresight? (3) What can historical examinations of AI’s conceptual and promissory settings tell about the current rebranding of AI?
The following contributions are made: (1) I extend SEE by paying greater attention to the interplay between technoscientific experts and wider collective arenas of discourse amongst non-specialists and showing how AI’s contemporary research cultures are overwhelmingly influenced by the hype environment but also contribute to it. This further highlights the interaction between competing rationales focusing on exploratory, curiosity-driven scientific research against exploitation-oriented strategies at formal and informal levels. (2) I suggest benefits of examining promissory environments in AI and related technoscientific fields longitudinally, treating contemporary expectations as historical products of sociotechnical trajectories through an authoritative historical reading of AI’s shifting conceptualisation and attached expectations as a response to availability of funding and broader national imaginaries. This comes with the benefit of better perceiving technological hype as migrating from social group to social group instead of fading through reductionist cycles of disillusionment; either by rebranding of technical operations, or by the investigation of a given field by non-technical practitioners. It also sensitises to critically examine broader social expectations as factors for shifts in perception about theoretical/basic science research transforming into applied technological fields. Finally, (3) I offer a model for understanding the significance of interplay between conceptualisations, promising, and motivations across groups within competing dynamics of collective and individual expectations and diverse sources of expertise
2023-2024 Boise State University Undergraduate Catalog
This catalog is primarily for and directed at students. However, it serves many audiences, such as high school counselors, academic advisors, and the public. In this catalog you will find an overview of Boise State University and information on admission, registration, grades, tuition and fees, financial aid, housing, student services, and other important policies and procedures. However, most of this catalog is devoted to describing the various programs and courses offered at Boise State.
Identifying long non-coding RNA in the chicken transcriptome
The transcriptome remains a vast, underexplored space in genomics. Unlike the genome, which is linear in nature, the use of alternative transcription start, end, and splicing sites in eukaryotes creates the possibility of a near-infinite variety of differentially expressed RNA. While many expressed messenger RNA have been identified through the proteins that they produce, there is still very little known about the world of long non-coding RNA (lncRNA).
Long non-coding RNA are a vast unknown space and represent one of the largest frontiers of transcriptomics. While little is known about this class of RNA as a whole, there have been specific lncRNA which have been found to be crucial components of biological development. Given the characteristics of lncRNA there may also be a sub-class that is involved in cell differentiation and speciation. In order to explore lncRNA and generate high throughput predictions of their functions, I used the chicken as a model and applied comparative genomics using newly assembled genomes from other avian species.
Long non-coding RNA present the almost perfect scenario for evading detection from previous RNA discovery methods. They have been shown to be poorly conserved across species, with generally low expression levels and no downstream product that is immediately identifiable. Given these factors, previous RNA detection methods such as expressed sequence tags and RNA sequencing cannot provide reliable evidence for the mass identification of lncRNA.
In the first chapter I explore the characteristics of Iso-Seq (Pacific Biosciences long read RNA sequencing technology) and methods for processing the data to improve long non-coding RNA identification. I also explore the use of non-traditional cDNA library preparation methods including cDNA normalization and 5’ cap selection. I found that the ability of long read RNA sequencing to provide full length transcript sequences allows for more robust methods of lncRNA prediction.
In the second chapter, I explore the data processing of long reads. I use a dataset generated by Pacific Biosciences using the Universal Human Reference RNA as an example of ideal long read data. By using data based on the human transcriptome, I was able to compare my results with information from one of the most well annotated and studied transcriptomes. I demonstrate the Transcriptome Annotation by Modular Algorithms (TAMA) software that I developed and how it can be used to explore the non-coding RNA within the transcriptome.
In the third chapter, I explore the transcriptome constructed from Iso-Seq data on different chicken tissue samples. I used the TAMA software along with other tools to make pipelines optimized for lncRNA discovery and to perform functional annotation. Using these methodologies, I identified over 300,000 putative transcript models corresponding to over 50,000 genes. Of these, over 100,000 transcript models appear to be lncRNA, corresponding to over 38,000 gene loci. The majority of these are predicted to be sense exonic and mono-exonic lncRNA. While further investigation is required to produce sufficient evidence that these RNA are not the result of transcriptional noise, I have identified a subset that appears to have functional importance given their co-expression with known genes.
I demonstrate that while lncRNA appear to be generally lowly expressed, they often express in a tissue-specific manner which suggests a possible role in tissue differentiation.
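The abstract does not say how tissue specificity was quantified. As a hedged illustration only, the sketch below uses the tau index, a widely used specificity measure that is 0 for uniform expression and 1 for expression in a single tissue. The function name `tau_index` and the example expression values are hypothetical.

```python
def tau_index(expression):
    """Tau tissue-specificity index over per-tissue expression levels.

    Returns 0.0 for perfectly uniform expression and 1.0 when the
    transcript is expressed in exactly one tissue.
    """
    if len(expression) < 2:
        raise ValueError("need expression values for at least two tissues")
    peak = max(expression)
    if peak <= 0:
        # Not expressed anywhere: specificity is undefined, report 0.0.
        return 0.0
    return sum(1 - value / peak for value in expression) / (len(expression) - 1)

# Hypothetical expression levels across four tissues.
single_tissue = tau_index([0.0, 0.0, 8.0, 0.0])
uniform = tau_index([5.0, 5.0, 5.0, 5.0])
graded = tau_index([10.0, 5.0, 0.0])
```

A lowly expressed transcript can still score close to 1 on this index, which matches the pattern described above: low overall expression combined with strong tissue specificity.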
From these investigations, I have found that there are potentially thousands of unannotated lncRNA within the chicken transcriptome with characteristics that require new technologies such as long read sequencing to identify.
These novel lncRNA include a subset which could have functional roles in the regulation of cell differentiation.
Evaluating ecosystem interventions for improved health outcomes - The case of the Volta Estuary mangroves and malaria
Degradative alteration of ecological systems worldwide is progressing at a time when their influence on human wellbeing is becoming more evident. For some ecosystems and aspects of wellbeing, more concrete knowledge exists. Insights into the science of mangrove-health relationships are however limited and fragmented, with no assessments of human perspectives around these phenomena. This study investigated the nature of the mangrove-human health nexus by assessing the impacts of mangrove ecosystem interventions on health-related ecosystem goods and services and self-reported malaria experiences. Using a mix of methods comprising a systematic literature review, key informant interviews, health questionnaires and Qualitative Comparative Analysis (QCA), this study merges three bodies of work. Research participant viewpoints were synthesised regarding the evolution of mangrove characteristics and use patterns over time, and how these are affected by ecosystem restoration. Survey respondents were also engaged in a recall exercise of malaria experiences over the same period, to provide a basis for causal inference analysis using QCA methodology. Results show that mangrove dependence is declining with ecosystem degradation in Ghana, but ecosystem restoration can modulate some negative health impacts of mangrove degradation, such as infectious disease risk and threats to protein nutrition. Further, specific ecological conditions elicited by ecosystem interventions work together diversely to decrease malaria incidence, but mainly to amplify benefits of current malaria vector control interventions. The causal relationships reveal that certain aspects of wetland restoration can be strengthened to deliver conditions that improve consequences of current malaria management strategies. 
Environment and health managers must collaborate in policy reorientation, monitoring, evaluation, and capacity building to realise more tangible scientific evidence and sustainable cross-sector outcomes. Ecosystem interventions could plug the shortfalls arising from resource constraints in health policy implementation, towards more uniform outcomes, especially in marginal communities.
Examining the Link between Personality Traits, Cognitive Performance, and Consecutive Interpreting
Interpreting is a highly complex activity that not only demands proficient linguistic expertise, but also non-linguistic abilities such as non-linguistic cognitive performance (Macnamara, 2012; Riesbeck et al., 1978; Wang, 2004). In addition to this, individual differences in personality may also play a potential role in the interpreter's ability to perform their job (Barrick & Mount, 1991; Rothmann & Coetzer, 2003). The current study sought to examine whether there is a relationship between personality traits, cognitive ability, and consecutive interpreting. The five-factor model of personality (Costa & McCrae, 1988) was used to examine the personality of participants with its five categories of personality type (Openness to Experience; Conscientiousness; Extraversion; Agreeableness; and Neuroticism), and five cognitive ability tasks (Working Memory; Attentional Control; Multi-tasking; Speed of Information Processing; and Psychological Endurance) were chosen to examine their potential relationship with interpreting ability.
To fulfill this goal, an empirical study was conducted, collecting data from 80 participants in total (40 with consecutive interpreting backgrounds in the experimental group and 40 without interpreting foundations as a control group). Data was collected using online questionnaires and a set of cognitive tasks. The three online questionnaires, the Big Five (Goldberg, 1992), Attentional Control Scale (Derryberry & Reed, 2002) and Psychological Endurance Scale (Hamby et al., 2015) were used to examine participants’ personality, Attentional Control and Psychological Endurance respectively, whilst the objective cognitive tasks were designed to measure participant Working Memory, Multi-tasking ability and Speed of Information Processing using the Listening Span Test (Liu et al., 2004), Digits Symbol Substitution Test (Kaufman & Lichtenberger, 2006; Wechsler, 1939) and Linguistic Dual Task (Stachowiak, 2015; Meyer & Kieras, 1997) respectively.
The main findings of the current results were: firstly, a significant difference was found in cognitive abilities between experimental and control group in the areas of Working Memory, Attentional Control, Multi-tasking and Psychological Endurance. Secondly, several personality traits correlated with scores on some cognitive abilities. For example, Openness to Experience positively correlated with Attentional Control and Psychological Endurance; Conscientiousness positively correlated with Working Memory, Attentional Control and Psychological Endurance; Extraversion positively correlated with Attentional Control and Psychological Endurance; whilst Neuroticism negatively correlated with Attentional Control and Psychological Endurance. Thirdly, several personality traits (Openness to Experience, Conscientiousness and Extraversion) appear to be significantly related more to the experimental group than the control group. Finally, mediation analysis appears to show that interpreting training has a mediating effect on the relationship between certain types of personality traits and cognitive abilities. In some cases, interpreting training and personality traits appear to exert an interacting effect and have a combining influence on some cognitive abilities. These findings can hopefully provide a foundation for future study and be applied in practice to help interpreting training projects and cognitive ability improvement.
Brain Computations and Connectivity [2nd edition]
This is an open access title available under the terms of a CC BY-NC-ND 4.0 International licence. It is free to read on the Oxford Academic platform and offered as a free PDF download from OUP and selected open access locations.
Brain Computations and Connectivity is about how the brain works. In order to understand this, it is essential to know what is computed by different brain systems; and how the computations are performed.
The aim of this book is to elucidate what is computed in different brain systems; and to describe current biologically plausible computational approaches and models of how each of these brain systems computes.
Understanding the brain in this way has enormous potential for understanding ourselves better in health and in disease. Potential applications of this understanding are to the treatment of the brain in disease; and to artificial intelligence which will benefit from knowledge of how the brain performs many of its extraordinarily impressive functions.
This book is pioneering in taking this approach to brain function: it considers what is computed by many of our brain systems, and how it is computed. With much new evidence, including on the connectivity of the human brain, it updates the earlier book: Rolls (2021) Brain Computations: What and How, Oxford University Press.
Brain Computations and Connectivity will be of interest to all scientists interested in brain function and how the brain works, whether they are from neuroscience, or from medical sciences including neurology and psychiatry, or from the area of computational science including machine learning and artificial intelligence, or from areas such as theoretical physics.
Higher-order interactions in single-cell gene expression: towards a cybergenetic semantics of cell state
Finding and understanding patterns in gene expression guides our understanding of living organisms, their development, and diseases, but is a challenging and high-dimensional problem as there are many molecules involved. One way to learn about the structure of a gene regulatory network is by studying the interdependencies among its constituents in transcriptomic data sets. These interdependencies could be arbitrarily complex, but almost all current models of gene regulation contain pairwise interactions only, despite experimental evidence existing for higher-order regulation that cannot be decomposed into pairwise mechanisms. I set out to capture these higher-order dependencies in single-cell RNA-seq data using two different approaches. First, I fitted maximum entropy (or Ising) models to expression data by training restricted Boltzmann machines (RBMs). On simulated data, RBMs faithfully reproduced both pairwise and third-order interactions. I then trained RBMs on 37 genes from a scRNA-seq data set of 70k astrocytes from an embryonic mouse. While pairwise and third-order interactions were revealed, the estimates contained a strong omitted variable bias, and there was no statistically sound and tractable way to quantify the uncertainty in the estimates. As a result I next adopted a model-free approach. Estimating model-free interactions (MFIs) in single-cell gene expression data required a quasi-causal graph of conditional dependencies among the genes, which I inferred with an MCMC graph-optimisation algorithm on an initial estimate found by the Peter-Clark algorithm. As the estimates are model-free, MFIs can be interpreted either as mechanistic relationships between the genes, or as substructures in the cell population. On simulated data, MFIs revealed synergy and higher-order mechanisms in various logical and causal dynamics more accurately than any correlation- or information-based quantities. 
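The claim that higher-order regulation cannot always be decomposed into pairwise mechanisms can be seen in a toy case. The sketch below is not the estimator used in the thesis; it assumes a hypothetical target gene `z` that is on only when exactly one of two independent regulators `x` and `y` is on (XOR logic), and compares pairwise correlations with the three-way moment.

```python
from itertools import product

# All four equally likely states of two independent binary regulators x and y.
# The hypothetical target z is on only when exactly one regulator is on (XOR).
states = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]

def spin(bit):
    """Map a binary expression state {0, 1} to {-1, +1}."""
    return 2 * bit - 1

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pair_corr(i, j):
    """Pairwise correlation between variables i and j over all states.

    Each spin has mean zero here, so the mean of the product is the covariance.
    """
    return mean(spin(s[i]) * spin(s[j]) for s in states)

pairwise = {(i, j): pair_corr(i, j) for i, j in ((0, 1), (0, 2), (1, 2))}

# Three-way moment: non-zero even though every pairwise correlation vanishes.
triple = mean(spin(x) * spin(y) * spin(z) for x, y, z in states)
```

Every entry of `pairwise` is zero while `triple` is not, so a model restricted to pairwise terms would see three unrelated genes, even though `z` is fully determined by `x` and `y` together.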
I then estimated MFIs among 1,000 genes, at up to seventh-order, in 20k neurons and 20k astrocytes from two different mouse brain scRNA-seq data sets: one developmental, and one adolescent. I found strong evidence for up to fifth-order interactions, and the MFIs mostly disambiguated direct from indirect regulation by preferentially coupling causally connected genes, whereas correlations persisted across causal chains. Validating the predicted interactions against the Pathway Commons database, gene ontology annotations, and semantic similarity, I found that pairwise MFIs contained different mechanistic information from correlation-based networks, but a similar amount of it. Furthermore, third-order interactions provided evidence of combinatorial regulation by transcription factors and immediate early genes.
I then switched focus from mechanism to population structure. Each significant MFI can be assigned a set of single cells that most influence its value. Hierarchical clustering of the MFIs by cell assignment revealed substructures in the cell population corresponding to diverse cell states. This offered a new, purely data-driven view on cell states because the inferred states are not required to localise in gene expression space. Across the four data sets, I found 69 significant and biologically interpretable cell states, where only 9 could be obtained by standard approaches. I identified immature neurons among developing astrocytes and radial glial cells, D1 and D2 medium spiny neurons, D1 MSN subtypes, and cell-cycle related states present across four data sets. I further found evidence for states defined by genes associated to neuropeptide signalling, neuronal activity, myelin metabolism, and genomic imprinting. MFIs thus provide a new, statistically sound method to detect substructure in single-cell gene expression data, identifying cell types, subtypes, or states that can be delocalised in gene expression space and whose hierarchical structure provides a new view on the semantics of cell state. The estimation of the quasi-causal graph, the MFIs, and inference of the associated states is implemented as a publicly available Nextflow pipeline called Stator