PRIMAL: Fast and Accurate Pedigree-based Imputation from Sequence Data in a Founder Population
Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL incorporates both existing and original ideas, such as a novel indexing strategy of Identity-By-Descent (IBD) segments based on clique graphs. We were able to impute the genomes of 1,317 South Dakota Hutterites, who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs), from 98 whole genome sequences. Using a combination of pedigree-based and LD-based imputation, we were able to assign 87% of genotypes with >99% accuracy over the full range of allele frequencies. Using the IBD cliques we were also able to infer the parental origin of 83% of alleles, and genotypes of deceased recent ancestors for whom no genotype information was available. This imputed data set will enable us to better study the relative contribution of rare and common variants on human phenotypes, as well as parental origin effect of disease risk alleles in >1,000 individuals at minimal cost.
Quantifying the Impact and Extent of Undocumented Biomedical Synonymy
Synonymous relationships among biomedical terms are extensively annotated within specialized terminologies, implying that synonymy is important for practical computational applications within this field. It remains unclear, however, whether text mining actually benefits from documented synonymy and whether existing biomedical thesauri provide adequate coverage of these linguistic relationships. In this study, we examine the impact and extent of undocumented synonymy within a very large compendium of biomedical thesauri. First, we demonstrate that missing synonymy has a significant negative impact on named entity normalization, an important problem within the field of biomedical text mining. To estimate the amount of synonymy currently missing from thesauri, we develop a probabilistic model for the construction of synonym terminologies that is capable of handling a wide range of potential biases, and we evaluate its performance using the broader domain of near-synonymy among general English words. Our model predicts that over 90% of these relationships are currently undocumented, a result that we support experimentally through "crowd-sourcing." Finally, we apply our model to biomedical terminologies and predict that they are missing the vast majority (>90%) of the synonymous relationships they intend to document. Overall, our results expose the dramatic incompleteness of current biomedical thesauri and suggest the need for "next-generation," high-coverage lexical terminologies.
Personality Dysfunction Manifest in Words: Understanding Personality Pathology Using Computational Language Analysis
Personality disorders (PDs) are some of the most prevalent and high-risk mental health conditions, and yet remain poorly understood. Today, the development of new technologies means that there are advanced tools that can be used to improve our understanding and treatment of PD. One promising tool, and indeed the focus of this thesis, is computational language analysis. By looking at patterns in how people with personality pathology use words, it is possible to gain access into their constellation of thinking, feelings, and behaviours. To date, however, there has been little research at the intersection of verbal behaviour and personality pathology. Accordingly, the central goal of this thesis is to demonstrate how PD can be better understood through the analysis of natural language. This thesis presents three research articles, comprising four empirical studies, that each leverage computational language analysis to better understand personality pathology. Each paper focuses on a distinct core feature of PD, while incorporating language analysis methods: Paper 1 (Study 1) focuses on interpersonal dysfunction; Paper 2 (Studies 2 and 3) focuses on emotion dysregulation; and Paper 3 (Study 4) focuses on behavioural dysregulation (i.e., engagement in suicidality and deliberate self-harm). Findings from this research have generated better understanding of fundamental features of PD, including insight into characterising dimensions of social dysfunction (Paper 1), maladaptive emotion processes that may contribute to emotion dysregulation (Paper 2), and psychosocial dynamics relating to suicidality and deliberate self-harm (Paper 3) in PD. Such theoretical knowledge subsequently has important implications for clinical practice, particularly regarding the potential to inform psychological therapy.
More broadly, this research highlights how language can provide implicit and unobtrusive insight into the personality and psychological processes that underlie personality pathology at a large scale, using an individualised, naturalistic approach.
Exploring the use of nature as an adjunct to psychological interventions for depression in young populations
Depression in adolescence is a global priority and it is critical to identify effective and accessible interventions. This systematic review aimed to synthesise experimental research on nature-based interventions (NBIs), to determine effects on depressive symptoms in young people. The secondary research question sought to understand characteristics of effective NBIs. A comprehensive systematic search was conducted across major and grey literature databases and papers were screened according to specified criteria. Participants' ages were required to be between 10 and 24 years and studies needed to use an experimental design, including a control group. Experimental conditions were defined by psychotherapeutic interventions with nature exposure and outcomes measured either clinical symptomatology or subjective states of depression. Ten papers were identified, quality assessed and summarised in a narrative synthesis. Thirteen significant effects were reported in nine studies, highlighting the potential for NBIs as effective interventions for depressive symptoms in young people. However, due to methodological biases, such as lack of randomisation or control over group differences and frequent use of passive control groups, there remains considerable uncertainty over the effectiveness of NBIs. Characteristics of effective NBIs are tentatively discussed; however, further work is needed to clarify which aspects specifically contribute to the beneficial effects observed. Future research should seek to address the limitations of small samples and selection biases, and test NBIs against more comparable and evidence-based interventions. It is hoped future studies will consider the inclusion of clinical populations, to explore the utility of NBIs as a treatment option for adolescent depression.
Sound of Violent Images / Violence of Sound Images: Pulling apart Tom and Jerry
Violence permeates Tom and Jerry in the repetitive, physically violent gags and scenes of humiliation and mocking, yet unarguably, there is comedic value in the onscreen violence. The musical scoring of Tom and Jerry in the early William Hanna and Joseph Barbera period of production (pre-1958) by Scott Bradley played a key role in conveying the comedic impact of violent gags due to the close synchronisation of music and sound with visual action. It is typified by a form of sound design characteristic of zip crash animation as described by Paul Taberham (2012), in which sound actively participates in the humour and directly influences the viewer's interpretation of the visual action. This research investigates the sound-image relationships in Tom and Jerry through practice, by exploring how processes of decontextualisation and desynchronisation of sound and image elements of violent gags unmask the underlying violent subtext of Tom and Jerry's slapstick comedy. This research addresses an undertheorised area in animation related to the role of sound-image synchronisation and presents new knowledge derived from the novel application of audiovisual analysis of Tom and Jerry source material and the production of audiovisual artworks. The findings of this research are discussed from a pan-theoretical perspective drawing on theorisation of film sound and cognitivist approaches to film music.
This investigation through practice supports the notion that intrinsic and covert processes of sound-image synchronisation, as theorised by Kevin Donnelly (2014), play a key role in the reading of slapstick violence as comedic. Therefore, this practice-based research can be viewed as a case study that demonstrates the potential of a sampling-based creative practice to enable new readings to emerge from sampled source material. Novel artefacts were created in the form of audiovisual works that embody specific knowledge of factors related to the reconfiguration of sound-image relations and their impact in altering viewers' readings of violence contained within Tom and Jerry. Critically, differences emerged between the artworks in terms of the extent to which they unmasked underlying themes of violence, and potential mediating factors are discussed related to the influence of asynchrony on comical framing, the role of the unseen voice, perceived musicality and perceptions of interiority in the audiovisual artworks. The research findings yielded new knowledge regarding a potential gender-based bias in the perception of the human voice in the animated artworks produced. This research also highlights the role of intra-animation dimensions pertaining to the use of the single frame, the use of blank spaces and the relationship of sound-image synchronisation to the notion of the acousmatic imaginary. The PhD includes a portfolio of experimental audiovisual artworks produced during the testing and experimental phases of the research on which the textual dissertation critically reflects.
Discovery of genetic factors for reading ability and dyslexia
The ability to read is critical to access wider learning and achieve qualifications, for accessing employment, and for adult life skills. Approximately one in ten individuals are affected by dyslexia, a learning difficulty which primarily impacts word reading and spelling. Specifically, phonological processing (the ability to decode phonemes) is impaired in dyslexia. Whilst some believe dyslexia represents the extreme end of a continuum of reading ability, others have suggested it is a distinct trait.
Variation in reading ability is a highly heritable (possibly 70%) complex trait caused by many genetic variants with a small effect size. However, the genetic architecture of reading ability and dyslexia is largely unknown due to a lack of quantitative genetic studies with sufficient statistical power to detect such small effect sizes. Previously, most genetic studies of reading ability have been conducted using samples of children with dyslexia, which tend to be modest in size. Whilst large samples of genotyped unselected adults have been collected (for example UK Biobank), phenotypic data on reading or language skills is rarely prioritised.
The overall aim of this thesis is to discover genetic variants associated with dyslexia and variation in reading skill in order to better understand the aetiology of reading difficulties, which in turn, may inform prediction, identification and intervention strategies in the future. Firstly, I will conduct a genome-wide association (GWA) study of over 50,000 adults with a self-reported dyslexia diagnosis and over 1 million controls to identify associated single nucleotide polymorphisms (SNPs). I will also explore ways to improve power for discovering genetic factors associated with reading ability. To do this, I will first investigate whether unselected adult samples are valid as a means to identify genetic factors associated with reading skill through a candidate gene approach. Secondly, I will investigate whether proxy reading phenotypes are also a means to gain power through large cohorts that have no quantitative measure of reading ability. Such samples may be informative for future GWA meta-analysis of quantitative reading ability.
In Chapter 1, I will first introduce reading ability and dyslexia. I will discuss how reading ability is a quantitative trait and how it can be measured before discussing how dyslexia is identified. Then, I will consider how dyslexia may relate to reading ability: whether it represents the extreme end of a continuum of reading or whether it is a distinct trait. I will then introduce the known causes of variation in reading ability and dyslexia, which includes both environmental and genetic factors. Next, I will present the history of genetic studies of reading ability and dyslexia and their limitations. Finally, I will discuss the current state of genetic research into reading ability and introduce the aims of my thesis in detail.
Chapter 2 is a publication in Nature Genetics entitled "Discovery of 42 genome-wide significant loci associated with dyslexia" which includes GWA analysis of over 1 million 23andMe, Inc. participants reporting on dyslexia diagnosis. I identify 42 independent genome-wide significant loci, 15 of which are in genes previously linked to cognitive ability and/or educational attainment, and 27 of which are novel and may be more specific to dyslexia. Extensive downstream biological analysis is performed alongside genetic correlations with other traits and dyslexia polygenic score prediction of quantitative reading scores.
Chapter 3 is a publication in Twin Research and Human Genetics on "The association of dyslexia and developmental speech and language disorder candidate genes with reading and language abilities in adults" which analyses an adult population cohort with quantitative measures of reading and language ability to replicate previous associations of candidate genes and biological pathways with dyslexia. I demonstrate that unselected adult populations are a valid means by which to identify genes which have previously been associated with dyslexia and/or speech and language disorder.
Chapter 4 is a research chapter in which I construct a proxy reading phenotype from measures of reading frequency in an unselected adult sample for whom a quantitative measure of reading ability is not available. I find that a dyslexia polygenic score constructed from the dyslexia GWA analysis in Chapter 2 cannot explain variation in the proxy phenotype, suggesting that book reading is not a sufficient substitute for reading ability.
Finally, in Chapter 5, I integrate and discuss my research findings. I highlight the discovery of 42 variants associated with dyslexia through GWAS, in addition to the discovery of new genes and biological pathways which may form part of the biological basis of dyslexia. Following this, I consider what GWAS tells us about candidate gene findings. I discuss traits which are genetically correlated with dyslexia, including quantitative reading skills and ADHD. I consider the relationship between dyslexia and reading ability, and how genetic studies can help us to understand this better. I also consider the relationship between dyslexia and other developmental disorders, and how genetic studies can help us to understand this better. Lastly, I discuss methods to boost power for GWAS of reading ability.
Location Reference Recognition from Texts: A Survey and Comparison
A vast amount of location information exists in unstructured texts, such as social media posts, news stories, scientific articles, web pages, travel blogs, and historical archives. Geoparsing refers to recognizing location references from texts and identifying their geospatial representations. While geoparsing can benefit many domains, a summary of its specific applications is still missing. Further, there is a lack of a comprehensive review and comparison of existing approaches for location reference recognition, which is the first and core step of geoparsing. To fill these research gaps, this review first summarizes seven typical application domains of geoparsing: geographic information retrieval, disaster management, disease surveillance, traffic management, spatial humanities, tourism management, and crime management. We then review existing approaches for location reference recognition by categorizing these approaches into four groups based on their underlying functional principle: rule-based, gazetteer matching-based, statistical learning-based, and hybrid approaches. Next, we thoroughly evaluate the correctness and computational efficiency of the 27 most widely used approaches for location reference recognition based on 26 public datasets with different types of texts (e.g., social media posts and news stories) containing 39,736 location references worldwide. Results from this thorough evaluation can help inform future methodological developments and can help guide the selection of proper approaches based on application needs.
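To make the gazetteer matching category concrete, the following is a minimal sketch of one possible approach, not a method evaluated in the survey. It assumes a tiny hypothetical gazetteer (`GAZETTEER`) and a hypothetical helper `recognize_locations` that scans word n-grams and prefers the longest match at each position.

```python
import re

# Hypothetical toy gazetteer: lower-cased place name -> (latitude, longitude).
GAZETTEER = {
    "new york": (40.7128, -74.0060),
    "york": (53.9600, -1.0873),
    "paris": (48.8566, 2.3522),
}


def recognize_locations(text, gazetteer=GAZETTEER, max_len=3):
    """Return (surface form, start offset, coordinates) for each match.

    At every token position the longest n-gram (up to max_len tokens)
    found in the gazetteer wins, so "New York" is preferred over "York".
    """
    tokens = [(m.group(), m.start()) for m in re.finditer(r"\w+", text)]
    matches = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            start = tokens[i][1]
            last_word, last_start = tokens[i + n - 1]
            end = last_start + len(last_word)
            candidate = text[start:end]
            # Normalise case and whitespace before the dictionary lookup.
            key = " ".join(candidate.lower().split())
            if key in gazetteer:
                matches.append((candidate, start, gazetteer[key]))
                i += n
                break
        else:
            # No n-gram starting here is a known place name.
            i += 1
    return matches
```

Calling `recognize_locations("Flooding reported in New York and Paris.")` returns one entry for "New York" and one for "Paris". Pure dictionary lookup of this kind cannot resolve ambiguous names or recognise places missing from the gazetteer, which is one reason statistical and hybrid approaches exist.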
Ecological and confined domain ontology construction scheme using concept clustering for knowledge management
Knowledge management in a structured system is a complicated task that requires common, standardized methods that are acceptable to all actors in a system. Ontology, in this regard, is a primary element and plays a central role in knowledge management, interoperability between various departments, and better decision making. The ontology construction for structured systems comprises logical and structural complications. Researchers have already proposed a variety of domain ontology construction schemes. However, these schemes do not involve some important phases of ontology construction that make ontologies more collaborative. Furthermore, these schemes do not provide details of the activities and methods involved in the construction of an ontology, which may cause difficulty in implementing the ontology. The major objectives of this research were to provide a comparison between some existing ontology construction schemes and to propose an enhanced ecological and confined domain ontology construction (EC-DOC) scheme for structured knowledge management. The proposed scheme introduces five important phases to construct an ontology, with a major focus on the conceptualizing and clustering of domain concepts. In the conceptualization phase, a glossary of domain-related concepts and their properties is maintained, and a Fuzzy C-Mean soft clustering mechanism is used to form the clusters of these concepts. In addition, the localization of concepts is instantly performed after the conceptualization phase, and a translation file of localized concepts is created. The EC-DOC scheme can provide accurate concepts regarding the terms for a specific domain, and these concepts can be made available in a preferred local language.
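As an illustration of the soft clustering step mentioned above, here is a minimal pure-Python sketch of the standard fuzzy c-means algorithm. It is not the EC-DOC implementation: the function name `fuzzy_c_means`, its parameters, and the assumption that each concept is already represented as a numeric feature vector are all hypothetical.

```python
import random


def fuzzy_c_means(points, c=2, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of equal-length numeric vectors.

    Returns (centers, memberships); memberships[i][k] is the degree to
    which point i belongs to cluster k, and each row sums to 1.
    The fuzzifier m must be greater than 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )
        # Membership update from relative distances to every centre.
        new_u = []
        for i in range(n):
            dists = [
                max(sum((points[i][d] - centers[k][d]) ** 2
                        for d in range(dim)) ** 0.5, 1e-12)
                for k in range(c)
            ]
            new_u.append(
                [1.0 / sum((dists[k] / dists[j]) ** (2.0 / (m - 1.0))
                           for j in range(c))
                 for k in range(c)]
            )
        shift = max(abs(new_u[i][k] - u[i][k])
                    for i in range(n) for k in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u
```

Unlike hard clustering, each concept keeps a graded membership in every cluster, so a term that sits between two groups of domain concepts is not forced into only one of them.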
CALANGO: a phylogeny-aware comparative genomics tool for discovering quantitative genotype-phenotype associations across species
Living species vary significantly in phenotype and genomic content. Sophisticated statistical methods linking genes with phenotypes within a species have led to breakthroughs in complex genetic diseases and genetic breeding. Despite the abundance of genomic and phenotypic data available for thousands of species, finding genotype-phenotype associations across species is challenging due to the non-independence of species data resulting from common ancestry. To address this, we present CALANGO (comparative analysis with annotation-based genomic components), a phylogeny-aware comparative genomics tool to find homologous regions and biological roles associated with quantitative phenotypes across species. In two case studies, CALANGO identified both known and previously unidentified genotype-phenotype associations. The first study revealed unknown aspects of the ecological interaction between Escherichia coli, its integrated bacteriophages, and the pathogenicity phenotype. The second identified an association between maximum height in angiosperms and the expansion of a reproductive mechanism that prevents inbreeding and increases genetic diversity, with implications for conservation biology and agriculture.