63 research outputs found

    Genetic effects of tissue-specific enhancers in schizophrenia and hypertrophic cardiomyopathy

    Most human conditions develop in genetically susceptible individuals from the interaction with environmental risk factors. These complex disorders result from the summation of effects from multiple genetic risk loci. Genome-wide association studies (GWASes) measure the association of single nucleotide polymorphisms (SNPs) with traits or conditions, and allow the creation of individualised polygenic risk scores. However, these explain only a small portion of a condition’s genetic heritability. Further, there is evidence that schizophrenia GWAS signals are enriched within genomic regulatory blocks, which are clusters of conserved non-coding elements that span key developmental loci and function as long-range enhancers activating transcription of target developmental genes. This suggests that enhancer-based annotations might be useful to refine polygenic signals for schizophrenia. In this work, I aimed to increase the amount of variance explained by PRS for schizophrenia, and a comparison condition hypertrophic cardiomyopathy, using tissue-specific regulatory enhancer-promoter annotations. To do so, I developed neural- and cardiac-specific enhancer lists, which I tested for enrichment, respectively, in schizophrenia and hypertrophic cardiomyopathy (HCM) heritability. I found that neural-specific enhancers are highly enriched in schizophrenia heritability -- especially when overlapping genomic regulatory blocks. Then I created partitioned polygenic risk scores for enhancer-based and non-enhancer-based SNPs, where enhancer-based SNPs are prioritised. I further compared the amount of adjusted heritability for both conditions explained by original GWAS vs partitioned polygenic risk scores, and found up to a 6.5% increase in the Coefficient of Determination for schizophrenia, and similar amounts for HCM -- however, this was not statistically significant. The increasing trend was specific for brain-expressed enhancers in schizophrenia, while it was widespread for HCM. 
Finally, I considered whether neural-specific enhancer-based partitions might be better modelled in GWAS using nonadditive effects; however, my results were inconclusive due to small sample sizes.

    Text Analytics to Predict Time and Cause of Death from Verbal Autopsies

    This thesis describes the first Text Analytics approach to predicting Causes of Death (CoD) from Verbal Autopsies (VA). VA is an alternative technique recommended by the World Health Organisation for ascertaining CoD in low and middle-income countries (LMIC). CoD information is vitally important in the provision of healthcare. CoD information from VA can be obtained via two main approaches: manual (also referred to as physician review) and automatic. The automatic approach is an active research area because it is more efficient and cost-effective than the manual approach. VA contains both closed responses and open narrative text. However, the open narrative text has been ignored by state-of-the-art automatic approaches, and this remains a challenge and an important research issue. We hypothesise that it is feasible to predict CoD from the narratives of VA. We further contend that an automatic approach that utilises the information contained in both the narrative and the closed response text of VA could lead to improved prediction accuracy of CoD. This research has been formulated as a Text Classification problem, which employs Corpus and Computational Linguistics, Natural Language Processing and Machine Learning techniques to automatically classify VA documents according to CoD. Firstly, the research uses a VA corpus built from a sample collection of over 11,400 VA documents collected during a 10-year period in Ghana, West Africa. About 80 per cent of these documents have been annotated with CoD by medical experts. Secondly, we design experiments to identify Machine Learning techniques (algorithm, feature representation scheme, and feature reduction strategy) suitable for classifying VA open narratives (VAModel1). Thirdly, we propose novel methods of extracting features to build a model that predicts CoD from VA narratives using the annotated VA corpus as training and testing set. 
Furthermore, we develop two additional models: one based only on closed responses (VAModel2) and a hybrid based on both closed responses and open narratives (VAModel3). Our VAModel1 performs somewhat better than our baseline model, suggesting the feasibility of predicting CoD from the VA open narratives. Overall, VAModel3 performed better than VAModel1 but not significantly better than VAModel2. In terms of reliability, VAModel1 obtained moderate agreement (kappa score = 0.4) when compared with the gold standard, the medical experts (average annotation agreement between medical experts, kappa score = 0.64). Furthermore, an acceptable agreement was obtained for VAModel2 (kappa score = 0.71) and VAModel3 (kappa score = 0.75), suggesting that the reliability of these two models is better than that of medical experts. A detailed analysis also suggested that combining information from narratives and closed responses increases performance for some CoD categories, whereas information from the closed responses alone is enough for other CoD categories. Our research provides an alternative automatic approach to predicting CoD from VA, which is essential for LMIC. Further research into various aspects of the modelling process could improve the current performance of automatically predicting CoD from VAs.
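The kappa scores quoted in the abstract above measure agreement beyond what chance would produce. The abstract does not say which kappa statistic was used; the sketch below assumes Cohen's kappa for two annotators. The function name `cohen_kappa` and the example labels are hypothetical, chosen only for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels.

    Assumption: two raters (e.g. a model and an expert) each assign exactly
    one label per document. This is an illustrative sketch, not the thesis's
    evaluation code.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")

    n = len(labels_a)

    # Observed agreement: fraction of items on which both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement by chance, from each rater's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivial.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical cause-of-death labels for four documents.
expert = ["malaria", "malaria", "hiv", "hiv"]
model = ["malaria", "malaria", "hiv", "malaria"]
kappa = cohen_kappa(expert, model)
```

Here the raters agree on 3 of 4 documents (observed agreement 0.75) and chance agreement is 0.5, so kappa is 0.5.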

    New rough set based maximum partitioning attribute algorithm for categorical data clustering

    Clustering a set of data into homogeneous groups is a fundamental operation in data mining. Recently, attention has turned to categorical data clustering, where the data set consists of non-numerical attributes. However, implementing several existing categorical clustering algorithms is challenging, as some cannot handle uncertainty while others have stability issues. Rough Set Theory (RST) is a mathematical tool for dealing with categorical data and handling uncertainty. It is also used to identify cause-effect relationships in databases as a form of learning and data mining. Therefore, this study aims to address the issues of uncertainty and stability in categorical clustering, and it proposes an improved algorithm centred on RST. The proposed method employs a partitioning measure to calculate the positive and boundary regions of attributes in the information system. Firstly, an attribute partitioning method called Positive Region-based Indiscernibility (PRI) was developed to address the uncertainty issue in attribute partitioning for categorical data. The PRI method relies on a partitioning calculation based on the positive and boundary regions. Next, to address the computational complexity of the clustering process, a clustering attribute selection method called Maximum Mean Partitioning (MMP) is introduced, which is based on computing the mean. The MMP method selects the attribute with the maximum mean partitioning value as the best clustering attribute. Integrating the proposed PRI and MMP methods yields a new rough set hybrid clustering algorithm for categorical data, named the Maximum Partitioning Attribute (MPA) algorithm. This hybrid algorithm is intended as an all-inclusive solution for uncertainty, computational complexity, cluster purity, and accuracy in attribute partitioning and in selecting a clustering attribute. 
The proposed MPA algorithm is compared against the baseline algorithms, namely Maximum Significance Attribute (MSA), Information-Theoretic Dependency Roughness (ITDR), Maximum Indiscernibility Attribute (MIA), and simple classical K-means. Seven small data sets from previously utilized research cases and 21 UCI repository and benchmark datasets are used for validation. The results, presented in tabular and graphical form, show that the proposed MPA algorithm outperforms the baseline algorithms on all data sets. The MPA algorithm improves the rough accuracy against MSA, ITDR, and MIA by 54.42%. It also reduces computational complexity compared to MSA, ITDR, and MIA, with 77.11% less time and 58.66% fewer iterations. Similarly, an improvement of up to 97.35% in overall purity was observed for the MPA algorithm against MSA, ITDR, and MIA. In addition, MPA improves on the overall accuracy of simple K-means by up to 34.41%. These results indicate that the proposed MPA offers a promising solution to the categorical data clustering problem.
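The positive and boundary regions mentioned in the abstract above are standard rough set constructs. The sketch below shows only textbook indiscernibility classes and lower/upper approximations. It does not reproduce the PRI, MMP, or MPA methods, whose details the abstract does not give. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def indiscernibility_classes(rows, attributes):
    """Group row indices that share the same values on `attributes`."""
    classes = defaultdict(set)
    for idx, row in enumerate(rows):
        key = tuple(row[a] for a in attributes)
        classes[key].add(idx)
    return list(classes.values())


def approximations(rows, attributes, target):
    """Return (lower, upper, boundary) approximations of `target`.

    lower:    union of classes fully contained in the target (positive region)
    upper:    union of classes that intersect the target
    boundary: upper minus lower, i.e. the objects that cannot be decided
    """
    lower, upper = set(), set()
    for block in indiscernibility_classes(rows, attributes):
        if block <= target:
            lower |= block
        if block & target:
            upper |= block
    return lower, upper, upper - lower


# Toy categorical data set (hypothetical).
rows = [
    {"colour": "red", "size": "small"},
    {"colour": "red", "size": "large"},
    {"colour": "blue", "size": "small"},
    {"colour": "blue", "size": "small"},
]
target = {0, 2, 3}  # indices of the objects in the concept of interest
lower, upper, boundary = approximations(rows, ["colour"], target)
```

Using only `colour`, the two red objects are indiscernible but only one of them belongs to the target, so both fall in the boundary region, while the two blue objects form the positive region.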

    Domestication of open educational resources by academics in an open distance e-learning institution of South Africa

    Open educational resources have gained popularity and acceptance in higher education institutions and beyond the basic education sector. This has brought a persistent shift towards dependence on information and communication technologies for tuition and research provision. The information technology artifact was not treated in isolation from the user perspective. The study established how academics accept, feel about, and perceive open educational resources, and what skills, opportunities, and challenges exist that may hinder their domestication. The study context had no uniform guidelines, tools, or policy in place for the domestication of open educational resources. The study adopted an exploratory approach guided by the interpretivist paradigm and employed Domestication theory. It was conducted as a heterogeneous single case study of an open distance e-learning institution (University of South Africa). The in-depth investigation relied on multiple methods for data triangulation: semi-structured interviews, focus group interviews, document analysis, and actual artifact analysis. There were 52 participants in total. The study found that most academics played a role in the domestication of open educational resources, while a minority were unable to. Experience and prior knowledge were found to be a factor hindering the domestication process. Open distance e-learning was found to be a relevant space for open educational resources. Such institutions play a role in the adoption and development of open educational resources and mostly rely on information technology for tuition and research. Information technology infrastructure was found to be both an enabler and a disabler in the domestication process. The study's contribution to knowledge is both theoretical and practical. Eight theoretical propositions were suggested. The study further extended Domestication theory by recommending two additional phases: non-appropriation and dis-appropriation. 
The currently proposed Domestication theory has five phases. Lastly, the study recommended actual guidelines for the adoption and development of open educational resources. These guidelines can be adopted by higher education institutions by infusing them into policy development or by using them for general guidance in actual adoption and development.

    Understanding the Elements of Executable Architectures Through a Multi-Dimensional Analysis Framework

    The objective of this dissertation study is to conduct a holistic investigation into the elements of executable architectures. Current research in the field of Executable Architectures has provided valuable solution-specific demonstrations and has also shown the value derived from such an endeavor. However, a common theory underlying their applications has been missing. This dissertation develops and explores a method for holistically developing an Executable Architecture Specification (EAS), i.e., a meta-model containing both semantic and syntactic information, using a conceptual framework for guiding data coding, analysis, and validation. Utilization of this method resulted in the description of the elements of executable architecture in terms of a set of nine information interrogatives: an executable architecture information ontology. Once the detail-rich EAS was constructed with this ontology, it became possible to define the potential elements of executable architecture through an intermediate level meta-model. The intermediate level meta-model was further refined into an interrogative level meta-model using only the nine information interrogatives, at a very high level of abstraction

    Potential Indirect Relationships in Productive Networks

    Productive Networks, such as Social Network Services, organize evidence about human behavior. This evidence is independent of the network content type, and may support the discovery of new relationships between users and content, or with other users. These indirect relationships are important for recommendation systems, and for systems where potential relationships between users and content (e.g., locations) are relevant. One example is the emergency management domain, where the discovery of relationships between users and locations on productive networks may enable the identification of population density variations, increasing the accuracy of emergency alerts. This thesis presents a Productive Networks model, which enables the development of a methodology for indirect relationship discovery that uses the metadata on the network and avoids the computational cost of content analysis. We designed and conducted a set of experiments to evaluate our proposals. Our results are twofold: firstly, the productive network model is sufficiently robust to represent a wide range of networks; secondly, the indirect relationship discovery methodology successfully identifies relevant relationships between users and content. We also present applications of the model and methodology in several contexts.

    Principles and Applications of Data Science

    Data science is an emerging multidisciplinary field which lies at the intersection of computer science, statistics, and mathematics, with diverse applications and close ties to data mining, deep learning, and big data. This Special Issue on “Principles and Applications of Data Science” focuses on the latest developments in the theories, techniques, and applications of data science. The topics include data cleansing, data mining, machine learning, deep learning, and applications in medicine and healthcare, as well as social media.

    Undergraduate and Graduate Course Descriptions, 2013 Summer

    Wright State University undergraduate and graduate course descriptions from Summer 2013