    Automating the Annotation of Data through Machine Learning and Semantic Technologies

    The ever-increasing scale and complexity of scientific research is surpassing our means to assimilate newly produced knowledge. Computer tools are necessary for the organisation, retrieval, and interpretation of new scientific knowledge and data. The efficacy of such tools requires that research outputs are described by rich machine-readable metadata. Ontologies provide the framework to unambiguously describe the meaning of knowledge and data, so that it may be re-used or combined to synthesise new knowledge. However, manually annotating research with ontology terms, a process called semantic annotation, is also infeasible due to the aforementioned scale. This thesis describes research to develop deep learning-based tools for semantic annotation. The approaches described explore different methods for exploiting the domain knowledge encoded into ontologies to avoid the need to manually curate training corpora. They also take advantage of the inherent integrative capabilities of ontologies, to leverage combinations of heterogeneous knowledge to improve annotation performance and model interpretability. Several models exceeded previous benchmarks for semantic annotation in the bio-medical domain. This thesis concludes with a discussion of the strengths and limitations of the methods, and the implications for multi-domain ontology semantic annotation and for explainable artificial intelligence.

    Spectral Feature Selection for Data Mining

    This timely introduction to spectral feature selection illustrates the potential of this powerful dimensionality reduction technique in high-dimensional data processing. It presents the theoretical foundations of spectral feature selection, its connections to other algorithms, and its use in handling both large-scale data sets and small sample problems. Readers learn how to use spectral feature selection to solve challenging problems in real-life applications and discover how general feature selection and extraction are connected to spectral feature selection. Source code for the algorithms is available online.

    Haplotype estimation in polyploids using DNA sequence data

    Polyploid organisms possess more than two copies of their core genome and therefore contain k>2 haplotypes for each set of ordered genomic variants. Polyploidy occurs often within the plant kingdom, including in important crops such as potato (k=4) and wheat (k=6). Current sequencing technologies enable us to read the DNA and detect genomic variants, but cannot distinguish between the copies of the genome, each inherited from one of the parents. To detect inheritance patterns in populations, it is necessary to know the haplotypes, as alleles that are in linkage over the same chromosome tend to be inherited together. In this work, we develop mathematical optimisation algorithms to indirectly estimate haplotypes by looking into overlaps between the sequence reads of an individual, as well as into the expected inheritance of the alleles in a population. These algorithms deal with sequencing errors and random variations in the counts of reads observed from each haplotype. These methods are therefore of high importance for studying the genetics of polyploid crops.

    Metalearning

    As one of the fastest-growing areas of research in machine learning, metalearning studies principled methods to obtain efficient models and solutions by adapting machine learning and data mining processes. This adaptation usually exploits information from past experience on other tasks, and the adaptive processes can involve machine learning approaches. As a related area to metalearning and a hot topic currently, automated machine learning (AutoML) is concerned with automating the machine learning processes. Metalearning and AutoML can help AI learn to control the application of different learning methods and acquire new solutions faster without unnecessary interventions from the user. This open access book offers a comprehensive and thorough introduction to almost all aspects of metalearning and AutoML, covering the basic concepts and architecture, evaluation, datasets, hyperparameter optimization, ensembles and workflows, and also how this knowledge can be used to select, combine, compose, adapt and configure both algorithms and models to yield faster and better solutions to data mining and data science problems. It can thus help developers to develop systems that can improve themselves through experience. This book is a substantial update of the first edition published in 2009. It includes 18 chapters, more than twice as many as the previous edition. This enabled the authors to cover the most relevant topics in more depth and incorporate an overview of recent research in the respective areas. The book will be of interest to researchers and graduate students in the areas of machine learning, data mining, data science and artificial intelligence.
    Metalearning is the study of principled methods that exploit metaknowledge to obtain efficient models and solutions by adapting machine learning and data mining processes. While the variety of machine learning and data mining techniques now available can, in principle, provide good model solutions, a methodology is still needed to guide the search for the most appropriate model in an efficient way. Metalearning provides one such methodology that allows systems to become more effective through experience. This book discusses several approaches to obtaining knowledge concerning the performance of machine learning and data mining algorithms. It shows how this knowledge can be reused to select, combine, compose and adapt both algorithms and models to yield faster, more effective solutions to data mining problems. It can thus help developers improve their algorithms and also develop learning systems that can improve themselves. The book will be of interest to researchers and graduate students in the areas of machine learning, data mining and artificial intelligence.

    Statistical Methods in Neuroimaging Genetics: Pathways Sparse Regression and Cluster Size Inference

    In the field of neuroimaging genetics, brain images are used as phenotypes in the search for genetic variants associated with brain structure or function. This search presents a formidable statistical challenge, not least because of the very high dimensionality of genotype and phenotype data produced by modern SNP (single nucleotide polymorphism) arrays and high resolution MRI. This thesis focuses on the use of multivariate sparse regression models such as the group lasso and sparse group lasso for the identification of gene pathways associated with both univariate and multivariate quantitative traits. The methods described here take particular account of various factors specific to pathways genome-wide association studies including widespread correlation (linkage disequilibrium) between genetic predictors, and the fact that many variants overlap multiple pathways. A resampling strategy that exploits finite sample variability is employed to provide robust rankings for pathways, SNPs and genes. Comprehensive simulation studies are presented comparing one proposed method, pathways group lasso with adaptive weights, to a popular alternative. This method is extended to the case of a multivariate phenotype, and the resulting pathways sparse reduced-rank regression model and algorithm is applied to a study identifying gene pathways associated with structural change in the brain characteristic of Alzheimer’s disease. The original model is also adapted for the task of ’pathways-driven’ SNP and gene selection, and this latter model, pathways sparse group lasso with adaptive weights, is applied in a search for SNPs and genes associated with elevated lipid levels in two separate cohorts of Asian adults. Finally, in a separate section an existing method for the identification of spatially extended clusters of image voxels with heightened activation is evaluated in an imaging genetic context. This method, known as cluster size inference, rests on a number of assumptions. 
Using real imaging and SNP data, false positive rates are found to be poorly controlled outside of a narrow range of parameters related to image smoothness and activation thresholds for cluster formation.

    National Aeronautics and Space Administration (NASA)/American Society for Engineering Education (ASEE) Summer Faculty Fellowship Program: 1995.

    The JSC NASA/ASEE Summer Faculty Fellowship Program was conducted at JSC, including the White Sands Test Facility, by Texas A&M University and JSC. The objectives of the program, which began nationally in 1964 and at JSC in 1965, are (1) to further the professional knowledge of qualified engineering and science faculty members; (2) to stimulate an exchange of ideas between participants and NASA; (3) to enrich and refresh the research and teaching activities of the participants' institutions; and (4) to contribute to the research objectives of the NASA centers. Each faculty fellow spent at least 10 weeks at JSC engaged in a research project in collaboration with a NASA/JSC colleague. In addition to the faculty participants, the 1995 program included five students. This document is a compilation of the final reports on the research projects completed by the faculty fellows and visiting students during the summer of 1995. The reports of two of the students are integral with that of the respective fellow. Three students wrote separate reports.

    Connected Attribute Filtering Based on Contour Smoothness
