10 research outputs found

    Non-linear mapping for exploratory data analysis in functional genomics

    BACKGROUND: Several supervised and unsupervised learning tools are available to classify functional genomics data. However, relatively little attention has been given to exploratory, visualisation-driven approaches. Such approaches should satisfy several requirements: support for intuitive cluster visualisation, user-friendly and robust application, computational efficiency and the generation of biologically meaningful outcomes. This research assesses a relaxation method for non-linear mapping that addresses these concerns. Its application to gene expression and protein-protein interaction data analyses is investigated. RESULTS: Publicly available expression data originating from leukaemia, round blue-cell tumour and Parkinson's disease studies were analysed. The method distinguished relevant clusters and critical analysis areas. The system does not require assumptions about the inherent class structure of the data, its mapping process is controlled by only one parameter, and the resulting transformations offer intuitive, meaningful visual displays. Comparisons with traditional mapping models are presented. To promote potential alternative applications of the methodology, an example of exploratory data analysis of interactome networks is illustrated: data from the C. elegans interactome were analysed. The results suggest that this method might represent an effective solution for detecting key network hubs and for clustering biologically meaningful groups of proteins. CONCLUSION: A relaxation method for non-linear mapping provided the basis for visualisation-driven analyses using different types of data. This study indicates that such a system may represent a user-friendly and robust approach to exploratory data analysis. It may allow users to gain better insights into the underlying data structure, detect potential outliers and assess assumptions about the cluster composition of the data.
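The relaxation idea behind such a mapping can be sketched in miniature: each low-dimensional point is repeatedly nudged so that its distances to the other points approach the original high-dimensional distances. This is an illustrative reconstruction, not the authors' implementation; the single `lr` parameter here merely stands in for the method's one control parameter.

```python
import math
import random

def nonlinear_map(data, dims=2, iters=200, lr=0.1, seed=0):
    """Relaxation-style non-linear mapping: place points in a low-dimensional
    space and iteratively move each one so that its pairwise distances match
    the original high-dimensional distances."""
    rng = random.Random(seed)
    n = len(data)

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # original (high-dimensional) pairwise distances
    d_hi = [[dist(data[i], data[j]) for j in range(n)] for i in range(n)]
    # random initial low-dimensional layout
    low = [[rng.uniform(-1, 1) for _ in range(dims)] for _ in range(n)]
    for _ in range(iters):
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d_lo = dist(low[i], low[j]) or 1e-12
                # positive err pushes i away from j, negative pulls it closer
                err = (d_hi[i][j] - d_lo) / d_lo
                for k in range(dims):
                    low[i][k] += lr * err * (low[i][k] - low[j][k])
    return low
```

On two well-separated clusters the resulting 2-D layout keeps cluster members together, which is the intuitive-cluster-visualisation property the abstract describes.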

    An investigation of the design and use of feed-forward artificial neural networks in the classification of remotely sensed images

    Artificial neural networks (ANNs) have attracted the attention of researchers in many fields, and have been used to solve a wide range of problems. In the field of remote sensing they have been used in a variety of applications, including land cover mapping, image compression, geological mapping and meteorological image classification, and have generally proved to be more powerful than conventional statistical classifiers, especially when training data are limited and the data in each class are not normally distributed. The use of ANNs requires some critical decisions on the part of the user. These decisions, which mainly concern the determination of the components of the network structure and the parameters defined for the learning algorithm, can significantly affect the accuracy of the resulting classification. Although there is some discussion in the literature of the issues that affect network performance, there is no universally accepted standard method or approach for determining the optimum values of these parameters for a particular problem. In this thesis, a feed-forward network structure that learns the characteristics of the training data through the backpropagation learning algorithm is employed to classify land cover features using multispectral, multitemporal, and multisensor image data. The thesis starts with a review and discussion of general principles of classification and the use of artificial neural networks. Special emphasis is put on the issue of feature selection, due to the availability of hyperspectral image data from recent sensors. The primary aims of this research are to comprehensively investigate the impact of the choice of network architecture and initial parameter estimates, and to compare a number of heuristics developed by researchers. 
The most effective heuristics are identified on the basis of a large number of experiments employing two real-world datasets, and the superiority of the optimum settings using the 'best' heuristics is then validated using an independent dataset. The results are found to be promising in terms of ease of design and use of ANNs, and in producing considerably higher classification accuracies than either maximum likelihood classifiers or neural network classifiers constructed using ad hoc design and implementation strategies. A number of conclusions are drawn and later used to generate a comprehensive set of guidelines that will facilitate the process of design and use of artificial neural networks in remote sensing image classification. This study also explores the use of visualisation techniques in understanding the behaviour of artificial neural networks and the results produced by them. A number of visual analysis techniques are employed to examine the internal characteristics of the training data. For this purpose, a toolkit allowing the analyst to perform a variety of visualisation and analysis procedures was created using the MATLAB software package, and is available on the accompanying CD-ROM. This package was developed during the course of this research, and contains the tools used during the investigations reported in this thesis. The contribution to knowledge of the research work reported in this thesis lies in the identification of optimal strategies for the use of ANNs in land cover classifications based on remotely sensed data. Further contributions include an in-depth analysis of feature selection methods for use with high-dimensional datasets, and the production of a MATLAB toolkit that implements the methods used in this study.
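The kind of classifier this thesis investigates — a one-hidden-layer feed-forward network trained by backpropagation — can be sketched self-containedly. The architecture choices here (hidden-unit count, learning rate, epochs) are exactly the design decisions the thesis studies; the values below are arbitrary, and the demonstration uses a toy logical-OR problem rather than image data.

```python
import math
import random

def train_mlp(samples, n_hidden=4, epochs=2000, lr=0.5, seed=1):
    """Train a one-hidden-layer feed-forward network by backpropagation.
    `samples` is a list of (inputs, target) pairs with targets in {0, 1}."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # each weight row ends with a bias term
    w_h = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))

    def forward(x):
        h = [sig(w[-1] + sum(wi * xi for wi, xi in zip(w, x))) for w in w_h]
        o = sig(w_o[-1] + sum(wi * hi for wi, hi in zip(w_o, h)))
        return h, o

    for _ in range(epochs):
        for x, t in samples:
            h, o = forward(x)
            d_o = (o - t) * o * (1 - o)            # output-layer delta
            for j in range(n_hidden):              # backpropagate to hidden layer
                d_h = d_o * w_o[j] * h[j] * (1 - h[j])
                for i in range(n_in):
                    w_h[j][i] -= lr * d_h * x[i]
                w_h[j][-1] -= lr * d_h
            for j in range(n_hidden):              # then update output weights
                w_o[j] -= lr * d_o * h[j]
            w_o[-1] -= lr * d_o
    return lambda x: forward(x)[1]
```

A real land-cover experiment would replace the toy inputs with per-pixel spectral band values and one output per class; the training loop itself is unchanged.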

    Non Linear Modelling of Financial Data Using Topologically Evolved Neural Network Committees

    Most artificial neural network modelling methods are difficult to use, as maximising or minimising an objective function in a non-linear context involves complex optimisation algorithms. Problems related to the efficiency of these algorithms are often compounded by the difficulty of estimating a priori a network's fixed topology for a specific problem, making it even harder to appreciate the real power of neural networks. In this thesis, we propose a method that overcomes these issues by using genetic algorithms to optimise a network's weights and topology simultaneously. The proposed method searches for virtually any kind of network, whether a simple feed-forward, recurrent, or even adaptive network. When the data is high dimensional, modelling its often sophisticated behaviour is a very complex task that requires the optimisation of thousands of parameters. To enable optimisation techniques to overcome their limitations or failures, practitioners use methods to reduce the dimensionality of the data space. However, some of these methods are forced to make unrealistic assumptions when applied to non-linear data, while others are very complex and require a priori knowledge of the intrinsic dimension of the system, which is usually unknown and very difficult to estimate. The proposed method is non-linear and reduces the dimensionality of the input space without any information on the system's intrinsic dimension. This is achieved by first searching in a low-dimensional space of simple networks, and gradually making them more complex as the search progresses by elaborating on existing solutions. The high-dimensional space of the final solution is only encountered at the very end of the search. This increases the system's efficiency by guaranteeing that the network becomes no more complex than necessary. 
The modelling performance of the system is further improved by searching not only for one network as the ideal solution to a specific problem, but for a combination of networks. These committees of networks are formed by combining a diverse selection of network species from a population of networks derived by the proposed method. This approach automatically exploits the strengths and weaknesses of each member of the committee while avoiding having all members give the same bad judgements at the same time. In this thesis, the proposed method is used in the context of non-linear modelling of high-dimensional financial data. Experimental results are encouraging as far as both robustness and complexity are concerned.
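The simultaneous evolution of weights and topology is the thesis's contribution and is not reproduced here, but the genetic machinery it builds on can be sketched. The fragment below evolves only the weights of a single sigmoid neuron, with tournament selection, one-point crossover and Gaussian mutation; the fixed topology and all parameter values are simplifications for illustration.

```python
import math
import random

def evolve_weights(samples, pop_size=40, gens=150, seed=2):
    """Genetic-algorithm optimisation of neural weights instead of gradient
    descent.  The topology (one sigmoid neuron) is held fixed for brevity;
    the thesis evolves weights and topology together."""
    rng = random.Random(seed)
    n = len(samples[0][0]) + 1                     # weights plus a bias
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))

    def loss(w):                                   # squared error over samples
        return sum((sig(w[-1] + sum(wi * xi for wi, xi in zip(w, x))) - t) ** 2
                   for x, t in samples)

    pop = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(gens):
        nxt = []
        for _ in range(pop_size):
            a = min(rng.sample(pop, 3), key=loss)  # tournament selection
            b = min(rng.sample(pop, 3), key=loss)
            cut = rng.randrange(n)                 # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.5:                 # Gaussian mutation
                child[rng.randrange(n)] += rng.gauss(0, 1.0)
            nxt.append(child)
        pop = nxt
    best = min(pop, key=loss)
    return best, loss(best)
```

Because fitness is just an objective-function value, the same loop works unchanged whether the genome encodes weights alone or, as in the thesis, weights together with a topology description.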

    Applications of multivariate statistics in honey bee research, analysis of metabolomics data from samples of honey bee propolis

    This thesis was previously held under moratorium from 20/04/2020 to 20/04/2022. Honey bees play a significant role both ecologically and economically, through the pollination of flowering plants and crops. Additionally, honey is an ancient food source that is highly valued by different religions and cultures and has been shown to possess a wide range of beneficial uses, including cosmetic treatment, eye disease, bronchial asthma and hiccups. In addition to honey, honey bees also produce beeswax, pollen, royal jelly and propolis. In this thesis, data from samples of propolis from various geographical locations are studied. Propolis is a resinous product, which consists of a combination of beeswax, saliva and resins that have been gathered by honey bees from the exudates of various surrounding plants. It is used by the bees to seal small gaps and maintain the hives, but is also an anti-microbial substance that may protect them against disease. The appearance and consistency of propolis changes depending on the temperature; it becomes elastic and sticky when warm, but hard and brittle when cold. Furthermore, its composition and colour vary from yellowish-green to dark brown, depending on its age and the sources of resin from the environment. Propolis is a highly biochemically active substance with many potential benefits in health care, which have attracted much attention. Biochemical analysis of propolis leads to highly multivariate metabolomics data. The main benefit of metabolomics is to generate a spectrum, in which peaks correspond to different chemical components, making possible the detection of multiple substances simultaneously. Relevant spectral features may be used for pattern recognition. The purpose of this research is to study methods used for statistical analysis of biochemical data arising from propolis samples. 
We investigate the use of different statistical methods for metabolomics data from chemical analysis of propolis samples using Mass Spectrometry (MS). The methods studied include pre-treatment methods and multivariate analysis techniques including principal component analysis (PCA), multidimensional scaling (MDS), and clustering methods including hierarchical cluster analysis (HCA), k-means clustering and self-organising maps (SOMs). Background material and results of data analysis are presented for samples of propolis from beehives in Scotland, Libya and Europe. Conclusions are drawn in terms of the data sets themselves as well as the properties of the different methods studied for analysing such metabolomics data.
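Of the multivariate techniques listed, the clustering step is the easiest to show in miniature. The sketch below is plain k-means on toy "peak intensity" vectors standing in for mass spectra; it is an illustration of the technique named in the abstract, not of the thesis's actual analysis pipeline, and the deterministic centroid seeding is a simplification (a real run would use a scheme such as k-means++).

```python
def kmeans(spectra, k, iters=50):
    """Plain k-means on intensity vectors: assign each spectrum to its nearest
    centroid, recompute centroids as group means, repeat until stable.
    Centroids are seeded from evenly spaced inputs for determinism."""
    n = len(spectra)
    cents = ([list(spectra[i * (n - 1) // (k - 1)]) for i in range(k)]
             if k > 1 else [list(spectra[0])])

    def sq(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    labels = [0] * n
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq(s, cents[c])) for s in spectra]
        new = []
        for c in range(k):
            members = [s for s, l in zip(spectra, labels) if l == c]
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else cents[c])   # keep empty clusters in place
        if new == cents:
            break
        cents = new
    return cents, labels
```

In a metabolomics workflow the input vectors would first pass through the pre-treatment and dimension-reduction stages (e.g. PCA scores) the abstract mentions, with clustering applied to the reduced coordinates.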

    An informatics based approach to respiratory healthcare.

    By 2005 one person in every five UK households suffered with asthma. Research has shown that episodes of poor air quality can have a negative effect on respiratory health and are a growing concern for asthmatics. To better inform clinical staff and patients of the contribution of poor air quality to patient health, this thesis defines an IT architecture that can be used by systems to identify environmental predictors leading to a decline in the respiratory health of an individual patient. Personal environmental predictors of asthma exacerbation are identified by validating the delay between environmental predictors and decline in respiratory health. The concept is demonstrated using prototype software, and indicates that the analytical methods provide a mechanism to produce an early warning of impending asthma exacerbation due to poor air quality. The author has introduced the term enviromedics to describe this new field of research. Pattern recognition techniques are used to analyse patient-specific environments, and extract meaningful health predictors from the large quantities of data involved (often in the region of half a million data points). This research proposes a suitable architecture that defines processes and techniques that enable the validation of patient-specific environmental predictors of respiratory decline. The design of the architecture was validated by implementing prototype applications that demonstrate, through hospital admissions data and personal lung function monitoring, that air quality can be used as a predictor of patient-specific health. The refined techniques developed during the research (such as Feature Detection Analysis) were also validated by the application prototypes. 
This thesis makes several contributions to knowledge, including: the process architecture; Feature Detection Analysis (FDA), which automates the detection of trend reversals within time series data; validation of the delay characteristic using a Self-organising Map (SOM), used as an unsupervised method of pattern recognition; and Frequency, Boundary and Cluster Analysis (FBCA), an additional technique developed by this research to refine the SOM.
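The abstract does not spell out the rules of Feature Detection Analysis, so the sketch below is only a plausible simplified stand-in for the trend-reversal part of it: smooth the series with a moving average, then report the indices where the smoothed slope changes sign.

```python
def trend_reversals(series, window=3):
    """Detect indices where a smoothed series changes direction (rising to
    falling or vice versa).  A simplified illustration of trend-reversal
    detection, not the thesis's actual FDA algorithm."""
    half = window // 2
    # centred moving-average smoothing, shrinking the window at the edges
    smooth = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        smooth.append(sum(series[lo:hi]) / (hi - lo))
    reversals = []
    prev = 0
    for i in range(1, len(smooth)):
        d = smooth[i] - smooth[i - 1]
        sign = (d > 0) - (d < 0)
        if sign and prev and sign != prev:
            reversals.append(i - 1)        # index of the turning point
        if sign:
            prev = sign
    return reversals
```

Applied to a peak-flow or air-quality series, the detected turning points are the kind of feature that could then be cross-checked against environmental measurements with a lag.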

    Hybridization of machine learning for advanced manufacturing

    Thesis by compendium of publications. In today's industrial context, the terms "Advanced Manufacturing", "Industry 4.0" and "Smart Factory" are becoming a reality. Industrial companies seek to be more competitive, whether in costs, time, consumption of raw materials, energy, etc. The aim is to be efficient in every area and also to be sustainable. The future of many companies depends on their degree of adaptation to change and their capacity for innovation. Consumers are increasingly demanding, looking for personalised, specific products of high quality, at low cost and non-polluting. For all these reasons, industrial companies adopt technological innovations to achieve this, among them the aforementioned Advanced Manufacturing and Machine Learning (ML). The present research work falls within these fields: hybrid intelligent solutions that combine various ML techniques have been conceived and applied to solve problems in the manufacturing industry. Intelligent techniques such as Artificial Neural Networks (ANNs), multi-objective genetic algorithms, projection methods for dimensionality reduction, clustering techniques, etc. have been applied. System Identification techniques have also been used in order to obtain the mathematical model that best represents the real system under study. Various techniques have been hybridised in order to build more robust and reliable solutions. By combining specific ML techniques, more complex systems with a greater capacity for representation/solution are created. These systems use data, and knowledge about those data, to solve problems. 
The proposed solutions aim to solve complex, wide-ranging real-world problems, handling aspects such as uncertainty, lack of precision, high dimensionality, etc. This thesis covers several real case studies in which various ML techniques have been applied to different problems in the manufacturing industry. The real industrial case studies addressed, with four different datasets, correspond to: • a high-precision dental milling process, from the company Estudio Previo SL; • data analysis for predictive maintenance at a company in the automotive sector, the multinational Grupo Antolin. Additionally, there has been collaboration with the GICAP research group of the Universidad de Burgos and with the ITCL technology centre on the case studies that form part of this thesis and related ones. The different hybridisations of ML techniques developed have been applied and validated on real, original datasets, in collaboration with industrial companies or milling centres, making it possible to solve current, complex problems. In this way, the work carried out has not had only a theoretical focus; it has been applied in practice, allowing industrial companies to improve their processes, save costs and time, pollute less, etc. The satisfactory results obtained point to the usefulness and contribution that ML techniques can offer in the field of Advanced Manufacturing.
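The hybridisation pattern the abstract describes — chaining an unsupervised stage into a predictive one — can be shown with a deliberately tiny example: cluster the inputs with 1-D k-means, then fit an independent least-squares line inside each cluster. This is only a loose illustration of the pattern; the thesis combines far richer techniques (ANNs, multi-objective genetic algorithms, projection methods), and every value below is invented for the demonstration.

```python
def hybrid_model(xs, ys, k=2, iters=20):
    """Toy hybrid model: 1-D k-means over the inputs, then a separate
    least-squares line per cluster.  Assumes every cluster stays non-empty."""
    n = len(xs)
    cents = [xs[i * (n - 1) // (k - 1)] for i in range(k)]  # deterministic seed
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x, y in zip(xs, ys):
            groups[min(range(k), key=lambda c: abs(x - cents[c]))].append((x, y))
        cents = [sum(x for x, _ in g) / len(g) if g else c
                 for g, c in zip(groups, cents)]
    lines = []
    for g in groups:                       # ordinary least squares per cluster
        mx = sum(x for x, _ in g) / len(g)
        my = sum(y for _, y in g) / len(g)
        var = sum((x - mx) ** 2 for x, _ in g)
        slope = sum((x - mx) * (y - my) for x, y in g) / var
        lines.append((slope, my - slope * mx))

    def predict(x):
        s, b = lines[min(range(k), key=lambda c: abs(x - cents[c]))]
        return s * x + b
    return predict
```

The design point is that each stage covers the other's weakness: the clustering isolates operating regimes (e.g. different milling conditions), and the simple per-regime model then only has to fit locally linear behaviour.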

    Estimating the upper ocean vertical temperature structure from surface temperature as applied to the southern Benguela

    Includes bibliographical references. Underwater Sound Velocity Profiles (SVPs) are used throughout the world by navies for submarine and surface-vessel strategic operations and exercises. Together with the sonar equations, sound velocity profiles are of paramount importance in solving underwater sound detectability problems, as they provide insight into the highly variable sound transmission loss. Oceanographic records of sea temperature-depth profiles are ordinarily incorporated into a sonar propagation model to determine the sound level at any point (range and depth). The ability to predict these environmental conditions with a defined level of confidence and accuracy significantly increases the situational awareness of in-theatre naval operators and fleet planners. The hypothesis in this thesis is that the thermal characteristics of the water column in the southern Benguela can be numerically modelled and deduced from a single Sea Surface Temperature (SST) value, if sufficient historic temperature-depth profiles for that region are provided. For operational use, the SST would ideally be provided from near-real-time remotely sensed satellite-derived data.
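The abstract does not specify the numerical model, so the sketch below is only the simplest possible stand-in for the idea: given historic (surface temperature, profile) pairs, estimate a new profile by interpolating depth-by-depth between the two historic casts whose surface temperatures bracket the observed SST.

```python
def profile_from_sst(history, sst):
    """Estimate a temperature-depth profile from a single sea-surface
    temperature by linear interpolation between the two historic profiles
    whose surface temperatures bracket it.  `history` is a list of
    (surface_temp, temps_by_depth) pairs; SSTs outside the historic range
    are clamped to the nearest profile."""
    history = sorted(history)              # order casts by surface temperature
    if sst <= history[0][0]:
        return list(history[0][1])
    if sst >= history[-1][0]:
        return list(history[-1][1])
    for (t0, p0), (t1, p1) in zip(history, history[1:]):
        if t0 <= sst <= t1:
            w = (sst - t0) / (t1 - t0)     # interpolation weight
            return [a + w * (b - a) for a, b in zip(p0, p1)]
```

A real model for the southern Benguela would of course condition on more than one predictor (season, location) and quantify confidence, as the thesis requires; this only shows the lookup-from-SST shape of the problem.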

    Artificial Intelligence in geospatial analysis: applications of self-organizing maps in the context of geographic information science.

    A thesis submitted in partial fulfillment of the requirements for the degree of Doctor in Information Management, specialization in Geographic Information Systems. The size and dimensionality of available geospatial repositories increase every day, placing additional pressure on existing analysis tools, as they are expected to extract more knowledge from these databases. Most of these tools were created in a data-poor environment and thus rarely address concerns of efficiency, dimensionality and automatic exploration. In addition, traditional statistical techniques rest on several assumptions that are not realistic in the geospatial data domain. An example of this is the statistical independence between observations required by most classical statistical methods, which conflicts with the well-known spatial dependence that exists in geospatial data. Artificial intelligence and data mining methods constitute an alternative, less assumption-dependent way to explore and extract knowledge from geospatial data. In this thesis, we study the possible adaptation of existing general-purpose data mining tools to geospatial data analysis. The characteristics of geospatial datasets seem to be similar in many ways to those of other aspatial datasets for which several data mining tools have been used with success in the detection of patterns and relations. It seems, however, that GIS-minded analysis and objectives require more than the results provided by these general tools, and adaptations are needed to meet the geographical information scientist's requirements. Thus, we propose several geospatial applications based on a well-known data mining method, the self-organizing map (SOM), and analyse the adaptations required in each application to fulfil those objectives and needs. Three main fields of GIScience are covered in this thesis: cartographic representation; spatial clustering and knowledge discovery; and location optimization.
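A minimal SOM makes the method concrete: a grid of nodes, each holding a weight vector, where the best-matching node and its grid neighbours are pulled toward each training sample while the learning rate and neighbourhood radius decay. This is a generic textbook sketch, not any of the thesis's geospatial adaptations; grid size and schedules are arbitrary.

```python
import math
import random

def train_som(data, rows=4, cols=4, iters=400, seed=5):
    """Minimal self-organizing map.  Each (row, col) node holds a weight
    vector; per sample, the best-matching unit (BMU) and its neighbours on
    the grid move toward the sample, with decaying rate and radius."""
    rng = random.Random(seed)
    dim = len(data[0])
    nodes = {(r, c): [rng.uniform(0, 1) for _ in range(dim)]
             for r in range(rows) for c in range(cols)}
    for t in range(iters):
        frac = t / iters
        lr = 0.5 * (1 - frac)                        # decaying learning rate
        radius = max(rows, cols) / 2 * (1 - frac)    # decaying neighbourhood
        x = rng.choice(data)
        bmu = min(nodes, key=lambda n: sum((w - v) ** 2
                                           for w, v in zip(nodes[n], x)))
        for n, w in nodes.items():
            d = math.hypot(n[0] - bmu[0], n[1] - bmu[1])  # grid distance
            if d <= radius:
                infl = math.exp(-d * d / (2 * (radius + 1e-9) ** 2))
                nodes[n] = [wi + lr * infl * (xi - wi)
                            for wi, xi in zip(w, x)]
    return nodes

def best_match(nodes, x):
    """Return the grid coordinate of the node closest to sample x."""
    return min(nodes, key=lambda n: sum((w - v) ** 2
                                        for w, v in zip(nodes[n], x)))
```

For geospatial work the samples would be coordinate or attribute vectors, and the trained grid then serves as the basis for the cartographic, clustering and location-optimization uses the abstract lists.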