8 research outputs found

    Soft computing models to analyze atmospheric pollution issues

    This multidisciplinary research details statistical and soft computing models that analyse data on emissions of atmospheric pollution in urban areas. The research analyses the impact on atmospheric pollution of an extended bank-holiday weekend in Spain. Levels of atmospheric pollution are classified in relation to the days of the week, seeking to differentiate between working days and non-working days by taking account of such aspects as industrial activity and traffic levels. The case study is based on data collected by a station in the city of Burgos, which forms part of the pollution measurement station network of the Spanish Autonomous Region of Castile and León.
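The working-day versus non-working-day comparison described above can be sketched as follows; all readings here are synthetic and illustrative (the actual Burgos station data is not reproduced), assuming only that traffic-related pollution is higher on working days:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical daily-mean NO2 readings (ug/m3) for four weeks.
# Working days are assumed to reflect higher traffic and industrial activity.
days = np.tile(np.arange(7), 4)                # 0=Mon ... 6=Sun
no2 = np.where(days < 5,
               rng.normal(45, 5, days.size),   # working days
               rng.normal(30, 5, days.size))   # weekend / holidays

working = no2[days < 5].mean()
non_working = no2[days >= 5].mean()
print(f"working-day mean: {working:.1f}, non-working mean: {non_working:.1f}")
```

A real analysis would additionally flag bank holidays that fall on weekdays, which is exactly the distinction the extended-weekend case study exploits.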

    Methods for Estimation of Intrinsic Dimensionality

    Dimension reduction is an important tool used to describe the structure of complex data (explicitly or implicitly) through a small but sufficient number of variables, thereby making data analysis more efficient. It is also useful for visualization purposes. Dimension reduction helps statisticians to overcome the 'curse of dimensionality'. However, most dimension reduction techniques require the intrinsic dimension of the low-dimensional subspace to be fixed in advance, so the availability of reliable intrinsic dimension (ID) estimation techniques is of major importance. The main goal of this thesis is to develop algorithms for determining the intrinsic dimensions of recorded data sets in a nonlinear context. Whilst this is a well-researched topic for linear subspaces, based mainly on principal component analysis, relatively little attention has been paid to ways of estimating this number for non-linear variable interrelationships. The algorithms proposed here are based on existing concepts that can be categorized into local methods, relying on randomly selected subsets of a recorded variable set, and global methods, utilizing the entire data set. This thesis provides an overview of ID estimation techniques, with special consideration given to recent developments in non-linear techniques, such as manifold charting and fractal-based methods. Although these techniques are well established in principle, their practical implementation is far from straightforward. The intrinsic dimension is estimated via Brand's algorithm by examining the point-growth process, which counts the number of points falling in hyper-spheres; the estimation needs to determine the starting point for each hyper-sphere. In this thesis we provide settings for selecting starting points which work well for most data sets. Additionally, we propose approaches for estimating dimensionality via Brand's algorithm, the Dip method and the Regression method.
Other approaches are proposed for estimating the intrinsic dimension by fractal dimension estimation methods, which exploit the intrinsic geometry of a data set. The most popular concept in this family of methods is the correlation dimension, which requires the estimation of the correlation integral for a ball of radius tending to 0. In this thesis we propose new approaches to approximate the correlation integral in this limit: the Intercept method, the Slope method and the Polynomial method. In addition we propose a new approach, a localized global method, which could be defined as a local version of global ID methods. The objective of the localized global approach is to improve algorithms based on local ID methods, which could significantly reduce their negative bias. Experimental results on real-world and simulated data are used to demonstrate the algorithms and compare them with other methodologies. A simulation study which verifies the effectiveness of the proposed methods is also provided. Finally, these algorithms are contrasted using a recorded data set from an industrial melter process.
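The correlation dimension mentioned above can be illustrated with a generic Grassberger-Procaccia-style estimate. This is not the thesis's Intercept, Slope or Polynomial method, just a minimal sketch of the underlying idea: the slope of log C(r) versus log r over an assumed range of small radii, where C(r) is the sample correlation integral.

```python
import numpy as np

def correlation_dimension(X, radii):
    """Slope of log C(r) vs log r, where C(r) is the fraction of
    point pairs closer than r (the sample correlation integral)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    pairs = d[np.triu_indices(len(X), k=1)]
    C = np.array([np.mean(pairs < r) for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(C), 1)
    return slope

rng = np.random.default_rng(1)
t = rng.uniform(0, 2 * np.pi, 1000)
# A circle embedded in 3-D has intrinsic dimension 1.
X = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])
est = correlation_dimension(X, np.linspace(0.05, 0.3, 10))
print(round(est, 2))  # close to 1
```

The choice of radius range is exactly the difficulty the thesis addresses: the limit is defined as r tends to 0, but too-small radii leave too few pairs for a stable estimate.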

    Non Linear Modelling of Financial Data Using Topologically Evolved Neural Network Committees

    Most artificial neural network modelling methods are difficult to use, as maximising or minimising an objective function in a non-linear context involves complex optimisation algorithms. Problems related to the efficiency of these algorithms are often compounded by the difficulty of estimating a priori a network's fixed topology for a specific problem, making it even harder to appreciate the real power of neural networks. In this thesis, we propose a method that overcomes these issues by using genetic algorithms to optimise a network's weights and topology simultaneously. The proposed method searches for virtually any kind of network, whether a simple feed-forward, recurrent, or even adaptive network. When the data is high-dimensional, modelling its often sophisticated behaviour is a very complex task that requires the optimisation of thousands of parameters. To help optimisation techniques overcome their limitations or failure, practitioners use methods to reduce the dimensionality of the data space. However, some of these methods are forced to make unrealistic assumptions when applied to non-linear data, while others are very complex and require a priori knowledge of the intrinsic dimension of the system, which is usually unknown and very difficult to estimate. The proposed method is non-linear and reduces the dimensionality of the input space without any information on the system's intrinsic dimension. This is achieved by first searching in a low-dimensional space of simple networks, and gradually making them more complex as the search progresses by elaborating on existing solutions. The high-dimensional space of the final solution is only encountered at the very end of the search. This increases the system's efficiency by guaranteeing that the network becomes no more complex than necessary.
The modelling performance of the system is further improved by searching not for a single network as the ideal solution to a specific problem, but for a combination of networks. These committees of networks are formed by combining a diverse selection of network species from a population of networks derived by the proposed method. This approach automatically exploits the strengths and weaknesses of each member of the committee while avoiding having all members deliver the same bad judgements at the same time. In this thesis, the proposed method is used in the context of non-linear modelling of high-dimensional financial data. Experimental results are encouraging as far as both robustness and complexity are concerned.
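A minimal neuroevolution sketch of the weight-optimisation half of the idea follows; the topology search and committee formation described above are omitted, and the XOR task, network shape, population size and mutation scale are illustrative choices, not the thesis's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: a tiny stand-in task for the high-dimensional financial data.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 0.])

def forward(w, x):
    # Fixed 2-2-1 feed-forward net; the thesis also evolves the
    # topology, which this sketch leaves out for brevity.
    W1, b1 = w[:4].reshape(2, 2), w[4:6]
    W2, b2 = w[6:8], w[8]
    return np.tanh(np.tanh(x @ W1 + b1) @ W2 + b2)

def fitness(w):
    return -np.mean((forward(w, X) - y) ** 2)

pop = rng.normal(0.0, 1.0, (60, 9))
init_best = max(fitness(w) for w in pop)
for _ in range(200):
    scores = np.array([fitness(w) for w in pop])
    elite = pop[np.argsort(scores)[-10:]]        # elitism: keep the 10 best
    children = elite[rng.integers(0, 10, 50)] + rng.normal(0, 0.3, (50, 9))
    pop = np.vstack([elite, children])           # mutation-only reproduction

best = max(pop, key=fitness)
print(f"MSE improved from {-init_best:.3f} to {-fitness(best):.3f}")
```

Elitism makes the best fitness monotonically non-decreasing, which is also what allows the thesis's search to start from simple networks and only elaborate on solutions that already work.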

    Enhanced information extraction in the multi-energy x-ray tomography for security

    Thesis (Ph.D.)--Boston University. X-ray Computed Tomography (CT) is an effective nondestructive technology widely used for medical diagnosis and security. In CT, three-dimensional images of the interior of an object are generated based on its X-ray attenuation. Conventional CT is performed with a single energy spectrum, and materials can only be differentiated based on an averaged measure of the attenuation. Multi-Energy CT (MECT) methods have been developed to provide more information about the chemical composition of the scanned material using multiple energy-selective measurements of the attenuation. Existing literature on MECT is mostly focused on differentiation between body tissues and other medical applications. The problems in security are more challenging due to the larger range of materials and threats which may be found; objects may appear in high clutter and in different forms of concealment. Thus, the information extracted by the medical-domain methods may not be optimal for detection of explosives, and improved performance is desired. In this dissertation, learning and adaptive model-based methods are developed to address the challenges of multi-energy material discrimination for security. First, the fundamental information contained in the X-ray attenuation versus energy curves of materials is studied. For this purpose, a database of these curves for a set of explosive and non-explosive compounds was created. The dimensionality and span of the curves are estimated, and their space is shown to be larger than two-dimensional, contrary to what is typically assumed. In addition, optimized feature selection methods are developed and applied to the curves, and it is demonstrated that detection performance may be improved by using more than two features and features different from the standard photoelectric and Compton coefficients. Second, several MECT reconstruction methods are studied and compared.
This includes a new structure-preserving inversion technique which can mitigate metal artifacts and provide precise object localization in the estimated parameter images. Finally, a learning-based MECT framework for joint material classification and segmentation is developed, which can produce accurate material labels in the presence of metal and clutter. The methods are tested on simulated and real multi-energy data and shown to outperform previously published MECT techniques.
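The standard two-coefficient photoelectric/Compton decomposition, which the dissertation argues is often insufficient, can be sketched as a linear least-squares fit of an attenuation curve onto two basis functions. The energies, coefficients and simplified Klein-Nishina basis below are illustrative assumptions, not values from the dissertation's database:

```python
import numpy as np

def kn(E):
    # Klein-Nishina energy dependence of Compton scattering
    # (E in units of the electron rest energy).
    a = E
    return ((1 + a) / a**2 * (2 * (1 + a) / (1 + 2 * a) - np.log(1 + 2 * a) / a)
            + np.log(1 + 2 * a) / (2 * a)
            - (1 + 3 * a) / (1 + 2 * a) ** 2)

E = np.linspace(0.1, 0.3, 20)             # assumed measurement energies
true = np.array([2.0, 1.5])               # hypothetical material coefficients
mu = true[0] * E**-3 + true[1] * kn(E)    # simulated noiseless curve:
                                          # photoelectric ~ E^-3 + Compton

A = np.column_stack([E**-3, kn(E)])       # two-basis design matrix
coef, *_ = np.linalg.lstsq(A, mu, rcond=None)
print(coef)  # recovers [2.0, 1.5]
```

For a curve that truly lies in this two-dimensional span the fit is exact; the dissertation's point is that real attenuation curves occupy a larger space, so residuals from this model carry discriminative information that extra features can capture.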

    Artificial Intelligence in geospatial analysis: applications of self-organizing maps in the context of geographic information science.

    A thesis submitted in partial fulfillment of the requirements for the degree of Doctor in Information Management, specialization in Geographic Information Systems. The size and dimensionality of available geospatial repositories increase every day, placing additional pressure on existing analysis tools, which are expected to extract ever more knowledge from these databases. Most of these tools were created in a data-poor environment and thus rarely address concerns of efficiency, dimensionality and automatic exploration. In addition, traditional statistical techniques rest on several assumptions that are not realistic in the geospatial data domain. An example is the statistical independence between observations required by most classical statistical methods, which conflicts with the well-known spatial dependence that exists in geospatial data. Artificial intelligence and data mining methods constitute a less assumption-dependent alternative for exploring and extracting knowledge from geospatial data. In this thesis, we study the possible adaptation of existing general-purpose data mining tools to geospatial data analysis. The characteristics of geospatial datasets seem similar in many ways to those of other aspatial datasets for which several data mining tools have been used successfully in the detection of patterns and relations. It seems, however, that GIS-minded analysis and objectives require more than the results provided by these general tools, and adaptations are needed to meet the geographical information scientist's requirements. Thus, we propose several geospatial applications based on a well-known data mining method, the self-organizing map (SOM), and analyse the adaptations required in each application to fulfil those objectives and needs. Three main fields of GIScience are covered in this thesis: cartographic representation; spatial clustering and knowledge discovery; and location optimization.
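A minimal one-dimensional SOM on toy two-cluster "coordinate" data illustrates the method the thesis builds on; the lattice size, learning-rate schedule and synthetic data are illustrative choices, not those used in the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "geospatial" data: two hypothetical clusters of 2-D coordinates.
data = np.vstack([rng.normal([0, 0], 0.1, (100, 2)),
                  rng.normal([1, 1], 0.1, (100, 2))])

# 1-D SOM lattice of 4 units; both learning rate and neighbourhood
# width decay over training, as in the classical algorithm.
w = rng.uniform(0, 1, (4, 2))
for epoch in range(50):
    lr = 0.5 * (1 - epoch / 50)
    sigma = 1.0 * (1 - epoch / 50) + 0.1
    for x in rng.permutation(data):
        bmu = np.argmin(np.linalg.norm(w - x, axis=1))   # best-matching unit
        h = np.exp(-((np.arange(4) - bmu) ** 2) / (2 * sigma**2))
        w += lr * h[:, None] * (x - w)                   # neighbourhood update
print(np.round(w, 2))
```

After training, the lattice units spread over the two clusters while preserving lattice order, which is the topology-preserving property the thesis exploits for cartographic representation and spatial clustering.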

    Cognitive Foundations for Visual Analytics


    Nonlinear dimensionality reduction of data manifolds with essential loops

    Numerous methods and algorithms have been designed to solve the problem of nonlinear dimensionality reduction (NLDR). However, very few among them are able to efficiently embed 'circular' manifolds such as cylinders or tori, which have one or more essential loops. This paper presents a simple and fast procedure that can tear or cut those manifolds, i.e. break their essential loops, in order to make their embedding in a low-dimensional space easier. The key idea is the following: starting from the available data points, the tearing procedure represents the underlying manifold by a graph and then builds a maximum subgraph with no remaining loops. Because it works with a graph, the procedure can preprocess data for all NLDR techniques that use the same representation. Recent techniques using geodesic distances (Isomap, geodesic Sammon's mapping, geodesic CCA, etc.) or K-ary neighborhoods (LLE, hLLE, Laplacian eigenmaps) fall into that category. After describing the tearing procedure in detail, the paper comments on a few experimental results. (c) 2005 Elsevier B.V. All rights reserved.
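The tearing idea can be sketched by building a k-nearest-neighbour graph over the samples and keeping only a loop-free spanning subgraph. A BFS spanning tree, used below, is one maximal loop-free subgraph and stands in for the paper's construction; the circle data and k are illustrative assumptions:

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)

# Points on a circle: the simplest manifold with one essential loop.
t = rng.uniform(0, 2 * np.pi, 60)
X = np.column_stack([np.cos(t), np.sin(t)])

# k-nearest-neighbour graph as adjacency lists.
d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
k = 4
nbrs = [set(np.argsort(row)[1:k + 1]) for row in d]

# BFS spanning tree: a maximal subgraph with no cycles, so any
# essential loop of the manifold is necessarily "torn" somewhere.
tree, seen, q = [], {0}, deque([0])
while q:
    u = q.popleft()
    for v in nbrs[u]:
        if v not in seen:
            seen.add(v)
            tree.append((u, v))
            q.append(v)
print(len(tree))  # a tree on the reachable nodes has (nodes - 1) edges
```

Geodesic distances or K-ary neighbourhoods computed on the torn graph can then be fed to Isomap-style or LLE-style methods, exactly the preprocessing role the paper describes.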