13 research outputs found

    Machine Learning Approach-based Big Data Imputation Methods for Outdoor Air Quality Forecasting

    Missing data in ambient air databases is a common issue, and it is much worse in small towns and cities. Missing data is a significant concern for environmental epidemiology. These settings have high pollution exposure levels worldwide, and dataset gaps obstruct health investigations that could later affect local and international policies. When a substantial number of observations contain missing values, the standard errors increase due to the smaller sample size, which may significantly affect the final result. Generally, the performance of missing value imputation algorithms depends on the size of the database and the percentage of missing values within it. This paper proposes and demonstrates an ensemble-imputation-classification framework for rebuilding air quality information and forecasting air quality, using a dataset from Beijing, China. Various single and multiple imputation procedures are used to fill in the missing records. An ensemble of diverse classifiers is then applied to the imputed data to determine the air pollution level. The recommended model aims to reduce the error rate and improve accuracy. Extensive testing on datasets with actual missing values shows that the suggested methodology, combining multiple imputation and ensemble techniques, significantly improves the accuracy of the air quality forecasting model compared with conventional single imputation techniques.
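
The impute-then-classify pipeline described in this abstract can be illustrated with a minimal sketch. It is not the paper's method: it uses single column-mean imputation (the simplest baseline the abstract compares against) and a majority vote over three hypothetical rule-based classifiers. The readings, column meanings (PM2.5, PM10), and thresholds are all made-up assumptions for illustration.

```python
from collections import Counter
from statistics import mean


def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(observed))
    return [[col_means[j] if v is None else v for j, v in enumerate(r)]
            for r in rows]


def majority_vote(classifiers, row):
    """Return the label predicted by most classifiers for one row."""
    votes = Counter(clf(row) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical rule-based classifiers on (pm25, pm10) readings.
# The thresholds are illustrative assumptions, not regulatory limits.
classifiers = [
    lambda r: "high" if r[0] > 75 else "low",
    lambda r: "high" if r[1] > 150 else "low",
    lambda r: "high" if r[0] + r[1] > 200 else "low",
]

# Made-up readings; None marks a missing value.
readings = [[80.0, None], [None, 100.0], [40.0, 200.0]]
completed = impute_column_means(readings)
labels = [majority_vote(classifiers, r) for r in completed]
```

A multiple-imputation variant would generate several completed copies of `readings` and combine the resulting predictions, which is the direction the abstract reports as more accurate.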

    Estimation, Decision and Applications to Target Tracking

    This dissertation consists of three main parts. The first part proposes generalized linear minimum mean-square error (GLMMSE) estimation for nonlinear point estimation. The second part proposes a recursive joint decision and estimation (RJDE) algorithm for joint decision and estimation (JDE). The third part analyzes the performance of the sequential probability ratio test (SPRT) when the log-likelihood ratios (LLR) are independent but not identically distributed. Linear minimum mean-square error (LMMSE) estimation plays an important role in nonlinear estimation. It searches for the best estimator in the set of all estimators that are linear in the measurement. A GLMMSE estimation framework is proposed in this dissertation. It employs a vector-valued measurement transform function (MTF) and finds the best estimator among all estimators that are linear in the MTF. Several design guidelines for the MTF, based on a numerical example, are provided. An RJDE algorithm based on a generalized Bayes risk is proposed in this dissertation for dynamic JDE problems. It is computationally efficient for dynamic problems where data are made available sequentially. Further, since existing performance measures for estimation or decision are effective to evaluate JDE algorithms, a joint performance measure is proposed for JDE algorithms for dynamic problems. The RJDE algorithm is demonstrated by applications to joint tracking and classification as well as joint tracking and detection in target tracking. The performance of the SPRT is characterized by two important functions: the operating characteristic (OC) and the average sample number (ASN). These two functions have been studied extensively under the assumption of independent and identically distributed (i.i.d.) LLR, which is too stringent for many applications. This dissertation relaxes the requirement of identical distribution. Two inductive equations governing the OC and ASN are developed. 
Unfortunately, they have non-unique solutions in the general case. They do have unique solutions in two special cases: (a) the LLR sequence converges in distribution and (b) the LLR sequence has periodic distributions. Further, the analysis can be readily extended to evaluate the performance of the truncated SPRT and the cumulative sum test.
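
To make the SPRT setting above concrete, here is a minimal sketch of a sequential test whose log-likelihood ratios are independent but not identically distributed. It assumes Gaussian observations with known means under each hypothesis and a noise standard deviation that changes from sample to sample. The stopping thresholds are Wald's classical approximations. The function names and the data model are assumptions for illustration, and the sketch does not implement the dissertation's OC and ASN equations.

```python
import math


def wald_thresholds(alpha, beta):
    """Approximate log-domain stopping thresholds for target error rates.

    alpha: false-alarm probability, beta: miss probability.
    """
    upper = math.log((1 - beta) / alpha)
    lower = math.log(beta / (1 - alpha))
    return lower, upper


def gaussian_llr(x, mu0, mu1, sigma):
    """LLR of one Gaussian sample: H1 mean mu1 versus H0 mean mu0."""
    return ((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2)


def sprt(observations, sigmas, mu0, mu1, alpha=0.05, beta=0.05):
    """Accumulate per-sample LLRs until one threshold is crossed.

    Each sample has its own noise level, so the LLRs are independent
    but not identically distributed.
    Returns (decision, number_of_samples_used).
    """
    lower, upper = wald_thresholds(alpha, beta)
    total = 0.0
    n = 0
    for x, sigma in zip(observations, sigmas):
        n += 1
        total += gaussian_llr(x, mu0, mu1, sigma)
        if total >= upper:
            return "H1", n
        if total <= lower:
            return "H0", n
    return "undecided", n


# Noise level alternates between 1.0 and 2.0 (a periodic LLR distribution).
sigmas = [1.0, 2.0] * 5
decision, samples_used = sprt([1.0] * 10, sigmas, mu0=0.0, mu1=1.0)
```

With alternating noise levels the low-noise samples contribute four times as much evidence as the high-noise ones, which is the kind of periodic LLR sequence mentioned in special case (b).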

    Large-scale inference in the focally damaged human brain

    Clinical outcomes in focal brain injury reflect the interactions between two distinct anatomically distributed patterns: the functional organisation of the brain and the structural distribution of injury. The challenge of understanding the functional architecture of the brain is familiar; that of understanding the lesion architecture is barely acknowledged. Yet, models of the functional consequences of focal injury are critically dependent on our knowledge of both. The studies described in this thesis seek to show how machine learning-enabled high-dimensional multivariate analysis powered by large-scale data can enhance our ability to model the relation between focal brain injury and clinical outcomes across an array of modelling applications. All studies are conducted on the largest internationally available set of MR imaging data of focal brain injury in the context of acute stroke (N=1333) and employ kernel machines as the principal modelling architecture. First, I examine lesion-deficit prediction, quantifying the ceiling on achievable predictive fidelity for high-dimensional and low-dimensional models, demonstrating the former to be substantially higher than the latter. Second, I determine the marginal value of adding unlabelled imaging data to predictive models within a semi-supervised framework, quantifying the benefit of assembling unlabelled collections of clinical imaging. Third, I compare high- and low-dimensional approaches to modelling response to therapy in two contexts: quantifying the effect of treatment at the population level (therapeutic inference) and predicting the optimal treatment in an individual patient (prescriptive inference). I demonstrate the superiority of the high-dimensional approach in both settings.

    An Intelligent Time and Performance Efficient Algorithm for Aircraft Design Optimization

    Aircraft design optimization requires mastering the complex interrelationships of multiple disciplines. Despite its dependence on a large number of independent variables, this complex design problem has a favourable nature: it has strong indirect links and, as a result, a low number of local minima. Recently developed intelligent methods based on self-learning algorithms encouraged the search for a new method dedicated to this area. The hybrid (Cavus) algorithm developed in this thesis is applied to two main design cases in the aerospace area: aircraft design optimization and trajectory optimization. The new approach is capable of reducing the number of trial points without much compromise. The trend analysis shows that, for complex design problems, the Cavus algorithm is more conservative with a proportional number of trial points in finding the successful patterns.

    Random regression models and their impact in the genetic evaluation of binary fertility traits in beef cattle

    2021 Spring. Includes bibliographical references. To view the abstract, please see the full text of the document.

    Data mining in computational finance

    Computational finance is a relatively new discipline whose birth can be traced back to the early 1950s. Its major objective is to develop and study practical models focusing on techniques that apply directly to financial analyses. The large number of decisions and computationally intensive problems involved in this discipline make data mining and machine learning models integral to improving, automating, and expanding current processes. One of the objectives of this research is to present the state of the art in data mining and machine learning techniques applied in the core areas of computational finance. Next, a detailed analysis of public and private finance datasets is performed in an attempt to find interesting facts in the data and draw conclusions regarding the usefulness of features within the datasets. Credit risk evaluation is one of the crucial modern concerns in this field. Credit scoring is essentially a classification problem in which models are built using information about past applicants to categorise new applicants as ‘creditworthy’ or ‘non-creditworthy’. We appraise the performance of a few classical machine learning algorithms on the problem of credit scoring. Typically, credit scoring databases are large and characterised by redundant and irrelevant features, making the classification task more computationally demanding. Feature selection is the process of selecting an optimal subset of relevant features. We propose an improved information-gain directed wrapper feature selection method using genetic algorithms and successfully evaluate its effectiveness against baseline and generic wrapper methods using three benchmark datasets. One of the tasks of financial analysts is to estimate a company’s worth. In the last piece of work, this study predicts the growth rate of company earnings using three machine learning techniques. 
We employed the technique of lagged features, which allowed varying amounts of recent history to be brought into the prediction task and transformed the time series forecasting problem into a supervised learning problem. This work was applied to a private time series dataset.
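
The lagged-feature transformation mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the thesis code: the function name `make_lagged_dataset` and the sample growth rates are hypothetical, and the `n_lags` parameter stands in for the "varying amounts of recent history".

```python
def make_lagged_dataset(series, n_lags):
    """Turn a univariate series into (feature rows, targets).

    Each row holds the previous `n_lags` values, and the matching
    target is the value that immediately follows them.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    rows, targets = [], []
    for t in range(n_lags, len(series)):
        rows.append(list(series[t - n_lags:t]))
        targets.append(series[t])
    return rows, targets


# Hypothetical earnings growth rates (illustrative values only).
growth = [0.02, 0.03, 0.01, 0.04, 0.05, 0.03]
X, y = make_lagged_dataset(growth, n_lags=3)
# X[0] is [0.02, 0.03, 0.01] and its target y[0] is 0.04.
```

The resulting `X` and `y` can be passed to any supervised regressor. Increasing `n_lags` brings more history into each row at the cost of fewer training rows.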

    Vol. 10, No. 1 (Full Issue)

    Novel Approaches for Structural Health Monitoring

    The thirty-plus years of progress in the field of structural health monitoring (SHM) have had a major impact on our everyday lives. Be it for the monitoring of fixed- and rotary-wing aircraft, for the preservation of cultural and architectural heritage, or for the predictive maintenance of long-span bridges or wind farms, SHM has shaped the framework of many engineering fields. Given the current state of quantitative and principled methodologies, it is now possible to rapidly and consistently evaluate the structural safety of industrial machines, modern concrete buildings, historical masonry complexes, etc., and to test their capability to serve their intended purpose. However, old unsolved problems remain, and unprecedented conditions, such as stricter safety requirements and ageing civil infrastructure, pose new challenges. Therefore, this Special Issue gathers the main contributions of academics and practitioners in civil, aerospace, and mechanical engineering to provide a common ground for structural health monitoring in dealing with old and new aspects of this ever-growing research field.