
    Real-valued feature selection for process approximation and prediction

    The selection of features for classification, clustering and approximation is an important task in pattern recognition, data mining and soft computing. For real-valued features, this contribution shows how feature selection for a large number of features can be implemented using mutual information. In particular, the common problem of estimating joint probabilities in many dimensions from only a few samples is addressed by using the Rényi mutual information of order two as the computational basis. For this, the Grassberger-Takens correlation integral is used, which was developed for estimating probability densities in chaos theory. Additionally, an adaptive procedure for computing the hypercube size is introduced and, for real-world applications, the treatment of missing values is included. The computation is accelerated by exploiting the ranking of the set of real feature values, especially for time series. A small blackbox-glassbox example shows how the relevant features and their time lags are determined in the time series even if the input feature time series determine the output nonlinearly. A more realistic example from the chemical industry shows that this enables a better approximation of the input-output mapping than the best neural network approach developed for an international contest. With this computationally efficient implementation, mutual information becomes an attractive tool for feature selection even for a large number of real-valued features.
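
The idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the order-two Rényi mutual information is taken as H2(X) + H2(Y) - H2(X,Y), that each entropy is approximated by the negative log of a correlation integral (the fraction of sample pairs falling within a hypercube of half-width `eps`), and that `eps` is fixed by hand rather than chosen adaptively. The function names and the parameter `eps` are hypothetical, and missing values and the rank-based speed-up are left out.

```python
import numpy as np


def correlation_integral(z, eps):
    """Fraction of distinct sample pairs closer than eps in the max norm.

    Using the max norm means each pair is tested against a hypercube,
    so the volume factors of marginal and joint spaces cancel later.
    """
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    n = z.shape[0]
    # Pairwise Chebyshev (max-norm) distances, O(n^2) memory: fine for a sketch.
    dist = np.max(np.abs(z[:, None, :] - z[None, :, :]), axis=2)
    upper = np.triu_indices(n, k=1)  # each unordered pair once, no self-pairs
    return np.mean(dist[upper] < eps)


def renyi2_mutual_information(x, y, eps):
    """Order-two Rényi MI estimate: log C_xy - log C_x - log C_y.

    Assumes all three correlation integrals are positive, i.e. eps is
    large enough that at least one pair falls inside the hypercube.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    c_x = correlation_integral(x, eps)
    c_y = correlation_integral(y, eps)
    c_xy = correlation_integral(np.hstack([x, y]), eps)
    return np.log(c_xy) - np.log(c_x) - np.log(c_y)


# Toy data (made up for illustration): y depends nonlinearly on x,
# while `noise` is an irrelevant candidate feature.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=500)
noise = rng.uniform(-1.0, 1.0, size=500)
y = x ** 2

mi_relevant = renyi2_mutual_information(x, y, eps=0.2)
mi_irrelevant = renyi2_mutual_information(noise, y, eps=0.2)
print("relevant feature:", mi_relevant)
print("irrelevant feature:", mi_irrelevant)
```

Ranking candidate features (or lagged copies of a time series) by such a score is one way to select inputs. Note that the linear correlation between `x` and `x ** 2` on a symmetric interval is close to zero, which is why a dependence measure of this kind is used instead.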

    Energy performance forecasting of residential buildings using fuzzy approaches

    Energy consumption for domestic purposes in Europe is, to a considerable extent, due to heating and cooling. This energy is produced mostly by burning fossil fuels, which has a high negative environmental impact. The characteristics of a building are an important factor in determining its heating and cooling loads. Therefore, studying the building characteristics that are relevant to the heating and cooling needed to maintain comfortable indoor air conditions could be very useful for designing and constructing energy-efficient buildings. In previous studies, different machine-learning approaches have been used to predict heating and cooling loads from the following set of variables: relative compactness, surface area, wall area, roof area, overall height, orientation, glazing area and glazing area distribution. However, none of these methods is based on fuzzy logic. In this research, we study two fuzzy logic approaches, i.e., fuzzy inductive reasoning (FIR) and adaptive neuro-fuzzy inference system (ANFIS), to deal with the same problem. The fuzzy approaches obtain very good results, outperforming all the methods described in previous studies except one. In this work, we also study the feature selection process of the FIR methodology as a pre-processing tool to select the most relevant variables before the use of any predictive modelling methodology. It is proven that FIR feature selection provides interesting insights into the main building variables causally related to heating and cooling loads. This allows better decision making and design strategies, since accurate cooling and heating load estimations and correct identification of the parameters that affect building energy demands are of high importance for optimizing building designs and equipment specifications. Peer Reviewed. Postprint (published version).

    Challenges in interpreting allergen microarrays in relation to clinical symptoms: a machine learning approach.

    Identifying different patterns of allergens and understanding their predictive ability in relation to asthma and other allergic diseases is crucial for the design of personalized diagnostic tools. Allergen-IgE screening using the ImmunoCAP ISAC® assay was performed at age 11 years in children participating in a population-based birth cohort. Logistic regression (LR) and nonlinear statistical learning models, including random forests (RF) and Bayesian networks (BN), coupled with feature selection approaches, were used to identify patterns of allergen responses associated with asthma, rhino-conjunctivitis, wheeze, eczema and airway hyper-reactivity (AHR, positive methacholine challenge). Sensitivity/specificity and area under the receiver operating characteristic curve (AUROC) were used to assess model performance via repeated validation. A serum sample for IgE measurement was obtained from 461 of 822 (56.1%) participants. Two hundred and thirty-eight of 461 (51.6%) children had IgE > 0 ISU for at least one of 112 allergen components. The binary threshold >0.3 ISU performed less well than using continuous IgE values, discretizing data or using other data transformations, but not significantly (p = 0.1). With the exception of eczema (AUROC~0.5), LR, RF and BN achieved comparable AUROC, ranging from 0.76 to 0.82. Dust mite, pollens and pet allergens were highly associated with asthma, whilst pollens and dust mite were associated with rhino-conjunctivitis. Egg/bovine allergens were associated with eczema. After validation, LR, RF and BN demonstrated reasonable discrimination ability for asthma, rhino-conjunctivitis, wheeze and AHR, but not for eczema. However, further improvements in threshold ascertainment and/or value transformation for different components, and better interpretation algorithms, are needed to fully capitalize on the potential of the technology.
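
For readers unfamiliar with the AUROC figures quoted above, the following is a minimal sketch of how the metric can be computed from labels and model scores. It uses the standard pairwise (Mann-Whitney) interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The function name `auroc` and the toy numbers are illustrative assumptions and are not taken from the study.

```python
import numpy as np


def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 0/1 (or boolean) outcome per subject.
    scores: model output per subject; higher means "more likely positive".
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUROC needs at least one positive and one negative case")
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0)
    ties = np.sum(diff == 0)
    return (wins + 0.5 * ties) / (pos.size * neg.size)


# Toy example (made-up values): three of the four positive/negative
# pairs are ordered correctly.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))
```

Under this reading, a value near 0.5 (as reported for eczema) means the scores rank positive and negative cases no better than chance, while values of 0.76 to 0.82 mean roughly four in five such pairs are ordered correctly.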