16,858 research outputs found

    Model-based analysis of the potential of macroinvertebrates as indicators for microbial pathogens in rivers

    The quality of water prior to its use for drinking, farming or recreational purposes must comply with several physicochemical and microbiological standards to safeguard society and the environment. Satisfying these standards requires expensive analyses and highly trained laboratory personnel. Macroinvertebrates, by contrast, have been used as ecological indicators to assess the health of aquatic ecosystems. In this research, the relationship between microbial pathogens and macrobenthic invertebrate taxa was examined in the Machangara River, located in the southern Andes of Ecuador, where 33 sites were chosen according to their land use to collect physicochemical, microbiological and biological parameters. Decision tree models (DTMs) were used to generate rules that link the presence and abundance of some benthic families to microbial pathogen standards. These DTMs provide an indirect, approximate, and quick way of checking compliance with Ecuadorian regulations for water use related to microbial pathogens. The models, built and optimized with the WEKA package, were evaluated on both statistical and ecological criteria to make them as clear and simple as possible. As a result, two different and reliable models were obtained, which could be used as proxy indicators in a preliminary assessment of microbial pathogen pollution in rivers. The DTMs can be easily applied by staff with minimal training in the identification of the sensitive taxa selected by the models. The presence of selected macroinvertebrate taxa, in conjunction with the decision trees, can be used as a screening tool to identify sites that require additional follow-up analyses to confirm whether microbial water quality standards are met.
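
    As a rough illustration of how such a decision-tree rule could be applied in the field, the sketch below encodes a two-split screening rule. The taxon names (`sensitive_family_A`, `tolerant_family_B`) and the abundance threshold of 50 are hypothetical placeholders, not the families or cut-offs selected by the models described in the abstract.

```python
# Hypothetical screening rule in the spirit of a small decision tree.
# Taxon names and the threshold are placeholders, NOT results from the study.

def screen_site(counts):
    """Classify one sampling site from macroinvertebrate counts.

    counts maps a taxon name to the number of individuals collected.
    """
    sensitive_present = counts.get("sensitive_family_A", 0) > 0
    tolerant_abundance = counts.get("tolerant_family_B", 0)

    # Split 1: is the sensitive taxon present?
    # Split 2: is the tolerant taxon below the (assumed) abundance threshold?
    if sensitive_present and tolerant_abundance < 50:
        return "likely compliant"
    return "follow-up needed"


sites = {
    "upstream": {"sensitive_family_A": 12, "tolerant_family_B": 3},
    "urban": {"tolerant_family_B": 240},
}
results = {name: screen_site(counts) for name, counts in sites.items()}
print(results)
```

    A site flagged as "follow-up needed" would still require laboratory analysis; the rule only prioritises where to sample.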

    Imaging time series for the classification of EMI discharge sources

    In this work, we aim to classify a wider range of Electromagnetic Interference (EMI) discharge sources collected from new power plant sites across multiple assets. This makes for a more complex and challenging classification task. The study investigates and develops new and improved feature extraction and data dimension reduction algorithms based on image processing techniques. The approach exploits the Gramian Angular Field technique to map the measured EMI time signals to an image, from which the significant information is extracted while redundancy is removed. The image of each discharge type contains a unique fingerprint. Two feature reduction methods, Local Binary Pattern (LBP) and Local Phase Quantisation (LPQ), are then applied to the mapped images. This provides feature vectors that can be fed into a Random Forest (RF) classifier. The performance of a previous method and of the two newly proposed methods on the new database is compared in terms of classification accuracy, precision, recall, and F-measure. Results show that the new methods outperform the previous one, with LBP features achieving the best outcome.
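
    The time-series-to-image mapping can be sketched as follows. This is a minimal pure-Python version of the summation variant of the Gramian Angular Field, assuming min-max rescaling to [-1, 1]; the abstract does not say which variant or rescaling the authors used, and the function name is illustrative.

```python
import math


def gramian_angular_summation_field(series):
    """Map a 1-D series to an n x n image with entries cos(phi_i + phi_j)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must not be constant")
    # Rescale to [-1, 1] so that arccos is defined for every sample.
    scaled = [(2 * v - hi - lo) / (hi - lo) for v in series]
    # Guard against rounding slightly outside [-1, 1].
    scaled = [max(-1.0, min(1.0, v)) for v in scaled]
    # Polar encoding: each sample becomes an angle.
    phi = [math.acos(v) for v in scaled]
    return [[math.cos(a + b) for b in phi] for a in phi]


signal = [0.0, 1.0, 0.5, -1.0]
image = gramian_angular_summation_field(signal)
for row in image:
    print(["%.2f" % v for v in row])
```

    The resulting matrix is symmetric, so texture descriptors such as LBP can be computed on it as on any greyscale image.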

    Improved customer choice predictions using ensemble methods

    In this paper various ensemble learning methods from machine learning and statistics are considered and applied to the customer choice modeling problem. The application of ensemble learning usually improves the prediction quality of flexible models like decision trees and thus leads to improved predictions. We give experimental results for two real-life marketing datasets using decision trees, ensemble versions of decision trees and the logistic regression model, which is a standard approach for this problem. The ensemble models are found to improve upon individual decision trees and outperform logistic regression. Next, an additive decomposition of the prediction error of a model, the bias/variance decomposition, is considered. A model with a high bias lacks the flexibility to fit the data well. A high variance indicates that a model is instable with respect to different datasets. Decision trees have a high variance component and a low bias component in the prediction error, whereas logistic regression has a high bias component and a low variance component. It is shown that ensemble methods aim at minimizing the variance component in the prediction error while leaving the bias component unaltered. Bias/variance decompositions for all models for both customer choice datasets are given to illustrate these concepts.
    Keywords: brand choice; data mining; boosting; choice models; Bias/Variance decomposition; Bagging; CART; ensembles
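
    For reference, the textbook squared-error form of the decomposition is given below. It assumes $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$ and expectations taken over training sets and noise; the paper may use a classification-specific variant, which the abstract does not specify. The second line is the standard variance of an average of $B$ identically distributed predictors with variance $v$ and pairwise correlation $\rho$, which is why averaging can reduce variance without changing bias.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}

\operatorname{Var}\Big(\tfrac{1}{B}\sum_{b=1}^{B}\hat{f}_b(x)\Big)
  = \rho\, v + \frac{1-\rho}{B}\, v
```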

    Entropy-based feature extraction for electromagnetic discharges classification in high-voltage power generation

    This work exploits four entropy measures, namely Sample, Permutation, Weighted Permutation, and Dispersion Entropy, to extract relevant information from Electromagnetic Interference (EMI) discharge signals that is useful in fault diagnosis of High-Voltage (HV) equipment. Multi-class classification algorithms are used to distinguish between various discharge sources such as Partial Discharges (PD), Exciter, Arcing, micro Sparking and Random Noise. The signals were measured and recorded at different sites, followed by data analysis by EMI experts to identify and label the discharge source type contained within each signal. The classification was performed both within each site and across all sites. The system performs well in both cases, with extremely high classification accuracy within site. This work demonstrates the ability to extract relevant entropy-based features from time-resolved signals of EMI discharge sources with minimal computation, making the system well suited to online condition monitoring based on EMI.
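
    Of the four measures, permutation entropy is the simplest to sketch. The version below follows the usual ordinal-pattern definition with embedding order `order` and time delay `delay`; both default values are assumptions chosen for illustration, not the settings used in the paper.

```python
import math
from collections import Counter


def permutation_entropy(signal, order=3, delay=1, normalize=True):
    """Shannon entropy of the ordinal patterns found in a 1-D signal."""
    if order < 2:
        raise ValueError("order must be at least 2")
    n_patterns = len(signal) - (order - 1) * delay
    if n_patterns <= 0:
        raise ValueError("signal too short for the chosen order and delay")

    counts = Counter()
    for start in range(n_patterns):
        window = [signal[start + k * delay] for k in range(order)]
        # The ordinal pattern is the index order that sorts the window.
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        counts[pattern] += 1

    entropy = -sum(
        (c / n_patterns) * math.log(c / n_patterns) for c in counts.values()
    )
    if normalize:
        # Divide by the maximum entropy, log(order!), to land in [0, 1].
        entropy /= math.log(math.factorial(order))
    return entropy


monotone = permutation_entropy([1, 2, 3, 4, 5, 6])
zigzag = permutation_entropy([1, 3, 2, 4, 3, 5])
print(monotone, zigzag)
```

    A monotone signal contains a single ordinal pattern and therefore has zero entropy, while more irregular signals score higher, which is what makes the measure usable as a classification feature.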

    On the usage of the probability integral transform to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems

    We present a new distributed fuzzy partitioning method to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems. The proposed algorithm builds a fixed number of fuzzy sets for all variables and adjusts their shape and position to the real distribution of training data. A two-step process is applied: 1) transformation of the original distribution into a standard uniform distribution by means of the probability integral transform. Since the original distribution is generally unknown, the cumulative distribution function is approximated by computing the q-quantiles of the training set; 2) construction of a Ruspini strong fuzzy partition in the transformed attribute space using a fixed number of equally distributed triangular membership functions. Despite the aforementioned transformation, the definition of every fuzzy set in the original space can be recovered by applying the inverse cumulative distribution function (also known as quantile function). The experimental results reveal that the proposed methodology allows the state-of-the-art multi-way fuzzy decision tree (FMDT) induction algorithm to maintain classification accuracy with up to 6 million fewer leaves.
    Comment: Appeared in 2018 IEEE International Congress on Big Data (BigData Congress). arXiv admin note: text overlap with arXiv:1902.0935
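
    The two steps can be sketched on a single machine as follows. This is a simplified, non-distributed illustration: the CDF is approximated by linear interpolation between the q-quantiles, and the function names, `q = 4`, and `n_sets = 3` are assumptions made for the example rather than details from the paper.

```python
from bisect import bisect_right


def quantile_knots(values, q):
    """Return q + 1 knots: minimum, interior q-quantiles, maximum."""
    ordered = sorted(values)
    last = len(ordered) - 1
    knots = []
    for i in range(q + 1):
        pos = i * last / q
        lo = int(pos)
        hi = min(lo + 1, last)
        # Linear interpolation between neighbouring order statistics.
        knots.append(ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo]))
    return knots


def approx_cdf(x, knots):
    """Step 1: piecewise-linear CDF through the knots, mapping x to [0, 1]."""
    q = len(knots) - 1
    if x <= knots[0]:
        return 0.0
    if x >= knots[-1]:
        return 1.0
    i = bisect_right(knots, x) - 1  # knots[i] <= x < knots[i + 1]
    frac = (x - knots[i]) / (knots[i + 1] - knots[i])
    return (i + frac) / q


def triangular_memberships(u, n_sets):
    """Step 2: equally spaced triangular fuzzy sets on [0, 1] (sum to 1)."""
    step = 1.0 / (n_sets - 1)
    return [max(0.0, 1.0 - abs(u - j * step) / step) for j in range(n_sets)]


values = [v * v for v in range(101)]  # a skewed toy attribute
knots = quantile_knots(values, q=4)
u = approx_cdf(1000, knots)
memberships = triangular_memberships(u, n_sets=3)
print(knots, u, memberships)
```

    Because the partition is built in the transformed space, each fuzzy set covers roughly the same share of training points regardless of how skewed the original attribute is.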

    Improvements on coronal hole detection in SDO/AIA images using supervised classification

    We demonstrate the use of machine learning algorithms in combination with segmentation techniques to distinguish coronal holes and filaments in SDO/AIA EUV images of the Sun. Based on two coronal hole detection techniques (intensity-based thresholding, SPoCA), we prepared data sets of manually labeled coronal hole and filament channel regions present on the Sun during the time range 2011-2013. By mapping the extracted regions from EUV observations onto HMI line-of-sight magnetograms we also include their magnetic characteristics. We computed shape measures from the segmented binary maps as well as first order and second order texture statistics from the segmented regions in the EUV images and magnetograms. These attributes were used for data mining investigations to identify the most performant rule to differentiate between coronal holes and filament channels. We applied several classifiers, namely Support Vector Machine, Linear Support Vector Machine, Decision Tree, and Random Forest, and found that all classification rules achieve good results in general, with linear SVM providing the best performance (with a true skill statistic of ~0.90). Additional information from magnetic field data systematically improves the performance across all four classifiers for the SPoCA detection. Since the calculation is inexpensive in computing time, this approach is well suited for applications on real-time data. This study demonstrates how a machine learning approach may help improve upon an unsupervised feature extraction method.
    Comment: in press for SWS
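
    The true skill statistic quoted above is the true positive rate minus the false positive rate of a binary classifier. A minimal sketch follows; the confusion-matrix counts in the usage lines are invented for illustration and are not results from the study.

```python
def true_skill_statistic(tp, fn, fp, tn):
    """TSS = TP / (TP + FN) - FP / (FP + TN), ranging from -1 to 1."""
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("both classes must be present in the reference labels")
    return tp / (tp + fn) - fp / (fp + tn)


# Illustrative counts only: 100 coronal holes, 100 filament channels.
tss = true_skill_statistic(tp=90, fn=10, fp=5, tn=95)
perfect = true_skill_statistic(tp=10, fn=0, fp=0, tn=10)
print(round(tss, 2), perfect)
```

    Unlike plain accuracy, the statistic does not depend on the ratio of the two class sizes, which helps when the two classes are unevenly represented.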