100 research outputs found

    Unsupervised spectral sub-feature learning for hyperspectral image classification

    Spectral pixel classification is one of the principal techniques used in hyperspectral image (HSI) analysis. In this article, we propose an unsupervised feature learning method for classification of hyperspectral images. The proposed method learns a dictionary of sub-feature basis representations from the spectral domain, which allows effective use of the correlated spectral data. The learned dictionary is then used in encoding convolutional samples from the hyperspectral input pixels to an expanded but sparse feature space. Expanded hyperspectral feature representations enable linear separation between object classes present in an image. To evaluate the proposed method, we performed experiments on several commonly used HSI data sets acquired at different locations and by different sensors. Our experimental results show that the proposed method outperforms other pixel-wise classification methods that make use of unsupervised feature extraction approaches. Additionally, even though our approach does not use any prior knowledge or labelled training data to learn features, it yields either advantageous or comparable results in terms of classification accuracy with respect to recent semi-supervised methods.

    Machine Learning

    Machine learning can be defined in various ways, all relating to a scientific domain concerned with the design and development of theoretical and implementation tools for building systems with some human-like intelligent behavior. More specifically, machine learning addresses the ability to improve automatically through experience.

    Comparative study of several machine learning algorithms for classification of unifloral honeys

    Unifloral honeys are in high demand among honey consumers, especially in Europe. To ensure that a honey belongs to a highly appreciated botanical class, the classical methodology is palynological analysis to identify and count pollen grains. Highly trained personnel are needed to perform this task, which complicates the characterization of honey botanical origins. Organoleptic assessment of honey by expert personnel helps to confirm such classification. In this study, the ability of different machine learning (ML) algorithms to correctly classify seven types of Spanish honeys of single botanical origins (rosemary, citrus, lavender, sunflower, eucalyptus, heather and forest honeydew) was investigated comparatively. The botanical origin of the samples was ascertained by pollen analysis complemented with organoleptic assessment. Physicochemical parameters such as electrical conductivity, pH, water content, carbohydrates and color of unifloral honeys were used to build the dataset. The following ML algorithms were tested: penalized discriminant analysis (PDA), shrinkage discriminant analysis (SDA), high-dimensional discriminant analysis (HDDA), nearest shrunken centroids (PAM), partial least squares (PLS), C5.0 tree, extremely randomized trees (ET), weighted k-nearest neighbors (KKNN), artificial neural networks (ANN), random forest (RF), support vector machine (SVM) with linear and radial kernels and extreme gradient boosting trees (XGBoost). The ML models were optimized by repeated 10-fold cross-validation primarily on the basis of log loss or accuracy metrics, and their performance was compared on a test set in order to select the best predicting model. Models built using PDA produced the best results in terms of overall accuracy on the test set. ANN, ET, RF and XGBoost models also provided good results, while SVM proved to be the worst.
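
The model-tuning procedure named in this abstract, repeated 10-fold cross-validation scored by log loss, can be illustrated with a minimal standard-library sketch. It is not the authors' code: the function names, the fixed seed, and the choice of three repeats are assumptions made here for illustration, since the abstract does not state the number of repeats or the software used.

```python
import math
import random


def repeated_kfold_indices(n_samples, n_splits=10, n_repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation.

    n_repeats=3 and seed=0 are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh shuffle for every repeat
        for k in range(n_splits):
            test_idx = order[k::n_splits]  # every n_splits-th sample forms fold k
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield train_idx, test_idx


def log_loss(true_labels, predicted_probs, eps=1e-15):
    """Mean negative log-probability that the model assigned to the true class."""
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        p = min(max(probs[label], eps), 1.0 - eps)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(true_labels)
```

In a tuning loop, each candidate model would be fitted on `train_idx`, scored on `test_idx`, and the scores averaged over all `n_splits * n_repeats` folds; the configuration with the lowest mean log loss (or highest accuracy) would then be kept for the final test-set comparison.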

    A Review of Kernel Methods for Feature Extraction in Nonlinear Process Monitoring

    Kernel methods are a class of learning machines for the fast recognition of nonlinear patterns in any data set. In this paper, the applications of kernel methods for feature extraction in industrial process monitoring are systematically reviewed. First, we describe the reasons for using kernel methods and contextualize them among other machine learning tools. Second, by reviewing a total of 230 papers, this work has identified 12 major issues surrounding the use of kernel methods for nonlinear feature extraction. For each issue, we discuss why it is important and how researchers have addressed it over the years. We also present a breakdown of the commonly used kernel functions, parameter selection routes, and case studies. Lastly, this review provides an outlook into the future of kernel-based process monitoring, which can hopefully instigate more advanced yet practical solutions in the process industries.

    3rd Workshop in Symbolic Data Analysis: book of abstracts

    This workshop is the third regular meeting of researchers interested in Symbolic Data Analysis. The main aim of the event is to bring people together and foster the exchange of ideas from different fields - Mathematics, Statistics, Computer Science, Engineering, Economics, among others - that contribute to Symbolic Data Analysis.

    Evaluating Pro-poor Transfers When Targeting is Weak: The Albanian Ndihma Ekonomike Program Revisited

    The Albanian Ndihma Ekonomike is one of the first poverty reduction programs launched in transitional economies. Its record has been judged positively during the recession period of the 1990s and negatively during the more recent growth phase. This paper reconsiders the program using a regression-adjusted matching estimator first suggested by Heckman et al. (1997, 1998) and exploiting discontinuities in program design and targeting failures. We find the program to have a weak targeting capacity and a negative and significant impact on welfare. We also find that recent changes introduced to the program have not improved its performance. An analysis of the distributional impact of treatment based on stochastic dominance theory suggests that our results are robust.
    Keywords: Social assistance, Poverty, Impact Evaluation, Albania

    A STUDY OF REMOTELY SENSED AEROSOL PROPERTIES FROM GROUND-BASED SUN AND SKY SCANNING RADIOMETERS

    Aerosol particles impact human health by degrading air quality and affect climate by heating or cooling the atmosphere. The Indo-Gangetic Plain (IGP) of Northern India, one of the most populous regions in the world, produces and is impacted by a variety of aerosols including pollution, smoke, dust, and mixtures of them. The NASA Aerosol Robotic Network (AERONET) mesoscale distribution of Sun and sky-pointing instruments in India was established to measure aerosol characteristics at sites across the IGP and around Kanpur, India, a large urban and industrial center in the IGP, during the 2008 pre-monsoon (April-June). This study focused on detecting spatial and temporal variability of aerosols, validating satellite retrievals, and classifying the dominant aerosol mixing states and origins. The Kanpur region typically experiences high aerosol loading due to pollution and smoke during the winter and high aerosol loading due to the addition of dust to the pollution and smoke mixture during the pre-monsoon. Aerosol emissions in Kanpur likely contribute up to 20% of the aerosol loading during the pre-monsoon over the IGP. Aerosol absorption also increases significantly downwind of Kanpur, indicating possible black carbon emissions from aerosol sources such as coal-fired power plants and brick kilns. Aerosol retrievals from satellite show a high bias when compared to the mesoscale distributed instruments around Kanpur during the pre-monsoon, with few high-quality retrievals due to imperfect aerosol type and land surface characteristic assumptions. Aerosol type classification using the aerosol absorption, size, and shape properties can identify dominant aerosol mixing states of absorbing dust and black carbon particles.
    Using 19 long-term AERONET sites near various aerosol source regions (Dust, Mixed, Urban/Industrial, and Biomass Burning), aerosol absorption property statistics are expanded upon and show significant differences when compared to previous work. The sensitivity of absorption properties is evaluated and quantified with respect to aerosol retrieval uncertainty. Using clustering analysis, aerosol absorption and size relationships provide a simple method to classify aerosol mixing states and origins and potentially improve aerosol retrievals from ground-based and satellite-based instrumentation.

    Fusing Small-footprint Waveform LiDAR and Hyperspectral Data for Canopy-level Species Classification and Herbaceous Biomass Modeling in Savanna Ecosystems

    The study of ecosystem structure, function, and composition has become increasingly important in order to gain a better understanding of how impacts wrought by natural disturbances, climate, and human activity can alter ecosystem services provided to a population. Research groups at Rochester Institute of Technology and Carnegie Institution for Science are focusing on characterization of savanna ecosystems and are using data from the Carnegie Airborne Observatory (CAO), which integrates advanced imaging spectroscopy and waveform light detection and ranging (wLiDAR) data. This component of the larger ecosystem project has as a goal the fusion of imaging spectroscopy and small-footprint wLiDAR data in order to improve per-species structural parameter estimation towards classification and herbaceous biomass modeling. Waveform LiDAR has proven useful for extracting high vertical resolution structural parameters, while imaging spectroscopy is a well-established tool for species classification and biochemistry assessment. We hypothesize that the two modalities provide complementary information that could improve per-species structural assessment, species classification, and herbaceous biomass modeling when compared to single modality sensing systems. We explored a statistical approach to data fusion at the feature level, which hinged on our ability to reduce structural and spectral data dimensionality to those data features best suited to assessing these complex systems. The species classification approach was based on stepwise discrimination analysis (SDA) and used feature metrics from hyperspectral imagery (HSI) combined with wLiDAR data, which could help in finding correlated features and in turn improve classifiers. It was found that fusing data with the SDA did not improve classification significantly, especially compared to the HSI classification results.
    The overall classification accuracies were 53% for both original and PCA-based wLiDAR variables, 73% for the original HSI variables, 71% for PCA-based HSI variables, 73% for the original fusion of wLiDAR and HSI data set, and 74% for the PCA-based fusion variables. The kappa coefficients achieved with the original and PCA-based wLiDAR variable classifications were 0.41 and 0.44, respectively. For both original and PCA-based HSI classifications, the kappa coefficients were 0.63 and 0.60, respectively, and 0.62 and 0.64 for original and PCA-based fusion variable classifications, respectively. These results show that HSI was more successful in grouping important information in a smaller number of variables than wLiDAR and thus inclusion of structural information did not significantly improve the classification. As for herbaceous biomass modeling, the statistical approach used for the fusion of wLiDAR and HSI was forward selection modeling (FSM), which selects significant independent metrics and models those to measured biomass. The results were measured in R² and RMSE, which indicate similar findings. Waveform LiDAR performed the poorest with an R² of 0.07 for original wLiDAR variables and 0.12 for PCA-based wLiDAR variables. The respective RMSE were 19.99 and 19.41. For both original and PCA-based HSI variables, the results were better with R² of 0.32 and 0.27 and RMSE of 17.27 and 17.80, respectively. For the fusion of original and PCA-based data, the results were comparable to HSI, with R² values of 0.35 and 0.29 and RMSE of 16.88 and 17.59, respectively. These results indicate that small scale wLiDAR may not be able to provide accurate measurement of herbaceous biomass, although other factors could have contributed to the relatively poor results, such as the senescent state of grass by April 2008, the narrow biomass range that was measured, and the low biomass values, i.e., the limited laser-target interactions.
    We concluded that although fusion did not result in significant improvements over single modality approaches in those two use cases, there is a need for further investigation during peak growing season.
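
The abstract reports a kappa coefficient alongside each overall accuracy. As a reminder of how the two relate, here is a minimal sketch of Cohen's kappa computed from a confusion matrix. The function name and the example matrix are hypothetical and are not taken from the study, whose confusion matrices are not given in the abstract.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix.

    Rows are reference classes and columns are predicted classes.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: the fraction of samples on the diagonal (overall accuracy).
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n_classes)) for j in range(n_classes)]
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    return (observed - expected) / (1.0 - expected)


# Hypothetical two-class matrix: overall accuracy is 0.80, chance agreement is 0.50.
example = [[40, 10],
           [10, 40]]
kappa = cohen_kappa(example)
```

Because kappa subtracts the agreement expected by chance, it is lower than the overall accuracy whenever the classifier is imperfect, which is consistent with the pattern of the accuracy and kappa values reported above.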