
    Twelve numerical, symbolic and hybrid supervised classification methods

    Supervised classification has already been the subject of numerous studies in the fields of Statistics, Pattern Recognition and Artificial Intelligence under various names, including discriminant analysis, discrimination and concept learning. Many practical applications relating to this field have been developed. New methods have appeared in recent years, due to developments in Neural Networks and Machine Learning. These "hybrid" approaches share one common factor: they combine symbolic and numerical aspects. The former are characterized by the representation of knowledge, the latter by the introduction of frequencies and probabilistic criteria. In the present study, we present a number of hybrid methods conceived (or improved) by members of the SYMENU research group. These methods derive mainly from Machine Learning and from research on Classification Trees in Statistics, and they may also be described as "rule-based". They are compared with other, more classical approaches. This comparison is based on a detailed description of each of the twelve methods considered, and on the results obtained on the "Waveform Recognition Problem" proposed by Breiman et al., which is difficult for rule-based approaches.
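
The "Waveform Recognition Problem" mentioned in this abstract is a synthetic three-class benchmark. The sketch below generates samples following the construction usually attributed to Breiman et al.: three shifted triangular base waves on 21 points, each class being a random convex combination of two of them plus unit-variance Gaussian noise. The function names (`base_wave`, `sample_waveform`) are illustrative and do not come from the paper, and the exact setup used by the authors is an assumption here.

```python
import random

def base_wave(shift):
    """Triangular wave of height 6 on points i = 1..21, centred at 11 + shift."""
    return [max(6 - abs(i - 11 - shift), 0) for i in range(1, 22)]

# Three base waves: centred at 11, shifted right by 4, shifted left by 4.
H1, H2, H3 = base_wave(0), base_wave(4), base_wave(-4)

# Each class mixes a fixed pair of base waves.
PAIRS = {1: (H1, H2), 2: (H1, H3), 3: (H2, H3)}

def sample_waveform(rng):
    """Draw one (features, label) pair with 21 noisy features."""
    label = rng.choice([1, 2, 3])
    first, second = PAIRS[label]
    u = rng.random()  # mixing weight, uniform on [0, 1)
    features = [u * a + (1 - u) * b + rng.gauss(0, 1)
                for a, b in zip(first, second)]
    return features, label

rng = random.Random(0)  # fixed seed so the sample is reproducible
dataset = [sample_waveform(rng) for _ in range(300)]
```

Because every feature carries noise of the same scale as the signal, axis-parallel rules on single features separate the classes poorly, which is consistent with the abstract's remark that the problem is hard for rule-based methods.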

    Tabular Machine Learning Methods for Predicting Gas Turbine Emissions

    The work presented here received funding from EPSRC (EP/W522089/1) and Siemens Energy Industrial Turbomachinery Ltd. as part of the iCASE EPSRC PhD studentship “Predictive Emission Monitoring Systems for Gas Turbines”. Peer reviewed.

    Intelligence artificielle: Les défis actuels et l'action d'Inria - Livre blanc Inria

    Inria White Paper No. 01. Inria white papers look at major current challenges in informatics and mathematics and show the actions conducted by our project-teams to address these challenges. This document is the first produced by the Strategic Technology Monitoring & Prospective Studies Unit. Thanks to a reactive observation system, this unit plays a lead role in supporting Inria in developing its strategic and scientific orientations. It also enables the institute to anticipate the impact of digital sciences on all social and economic domains. The white paper was coordinated by Bertrand Braunschweig, with contributions from 45 researchers from Inria and from our partners. Special thanks to Peter Sturm for his precise and complete review. Thanks also to the STIP service of the Saclay – Île-de-France centre for the final proofreading of the French version.

    Hybrid Image Classification Technique for Land-Cover Mapping in the Arctic Tundra, North Slope, Alaska

    Remotely sensed image classification techniques are very useful for understanding vegetation patterns and species combinations in the vast and mostly inaccessible arctic region. Previous research on mapping land cover and vegetation in the remote areas of northern Alaska has reported considerably lower accuracies than for other biomes. The unique arctic tundra environment, with its short growing season, cloud cover, low sun angles, and snow and ice cover, hinders the effectiveness of remote sensing studies. The majority of image classification research done in this area, as reported in the literature, used traditional unsupervised clustering techniques with Landsat MSS data. Previous researchers also emphasized that SPOT/HRV-XS data lacked the spectral resolution to identify the small arctic tundra vegetation parcels. Thus, there is a motivation and research need to apply a new classification technique to develop an updated, detailed and accurate vegetation map at a higher spatial resolution, i.e. from SPOT-5 data. Traditional classification techniques in remotely sensed image interpretation are based on spectral reflectance values, with the assumption that the training data are normally distributed. Hence it is difficult to add ancillary data to classification procedures to improve accuracy. The purpose of this dissertation was to develop a hybrid image classification approach that effectively integrates ancillary information into the classification process and combines ISODATA clustering, a rule-based classifier and the Multilayer Perceptron (MLP) classifier, which uses an artificial neural network (ANN). The main goal was to find the best possible combination or sequence of classifiers for classifying tundra-type vegetation, yielding higher accuracy than the existing classified vegetation map from SPOT data. 
Unsupervised ISODATA clustering and rule-based classification techniques were combined to produce an intermediate classified map, which was used as an input to a Multilayer Perceptron (MLP) classifier. The result from the MLP classifier was compared to the previous classified map, and for the pixels where the class allocations disagreed, the class having the higher kappa value was assigned to the pixel in the final classified map. The results were compared to standard classification techniques: a simple unsupervised clustering technique and supervised classification with Feature Analyst. The results indicated higher classification accuracy for the proposed hybrid classification method (75.6%, with a kappa value of 0.6840) than for the standard classification techniques: the unsupervised clustering technique (68.3%, with a kappa value of 0.5904) and supervised classification with Feature Analyst (62.44%, with a kappa value of 0.5418). The results were statistically significant at the 95% confidence level.
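
The accuracies above are reported together with kappa values. As a minimal sketch of how Cohen's kappa is obtained from a confusion matrix, the function below computes observed agreement, chance agreement from the row and column totals, and their normalised difference. The function name and the 2x2 example matrix are hypothetical and are not taken from the dissertation.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix.

    Rows are reference classes, columns are predicted classes.
    """
    n = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical two-class matrix: 85% overall accuracy, 50% chance agreement.
example = [[45, 5],
           [10, 40]]
kappa = cohen_kappa(example)  # approximately 0.7
```

Kappa discounts the agreement that would occur by chance, which is why a method with 75.6% accuracy can still have a kappa well below 0.756.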

    Time series data mining: preprocessing, analysis, segmentation and prediction. Applications

    Currently, the amount of data produced by any information system is increasing exponentially. This motivates the development of automatic techniques to process and mine these data correctly. Specifically, in this Thesis, we tackled these problems for time series data, that is, temporal data which is collected chronologically. This kind of data can be found in many fields of science, such as palaeoclimatology, hydrology and finance. Time series data mining (TSDM) consists of several tasks with different objectives, such as classification, segmentation, clustering, prediction and analysis. In this Thesis, however, we focus on time series preprocessing, segmentation and prediction. Time series preprocessing is a prerequisite for subsequent tasks: for example, the reconstruction of missing values in incomplete parts of time series can be essential for clustering them. In this Thesis, we tackled the problem of massive missing data reconstruction in significant wave height (SWH) time series from the Gulf of Alaska. It is very common for buoys to stop working for different periods, which is usually related to malfunctioning or bad weather conditions. The relation between the time series of each buoy is analysed and exploited to reconstruct the whole missing time series. In this context, evolutionary artificial neural networks (EANNs) with product units (PUs) are trained, showing that the resulting models are simple and able to recover these values with high precision. In the case of time series segmentation, the procedure consists of dividing the time series into different subsequences to achieve different purposes. This segmentation can be done to find useful patterns in the time series. In this Thesis, we have developed novel bioinspired algorithms in this context. For instance, for palaeoclimate data, an initial genetic algorithm (GA) was proposed to discover early warning signals of tipping points (TPs), whose detection was supported by expert opinions. 
However, given that the expert had to individually evaluate every solution given by the algorithm, the evaluation of the results was very tedious. This led to an improvement in the body of the GA to evaluate the procedure automatically. For significant wave height time series, the objective was the detection of groups which contain extreme waves, i.e. those which are relatively large with respect to other waves close in time. The main motivation is to design alert systems. This was done using a hybrid algorithm (HA), in which a local search (LS) process was included by using a likelihood-based segmentation, assuming that the points follow a beta distribution. Finally, the analysis of similarities in different periods of European stock markets was also tackled, with the aim of evaluating the influence of different markets in Europe. Different techniques have been proposed for segmenting time series with the aim of reducing the number of points. However, this remains an open challenge, given the difficulty of operating with large amounts of data in different applications. In this work, we propose a novel statistically-driven coral reef optimisation (CRO) algorithm (SCRO), which automatically adapts its parameters during the evolution, taking into account the statistical distribution of the population fitness. This algorithm improves on the state of the art with respect to accuracy and robustness. This problem has also been tackled using an improvement of the bare-bones particle swarm optimisation (BBPSO) algorithm, which includes a dynamic update of the cognitive and social components during the evolution, combined with mathematical tricks to obtain the fitness of the solutions, which significantly reduces the computational cost of previously proposed coral reef methods. In addition, the optimisation of both objectives (clustering quality and approximation quality), which are in conflict, is an interesting open challenge, which is tackled in this Thesis. 
For that, a multi-objective evolutionary algorithm (MOEA) for time series segmentation is developed, improving the clustering quality of the solutions and their approximation. Prediction in time series is the estimation of future values by observing and studying the previous ones. In this context, we solve this task by applying prediction over high-order representations of the elements of the time series, i.e. the segments obtained by time series segmentation. This is applied to two challenging problems: the prediction of extreme wave height and fog prediction. On the one hand, the number of extreme values in significant wave height (SWH) time series is much smaller than the number of standard values. The prediction of these values therefore cannot be done using standard algorithms without taking into account the imbalance ratio of the dataset. For that, an algorithm that automatically finds the set of segments and then applies evolutionary artificial neural networks (EANNs) is developed, showing the high ability of the algorithm to detect and predict these special events. On the other hand, fog prediction is affected by the same problem, that is, the number of fog events is much lower than that of non-fog events, which also requires special treatment. Preprocessed data coming from sensors situated in different parts of the Valladolid airport are used to build a simple artificial neural network (ANN) model, which is physically corroborated and discussed. The last challenge, which opens new horizons, is the estimation of the statistical distribution of time series to guide different methodologies. For this, the estimation of a mixed distribution for SWH time series is used for fixing the threshold of peaks-over-threshold (POT) approaches. Also, the determination of the best-fitting distribution for the time series is used for discretising it and making a prediction which treats the problem as ordinal classification. The work developed in this Thesis is supported by twelve papers in international journals, seven papers in international conferences, and four papers in national conferences.
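
The abstract describes segmenting a time series to reduce its number of points, with the approximation quality of a candidate segmentation serving as a fitness measure. The sketch below shows one common way to score a set of cut points: approximate the series by straight lines between consecutive cuts and measure the root mean squared error. It is an illustration under that assumption, not the fitness function used in the thesis, and the name `interpolation_error` is hypothetical.

```python
import math

def interpolation_error(series, cut_points):
    """RMSE between a series and its piecewise-linear approximation.

    The approximation joins the series values at consecutive cut points
    with straight lines; the first and last indices are always cuts.
    """
    cuts = sorted(set(cut_points) | {0, len(series) - 1})
    squared = 0.0
    for left, right in zip(cuts, cuts[1:]):
        span = right - left
        slope_rise = series[right] - series[left]
        for t in range(left, right):
            approx = series[left] + slope_rise * (t - left) / span
            squared += (series[t] - approx) ** 2
    # The final point is a cut, so its error is zero by construction.
    return math.sqrt(squared / len(series))

# A triangular series: one cut at the peak reproduces it exactly,
# while no interior cut leaves a large error.
series = [0, 1, 2, 3, 2, 1, 0]
error_with_peak = interpolation_error(series, [3])
error_without = interpolation_error(series, [])
```

An evolutionary or swarm-based search, such as the algorithms named in the abstract, would then look for a small set of cut points that keeps this error low.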