171 research outputs found

    Instantaneous failure mode remaining useful life estimation using non-uniformly sampled measurements from a reciprocating compressor valve failure

    One of the major targets in industry is minimisation of downtime and cost, and maximisation of availability and safety, with maintenance considered a key aspect in achieving this objective. The concept of Condition Based Maintenance and Prognostics and Health Management (CBM/PHM), which is founded on the principles of diagnostics and prognostics, is a step in this direction as it offers a proactive means for scheduling maintenance. Reciprocating compressors are vital components in the oil and gas industry, though their maintenance cost is known to be relatively high. Compressor valves are the weakest part, being the most frequently failing component and accounting for almost half the maintenance cost. To date, there has been limited information on estimating the Remaining Useful Life (RUL) of reciprocating compressors in the open literature. This paper compares the prognostic performance of several methods (multiple linear regression, polynomial regression, Self-Organising Map (SOM), K-Nearest Neighbours Regression (KNNR)), in relation to their accuracy and precision, using actual valve failure data captured from an operating industrial compressor. The SOM technique is employed for the first time as a standalone tool for RUL estimation. Furthermore, two variations on estimating RUL, based on SOM and KNNR respectively, are proposed. Finally, an ensemble method combining the output of all the aforementioned algorithms is proposed and tested. Principal components analysis and statistical process control were implemented to create T^2 and Q metrics, which were proposed as health indicators reflecting the degradation process and were employed for direct RUL estimation for the first time. It was shown that even when the RUL is relatively short due to the instantaneous nature of the failure mode, it is feasible to obtain good RUL estimates using the proposed techniques.
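    As a rough illustration of the health indicators described above, the sketch below derives Hotelling's T^2 and the Q (squared prediction error) statistics from a PCA model fitted on healthy baseline data. The synthetic data, the number of retained components, and all variable names are assumptions for illustration, not the paper's actual setup.

```python
# Sketch: PCA-based T^2 and Q health indicators (illustrative, not the
# authors' pipeline; synthetic data and component count are assumptions).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_healthy = rng.normal(size=(200, 10))    # baseline (healthy) sensor data
X_test = rng.normal(size=(50, 10)) + 0.5  # data from a degrading machine

scaler = StandardScaler().fit(X_healthy)
pca = PCA(n_components=3).fit(scaler.transform(X_healthy))

def t2_and_q(X):
    """Hotelling's T^2 in the retained subspace and Q (squared
    reconstruction error) in the residual subspace."""
    Z = scaler.transform(X)
    scores = pca.transform(Z)                     # projections on the PCs
    t2 = np.sum(scores**2 / pca.explained_variance_, axis=1)
    residual = Z - pca.inverse_transform(scores)  # part the PCs miss
    q = np.sum(residual**2, axis=1)
    return t2, q

t2, q = t2_and_q(X_test)
print(t2[:5], q[:5])  # rising T^2/Q values indicate degradation
```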

    Cross-validation based K nearest neighbor imputation for software quality datasets: An empirical study

    Being able to predict software quality is essential, but it also poses significant challenges in software engineering. Historical software project datasets are often utilized together with various machine learning algorithms for fault-proneness classification. Unfortunately, missing values in datasets have a negative impact on estimation accuracy and could therefore lead to inconsistent results. As a method for handling missing data, K nearest neighbor (KNN) imputation has gradually gained acceptance in empirical studies thanks to its exemplary performance and simplicity. To date, researchers still call for optimized parameter settings for KNN imputation to further improve its performance. In this work, we develop a novel incomplete-instance-based KNN imputation technique, which utilizes a cross-validation scheme to optimize the parameters for each missing value. An experimental assessment is conducted on eight quality datasets under various missingness scenarios. The study also compares the proposed imputation approach with mean imputation and three other KNN imputation approaches. The results show that our proposed approach is superior to the others in general. The relatively optimal fixed parameter settings for KNN imputation on software quality data are also determined. It is observed that classification accuracy is improved, or at least maintained, by using our approach for missing data imputation.
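    The following sketch illustrates the general idea of selecting a KNN imputation parameter by cross-validation: known entries are masked, imputed with several candidate k values, and the k with the lowest reconstruction error is kept. It uses scikit-learn's KNNImputer and a single global k, a simplification of the paper's per-missing-value scheme; the data and candidate values are assumptions.

```python
# Sketch: choosing k for KNN imputation by masking known entries and
# measuring reconstruction error (a coarse stand-in for the paper's
# per-missing-value cross-validation scheme).
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.1] = np.nan   # 10% missingness

def choose_k(X_obs, candidates=(1, 3, 5, 7), mask_frac=0.1):
    observed = ~np.isnan(X_obs)
    # hide a fraction of the observed entries as a validation set
    hide = observed & (rng.random(X_obs.shape) < mask_frac)
    X_val = X_obs.copy()
    X_val[hide] = np.nan
    best_k, best_err = None, np.inf
    for k in candidates:
        X_hat = KNNImputer(n_neighbors=k).fit_transform(X_val)
        err = np.mean((X_hat[hide] - X_obs[hide]) ** 2)
        if err < best_err:
            best_k, best_err = k, err
    return best_k

k = choose_k(X_missing)
X_imputed = KNNImputer(n_neighbors=k).fit_transform(X_missing)
```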

    Intelligent lighting: a machine learning perspective


    Condition Monitoring of Wind Turbines Using Intelligent Machine Learning Techniques

    Wind turbine condition monitoring can detect anomalies in turbine performance which have the potential to result in unexpected failure and financial loss. This study examines common Supervisory Control And Data Acquisition (SCADA) data over a period of 20 months for 21 pitch-regulated 2.3 MW turbines and is presented in three manuscripts. First, power curve monitoring is targeted by applying various types of Artificial Neural Networks to increase modeling accuracy. It is shown how the proposed method can significantly improve network reliability compared with existing models. Then, an advanced technique is utilized to create a smoother dataset for network training, followed by establishing a dynamic ANFIS network; at this stage, the designed network aims to predict power generation in future hours. Finally, a recursive principal component analysis is performed to extract significant features to be used as input parameters of the network. A novel fusion technique is then employed to build an advanced model that predicts turbine performance with favorably low errors.
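    A minimal sketch of the power-curve monitoring idea: fit a neural network to SCADA-style wind speed and power data, then flag samples whose residuals exceed a simple threshold. The synthetic power curve, network size, and 3-sigma rule are illustrative assumptions, not the study's models.

```python
# Sketch: neural-network power-curve model with residual-based anomaly
# flagging (synthetic 2.3 MW curve; all parameters are assumptions).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
wind = rng.uniform(3, 25, size=(1000, 1))               # wind speed (m/s)
power = np.clip(2300 * (wind[:, 0] - 3) / 9, 0, 2300)   # idealized curve (kW)
power += rng.normal(0, 30, size=1000)                   # measurement noise

model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                     random_state=0).fit(wind, power)

residuals = power - model.predict(wind)
threshold = 3 * residuals.std()                         # simple 3-sigma rule
anomalies = np.abs(residuals) > threshold
print(f"{anomalies.sum()} suspect samples out of {len(power)}")
```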

    Enhancing extremist data classification through textual analysis

    The high volume of extremist material on the Internet has created the need for intelligence gathering via the Web and real-time monitoring of potential websites for evidence of extremist activities. However, manual classification of such content is practically difficult and time-consuming. In response to this challenge, the work reported here developed several classification frameworks. Each framework provides a basis of text representation before being fed into a machine learning algorithm. The bases of text representation are Sentiment-rule, Posit textual analysis with word-level features, and an extension of Posit analysis, known as Extended-Posit, which adopts character-level as well as word-level data. Identifying gaps in the aforementioned techniques created avenues for further improvement, most especially in handling larger datasets with better classification accuracy. Consequently, a novel basis of text representation known as the Composite-based method was developed. This is a computational framework that explores the combination of both sentiment and syntactic features of the textual contents of a Web page. Subsequently, these techniques are applied to a dataset that had been subjected to a manual classification process, and then fed into a machine learning algorithm to generate a measure of how well each page can be classified into its appropriate class. The classifiers considered are both neural networks (RNN and MLP) and machine learning classifiers (such as J48, Random Forest and KNN). In addition, feature selection and model optimisation were evaluated to assess the cost of creating a machine learning model. Considering all the results obtained from each framework, composite features are preferable to solely syntactic or sentiment features, offering improved classification accuracy when used with machine learning algorithms. Furthermore, the extension of Posit analysis to include both word- and character-level data outperformed word-level features alone when applied to the assembled textual data. Moreover, the Random Forest classifier outperformed the other classifiers explored. Taking cost into account, feature selection improves classification accuracy and saves time better than hyperparameter tuning (model optimisation).
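    The sketch below illustrates the composite idea on toy data: word-level and character-level TF-IDF features are concatenated with a crude lexicon-based sentiment score and fed to a Random Forest. The tiny lexicon and example documents are assumptions for illustration; the study's Posit and sentiment analyses are far richer.

```python
# Sketch: composite (syntactic + sentiment) features for text classification.
# The lexicon and documents are toy placeholders, not the study's features.
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier

docs = ["peaceful community outreach event", "violent call to attack enemies"]
labels = [0, 1]

word_vec = TfidfVectorizer(analyzer="word")                # word-level terms
char_vec = TfidfVectorizer(analyzer="char", ngram_range=(2, 4))  # char n-grams

NEGATIVE = {"violent", "attack", "enemies"}                # toy lexicon
def sentiment_score(text):
    tokens = text.lower().split()
    return sum(t in NEGATIVE for t in tokens) / max(len(tokens), 1)

X_word = word_vec.fit_transform(docs)
X_char = char_vec.fit_transform(docs)
X_sent = csr_matrix([[sentiment_score(d)] for d in docs])
X = hstack([X_word, X_char, X_sent])                       # composite matrix

clf = RandomForestClassifier(random_state=0).fit(X, labels)
print(clf.predict(X))
```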

    On Knowledge Discovery Experimented with Otoneurological Data

    Diagnosis of otoneurological diseases can be challenging due to similar and overlapping symptoms that can also vary over time. Thus, systems to support and aid the diagnosis of vertiginous patients are considered beneficial. This study continues the refinement of the otoneurological decision support system ONE and its knowledge base. The aim of the study is to improve the classification accuracy of nine otoneurological diseases in real-world situations by applying machine learning methods to knowledge discovery in the otoneurological domain. The dissertation is divided into three phases: fitness value formation for attribute values, attribute weighting, and classification task redefinition. The first phase concentrates on updating the knowledge of ONE with the domain experts and on the knowledge discovery method that forms the fitness values for the values of the attributes. The knowledge base of ONE needed updating due to changes made to the data collection questionnaire. The effect of machine-learnt fitness values on classification is examined, and classification results are compared to the knowledge set by the experts and their combinations. The classification performance of the nearest pattern method of ONE is compared to the k-nearest neighbour method (k-NN) and Naïve Bayes (NB). The second phase concentrates on attribute weighting. The Scatter method and the instance-based learning algorithms IB4 and IB1w are applied to attribute weighting. These machine-learnt attribute weights, in addition to the weights defined by the domain experts and equal weighting, are tested with the classification method of ONE and attribute-weighted k-NN with One-vs-All classifiers (wk-NN OVA). A genetic algorithm (GA) approach is examined for attribute weighting, with the machine-learnt weight sets utilized as a starting point for the GA. Populations (the weight sets) are evaluated with the classification method of ONE, the wk-NN OVA, and attribute-weighted k-NN using neighbour's class-based attribute weighting (cwk-NN). In the third phase, the effect of classification task redefinition is examined: the multi-class classification task is separated into several binary classification tasks, which are studied without attribute weighting using k-NN and support vector machines (SVM).
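    A minimal sketch of the attribute-weighted k-NN idea used throughout the second phase: each attribute's contribution to the distance is scaled by a weight. The weights below are arbitrary placeholders standing in for those learnt by Scatter, IB4/IB1w, or the GA, and the data is synthetic.

```python
# Sketch: attribute-weighted k-NN with majority voting. The weight vector is
# a placeholder for machine-learnt weights (Scatter, IB4/IB1w, GA).
import numpy as np

def weighted_knn_predict(X_train, y_train, x, weights, k=3):
    # weighted Euclidean distance to every training instance
    d = np.sqrt(((X_train - x) ** 2 * weights).sum(axis=1))
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]       # majority vote among k neighbours

rng = np.random.default_rng(3)
X_train = rng.normal(size=(60, 4))
y_train = rng.integers(0, 3, size=60)      # three example disease classes
weights = np.array([1.0, 0.5, 2.0, 0.1])   # hypothetical learnt weights
print(weighted_knn_predict(X_train, y_train, rng.normal(size=4), weights))
```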

    Reciprocating compressor prognostics of an instantaneous failure mode utilising temperature only measurements

    Reciprocating compressors are critical components in the oil and gas sector, though their maintenance cost is known to be relatively high. Compressor valves are the weakest component, being the most frequent failure mode and accounting for almost half the maintenance cost. One of the major targets in industry is minimisation of downtime and cost, while maximising the availability and safety of a machine, with maintenance considered a key aspect in achieving this objective. The concept of Condition Based Maintenance and Prognostics and Health Management (CBM/PHM), which is founded on the diagnostics and prognostics principles, is a step in this direction as it offers a proactive means for scheduling maintenance. Despite the fact that diagnostics is an established area for reciprocating compressors, to date there is limited information in the open literature regarding prognostics, especially given that failures can be instantaneous in nature. This work presents an analysis of the prognostic performance of several methods (multiple linear regression, polynomial regression, K-Nearest Neighbours Regression (KNNR)), in relation to their accuracy and variability, using actual temperature-only valve failure data, an instantaneous failure mode, from an operating industrial compressor. Furthermore, a variation for Remaining Useful Life (RUL) estimation based on KNNR is proposed, along with an ensemble technique merging the results of all the aforementioned methods. Prior to analysis, principal components analysis and statistical process control were employed to create T^2 and Q metrics, which were proposed as health indicators reflecting the degradation process of the valve failure mode and are used for direct RUL estimation for the first time. Results demonstrated that even when the RUL is relatively short due to the instantaneous nature of the failure mode, it is feasible to obtain good RUL estimates using the proposed techniques.
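    As a rough sketch of direct RUL estimation with KNNR, the example below trains on (health indicator, remaining life) pairs from one synthetic run-to-failure sequence and queries it with new indicator readings. The linear degradation trend and all parameters are illustrative assumptions, not the paper's data or settings.

```python
# Sketch: direct RUL estimation with K-Nearest Neighbours Regression on a
# synthetic run-to-failure sequence (the degradation trend is assumed).
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(4)
n = 300                                         # samples until failure
t = np.arange(n)
health = 0.01 * t + rng.normal(0, 0.2, n)       # e.g. a rising T^2/Q indicator
rul = (n - 1) - t                               # remaining samples to failure

knnr = KNeighborsRegressor(n_neighbors=5).fit(health.reshape(-1, 1), rul)
print(knnr.predict([[1.5], [2.5]]))             # RUL estimates for new readings
```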

    Towards Cognizant Hearing Aids: Modeling of Content, Affect and Attention


    Machine Learning Approaches for Natural Resource Data

    Real-life applications involving efficient management of natural resources depend on accurate geographical information. This information is usually obtained by manual on-site data collection, by automatic remote sensing methods, or by a mixture of the two. Natural resource management, besides accurate data collection, also requires detailed analysis of this data, which in the era of data flood can be a cumbersome process. With the rising trend in both computational power and storage capacity, together with falling hardware prices, data-driven decision analysis plays an ever greater role. In this thesis, we examine the predictability of terrain trafficability conditions and forest attributes by using a machine learning approach with geographic information system data. Quantitative measures of the prediction performance of terrain conditions using natural resource datasets are given for five distinct research areas located around Finland. Furthermore, the estimation capability of key forest attributes is inspected with a multitude of modeling and feature selection techniques. The research results provide empirical evidence on whether the natural resource data used is sufficiently accurate for practical applications, or whether further refinement of the data is needed. The results are important especially to the forest industry, since even slight improvements to the natural resource datasets utilized in practice can result in large savings in terms of operation time and costs. Model evaluation is also addressed in this thesis by proposing a novel method for estimating the prediction performance of spatial models. Classical goodness-of-fit measures usually rely on the assumption of independently and identically distributed data samples, a characteristic which normally does not hold for spatial datasets. Spatio-temporal datasets contain an intrinsic property called spatial autocorrelation, which is partly responsible for breaking these assumptions. The proposed cross-validation-based evaluation method provides model performance estimation in which the optimistic bias due to spatial autocorrelation is decreased by partitioning the datasets in a suitable way. Keywords: open natural resource data, machine learning, model evaluation
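    The sketch below illustrates spatially blocked cross-validation, one way to partition datasets "in a suitable way" so that autocorrelated neighbours do not straddle training and test folds. The clustering-based blocking, the model, and the synthetic data are assumptions; the thesis' exact partitioning scheme may differ.

```python
# Sketch: spatially blocked cross-validation. Samples are clustered by
# location into blocks, and whole blocks are held out per fold, reducing
# the optimistic bias that spatial autocorrelation gives a random split.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)
coords = rng.uniform(0, 100, size=(500, 2))      # sample locations (x, y)
X = rng.normal(size=(500, 6))                    # e.g. remote-sensing features
y = X[:, 0] + 0.1 * coords[:, 0] + rng.normal(0, 0.5, 500)

blocks = KMeans(n_clusters=10, random_state=0, n_init=10).fit_predict(coords)
scores = cross_val_score(RandomForestRegressor(random_state=0), X, y,
                         groups=blocks, cv=GroupKFold(n_splits=5))
print(scores.mean())   # less optimistic than a random split on spatial data
```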

    Multi-tier framework for the inferential measurement and data-driven modeling

    A framework for inferential measurement and data-driven modeling has been proposed and assessed in several real-world application domains. The architecture of the framework is structured in multiple tiers to facilitate extensibility and the integration of new components. Each of the proposed four tiers has been assessed in an uncoupled way to verify its suitability. The first tier, dealing with exploratory data analysis, has been assessed with the characterization of the chemical space related to the biodegradation of organic chemicals. This analysis established relationships between physicochemical variables and biodegradation rates that were then used for model development. At the preprocessing level, a novel method for feature selection based on dissimilarity measures between Self-Organizing Maps (SOM) has been developed and assessed. The proposed method selects more features than others published in the literature, but leads to models with improved predictive power. Single and multiple data imputation techniques based on the SOM have also been used to recover missing data in a Waste Water Treatment Plant benchmark. A new dynamic method to adjust the centers and widths in Radial Basis Function networks has been proposed to predict water quality, outperforming other neural networks. The proposed modeling components have also been assessed in the development of prediction and classification models for biodegradation rates in different media. The results obtained proved the suitability of this approach for developing data-driven models when the complex dynamics of the process prevent the formulation of mechanistic models. The use of rule generation algorithms and Bayesian dependency models has been preliminarily screened to provide the framework with interpretation capabilities. Preliminary results obtained from the classification of Modes of Toxic Action (MOA) indicate that this could be a promising approach to using MOAs as proxy indicators of the human health effects of chemicals. Finally, the complete framework has been applied to three different modeling scenarios. A virtual sensor system, capable of inferring product quality indices from primary process variables, has been developed and assessed; the system was integrated with the control system in a real chemical plant, outperforming the multi-linear correlation models usually adopted by chemical manufacturers. A model to predict carcinogenicity from molecular structure for a set of aromatic compounds has been developed and tested; results obtained after applying the SOM-dissimilarity feature selection method were better than those of models published in the literature. Finally, the framework has been used to facilitate a new approach to environmental modeling and risk management within geographical information systems (GIS). The SOM has been successfully used to characterize exposure scenarios and to provide estimations of missing data through geographic interpolation. The combination of SOM and Gaussian Mixture models facilitated the formulation of a new probabilistic risk assessment approach.
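    A minimal sketch of SOM-based imputation in the spirit described above: a small map is trained on complete rows, and each incomplete row is filled from its best-matching unit, with matching restricted to the observed dimensions. The grid size, training schedule, and data are illustrative assumptions, and neighbourhood updates are omitted for brevity.

```python
# Sketch: SOM-style imputation. A simplified map (winner-only updates) is
# trained on complete rows; missing entries are filled from the best-matching
# unit found over the observed dimensions. All parameters are assumptions.
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 4))
X_miss = X.copy()
X_miss[rng.random(X.shape) < 0.1] = np.nan       # inject 10% missingness

complete = X_miss[~np.isnan(X_miss).any(axis=1)]
codebook = complete[rng.choice(len(complete), 25)].copy()  # 5x5 grid, flattened

for epoch in range(20):                          # crude online training
    lr = 0.5 * (1 - epoch / 20)                  # decaying learning rate
    for x in complete:
        bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))
        codebook[bmu] += lr * (x - codebook[bmu])
        # (a full SOM also pulls grid neighbours; omitted for brevity)

def impute_row(row):
    obs = ~np.isnan(row)
    bmu = np.argmin(((codebook[:, obs] - row[obs]) ** 2).sum(axis=1))
    filled = row.copy()
    filled[~obs] = codebook[bmu, ~obs]           # take missing values from BMU
    return filled

X_imputed = np.array([impute_row(r) for r in X_miss])
```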