5 research outputs found

Extreme Data Mining: Inference from Small Datasets

Author: Andonie Răzvan
Publication venue: ScholarWorks@CWU
Publication date: 01/09/2010
Neural networks have been applied successfully in many fields. However, satisfactory results can usually be obtained only under large-sample conditions. With small training sets, performance may be poor, or the learning task may not be accomplished at all. This deficiency severely limits the applications of neural networks. The main reason small datasets cannot provide enough information is that there are gaps between samples, and even the domain of the samples cannot be ensured. Several computational intelligence techniques have been proposed to overcome the limits of learning from small datasets. We have the following goals: (i) to discuss the meaning of "small" in the context of inferring from small datasets; (ii) to overview computational intelligence solutions for this problem; (iii) to illustrate the introduced concepts with a real-life application.

Agora University Editing House: Journals

ScholarWorks at Central Washington University

Multi-tier framework for the inferential measurement and data-driven modeling

Author: Rallo Moya Robert
Publication venue: Universitat Rovira i Virgili
Publication date: 01/01/2007
A framework for inferential measurement and data-driven modeling has been proposed and assessed in several real-world application domains. The architecture of the framework is structured in multiple tiers to facilitate extensibility and the integration of new components. Each of the four proposed tiers has been assessed independently to verify its suitability. The first tier, dealing with exploratory data analysis, has been assessed through the characterization of the chemical space related to the biodegradation of organic chemicals. This analysis has established relationships between physicochemical variables and biodegradation rates that have been used for model development. At the preprocessing level, a novel method for feature selection based on dissimilarity measures between Self-Organizing Maps (SOM) has been developed and assessed. The proposed method selects more features than others published in the literature but leads to models with improved predictive power. Single and multiple data imputation techniques based on the SOM have also been used to recover missing data in a wastewater treatment plant benchmark. A new dynamic method to adjust the centers and widths in radial basis function networks has been proposed to predict water quality. The proposed method outperformed other neural networks. The proposed modeling components have also been assessed in the development of prediction and classification models for biodegradation rates in different media. The results obtained proved the suitability of this approach for developing data-driven models when the complex dynamics of the process prevent the formulation of mechanistic models. The use of rule generation algorithms and Bayesian dependency models has been preliminarily screened to provide the framework with interpretation capabilities.
Preliminary results obtained from the classification of Modes of Toxic Action (MOA) indicate that using MOAs as proxy indicators of the human health effects of chemicals could be a promising approach. Finally, the complete framework has been applied to three different modeling scenarios. A virtual sensor system capable of inferring product quality indices from primary process variables has been developed and assessed. The system was integrated with the control system in a real chemical plant, outperforming the multi-linear correlation models usually adopted by chemical manufacturers. A model to predict carcinogenicity from molecular structure for a set of aromatic compounds has been developed and tested. The application of the SOM-dissimilarity feature selection method yielded better results than models published in the literature. Finally, the framework has been used to facilitate a new approach for environmental modeling and risk management within geographical information systems (GIS). The SOM has been successfully used to characterize exposure scenarios and to provide estimations of missing data through geographic interpolation. The combination of SOM and Gaussian mixture models facilitated the formulation of a new probabilistic risk assessment approach.
This thesis proposes, and evaluates in several real applications, a general framework for developing inferential measurement and data-driven modeling systems. The architecture of this framework is organized in several layers that facilitate its extensibility and the integration of new components. Each of the four levels into which the proposed framework is structured has been evaluated independently to verify its functionality. The first level, which deals with exploratory data analysis, has been evaluated through the characterization of the chemical space corresponding to the biodegradation of certain organic compounds. As a result of this analysis, relationships have been established between several physicochemical variables, which were later used to develop biodegradation models. At the data preprocessing level, a new methodology for variable selection based on the use of Self-Organizing Maps (SOM) has been developed and evaluated. Although the proposed method generally selects a larger number of variables than other methods proposed in the literature, the resulting models show better predictive capacity. A set of SOM-based data imputation techniques has also been evaluated on a standard dataset corresponding to the operating parameters of a wastewater treatment plant. A new dynamic model for adjusting the center and dispersion in radial basis function networks is proposed and evaluated on a water quality prediction problem. The proposed method improves on the results obtained with other neural architectures. The proposed modeling components have also been applied to the development of predictive and classification models of the biodegradation rates of organic compounds in different media. The results obtained demonstrate the viability of this approach for developing data-driven models in cases where the complexity of the process dynamics prevents the formulation of mechanistic models. A preliminary study has been carried out on the use of rule generation algorithms and Bayesian dependency graphs to introduce a new layer that facilitates model interpretation. The preliminary results obtained from the classification of Modes of Toxic Action (MOA) suggest that using MOAs as intermediate indicators of the effects of chemical compounds on health is a feasible approach. Finally, the proposed framework has been applied in three different modeling scenarios.
First, a virtual sensor capable of inferring quality indices from primary process variables has been developed and evaluated. The resulting sensor has been implemented in a real chemical plant, improving on the results of the multilinear correlations commonly used. A model to predict the carcinogenic effects of a group of aromatic compounds from their molecular structure has been developed and evaluated. The results obtained after applying the SOM-based variable selection method improve on previously published results. This framework has also been used to provide a new approach to environmental modeling and risk analysis with geographic information systems (GIS). The SOM has been used to characterize exposure scenarios and to develop a new geographic interpolation method. The combination of the SOM with Gaussian mixture models gives a new formulation of the risk analysis problem from a probabilistic point of view.
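
For readers unfamiliar with the "centers and widths" mentioned in this abstract, the following is a minimal sketch of what a Gaussian radial basis function network computes. It is a generic textbook forward pass, not the dynamic adjustment method proposed in the thesis; the function name `rbf_network` and its parameters are hypothetical.

```python
import math

def rbf_network(x, centers, widths, weights, bias=0.0):
    """Output of a generic Gaussian RBF network for one input vector x.

    Each hidden unit has a center (a point in input space) and a width
    (how quickly its response decays with distance from the center).
    This is an illustrative sketch, not the thesis's model.
    """
    total = bias
    for center, width, weight in zip(centers, widths, weights):
        # Squared Euclidean distance between the input and this center.
        dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        # Gaussian response, scaled by the unit's output weight.
        total += weight * math.exp(-dist_sq / (2.0 * width ** 2))
    return total
```

Training an RBF network amounts to choosing the centers, widths, and weights; the abstract's contribution concerns how the first two are adjusted.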

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Tesis Doctorals en Xarxa

Repositori Institucional URV

HIV analysis using computational intelligence

Author: Leke Betechuoh Brain
Publication date: 08/01/2009
In this study, a new method to analyze HIV using a combination of autoencoder networks and genetic algorithms is proposed. The proposed method is tested on a set of demographic properties of individuals obtained from the South African antenatal survey. The autoencoder model is compared with a conventional feedforward neural network model and yields a classification accuracy of 92%, compared to 84% for the conventional feedforward model. The autoencoder model is then used to propose a new method of approximating missing entries in the HIV database using ant colony optimization. This method is able to estimate missing inputs to an accuracy of 80%. The estimated missing input values are then used to analyze HIV. In the presence of missing input values, the autoencoder network classifier yields a classification accuracy of 81%, and the feedforward neural network classifier yields 82%. A control mechanism is proposed to assess the effect of demographic properties on the HIV status of individuals, based on inverse neural networks and on autoencoder networks combined with genetic algorithms. This control mechanism is aimed at understanding whether HIV susceptibility can be controlled by modifying some of the demographic properties. The inverse neural network control model has accuracies of 77% and 82%, while the genetic algorithm model has accuracies of 77% and 92%, for the prediction of the educational level of individuals and gravidity, respectively. HIV modelling using neuro-fuzzy models is then investigated, and rules are extracted, which provide more valuable insight. The classification accuracy obtained by the neuro-fuzzy model is 86%. A rough set approximation is then investigated for rule extraction, and it is found that the rules present simple and understandable relationships describing how the demographic properties affect HIV risk.
The study concludes by investigating a model for automatic relevance determination, to determine which of the demographic properties are important for HIV modelling. HIV classification using the full input data set is compared with classification using only the input parameters selected by the technique. Age of the individual, gravidity, province, region, reported pregnancy and educational level were amongst the input parameters selected as relevant for classification of an individual’s HIV risk. This study thus proposes models which can be used to understand HIV dynamics, and which policy-makers can use to understand more effectively the demographic influences driving HIV infection.
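
The idea of estimating a missing entry by searching for the value that an autoencoder reconstructs best can be sketched as follows. This is only an illustration under loud assumptions: the "autoencoder" is a hypothetical linear stand-in (projection onto a fixed direction `W`), the optimizer is a simple genetic algorithm swapped in for the ant colony optimization the abstract mentions, and all names, bounds, and hyperparameters are invented for the example rather than taken from the thesis.

```python
import random

# ASSUMPTION: stand-in for a trained autoencoder. It projects a record onto
# the direction W, so records lying on that line are reconstructed exactly.
W = [1.0, 2.0, 3.0]
NORM_SQ = sum(w * w for w in W)

def reconstruct(record):
    code = sum(x * w for x, w in zip(record, W)) / NORM_SQ  # encode to 1 value
    return [code * w for w in W]                            # decode

def reconstruction_error(record):
    return sum((x - r) ** 2 for x, r in zip(record, reconstruct(record)))

def fill(record, missing_idx, value):
    filled = list(record)
    filled[missing_idx] = value
    return filled

def estimate_missing(record, missing_idx, bounds=(0.0, 10.0),
                     pop_size=30, generations=60, seed=0):
    """Search for the missing value that minimizes reconstruction error."""
    rng = random.Random(seed)
    lo, hi = bounds

    def fitness(value):
        return reconstruction_error(fill(record, missing_idx, value))

    population = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]   # truncation selection
        children = [population[0]]              # elitism: keep the best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            alpha = rng.random()
            child = alpha * a + (1 - alpha) * b  # blend crossover
            child += rng.gauss(0.0, 0.1)         # Gaussian mutation
            children.append(min(hi, max(lo, child)))
        population = children
    return min(population, key=fitness)
```

For example, `estimate_missing([1.0, 2.0, None], missing_idx=2)` searches for the third entry of a record whose first two entries are known; with this stand-in model the error is minimized when the record lies on the line spanned by `W`, that is, near 3.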

Wits Institutional Repository on DSPACE

Faculty Publications and Creative Works 1998

Author: Office of the Vice President for Research
Publication venue: UNM Digital Repository
Publication date: 01/01/1998
One of the ways in which we recognize our faculty at the University of New Mexico is through Faculty Publications & Creative Works. An annual publication, it highlights our faculty's scholarly and creative activities and achievements and serves as a compendium of UNM faculty efforts during the 1998 calendar year. Faculty Publications & Creative Works strives to illustrate the depth and breadth of research activities performed throughout our University's laboratories, studios and classrooms. We believe that the communication of individual research is a significant method of sharing concepts and thoughts and ultimately inspiring the birth of new ideas. In support of this, UNM faculty during 1998 produced over 2,457 works, including 1,990 scholarly papers and articles, 69 books, 98 book chapters, 119 reviews, 165 creative works and 16 patents. We are proud of the accomplishments of our faculty, which are in part reflected in this book, which illustrates the diversity of intellectual pursuits in support of research and education at the University of New Mexico. Nasir Ahmed, Ph.D., Interim Associate Provost for Research and Dean of Graduate Studies

The impact of knowledge management processes on organizational resilience: data mining as an instrument of measurement.

Author: Frelas Michael
Publication date: 30/04/2017
The aim of the research conducted for this thesis is to test the feasibility of using data mining (DM) to assess the relationship between and the impact of knowledge management (KM) on organizational resilience (OR). The emphasis currently placed on the value of intangible assets by private sector organizations and the recent increase in the use of data mining technologies are the key drivers in this evaluation of the use of data mining tools as an alternative to classical statistics when measuring intangibles. Data was collected using a questionnaire that was sent to the senior executives of a number of mid-sized companies located in the mid-west of the USA. Using Microsoft's SQL Server's Analytical Services (MSSAS) and the data provided by the respondents, five predictive models are built to test the suitability of the MSSAS' DM tool for assessing the relationships between and the impact of KM on OR. Of the five models constructed as part of this research, four classification models (two Naïve Bayes models, one neural network model, and one decision tree model) and one clustering model were found to be suitable tools for capturing the intricate relationships that exist between KM and OR. These models made it possible to evaluate the strengths of the relationships between KM and OR and to identify which KM processes contribute, and to what extent, to OR. In addition, the models enabled the collation of predicted OR scores, based on the responses given in the questionnaire. Finally, this research identifies some of the key challenges associated with using DM as a measurement instrument for assessing the relationship between and the impact of KM on OR. This research makes a number of significant contributions to the existing body of knowledge. It contributes to the understanding of the impact of KM on OR, to the understanding of the methods used to measure such impact and to the processes involved in measuring such impact using DM. 
From a practitioner perspective, this research contributes to the understanding of OR and provides a framework for achieving OR within an organizational context.

Open Access Institutional Repository at Robert Gordon University