9 research outputs found

    Heterogeneous Kohonen networks

    A large number of practical problems involve elements that are described by a mixture of qualitative and quantitative information, and whose description is possibly incomplete. The self-organizing map is an effective tool for the visualization of high-dimensional continuous data. In this work, we extend the network and its training algorithm to cope with heterogeneous information as well as missing values. The classification performance on a collection of benchmark data sets is compared across different configurations. Various visualization methods are suggested to help users interpret post-training results.
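    One standard way to let a SOM tolerate missing values, plausibly what "cope with missing values" amounts to here, is to mask unobserved components during both matching and updating. The sketch below (NumPy, illustrative names, handling of qualitative attributes omitted) is a minimal version of that idea, not the authors' exact algorithm:

```python
import numpy as np

def best_matching_unit(weights, x):
    """Index of the closest codebook vector, ignoring NaN components of x."""
    mask = ~np.isnan(x)                              # observed dimensions only
    diffs = weights[:, mask] - x[mask]
    return int(np.argmin((diffs ** 2).sum(axis=1)))

def som_update(weights, coords, x, lr=0.1, sigma=1.0):
    """One online SOM step; missing components leave the weights untouched."""
    mask = ~np.isnan(x)
    bmu = best_matching_unit(weights, x)
    d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)   # grid distances to BMU
    h = np.exp(-d2 / (2.0 * sigma ** 2))             # neighbourhood kernel
    weights[:, mask] += lr * h[:, None] * (x[mask] - weights[:, mask])

# toy usage: a 3x3 map over 4-dimensional data, one component missing
rng = np.random.default_rng(0)
coords = np.array([(i, j) for i in range(3) for j in range(3)], dtype=float)
weights = rng.random((9, 4))
som_update(weights, coords, np.array([0.2, np.nan, 0.7, 0.1]))
```

    Because only observed dimensions contribute to the distance and the update, incomplete samples still pull the map without inventing values for the gaps.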

    Non-Direct Encoding Method Based on Cellular Automata to Design Neural Network Architectures

    Architecture design is a fundamental step in the successful application of feedforward neural networks. In most cases, a large number of network architectures suitable for solving a given problem exist, and architecture design is, unfortunately, still a human expert's job: it depends heavily on the expert and on a tedious trial-and-error process. In recent years, many works have focused on the automatic design of neural network architectures, most of them based on evolutionary computation paradigms. Some of these methods use direct representations of the parameters of the network. Such representations do not scale: representing large architectures requires very large structures. More interesting alternatives are indirect schemes, which encode a compact representation of the neural network. In this work, an indirect constructive encoding scheme is proposed. The scheme is based on cellular automata and is inspired by the idea that only a few seeds in the initial configuration of a cellular automaton can produce a wide variety of feedforward neural network architectures. The cellular approach is experimentally validated in different domains and compared with a direct codification scheme.
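    As a hedged illustration of the seed idea (the CA rule and the decoding from configurations to architectures below are invented placeholders, not the scheme evaluated in the paper), a few seed cells can be grown with a simple cellular automaton and the final configuration read off as hidden-layer sizes:

```python
import numpy as np

def evolve_ca(seeds, size=16, steps=5):
    """Grow a binary 2-D CA from a few seed cells using the parity
    ('replicator') rule: a cell lives iff it has an odd number of live
    von Neumann neighbours. The rule choice is illustrative only."""
    grid = np.zeros((size, size), dtype=int)
    for r, c in seeds:
        grid[r, c] = 1
    for _ in range(steps):
        n = (np.roll(grid, 1, 0) + np.roll(grid, -1, 0) +
             np.roll(grid, 1, 1) + np.roll(grid, -1, 1))
        grid = n % 2
    return grid

def grid_to_layers(grid, n_in, n_out):
    """Hypothetical decoding: each row containing live cells becomes a
    hidden layer whose size is the number of live cells in that row."""
    hidden = [int(s) for s in grid.sum(axis=1) if s > 0]
    return [n_in] + hidden + [n_out]

print(grid_to_layers(evolve_ca([(4, 4), (9, 11)]), n_in=8, n_out=2))
```

    The appeal of the indirect scheme is visible even in this toy: two seed coordinates determine an entire architecture, so the genome stays small however large the decoded network grows.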

    Rejection-oriented learning without complete class information

    Machine learning is commonly used to support decision-making in numerous, diverse contexts. Its usefulness in this regard is unquestionable: there are complex systems built on top of machine learning techniques whose descriptive and predictive capabilities go far beyond those of human beings. However, these systems still have limitations, whose analysis makes it possible to estimate their applicability and confidence in various cases. This matters because abstaining from providing a response is preferable to making a mistake. In the context of classification-like tasks, the indication of such an inconclusive output is called rejection. The research that culminated in this thesis led to the conception, implementation, and evaluation of rejection-oriented learning systems for two distinct tasks: open-set recognition and data-stream clustering. These systems were derived from the WiSARD artificial neural network, which had rejection modelling incorporated into its functioning. This text details and discusses those realizations. It also presents experimental results that allow assessing the scientific and practical importance of the proposed state-of-the-art methodology.
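    A minimal sketch of rejection on a WiSARD-style weightless classifier is shown below. The discriminator structure follows the standard WiSARD design (permuted input bits split into tuples, each addressing a RAM node), but the rejection rule, reject when even the best class activates too few RAM nodes, is a simplified stand-in for the thesis's rejection modelling:

```python
import numpy as np

class Discriminator:
    """WiSARD-style discriminator: the input bit string is permuted and
    split into fixed-size tuples, each tuple addressing one RAM node."""
    def __init__(self, n_bits, tuple_size, rng):
        self.order = rng.permutation(n_bits)
        self.t = tuple_size
        self.rams = [set() for _ in range(n_bits // tuple_size)]

    def _addresses(self, x):
        bits = np.asarray(x)[self.order]
        return [tuple(bits[i * self.t:(i + 1) * self.t])
                for i in range(len(self.rams))]

    def train(self, x):
        for ram, addr in zip(self.rams, self._addresses(x)):
            ram.add(addr)

    def response(self, x):
        hits = sum(a in r for r, a in zip(self.rams, self._addresses(x)))
        return hits / len(self.rams)          # fraction of matching RAM nodes

def classify(discriminators, x, threshold=0.7):
    """Return the best class, or None (reject) when no class responds enough."""
    scores = {c: d.response(x) for c, d in discriminators.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

# toy usage: one training pattern per class; an unfamiliar input may reject
rng = np.random.default_rng(1)
d = {c: Discriminator(8, 2, rng) for c in ("a", "b")}
d["a"].train([1, 1, 1, 1, 0, 0, 0, 0])
d["b"].train([0, 0, 0, 0, 1, 1, 1, 1])
print(classify(d, [1, 1, 1, 1, 0, 0, 0, 0]))   # "a"
print(classify(d, [1, 0, 1, 0, 1, 0, 1, 0]))   # may be None (rejected)
```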

    On the development of decision-making systems based on fuzzy models to assess water quality in rivers

    There are many situations where a linguistic description of complex phenomena allows better assessments. It is well known that the assessment of water quality still depends heavily upon subjective judgment and interpretation, despite the huge datasets available nowadays. In that sense, the aim of this study has been to introduce intelligent linguistic operations for analyzing databases and producing self-interpretable water quality indicators that tolerate both imprecision and linguistic uncertainty. Such imprecision typically reflects the ambiguity of human thinking when perceptions need to be expressed. Environmental management concepts such as "water quality", "level of risk", or "ecological status" are ideally dealt with as linguistic variables. In the present thesis, the flexibility of computing with words offered by fuzzy logic has been applied to these management issues.

    Firstly, a multipurpose hierarchical water quality index has been designed with fuzzy reasoning. It integrates a wide set of indicators including organic pollution, nutrients, pathogens, physicochemical macro-variables, and priority micro-contaminants. The relative importance of the water quality indicators has been handled with the analytic hierarchy process, a decision-aiding method.

    Secondly, a methodology based on a hybrid approach that combines fuzzy inference systems and artificial neural networks has been used to classify the ecological status of surface waters according to the Water Framework Directive. This methodology has made it possible to deal efficiently with the non-linearity and subjective nature of the variables involved in this classification problem. The complexity of the inference systems, the appropriate choice of linguistic rules, and the influence of the functions that transform numerical variables into linguistic variables have been studied.

    Thirdly, a concurrent neuro-fuzzy model based on screening-level ecological risk assessment has been developed. It considers the presence of hazardous substances in rivers and incorporates an innovative ranking and scoring system, based on a self-organizing map, to account for the likely ecological hazards posed by the presence of chemical substances in freshwater ecosystems. Hazard factors are combined with environmental concentrations within fuzzy inference systems to compute ecological risk potentials under linguistic uncertainty. The estimation of ecological risk potentials allows identifying those substances that require stricter controls and further rigorous risk assessment. Likewise, the aggregation of ecological risk potentials by means of empirical cumulative distribution functions has allowed estimating changes in water quality over time. The neuro-fuzzy approach has been validated by comparison with biological monitoring.

    Finally, a hierarchical fuzzy inference system for sediment-based ecological risk assessment has been designed. The study centered on sediments, since they produce findings complementary to water quality analysis, especially when temporal trends are required. Results from chemical and eco-toxicological analyses are used as inputs to two parallel inference systems that assess levels of contamination and toxicity, respectively. The outputs of both inference engines are then treated in a third inference engine that provides the final risk characterization, where the risk is expressed in linguistic terms with respective degrees of certainty. Inputs to the risk system are the levels of potentially toxic substances, mainly metals and chlorinated organic compounds, and the toxicity measured with a screening test that uses the photo-luminescent bacterium Vibrio fischeri. The Ebro river basin has been selected as the case study, although the methodologies explained here can easily be applied to other rivers. In conclusion, this study has broadly demonstrated that the design of water quality indexes based on fuzzy logic emerges as a suitable alternative tool to support decision makers involved in effective, sustainable river basin management plans.
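    The flavour of such a fuzzy index can be conveyed with a toy Mamdani-style example. The variables, membership functions, and two rules below are invented placeholders, far simpler than the hierarchical index described above:

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def water_quality(do_mgl, nh4_mgl):
    """Toy Mamdani-style index from dissolved oxygen and ammonium."""
    # fuzzify the two inputs
    do_low, do_high = tri(do_mgl, 0, 2, 6), tri(do_mgl, 4, 8, 12)
    nh4_low, nh4_high = tri(nh4_mgl, -0.5, 0, 1), tri(nh4_mgl, 0.5, 2, 4)

    # two rules: min for AND, max for OR
    r_good = min(do_high, nh4_low)       # high DO AND low NH4 -> good
    r_poor = max(do_low, nh4_high)       # low DO OR high NH4 -> poor

    # clip the output sets, aggregate, defuzzify by centroid
    q = np.linspace(0, 100, 201)
    agg = np.maximum(np.minimum(r_good, tri(q, 50, 100, 150)),
                     np.minimum(r_poor, tri(q, -50, 0, 50)))
    return float((q * agg).sum() / agg.sum()) if agg.sum() > 0 else 50.0

print(round(water_quality(do_mgl=9.0, nh4_mgl=0.1), 1))   # high score = good
```

    The output stays interpretable by construction: each rule is a readable linguistic statement, and the final score is traceable back to the rules that fired.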

    Heuristic methods for support vector machines with applications to drug discovery.

    The contributions to computer science presented in this thesis were inspired by the analysis of the data generated in the early stages of drug discovery. These data sets are generated by screening compounds against various biological receptors, which gives a first indication of biological activity. To avoid screening inactive compounds, decision rules for selecting compounds are required. Such a decision rule is a mapping from a compound representation to an estimated activity. Hand-coding such rules is time-consuming, expensive, and subjective. An alternative is to learn these rules from the available data. This is difficult since the compounds may be characterized by tens to thousands of physical, chemical, and structural descriptors, and it is not known which are most relevant to the prediction of biological activity. Further, the activity measurements are noisy, so the data can be misleading. The support vector machine (SVM) is a statistically well-founded learning machine that is not adversely affected by high-dimensional representations and is robust with respect to measurement inaccuracies. It thus appears ideally suited to the analysis of screening data. The novel application of the SVM to this domain highlights some shortcomings of the vanilla SVM. Three heuristics are developed to overcome these deficiencies: a stopping criterion, HERMES, that allows good solutions to be found in less time; an automated method, LAIKA, for tuning the Gaussian kernel SVM; and an algorithm, STAR, that outputs a more compact solution. These heuristics achieve their aims on public-domain data and are broadly successful when applied to the drug discovery data. The heuristics and associated data analysis are thus of benefit to both pharmacology and computer science.
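    Since LAIKA automates Gaussian (RBF) kernel tuning, it is useful to recall the generic baseline such a method competes with. The median heuristic below is a standard default from the kernel-methods literature, explicitly not LAIKA itself:

```python
import numpy as np

def median_heuristic_gamma(X):
    """Generic RBF width default: gamma = 1 / (2 * median(d)^2), where d
    ranges over pairwise Euclidean distances. A baseline, not LAIKA."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    med = np.sqrt(np.median(d2[np.triu_indices_from(d2, k=1)]))
    return 1.0 / (2.0 * med ** 2)

def rbf_kernel(X, Y, gamma):
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||x_i - y_j||^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

# toy usage: 50 hypothetical compounds described by 10 descriptors each
X = np.random.default_rng(2).random((50, 10))
K = rbf_kernel(X, X, median_heuristic_gamma(X))
```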

    Complexity and modeling power of insertion-deletion systems

    The central objects of this thesis are insertion-deletion systems and their computational power. More specifically, we study language-generating models that use two string-rewriting operations, contextual insertion and contextual deletion, and their extensions. We also consider a distributed variant of insertion-deletion systems in which the rules are separated among a finite number of nodes of a graph; such systems are referred to as graph-controlled systems. These systems appear in many areas of computer science and play an important role in formal languages, linguistics, and bio-informatics. We vary the parameters of the size vector of insertion-deletion systems and study the decidability/universality of the obtained models. More precisely, we answer the most important question regarding the expressiveness of the computational model: whether it is Turing-equivalent or not. We systematically approach the questions about the minimal sizes of insertion-deletion systems with and without graph control.
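    To fix intuition about the operations themselves, the sketch below applies contextual insertion rules (u, x, v), which insert x between contexts u and v, and contextual deletion rules, which erase an occurrence of x found between u and v. The toy rule set is illustrative and unrelated to the size bounds studied in the thesis:

```python
def apply_rules(word, ins_rules, del_rules):
    """All words reachable from `word` in one derivation step."""
    out = set()
    for u, x, v in ins_rules:            # insert x between contexts u and v
        for i in range(len(word) + 1):
            if word[:i].endswith(u) and word[i:].startswith(v):
                out.add(word[:i] + x + word[i:])
    for u, x, v in del_rules:            # delete x occurring between u and v
        for i in range(len(word) - len(x) + 1):
            if (word[i:i + len(x)] == x and word[:i].endswith(u)
                    and word[i + len(x):].startswith(v)):
                out.add(word[:i] + word[i + len(x):])
    return out

# toy system: freely insert "ab" (empty contexts), starting from "ab";
# this generates balanced words where every prefix has #a >= #b
lang, frontier = {"ab"}, {"ab"}
for _ in range(3):
    frontier = {w2 for w in frontier
                for w2 in apply_rules(w, [("", "ab", "")], [])}
    lang |= frontier
print(sorted(lang, key=len)[:6])
```

    The size vector the thesis varies bounds exactly the quantities visible here: the lengths of the inserted and deleted strings x and of their left and right contexts u and v.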