37 research outputs found

    Neural network controller against environment: A coevolutive approach to generalize robot navigation behavior

    In this paper, a new coevolutive method, called Uniform Coevolution, is introduced to learn the weights of a neural network controller in autonomous robots. An evolutionary strategy is used to learn high-performance reactive behavior for navigation and collision avoidance. Adding coevolution to the evolutionary strategy allows the environment to evolve as well, so that the controller learns a general behavior able to solve the problem in different environments. With a traditional evolutionary strategy, without coevolution, the learning process obtains a specialized behavior. All the behaviors obtained, with and without coevolution, have been tested in a set of environments, and the generalization capability of each learned behavior is shown. A simulator based on the Khepera mini-robot has been used to learn each behavior. The results show that Uniform Coevolution obtains better generalized solutions to example-based problems.

    Assessing the utility of mutual information stored in protein-protein interfaces to infer specific protein partners

    Doctoral thesis, Universidade de Brasília, Instituto de Ciências Biológicas, Departamento de Biologia Celular, Programa de Pós-Graduação em Biologia Molecular, 2021. Funding: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES). Proteins are essential for several cellular processes. Hence, one of the central objectives in Biology is to understand the relationships between sequence, structure and function of these macromolecules. In this context, marks left by the coevolutionary process in interacting protein sequences are an important source of structural information. In fact, statistical correlations between amino acid sites in protein sequences are at the basis of state-of-the-art methods for prediction of inter- and intra-protein contacts, template-free structure prediction, identification of functional sites and specificity-determining residues, and inference of interacting paralogs, among other applications. In line with that, the present work conveys a set of theoretical results on how specific protein partners can be recovered based on sequence information alone. In the first chapter, a decomposition of the mutual information (MI) present in protein-protein complexes is carried out, considering the hypothesis that MI in proteins originates from a combination of coevolutive, evolutive and stochastic sources. It was observed that the interface contains, on average per contact, more information than the rest of the protein complex, a result that holds when considering either Shannon or Tsallis MI as the measure of information. This observation led to the conclusion that the interface contains the strongest information signal for distinguishing the correct set of protein partners in interacting protein families. Building on that, the utility of using MI encoded in protein-protein interfaces to recover the correct set of protein partners is assessed in the second chapter. A genetic algorithm (GA) was developed to explore the space of possible concatenations between a pair of interacting protein families, using the interface MI as the objective function. Using the GA, interface MI maximization was performed for 26 different pairs of interacting protein families, and it was observed that optimized concatenations corresponded to degenerate solutions with two distinct error sources, arising from mismatches among (i) similar and (ii) non-similar sequences. When mistakes made among similar sequences were disregarded, type-(i) solutions were found to resolve correct pairings at best true positive (TP) rates of 70%, far above the same estimates for type-(ii) solutions. These results hold when the optimizations are based on Tsallis MI. These findings raise further questions about the mechanisms behind the coevolution of protein partners and help rationalize literature data showing a sharp deterioration of TP rates with an increasing number of sequences in MI-based approaches.
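    The abstract above relies on the mutual information between amino acid sites of paired sequences. As a rough illustration only, the sketch below computes the plain Shannon MI (in bits) between two alignment columns from raw frequency counts. The function name `column_mutual_information` and the toy columns are assumptions made for this example; the thesis's own estimator, any pseudocounts or sequence weighting, and its Tsallis variant are not reproduced here.

```python
from collections import Counter
from math import log2


def column_mutual_information(col_a, col_b):
    """Shannon MI (bits) between two equally long alignment columns.

    col_a[i] and col_b[i] are the residues found at the two sites in the
    i-th paired (concatenated) sequence. Hypothetical helper, not taken
    from the thesis.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must contain the same number of sequences")
    n = len(col_a)
    if n == 0:
        raise ValueError("columns must not be empty")

    count_a = Counter(col_a)                 # marginal counts at site A
    count_b = Counter(col_b)                 # marginal counts at site B
    count_ab = Counter(zip(col_a, col_b))    # joint counts over pairs

    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a,b) / (p(a) p(b)) simplifies to n_ab * n / (n_a * n_b)
        mi += (n_ab / n) * log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


if __name__ == "__main__":
    # Perfectly covarying sites versus statistically independent sites.
    print(column_mutual_information("AACC", "GGTT"))
    print(column_mutual_information("AACC", "GTGT"))
```

    Reordering one column relative to the other corresponds to changing which sequences are paired, which is why a pairing (concatenation) can be scored by the MI it produces.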

    Cooperative coevolution of artificial neural network ensembles for pattern classification

    This paper presents a cooperative coevolutive approach for designing neural network ensembles. Cooperative coevolution is a recent paradigm in evolutionary computation that allows the effective modeling of cooperative environments. Although, theoretically, a single neural network with a sufficient number of neurons in the hidden layer would suffice to solve any problem, in practice many real-world problems are so hard that constructing an appropriate network to solve them is not feasible. In such problems, neural network ensembles are a successful alternative. Nevertheless, the design of neural network ensembles is a complex task. In this paper, we propose a general framework for designing neural network ensembles by means of cooperative coevolution. The proposed model has two main objectives: first, to improve the combination of the trained individual networks; second, to evolve these networks cooperatively, encouraging collaboration among them instead of training each network separately. To favor cooperation among the networks, each network is evaluated throughout the evolutionary process using a multiobjective method. For each network, different objectives are defined, considering not only its performance on the given problem but also its cooperation with the rest of the networks. In addition, a population of ensembles is evolved, improving the combination of networks and obtaining subsets of networks that form ensembles performing better than the combination of all the evolved networks. The proposed model is applied to ten real-world classification problems of very different natures from the UCI machine learning repository and the Proben1 benchmark set. In all of them, the model performs better than standard ensembles in terms of generalization error. Moreover, the obtained ensembles are also smaller.

    Artificial intelligence (AI) methods in optical networks: A comprehensive survey

    Artificial intelligence (AI) is an extensive scientific discipline which enables computer systems to solve problems by emulating complex biological processes such as learning, reasoning and self-correction. This paper presents a comprehensive review of the application of AI techniques for improving the performance of optical communication systems and networks. The use of AI-based techniques is first studied in applications related to optical transmission, ranging from the characterization and operation of network components to performance monitoring, mitigation of nonlinearities, and quality of transmission estimation. Then, applications related to optical network control and management are reviewed, including topics like optical network planning and operation in both transport and access networks. Finally, the paper presents a summary of opportunities and challenges in optical networking where AI is expected to play a key role in the near future. Funding: Ministerio de Economía, Industria y Competitividad (Project EC2014-53071-C3-2-P, TEC2015-71932-REDT

    The emergence of self-organisation in social systems: the case of the geographic industrial clusters

    The objective of this work is to use complexity theory to propose a new interpretation of industrial clusters. Industrial clusters constitute a specific type of econosphere, whose driving principles are self-organisation, economies of diversity and a configuration that optimises the exploration of diversity starting from the configuration of connectivity of the system. This work shows the centrality of diversity by linking complexity theory (intended as "a method for understanding diversity") to different concepts such as power law distributions, self-organisation, autocatalytic cycles and connectivity. I propose a method to distinguish self-organising from non-self-organising agglomerations, based on the correlation between self-organising dynamics and power law network theories. Self-organised criticality, the rank-size rule and scale-free network theories become three aspects indicating a common underlying pattern, i.e. the edge-of-chaos dynamic. I propose a general model of development of industrial clusters, based on the mutual interaction between social and economic autocatalytic cycles. Starting from Kauffman's idea on the autocatalytic properties of diversity, I illustrate how the loops of the economies of diversity are based on the expansion of systemic diversity (the product of diversity and connectivity). My thesis provides a way to measure systemic diversity. In particular, I introduce the distinction between modular innovation at the agent level and architectural innovation at the network level, and show that the cluster constitutes an appropriate organisational form to manage the tension and dynamics of simultaneous modular and architectural innovation. The thesis is structured around two propositions: 1. Self-organising systems are closer to a power law than hierarchical systems or aggregates (collections of parts). For industrial agglomerations (SLLs), the closeness to a power law is related to the degree of self-organisation present in the agglomeration, and emerges in the agglomeration's structural and/or behavioural properties subject to self-organising dynamics. 2. Self-organising systems maximise the product of diversity times connectivity at a rate higher than hierarchical systems.

    Functional characterization of single amino acid variants

    Single amino acid variants (SAVs) are one of the main causes of Mendelian disorders, and play an important role in the development of many complex diseases. At the same time, they are the most common kind of variation affecting coding DNA, generally without any damaging effect. With the advent of next-generation sequencing technologies, the detection of these variants in patients and the general population is easier than ever, but the characterization of the functional effects of each variant remains an open challenge. Our objective in this work is to tackle this problem by developing machine-learning-based in silico SAV pathology predictors. Taking the classic PMut predictor as a starting point, we have rethought the entire supervised learning pipeline, elaborating new training sets, features and classifiers. PMut2017 is the first result of these efforts, a new general-purpose predictor based on SwissVar and trained on 12 different conservation scores. Its performance, evaluated both by cross-validation and by different blind tests, was in line with the best predictors published to date. Continuing our search for more accurate predictors, especially for those cases where general predictors tend to fail, we developed PMut-S, a suite of 215 protein-specific predictors. Similar to PMut in nature, PMut-S introduced the use of co-evolution conservation features and balanced training sets, and showed improved performance (0.1 points of Matthews correlation coefficient above PMut2017), especially for those proteins that were more commonly misclassified by PMut. Comparing PMut-S to other specific predictors, we showed that it is possible to train specific predictors using a single automated pipeline and match the results of most gene-specific predictors released to date. The implementation of the machine learning pipeline of both PMut and PMut-S was released as an open-source Python module, PyMut, which bundles functions for feature computation and selection, classifier training and evaluation, and plotting, among others. Their predictions were also made available in a rich web portal, which includes a precomputed repository with analyses of more than 700 million variants on over 100,000 human proteins, together with relevant contextual information such as 3D visualizations of protein structures, links to databases, functional annotations, and more.
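    The abstract above reports a predictor improvement in points of the Matthews correlation coefficient (MCC). For readers unfamiliar with the metric, the following is a minimal sketch of the standard MCC formula for a binary confusion matrix. The function name `mcc` and the example counts are illustrative assumptions; they are not PyMut code and not results from the thesis.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns a value in [-1, 1]; 0.0 is returned by convention when any
    marginal total is zero and the coefficient is undefined.
    """
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Made-up counts: a perfect classifier and a chance-level one.
    print(mcc(tp=50, tn=50, fp=0, fn=0))
    print(mcc(tp=25, tn=25, fp=25, fn=25))
```

    Unlike plain accuracy, MCC uses all four cells of the confusion matrix, which makes it a common choice when pathological and neutral variants are not equally represented.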

    Logic-based machine learning using a bounded hypothesis space: the lattice structure, refinement operators and a genetic algorithm approach

    The rich representation inherited from computational logic makes logic-based machine learning a competent method for application domains involving relational background knowledge and structured data. There is, however, a trade-off between the expressive power of the representation and the computational costs. Inductive Logic Programming (ILP) systems employ different kinds of biases and heuristics to cope with the complexity of the search, which is otherwise intractable. Searching the hypothesis space bounded below by a bottom clause is the basis of several state-of-the-art ILP systems (e.g. Progol and Aleph). However, the structure of the search space and the properties of the refinement operators for these systems have not been previously characterised. The contributions of this thesis can be summarised as follows: (i) characterising the properties, structure and morphisms of the bounded subsumption lattice; (ii) analysis of bounded refinement operators and stochastic refinement; and (iii) implementation and empirical evaluation of stochastic search algorithms, in particular a Genetic Algorithm (GA) approach, for bounded subsumption. In this thesis we introduce the concept of bounded subsumption and study the lattice and cover structure of bounded subsumption. We show the morphisms between the lattice of bounded subsumption, an atomic lattice and the lattice of partitions. We also show that ideal refinement operators exist for bounded subsumption and that, by contrast with general subsumption, efficient least and minimal generalisation operators can be designed for bounded subsumption. We also show how refinement operators can be adapted for a stochastic search and give an analysis of refinement operators within the framework of stochastic refinement search. We also discuss genetic search for learning first-order clauses and describe a framework for genetic and stochastic refinement search for bounded subsumption. Finally, ILP algorithms and implementations which are based on this framework are described and evaluated.

    A Novel Cooperative Algorithm for Clustering Large Databases With Sampling.

    Data clustering is a recurring task in data mining. Over time, clustering ever-larger databases has become increasingly important. However, applying traditional clustering heuristics to large databases is not an easy task. These techniques generally have complexity at least quadratic in the number of points in the database, which makes them impractical because of either long response times or the low quality of the final solution. The most common way to address clustering in large databases is to use special algorithms that are weaker in terms of solution quality. This work proposes a different approach: using traditional, stronger algorithms on a subset of the original data. This subset is obtained with a coevolutionary algorithm that selects a subset of points that is hard to cluster.