
    Programming and Undecidability in Complex Systems

    No full text
    A complex system is a system made up of a set of entities that interact locally, giving rise to global, emergent behaviours of the system that cannot be explained from the known local behaviour of its constituent entities. Our work aims to better delineate the links between certain properties of complex systems and computation. By computation we mean the object of study of computer science, that is, the movement and combination of information. Using tools from computer science, we approach algorithmics and programming in complex systems from three points of view. A first form of programming, called external, consists in developing the algorithmics needed to simulate the systems under study. A second form, called internal, consists in developing the algorithmics proper to these systems, which makes it possible to construct representatives of these systems that exhibit programmed behaviours. Finally, a third form, programming by reduction, consists in embedding complex computational properties into representatives of these systems in order to establish undecidability results, a sign of great computational complexity that contributes to explaining the emergent complexity. To carry out this study, we model complex systems by cellular automata. The cellular automaton model offers a duality well suited to establishing links between the complexity of global properties and computation: a cellular automaton can be described both as a network of automata, a point of view familiar in computer science, and as a discrete dynamical system, a function defined on a topological space, a point of view familiar in the study of discrete dynamical systems.

    A first part of our work concerns the study of the cellular automaton object itself. In the literature, experimental observation of cellular automata distinguishes two dominant forms of complex dynamics. Some cellular automata exhibit dynamics in which simple structures emerge: particle-like objects that travel through a regular background, meet in brief collisions, and generate other particles. This form of complexity, through which a notion of localized, interacting quanta of information shows, is the object of our studies. A first field of investigation is to establish an algebraic classification, called bulking, that aims to account for this type of behaviour. This classification brings to light a particular type of cellular automaton: intrinsically universal cellular automata. An intrinsically universal cellular automaton can simulate the behaviour of every cellular automaton. This is our second field of investigation: we characterize this property and prove it undecidable. A third field of investigation concerns the algorithmics of cellular automata with particles and collisions. Given a set of particles and collisions of such a cellular automaton, we study the set of possible interactions and propose tools for better internal programming by means of these collisions.

    A second part of our work concerns programming by reduction. To prove the undecidability of dynamical properties of cellular automata, we study, on the one hand, problems of tiling the plane with finite tile sets and, on the other hand, mortality and periodicity problems in discrete dynamical systems defined by partial functions. This study leads us to consider objects that possess the same duality between combinatorial and topological description as cellular automata. A notion of aperiodicity plays a central role in the undecidability of the properties of these objects.
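
    The "external" form of programming above amounts to writing simulators. As a minimal illustrative sketch (not taken from the thesis), the following Python code simulates an elementary one-dimensional cellular automaton; rule 110 is a standard example whose space-time diagrams show exactly the particle-and-collision dynamics described in the abstract.

```python
# Minimal sketch of "external programming": simulating a 1-D cellular automaton.
import numpy as np

def step(config: np.ndarray, rule: int = 110) -> np.ndarray:
    """One synchronous update of an elementary CA with periodic boundary."""
    left = np.roll(config, 1)
    right = np.roll(config, -1)
    code = 4 * left + 2 * config + right        # encode (l, c, r) as 0..7
    table = np.array([(rule >> i) & 1 for i in range(8)], dtype=np.uint8)
    return table[code]                          # Wolfram rule lookup

# Space-time diagram: one random initial row, 200 updates.
rng = np.random.default_rng(0)
row = rng.integers(0, 2, size=400, dtype=np.uint8)
history = [row]
for _ in range(200):
    row = step(row)
    history.append(row)
diagram = np.array(history)  # particles appear as diagonal stripes in this array
```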

    Protein structure prediction and structure-based protein function annotation

    Get PDF
    Nature tends to modify rather than invent the functions of protein molecules, and the log of these modifications is encrypted in the gene sequence. Analysis of these modification events in evolutionarily related genes is important for assigning function to the hypothetical genes and gene products surging into databases, and for improving our understanding of the bioverse. However, random mutations occurring during evolution chisel the sequence to such an extent that both decrypting these codes and identifying evolutionary relatives from sequence alone become difficult. Thankfully, even after many changes at the sequence level, protein three-dimensional structures are often conserved, and hence protein structural similarity usually provides more clues about the evolution of functionally related proteins. In this dissertation, I study the design of three bioinformatics modules that form a new hierarchical approach for structure prediction and function annotation of proteins based on the sequence-to-structure-to-function paradigm. First, we design an online platform for structure prediction of protein molecules using multiple threading alignments and iterative structural assembly simulations (I-TASSER). I review the components of this module and have added features that provide function annotation for the protein sequences and help combine experimental and biological data to improve structure modeling accuracy. The online service has supported more than 20,000 biologists from over 100 countries. Next, we design a new comparative approach (COFACTOR) to identify the location of ligand-binding sites on these modeled protein structures and spot the functional residue constellations, using an innovative global-to-local structural alignment procedure and functional sites in known protein structures. Based on both large-scale benchmarking and blind tests (CASP), the method demonstrates significant advantages over the state-of-the-art methods of the field in recognizing ligand-binding residues for both metal and non-metal ligands. The major advantage of the method is its optimal combination of local and global protein structural alignments, which helps to recognize functionally conserved structural motifs among proteins that have taken different evolutionary paths. We further extend the COFACTOR global-to-local approach to annotate the gene-ontology and enzyme classifications of protein molecules. Here, we add two new components to COFACTOR. First, we developed a new global structural match algorithm that enables a better structural search. Second, a sensitive technique was proposed for constructing local 3D-signature motifs of template proteins that lack known functional sites, which allows us to perform query-template local structural similarity comparisons with all template proteins. A scoring scheme that combines the confidence score of the structure prediction with the global-local similarity score is used to assign a confidence score to each predicted function. Large-scale benchmarking shows that the predicted functions have remarkably improved precision and recall rates, as well as higher prediction coverage, than state-of-the-art sequence-based methods. To explore the applicability of the method to real-world cases, we applied it to a subset of ORFs from Chlamydia trachomatis, and the functional annotations provided new testable hypotheses for improving the understanding of this phylogenetically distinct bacterium.
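
    Structural comparison of the kind described above builds on superposing two structures and scoring their agreement. As a hedged, minimal sketch of that generic building block only (COFACTOR uses its own global-to-local alignment procedure, not this code), the following computes the optimal rigid-body superposition of two pre-matched coordinate sets with the standard Kabsch algorithm:

```python
# Sketch: RMSD after optimal superposition (Kabsch), on matched C-alpha coordinates.
import numpy as np

def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD of P onto Q after optimal rotation/translation; P, Q are (N, 3)."""
    P = P - P.mean(axis=0)                       # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                                  # covariance of the two sets
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T      # optimal rotation
    P_rot = P @ R.T
    return float(np.sqrt(((P_rot - Q) ** 2).sum() / len(P)))

# Toy usage: a rotated copy of a structure superposes back with ~zero RMSD.
rng = np.random.default_rng(0)
coords = rng.normal(size=(50, 3))
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0],
                [np.sin(theta),  np.cos(theta), 0],
                [0, 0, 1]])
print(kabsch_rmsd(coords @ rot.T, coords))       # ~0.0
```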

    Neural Architecture Search for Genomic Sequence Data

    Get PDF

    The ECB's New Multi-Country Model for the euro area: NMCM - with boundedly rational learning expectations

    Get PDF
    Rational expectations has been the dominant way to model expectations, but the literature has quickly moved to the more realistic assumption of boundedly rational learning, where agents are assumed to use only a limited set of information to form their expectations. A standard assumption is that agents form expectations using the correctly specified reduced-form model of the economy, the minimal state variable (MSV) solution, but do not know its parameters. However, with medium-sized and large models the closed-form MSV solutions are difficult to attain given the large number of variables that could be included, so agents must base expectations on a misspecified MSV solution. In contrast, we assume agents know the deep parameters of their own optimising frameworks. However, they are not assumed to know the structure or the parameterisation of the rest of the economy, nor the stochastic processes generating the shocks hitting the economy. In addition, agents are assumed to know that the changes (or growth rates) of fundamental variables can be modelled as stationary ARMA(p,q) processes, the exact form of which is, however, not known to them. This approach avoids the complexity of dealing with a potentially vast multitude of alternative misspecified MSVs. Using a new Multi-country Euro area Model with Boundedly Estimated Rationality, we show that this approach is compatible with the same limited-information assumption that was used in deriving and estimating the behavioural equations of the different optimising agents. We find strong differences in the adjustment path following shocks to the economy when agents form expectations using our learning approach rather than under the assumption of strong rationality. Furthermore, we find some variation in the effects of expansionary fiscal policy in periods of downturn compared to boom periods. JEL Classification: C51, D83, D84, E17, E32. Keywords: bounded rationality, expectations, heterogeneity, imperfect information, learning, macro modelling, open-economy macroeconomics.
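
    The learning assumption described above reduces, for each agent, to fitting a low-order ARMA model to observed growth rates and forecasting from it. A minimal illustrative sketch (synthetic data and an arbitrarily chosen order; the paper leaves the exact (p,q) unknown to agents):

```python
# Sketch: an agent models a fundamental variable's growth rate as ARMA(p, q)
# and forms expectations from the fitted model.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
growth = rng.normal(0.02, 0.01, size=200)   # stand-in for observed growth rates

# ARMA(p, q) is ARIMA(p, 0, q); the order here is the "agent's" working choice.
fit = ARIMA(growth, order=(1, 0, 1)).fit()
expectation = fit.forecast(steps=4)          # boundedly rational 4-period-ahead path
print(expectation)
```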

    Bayesian hyper-parameter optimisation for malware detection

    Get PDF
    Malware detection is a major security concern and has been the subject of a great deal of research and development. Machine learning is a natural technology for addressing malware detection, and many researchers have investigated its use. However, the performance of machine learning algorithms often depends significantly on parametric choices, so the question arises as to which parameter choices are optimal. In this paper, we investigate how best to tune the parameters of machine learning algorithms, a process generally known as hyper-parameter optimisation, in the context of malware detection. We examine the effects of some simple (model-free) ways of parameter tuning together with a state-of-the-art Bayesian model-building approach. Our work is carried out using Ember, a major published malware benchmark dataset of Windows Portable Executable metadata samples, and a smaller dataset from kaggle.com (also comprising Windows Portable Executable metadata). We demonstrate that optimal parameter choices may differ significantly from default choices, and argue that hyper-parameter optimisation should be adopted as a ‘formal outer loop’ in the research and development of malware detection systems. We also argue that doing so is essential for the development of the discipline, since it facilitates fair comparison of competing machine learning algorithms applied to the malware detection problem.
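
    As a hedged sketch of what such a ‘formal outer loop’ can look like (the paper does not specify this library, model, or search space; the synthetic features stand in for pre-extracted PE metadata), Gaussian-process Bayesian optimisation of a classifier's hyper-parameters with scikit-optimize:

```python
# Sketch: Bayesian hyper-parameter optimisation as an outer loop around a classifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from skopt import gp_minimize
from skopt.space import Integer, Real

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)

def objective(params):
    n_estimators, learning_rate, max_depth = params
    clf = GradientBoostingClassifier(
        n_estimators=n_estimators,
        learning_rate=learning_rate,
        max_depth=max_depth,
        random_state=0,
    )
    # Minimise negative AUC estimated by cross-validation.
    return -cross_val_score(clf, X, y, cv=3, scoring="roc_auc").mean()

space = [Integer(50, 400), Real(0.01, 0.3, prior="log-uniform"), Integer(2, 8)]
result = gp_minimize(objective, space, n_calls=25, random_state=0)
print("best hyper-parameters:", result.x, "best CV AUC:", -result.fun)
```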

    Development of a WEB application

    Get PDF
    Ankara: The Department of Molecular Biology and Genetics and the Graduate School of Engineering and Science of Bilkent University, 2011. Thesis (Ph.D.), Bilkent University, 2011. Includes bibliographical references (leaves 99-115).

    microRNAs, small non-coding RNA molecules with important roles in the cellular machinery, target mRNAs for silencing by binding, generally to their 3’ UTR sequences, via partial base complementation. Thus, microRNAs with similar sequences might also exhibit expression and/or functional similarities. In this study, a modular tool, mESAdb (http://konulab.fen.bilkent.edu.tr/mirna/), was developed to allow multivariate analysis of the sequences and expression of microRNAs from multiple taxa. Its framework comprises PHP, JavaScript, packages in the R language, and a database storing mature microRNA sequences along with microRNA targets and selected expression data sets for human, mouse and zebrafish. mESAdb allows for: (i) mining of microRNA expression data sets for subsets of microRNAs selected manually or by a sequence motif; (ii) pair-wise multivariate analysis of expression data sets within and between taxa; and (iii) association of microRNA subsets with the annotation databases HuGE Navigator, KEGG and GO. mESAdb also permits upload of user-specified datasets for these analyses. Herein, the utility of mESAdb was illustrated using different datasets and case studies. First, it was shown that microRNAs carrying the embryonic stem cell specific seed sequence ‘AAGTGC’ were able to discriminate between normal and tumor tissues from hepatocellular carcinoma patients using dataset GSE10694. Second, mRNA targets of a set of liver-specific microRNAs were annotated with human diseases based on HuGE Navigator. Third, the similarity between mouse and human tissue specificity of a given set of microRNAs was demonstrated. Fourth, CHRNA5-targeting microRNAs were associated with estrogen receptor status in breast cancer using dataset GSE15885. Finally, a related tool under development for mRNA arrays, planned for integration with mESAdb, was presented.

    Kaya, Koray Doğan, Ph.D.
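
    Selecting a microRNA subset by sequence motif, as in feature (i) above, amounts to filtering mature sequences for a seed string. A minimal sketch (the sequences below are illustrative stand-ins, not records from the actual mESAdb database):

```python
# Sketch: motif-based selection of a microRNA subset before expression mining.
mirnas = {
    "hsa-miR-302a-3p": "UAAGUGCUUCCAUGUUUUGGUGA",
    "hsa-miR-21-5p":   "UAGCUUAUCAGACUGAUGUUGA",
    "hsa-miR-372-3p":  "AAAGUGCUGCGACAUUUGAGCGU",
}

motif = "AAGUGC"  # RNA form of the DNA seed 'AAGTGC'
subset = {name: seq for name, seq in mirnas.items() if motif in seq}
print(sorted(subset))  # downstream analysis would be restricted to this subset
```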

    Stochastic Derivative-Free Optimization of Noisy Functions

    Get PDF
    Optimization problems with numerical noise arise from the growing use of computer simulation of complex systems. This thesis concerns the development, analysis and application of randomized derivative-free optimization (DFO) algorithms for noisy functions. The first contribution is the introduction of DFO-VASP, an algorithm for finding the optimal volumetric alignment of protein structures. Our method compensates for noisy, variable-time volume evaluations and warm-starts the search for the globally optimal superposition. These techniques enable DFO-VASP to generate practical and accurate superpositions in a timely manner. The second algorithm, STARS, is aimed at general noisy optimization problems; it employs a random-search framework while dynamically adjusting the smoothing step-size using noise information. A convergence-rate analysis of this algorithm is provided in both additive and multiplicative noise settings. STARS outperforms randomized zero-order methods in both settings and has the advantage of being insensitive to the noise level in terms of the number of function evaluations and the final objective value. The third contribution is STORM, a trust-region model-based algorithm that relies on constructing random models and estimates that are sufficiently accurate with high probability. This algorithm is shown to converge with probability one. Numerical experiments show that STORM outperforms other stochastic DFO methods in solving noisy functions.
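
    For intuition about the family of methods STARS belongs to, here is a hedged sketch of a plain Gaussian-smoothed random-search step on a noisy quadratic: two noisy evaluations along a random direction yield a gradient-like estimate. STARS itself additionally derives the smoothing step-size from noise estimates, which this sketch fixes by hand:

```python
# Sketch: a randomized zeroth-order method for a noisy function (additive noise).
import numpy as np

def noisy_f(x, rng, sigma=0.01):
    return float(x @ x) + rng.normal(0.0, sigma)   # true optimum at x = 0

rng = np.random.default_rng(0)
x = np.ones(10)
mu, step = 1e-2, 1e-2          # smoothing and step sizes (adaptive in STARS)
for _ in range(2000):
    u = rng.standard_normal(x.size)                # random search direction
    g = (noisy_f(x + mu * u, rng) - noisy_f(x, rng)) / mu * u  # directional estimate
    x = x - step * g                               # gradient-free descent step
print("final objective ~", noisy_f(x, rng))
```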

    Visualisation Support for Biological Bayesian Network Inference

    Get PDF
    Extracting valuable information from the visualisation of biological data and turning it into a network model is the main challenge addressed in this thesis. Biological networks are mathematical models that describe biological entities as nodes and their relationships as edges. Because they describe patterns of relationships, networks can show how a biological system works as a whole. However, network inference is a challenging optimisation problem that cannot be solved computationally in polynomial time. Therefore, computational biologists (i.e. modellers) combine clustering and heuristic search algorithms with their tacit knowledge to infer networks. Visualisation can play an important role in supporting them in their network inference workflow. The main research question is: “How can visualisation support modellers in their workflow to infer networks from biological data?” To answer this question, it was necessary to collaborate with computational biologists to understand the challenges in their workflow and to form research questions. Following the nested model methodology helped to characterise the domain problem, abstract data and tasks, design effective visualisation components and implement efficient algorithms. Those steps correspond to the four levels of the nested model for collaborating with domain experts to design effective visualisations. We found that visualisation can support modellers in three steps of their workflow: (a) selecting variables, (b) inferring a consensus network and (c) incorporating information about its dynamics.

    To select variables (a), modellers first apply a hierarchical clustering algorithm, which produces a dendrogram (i.e. a tree structure). They then select a similarity threshold (height) at which to cut the tree, so that branches correspond to clusters. However, applying a single-height similarity threshold is not effective for clustering heterogeneous multidimensional data, because clusters may exist at different heights (see the sketch after this abstract for the single-threshold baseline). The research question is: Q1 “How to provide visual support for the effective hierarchical clustering of many multidimensional variables?” To answer this question, MLCut, a novel visualisation tool, was developed to enable the application of multiple similarity thresholds. Users can interact with a representation of the dendrogram, which is coordinated with a view of the original multidimensional data, select branches of the tree at different heights and explore different clustering scenarios. Using MLCut in two case studies has shown that this method provides transparency in the clustering process and enables the effective allocation of variables into clusters.

    The selected variables and clusters constitute nodes in the inferred network. In the second step (b), modellers apply heuristic search algorithms that sample a solution space consisting of all possible networks. The result of each execution of the algorithm is a collection of high-scoring Bayesian networks. The task is to guide the heuristic search and help construct a consensus network. However, this is challenging because different executions of the algorithm produce many network results with different scores. The research question is: Q2 “How to support the visual analysis of heuristic search results, to infer representative models for biological systems?” BayesPiles, a novel interactive visual analytics tool, was developed and evaluated in three case studies to support modellers in exploring, combining and comparing results, in understanding the structure of the solution space and in constructing a consensus network.

    As part of the third step (c), when the biological data contain measurements over time, heuristics can also infer information about the dynamics of the interactions, encoded as different types of edges in the inferred networks. However, representing such multivariate networks is a challenging visualisation problem. The research question is: Q3 “How to effectively represent information related to the dynamics of biological systems, encoded in the edges of inferred networks?” To help modellers explore their results and to answer Q3, a human-centred crowdsourcing experiment was conducted to evaluate the effectiveness of four visual encodings of multiple edge types in matrices. The design of the tested encodings combines three visual variables: position, orientation and colour. The study showed that orientation outperforms position and that colour is helpful in most tasks. The results informed an extension to the design of BayesPiles, which modellers evaluated by exploring dynamic Bayesian networks. The feedback of most participants confirmed the results of the crowdsourcing experiment.

    This thesis focuses on the investigation, design and application of visualisation approaches for gaining insights from biological data to infer network models. It shows how visualisation can help modellers in their workflow to select variables, to construct representative network models and to explore their different types of interactions, contributing to a better understanding of how biological processes within living organisms work.
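
    As referenced above, the single-threshold baseline that MLCut improves on can be sketched in a few lines with SciPy: one global height cuts the whole dendrogram, whereas MLCut lets users cut different branches at different heights. The data here are synthetic stand-ins:

```python
# Sketch: hierarchical clustering with a single-height dendrogram cut (the baseline).
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

rng = np.random.default_rng(0)
data = rng.normal(size=(40, 12))             # stand-in multidimensional variables

Z = linkage(data, method="average", metric="euclidean")
labels = fcluster(Z, t=4.5, criterion="distance")   # one global height threshold
print("clusters at a single height:", len(set(labels)))

tree = dendrogram(Z, no_plot=True)           # the tree MLCut lets users cut per branch
```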

    Analyzing Granger causality in climate data with time series classification methods

    Get PDF
    Attribution studies in climate science aim to scientifically ascertain the influence of climatic variations on natural or anthropogenic factors. Many of these studies adopt the concept of Granger causality to infer statistical cause-effect relationships, utilizing traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite comprising a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures; substantial differences are observed among the methods tested.
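
    The autoregressive baseline these studies build on is the classical bivariate Granger test. A hedged sketch with statsmodels, on synthetic stand-ins for a climate forcing and a lagged vegetation response (not the paper's data or pipeline):

```python
# Sketch: bivariate Granger-causality test with autoregressive models.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
n = 300
driver = rng.standard_normal(n)                               # e.g. a climatic forcing
response = np.roll(driver, 2) + 0.5 * rng.standard_normal(n)  # lag-2 effect + noise

# Column convention: test whether the SECOND column Granger-causes the first.
data = np.column_stack([response, driver])
results = grangercausalitytests(data, maxlag=4)
# Small p-values at lag 2 indicate 'driver Granger-causes response' here.
```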