22 research outputs found

    Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources

    BACKGROUND: In order to improve gene prediction, extrinsic evidence on the gene structure can be collected from various sources of information such as genome-genome comparisons and EST and protein alignments. However, such evidence is often incomplete and unreliable, and is usually not sufficient to recover the complete structure of all genes. Extrinsic evidence is therefore most valuable when it is balanced with sequence-intrinsic evidence. RESULTS: We present a fairly general method for integrating external information. Our method is based on the evaluation of hints to potentially protein-coding regions by means of a Generalized Hidden Markov Model (GHMM) that takes both intrinsic and extrinsic information into account. We used this method to extend the ab initio gene prediction program AUGUSTUS into a versatile tool that we call AUGUSTUS+. In this study, we focus on hints derived from matches to an EST or protein database, but our approach can be used to include arbitrary user-defined hints. Our method is only moderately affected by the length of a database match, and it exploits the information that can be derived from the absence of such matches. As a special case, AUGUSTUS+ can predict genes under user-defined constraints, e.g. if the positions of certain exons are known. With hints from EST and protein databases, our new approach correctly predicted 89% of the exons in human chromosome 22. CONCLUSION: Sensitive probabilistic modeling of extrinsic evidence such as sequence database matches can increase gene prediction accuracy. A match of a sequence interval to an EST or protein sequence should be treated as compound information rather than as information about individual positions
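    The hint idea described above can be caricatured in a few lines of code: a standard Viterbi decoder whose per-position scores receive a log-space bonus wherever an external hint supports a state. This is a deliberately minimal sketch with a made-up bonus scheme and a plain HMM, not the generalized HMM used by AUGUSTUS+:

```python
import math

def viterbi_with_hints(obs, states, log_trans, log_emit, hints, bonus=1.0):
    """Viterbi decoding in which any position backed by an extrinsic hint
    gets a log-score bonus for the hinted state. `hints` maps position ->
    hinted state; the additive bonus scheme is a made-up simplification."""
    # initialize with the first observation
    V = [{s: log_emit[s][obs[0]] + (bonus if hints.get(0) == s else 0.0)
          for s in states}]
    back = [{s: None for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev, best = max(((p, V[t - 1][p] + log_trans[p][s]) for p in states),
                             key=lambda x: x[1])
            V[t][s] = best + log_emit[s][obs[t]] + (bonus if hints.get(t) == s else 0.0)
            back[t][s] = prev
    # traceback of the best state path
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))
```

    With strong hints the decoded path follows the hinted states even when the emissions alone are uninformative; with an empty hint map the decoder falls back on intrinsic evidence only.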

    Proceedings of Mathsport international 2017 conference

    Proceedings of MathSport International 2017 Conference, held in the Botanical Garden of the University of Padua, June 26-28, 2017. MathSport International organizes biennial conferences dedicated to all topics where mathematics and sport meet. Topics include: performance measures, optimization of sports performance, statistics and probability models, mathematical and physical models in sports, competitive strategies, statistics and probability match outcome models, optimal tournament design and scheduling, decision support systems, analysis of rules and adjudication, econometrics in sport, analysis of sporting technologies, financial valuation in sport, e-sports (gaming), betting and sports

    Deep Reinforcement Learning and sub-problem decomposition using Hierarchical Architectures in partially observable environments

    Reinforcement Learning (RL) is based on the Markov Decision Process (MDP) framework, but not all problems of interest can be modeled with MDPs because some of them have non-Markovian temporal dependencies. To handle them, one of the solutions proposed in the literature is Hierarchical Reinforcement Learning (HRL). HRL takes inspiration from hierarchical planning in the artificial intelligence literature and is an emerging sub-discipline of RL in which RL methods are augmented with some kind of prior knowledge about the high-level structure of behavior in order to decompose the underlying problem into simpler sub-problems. The high-level goal of our thesis is to investigate the advantages that an HRL approach may have over a simple RL approach. Thus, we study problems of interest (rarely tackled by means of RL) such as Sentiment Analysis, Rogue and Car Controller, showing how the ability of RL algorithms to solve them in a partially observable environment is affected by using (or not) generic hierarchical architectures based on RL algorithms of the Actor-Critic family. Remarkably, we claim that our work on Sentiment Analysis in particular is very innovative for RL, resulting in state-of-the-art performance; as far as the author knows, Reinforcement Learning has only rarely been applied to the domain of computational linguistics and sentiment analysis. Furthermore, our work on the famous video game Rogue is probably the first example of a Deep RL architecture able to explore Rogue dungeons and fight against its monsters, achieving a success rate of more than 75% on the first game level. Our work on Car Controller, meanwhile, allowed us to make some interesting considerations on the nature of some components of the policy gradient equation
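    As a minimal illustration of the Actor-Critic family the thesis builds on, the sketch below trains a one-step actor-critic on an invented one-state, two-action problem (action 0 pays reward 1, action 1 pays 0). The environment, learning rates, and tabular representation are illustrative only, far simpler than the deep hierarchical architectures discussed above:

```python
import math
import random

def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def train_actor_critic(episodes=2000, alpha=0.1, beta=0.1, seed=0):
    """One-step actor-critic: the critic tracks the state value, the actor
    adjusts softmax preferences in proportion to the TD error."""
    rng = random.Random(seed)
    prefs = [0.0, 0.0]   # actor: preferences for the two actions
    v = 0.0              # critic: value estimate of the single state
    for _ in range(episodes):
        probs = softmax(prefs)
        a = 0 if rng.random() < probs[0] else 1
        r = 1.0 if a == 0 else 0.0
        delta = r - v                      # TD error (episode ends here)
        v += alpha * delta                 # critic update
        for i in range(2):                 # actor update: grad of log softmax
            grad = (1.0 - probs[i]) if i == a else -probs[i]
            prefs[i] += beta * delta * grad
    return softmax(prefs)
```

    After training, the policy concentrates almost all probability on the rewarding action; the same update structure underlies the far larger architectures in the thesis.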

    Fuzzy Sets in Business Management, Finance, and Economics

    This book collects fifteen papers published in a Special Issue of Mathematics titled “Fuzzy Sets in Business Management, Finance, and Economics”, which was published in 2021. These papers cover a wide range of tools from Fuzzy Set Theory and applications in many areas of Business Management and other connected fields. Specifically, this book contains applications of such instruments as, among others, Fuzzy Set Qualitative Comparative Analysis, Neuro-Fuzzy Methods, the Forgotten Effects Algorithm, Expertons Theory, Fuzzy Markov Chains, Fuzzy Arithmetic, Decision Making with OWA Operators and Pythagorean Aggregation Operators, Fuzzy Pattern Recognition, and Intuitionistic Fuzzy Sets. The papers in this book tackle a wide variety of problems in areas such as strategic management, sustainable decisions by firms and public bodies, tourism management, accounting and auditing, macroeconomic modelling, the evaluation of public organizations and universities, and actuarial modelling. We hope that this book will be useful not only for business managers, public decision-makers, and researchers in the specific fields of business management, finance, and economics, but also for those working in the broader area of soft mathematics in the social sciences. Practitioners will find methods and ideas that could be fruitful in current management issues. Scholars will find novel developments that may inspire further applications in the social sciences
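    Of the instruments listed above, the OWA (Ordered Weighted Averaging) operator is simple enough to sketch directly: the arguments are sorted in descending order before the weighted sum is taken, so the weights attach to ranks rather than to particular criteria. A minimal version, purely for illustration:

```python
def owa(values, weights):
    """Ordered Weighted Averaging: sort the arguments in descending order,
    then take the weighted sum, so weights attach to ranks, not criteria.
    Weights are assumed to be non-negative and to sum to 1."""
    assert len(values) == len(weights) and abs(sum(weights) - 1.0) < 1e-9
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))
```

    With weights (1, 0, ..., 0) OWA reduces to the maximum, and with uniform weights to the arithmetic mean, which is why it is popular as a tunable aggregation operator between those extremes.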

    Transcript identification from deep sequencing data

    Ribonucleic acid (RNA) sequences are polymeric molecules ubiquitous in every living cell. RNA molecules mediate the flow of information from the DNA sequence to most functional elements in the cell. It is therefore of great interest in biological and biomedical research to associate RNA molecules with a biological function and to understand the mechanisms of their regulation. The goal of this study is the characterization of the RNA sequence composition of biological samples (the transcriptome) to facilitate the understanding of RNA function and regulation. Traditionally, a similar task has been addressed by algorithms called gene finding systems, which predict RNA sequences (transcripts) from features of the genomic DNA sequence. Lacking sufficient experimental evidence for most of the genes, these systems learn sequence patterns on a few genes with direct evidence in order to identify many additional genes in the genome. High-throughput sequencing of RNA (RNA-Seq) has recently become a powerful technology for studying the transcriptome. This technology identifies millions of short RNA fragments (reads of ≈100 letters), holding direct evidence for a large fraction of the genes. However, the analysis of RNA-Seq data faces profound challenges. Firstly, the distribution of RNA-Seq reads is highly uneven among genes, leaving a considerable fraction of genes with very few reads, and the stochastic nature of the technology leads to coverage gaps even in well covered genes. To accurately predict transcripts in cases with incomplete evidence, we need to combine RNA-Seq evidence with features derived from the genomic DNA sequence. We therefore developed a method to learn the integration of both information sources and implemented this strategy as an extension of the gene finder mGene. The system, now called mGene.ngs, determines close approximations of potentially non-linear transformations for all features on the training set, such that the prediction performance is maximized. 
With this ability, which is to our knowledge unique among gene finding systems, mGene.ngs can not only learn complex relationships between the two mentioned information sources, but also gains the flexibility to take many additional information sources into account. mGene.ngs has been independently evaluated within the context of an international competition (RGASP) for RNA-Seq-based reannotation and has shown very favourable performance for two out of three model organisms. Moreover, we generated and analyzed RNA-Seq-based annotations for 20 Arabidopsis thaliana strains to facilitate a deeper understanding of phenotypic variation in this natural plant population. A second major challenge in transcriptome reconstruction lies in the complexity of the transcriptome itself. A process called alternative splicing generates multiple mature RNA sequences from a single primary RNA sequence by cutting out so-called introns, typically in a tightly regulated manner. The inference algorithms of almost all gene finding systems are limited to predicting transcripts that do not overlap in their genomic region of origin. To overcome this limitation, purely RNA-Seq-based approaches have been developed. However, biologically implausible assumptions or the neglect of available information often led to unsatisfactory results. A major contribution of this study is the integer optimization-based transcriptome reconstruction approach MiTie. MiTie utilizes a biologically motivated loss function, can take advantage of a priori known genome annotations, and gains predictive power by considering multiple RNA-Seq samples simultaneously. Based on simulated data for the human genome as well as on an extensive RNA-Seq data set for the model organism Drosophila melanogaster, we show that MiTie predicts transcripts significantly more accurately than state-of-the-art methods such as Cufflinks and Trinity
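    The flavour of integer optimization-based transcript reconstruction can be conveyed with a toy version: choose the subset of candidate transcripts whose summed exon coverage best matches the observed read coverage, penalizing the number of transcripts used. The brute-force search below stands in for a proper integer program, and the quadratic loss and L0 penalty are invented for illustration; this is not MiTie's model:

```python
from itertools import combinations

def select_transcripts(candidates, coverage, penalty=1.0):
    """Pick the subset of candidate transcripts (each a 0/1 exon-inclusion
    vector) whose summed coverage best matches the observed per-exon read
    coverage, with a penalty per transcript used. Exhaustive search over
    subsets; only feasible for tiny instances."""
    n = len(candidates)
    best, best_loss = (), float('inf')
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            pred = [sum(candidates[i][j] for i in subset)
                    for j in range(len(coverage))]
            loss = sum((p - c) ** 2 for p, c in zip(pred, coverage)) + penalty * k
            if loss < best_loss:
                best, best_loss = subset, loss
    return set(best)
```

    Real tools replace the exhaustive search with integer programming and the toy loss with a biologically motivated one, and, as the abstract notes, gain power by fitting several RNA-Seq samples jointly.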

    Leistungsbasierte Steuerung der Dienstleistungsnetzwerke von Service-Integratoren in der Logistik

    This thesis examines the control requirements in the value-creation network of a service integrator. To this end, the business model of the service integrator is first delineated and positioned within a typology of network enterprises. The control needs are then examined from the perspective of organization theory and compared with the corresponding instruments proposed in existing work on the governance of corporate networks. It is shown that the concepts of trust and reputation are not covered there; therefore, after a suitable understanding of trust and reputation has been derived from the business administration literature, a performance-based notion of reputation grounded in contract violations is developed. Within a key performance indicator model built on the basis of SCOR, this notion is turned into an instrument for controlling the networks of service integrators. Finally, a prototypical implementation evaluates the feasibility and validity of this instrument, under the stated assumptions, in the setting of a logistics service integrator. The thesis shows that controlling a service integrator's network places special demands on network governance, and that these can be met by an extended, performance-based instrument built on the history of past behavior
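    The performance-based reputation instrument described above might, in caricature, look like the following: a score derived from a partner's contract-violation history in which recent periods weigh more heavily. The decay scheme, horizon, and scoring range are invented for illustration; the thesis embeds its instrument in a SCOR-based key performance indicator model:

```python
def reputation(violations, horizon=10, decay=0.8):
    """Score in [0, 1] from a violation history (1 = contract violated in
    that period, 0 = fulfilled), newest entry last. Recent periods are
    weighted more heavily via exponential decay; 1.0 means a clean record."""
    recent = violations[-horizon:]
    if not recent:
        return 1.0            # no history: give the benefit of the doubt
    weights = [decay ** (len(recent) - 1 - i) for i in range(len(recent))]
    penalty = sum(w * v for w, v in zip(weights, recent)) / sum(weights)
    return 1.0 - penalty
```

    The decay means an old violation costs less than a recent one, which matches the intuition that reputation should reflect current, not historical, performance.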

    A Statistical Investigation into Factors Affecting Results of One Day International Cricket Matches

    Playing “home” or “away” and many other factors, such as batting first or second or winning or losing the toss, have been hypothesised as influencing the outcome of major cricket matches. Anecdotally, it has often been noted that Subcontinental sides (India, Pakistan, Sri Lanka and Bangladesh) tend to perform much better on the Subcontinent than away from it, whilst England do better in Australia during cooler, damper Australian summers than during hotter, drier ones. In this paper, focusing on results of men's One Day International (ODI) matches involving England, we investigate the extent to which a number of factors, including playing home or away (or the continent of the venue), batting or fielding first, winning or losing the toss, the weather conditions during the game, the condition of the pitch, and the strength of each team's top batting and bowling resources, influence the outcome of matches. By employing a variety of statistical techniques, we find that the continent of the venue does appear to be a major factor affecting the result, but winning the toss does not. We then use the factors identified as significant to build a binary logistic regression model that estimates the probability of England winning at various stages of a game. Finally, we use this model to predict the results of some England ODI games not used in training the model
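    A binary logistic regression model of the kind described above can be sketched in plain Python. The gradient-descent fit and the two toy features (playing at home, batting first) are illustrative stand-ins for the paper's actual data and covariates:

```python
import math

def predict_proba(w, b, x):
    """Sigmoid of the linear score: the modeled win probability."""
    return 1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(w, x)) + b)))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Binary logistic regression fitted by batch gradient descent
    on the log-loss; a statistics library would normally do this."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi   # d(log-loss)/d(score)
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b
```

    On invented results where the home side always wins, the fitted model assigns home fixtures a high win probability and away fixtures a low one, which is exactly the kind of output the paper's model produces for England at various stages of a game.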

    Statistical Foundations of Actuarial Learning and its Applications

    This open access book discusses the statistical modeling of insurance problems, a process which comprises data collection, data analysis and statistical model building to forecast insured events that may happen in the future. It presents the mathematical foundations behind these fundamental statistical concepts and how they can be applied in daily actuarial practice. Statistical modeling has a wide range of applications, and, depending on the application, the theoretical aspects may be weighted differently: here the main focus is on prediction rather than explanation. Starting with a presentation of state-of-the-art actuarial models, such as generalized linear models, the book then dives into modern machine learning tools such as neural networks and text recognition to improve predictive modeling with complex features. Providing practitioners with detailed guidance on how to apply machine learning methods to real-world data sets, and how to interpret the results without losing sight of the mathematical assumptions on which these methods are based, the book can serve as a modern basis for an actuarial education syllabus
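    As a small taste of the generalized linear models the book starts from, here is a Poisson regression with log link fitted by plain gradient descent on invented claim-count data; actuarial practice would use IRLS or a statistics library rather than this sketch:

```python
import math

def fit_poisson_glm(X, y, lr=0.05, epochs=3000):
    """Poisson regression with log link: E[y | x] = exp(w.x + b).
    Fitted by batch gradient descent on the negative log-likelihood."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for xi, yi in zip(X, y):
            mu = math.exp(sum(wj * xj for wj, xj in zip(w, xi)) + b)  # log link
            err = mu - yi      # gradient of the Poisson negative log-likelihood
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b
```

    With an intercept-only design the fitted mean converges to the sample mean of the claim counts, the textbook sanity check for a Poisson GLM.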

    Advances in Reliability, Risk and Safety Analysis with Big Data: Proceedings of the 57th ESReDA Seminar: Hosted by the Technical University of Valencia, 23-24 October, 2019, Valencia, Spain

    The publication presents the 57th Seminar organized by ESReDA, which took place at the Polytechnic University of Valencia (Universitat Politècnica de València), Spain. The Seminar was jointly organized by ESReDA and CMT Motores Térmicos, a research unit at the Polytechnic University of Valencia. In accordance with the theme proposed for the Seminar, the communications presented made it possible to discuss and better understand the role of the latest big data, machine learning and artificial intelligence technologies in the development of reliability, risk and safety analyses for industrial systems. The world is moving fast towards wide application of big data techniques, and artificial intelligence is considered to be the future of our societies. The rapid development of 5G telecommunications infrastructure will only speed up the deployment of big data analytic tools. However, despite recent advances in these fields, there is still a long way to go before big data, machine learning and artificial intelligence tools are routinely integrated into business practice. We would like to express our gratitude to the authors and keynote speakers in particular, and to all those who shared with us these moments of discussion on subjects of great importance and topicality for the members of ESReDA. The editorial work for this volume was supported by the Joint Research Centre of the European Commission (JRC.C.3, Energy Security, Distribution and Market) in the frame of JRC support to ESReDA activities
