A mixed integer linear program to compress transition probability matrices in Markov chain bootstrapping
Bootstrapping time series is one of the most widely used tools for studying the statistical properties of an evolving phenomenon. An important class of bootstrapping methods is based on the assumption that the sampled phenomenon evolves according to a Markov chain. This assumption does not apply when the process takes values in a continuous set, as frequently happens with time series related to economic and financial phenomena. In this paper we apply Markov chain theory to bootstrap continuous-valued processes, starting from a suitable discretization of the support that provides the state space of a Markov chain of order k≥1. Even for small k, the number of rows of the transition probability matrix is generally too large and, in many practical cases, it may incorporate much more information than is really required to replicate the phenomenon satisfactorily. The paper studies the problem of compressing the transition probability matrix while preserving the “law” characterising the process that generates the observed time series, so that the bootstrapped series maintain the typical features of the observed ones. For this purpose, we formulate a partitioning problem over the set of rows of the matrix and propose a mixed integer linear program specifically tailored to it. We also provide an empirical analysis by applying the model to time series of Spanish and German electricity prices, and show that, in these medium-sized real-life instances, the bootstrapped time series reproduce the typical features of the observed ones.
This is a post-peer-review, pre-copyedit version of an article published in Annals of Operations Research. The final authenticated version is available online at: http://dx.doi.org/10.1007/s10479-016-2181-
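Before any compression, the baseline scheme the abstract builds on can be sketched as follows: discretize the support into quantile bins, estimate a first-order transition probability matrix, and resample. This is a minimal illustration of Markov chain bootstrapping only; the paper's contribution, the MILP that compresses the matrix by partitioning its rows, is not reproduced here, and the function name and bin choice are assumptions for illustration.

```python
import numpy as np

def markov_bootstrap(series, n_states=4, length=None, seed=0):
    """Baseline first-order Markov-chain bootstrap of a continuous-valued series.

    Discretize the support into quantile bins (a 'suitable discretization'),
    estimate the row-stochastic transition matrix, simulate a state path,
    and map each state back to the mean observation in its bin.
    """
    rng = np.random.default_rng(seed)
    series = np.asarray(series, dtype=float)
    length = length or len(series)
    # Quantile-based discretization of the support into n_states bins.
    edges = np.quantile(series, np.linspace(0, 1, n_states + 1)[1:-1])
    states = np.digitize(series, edges)
    # Transition counts with add-one smoothing, then row normalization.
    P = np.ones((n_states, n_states))
    for s, t in zip(states[:-1], states[1:]):
        P[s, t] += 1
    P /= P.sum(axis=1, keepdims=True)
    # Representative value per state (mean of the observations in the bin).
    reps = np.array([series[states == k].mean() for k in range(n_states)])
    # Simulate a bootstrap path from the estimated chain.
    path = [states[0]]
    for _ in range(length - 1):
        path.append(rng.choice(n_states, p=P[path[-1]]))
    return reps[np.array(path)]
```

For an order-k chain the state space becomes k-tuples of bins, which is exactly why the number of rows explodes and motivates the compression studied in the paper.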
Killing Two Birds with One Stone: The Concurrent Development of the Novel Alignment Free Tree Building Method, Scrawkov-Phy, and the Extensible Phyloinformatics Utility, EMU-Phy.
Many components of phylogenetic inference belong to the most computationally challenging and complex domain of problems. To further escalate the challenge, the genomics revolution has exponentially increased the amount of data available for analysis. This, combined with the foundational nature of phylogenetic analysis, has prompted the development of novel methods for managing and analyzing phylogenomic data, as well as improving or intelligently utilizing current ones. In this study, Scrawkov-Phy, a novel alignment-free tree-building algorithm using Quasi-Hidden Markov Models (QHMMs), is introduced. Additionally, exploratory work in the design and implementation of an extensible phyloinformatics tool, EMU-Phy, is described. Lastly, features of best-practice tools are inspected and provisionally incorporated into Scrawkov-Phy to evaluate the algorithm’s suitability for said features.
This study shows that Scrawkov-Phy, as utilized through EMU-Phy, captures phylogenetic signal and reconstructs reasonable phylogenies without the need for multiple-sequence alignment or high-order statistical models. Numerous additions to both Scrawkov-Phy and EMU-Phy would improve their efficacy, and the results of the provisional study show that such additions are compatible.
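The abstract does not spell out the QHMM machinery, but the general idea of alignment-free inference can be illustrated with a simpler representative of the same family: compare sequences by k-mer frequency profiles instead of aligning them. Everything below (function names, the Euclidean profile distance, the toy sequences) is a generic sketch, not Scrawkov-Phy's actual method.

```python
from collections import Counter
from itertools import combinations
import math

def kmer_profile(seq, k=3):
    """Normalized k-mer frequency vector of a sequence (alignment-free feature)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}

def kmer_distance(a, b, k=3):
    """Euclidean distance between k-mer profiles; no alignment is performed."""
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    keys = set(pa) | set(pb)
    return math.sqrt(sum((pa.get(x, 0.0) - pb.get(x, 0.0)) ** 2 for x in keys))

# Toy sequences: s1 and s2 differ by one base; s3 is unrelated.
seqs = {"s1": "ACGTACGTACGT", "s2": "ACGTACGAACGT", "s3": "TTTTGGGGCCCC"}
dists = {(i, j): kmer_distance(seqs[i], seqs[j]) for i, j in combinations(seqs, 2)}
```

A distance matrix like `dists` can then be fed to a standard distance-based tree builder (e.g. neighbor joining) to obtain a phylogeny without ever computing a multiple-sequence alignment.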
Quantum Stochastic Processes and Quantum Many-Body Physics
This dissertation investigates the theory of quantum stochastic processes and its applications in quantum many-body physics.
The main goal is to analyse complexity-theoretic aspects of both static and dynamic properties of physical systems modelled by quantum stochastic processes.
The thesis consists of two parts: the first one addresses the computational complexity of certain quantum and classical divisibility questions, whereas the second one addresses the topic of Hamiltonian complexity theory.
In the divisibility part, we discuss the question of whether one can efficiently subdivide a map describing the evolution of a system in a noisy environment, i.e. a CPTP or stochastic map for quantum and classical processes, respectively, and we prove that taking the nth root of a CPTP or stochastic map is an NP-complete problem.
Furthermore, we show that deciding whether one can divide a random variable X into a sum of n iid random variables, i.e. X = Y_1 + … + Y_n with the Y_i iid, is poly-time computable; relaxing the iid condition renders the problem NP-hard.
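The tractability of the iid case can be illustrated for finitely supported, integer-valued random variables: X is a sum of n iid copies of Y exactly when the probability generating function of X is the nth power of a polynomial with nonnegative coefficients, and a polynomial nth root can be extracted coefficient by coefficient in polynomial time. The sketch below assumes P(X=0) > 0 and is an illustration of this generating-function idea, not the thesis's general construction.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_pow(a, n):
    """n-th power of a polynomial (coefficient list)."""
    out = [1.0]
    for _ in range(n):
        out = poly_mul(out, a)
    return out

def iid_divide(pmf, n, tol=1e-9):
    """Decide whether a finitely supported pmf on {0, 1, ...} is the law of a
    sum of n iid copies of some pmf q; return q if so, else None.

    Idea: pgf_X(t) = pgf_Y(t)**n, so extract the polynomial n-th root
    coefficient by coefficient, which takes polynomial time."""
    d = len(pmf) - 1
    if d % n != 0 or pmf[0] <= 0:   # sketch assumes P(X=0) > 0
        return None
    m = d // n
    q = [0.0] * (m + 1)
    q[0] = pmf[0] ** (1.0 / n)
    for k in range(1, m + 1):
        # Coefficient of t^k in q**n with q[k] still unknown:
        # known part + n * q[0]**(n-1) * q[k] must equal pmf[k].
        known = poly_pow(q[:k] + [0.0], n)[k]
        q[k] = (pmf[k] - known) / (n * q[0] ** (n - 1))
    full = poly_pow(q, n)
    ok = all(abs(a - b) < tol for a, b in zip(full, pmf))
    ok = ok and all(c >= -tol for c in q)
    return q if ok else None
```

For example, the Binomial(3, 1/2) pmf [1/8, 3/8, 3/8, 1/8] divides into three iid fair coin flips, while [1/2, 0, 0, 1/2] admits no such decomposition. Once the summands may differ, this coefficient-peeling structure is lost, matching the NP-hardness of the non-iid variant.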
In the local Hamiltonian part, we study computation embedded into the ground state of a many-body quantum system, going beyond "history state" constructions with a linear clock.
We first develop a series of mathematical techniques which allow us to study the energy spectrum of the resulting Hamiltonian, and extend classical string rewriting to the quantum setting.
This allows us to construct the most physically realistic QMAEXP-complete instances of the LOCAL HAMILTONIAN problem (i.e. the problem of estimating the ground state energy of a quantum many-body system) known to date, in both one and three dimensions.
Furthermore, we study weighted versions of linear history state constructions, allowing us to obtain tight lower and upper bounds on the promise gap of the LOCAL HAMILTONIAN problem in various cases.
We finally study a classical embedding of a Busy Beaver Turing Machine into a low-dimensional lattice spin model, which allows us to dictate a transition from a purely classical phase to a Toric Code phase at arbitrarily large, and potentially even uncomputable, system sizes.
Graphical models beyond standard settings: lifted decimation, labeling, and counting
With increasing complexity and growing problem sizes in AI and Machine Learning, inference and learning remain major issues in Probabilistic Graphical Models (PGMs). On the other hand, many problems are specified in such a way that symmetries arise from the underlying model structure. Exploiting these symmetries during inference, referred to as "lifted inference", has led to significant efficiency gains. This thesis provides several enhanced versions of known algorithms that turn out to be liftable as well, thereby applying lifting in "non-standard" settings. In doing so, it extends the understanding of the applicability of lifted inference and of lifting in general. Among various other experiments, it is shown how lifted inference, in combination with an innovative Web-based data harvesting pipeline, is used to label author-paper pairs with geographic information in online bibliographies. This results in a large-scale transnational bibliography containing affiliation information over time for roughly one million authors. Analyzing this dataset reveals the importance of understanding count data. Although counting is done literally everywhere, mainstream PGMs have largely neglected count data. Where the ranges of the random variables are defined over the natural numbers, crude approximations to the true distribution are often made by discretization or a Gaussian assumption. To handle count data, Poisson Dependency Networks (PDNs) are introduced, a new class of non-standard PGMs that naturally handles count data.
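The core local step of a dependency network over counts is a Poisson conditional for each variable given the others. The sketch below fits one such conditional by gradient ascent on the Poisson log-likelihood with a log link; it is a minimal illustration of the modelling idea, not the PDN learning procedure of the thesis, and the function name and hyperparameters are assumptions.

```python
import numpy as np

def fit_poisson_node(X, y, lr=0.05, steps=2000):
    """Fit one local conditional of a Poisson Dependency Network:
    y | X ~ Poisson(exp(X @ w + b)), by gradient ascent on the
    log-likelihood (log link), returning [w..., b]."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        mu = np.exp(Xb @ w)                    # conditional Poisson mean
        grad = Xb.T @ (y - mu) / len(y)        # Poisson score with log link
        w += lr * grad
    return w
```

A Gaussian fit to the same data would allow negative values and get the mean-variance relationship wrong, which is exactly the crude approximation the thesis argues against for count data.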
Sublinear Computation Paradigm
This open access book gives an overview of cutting-edge work on a new paradigm called the “sublinear computation paradigm,” which was proposed in the large multiyear academic research project “Foundations of Innovative Algorithms for Big Data.” That project ran from October 2014 to March 2020 in Japan. To handle the unprecedented explosion of big data sets in research, industry, and other areas of society, there is an urgent need to develop novel methods and approaches for big data analysis. To meet this need, innovative changes in algorithm theory for big data are being pursued. For example, polynomial-time algorithms have thus far been regarded as “fast,” but if a quadratic-time algorithm is applied to a petabyte-scale or larger big data set, problems are encountered in terms of computational resources or running time. To deal with this critical computational and algorithmic bottleneck, linear, sublinear, and constant time algorithms are required. The sublinear computation paradigm is proposed here in order to support innovation in the big data era. A foundation of innovative algorithms has been created by developing computational procedures, data structures, and modelling techniques for big data. The project is organized into three teams that focus on sublinear algorithms, sublinear data structures, and sublinear modelling. The work has provided high-level academic research results of strong computational and algorithmic interest, which are presented in this book. The book consists of five parts: Part I consists of a single chapter on the concept of the sublinear computation paradigm; Parts II, III, and IV review results on sublinear algorithms, sublinear data structures, and sublinear modelling, respectively; Part V presents application results. The information presented here will inspire researchers who work in the field of modern algorithms.
IoT and Smart Cities: Modelling and Experimentation
The Internet of Things (IoT) is a recent paradigm that envisions a near future in which the objects of everyday life will communicate with one another and with their users, becoming an integral part of the Internet. The application of the IoT paradigm to an urban context is of particular interest, as it responds to the need to adopt ICT solutions in city management, thus realizing the Smart City concept.
Creating IoT and Smart City platforms poses many issues and challenges. Building suitable solutions that guarantee interoperability of platform nodes and easy access requires appropriate tools and approaches that make it possible to assess the effectiveness of candidate solutions in a timely manner. This thesis investigates the above-mentioned issues through two methodological approaches: mathematical modelling and experimentation. On the one hand, a mathematical model for multi-hop networks based on semi-Markov chains is presented, which properly captures the behaviour of each node in the network while accounting for the dependencies among all links. On the other hand, a methodology for spatial downscaling of testbeds is proposed, implemented, and then exploited for experimental performance evaluation of proprietary as well as standardised protocol solutions, considering smart lighting and smart building scenarios. The proposed downscaling procedure makes it possible to create an easily accessible indoor testbed on which experimentation conditions and performance closely match the typical operating conditions and performance of the environments where the final solutions are expected to be deployed.
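The modelling ingredient named above, a semi-Markov chain, differs from an ordinary continuous-time Markov chain in that sojourn times may follow arbitrary distributions. A minimal simulation sketch is below; the two-state interpretation (e.g. a node idle vs. transmitting) and all names are illustrative assumptions, and the thesis's coupling of such chains across the links of a multi-hop network is not reproduced.

```python
import random

def simulate_semi_markov(P, holding, t_end, state=0, seed=0):
    """Simulate a semi-Markov chain: jumps follow the embedded transition
    matrix P, but each state s draws its sojourn time from a general
    sampler holding[s] (not necessarily exponential, unlike a CTMC).

    Returns the list of (time, state) jump events until t_end is passed."""
    rng = random.Random(seed)
    t, trajectory = 0.0, [(0.0, state)]
    while t < t_end:
        t += holding[state](rng)                                  # sojourn time
        state = rng.choices(range(len(P)), weights=P[state])[0]   # embedded jump
        trajectory.append((t, state))
    return trajectory

# Two states alternating deterministically: uniform vs. fixed holding times.
P = [[0.0, 1.0], [1.0, 0.0]]
holding = [lambda r: r.uniform(0.5, 1.5), lambda r: 0.2]
traj = simulate_semi_markov(P, holding, 10.0)
```

Replacing `holding` with exponential samplers recovers an ordinary continuous-time Markov chain, which is why the semi-Markov formulation is the more general modelling choice for per-node behaviour.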