Search CORE

10,925 research outputs found

Optimised determinisation and completion of finite tree automata

Author: Balland
Bouajjani
Bouajjani
Bryant
Carrasco
Comon
Durand
Elgaard
Feuillade
Gallagher
Gallagher
Genet
Hosoya
Kafle
Klarlund
Lengál
Monniaux
Suda
Ullman
Publication venue: 'Elsevier BV'
Publication date: 01/11/2017
Field of study

Determinisation and completion of finite tree automata are important operations with applications in program analysis and verification. However, the complexity of the classical procedures for determinisation and completion is high. They are not practical procedures for manipulating tree automata beyond very small ones. In this paper we develop an algorithm for determinisation and completion of finite tree automata, whose worst-case complexity remains unchanged, but which performs far better than existing algorithms in practice. The critical aspect of the algorithm is that the transitions of the determinised (and possibly completed) automaton are generated in a potentially very compact form called product form, which can reduce the size of the representation dramatically. Furthermore, the representation can often be used directly when manipulating the determinised automaton. The paper contains an experimental evaluation of the algorithm on a large set of tree automata examples

arXiv.org e-Print Archive

Crossref

Roskilde Universitet

The IT University of Copenhagen's Repository

The OTree: multidimensional indexing with efficient data sampling for HPC

Author: Becerra Fontal Yolanda
Calmet Hadrien
Cugnasco Cesare
Eguzkitza Ane Beatriz
Houzeaux Guillaume
Labarta Mancho Jesús José
Santamaria Mateu Pol
Sirvent Pardell Raül
Torres Viñals Jordi
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2019
Field of study

Spatial big data is considered an essential trend in future scientific and business applications. Indeed, research instruments, medical devices, and social networks generate hundreds of petabytes of spatial data per year. However, many authors have pointed out that the lack of specialized frameworks for multidimensional Big Data is limiting possible applications and precluding many scientific breakthroughs. Paramount in achieving High-Performance Data Analytics is to optimize and reduce the I/O operations required to analyze large data sets. To do so, we need to organize and index the data according to its multidimensional attributes. At the same time, to enable fast and interactive exploratory analysis, it is vital to generate approximate representations of large datasets efficiently. In this paper, we propose the Outlook Tree (or OTree), a novel Multidimensional Indexing with efficient data Sampling (MIS) algorithm. The OTree enables exploratory analysis of large multidimensional datasets with arbitrary precision, a vital missing feature in current distributed data management solutions. Our algorithm reduces the indexing overhead and achieves high performance even for write-intensive HPC applications. Indeed, we use the OTree to store the scientific results of a study on the efficiency of drug inhalers. Then we compare the OTree implementation on Apache Cassandra, named Qbeast, with PostgreSQL and plain storage. Lastly, we demonstrate that our proposal delivers better performance and scalability.Peer ReviewedPostprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

On the Complexity of Constrained Determinantal Point Processes

Author: Celis L. Elisa
Deshpande Amit
Kathuria Tarun
Straszak Damian
Vishnoi Nisheeth K.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017)
Publication date: 01/01/2017
Field of study

Determinantal Point Processes (DPPs) are probabilistic models that arise in quantum physics and random matrix theory and have recently found numerous applications in theoretical computer science and machine learning. DPPs define probability distributions over subsets of a given ground set, they exhibit interesting properties such as negative correlation, and, unlike other models of negative correlation such as Markov random fields, have efficient algorithms for sampling. When applied to kernel methods in machine learning, DPPs favor subsets of the given data with more diverse features. However, many real-world applications require efficient algorithms to sample from DPPs with additional constraints on the sampled subset, e.g., partition or matroid constraints that are important from the viewpoint of ensuring priors, resource or fairness constraints on the sampled subset. Whether one can efficiently sample from DPPs in such constrained settings is an important problem that was first raised in a survey of DPPs for machine learning by Kulesza and Taskar and studied in some recent works. The main contribution of this paper is the first resolution of the complexity of sampling from DPPs with constraints. On the one hand, we give exact efficient algorithms for sampling from constrained DPPs when the description of the constraints is in unary; this includes special cases of practical importance such as a small number of partition, knapsack or budget constraints. On the other hand, we prove that when the constraints are specified in binary, this problem is #P-hard via a reduction from the problem of computing mixed discriminants; implying that it may be unlikely that there is an FPRAS. Technically, our algorithmic result benefits from viewing the constrained sampling problem via the lens of polynomials and we obtain our complexity results by providing an equivalence between computing mixed discriminants and sampling from partition constrained DPPs. As a consequence, we obtain a few corollaries of independent interest: 1) An algorithm to count, sample (and, hence, optimize) over the base polytope of regular matroids when there are additional (succinct) budget constraints and, 2) An algorithm to evaluate and compute mixed characteristic polynomials, that played a central role in the resolution of the Kadison-Singer problem, for certain special cases

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server