Evolving concurrent Petri net models of epistasis
A genetic algorithm is used to learn a non-deterministic Petri net-based model of non-linear gene interactions, or statistical epistasis. Petri nets are computational models of concurrent processes. However, certain global assumptions (e.g. transition priorities) are often required in order to convert a non-deterministic Petri net into a simpler deterministic model for easier analysis and evaluation. We show, by converting a Petri net into a set of state trees, that it is possible both to retain Petri net non-determinism (i.e. allowing local interactions only, thereby making the model more realistic) and to learn useful Petri nets with practical applications. Our Petri nets produce predictions of genetic disease risk assessments derived from clinical data that match with over 92% accuracy.
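The non-deterministic firing semantics and the state-tree expansion mentioned above can be sketched in a few lines of Python. All data structures and names here are illustrative assumptions, not the paper's actual encoding:

```python
# Minimal Petri net sketch (hypothetical structures, not the paper's model).
# A net is: a marking (tokens per place) and transitions mapping
# input-place requirements to output-place productions.

def enabled(marking, transitions):
    """Transitions whose input places all hold enough tokens."""
    return [t for t, (inputs, _) in transitions.items()
            if all(marking[p] >= n for p, n in inputs.items())]

def fire(marking, transitions, t):
    """Fire transition t: consume input tokens, produce output tokens."""
    inputs, outputs = transitions[t]
    m = dict(marking)
    for p, n in inputs.items():
        m[p] -= n
    for p, n in outputs.items():
        m[p] = m.get(p, 0) + n
    return m

def state_tree(marking, transitions, depth):
    """Enumerate all markings reachable within `depth` firings -- the
    'set of state trees' view keeps every non-deterministic branch."""
    ts = enabled(marking, transitions)
    if depth == 0 or not ts:
        return [marking]
    leaves = []
    for t in ts:
        leaves.extend(state_tree(fire(marking, transitions, t),
                                 transitions, depth - 1))
    return leaves
```

With two transitions competing for a single token, the state tree has two leaves, one per local choice; no global priority is needed to analyse the net.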
Modelling epistasis in genetic disease using Petri nets, evolutionary computation and frequent itemset mining
Petri nets are useful for mathematically modelling disease-causing genetic epistasis. A Petri net model of an interaction has the potential to lead to biological insight into the cause of a genetic disease. However, defining a Petri net by hand for a particular interaction is extremely difficult because of the sheer complexity of the problem and degrees of freedom inherent in a Petri net's architecture.
We propose therefore a novel method, based on evolutionary computation and data mining, for automatically constructing Petri net models of non-linear gene interactions. The method comprises two main steps. Firstly, an initial partial Petri net is set up with several repeated sub-nets that model individual genes and a set of constraints, comprising relevant common sense and biological knowledge, is also defined. These constraints characterise the class of Petri nets that are desired. Secondly, this initial Petri net structure and the constraints are used as the input to a genetic algorithm. The genetic algorithm searches for a Petri net architecture that is both a superset of the initial net, and also conforms to all of the given constraints. The genetic algorithm evaluation function that we employ gives equal weighting to both the accuracy of the net and also its parsimony.
We demonstrate our method using an epistatic model related to the presence of digital ulcers in systemic sclerosis patients that was recently reported in the literature. Our results show that although individual "perfect" Petri nets can frequently be discovered for this interaction, the true value of this approach lies in generating many different perfect nets, and applying data mining techniques to them in order to elucidate common and statistically significant patterns of interaction.
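The evaluation function described above, which weights accuracy and parsimony equally, might look roughly like the following sketch. The scoring details (arc counting, normalisation) are assumptions for illustration, not the paper's formula:

```python
def fitness(predictions, targets, n_arcs, max_arcs):
    """Equal-weight GA fitness: prediction accuracy on the clinical
    data plus parsimony (fewer arcs in the evolved net is better).
    Hypothetical encoding, for illustration only."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    accuracy = correct / len(targets)
    parsimony = 1.0 - n_arcs / max_arcs
    return 0.5 * accuracy + 0.5 * parsimony
```

A candidate net that predicts 3 of 4 cases correctly while using 10 of a permitted 20 arcs would score 0.5 * 0.75 + 0.5 * 0.5 = 0.625 under this scheme.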
Extension of the survival dimensionality reduction algorithm to detect epistasis in competing risks models (SDR-CR)
Background: The discovery and description of the genetic background of common human diseases is hampered by their complexity and dynamic behavior. Appropriate bioinformatic tools are needed to account for all the facets of complex diseases, and to this end we recently described the survival dimensionality reduction (SDR) algorithm in an effort to model gene-gene interactions in the context of survival analysis. When one event precludes the occurrence of another event under investigation in the "competing risk model", survival algorithms require particular adjustment to avoid the risk of reporting wrong or biased conclusions.
Methods: The SDR algorithm was modified to incorporate the cumulative incidence function, as well as an adapted version of the Brier score for mutually exclusive outcomes, to better search for epistatic models in the competing risk setting. The applicability of the new algorithm (SDR-CR) was evaluated using synthetic lifetime epistatic datasets with competing risks and on a dataset of scleroderma patients.
Results/Conclusions: The SDR-CR algorithm retains a satisfactory power to detect the causative variants in simulated datasets under different scenarios of sample size and degrees of type I or type II censoring. In the real-world dataset, SDR-CR was capable of detecting a significant interaction between the IL-1α C-889T and the IL-1β C-511T single-nucleotide polymorphisms to predict the occurrence of restrictive lung disease vs. isolated pulmonary hypertension. We provide a useful extension of the SDR algorithm to analyze epistatic interactions in the competing risk setting that may be of use to unveil the genetic background of complex human diseases. Availability: http://sourceforge.net/projects/sdrproject/files/
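The cumulative incidence function that SDR-CR incorporates can be estimated nonparametrically. A plain-Python sketch of one standard (Aalen-Johansen-style) estimator follows; this is an illustration of the quantity involved, not the SDR-CR implementation:

```python
def cumulative_incidence(times, events, cause, horizon):
    """Estimate the cumulative incidence of `cause` by time `horizon`
    under competing risks. `events`: 0 = censored, otherwise a cause
    label. Illustrative sketch; real analyses would use a vetted
    survival-analysis library."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0   # overall event-free survival just before current time
    cif = 0.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        d_any = d_cause = censored = 0
        while i < len(data) and data[i][0] == t:   # handle ties
            if data[i][1] == 0:
                censored += 1
            else:
                d_any += 1
                if data[i][1] == cause:
                    d_cause += 1
            i += 1
        cif += surv * d_cause / at_risk    # mass assigned to this cause
        surv *= 1.0 - d_any / at_risk      # update overall survival
        at_risk -= d_any + censored
    return cif
```

With two mutually exclusive causes, the per-cause incidences sum to the overall event probability, which is exactly the property that makes the naive one-cause Kaplan-Meier estimate biased in this setting.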
Abatacept to treat chronic intestinal pseudo-obstruction in five systemic sclerosis patients with a description of the index case
Chronic intestinal pseudo-obstruction is a severe complication of systemic sclerosis. Inflammatory neuropathy and immunological alterations have a prominent role in the development of systemic scle..
Online Packing to Minimize Area or Perimeter
We consider online packing problems where we get a stream of axis-parallel rectangles. The rectangles have to be placed in the plane without overlapping, and each rectangle must be placed without knowing the subsequent rectangles. The goal is to minimize the perimeter or the area of the axis-parallel bounding box of the rectangles. We either allow rotations by 90° or translations only.
For the perimeter version we give algorithms with an absolute competitive ratio slightly less than 4 when only translations are allowed and when rotations are also allowed.
We then turn our attention to minimizing the area and show that the competitive ratio of any algorithm is at least Ω(√n), where n is the number of rectangles in the stream, and this holds with and without rotations. We then present algorithms that match this bound in both cases, and the competitive ratio is thus optimal to within a constant factor. We also show that the competitive ratio cannot be bounded as a function of Opt. We then consider two special cases.
The first is when all the given rectangles have aspect ratios bounded by some constant. The particular variant where all the rectangles are squares and we want to minimize the area of the bounding square has been studied before and an algorithm with a competitive ratio of 8 has been given [Fekete and Hoffmann, Algorithmica, 2017]. We improve the analysis of the algorithm and show that the ratio is at most 6, which is tight.
The second special case is when all edges have length at least 1. Here, the Ω(√n) lower bound still holds, and we turn our attention to lower bounds depending on Opt. We show that any algorithm for the translational case has a competitive ratio of at least Ω(√Opt). If rotations are allowed, we show a lower bound of Ω(Opt^{1/4}). For both versions, we give algorithms that match the respective lower bounds: with translations only, this is just the algorithm from the general case with competitive ratio O(√n) = O(√Opt). If rotations are allowed, we give an algorithm with competitive ratio O(min{√n, Opt^{1/4}}), thus matching both lower bounds simultaneously.
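The online constraint above can be made concrete with a toy greedy baseline (not the paper's algorithm): each rectangle must be committed immediately, and the bounding box can only grow. This sketch puts each rectangle either to the right of or on top of the current bounding box, whichever keeps the perimeter smaller:

```python
def online_pack_perimeter(rects):
    """Greedy online packing sketch. `rects` is a stream of (w, h)
    axis-parallel rectangles; returns the final bounding-box perimeter
    and the chosen lower-left corner for each rectangle. Illustrative
    baseline only; no competitive-ratio guarantee is claimed."""
    W = H = 0.0                      # bounding box anchored at origin
    placements = []
    for w, h in rects:
        right = (W + w, max(H, h))   # box if placed to the right
        top = (max(W, w), H + h)     # box if placed on top
        if sum(right) <= sum(top):
            placements.append((W, 0.0))   # x-ranges disjoint: no overlap
            W, H = right
        else:
            placements.append((0.0, H))   # y-ranges disjoint: no overlap
            W, H = top
    return 2 * (W + H), placements
```

Because every placement strictly extends the box in one axis, no two rectangles overlap, which is the only hard constraint in the problem.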
Approximate Earth Mover's Distance in Truly-Subquadratic Time
We design an additive approximation scheme for estimating the cost of the min-weight bipartite matching problem: given a bipartite graph with non-negative edge costs and ε > 0, our algorithm estimates the cost of matching all but an ε-fraction of the vertices in truly subquadratic time n^{2-δ(ε)}.
Our algorithm has a natural interpretation for computing the Earth Mover's Distance (EMD), up to an additive ε-approximation. Notably, we make no assumptions about the underlying metric (more generally, the costs do not have to satisfy the triangle inequality). Note that compared to the size of the instance (an arbitrary cost matrix), our algorithm runs in sublinear time.
Our algorithm can approximate a slightly more general problem: max-cardinality bipartite matching with a knapsack constraint, where the goal is to maximize the number of vertices that can be matched up to a total cost budget.
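For contrast with the subquadratic approximation scheme, the exact quantity being estimated (the minimum-weight perfect-matching cost, i.e. EMD on a cost matrix) can be computed by brute force on tiny instances:

```python
from itertools import permutations

def emd_exact(cost):
    """Exact min-weight perfect-matching cost of a small n x n
    bipartite cost matrix, by trying every assignment. Exponential
    time; illustration of the objective only, not the paper's
    subquadratic estimator."""
    n = len(cost)
    return min(sum(cost[i][p[i]] for i in range(n))
               for p in permutations(range(n)))
```

On an arbitrary cost matrix this is the value the additive scheme approximates; practical exact solvers would use the Hungarian algorithm in O(n³) instead of brute force.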
An optical method to measure mesh tensioning
Abstract: The present paper presents a method to estimate the tensional status of a knitted mesh. To reach this result, the relationship between the frequencies of vibration, recorded by a high-sampling camera and analysed through image processing, and different tension levels on the mesh itself has been investigated. After conducting several tests, all the collected frequency-tension pairs were used to extrapolate an optimal (in a least-squares sense) correlation between the frequency of vibration and the tension of the mesh.
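The least-squares correlation step can be illustrated with a simple linear fit of tension against frequency. The linear model form is an assumption for illustration (a physically motivated fit for a vibrating mesh might regress on frequency squared instead):

```python
def fit_tension_model(freqs, tensions):
    """Ordinary least-squares fit tension ~ a * freq + b from
    measured (frequency, tension) pairs. Illustrative model form;
    not the paper's actual correlation."""
    n = len(freqs)
    mf = sum(freqs) / n
    mt = sum(tensions) / n
    sxx = sum((f - mf) ** 2 for f in freqs)
    sxy = sum((f - mf) * (t - mt) for f, t in zip(freqs, tensions))
    a = sxy / sxx          # slope minimising squared residuals
    b = mt - a * mf        # intercept through the means
    return a, b
```

Once fitted, the model inverts the camera measurement: read a vibration frequency off the image sequence, then evaluate a * freq + b to estimate the mesh tension.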
Multi-Swap k-Means++
The k-means++ algorithm of Arthur and Vassilvitskii (SODA 2007) is often the practitioners' choice algorithm for optimizing the popular k-means clustering objective and is known to give an O(log k)-approximation in expectation. To obtain higher quality solutions, Lattanzi and Sohler (ICML 2019) proposed augmenting k-means++ with O(k log log k) local search steps obtained through the k-means++ sampling distribution to yield a c-approximation to the k-means clustering problem, where c is a large absolute constant. Here we generalize and extend their local search algorithm by considering larger and more sophisticated local search neighborhoods, hence allowing us to swap multiple centers at the same time. Our algorithm achieves a 9 + ε approximation ratio, which is the best possible for local search. Importantly, we show that our approach yields substantial practical improvements: we show significant quality improvements over the approach of Lattanzi and Sohler (ICML 2019) on several datasets. Comment: NeurIPS 202
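A minimal sketch of k-means++ seeding followed by single-swap local search helps fix ideas; the paper's multi-swap variant explores sets of simultaneous swaps, and this simplified 1-D version (with a uniformly sampled swap candidate rather than D²-sampled) is for illustration only:

```python
import random

def kmeans_pp_seed(points, k, rng):
    """k-means++ seeding: each new center is sampled with probability
    proportional to its squared distance to the nearest chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min((p - c) ** 2 for c in centers) for p in points]
        r = rng.random() * sum(d2)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
    return centers

def cost(points, centers):
    """k-means objective: sum of squared distances to nearest center."""
    return sum(min((p - c) ** 2 for c in centers) for p in points)

def local_search(points, centers, steps, rng):
    """Single-swap local search: try replacing each center with a
    sampled candidate, keep the swap if the objective improves."""
    for _ in range(steps):
        cand = rng.choice(points)
        best = centers
        for i in range(len(centers)):
            trial = centers[:i] + [cand] + centers[i + 1:]
            if cost(points, trial) < cost(points, best):
                best = trial
        centers = best
    return centers
```

Even when seeding places both centers in one cluster, a few swap steps recover a good solution; swapping several centers at once enlarges the neighborhood and is what drives the improved approximation ratio.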
Locally Uniform Hashing
Hashing is a common technique used in data processing, with a strong impact
on the time and resources spent on computation. Hashing also affects the
applicability of theoretical results that often assume access to (unrealistic)
uniform/fully-random hash functions. In this paper, we are concerned with
designing hash functions that are practical and come with strong theoretical
guarantees on their performance.
To this end, we present tornado tabulation hashing, which is simple, fast,
and exhibits a certain full, local randomness property that provably makes
diverse algorithms perform almost as if (abstract) fully-random hashing was
used. For example, this includes classic linear probing, the widely used
HyperLogLog algorithm of Flajolet, Fusy, Gandouet, Meunier [AOFA 07] for
counting distinct elements, and the one-permutation hashing of Li, Owen, and
Zhang [NIPS 12] for large-scale machine learning. We also provide a very
efficient solution for the classical problem of obtaining fully-random hashing
on a fixed (but unknown to the hash function) set of n keys using O(n)
space. As a consequence, we get more efficient implementations of the splitting
trick of Dietzfelbinger and Rink [ICALP'09] and the succinct space uniform
hashing of Pagh and Pagh [SICOMP'08].
Tornado tabulation hashing is based on a simple method to systematically
break dependencies in tabulation-based hashing techniques. Comment: FOCS 202
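Tornado tabulation builds on plain tabulation hashing, which is easy to sketch: split the key into characters (here bytes), look each up in its own random table, and XOR the results. The tornado-specific derived characters are omitted here; this is only the basic scheme it extends:

```python
import random

def make_tabulation_hash(key_bytes=4, hash_bits=32, seed=1):
    """Simple tabulation hashing over `key_bytes`-byte integer keys.
    One table of 256 random `hash_bits`-bit values per byte position;
    the hash is the XOR of the looked-up entries. Basic scheme only,
    without the tornado construction's extra derived characters."""
    rng = random.Random(seed)
    tables = [[rng.getrandbits(hash_bits) for _ in range(256)]
              for _ in range(key_bytes)]

    def h(key):
        out = 0
        for i in range(key_bytes):
            out ^= tables[i][(key >> (8 * i)) & 0xFF]  # table per byte
        return out

    return h
```

Each lookup is a few cache-friendly table reads and XORs, which is why tabulation-based schemes are fast in practice while still offering strong (3-wise independent, and in extended variants much stronger) randomness guarantees.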