
    Streaming complexity of CSPs with randomly ordered constraints

    We initiate a study of the streaming complexity of constraint satisfaction problems (CSPs) when the constraints arrive in a random order. We show that there exists a CSP, namely $\textsf{Max-DICUT}$, for which random ordering makes a provable difference. Whereas a $4/9 \approx 0.445$ approximation of $\textsf{DICUT}$ requires $\Omega(\sqrt{n})$ space with adversarial ordering, we show that with random ordering of constraints there exists a $0.48$-approximation algorithm that only needs $O(\log n)$ space. We also give new algorithms for $\textsf{Max-DICUT}$ in variants of the adversarial ordering setting. Specifically, we give a two-pass $O(\log n)$ space $0.48$-approximation algorithm for general graphs and a single-pass $\tilde{O}(\sqrt{n})$ space $0.48$-approximation algorithm for bounded-degree graphs. On the negative side, we prove that CSPs where the satisfying assignments of the constraints support a one-wise independent distribution require $\Omega(\sqrt{n})$ space for any non-trivial approximation, even when the constraints are randomly ordered. This was previously known only for adversarially ordered constraints. Extending the results to randomly ordered constraints requires switching the hard instances from a union of random matchings to simple Erdős-Rényi random (hyper)graphs and extending tools that can perform Fourier analysis on such instances. The only CSP to have been considered previously with random ordering is $\textsf{Max-CUT}$, where the ordering is not known to change the approximability. Specifically, it is known to be as hard to approximate with random ordering as with adversarial ordering for $o(\sqrt{n})$-space algorithms. Our results show a richer variety of possibilities and motivate further study of CSPs with randomly ordered constraints.
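    As a point of reference for the kind of quantity such algorithms work with, the sketch below is a toy single-pass routine that tracks each vertex's bias (out-degree minus in-degree) over a stream of directed edges and evaluates the dicut obtained by placing positive-bias vertices on the source side. It uses O(n) counters and stores the edges, so it only illustrates the bias statistic; it is not the paper's O(log n)-space or two-pass algorithms, and the edge stream is made up.
    ```python
    from collections import defaultdict

    def bias_dicut_value(edge_stream):
        # O(n) counters; small-space streaming algorithms estimate such aggregate
        # statistics with sketches or sampling instead of storing them exactly.
        bias = defaultdict(int)
        edges = []
        for u, v in edge_stream:     # one pass over the (possibly randomly ordered) constraint stream
            bias[u] += 1             # u gains an outgoing edge
            bias[v] -= 1             # v gains an incoming edge
            edges.append((u, v))
        side = {x: 1 if b > 0 else 0 for x, b in bias.items()}   # 1 = "source" side of the dicut
        return sum(1 for u, v in edges if side[u] == 1 and side[v] == 0)

    # Hypothetical edge stream; each pair (u, v) is a directed edge u -> v.
    print(bias_dicut_value([(1, 2), (1, 3), (3, 2), (4, 1)]))    # prints 2
    ```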

    Gamma-based clustering via ordered means with application to gene-expression analysis

    Discrete mixture models provide a well-known basis for effective clustering algorithms, although technical challenges have limited their scope. In the context of gene-expression data analysis, a model is presented that mixes over a finite catalog of structures, each one representing equality and inequality constraints among latent expected values. Computations depend on the probability that independent gamma-distributed variables attain each of their possible orderings. Each ordering event is equivalent to an event in independent negative-binomial random variables, and this finding guides a dynamic-programming calculation. The structuring of mixture-model components according to constraints among latent means leads to strict concavity of the mixture log likelihood. In addition to its beneficial numerical properties, the clustering method shows promising results in an empirical study. Comment: Published at http://dx.doi.org/10.1214/10-AOS805 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
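    The ordering probabilities that drive these mixture weights can be sanity-checked by simulation. The sketch below is a plain Monte Carlo estimate of the probability that independent gamma variables with assumed shapes and a common rate come out in increasing order; it is an illustration only, not the negative-binomial dynamic program described above.
    ```python
    import numpy as np

    def prob_increasing_order(shapes, rate=1.0, n_sim=200_000, seed=0):
        # Estimate P(X1 < X2 < ... < Xk) for independent Gamma(shape_i, rate) draws.
        rng = np.random.default_rng(seed)
        draws = np.column_stack([rng.gamma(a, 1.0 / rate, n_sim) for a in shapes])
        return np.mean(np.all(np.diff(draws, axis=1) > 0, axis=1))

    # Example: three independent gammas with assumed shapes 2, 3, 4 and a common rate.
    print(prob_increasing_order([2, 3, 4]))
    ```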

    On the Approximability of Digraph Ordering

    Given an n-vertex digraph D = (V, A), the Max-k-Ordering problem is to compute a labeling $\ell : V \to [k]$ maximizing the number of forward edges, i.e. edges (u,v) such that $\ell(u) < \ell(v)$. For different values of k, this reduces to Maximum Acyclic Subgraph (k=n) and Max-Dicut (k=2). This work studies the approximability of Max-k-Ordering and its generalizations, motivated by their applications to job scheduling with soft precedence constraints. We give an LP-rounding based 2-approximation algorithm for Max-k-Ordering for any $k \in \{2, \ldots, n\}$, improving on the known $2k/(k-1)$-approximation obtained via random assignment. The tightness of this rounding is shown by proving that for any $k \in \{2, \ldots, n\}$ and constant $\varepsilon > 0$, Max-k-Ordering has an LP integrality gap of $2 - \varepsilon$ for $n^{\Omega(1/\log\log k)}$ rounds of the Sherali-Adams hierarchy. A further generalization of Max-k-Ordering is the restricted maximum acyclic subgraph problem, or RMAS, where each vertex v has a finite set of allowable labels $S_v \subseteq \mathbb{Z}^+$. We prove an LP-rounding based $4\sqrt{2}/(\sqrt{2}+1) \approx 2.344$ approximation for it, improving on the $2\sqrt{2} \approx 2.828$ approximation recently given by Grandoni et al. (Information Processing Letters, Vol. 115(2), Pages 182-185, 2015). In fact, our approximation algorithm also works for a general version where the objective counts the edges which go forward by at least a positive offset specific to each edge. The minimization formulation of digraph ordering is DAG edge deletion, or DED(k), which requires deleting the minimum number of edges from an n-vertex directed acyclic graph (DAG) to remove all paths of length k. We show that both the LP relaxation and a local-ratio approach for DED(k) yield a k-approximation for any $k \in [n]$. Comment: 21 pages, Conference version to appear in ESA 201
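    The random-assignment baseline mentioned above is easy to reproduce: uniform labels in [k] make each edge forward with probability (k-1)/(2k), which is where the 2k/(k-1) ratio comes from. The sketch below checks this by simulation on a small, made-up digraph; it is not the paper's LP rounding.
    ```python
    import random

    def expected_forward_fraction(k, edges, trials=20_000, seed=0):
        # Average fraction of forward edges under uniformly random labels in {0, ..., k-1}.
        rng = random.Random(seed)
        vertices = {x for e in edges for x in e}
        total = 0
        for _ in range(trials):
            label = {v: rng.randrange(k) for v in vertices}
            total += sum(label[u] < label[v] for u, v in edges)
        return total / (trials * len(edges))

    edges = [(0, 1), (1, 2), (2, 0), (0, 2), (3, 1)]   # hypothetical digraph
    k = 4
    print(expected_forward_fraction(k, edges), (k - 1) / (2 * k))   # both near 0.375
    ```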

    It's Good to Be First: Order Bias in Reading and Citing NBER Working Papers

    When choices are made from ordered lists, individuals can exhibit biases toward selecting certain options as a result of the ordering. We examine this phenomenon in the context of consumer response to the ordering of economics papers in an e-mail announcement issued by the NBER. We show that despite the effectively random list placement, papers listed first each week are about 30% more likely to be viewed, downloaded, and subsequently cited. We suggest that a model of “skimming” behavior, where individuals focus on the first few papers in the list due to time constraints, would be most consistent with our findings.

    An Approximately Optimal Algorithm for Scheduling Phasor Data Transmissions in Smart Grid Networks

    In this paper, we devise a scheduling algorithm for ordering the transmission of synchrophasor data from the substation to the control center in as short a time frame as possible, within the real-time hierarchical communications infrastructure in the electric grid. The problem is cast in the classic framework of job scheduling with precedence constraints. The optimization setup comprises the number of phasor measurement units (PMUs) to be installed on the grid, a weight associated with each PMU, processing times at the control center for the PMUs, and precedence constraints between the PMUs. The solution to the PMU placement problem yields the optimum number of PMUs to be installed on the grid, while the processing times are picked uniformly at random from a predefined set. The weight associated with each PMU and the precedence constraints are both assumed known. The scheduling problem is provably NP-hard, so we resort to approximation algorithms, which provide suboptimal solutions in polynomial time. A lower bound on the optimal schedule is derived using branch-and-bound techniques, and its performance is evaluated using standard IEEE test bus systems. The scheduling policy is power-grid-centric, since it takes into account the electrical properties of the network under consideration. Comment: 8 pages, published in IEEE Transactions on Smart Grid, October 201
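    As a minimal illustration of this class of problems (and not the paper's approximation algorithm or its lower bound), the sketch below greedily schedules jobs while respecting precedence constraints, always picking an available job with the largest weight-to-processing-time ratio, and reports the weighted sum of completion times. The PMU names, weights, and processing times are made up.
    ```python
    def greedy_schedule(proc, weight, prec):
        """proc/weight: dicts job -> processing time / weight; prec: list of (a, b) meaning a before b."""
        remaining_preds = {j: 0 for j in proc}
        succs = {j: [] for j in proc}
        for a, b in prec:
            remaining_preds[b] += 1
            succs[a].append(b)
        available = [j for j in proc if remaining_preds[j] == 0]
        t, objective, order = 0, 0, []
        while available:
            j = max(available, key=lambda x: weight[x] / proc[x])   # Smith-style ratio rule
            available.remove(j)
            t += proc[j]                 # job j finishes at time t
            objective += weight[j] * t   # accumulate weighted completion time
            order.append(j)
            for s in succs[j]:           # release successors whose predecessors are all done
                remaining_preds[s] -= 1
                if remaining_preds[s] == 0:
                    available.append(s)
        return order, objective

    order, obj = greedy_schedule({"p1": 2, "p2": 1, "p3": 3},
                                 {"p1": 1, "p2": 5, "p3": 2},
                                 prec=[("p1", "p2")])
    print(order, obj)
    ```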

    Exploiting Spatial Code Proximity and Order for Improved Source Code Retrieval for Bug Localization

    Practically all Information Retrieval (IR) based approaches developed to date for automatic bug localization are based on the bag-of-words assumption, which ignores any positional and ordering relationships between the terms in a query. In this paper we argue that bug reports are ill-served by this assumption, since such reports frequently contain various types of structural information whose terms must obey certain positional and ordering constraints. It therefore stands to reason that the quality of retrieval for bug localization would improve if these constraints could be taken into account when searching for the most relevant files. In this paper, we demonstrate that such is indeed the case. We show how the well-known Markov Random Field (MRF) based retrieval framework can be used to take into account the term-term proximity and ordering relationships in a query vis-a-vis the same relationships in the files of a source-code library, greatly improving the quality of retrieval of the most relevant source files. We have carried out our experimental evaluations on popular large software projects using over 4,000 bug reports. The results we present demonstrate unequivocally that the proposed approach is far superior to the widely used bag-of-words based approaches.
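    The following sketch illustrates the general idea of rewarding term order and proximity when scoring a file against a bug-report query, in the spirit of MRF sequential-dependence retrieval; the weights, window size, scoring form, and example tokens are illustrative assumptions rather than the paper's exact model.
    ```python
    def proximity_score(query_tokens, file_tokens, w_unigram=0.8, w_ordered=0.2, window=4):
        # Unigram component: plain term-frequency matches, as in a bag-of-words model.
        unigram = sum(file_tokens.count(t) for t in query_tokens)
        # Ordered component: adjacent query-term pairs that appear in the same order
        # within a small window of the file's token stream.
        ordered = 0
        for a, b in zip(query_tokens, query_tokens[1:]):
            for i, tok in enumerate(file_tokens):
                if tok == a and b in file_tokens[i + 1:i + 1 + window]:
                    ordered += 1
        return w_unigram * unigram + w_ordered * ordered

    query = ["null", "pointer", "exception"]                                 # hypothetical bug-report terms
    source = ["if", "ptr", "null", "pointer", "deref", "exception", "raise"] # hypothetical file tokens
    print(proximity_score(query, source))
    ```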

    Improved Parameterized Algorithms for Constraint Satisfaction

    For many constraint satisfaction problems, the algorithm which chooses a random assignment achieves the best possible approximation ratio. For instance, a simple random assignment for Max-E3-Sat yields a 7/8-approximation, and for every $\varepsilon > 0$ there is no polynomial-time $(7/8+\varepsilon)$-approximation unless P=NP. Another example is the Permutation CSP of bounded arity. Given the expected fraction $\rho$ of the constraints satisfied by a random assignment (i.e. permutation), there is no $(\rho+\varepsilon)$-approximation algorithm for any $\varepsilon > 0$, assuming the Unique Games Conjecture (UGC). In this work, we consider the following parameterization of constraint satisfaction problems. Given a set of $m$ constraints of constant arity, can we satisfy at least $\rho m + k$ constraints, where $\rho$ is the expected fraction of constraints satisfied by a random assignment? Constraint Satisfaction Problems above Average have been posed in different forms in the literature [Niedermeier 2006; Mahajan, Raman, and Sikdar 2009]. We present a faster parameterized algorithm for deciding whether $m/2 + k/2$ equations can be simultaneously satisfied over $\mathbb{F}_2$. As a consequence, we obtain $O(k)$-variable bikernels for Boolean CSPs of arity $c$ for every fixed $c$, and for permutation CSPs of arity 3. This implies linear bikernels for many problems under the "above average" parameterization, such as Max-$c$-Sat, Set-Splitting, Betweenness and Max Acyclic Subgraph. As a result, all the parameterized problems we consider in this paper admit $2^{O(k)}$-time algorithms. We also obtain non-trivial hybrid algorithms for every Max $c$-CSP: for every instance $I$, we can either approximate $I$ beyond the random assignment threshold in polynomial time, or we can find an optimal solution to $I$ in subexponential time. Comment: A preliminary version of this paper has been accepted for IPEC 201
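    The random-assignment threshold quoted above for Max-E3-Sat is easy to verify on a toy instance: a clause over three distinct variables is falsified by exactly one of the eight assignments to its variables, so a uniform random assignment satisfies 7/8 of the clauses in expectation. The sketch below checks this by exhaustive enumeration on a small made-up formula (it only illustrates the threshold, not the "above average" algorithms of the paper).
    ```python
    import itertools

    def avg_satisfied(clauses, n_vars):
        """clauses: lists of nonzero ints; literal v > 0 means x_v, v < 0 means NOT x_v."""
        total = 0
        for bits in itertools.product([False, True], repeat=n_vars):
            total += sum(any((lit > 0) == bits[abs(lit) - 1] for lit in cl) for cl in clauses)
        return total / 2 ** n_vars   # average number of satisfied clauses over all assignments

    clauses = [[1, 2, 3], [-1, 2, 4], [1, -3, -4], [-2, -3, 4]]   # hypothetical E3-Sat formula
    print(avg_satisfied(clauses, 4), 7 / 8 * len(clauses))        # 3.5 vs 3.5
    ```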

    Stochastic Ordering under Conditional Modelling of Extreme Values: Drug-Induced Liver Injury

    Drug-induced liver injury (DILI) is a major public health issue and of serious concern for the pharmaceutical industry. Early detection of signs of a drug's potential for DILI is vital for pharmaceutical companies' evaluation of new drugs. A combination of extreme values of liver-specific variables indicates potential DILI (Hy's Law). We estimate the probability of severe DILI using the Heffernan and Tawn (2004) conditional dependence model, which arises naturally in applications where a multidimensional random variable is extreme in at least one component. We extend the current model by including the assumption of stochastically ordered survival curves for different doses in a Phase 3 study. Comment: 24 pages, 5 figures
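    As a conceptual illustration of the stochastic-ordering assumption on survival curves (and not of the Heffernan and Tawn conditional extremes model itself), the sketch below checks empirical survival-curve dominance between two hypothetical dose groups; the exponential samples, dose labels, and evaluation grid are all assumptions.
    ```python
    import numpy as np

    def empirical_survival(samples, grid):
        # Empirical survival function S(t) = proportion of samples exceeding t.
        samples = np.asarray(samples)
        return np.array([(samples > t).mean() for t in grid])

    rng = np.random.default_rng(1)
    low_dose = rng.exponential(scale=1.0, size=5_000)    # hypothetical liver-variable values
    high_dose = rng.exponential(scale=1.5, size=5_000)   # heavier tail at the higher dose

    grid = np.linspace(0, 8, 81)
    ordered = np.all(empirical_survival(high_dose, grid) >= empirical_survival(low_dose, grid))
    print("high dose stochastically dominates low dose on the grid:", ordered)
    ```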