
    Asymptotic Error Free Partitioning over Noisy Boolean Multiaccess Channels

    In this paper, we consider the problem of partitioning active users in a manner that facilitates multi-access without collision. The setting is a noisy, synchronous, Boolean multi-access channel in which $K$ active users (out of a total of $N$ users) seek access. A solution to the partition problem places each of the $N$ users in one of $K$ groups (or blocks) such that no two active nodes are in the same block. We consider the simple, but non-trivial and illustrative, case of $K=2$ active users and study the number of steps $T$ needed to solve the partition problem. By random coding and a suboptimal decoding scheme, we show that for any $T \geq (C_1+\xi_1)\log N$, where $C_1$ and $\xi_1$ are positive constants (independent of $N$) and $\xi_1$ can be arbitrarily small, the partition problem can be solved with error probability $P_e^{(N)} \to 0$ for large $N$. Under the same scheme, we also bound $T$ from the other direction, establishing that for any $T \leq (C_2-\xi_2)\log N$ the error probability $P_e^{(N)} \to 1$ for large $N$; again, $C_2$ and $\xi_2$ are constants and $\xi_2$ can be arbitrarily small. These bounds on the number of steps are lower than the tight achievable lower bound $T \geq (C_g+\xi)\log N$ for group testing (in which all active users are identified, rather than just partitioned). Thus, partitioning may prove to be a more efficient approach to multi-access than group testing.
    Comment: This paper was submitted in June 2014 to IEEE Transactions on Information Theory, and is under review now.
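    To make the random-coding idea concrete, here is a minimal Python simulation of the noiseless special case: each user draws a random binary signature of length T, the channel output is the bitwise OR of the two active users' signatures, and partitioning succeeds whenever some 2-coloring separates every pair of users consistent with the output (i.e., the candidate graph is bipartite). The paper's scheme additionally handles channel noise and uses a particular suboptimal decoder; this sketch only illustrates why partitioning can succeed even when the active pair cannot be identified uniquely.

    import itertools
    import random
    from collections import defaultdict, deque

    def or_output(s1, s2):
        return tuple(a | b for a, b in zip(s1, s2))

    def is_bipartite(edges, nodes):
        """BFS 2-coloring; True iff the candidate graph is bipartite."""
        adj = defaultdict(list)
        for u, v in edges:
            adj[u].append(v)
            adj[v].append(u)
        color = {}
        for s in nodes:
            if s in color:
                continue
            color[s] = 0
            queue = deque([s])
            while queue:
                u = queue.popleft()
                for w in adj[u]:
                    if w not in color:
                        color[w] = 1 - color[u]
                        queue.append(w)
                    elif color[w] == color[u]:
                        return False
        return True

    def error_rate(N, T, trials=300, seed=1):
        rng = random.Random(seed)
        errors = 0
        for _ in range(trials):
            sig = [tuple(rng.randint(0, 1) for _ in range(T)) for _ in range(N)]
            a, b = rng.sample(range(N), 2)            # the two active users
            y = or_output(sig[a], sig[b])             # noiseless OR output
            cand = [p for p in itertools.combinations(range(N), 2)
                    if or_output(sig[p[0]], sig[p[1]]) == y]
            nodes = {u for p in cand for u in p}
            # The true pair is always a candidate; partitioning succeeds
            # iff one 2-coloring separates every candidate pair.
            if not is_bipartite(cand, nodes):
                errors += 1
        return errors / trials

    for T in (4, 8, 16):
        print(f"T={T:2d}  P_e ~ {error_rate(N=64, T=T):.2f}")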

    An effective and efficient testing methodology for correctness testing for file recovery tools

    We develop an effective and efficient methodology for correctness testing of file recovery tools across different file systems. We assume that the tool tester is familiar with the formats of common file types and is able to use the tools correctly. Our methodology first derives a testing plan that minimizes the number of runs required to identify differences among tools with respect to correctness. We also present a case study on correctness testing of file carving tools, which allows us to confirm that the number of necessary testing runs is bounded and that our results are statistically sound.
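    The abstract does not spell out how an individual run is scored, but a common way to judge recovery correctness is to plant a known corpus of files on a test image and compare each tool's output against it byte-for-byte. A minimal Python sketch, with hypothetical directory names and tool names:

    import hashlib
    from pathlib import Path

    def sha256(path: Path) -> str:
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def correctness_score(recovered_dir: str, reference_dir: str) -> float:
        """Fraction of planted reference files a tool recovered byte-for-byte."""
        ref = {sha256(p) for p in Path(reference_dir).iterdir() if p.is_file()}
        rec = {sha256(p) for p in Path(recovered_dir).iterdir() if p.is_file()}
        return len(ref & rec) / len(ref)

    # Hypothetical layout: one output directory per (tool, test image) run.
    for tool in ("carverA", "carverB"):
        print(tool, correctness_score(f"runs/{tool}/ntfs_image", "reference/ntfs_image"))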

    A Cost-based Optimizer for Gradient Descent Optimization

    As the use of machine learning (ML) permeates diverse application domains, there is an urgent need to support a declarative framework for ML. Ideally, a user will specify an ML task in a high-level and easy-to-use language, and the framework will invoke the appropriate algorithms and system configurations to execute it. An important observation towards designing such a framework is that many ML tasks can be expressed as mathematical optimization problems, which take a specific form. Furthermore, these optimization problems can be efficiently solved using variations of the gradient descent (GD) algorithm. Thus, to decouple a user specification of an ML task from its execution, a key component is a GD optimizer. We propose a cost-based GD optimizer that selects the best GD plan for a given ML task. To build our optimizer, we introduce a set of abstract operators for expressing GD algorithms and propose a novel approach to estimate the number of iterations a GD algorithm requires to converge. Extensive experiments on real and synthetic datasets show that our optimizer not only chooses the best GD plan but also enables optimizations that achieve orders-of-magnitude performance speed-ups.
    Comment: Accepted at SIGMOD 201
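    The abstract leaves the cost model and iteration estimator abstract, so the following Python sketch fills them in with simple stand-ins: each candidate GD plan carries a profiled per-iteration cost and an observed loss-decay rate, iterations to convergence are extrapolated assuming geometric decay, and the optimizer picks the plan with the smallest estimated total cost. All numbers are hypothetical.

    import math
    from dataclasses import dataclass

    @dataclass
    class GDPlan:
        name: str
        cost_per_iter: float   # seconds per iteration, profiled on a sample
        decay: float           # per-iteration loss decay seen in probe runs

    def estimated_iters(decay: float, target: float = 1e-3) -> int:
        """Iterations until the loss shrinks by `target`, assuming the decay
        observed in probe runs continues geometrically (a stand-in for the
        paper's estimator, whose details are not given in the abstract)."""
        return math.ceil(math.log(target) / math.log(decay))

    def choose_plan(plans):
        """Cost-based choice: minimize (iterations to converge) x (cost/iter)."""
        return min(plans, key=lambda p: estimated_iters(p.decay) * p.cost_per_iter)

    plans = [
        GDPlan("batch GD",      cost_per_iter=2.0,  decay=0.90),
        GDPlan("mini-batch GD", cost_per_iter=0.2,  decay=0.95),
        GDPlan("stochastic GD", cost_per_iter=0.01, decay=0.995),
    ]
    best = choose_plan(plans)
    print(best.name, "estimated cost:",
          estimated_iters(best.decay) * best.cost_per_iter, "s")

    With these (made-up) figures the optimizer picks stochastic GD: it needs far more iterations, but its per-iteration cost is low enough that its total cost wins.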

    Software component testing : a standard and the effectiveness of techniques

    This portfolio comprises two projects linked by the theme of software component testing, which is also often referred to as module or unit testing. One project covers its standardisation, while the other considers the analysis and evaluation of the application of selected testing techniques to an existing avionics system. The evaluation is based on empirical data obtained from fault reports relating to the avionics system. The standardisation project is based on the development of the BCS/BSI Software Component Testing Standard and the BCS/BSI Glossary of terms used in software testing, both of which are included in the portfolio. The papers included for this project consider both the issues concerned with the adopted development process and the resolution of technical matters concerning the definition of the testing techniques and their associated measures. The test effectiveness project documents a retrospective analysis of an operational avionics system to determine the relative effectiveness of several software component testing techniques. The methodology differs from that used in other test effectiveness experiments in that it considers every possible set of inputs required to satisfy a testing technique, rather than arbitrarily chosen values from within this set. The three papers present the experimental methodology used, intermediate results from a failure analysis of the studied system, and the test effectiveness results for ten testing techniques, whose definitions were taken from the BCS/BSI Software Component Testing Standard. The creation of the two standards has filled a gap in both the national and international software testing standards arenas. Their production required an in-depth knowledge of software component testing techniques, the identification and use of a development process, and the negotiation of the standardisation process at a national level. The knowledge gained during this process has been disseminated by the author in the papers included as part of this portfolio. The investigation of test effectiveness has introduced a new methodology for determining the test effectiveness of software component testing techniques by means of a retrospective analysis, providing a new set of data that can be added to the body of empirical data on software component testing effectiveness.
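    To illustrate the exhaustive flavour of that methodology, here is a toy Python example, not drawn from the thesis itself: for a one-line function with a seeded boundary fault, it enumerates every test set satisfying equivalence partitioning (one input per partition) and reports the fraction that expose the fault, i.e., the technique's effectiveness against this fault.

    import itertools

    DOMAIN = range(0, 21)

    def spec(x):      # specified behaviour: flag values of 10 or more
        return x >= 10

    def impl(x):      # implementation with a seeded off-by-one fault
        return x > 10

    # Equivalence partitioning on the spec: pick one input per partition.
    low = [x for x in DOMAIN if not spec(x)]    # x < 10
    high = [x for x in DOMAIN if spec(x)]       # x >= 10

    # Enumerate EVERY test set satisfying the technique, rather than
    # sampling arbitrary representatives from the partitions.
    test_sets = list(itertools.product(low, high))
    exposing = sum(any(impl(x) != spec(x) for x in ts) for ts in test_sets)
    print(f"effectiveness: {exposing}/{len(test_sets)} "
          f"= {exposing / len(test_sets):.3f}")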

    Testing probability distributions underlying aggregated data

    In this paper, we analyze and study a hybrid model for testing and learning probability distributions. Here, in addition to samples, the testing algorithm is provided with one of two different types of oracle access to the unknown distribution $D$ over $[n]$. More precisely, we define both the dual and cumulative dual access models, in which the algorithm $A$ can both sample from $D$ and, respectively, for any $i \in [n]$:
    - query the probability mass $D(i)$ (query access); or
    - get the total mass of $\{1,\dots,i\}$, i.e. $\sum_{j=1}^{i} D(j)$ (cumulative access).
    These two models, by generalizing the previously studied sampling and query oracle models, allow us to bypass the strong lower bounds established for a number of problems in these settings, while capturing several interesting aspects of these problems -- and providing new insight on the limitations of the models. Finally, we show that while the testing algorithms can in most cases be strictly more efficient, some tasks remain hard even with this additional power.
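    A small Python sketch of the dual access interface, plus a toy uniformity check that samples a point and queries its mass. It conveys the flavour of why query access can beat sampling alone, and is not any specific algorithm from the paper.

    import bisect
    import itertools
    import random

    class DualOracle:
        """Dual access to a fixed distribution D over [n] = {1, ..., n}:
        draw samples, query point masses D(i), or query cumulative mass."""
        def __init__(self, masses, seed=0):
            self.p = list(masses)
            self.cum = list(itertools.accumulate(self.p))
            self.rng = random.Random(seed)

        def sample(self):
            return bisect.bisect_left(self.cum, self.rng.random()) + 1

        def query(self, i):          # query access: D(i)
            return self.p[i - 1]

        def cumulative(self, i):     # cumulative access: sum_{j <= i} D(j)
            return self.cum[i - 1]

    def looks_uniform(oracle, n, eps=0.5, m=50):
        """Sample m points and check each queried mass against 1/n; if D puts
        noticeably distorted mass somewhere, such points are likely drawn."""
        return all(abs(oracle.query(oracle.sample()) - 1 / n) <= eps / n
                   for _ in range(m))

    n = 1000
    uniform = DualOracle([1 / n] * n)
    skewed = DualOracle([2 / n] * (n // 2) + [0.0] * (n // 2))
    print(looks_uniform(uniform, n), looks_uniform(skewed, n))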

    Testing the existence of clustering in the extreme values

    This paper introduces an estimator for the extremal index as the ratio of the number of elements of two point processes defined by threshold sequences $u_n$, $v_n$ and a partition of the sequence into blocks of the same size. The first point process is defined by the sequence of block maxima that exceed $u_n$. The paper introduces a thinning of this point process, defined by a threshold $v_n$ with $v_n > u_n$, with the appealing property that, under some mild conditions, the ratio of the number of elements of the two point processes is a consistent estimator of the extremal index. The method supports a hypothesis test for the extremal index, and hence for testing the existence of clustering in the extreme values. Other advantages are that it allows some freedom in choosing $u_n$, and it is not very sensitive to the choice of the partition. Finally, the stylized facts found in financial returns (clustering, skewness, heavy tails) are tested via the extremal index, in this case for DAX returns.
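    The mechanics of the estimator are easy to sketch in Python: form block maxima, count those exceeding $u_n$, thin with $v_n > u_n$, and take the ratio of the counts. The fixed quantile thresholds and block size below are placeholders, not the paper's threshold sequences; consistency holds only under the paper's conditions on $u_n$, $v_n$.

    import math
    import random

    def ratio_estimator(x, block_size, u, v):
        """Ratio of the thinned point process (block maxima > v) to the
        original one (block maxima > u), as described in the abstract."""
        starts = range(0, len(x) - block_size + 1, block_size)
        maxima = [max(x[i:i + block_size]) for i in starts]
        n_u = sum(m > u for m in maxima)    # first point process
        n_v = sum(m > v for m in maxima)    # its thinning (v > u)
        return n_v / n_u if n_u else float("nan")

    # ARMAX series X_t = max((1 - theta) X_{t-1}, theta Z_t) with unit
    # Frechet Z_t: a standard example of clustered extremes, with
    # extremal index theta.
    rng = random.Random(7)
    theta, x, prev = 0.5, [], 1.0
    for _ in range(100_000):
        z = -1.0 / math.log(rng.random())   # unit Frechet draw
        prev = max((1 - theta) * prev, theta * z)
        x.append(prev)

    xs = sorted(x)
    u, v = xs[int(0.95 * len(xs))], xs[int(0.99 * len(xs))]
    print(ratio_estimator(x, block_size=200, u=u, v=v))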

    Graph removal lemmas

    The graph removal lemma states that any graph on n vertices with o(n^{v(H)}) copies of a fixed graph H may be made H-free by removing o(n^2) edges. Despite its innocent appearance, this lemma and its extensions have several important consequences in number theory, discrete geometry, graph theory and computer science. In this survey we discuss these lemmas, focusing in particular on recent improvements to their quantitative aspects.
    Comment: 35 pages
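    For reference, the quantified form of the o(.) statement above reads:

    \begin{theorem}[Graph removal lemma]
    For every graph $H$ and every $\varepsilon > 0$ there exists
    $\delta = \delta(H, \varepsilon) > 0$ such that any graph $G$ on $n$
    vertices containing at most $\delta n^{v(H)}$ copies of $H$ can be made
    $H$-free by removing at most $\varepsilon n^{2}$ edges.
    \end{theorem}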

    Limits on Support Recovery with Probabilistic Models: An Information-Theoretic Framework

    The support recovery problem consists of determining a sparse subset of a set of variables that is relevant in generating a set of observations, and it arises in a diverse range of settings such as compressive sensing, subset selection in regression, and group testing. In this paper, we take a unified approach to support recovery problems, considering general probabilistic models relating a sparse data vector to an observation vector. We study the information-theoretic limits of both exact and partial support recovery, taking a novel approach motivated by thresholding techniques in channel coding. We provide general achievability and converse bounds characterizing the trade-off between the error probability and the number of measurements, and we specialize these to the linear, 1-bit, and group testing models. In several cases, our bounds not only provide matching scaling laws in the necessary and sufficient number of measurements, but also sharp thresholds with matching constant factors. Our approach has several advantages over previous approaches: for the achievability part, we obtain sharp thresholds under broader scalings of the sparsity level and other parameters (e.g., signal-to-noise ratio) compared to several previous works; and for the converse part, we not only provide conditions under which the error probability fails to vanish, but also conditions under which it tends to one.
    Comment: Accepted to IEEE Transactions on Information Theory; presented in part at ISIT 2015 and SODA 201
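    As a concrete instance of the trade-off between error probability and the number of measurements, here is a toy Monte Carlo simulation of exact support recovery in the linear model, using an exhaustive minimum-residual decoder over all size-k supports. The parameters are hypothetical, and the paper itself analyzes such limits information-theoretically rather than by simulation.

    import itertools
    import numpy as np

    def recovery_error_rate(n=10, k=2, m=12, snr=5.0, trials=200, seed=0):
        """Estimate P(exact support recovery fails) in y = X beta + w for a
        k-sparse beta, decoding by exhaustive least squares over supports."""
        rng = np.random.default_rng(seed)
        errors = 0
        for _ in range(trials):
            support = set(rng.choice(n, size=k, replace=False).tolist())
            beta = np.zeros(n)
            beta[list(support)] = np.sqrt(snr)
            X = rng.standard_normal((m, n))
            y = X @ beta + rng.standard_normal(m)

            def resid(s):
                Xs = X[:, list(s)]
                coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
                r = y - Xs @ coef
                return r @ r

            # Decoder: size-k support with the smallest least-squares residual.
            best = min(itertools.combinations(range(n), k), key=resid)
            errors += set(best) != support
        return errors / trials

    for m in (6, 12, 24):   # error probability should fall as m grows
        print(m, recovery_error_rate(m=m))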