Asymptotic Error Free Partitioning over Noisy Boolean Multiaccess Channels
In this paper, we consider the problem of partitioning active users in a
manner that facilitates multi-access without collision. The setting is a
noisy, synchronous, Boolean multi-access channel in which k active users (out
of a total of n users) seek access. A solution to the partition problem
places each of the n users in one of k groups (or blocks) such that no two
active nodes are in the same block. We consider the simple but non-trivial and
illustrative case of k = 2 active users and study the number of steps T used
to solve the partition problem. By random coding and a suboptimal decoding
scheme, we show that for any T > (c1 + epsilon) log n, where c1 and epsilon
are positive constants (independent of n), and epsilon can be
arbitrarily small, the partition problem can be solved with vanishing error
probability for large n. Under the same scheme, we also bound T from
the other direction, establishing that for any T < (c2 - epsilon) log n,
the error probability tends to one for large n; again c1, c2 and epsilon
are constants and epsilon can be arbitrarily small. These bounds on the number
of steps are lower than the tight achievable lower bound, in terms of log n, for group testing (in which all active users are identified,
rather than just partitioned). Thus, partitioning may prove to be a more
efficient approach to multi-access than group testing.

Comment: Submitted in June 2014 to IEEE Transactions on Information Theory; under review.
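In the noiseless case, the role of random coding in the two-active-user setting can be illustrated with a toy simulation. This is only an illustration of why on the order of log n random bits suffice to separate two users, not the paper's actual scheme or channel model: each active user draws a random binary signature of length T, and the two users can be placed in different blocks whenever their signatures differ, which fails with probability 2^-T.

```python
import random

def separation_failure_rate(n_trials, T):
    # Two active users each draw a random length-T binary signature
    # (a toy, noiseless stand-in for the paper's random-coding scheme).
    # They can be assigned to different blocks iff their signatures differ;
    # the failure (collision) probability is 2^-T.
    fail = 0
    for _ in range(n_trials):
        a = tuple(random.randint(0, 1) for _ in range(T))
        b = tuple(random.randint(0, 1) for _ in range(T))
        if a == b:
            fail += 1
    return fail / n_trials
```

With T = 3 the empirical failure rate concentrates near 2^-3 = 0.125, and it vanishes rapidly as T grows, which is the qualitative behavior behind the logarithmic step bounds.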
An effective and efficient testing methodology for correctness testing for file recovery tools
We develop an effective and efficient testing methodology for the correctness testing of file recovery tools across different file systems. We assume that the tool tester is familiar with the formats of common file types and is able to use the tools correctly. Our methodology first derives a testing plan that minimizes the number of runs required to identify differences in correctness between tools. We also present a case study on correctness testing of file carving tools, which confirms that the number of necessary testing runs is bounded and that our results are statistically sound.
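A minimal sketch of what a single correctness check for one tool run might look like, assuming a corpus of reference files with known hashes. The function name and harness are hypothetical illustrations, not the paper's methodology:

```python
import hashlib

def correctness_score(recovered_files, reference_hashes):
    # recovered_files: mapping name -> bytes produced by the tool under test.
    # reference_hashes: mapping name -> known-good SHA-256 hex digest.
    # Returns the fraction of reference files recovered byte-for-byte.
    hits = 0
    for name, digest in reference_hashes.items():
        data = recovered_files.get(name)
        if data is not None and hashlib.sha256(data).hexdigest() == digest:
            hits += 1
    return hits / len(reference_hashes)
```

Comparing such scores across tools on the same corpus is one simple way differences in tool correctness become observable per run.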
A Cost-based Optimizer for Gradient Descent Optimization
As the use of machine learning (ML) permeates into diverse application
domains, there is an urgent need to support a declarative framework for ML.
Ideally, a user will specify an ML task in a high-level and easy-to-use
language and the framework will invoke the appropriate algorithms and system
configurations to execute it. An important observation towards designing such a
framework is that many ML tasks can be expressed as mathematical optimization
problems, which take a specific form. Furthermore, these optimization problems
can be efficiently solved using variations of the gradient descent (GD)
algorithm. Thus, to decouple a user specification of an ML task from its
execution, a key component is a GD optimizer. We propose a cost-based GD
optimizer that selects the best GD plan for a given ML task. To build our
optimizer, we introduce a set of abstract operators for expressing GD
algorithms and propose a novel approach to estimate the number of iterations a
GD algorithm requires to converge. Extensive experiments on real and synthetic
datasets show that our optimizer not only chooses the best GD plan but also
allows for optimizations that achieve orders of magnitude performance speed-up.

Comment: Accepted at SIGMOD 201
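The core of a cost-based GD optimizer can be caricatured in a few lines: given an estimated iteration count and a per-iteration cost for each candidate plan, pick the plan with the lowest estimated total cost. The plan names, iteration estimates, and cost functions below are made-up placeholders, not the paper's operators or cost model:

```python
def choose_gd_plan(plans, n_samples):
    # plans: list of (name, estimated_iterations, per_iteration_cost_fn).
    # Total cost of a plan = estimated iterations * cost per iteration;
    # return the name of the cheapest plan for this dataset size.
    return min(plans, key=lambda p: p[1] * p[2](n_samples))[0]

# Hypothetical plans: batch GD converges in few iterations but each one
# scans all n samples; SGD needs many cheap iterations; mini-batch sits
# in between. The numbers are illustrative only.
plans = [
    ("batch_gd", 100, lambda n: n),
    ("sgd", 10_000, lambda n: 1),
    ("mini_batch_gd", 1_000, lambda n: 32),
]
```

On a large dataset the optimizer would favor the cheap-iteration plan, while on a tiny one the few-iteration plan wins, which is exactly the trade-off a cost model must capture.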
Software component testing : a standard and the effectiveness of techniques
This portfolio comprises two projects linked by the theme of software component testing, which is also
often referred to as module or unit testing. One project covers its standardisation, while the other
considers the analysis and evaluation of the application of selected testing techniques to an existing
avionics system. The evaluation is based on empirical data obtained from fault reports relating to the
avionics system.
The standardisation project is based on the development of the BCS/BSI Software Component Testing
Standard and the BCS/BSI Glossary of terms used in software testing, which are both included in the
portfolio. The papers included for this project consider both those issues concerned with the adopted
development process and the resolution of technical matters concerning the definition of the testing
techniques and their associated measures.
The test effectiveness project documents a retrospective analysis of an operational avionics system to
determine the relative effectiveness of several software component testing techniques. The methodology
differs from that used in other test effectiveness experiments in that it considers every possible set of
inputs that are required to satisfy a testing technique rather than arbitrarily chosen values from within
this set. The three papers present the experimental methodology used, intermediate results from a failure
analysis of the studied system, and the test effectiveness results for ten testing techniques, definitions for
which were taken from the BCS/BSI Software Component Testing Standard.
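The distinguishing idea above, considering every set of inputs that satisfies a testing technique rather than one arbitrarily chosen set, can be sketched as follows. This is a simplified illustration under the assumption that each requirement of a technique yields a finite set of admissible inputs; the function names are hypothetical, not taken from the portfolio:

```python
from itertools import product

def detection_probability(partitions, reveals_fault):
    # partitions: list of lists, each the FULL set of inputs that satisfy
    # one requirement of the technique (e.g. one equivalence class).
    # reveals_fault: predicate that is True if a single input triggers the fault.
    # Enumerate every way of picking one input per requirement, and report
    # the fraction of such complete test sets that detect the fault.
    test_sets = list(product(*partitions))
    hits = sum(any(reveals_fault(x) for x in ts) for ts in test_sets)
    return hits / len(test_sets)
```

Averaging over all admissible test sets, instead of scoring one arbitrary choice, is what makes the resulting effectiveness figure a property of the technique rather than of the tester's luck.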
The creation of the two standards has filled a gap in both the national and international software testing
standards arenas. Their production required an in-depth knowledge of software component testing
techniques, the identification and use of a development process, and the negotiation of the
standardisation process at a national level. The knowledge gained during this process has been
disseminated by the author in the papers included as part of this portfolio. The investigation of test
effectiveness has introduced a new methodology for determining the test effectiveness of software
component testing techniques by means of a retrospective analysis and so provided a new set of data that
can be added to the body of empirical data on software component testing effectiveness.
Testing probability distributions underlying aggregated data
In this paper, we analyze and study a hybrid model for testing and learning
probability distributions. Here, in addition to samples, the testing algorithm
is provided with one of two different types of oracles to the unknown
distribution D over [n]. More precisely, we define both the dual and
cumulative dual access models, in which the algorithm can both sample from D
and, respectively, for any i in [n],
- query the probability mass D(i) (query access); or
- get the total mass of {1, ..., i}, i.e. D(1) + ... + D(i) (cumulative
access).
These two models, by generalizing the previously studied sampling and query
oracle models, allow us to bypass the strong lower bounds established for a
number of problems in these settings, while capturing several interesting
aspects of these problems -- and providing new insight on the limitations of
the models. Finally, we show that while the testing algorithms can in most
cases be strictly more efficient, some tasks remain hard even with this additional
power.
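A toy oracle making the two access models concrete. This is only an illustration of the interfaces described in the abstract, with hypothetical class and method names:

```python
import bisect
import random

class DualAccessOracle:
    """Toy oracle for a distribution D over {1, ..., n} (illustration only)."""

    def __init__(self, probs):
        # probs[i-1] = D(i); precompute the cumulative masses once.
        self.probs = probs
        self.cum = []
        total = 0.0
        for p in probs:
            total += p
            self.cum.append(total)

    def sample(self):
        # Sampling access: draw i with probability D(i) by inverting the CDF.
        return bisect.bisect_left(self.cum, random.random()) + 1

    def pmf(self, i):
        # Dual (query) access: return the probability mass D(i).
        return self.probs[i - 1]

    def cdf(self, i):
        # Cumulative dual access: return D(1) + ... + D(i).
        return self.cum[i - 1]
```

A testing algorithm in these models interleaves `sample()` calls with `pmf` or `cdf` queries, which is the extra power that lets it evade the sampling-only lower bounds.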
Testing the existence of clustering in the extreme values
This paper introduces an estimator for the extremal index as the ratio of the numbers of elements of two point processes defined by threshold sequences u_n and v_n and a partition of the sequence into blocks of the same size. The first point process is defined by the sequence of block maxima that exceed u_n. This paper introduces a thinning of this point process, defined by a threshold v_n with v_n > u_n, with the appealing property that, under some mild conditions, the ratio of the numbers of elements of the two point processes is a consistent estimator of the extremal index. The method supports a hypothesis test for the extremal index, and hence for testing the existence of clustering in the extreme values. Other advantages are that it allows some freedom in the choice of u_n, and that it is not very sensitive to the choice of the partition. Finally, the stylized facts found in financial returns (clustering, skewness, heavy tails) are tested via the extremal index, in this case for DAX returns.
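One plausible reading of the ratio estimator can be sketched as follows; the exact normalization and the mild conditions for consistency are in the paper, so this is an assumption-laden illustration, not the author's estimator:

```python
def extremal_index_estimate(x, block_size, u, v):
    # Sketch of the ratio-of-point-processes idea from the abstract (v > u):
    # N_u = number of block maxima exceeding u (first point process),
    # N_v = number of block maxima exceeding v (its thinning).
    # The estimator is, up to the paper's exact normalization, N_v / N_u.
    assert v > u, "the thinning threshold must satisfy v > u"
    maxima = [max(x[i:i + block_size]) for i in range(0, len(x), block_size)]
    n_u = sum(m > u for m in maxima)
    n_v = sum(m > v for m in maxima)
    return n_v / n_u if n_u else float("nan")
```

Because both counts come from the same block partition, mild mis-specification of the blocks tends to affect numerator and denominator similarly, which is one intuition for the reported insensitivity to the partition.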
Graph removal lemmas
The graph removal lemma states that any graph on n vertices with o(n^{v(H)})
copies of a fixed graph H may be made H-free by removing o(n^2) edges. Despite
its innocent appearance, this lemma and its extensions have several important
consequences in number theory, discrete geometry, graph theory and computer
science. In this survey we discuss these lemmas, focusing in particular on
recent improvements to their quantitative aspects.

Comment: 35 pages.
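For reference, the lemma described above in its usual quantified form:

```latex
\textbf{Graph removal lemma.} For every graph $H$ and every
$\varepsilon > 0$ there exists $\delta = \delta(H, \varepsilon) > 0$ such
that any graph $G$ on $n$ vertices containing at most $\delta\, n^{v(H)}$
copies of $H$ can be made $H$-free by removing at most $\varepsilon n^2$
edges.
```

The quantitative question the survey emphasizes is how $\delta$ must depend on $\varepsilon$; classical proofs via the regularity lemma give only tower-type bounds.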
Limits on Support Recovery with Probabilistic Models: An Information-Theoretic Framework
The support recovery problem consists of determining a sparse subset of a set
of variables that is relevant in generating a set of observations, and arises
in a diverse range of settings such as compressive sensing, subset
selection in regression, and group testing. In this paper, we take a unified
approach to support recovery problems, considering general probabilistic models
relating a sparse data vector to an observation vector. We study the
information-theoretic limits of both exact and partial support recovery, taking
a novel approach motivated by thresholding techniques in channel coding. We
provide general achievability and converse bounds characterizing the trade-off
between the error probability and number of measurements, and we specialize
these to the linear, 1-bit, and group testing models. In several cases, our
bounds not only provide matching scaling laws in the necessary and sufficient
number of measurements, but also sharp thresholds with matching constant
factors. Our approach has several advantages over previous approaches: For the
achievability part, we obtain sharp thresholds under broader scalings of the
sparsity level and other parameters (e.g., signal-to-noise ratio) compared to
several previous works, and for the converse part, we not only provide
conditions under which the error probability fails to vanish, but also
conditions under which it tends to one.

Comment: Accepted to IEEE Transactions on Information Theory; presented in part at ISIT 2015 and SODA 201
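As a concrete, much simplified instance of the linear observation model mentioned above, here is a toy correlation-thresholding decoder in the spirit of thresholding techniques; this is an illustration under invented notation, not the paper's decoder or analysis:

```python
def recover_support(X, y, k):
    # Toy support recovery for a linear model y ~ X b with a k-sparse b:
    # score each column j of X by |<x_j, y>| and keep the top-k indices.
    n_cols = len(X[0])
    scores = [(abs(sum(X[i][j] * y[i] for i in range(len(y)))), j)
              for j in range(n_cols)]
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:k])
```

Information-theoretic results of the kind in the paper ask how many rows of X (measurements) such a decoder, or an optimal one, needs before the returned index set matches the true support with vanishing error probability.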