14,025 research outputs found
Estimation of the density of regression errors
Estimation of the density of regression errors is a fundamental issue in
regression analysis and it is typically explored via a parametric approach.
This article uses a nonparametric approach with the mean integrated squared
error (MISE) criterion. It solves a long-standing problem, formulated two
decades ago by Mark Pinsker, about estimation of a nonparametric error density
in a nonparametric regression setting with the accuracy of an oracle that knows
the underlying regression errors. The solution implies that, under a mild
assumption on the differentiability of the design density and regression
function, the MISE of a data-driven error density estimator attains minimax
rates and sharp constants known for the case of directly observed regression
errors. The result holds for error densities with finite and infinite supports.
Some extensions of this result for more general heteroscedastic models with
possibly dependent errors and predictors are also obtained; in the latter case
the marginal error density is estimated. In all considered cases a
blockwise-shrinking Efromovich--Pinsker density estimate, based on plugged-in
residuals, is used. The obtained results imply a theoretical justification of a
customary practice in applied regression analysis to consider residuals as
proxies for underlying regression errors. Numerical and real examples are
presented and discussed, and the S-PLUS software is available.Comment: Published at http://dx.doi.org/10.1214/009053605000000435 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org
The essence of component-based design and coordination
Is there a characteristic of coordination languages that makes them
qualitatively different from general programming languages and deserves special
academic attention? This report proposes a nuanced answer in three parts. The
first part highlights that coordination languages are the means by which
composite software applications can be specified using components that are only
available separately, or later in time, via standard interfacing mechanisms.
The second part highlights that most currently used languages provide
mechanisms to use externally provided components, and thus exhibit some
elements of coordination. However not all do, and the availability of an
external interface thus forms an objective and qualitative criterion that
distinguishes coordination. The third part argues that despite the qualitative
difference, the segregation of academic attention away from general language
design and implementation has non-obvious cost trade-offs.Comment: 8 pages, 2 figures, 3 table
What Causes My Test Alarm? Automatic Cause Analysis for Test Alarms in System and Integration Testing
Driven by new software development processes and testing in clouds, system
and integration testing nowadays tends to produce enormous number of alarms.
Such test alarms lay an almost unbearable burden on software testing engineers
who have to manually analyze the causes of these alarms. The causes are
critical because they decide which stakeholders are responsible to fix the bugs
detected during the testing. In this paper, we present a novel approach that
aims to relieve the burden by automating the procedure. Our approach, called
Cause Analysis Model, exploits information retrieval techniques to efficiently
infer test alarm causes based on test logs. We have developed a prototype and
evaluated our tool on two industrial datasets with more than 14,000 test
alarms. Experiments on the two datasets show that our tool achieves an accuracy
of 58.3% and 65.8%, respectively, which outperforms the baseline algorithms by
up to 13.3%. Our algorithm is also extremely efficient, spending about 0.1s per
cause analysis. Due to the attractive experimental results, our industrial
partner, a leading information and communication technology company in the
world, has deployed the tool and it achieves an average accuracy of 72% after
two months of running, nearly three times more accurate than a previous
strategy based on regular expressions.Comment: 12 page
Spatial Weighting Matrix Selection in Spatial Lag Econometric Model
This paper investigates the choice of spatial weighting matrix in a spatial lag model framework. In the empirical literature the choice of spatial weighting matrix has been characterized by a great deal of arbitrariness. The number of possible spatial weighting matrices is large, which until recently was considered to prevent investigation into the appropriateness of the empirical choices. Recently Kostov (2010) proposed a new approach that transforms the problem into an equivalent variable selection problem. This article expands the latter transformation approach into a two-step selection procedure. The proposed approach aims at reducing the arbitrariness in the selection of spatial weighting matrix in spatial econometrics. This allows for a wide range of variable selection methods to be applied to the high dimensional problem of selection of spatial weighting matrix. The suggested approach consists of a screening step that reduces the number of candidate spatial weighting matrices followed by an estimation step selecting the final model. An empirical application of the proposed methodology is presented. In the latter a range of different combinations of screening and estimation methods are employed and found to produce similar results. The proposed methodology is shown to be able to approximate and provide indications to what the ‘true’ spatial weighting matrix could be even when it is not amongst the considered alternatives. The similarity in results obtained using different methods suggests that their relative computational costs could be primary reasons for their choice. Some further extensions and applications are also discussed
The Oracle Problem When Testing from MSCs
Message Sequence Charts (MSCs) form a popular language in which scenario-based specifications and models can be written. There has been significant interest in automating aspects of testing from MSCs. This paper concerns the Oracle Problem, in which we have an observation made in testing and wish to know whether this is consistent with the specification. We assume that there is an MSC specification and consider the case where we have entirely independent local testers (local observability) and where the observations of the local testers are logged and brought together (tester observability). It transpires that under local observability the Oracle Problem can be solved in low-order polynomial time if we use sequencing, loops and choices but becomes NP-complete if we also allow parallel components; if we place a bound on the number of parallel components then it again can be solved in polynomial time. For tester observability, the problem is NP-complete when we have either loops or choices. However, it can be solved in low-order polynomial time if we have only one loop, no choices, and no parallel components. If we allow parallel components then the Oracle Problem is NP-complete for tester observability even if we restrict to the case where there are at most two processes
UK utility data integration: overcoming schematic heterogeneity
In this paper we discuss syntactic, semantic and schematic issues which inhibit the integration of utility data in the UK. We then focus on the techniques employed within the VISTA project to overcome schematic heterogeneity. A Global
Schema based architecture is employed. Although automated approaches to Global Schema definition were attempted
the heterogeneities of the sector were too great. A manual approach to Global Schema definition was employed. The
techniques used to define and subsequently map source utility data models to this schema are discussed in detail. In order to ensure a coherent integrated model, sub and cross domain validation issues are then highlighted. Finally the proposed framework and data flow for schematic integration is introduced
Structure learning of antiferromagnetic Ising models
In this paper we investigate the computational complexity of learning the
graph structure underlying a discrete undirected graphical model from i.i.d.
samples. We first observe that the notoriously difficult problem of learning
parities with noise can be captured as a special case of learning graphical
models. This leads to an unconditional computational lower bound of for learning general graphical models on nodes of maximum degree
, for the class of so-called statistical algorithms recently introduced by
Feldman et al (2013). The lower bound suggests that the runtime
required to exhaustively search over neighborhoods cannot be significantly
improved without restricting the class of models.
Aside from structural assumptions on the graph such as it being a tree,
hypertree, tree-like, etc., many recent papers on structure learning assume
that the model has the correlation decay property. Indeed, focusing on
ferromagnetic Ising models, Bento and Montanari (2009) showed that all known
low-complexity algorithms fail to learn simple graphs when the interaction
strength exceeds a number related to the correlation decay threshold. Our
second set of results gives a class of repelling (antiferromagnetic) models
that have the opposite behavior: very strong interaction allows efficient
learning in time . We provide an algorithm whose performance
interpolates between and depending on the strength of the
repulsion.Comment: 15 pages. NIPS 201
- …