
    Estimation of the density of regression errors

    Estimation of the density of regression errors is a fundamental issue in regression analysis, and it is typically explored via a parametric approach. This article uses a nonparametric approach with the mean integrated squared error (MISE) criterion. It solves a long-standing problem, formulated two decades ago by Mark Pinsker, about estimation of a nonparametric error density in a nonparametric regression setting with the accuracy of an oracle that knows the underlying regression errors. The solution implies that, under a mild assumption on the differentiability of the design density and regression function, the MISE of a data-driven error density estimator attains the minimax rates and sharp constants known for the case of directly observed regression errors. The result holds for error densities with finite and infinite supports. Some extensions of this result for more general heteroscedastic models with possibly dependent errors and predictors are also obtained; in the latter case the marginal error density is estimated. In all considered cases a blockwise-shrinking Efromovich--Pinsker density estimate, based on plugged-in residuals, is used. The obtained results give a theoretical justification for the customary practice in applied regression analysis of treating residuals as proxies for the underlying regression errors. Numerical and real examples are presented and discussed, and the S-PLUS software is available. Comment: Published at http://dx.doi.org/10.1214/009053605000000435 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
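    As a rough illustration of the plug-in idea (in Python, not the paper's S-PLUS software), the sketch below forms residuals from a crude nonparametric regression fit and then feeds them to a cosine-series density estimator with blockwise coefficient shrinkage in the spirit of the Efromovich--Pinsker estimate; the smoother, block sizes and shrinkage weights are illustrative assumptions, not the authors' choices.

```python
import numpy as np

def kernel_regression(x, y, bandwidth=0.1):
    """Crude Nadaraya-Watson estimate of the regression function m(x) = E[Y | X = x]."""
    m_hat = np.empty_like(y, dtype=float)
    for i, xi in enumerate(x):
        w = np.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
        m_hat[i] = np.sum(w * y) / np.sum(w)
    return m_hat

def blockwise_series_density(residuals, n_terms=30, block_size=5):
    """Cosine-series density estimate on [0, 1] with blockwise coefficient shrinkage."""
    # Rescale the residuals to [0, 1]; the density is estimated on that interval.
    r = (residuals - residuals.min()) / (residuals.max() - residuals.min() + 1e-12)
    n = len(r)
    # Empirical cosine-series coefficients of the (rescaled) error density.
    theta = np.array([np.mean(np.sqrt(2) * np.cos(np.pi * j * r))
                      for j in range(1, n_terms + 1)])
    # Shrink coefficients block by block toward zero; the weight mimics the
    # "signal / (signal + noise)" form of blockwise-shrinkage estimators.
    shrunk = np.zeros_like(theta)
    for start in range(0, n_terms, block_size):
        block = theta[start:start + block_size]
        weight = max(0.0, 1.0 - (len(block) / n) / (np.sum(block ** 2) + 1e-12))
        shrunk[start:start + block_size] = weight * block
    def f_hat(t):
        basis = np.array([np.sqrt(2) * np.cos(np.pi * j * t)
                          for j in range(1, n_terms + 1)])
        return 1.0 + shrunk @ basis
    return f_hat

# Residuals from the regression fit stand in for the unobserved errors.
rng = np.random.default_rng(0)
x = rng.uniform(size=500)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=500)
residuals = y - kernel_regression(x, y)
f_hat = blockwise_series_density(residuals)   # density of the rescaled residuals
```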

    The essence of component-based design and coordination

    Is there a characteristic of coordination languages that makes them qualitatively different from general programming languages and deserves special academic attention? This report proposes a nuanced answer in three parts. The first part highlights that coordination languages are the means by which composite software applications can be specified using components that are only available separately, or later in time, via standard interfacing mechanisms. The second part highlights that most currently used languages provide mechanisms to use externally provided components, and thus exhibit some elements of coordination. However, not all do, and the availability of an external interface thus forms an objective and qualitative criterion that distinguishes coordination. The third part argues that despite the qualitative difference, the segregation of academic attention away from general language design and implementation has non-obvious cost trade-offs. Comment: 8 pages, 2 figures, 3 tables

    What Causes My Test Alarm? Automatic Cause Analysis for Test Alarms in System and Integration Testing

    Driven by new software development processes and testing in clouds, system and integration testing nowadays tends to produce an enormous number of alarms. Such test alarms place an almost unbearable burden on software testing engineers, who have to manually analyze the causes of these alarms. The causes are critical because they determine which stakeholders are responsible for fixing the bugs detected during testing. In this paper, we present a novel approach that aims to relieve the burden by automating the procedure. Our approach, called the Cause Analysis Model, exploits information retrieval techniques to efficiently infer test alarm causes from test logs. We have developed a prototype and evaluated our tool on two industrial datasets with more than 14,000 test alarms. Experiments on the two datasets show that our tool achieves accuracies of 58.3% and 65.8%, respectively, outperforming the baseline algorithms by up to 13.3%. Our algorithm is also extremely efficient, spending about 0.1s per cause analysis. Given these results, our industrial partner, a leading global information and communication technology company, has deployed the tool; after two months of operation it achieved an average accuracy of 72%, nearly three times more accurate than a previous strategy based on regular expressions. Comment: 12 pages
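    As a rough sketch of the general idea (information retrieval over test logs), not the paper's Cause Analysis Model itself, the snippet below vectorises historical logs with TF-IDF and assigns a new alarm the cause of its nearest labelled neighbour; the log texts and cause labels are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical historical test logs with manually assigned causes.
historical_logs = [
    "connection refused while provisioning test bed node",
    "assertion failed: expected 200 got 500 in checkout flow",
    "timeout waiting for build artifact download",
    "null pointer dereference in payment service handler",
]
causes = ["environment", "product_bug", "environment", "product_bug"]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),      # bag of 1- and 2-grams over log text
    KNeighborsClassifier(n_neighbors=1),       # nearest historical alarm decides the cause
)
model.fit(historical_logs, causes)

new_alarm_log = "timeout while fetching artifact from build server"
print(model.predict([new_alarm_log]))          # -> ['environment']
```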

    Spatial Weighting Matrix Selection in Spatial Lag Econometric Model

    This paper investigates the choice of spatial weighting matrix in a spatial lag model framework. In the empirical literature the choice of spatial weighting matrix has been characterized by a great deal of arbitrariness. The number of possible spatial weighting matrices is large, which until recently was thought to preclude investigation of the appropriateness of the empirical choices. Recently Kostov (2010) proposed a new approach that transforms the problem into an equivalent variable selection problem. This article expands the latter transformation approach into a two-step selection procedure. The proposed approach aims at reducing the arbitrariness in the selection of the spatial weighting matrix in spatial econometrics. It allows a wide range of variable selection methods to be applied to the high-dimensional problem of selecting a spatial weighting matrix. The suggested approach consists of a screening step that reduces the number of candidate spatial weighting matrices, followed by an estimation step that selects the final model. An empirical application of the proposed methodology is presented, in which a range of different combinations of screening and estimation methods are employed and found to produce similar results. The proposed methodology is shown to be able to approximate, and provide indications of, what the ‘true’ spatial weighting matrix could be, even when it is not amongst the considered alternatives. The similarity of the results obtained with different methods suggests that their relative computational costs could be the primary reason for choosing among them. Some further extensions and applications are also discussed.
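    The sketch below illustrates, on invented data, the transformation that turns the choice of spatial weighting matrix into a variable selection problem: each candidate matrix W_k contributes a spatial lag W_k y as a regressor, a screening step (here a lasso) discards most candidates, and the final model is refitted on the survivors. It is an illustration of the two-step idea only, not Kostov's or the paper's exact procedure.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(1)
n, n_candidates = 200, 10

# Hypothetical candidate spatial weighting matrices (sparse, row-standardised).
Ws = []
for _ in range(n_candidates):
    A = (rng.uniform(size=(n, n)) < 0.05).astype(float)
    np.fill_diagonal(A, 0.0)
    Ws.append(A / np.maximum(A.sum(axis=1, keepdims=True), 1.0))

# Toy outcome with a spatial component built from one of the candidates.
X = rng.normal(size=(n, 2))
y = X @ np.array([1.0, -0.5]) + 0.4 * Ws[3] @ rng.normal(size=n) + rng.normal(size=n)

# Step 1: screening -- lasso over all candidate spatial lags W_k y.
lags = np.column_stack([W @ y for W in Ws])
screen = Lasso(alpha=0.05).fit(np.hstack([X, lags]), y)
kept = [k for k, coef in enumerate(screen.coef_[X.shape[1]:]) if abs(coef) > 1e-6]

# Step 2: estimation -- refit using only the surviving spatial lags.
final = LinearRegression().fit(np.hstack([X, lags[:, kept]]), y)
print("candidate matrices kept after screening:", kept)
```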

    The Oracle Problem When Testing from MSCs

    Message Sequence Charts (MSCs) form a popular language in which scenario-based specifications and models can be written. There has been significant interest in automating aspects of testing from MSCs. This paper concerns the Oracle Problem, in which we have an observation made in testing and wish to know whether it is consistent with the specification. We assume that there is an MSC specification and consider the case where we have entirely independent local testers (local observability) and the case where the observations of the local testers are logged and brought together (tester observability). It transpires that under local observability the Oracle Problem can be solved in low-order polynomial time if we use sequencing, loops and choices, but becomes NP-complete if we also allow parallel components; if we place a bound on the number of parallel components then it can again be solved in polynomial time. For tester observability, the problem is NP-complete when we have either loops or choices. However, it can be solved in low-order polynomial time if we have only one loop, no choices, and no parallel components. If we allow parallel components then the Oracle Problem is NP-complete for tester observability even if we restrict to the case where there are at most two processes.
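    As an illustration of the easiest polynomial-time situation, the sketch below checks a single globally logged observation (tester observability) against one basic MSC with sequencing only, i.e. no loops, choices or parallel components: the log must contain exactly the MSC's events, respect each process's order, and place every receive after its matching send. The event names and data layout are invented, and this is not the paper's algorithm.

```python
def consistent_with_basic_msc(observation, process_order, send_of):
    """observation: list of event ids in logged order.
    process_order: dict process -> list of its event ids in MSC order.
    send_of: dict receive event id -> matching send event id."""
    position = {e: i for i, e in enumerate(observation)}
    # The log must contain exactly the MSC's events.
    msc_events = [e for events in process_order.values() for e in events]
    if sorted(position) != sorted(msc_events):
        return False
    # Each process's events must appear in the MSC's per-process order.
    for events in process_order.values():
        if any(position[a] > position[b] for a, b in zip(events, events[1:])):
            return False
    # Every receive must be logged after its matching send.
    return all(position[s] < position[r] for r, s in send_of.items())

# Hypothetical MSC: p sends m1 to q, then q sends m2 to p.
process_order = {"p": ["s1", "r2"], "q": ["r1", "s2"]}
send_of = {"r1": "s1", "r2": "s2"}
print(consistent_with_basic_msc(["s1", "r1", "s2", "r2"], process_order, send_of))  # True
print(consistent_with_basic_msc(["r1", "s1", "s2", "r2"], process_order, send_of))  # False
```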

    UK utility data integration: overcoming schematic heterogeneity

    In this paper we discuss the syntactic, semantic and schematic issues which inhibit the integration of utility data in the UK. We then focus on the techniques employed within the VISTA project to overcome schematic heterogeneity, using a Global Schema based architecture. Although automated approaches to Global Schema definition were attempted, the heterogeneities of the sector were too great, so a manual approach was employed instead. The techniques used to define this schema and subsequently map source utility data models to it are discussed in detail. To ensure a coherent integrated model, sub- and cross-domain validation issues are then highlighted. Finally, the proposed framework and data flow for schematic integration are introduced.
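    A minimal, entirely hypothetical sketch of the Global Schema idea follows: records from two source utility data models are mapped by per-source functions into one shared structure, over which a cross-domain validation rule can then run. All field names and the rule itself are invented for illustration and are not the VISTA schema.

```python
from dataclasses import dataclass

@dataclass
class GlobalAsset:                # hypothetical global-schema entity
    asset_id: str
    utility: str                  # e.g. "water", "electricity"
    location: tuple               # (easting, northing)
    depth_m: float

def map_water_source(rec: dict) -> GlobalAsset:
    return GlobalAsset(rec["pipe_ref"], "water",
                       (rec["x"], rec["y"]), rec["cover_depth"])

def map_electricity_source(rec: dict) -> GlobalAsset:
    # Source model stores depth in centimetres; convert on the way in.
    return GlobalAsset(rec["cable_id"], "electricity",
                       (rec["easting"], rec["northing"]), rec["depth"] / 100.0)

def validate(assets):
    """Cross-domain check: flag assets of different utilities occupying the
    same location and depth (a stand-in for sub/cross-domain validation)."""
    seen = {}
    for a in assets:
        key = (a.location, round(a.depth_m, 2))
        if key in seen and seen[key] != a.utility:
            yield (seen[key], a.utility, key)
        seen[key] = a.utility

assets = [map_water_source({"pipe_ref": "W1", "x": 1.0, "y": 2.0, "cover_depth": 0.9}),
          map_electricity_source({"cable_id": "E7", "easting": 1.0, "northing": 2.0, "depth": 90})]
print(list(validate(assets)))     # overlapping water and electricity assets flagged
```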

    Structure learning of antiferromagnetic Ising models

    In this paper we investigate the computational complexity of learning the graph structure underlying a discrete undirected graphical model from i.i.d. samples. We first observe that the notoriously difficult problem of learning parities with noise can be captured as a special case of learning graphical models. This leads to an unconditional computational lower bound of Ω(p^{d/2}) for learning general graphical models on p nodes of maximum degree d, for the class of so-called statistical algorithms recently introduced by Feldman et al. (2013). The lower bound suggests that the O(p^d) runtime required to exhaustively search over neighborhoods cannot be significantly improved without restricting the class of models. Aside from structural assumptions on the graph such as it being a tree, hypertree, tree-like, etc., many recent papers on structure learning assume that the model has the correlation decay property. Indeed, focusing on ferromagnetic Ising models, Bento and Montanari (2009) showed that all known low-complexity algorithms fail to learn simple graphs when the interaction strength exceeds a number related to the correlation decay threshold. Our second set of results gives a class of repelling (antiferromagnetic) models that have the opposite behavior: very strong interaction allows efficient learning in time O(p^2). We provide an algorithm whose performance interpolates between O(p^2) and O(p^{d+2}) depending on the strength of the repulsion. Comment: 15 pages. NIPS 201
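    The sketch below captures only the intuition behind the repulsive case, not the paper's algorithm: when the antiferromagnetic interactions are very strong, two neighbouring spins essentially never take the value +1 together, so scanning the empirical co-occurrence counts of all O(p^2) pairs already separates edges from non-edges. The sampling assumptions and the threshold are illustrative.

```python
import numpy as np

def candidate_edges(samples, threshold=0.01):
    """samples: (n, p) array of +/-1 spins, assumed drawn i.i.d. from a strongly
    antiferromagnetic Ising model. Declares {i, j} an edge when the configuration
    (+1, +1) is (almost) never observed for that pair."""
    n, p = samples.shape
    up = (samples == 1).astype(float)          # indicator of spin == +1
    co_occurrence = (up.T @ up) / n            # fraction of samples with both spins +1
    edges = set()
    for i in range(p):                         # O(p^2) pairs in total
        for j in range(i + 1, p):
            if co_occurrence[i, j] < threshold:
                edges.add((i, j))
    return edges
```

    Non-adjacent spins in such a model still take the value +1 together a non-negligible fraction of the time, which is what makes this simple pairwise statistic informative in the strongly repulsive regime.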