
    Optimality of Graphlet Screening in High Dimensional Variable Selection

    Consider a linear regression model where the design matrix X has n rows and p columns. We assume (a) p is much larger than n, (b) the coefficient vector beta is sparse in the sense that only a small fraction of its coordinates is nonzero, and (c) the Gram matrix G = X'X is sparse in the sense that each row has relatively few large coordinates (diagonals of G are normalized to 1). The sparsity in G naturally induces the sparsity of the so-called graph of strong dependence (GOSD). We find an interesting interplay between the signal sparsity and the graph sparsity, which ensures that in a broad context, the set of true signals decomposes into many different small-size components of GOSD, where different components are disconnected. We propose Graphlet Screening (GS) as a new approach to variable selection, which is a two-stage Screen and Clean method. The key methodological innovation of GS is to use GOSD to guide both the screening and cleaning. Compared to m-variate brute-force screening, which has a computational cost of p^m, GS has a computational cost of only p (up to some multi-log(p) factors) in screening. We measure the performance of any variable selection procedure by the minimax Hamming distance. We show that in a very broad class of situations, GS achieves the optimal rate of convergence in terms of the Hamming distance. Somewhat surprisingly, the well-known procedures subset selection and the lasso are rate non-optimal, even in very simple settings and even when their tuning parameters are ideally set.

    The geometric mean is a Bernstein function

    In the paper, the authors establish, using the Cauchy integral formula from the theory of complex functions, an integral representation for the geometric mean of n positive numbers. From this integral representation, the geometric mean is proved to be a Bernstein function and a new proof of the well-known AG inequality is provided. Comment: 10 pages

    Identification-method research for open-source software ecosystems

    In recent years, open-source software (OSS) development has grown, with many developers around the world working on different OSS projects. A variety of open-source software ecosystems have emerged, for instance, GitHub, StackOverflow, and SourceForge. One of the most typical social-programming and code-hosting sites, GitHub, has amassed numerous open-source software projects and developers on the same virtual collaboration platform. Since GitHub itself is a large open-source community, it hosts a collection of software projects that are developed together and coevolve. The great challenge here is how to identify the relationship between these projects, i.e., project relevance. Software-ecosystem identification is the basis of other studies in the ecosystem. Therefore, how to extract useful information from GitHub and identify software ecosystems is particularly important, and it is also a research area in symmetry. In this paper, a Topic-based Project Knowledge Metrics Framework (TPKMF) is proposed. By collecting the multisource dataset of an open-source ecosystem, project-relevance analysis of the open-source software is carried out on the basis of software-ecosystem identification. Then, we used our Spectral Clustering algorithm based on Core Project (CP-SC) to identify software-ecosystem projects and further identify software ecosystems. We verified that most software ecosystems contain a core software project, and most other projects are associated with it. Furthermore, we analyzed the characteristics of the ecosystem, and we also found that interactive information has a greater impact on project relevance. Finally, we summarize the Topic-based Project Knowledge Metrics Framework.

    Anomalous gauge couplings of the Higgs boson at the CERN LHC: Semileptonic mode in WW scatterings

    We make a full tree-level study of the signatures of anomalous gauge couplings of the Higgs boson at the CERN LHC via the semileptonic decay mode in WW scatterings. Both signals and backgrounds are studied at the hadron level for the Higgs mass in the range 115 GeV to 200 GeV. We carefully impose suitable kinematical cuts for suppressing the backgrounds. For the same sensitivity as in the pure leptonic mode, our result shows that the semileptonic mode can reduce the required integrated luminosity by a factor of 3. If the anomalous couplings in nature are actually larger than the sensitivity bounds shown in the text, the experiment can start the test for an integrated luminosity of 50 inverse fb. Comment: PACS numbers updated. Version published in Phys. Rev. D 79, 055010 (2009)