Optimality of Graphlet Screening in High Dimensional Variable Selection
Consider a linear regression model where the design matrix X has n rows and p
columns. We assume (a) p is much larger than n, (b) the coefficient vector beta
is sparse in the sense that only a small fraction of its coordinates is
nonzero, and (c) the Gram matrix G = X'X is sparse in the sense that each row
has relatively few large coordinates (diagonals of G are normalized to 1).
The sparsity in G naturally induces the sparsity of the so-called graph of
strong dependence (GOSD). We find an interesting interplay between the signal
sparsity and the graph sparsity, which ensures that in a broad context, the set
of true signals decomposes into many different small-size components of GOSD,
where different components are disconnected.
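The decomposition described above can be illustrated with a minimal sketch. It assumes an edge rule that the abstract does not state: nodes i and j are joined when |G(i, j)| is at least a threshold `delta`. The function names, the toy Gram matrix, and the value of `delta` are all hypothetical and chosen only for illustration.

```python
from collections import deque


def gram_matrix(X):
    """Return G = X'X for a design matrix X given as a list of rows."""
    n, p = len(X), len(X[0])
    return [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
            for i in range(p)]


def gosd_adjacency(G, delta):
    """Adjacency sets of the graph: i ~ j iff i != j and |G[i][j]| >= delta.

    The thresholding rule is an assumption made for this sketch.
    """
    p = len(G)
    return [{j for j in range(p) if j != i and abs(G[i][j]) >= delta}
            for i in range(p)]


def signal_components(adj, support):
    """Connected components of the subgraph induced on the signal support."""
    support = set(support)
    seen, components = set(), []
    for start in sorted(support):
        if start in seen:
            continue
        component, queue = [], deque([start])
        seen.add(start)
        while queue:  # breadth-first search restricted to the support
            u = queue.popleft()
            component.append(u)
            for v in sorted(adj[u] & support):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        components.append(sorted(component))
    return components


# Toy Gram matrix with unit diagonal and few large off-diagonal entries.
G_toy = [
    [1.0, 0.6, 0.0, 0.0, 0.0],
    [0.6, 1.0, 0.05, 0.0, 0.0],
    [0.0, 0.05, 1.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.5, 0.0, 1.0],
]
adjacency = gosd_adjacency(G_toy, delta=0.3)
components = signal_components(adjacency, support=[0, 1, 3, 4])
print(components)
```

In this toy case the four signals split into three small components, because node 4's only strong neighbour (node 2) is not a signal.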
We propose Graphlet Screening (GS) as a new approach to variable selection,
which is a two-stage Screen and Clean method. The key methodological innovation
of GS is to use GOSD to guide both the screening and cleaning. Compared to
m-variate brute-force screening that has a computational cost of p^m, GS
has a computational cost of only p (up to some multi-log(p) factors) in
screening.
We measure the performance of any variable selection procedure by the minimax
Hamming distance. We show that in a very broad class of situations, GS achieves
the optimal rate of convergence in terms of the Hamming distance. Somewhat
surprisingly, the well-known procedures subset selection and the lasso are rate
non-optimal, even in very simple settings and even when their tuning parameters
are ideally set.
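As a rough illustration of the loss behind this criterion, the sketch below counts the coordinates where an estimate disagrees in sign with the true coefficient vector. This is an assumption about how the Hamming distance is computed for one realization. The minimax quantity in the abstract is a worst-case expectation and is not computed here. The function names and the example vectors are hypothetical.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def hamming_selection_error(beta_hat, beta):
    """Count coordinates where sign(beta_hat) differs from sign(beta)."""
    if len(beta_hat) != len(beta):
        raise ValueError("beta_hat and beta must have the same length")
    return sum(sign(a) != sign(b) for a, b in zip(beta_hat, beta))


beta_true = [0.0, 2.0, 0.0, -1.5, 0.0]
beta_est = [0.0, 1.8, 0.3, 0.0, 0.0]  # one false positive, one missed signal
errors = hamming_selection_error(beta_est, beta_true)
print(errors)
```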
The geometric mean is a Bernstein function
In the paper, the authors establish, using the Cauchy integral formula from the
theory of complex functions, an integral representation for the geometric mean
of positive numbers. From this integral representation, the geometric mean
is proved to be a Bernstein function and a new proof of the well-known AG
inequality is provided. Comment: 10 pages
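For orientation, the two notions named in this abstract can be written out using their standard textbook forms. The abstract does not say in which variable the geometric mean is treated as a function, so only the general definitions are given here, and the symbols a_k, f, and t are notation chosen for this sketch.

```latex
% Arithmetic-geometric (AG) mean inequality for positive numbers a_1, ..., a_n:
\[
  \Bigl(\prod_{k=1}^{n} a_k\Bigr)^{1/n} \le \frac{1}{n}\sum_{k=1}^{n} a_k .
\]
% Standard definition of a Bernstein function f on (0, \infty):
\[
  f \in C^{\infty}(0,\infty), \quad f(t) \ge 0, \quad
  (-1)^{n-1} f^{(n)}(t) \ge 0 \quad \text{for all } t > 0,\ n \ge 1 .
\]
```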
Identification-method research for open-source software ecosystems
In recent years, open-source software (OSS) development has grown, with many developers around the world working on different OSS projects. A variety of open-source software ecosystems have emerged, for instance, GitHub, StackOverflow, and SourceForge. GitHub, one of the most typical social-programming and code-hosting sites, has brought numerous open-source software projects and developers together on a single virtual collaboration platform. Since GitHub itself is a large open-source community, it hosts a collection of software projects that are developed together and coevolve. The main challenge is how to identify the relationship between these projects, i.e., project relevance. Software-ecosystem identification is the basis of other studies of the ecosystem. Therefore, how to extract useful information from GitHub and identify software ecosystems is particularly important, and it is also a research area in symmetry. In this paper, a Topic-based Project Knowledge Metrics Framework (TPKMF) is proposed. By collecting the multisource dataset of an open-source ecosystem, project-relevance analysis of the open-source software is carried out on the basis of software-ecosystem identification. Then, we use our Spectral Clustering algorithm based on Core Project (CP-SC) to identify software-ecosystem projects and further identify software ecosystems. We verify that most software ecosystems contain a core software project, and that most other projects are associated with it. Furthermore, we analyze the characteristics of the ecosystem and find that interactive information has a greater impact on project relevance. Finally, we summarize the Topic-based Project Knowledge Metrics Framework
Anomalous gauge couplings of the Higgs boson at the CERN LHC: Semileptonic mode in WW scatterings
We make a full tree-level study of the signatures of anomalous gauge
couplings of the Higgs boson at the CERN LHC via the semileptonic decay mode in
WW scatterings. Both signals and backgrounds are studied at the hadron level
for the Higgs mass in the range 115 GeV to 200 GeV. We carefully impose
suitable kinematical cuts for suppressing the backgrounds. To the same
sensitivity as in the pure leptonic mode, our result shows that the
semileptonic mode can reduce the required integrated luminosity by a factor of
3. If the anomalous couplings in nature are actually larger than the
sensitivity bounds shown in the text, the experiment can start the test for an
integrated luminosity of 50 inverse fb. Comment: PACS numbers updated. Version published in Phys. Rev. D 79, 055010 (2009)