Localized Linear Discriminant Analysis
Despite its age, Linear Discriminant Analysis (LDA) performs well even in situations where its underlying premises, such as normally distributed data with a common covariance matrix for all classes, are not met. It is, however, a global technique that does not take into account the nature of the individual observation to be classified. By weighting each training observation according to its distance from the observation of interest, a global classifier can be transformed into an observation-specific approach. So far, this has been done for logistic discrimination. Using LDA instead makes the computation of the local classifier much simpler. Moreover, the approach is directly applicable to multi-class problems.
Keywords: classification, local models, LDA
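The localization idea described above can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes a Gaussian kernel on the Euclidean distance to the query point, a hypothetical `bandwidth` parameter, and plain weighted estimates of the class priors, class means and pooled covariance, which are then plugged into the usual linear discriminant scores.

```python
import numpy as np


def localized_lda_predict(X, y, x0, bandwidth=1.0):
    """Classify one query point x0 with a kernel-weighted (localized) LDA.

    Assumption: training rows are weighted by a Gaussian kernel of their
    Euclidean distance to x0; `bandwidth` is a hypothetical tuning parameter.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    x0 = np.asarray(x0, dtype=float)
    # Observations close to x0 receive weights near 1, distant ones near 0.
    d2 = np.sum((X - x0) ** 2, axis=1)
    w = np.maximum(np.exp(-0.5 * d2 / bandwidth**2), 1e-12)  # floor avoids all-zero classes
    classes = np.unique(y)
    p = X.shape[1]
    means, priors = [], []
    cov = np.zeros((p, p))
    for c in classes:
        Xc, wc = X[y == c], w[y == c]
        mu = wc @ Xc / wc.sum()               # weighted class mean
        diff = Xc - mu
        cov += (wc[:, None] * diff).T @ diff  # weighted within-class scatter
        means.append(mu)
        priors.append(wc.sum())
    cov /= w.sum()                            # weighted pooled covariance
    priors = np.array(priors) / w.sum()       # weighted class priors
    cov_inv = np.linalg.inv(cov + 1e-8 * np.eye(p))  # tiny ridge for numerical safety
    # Usual linear discriminant score, evaluated with the local estimates.
    scores = [x0 @ cov_inv @ mu - 0.5 * mu @ cov_inv @ mu + np.log(pi)
              for mu, pi in zip(means, priors)]
    return classes[int(np.argmax(scores))]
```

Because a discriminant score is computed for every class, the same code handles more than two classes; the cost is one small weighted fit per query point.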
Localized Regression
The main problem with localized discriminant techniques is the curse of dimensionality, which seems to restrict their use to the case of few variables. This restriction does not hold if localization is combined with a reduction of dimension. In particular, it is shown that localization yields powerful classifiers even in higher dimensions if it is combined with locally adaptive selection of predictors. A robust localized logistic regression (LLR) method is developed for which all tuning parameters are chosen data-adaptively. In an extensive simulation study we evaluate the potential of the proposed procedure for various types of data and compare it to other classification procedures. In addition, we demonstrate that the automatic choice of localization, predictor-selection and penalty parameters based on cross-validation works well. Finally, the method is applied to real data sets and its real-world performance is compared to that of alternative procedures.
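A localized logistic regression can be sketched in the same spirit. The code below is an illustration under stated assumptions, not the LLR procedure itself: it uses a Gaussian kernel for the localization weights, a ridge penalty on the slopes, and fixed hypothetical values for `bandwidth` and `penalty`, whereas the method described above chooses them (and selects predictors) data-adaptively. The weighted, penalized log-likelihood is maximized by Newton-Raphson.

```python
import numpy as np


def localized_logistic_predict(X, y, x0, bandwidth=1.0, penalty=1.0, n_iter=25):
    """Estimate P(y = 1 | x0) with a kernel-weighted, ridge-penalized logistic fit.

    y is coded 0/1. `bandwidth` and `penalty` are fixed, hypothetical values;
    no predictor selection is performed in this sketch.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    # Centre predictors at x0, so the intercept is the local logit at x0.
    Z = np.column_stack([np.ones(len(X)), X - x0])
    # Gaussian localization weights (assumed kernel).
    w = np.exp(-0.5 * np.sum((X - x0) ** 2, axis=1) / bandwidth**2)
    # Ridge penalty on the slopes only; the intercept is left unpenalized.
    P = penalty * np.eye(Z.shape[1])
    P[0, 0] = 0.0
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        eta = np.clip(Z @ beta, -30.0, 30.0)
        prob = 1.0 / (1.0 + np.exp(-eta))
        # Gradient and negative Hessian of the weighted, penalized log-likelihood.
        grad = Z.T @ (w * (y - prob)) - P @ beta
        hess = Z.T @ (Z * (w * prob * (1.0 - prob))[:, None]) + P
        beta += np.linalg.solve(hess, grad)  # Newton-Raphson step
    return 1.0 / (1.0 + np.exp(-beta[0]))
```

In a full procedure, `bandwidth` and `penalty` would instead be chosen by cross-validation, for example by evaluating this function in a leave-one-out loop over a small grid of candidate values.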
Codimension-3 Singularities and Yukawa Couplings in F-theory
F-theory is one of the frameworks where all the Yukawa couplings of grand
unified theories are generated and their computation is possible. The Yukawa
couplings of charged matter multiplets are supposed to be generated around
codimension-3 singularity points of a base complex 3-fold, and that has been
confirmed for the simplest type of codimension-3 singularities in recent
studies. However, the geometry of F-theory compactifications is much more
complicated. For a generic F-theory compactification, such issues as flux
configuration around the codimension-3 singularities, field-theory formulation
of the local geometry and behavior of zero-mode wavefunctions have virtually
never been addressed before. We address all these issues in this article, and
further discuss the nature of the Yukawa couplings generated at such singularities. In
order to calculate the Yukawa couplings of the low-energy effective theory,
however, the local descriptions of wavefunctions on complex surfaces and a
global characterization of zero-modes over a complex curve have to be combined
together. We find the relation between them by re-examining how chiral charged
matter is characterized in F-theory compactifications. An intrinsic definition
of spectral surfaces in F-theory turns out to be the key concept. As a
by-product, we find a new way to understand the heterotic/F-theory duality,
which improves the precision of the existing duality map associated with
codimension-3 singularities.
Comment: 91 pages; minor clarification, typos corrected and a reference added (v3)
ELM regime classification by conformal prediction on an information manifold
Characterization and control of plasma instabilities known as edge-localized modes (ELMs) are crucial for the operation of fusion reactors. Recently, machine learning methods have shown good potential for making useful inferences from stochastic fusion data sets. However, traditional classification methods do not offer an inherent estimate of the goodness of their predictions. In this paper, a distance-based conformal predictor classifier integrated with a geometric-probabilistic framework is presented. The first benefit of the approach lies in its comprehensive treatment of highly stochastic fusion data sets: the measurements are modeled by probability distributions in a metric space. This enables the calculation of a natural distance measure between probability distributions, the Rao geodesic distance. Second, the predictions are accompanied by estimates of their accuracy and reliability. The method is applied to the classification of regimes characterized by different types of ELMs, based on measurements of global parameters and their error bars. This yields promising success rates and outperforms state-of-the-art automatic techniques for recognizing ELM signatures. The estimates of the goodness of the predictions increase ELM experts' confidence in the classification, while allowing more reliable decisions regarding plasma control and, at the same time, increasing the robustness of the control system.
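For univariate Gaussian measurements N(mu, sigma^2), the Rao (Fisher-Rao) geodesic distance has a closed form, which makes a distance-based conformal predictor easy to sketch. In the example below, each measurement is treated as a Gaussian whose mean is the measured value and whose standard deviation is its error bar, features are assumed independent (so per-feature distances combine as a root sum of squares), and the nonconformity score is the common k-nearest-neighbour ratio from the conformal prediction literature. These are assumptions for illustration; the paper's distance for multivariate data and its nonconformity measure may differ.

```python
import numpy as np


def rao_distance(mu1, s1, mu2, s2):
    """Fisher-Rao geodesic distance between N(mu1, s1^2) and N(mu2, s2^2)."""
    dm2 = (mu1 - mu2) ** 2
    num = dm2 + 2.0 * (s1 - s2) ** 2
    den = dm2 + 2.0 * (s1 + s2) ** 2
    return 2.0 * np.sqrt(2.0) * np.arctanh(np.sqrt(num / den))


def multi_rao(a, b):
    """Distance between observations given as (means, sigmas) array pairs.

    Assumes independent features, so distances add in quadrature.
    """
    d = rao_distance(a[0], a[1], b[0], b[1])
    return float(np.sqrt(np.sum(d ** 2)))


def nonconformity(D, labels, i, k):
    """k-NN ratio: nearest same-class distances over nearest other-class distances."""
    n = len(labels)
    same = np.sort([D[i, j] for j in range(n) if j != i and labels[j] == labels[i]])[:k]
    other = np.sort([D[i, j] for j in range(n) if labels[j] != labels[i]])[:k]
    return same.sum() / other.sum()


def conformal_predict(train, train_labels, test, k=1):
    """Return (label, p-values, confidence, credibility) for one test observation."""
    obs = list(train) + [test]
    n = len(obs)
    D = np.array([[multi_rao(obs[i], obs[j]) for j in range(n)] for i in range(n)])
    pvals = {}
    for cand in sorted(set(train_labels)):
        labels = list(train_labels) + [cand]  # tentatively label the test point
        alphas = np.array([nonconformity(D, labels, i, k) for i in range(n)])
        # p-value: share of examples at least as nonconforming as the test point.
        pvals[cand] = float(np.mean(alphas >= alphas[-1]))
    ranked = sorted(pvals.values(), reverse=True)
    best = max(pvals, key=pvals.get)
    confidence = 1.0 - ranked[1] if len(ranked) > 1 else 1.0
    return best, pvals, confidence, ranked[0]
```

The label with the largest p-value is predicted; one minus the second-largest p-value serves as the confidence and the largest p-value as the credibility, which is how a conformal predictor attaches a reliability estimate to each individual prediction.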
Gene ranking and biomarker discovery under correlation
Biomarker discovery and gene ranking are standard tasks in genomic
high-throughput analysis. Typically, the ordering of markers is based on a
stabilized variant of the t-score, such as the moderated t or the SAM
statistic. However, these procedures ignore gene-gene correlations, which may
have a profound impact on the gene orderings and on the power of the subsequent
tests.
We propose a simple procedure that adjusts gene-wise t-statistics to take
account of correlations among genes. The resulting correlation-adjusted
t-scores ("cat" scores) are derived from a predictive perspective, i.e. as a
score for variable selection to discriminate group membership in two-class
linear discriminant analysis. In the absence of correlation the cat score
reduces to the standard t-score. Moreover, using the cat score it is
straightforward to evaluate groups of features (i.e. gene sets). For
computation of the cat score from small-sample data, we propose a shrinkage
procedure. In a comparative study comprising six different synthetic and
empirical correlation structures we show that the cat score improves estimation
of gene orderings and leads to higher power for fixed true discovery rate, and
vice versa. Finally, we also illustrate the cat score by analyzing metabolomic
data.
The shrinkage cat score is implemented in the R package "st", available from
http://cran.r-project.org/web/packages/st/
Comment: 18 pages, 5 figures, 1 table
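The cat score itself is simple to compute once a correlation matrix is available: the vector of t-scores is decorrelated by the inverse square root of the correlation matrix, cat = R^(-1/2) t, so that it reduces to t when R is the identity. The sketch below is not the "st" package: it uses pooled-variance t-scores and shrinks the pooled within-group correlation matrix toward the identity with a fixed, hypothetical intensity `shrink`, whereas the shrinkage procedure described above estimates this from the data.

```python
import numpy as np


def t_scores(X1, X2):
    """Two-sample t-scores per variable (columns = genes), pooled variance."""
    n1, n2 = len(X1), len(X2)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    v = ((n1 - 1) * X1.var(axis=0, ddof=1) + (n2 - 1) * X2.var(axis=0, ddof=1)) / (n1 + n2 - 2)
    return (m1 - m2) / np.sqrt(v * (1.0 / n1 + 1.0 / n2))


def inv_sqrt_psd(R):
    """Inverse symmetric square root R^(-1/2) via an eigendecomposition."""
    vals, vecs = np.linalg.eigh(R)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def cat_scores(X1, X2, shrink=0.5):
    """Correlation-adjusted t-scores, cat = R^(-1/2) t.

    R is the pooled within-group correlation matrix shrunk toward the
    identity with a FIXED, hypothetical intensity `shrink` in (0, 1].
    """
    t = t_scores(X1, X2)
    # Remove group means so the correlation reflects within-group structure only.
    centred = np.vstack([X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)])
    R = np.corrcoef(centred, rowvar=False)
    R_shrunk = shrink * np.eye(len(t)) + (1.0 - shrink) * R
    return inv_sqrt_psd(R_shrunk) @ t
```

Setting `shrink=1.0` replaces the correlation matrix by the identity and recovers the ordinary t-scores, matching the reduction stated in the abstract.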
Exotic matter on singular divisors in F-theory
We analyze exotic matter representations that arise on singular seven-brane
configurations in F-theory. We develop a general framework for analyzing such
representations, and work out explicit descriptions for models with matter in
the 2-index and 3-index symmetric representations of SU() and SU(2)
respectively, associated with double and triple point singularities in the
seven-brane locus. These matter representations are associated with Weierstrass
models whose discriminants vanish to high order thanks to nontrivial
cancellations possible only in the presence of a non-UFD algebraic structure.
This structure can be described using the normalization of the ring of
intrinsic local functions on a singular divisor. We consider the connection
between geometric constraints on singular curves and corresponding constraints
on the low-energy spectrum of 6D theories, identifying some new examples of
apparent "swampland" theories that cannot be realized in F-theory but have no
apparent low-energy inconsistency.
Comment: 71 pages
On Two Simple and Effective Procedures for High Dimensional Classification of General Populations
In this paper, we generalize two criteria, the determinant-based and
trace-based criteria proposed by Saranadasa (1993), to general populations for
high-dimensional classification. These two criteria compare distances
between a new observation and each of several known groups. The
determinant-based criterion performs well for correlated variables by
integrating the covariance structure and is competitive with many other existing
rules. The criterion, however, requires the measurement dimension to be smaller than
the sample size. The trace-based criterion, in contrast, is an independence rule
and is effective in the "large dimension-small sample size" scenario. An appealing
property of these two criteria is that their implementation is straightforward
and there is no need for preliminary variable selection or tuning
parameters. Their asymptotic misclassification probabilities are derived using
the theory of large dimensional random matrices. Their competitive performance
is illustrated by extensive Monte Carlo experiments and a real data analysis.
Comment: 5 figures; 22 pages. To appear in "Statistical Papers"
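The idea behind a trace-based independence rule can be illustrated with a bias-corrected squared Euclidean distance. The sketch below is one plausible reading, not the criterion as defined by Saranadasa (1993) or its generalization in the paper, whose exact normalization may differ: for a new observation x and a group g with sample mean mean_g, sample covariance S_g and size n_g, it scores ||x - mean_g||^2 - tr(S_g)/n_g. For x drawn from group h, E||x - mean_g||^2 = ||mu_h - mu_g||^2 + tr(Sigma_h) + tr(Sigma_g)/n_g, so subtracting tr(S_g)/n_g removes the group-dependent bias, and no covariance matrix has to be inverted.

```python
import numpy as np


def trace_criterion_predict(groups, x):
    """Assign x to the group with the smallest bias-corrected squared distance.

    groups: list of (n_g, p) training arrays. Score per group:
    ||x - mean_g||^2 - tr(S_g) / n_g. Works even when p > n_g because
    only variances (the diagonal of S_g) are needed.
    """
    x = np.asarray(x, dtype=float)
    scores = []
    for G in groups:
        G = np.asarray(G, dtype=float)
        n_g = G.shape[0]
        mean_g = G.mean(axis=0)
        # tr(S_g) is the sum of the per-variable sample variances.
        trace_s = G.var(axis=0, ddof=1).sum()
        scores.append(np.sum((x - mean_g) ** 2) - trace_s / n_g)
    return int(np.argmin(scores))
```

Because only per-variable variances enter, the rule ignores correlations between variables, which is what makes it usable when the dimension exceeds the group sample sizes.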
E(lementary) Strings in Six-Dimensional Heterotic F-Theory
Using E-strings, we can analyze not only six-dimensional superconformal field
theories but also probe vacua of the non-perturbative heterotic string. We study
strings made of D3-branes wrapped on various two-cycles in the global F-theory
setup. We claim that E-strings are elementary in the sense that various
combinations of E-strings can form M-strings as well as heterotic strings and
a new kind of string, called G-strings. Using them, we show that emissions and
combinations of heterotic small instantons generate most of the known
six-dimensional superconformal theories, their affinizations and little string
theories. Taking into account the global structure of the compact internal geometry, we
also show that special combinations of E-strings play an important role in
constructing six-dimensional theories of - and -types. We check global
consistency conditions from anomaly cancellation conditions, both from
five-branes and strings, and show that they are given in terms of elementary
E-string combinations.
Comment: 58 pages, 16 figures; v2, version to appear in JHEP