Random Fragments Classification of Microbial Marker Clades with Multi-class SVM and N-Best Algorithm
Modeling microbial clades from microarray genome sequences is a challenging
problem in biology, especially for the discovery and categorization of gene
isolates from new species. Marker family genome sequences play an important
role in describing specific microbial clades within species. A support vector
machine (SVM) based framework for microbial species classification with an
N-best algorithm is constructed to classify centroid marker genome fragments
randomly generated from marker genome sequences on MetaRef. A time series
feature extraction method is proposed that segments the centroid gene sequences
and maps them into spaces of different dimensions. Two ways of splitting the
data are investigated: randomly splitting fragments along the genome sequence
(DI), or separating genome sequences into two parts (DII). Two strategies for
the fragment recognition task, dimension-by-dimension and sequence-by-sequence,
are investigated. The k-mer size selection, the segmentation overlap and the
effect of the random split percentage are also discussed. Experiments on 12390
marker genome sequences belonging to marker families of 17 species from MetaRef
show that, for both DI and DII in dimension-by-dimension and
sequence-by-sequence recognition, the recognition accuracy exceeds 28% for the
top-1 candidate and 91% for the top-10 candidates on both the training and
testing sets overall.
Comment: 17 pages, 59 figures
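As a rough illustration of the segmentation, k-mer and N-best ideas mentioned in the abstract above, the following Python sketch cuts a sequence into overlapping windows, counts k-mers per window, and scores top-N accuracy over ranked candidate lists. The function names, window size and overlap are hypothetical; the paper's actual time series feature mapping and SVM configuration are not given in the abstract and are not reproduced here.

```python
from collections import Counter


def segment(seq, window, overlap):
    """Cut seq into windows of length `window`; consecutive windows share `overlap` characters."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, step)]


def kmer_counts(fragment, k):
    """Count every length-k substring of the fragment."""
    return Counter(fragment[i:i + k] for i in range(len(fragment) - k + 1))


def top_n_accuracy(ranked_candidates, true_labels, n):
    """Fraction of samples whose true label is among the first n ranked candidates."""
    hits = sum(
        1 for ranked, truth in zip(ranked_candidates, true_labels) if truth in ranked[:n]
    )
    return hits / len(true_labels)


if __name__ == "__main__":
    windows = segment("ACGTACGT", window=4, overlap=2)
    print(windows)
    print([dict(kmer_counts(w, 2)) for w in windows])
```

A larger overlap yields more windows per sequence, which is one way the overlap setting can change the amount of training data.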
Approximative Theorem of Incomplete Riemann-Stieltjes Sum of Stochastic Integral
Approximation theorems for incomplete Riemann-Stieltjes sums of the Ito
stochastic integral, the mean square integral and the Stratonovich stochastic
integral with respect to Brownian motion are investigated. Some sufficient
conditions for incomplete Riemann-Stieltjes sums to converge to the stochastic
integral are developed, which establish alternative ways of approximating the
stochastic integral. Two simulation examples of incomplete Riemann-Stieltjes
sums for the Ito stochastic integral and the Stratonovich stochastic integral
are given for demonstration.
Comment: 4 figures
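The difference between the two integrals, and the effect of leaving out a few terms, can be sketched numerically for the integrand W itself, where the Ito integral of W dW over [0, T] equals (W_T^2 - T)/2 and the Stratonovich integral equals W_T^2/2. This is a minimal sketch: the choice of integrand, the function names and the pattern of deleted indices are assumptions for illustration, not the sufficient conditions or examples from the paper.

```python
import math
import random


def brownian_increments(n, T=1.0, seed=0):
    """Independent N(0, T/n) increments of a Brownian path on [0, T]."""
    rng = random.Random(seed)
    sd = math.sqrt(T / n)
    return [rng.gauss(0.0, sd) for _ in range(n)]


def riemann_stieltjes_terms(dW, rule="ito"):
    """Terms of the sum approximating the integral of W dW.

    rule="ito" evaluates W at the left endpoint of each cell;
    rule="stratonovich" averages the two endpoints.
    Returns the list of terms and the terminal value W_T.
    """
    terms, w = [], 0.0
    for d in dW:
        w_next = w + d
        point = w if rule == "ito" else 0.5 * (w + w_next)
        terms.append(point * d)
        w = w_next
    return terms, w


def incomplete_sum(terms, deleted):
    """Sum of the terms with the indices in `deleted` left out."""
    deleted = set(deleted)
    return math.fsum(t for i, t in enumerate(terms) if i not in deleted)


if __name__ == "__main__":
    n = 100_000
    dW = brownian_increments(n)
    ito_terms, w_T = riemann_stieltjes_terms(dW, "ito")
    strat_terms, _ = riemann_stieltjes_terms(dW, "stratonovich")
    print("Ito sum        ", math.fsum(ito_terms), "vs", (w_T ** 2 - 1.0) / 2)
    print("Stratonovich   ", math.fsum(strat_terms), "vs", w_T ** 2 / 2)
    print("Ito, 10 deleted", incomplete_sum(ito_terms, range(0, n, n // 10)))
```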
Extension of Three-Variable Counterfactual Casual Graphic Model: from Two-Value to Three-Value Random Variable
The counterfactual causal graphic model with a three-variable vertex set in a
directed acyclic graph (DAG) is extended in this paper by replacing the
two-value distribution of the variables involved in the DAG with a three-value
distribution. Using conditional independence as ancillary information, 6 kinds
of counterfactual causal graphic models are extended by moving some variables
from a two-value distribution to a three-value distribution, and sufficient
conditions for identifiability are derived.
Extension and Application of Deleting Items and Disturbing Mesh Theorem of Riemann Integral
The deleting items and disturbing mesh theorems of the Riemann integral are
extended to the multiple integral, line integral and surface integral
respectively by constructing various incomplete Riemann sum and non-Riemann sum
sequences which converge to the same limit as the classical Riemann sum. The
deleting items and disturbing mesh formulae of Green's theorem, Stokes' theorem
and the divergence theorem (Gauss's or Ostrogradsky's theorem) are also
deduced. Then, the deleting items and disturbing mesh theorems of the general
Stokes' theorem on differential manifolds are derived.
Comment: 42 pages
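The idea of deleting items can be sketched in one dimension: if about sqrt(n) of the n terms of a Riemann sum for a bounded integrand on [0, 1] are left out, the deleted part is at most sqrt(n)/n times the bound, so the incomplete sum still converges to the integral. The code below is a minimal sketch of that observation for f(x) = x^2; the function names and the choice of deleted indices (perfect squares) are assumptions, and the paper's precise hypotheses are not reproduced.

```python
import math


def riemann_terms(f, a, b, n):
    """Left-endpoint Riemann sum terms f(x_i) * dx on a uniform mesh of n cells."""
    dx = (b - a) / n
    return [f(a + i * dx) * dx for i in range(n)]


def incomplete_riemann_sum(f, a, b, n, deleted):
    """Riemann sum with the terms whose indices are in `deleted` left out."""
    deleted = set(deleted)
    return math.fsum(
        t for i, t in enumerate(riemann_terms(f, a, b, n)) if i not in deleted
    )


def square(x):
    return x * x


def perfect_square_indices(n):
    """All indices j*j that are smaller than n (about sqrt(n) of them)."""
    return [j * j for j in range(math.isqrt(n - 1) + 1)]


if __name__ == "__main__":
    for n in (100, 10_000, 100_000):
        approx = incomplete_riemann_sum(square, 0.0, 1.0, n, perfect_square_indices(n))
        print(n, abs(approx - 1.0 / 3.0))
```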
Parameter Estimation of Jelinski-Moranda Model Based on Weighted Nonlinear Least Squares and Heteroscedasticity
A parameter estimation method for the Jelinski-Moranda (JM) model based on
weighted nonlinear least squares (WNLS) is proposed. The formulae for solving
the WNLS estimation (WNLSE) of the parameters are derived, and the empirical
weight function and heteroscedasticity problem are discussed. The effects of
selecting the parameter estimation method among maximum likelihood estimation
(MLE), least squares estimation (LSE) and weighted nonlinear least squares
estimation (WNLSE) are also investigated. Two strategies for heteroscedasticity
decision and weighting embedded in the JM model prediction process are also
investigated. Experimental results on the standard software reliability
analysis database, the Naval Tactical Data System (NTDS), and on three datasets
used by J.D. Musa demonstrate that the WNLSE method can be superior to LSE and
MLE under the relative error (RE) criterion.
Comment: 17 pages, 10 figures
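A weighted least squares fit of the JM model can be sketched as follows, using the standard JM mean inter-failure time E[t_i] = 1 / (phi * (N - i + 1)). For a fixed N the weighted problem is linear in 1/phi, so this sketch simply scans integer values of N. The function name, the scan range `max_extra` and the weights w_i = 1 / t_i^2 used in the example are assumptions for illustration; the empirical weight function and the derived formulae from the paper are not given in the abstract.

```python
def fit_jm_wnls(times, weights, max_extra=200):
    """Weighted least squares fit of JM mean inter-failure times.

    Minimizes sum_i w_i * (t_i - a / (N - i + 1))**2 with a = 1 / phi,
    scanning integer N from n to n + max_extra (an assumed search range).
    Returns (N_hat, phi_hat).
    """
    n = len(times)
    best = None
    for N in range(n, n + max_extra + 1):
        x = [1.0 / (N - i + 1) for i in range(1, n + 1)]
        # For fixed N the optimal a has the closed form of weighted linear regression.
        num = sum(w * xi * ti for w, xi, ti in zip(weights, x, times))
        den = sum(w * xi * xi for w, xi in zip(weights, x))
        a = num / den
        sse = sum(w * (ti - a * xi) ** 2 for w, xi, ti in zip(weights, x, times))
        if best is None or sse < best[0]:
            best = (sse, N, 1.0 / a)
    _, n_hat, phi_hat = best
    return n_hat, phi_hat


if __name__ == "__main__":
    # Noise-free synthetic data generated from N = 30, phi = 0.01 (hypothetical values).
    times = [1.0 / (0.01 * (30 - i + 1)) for i in range(1, 21)]
    weights = [1.0 / (t * t) for t in times]  # assumed relative-error style weights
    print(fit_jm_wnls(times, weights))
```

With real failure data the weighted error surface can be flat in N, so the scan range matters; the noise-free example only checks that the sketch recovers the generating parameters.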
Implied volatility formula of European Power Option Pricing
We derive the implied volatility estimation formula in European power call
options pricing, where the payoff functions are in the form of
and
() respectively. Using quadratic Taylor approximations, we develop the
computing formula of implied volatility for the European power call option and
extend the traditional implied volatility formula of Charles J. Corrado et al.
(1996) to general power option pricing. Monte Carlo simulations are
also given.
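For orientation, the traditional closed-form approximation that the abstract refers to can be sketched for an ordinary (non-power) European call. The code below uses the formula commonly attributed to Corrado and Miller (1996) as an assumption, together with the standard Black-Scholes price to generate a test input. It does not implement the power-option extension derived in the paper, whose payoff definitions are missing from this listing; all function names are illustrative.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S, K, r, T, sigma):
    """Black-Scholes price of a European call without dividends."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


def corrado_miller_iv(C, S, K, r, T):
    """Closed-form implied volatility approximation (assumed Corrado-Miller form).

    X is the discounted strike; the radicand is floored at zero to avoid
    a math domain error far from the money.
    """
    X = K * math.exp(-r * T)
    half_gap = C - 0.5 * (S - X)
    radicand = max(half_gap ** 2 - (S - X) ** 2 / math.pi, 0.0)
    return math.sqrt(2.0 * math.pi / T) / (S + X) * (half_gap + math.sqrt(radicand))


if __name__ == "__main__":
    price = bs_call(100.0, 100.0, 0.05, 1.0, 0.2)
    print(price, corrado_miller_iv(price, 100.0, 100.0, 0.05, 1.0))
```

The approximation is most accurate near the money; an exact implied volatility would normally be obtained by a root finder applied to `bs_call`.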
Confounding of three binary-variables counterfactual model
Confounding in counterfactual models with three binary variables is discussed
in this paper. According to the effect between the control variable and the
covariate, we investigate three counterfactual models: the control variable is
independent of the covariate, the control variable affects the covariate, and
the covariate affects the control variable. Using ancillary information based
on conditional independence hypotheses, sufficient conditions are obtained for
determining whether the covariate is an irrelevant factor or a confounder in
each counterfactual model.
Function Based Nonlinear Least Squares and Application to Jelinski-Moranda Software Reliability Model
A function based nonlinear least squares estimation (FNLSE) method is
proposed and investigated in parameter estimation of Jelinski-Moranda software
reliability model. FNLSE extends the potential fitting functions of traditional
least squares estimation (LSE), and takes the logarithm transformed nonlinear
least squares estimation (LogLSE) as a special case. A novel power
transformation function based nonlinear least squares estimation (powLSE) is
proposed and applied to the parameter estimation of Jelinski-Moranda model.
Solved with the Newton-Raphson method, both LogLSE and powLSE of the
Jelinski-Moranda model are applied to mean time between failures (MTBF)
predictions on six standard software failure time data sets. The experimental
results demonstrate the effectiveness of powLSE with the optimal power index
compared to classical least squares estimation (LSE), maximum likelihood
estimation (MLE) and LogLSE in terms of the recursive relative error (RE) index
and the Braun statistic index.
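The idea of fitting after a transformation f can be sketched for the standard JM mean inter-failure time a / (N - i + 1), with a = 1 / phi, by minimizing the sum of (f(t_i) - f(a / (N - i + 1)))^2. The abstract says the estimates are solved with Newton-Raphson; this sketch instead scans integer N and uses a closed-form inner step, which works because both log and power transforms separate a from N. The function name, the power index `p` and the scan range are assumptions for illustration.

```python
import math


def fit_jm_transformed(times, transform="log", p=0.5, max_extra=200):
    """Least squares on transformed data for the JM model.

    transform="log" uses f = log; transform="pow" uses f(u) = u**p.
    Scans integer N from n to n + max_extra and returns (N_hat, phi_hat).
    """
    n = len(times)
    best = None
    for N in range(n, n + max_extra + 1):
        x = [1.0 / (N - i + 1) for i in range(1, n + 1)]
        if transform == "log":
            # log(a * x_i) = log a + log x_i, so the optimal log a is a mean of differences.
            log_a = sum(math.log(t) - math.log(xi) for t, xi in zip(times, x)) / n
            a = math.exp(log_a)
            sse = sum((math.log(t) - log_a - math.log(xi)) ** 2 for t, xi in zip(times, x))
        else:
            # (a * x_i)**p = a**p * x_i**p is linear in b = a**p.
            xp = [xi ** p for xi in x]
            tp = [t ** p for t in times]
            b = sum(u * v for u, v in zip(xp, tp)) / sum(u * u for u in xp)
            a = b ** (1.0 / p)
            sse = sum((v - b * u) ** 2 for u, v in zip(xp, tp))
        if best is None or sse < best[0]:
            best = (sse, N, 1.0 / a)
    return best[1], best[2]


if __name__ == "__main__":
    # Noise-free synthetic data generated from N = 30, phi = 0.01 (hypothetical values).
    times = [1.0 / (0.01 * (30 - i + 1)) for i in range(1, 21)]
    print(fit_jm_transformed(times, "log"))
    print(fit_jm_transformed(times, "pow", p=0.5))
```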
An Optimal Transport View on Generalization
We derive upper bounds on the generalization error of learning algorithms
based on their algorithmic transport cost: the expected Wasserstein
distance between the output hypothesis and the output hypothesis conditioned on
an input example. The bounds provide a novel approach to study the
generalization of learning algorithms from an optimal transport view and impose
fewer constraints on the loss function, such as sub-gaussianity or boundedness. We
further provide several upper bounds on the algorithmic transport cost in terms
of total variation distance, relative entropy (or KL-divergence), and VC
dimension, thus further bridging optimal transport theory and information
theory with statistical learning theory. Moreover, we also study different
conditions for loss functions under which the generalization error of a
learning algorithm can be upper bounded by different probability metrics
between distributions relating to the output hypothesis and/or the input data.
Finally, under our established framework, we analyze the generalization in deep
learning and conclude that the generalization error in deep neural networks
(DNNs) decreases exponentially to zero as the number of layers increases. Our
analyses of generalization error in deep learning mainly exploit the
hierarchical structure in DNNs and the contraction property of f-divergence,
which may be of independent interest in analyzing other learning models with
hierarchical structure.
Comment: 27 pages, 2 figures, 1 table
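The probability metrics named in the abstract above can be made concrete with a small sketch. It computes the 1-Wasserstein distance between two equal-size empirical samples on the real line (the mean absolute difference of the sorted samples), and the total variation distance and KL divergence between two discrete distributions, which satisfy Pinsker's inequality TV <= sqrt(KL / 2). The function names and example numbers are illustrative assumptions; the paper's generalization bounds themselves are not reproduced.

```python
import math


def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size empirical samples on the real line."""
    if len(xs) != len(ys):
        raise ValueError("samples must have equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def total_variation(p, q):
    """Total variation distance between two discrete distributions on the same support."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def kl_divergence(p, q):
    """KL divergence KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


if __name__ == "__main__":
    print(wasserstein_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0]))
    p, q = [0.5, 0.5], [0.9, 0.1]
    print(total_variation(p, q), math.sqrt(kl_divergence(p, q) / 2.0))
```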
Understanding MCMC Dynamics as Flows on the Wasserstein Space
It is known that the Langevin dynamics used in MCMC is the gradient flow of
the KL divergence on the Wasserstein space, which helps convergence analysis
and inspires recent particle-based variational inference methods (ParVIs). But
no other MCMC dynamics has been understood in this way. In this work, by developing
novel concepts, we propose a theoretical framework that recognizes a general
MCMC dynamics as the fiber-gradient Hamiltonian flow on the Wasserstein space
of a fiber-Riemannian Poisson manifold. The "conservation + convergence"
structure of the flow gives a clear picture of the behavior of general MCMC
dynamics. The framework also enables ParVI simulation of MCMC dynamics, which
enriches the ParVI family with more efficient dynamics, and also adapts ParVI
advantages to MCMCs. We develop two ParVI methods for a particular MCMC
dynamics and demonstrate the benefits in experiments.
Comment: References refined
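The Langevin dynamics mentioned at the start of the abstract above can be sketched with its simplest discretization, the unadjusted Langevin update x <- x + step * grad log p(x) + sqrt(2 * step) * noise, applied to many independent particles. This is plain stochastic simulation targeting a standard normal, not the ParVI methods or the fiber-gradient Hamiltonian flow developed in the paper; the function names, step size and particle count are assumptions.

```python
import math
import random


def langevin_particles(grad_log_p, n_particles=2000, n_steps=200, step=0.05, seed=0):
    """Run independent unadjusted Langevin chains and return their final positions."""
    rng = random.Random(seed)
    # Start far from the target on purpose, to show the drift toward it.
    xs = [rng.uniform(-5.0, 5.0) for _ in range(n_particles)]
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        xs = [x + step * grad_log_p(x) + noise_scale * rng.gauss(0.0, 1.0) for x in xs]
    return xs


def standard_normal_score(x):
    """Gradient of the log density of N(0, 1)."""
    return -x


if __name__ == "__main__":
    xs = langevin_particles(standard_normal_score)
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    print(mean, var)
```

Because the discretization is not corrected by an accept/reject step, the stationary variance is slightly above 1 for a positive step size; a smaller step reduces this bias at the cost of slower mixing.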