    Random Fragments Classification of Microbial Marker Clades with Multi-class SVM and N-Best Algorithm

    Modeling microbial clades from microarray genome sequences is a challenging problem in biology, especially for discovering and categorizing gene isolates of new species. Marker family genome sequences play important roles in describing specific microbial clades within species. A framework for microbial species classification based on a support vector machine (SVM) with an N-best algorithm is constructed to classify centroid marker genome fragments randomly generated from marker genome sequences on MetaRef. A time series feature extraction method is proposed that segments the centroid gene sequences and maps them into spaces of different dimensions. Two ways of splitting the data are investigated: splitting fragments randomly along each genome sequence (DI), or separating the genome sequences into two parts (DII). Two strategies for the fragment recognition task, dimension-by-dimension and sequence-by-sequence, are investigated. The k-mer size selection, the overlap of segmentation, and the effect of the random split percentage are also discussed. Experiments on 12390 marker genome sequences belonging to marker families of 17 species from MetaRef show that, for both DI and DII and in both dimension-by-dimension and sequence-by-sequence recognition, the recognition accuracy exceeds 28% for the top-1 candidate and 91% for the top-10 candidates on both training and testing sets overall. Comment: 17 pages, 59 figures
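
    The abstract mentions segmentation with overlap and k-mer size selection but does not spell out the feature pipeline. The following is only a minimal sketch of one plausible reading: cut a sequence into overlapping fixed-length fragments and count k-mers in each fragment. The function names, the DNA alphabet, and the toy sequence are assumptions made for illustration, not the authors' method.

```python
from collections import Counter
from itertools import product


def segment(sequence, length, overlap):
    """Split a sequence into fixed-length windows sharing `overlap` characters."""
    if not 0 <= overlap < length:
        raise ValueError("overlap must satisfy 0 <= overlap < length")
    step = length - overlap
    return [sequence[i:i + length]
            for i in range(0, len(sequence) - length + 1, step)]


def kmer_counts(fragment, k):
    """Count every length-k substring of a fragment."""
    return Counter(fragment[i:i + k] for i in range(len(fragment) - k + 1))


def kmer_features(sequence, length, overlap, k, alphabet="ACGT"):
    """Return one k-mer count vector per fragment (hypothetical feature layout)."""
    vocabulary = ["".join(p) for p in product(alphabet, repeat=k)]
    rows = []
    for fragment in segment(sequence, length, overlap):
        counts = kmer_counts(fragment, k)
        rows.append([counts[kmer] for kmer in vocabulary])
    return rows


# Toy input, not real MetaRef data.
features = kmer_features("ACGTACGT", length=4, overlap=2, k=2)
```

    Each row of `features` could then be passed to a multi-class classifier; the choice of `k`, fragment length, and overlap would need to be tuned as the abstract suggests.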

    Approximative Theorem of Incomplete Riemann-Stieltjes Sum of Stochastic Integral

    Approximation theorems for incomplete Riemann-Stieltjes sums of the Ito stochastic integral, the mean square integral, and the Stratonovich stochastic integral with respect to Brownian motion are investigated. Sufficient conditions under which incomplete Riemann-Stieltjes sums converge to the stochastic integral are developed, which establish alternative ways of approximating the stochastic integral. Two simulation examples of incomplete Riemann-Stieltjes sums for the Ito and Stratonovich stochastic integrals are given for demonstration. Comment: 4 figures
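
    As a point of reference for the simulations mentioned above, here is a minimal sketch of the classical (complete) Riemann-Stieltjes sums for the integral of W dW over [0, T]. It does not reproduce the paper's incomplete sums, whose exact construction is not given in the abstract. The left-endpoint sum approximates the Ito integral, (W_T^2 - T)/2, and the endpoint-average sum gives the Stratonovich value, W_T^2/2. The step count, seed, and function names are illustrative assumptions.

```python
import math
import random


def brownian_path(n_steps, horizon=1.0, seed=0):
    """Simulate a Brownian path on [0, horizon] with independent Gaussian increments."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


def ito_sum(path):
    """Left-endpoint Riemann-Stieltjes sum of W dW (Ito convention)."""
    return sum(w0 * (w1 - w0) for w0, w1 in zip(path, path[1:]))


def stratonovich_sum(path):
    """Endpoint-average Riemann-Stieltjes sum of W dW (Stratonovich convention)."""
    return sum(0.5 * (w0 + w1) * (w1 - w0) for w0, w1 in zip(path, path[1:]))


path = brownian_path(20000)
w_T = path[-1]
ito_value = ito_sum(path)             # close to (w_T**2 - 1.0) / 2
strat_value = stratonovich_sum(path)  # equals w_T**2 / 2 up to rounding
```

    The two sums differ by half the sum of squared increments, which approaches T/2 as the mesh is refined.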

    Extension of Three-Variable Counterfactual Casual Graphic Model: from Two-Value to Three-Value Random Variable

    This paper discusses the extension of the counterfactual causal graphic model with a three-variable vertex set in a directed acyclic graph (DAG), by extending the variables involved in the DAG from two-value to three-value distributions. Using conditional independence as ancillary information, 6 kinds of extended counterfactual causal graphic models are obtained by extending some variables from two-value to three-value distributions, and sufficient conditions for identifiability are derived.

    Extension and Application of Deleting Items and Disturbing Mesh Theorem of Riemann Integral

    The deleting items and disturbing mesh theorems of the Riemann integral are extended to multiple integrals, line integrals, and surface integrals, respectively, by constructing various sequences of incomplete Riemann sums and non-Riemann sums that converge to the same limit as the classical Riemann sum. The deleting items and disturbing mesh formulae of Green's theorem, Stokes' theorem, and the divergence theorem (Gauss's or Ostrogradsky's theorem) are also deduced. Finally, the deleting items and disturbing mesh theorems of the general Stokes' theorem on differential manifolds are derived. Comment: 42 pages

    Parameter Estimation of Jelinski-Moranda Model Based on Weighted Nonlinear Least Squares and Heteroscedasticity

    A parameter estimation method for the Jelinski-Moranda (JM) model based on weighted nonlinear least squares (WNLS) is proposed. The formulae for solving the WNLS estimation (WNLSE) of the parameters are derived, and the empirical weight function and the heteroscedasticity problem are discussed. The effects of choosing among maximum likelihood estimation (MLE), least squares estimation (LSE), and weighted nonlinear least squares estimation (WNLSE) for parameter estimation are also investigated, as are two strategies for embedding the heteroscedasticity decision and the weighting methods in the JM model prediction process. Experimental results on a standard software reliability analysis database, the Naval Tactical Data System (NTDS), and on three datasets used by J.D. Musa demonstrate that the WNLSE method can be superior to LSE and MLE under the relative error (RE) criterion. Comment: 17 pages, 10 figures
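
    To make the fitting problem concrete, here is a minimal sketch of a weighted least squares fit for the JM model, in which the i-th inter-failure time has mean 1/(phi * (N - i + 1)). It is not the paper's WNLSE derivation: it simply scans integer values of N and solves for phi in closed form at each one. The function name, the search range, the uniform default weights, and the noise-free toy data are all assumptions.

```python
def fit_jm_least_squares(times, max_extra_faults=200, weights=None):
    """Grid-search weighted least squares fit of the Jelinski-Moranda model.

    times[i] is the (i+1)-th inter-failure time, modelled by its mean
    1 / (phi * (N - i)) with zero-based i. Returns the estimates (N, phi).
    """
    n = len(times)
    if weights is None:
        weights = [1.0] * n  # plain LSE; an empirical weight function would go here
    best = None
    for total_faults in range(n, n + max_extra_faults + 1):
        a = [1.0 / (total_faults - i) for i in range(n)]
        # For fixed N the model is linear in theta = 1 / phi.
        theta = (sum(w * t * x for w, t, x in zip(weights, times, a))
                 / sum(w * x * x for w, x in zip(weights, a)))
        sse = sum(w * (t - theta * x) ** 2 for w, t, x in zip(weights, times, a))
        if best is None or sse < best[0]:
            best = (sse, total_faults, 1.0 / theta)
    return best[1], best[2]


# Noise-free toy data generated from N = 30 and phi = 0.01 (not NTDS data).
true_faults, true_phi = 30, 0.01
toy_times = [1.0 / (true_phi * (true_faults - i)) for i in range(20)]
n_hat, phi_hat = fit_jm_least_squares(toy_times)
```

    On real failure data the inter-failure times are noisy and their variance grows as faults are removed, which is the heteroscedasticity that motivates non-uniform weights.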

    Implied volatility formula of European Power Option Pricing

    We derive the implied volatility estimation formula for European power call option pricing, where the payoff functions are of the form $V=(S_T^{\alpha}-K)^{+}$ and $V=(S_T^{\alpha}-K^{\alpha})^{+}$ ($\alpha>0$), respectively. Using quadratic Taylor approximations, we develop a formula for computing the implied volatility of European power call options and extend the traditional implied volatility formula of Charles J. Corrado et al. (1996) to general power option pricing. Monte Carlo simulations are also given.
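
    For orientation, here is a minimal numerical sketch of the inverse problem, assuming Black-Scholes dynamics so that the power of the terminal price is lognormal. It prices the payoff with strike K in closed form and recovers the volatility by bisection, which is a generic root-finder swapped in for illustration and not the paper's Taylor-based formula. The function names, bracket, and parameter values are assumptions, and the bisection relies on the price increasing with volatility, which holds for alpha >= 1.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def power_call_price(spot, strike, rate, sigma, maturity, alpha):
    """Price of the payoff max(S_T**alpha - strike, 0) under Black-Scholes dynamics."""
    # log(S_T**alpha) is normal with mean m and standard deviation s.
    s = alpha * sigma * math.sqrt(maturity)
    m = alpha * (math.log(spot) + (rate - 0.5 * sigma ** 2) * maturity)
    forward = math.exp(m + 0.5 * s * s)  # risk-neutral mean of S_T**alpha
    d2 = (m - math.log(strike)) / s
    d1 = d2 + s
    discount = math.exp(-rate * maturity)
    return discount * (forward * normal_cdf(d1) - strike * normal_cdf(d2))


def implied_volatility(price, spot, strike, rate, maturity, alpha,
                       lo=1e-6, hi=3.0, iterations=200):
    """Bisection on sigma; assumes alpha >= 1 so the price is increasing in sigma."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if power_call_price(spot, strike, rate, mid, maturity, alpha) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Round trip with illustrative parameters: price at sigma = 0.2, then invert.
true_sigma = 0.2
price = power_call_price(100.0, 10000.0, 0.05, true_sigma, 1.0, alpha=2.0)
recovered_sigma = implied_volatility(price, 100.0, 10000.0, 0.05, 1.0, alpha=2.0)
```

    The second payoff form is covered by passing `strike=K**alpha`, and setting `alpha=1.0` reduces the pricing function to the ordinary Black-Scholes call.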

    Confounding of three binary-variables counterfactual model

    This paper discusses confounding in the counterfactual model with three binary variables. According to the effect between the control variable and the covariate, we investigate three counterfactual models: the control variable is independent of the covariate, the control variable affects the covariate, and the covariate affects the control variable. Using ancillary information based on conditional independence hypotheses, we obtain sufficient conditions for determining whether the covariate is an irrelevant factor or a confounder in each counterfactual model.

    Function Based Nonlinear Least Squares and Application to Jelinski-Moranda Software Reliability Model

    A function based nonlinear least squares estimation (FNLSE) method is proposed and investigated for parameter estimation of the Jelinski-Moranda software reliability model. FNLSE extends the set of potential fitting functions of traditional least squares estimation (LSE) and takes the logarithm-transformed nonlinear least squares estimation (LogLSE) as a special case. A novel power transformation function based nonlinear least squares estimation (powLSE) is proposed and applied to the parameter estimation of the Jelinski-Moranda model. Solved with the Newton-Raphson method, both LogLSE and powLSE of the Jelinski-Moranda model are applied to mean time between failures (MTBF) predictions on six standard software failure time data sets. The experimental results demonstrate the effectiveness of powLSE with the optimal power index compared to the classical least squares estimation (LSE), maximum likelihood estimation (MLE), and LogLSE in terms of the recursive relative error (RE) index and the Braun statistic index.

    An Optimal Transport View on Generalization

    We derive upper bounds on the generalization error of learning algorithms based on their algorithmic transport cost: the expected Wasserstein distance between the output hypothesis and the output hypothesis conditioned on an input example. The bounds provide a novel approach to studying the generalization of learning algorithms from an optimal transport view and impose fewer constraints on the loss function, such as being sub-Gaussian or bounded. We further provide several upper bounds on the algorithmic transport cost in terms of total variation distance, relative entropy (or KL-divergence), and VC dimension, thus further bridging optimal transport theory and information theory with statistical learning theory. Moreover, we study different conditions on loss functions under which the generalization error of a learning algorithm can be upper bounded by different probability metrics between distributions relating to the output hypothesis and/or the input data. Finally, under our established framework, we analyze generalization in deep learning and conclude that the generalization error in deep neural networks (DNNs) decreases exponentially to zero as the number of layers increases. Our analyses of generalization error in deep learning mainly exploit the hierarchical structure in DNNs and the contraction property of the $f$-divergence, which may be of independent interest in analyzing other learning models with hierarchical structure. Comment: 27 pages, 2 figures, 1 table

    Understanding MCMC Dynamics as Flows on the Wasserstein Space

    It is known that the Langevin dynamics used in MCMC is the gradient flow of the KL divergence on the Wasserstein space, which helps convergence analysis and inspires recent particle-based variational inference methods (ParVIs). But no other MCMC dynamics is understood in this way. In this work, by developing novel concepts, we propose a theoretical framework that recognizes a general MCMC dynamics as the fiber-gradient Hamiltonian flow on the Wasserstein space of a fiber-Riemannian Poisson manifold. The "conservation + convergence" structure of the flow gives a clear picture of the behavior of general MCMC dynamics. The framework also enables ParVI simulation of MCMC dynamics, which enriches the ParVI family with more efficient dynamics and also brings ParVI advantages to MCMCs. We develop two ParVI methods for a particular MCMC dynamics and demonstrate the benefits in experiments. Comment: References refined