
    Fast Incremental SVDD Learning Algorithm with the Gaussian Kernel

    Support vector data description (SVDD) is a machine learning technique used for single-class classification and outlier detection. The idea of SVDD is to find a set of support vectors that defines a boundary around the data. When dealing with online or large data, existing batch SVDD methods have to be rerun at each iteration. We propose a fast incremental learning algorithm for SVDD (FISVDD) that uses the Gaussian kernel. This algorithm builds on the observation that all support vectors on the boundary are at the same distance from the center of the sphere in the higher-dimensional feature space induced by the Gaussian kernel function. Each iteration involves only the existing support vectors and the new data point. Moreover, the algorithm is based solely on matrix manipulations; the support vectors and their corresponding Lagrange multipliers $\alpha_i$ are automatically selected and determined at each iteration. The complexity of each iteration is only $O(k^2)$, where $k$ is the number of support vectors. Experimental results on several real data sets indicate that FISVDD achieves significant gains in efficiency with almost no loss in either outlier detection accuracy or objective function value.
    Comment: 18 pages, 1 table, 4 figures
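    The boundary observation can be made concrete. With a Gaussian kernel $K(x,x) = 1$, so the squared feature-space distance from a point $z$ to the center $a = \sum_i \alpha_i \phi(x_i)$ is $1 - 2\sum_i \alpha_i K(x_i, z) + \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j)$. If all $k$ support vectors lie on the boundary, $K\alpha$ must be a constant vector, so $\alpha \propto K^{-1}\mathbf{1}$, normalised so that $\sum_i \alpha_i = 1$. The Python sketch below is a batch illustration of that identity under these assumptions, not the authors' FISVDD implementation: the helper names and `sigma` are hypothetical, it solves the system from scratch in $O(k^3)$ instead of performing the paper's $O(k^2)$ per-iteration update, and it neither adds or removes support vectors nor checks that every $\alpha_i > 0$.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)); note K(x, x) = 1."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def solve(A, b):
    """Solve A u = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def boundary_alphas(svs, sigma=1.0):
    """Hypothetical helper: multipliers when every support vector is on the boundary.

    Equal distance to the center forces (K alpha)_i to be the same for all i,
    so alpha is proportional to K^{-1} 1, then normalised to sum to 1.
    """
    K = [[gaussian_kernel(a, b, sigma) for b in svs] for a in svs]
    u = solve(K, [1.0] * len(svs))
    total = sum(u)
    return [ui / total for ui in u]

def sq_dist_to_center(z, svs, alphas, sigma=1.0):
    """||phi(z) - a||^2 with a = sum_i alpha_i phi(x_i), using K(z, z) = 1."""
    cross = sum(a * gaussian_kernel(x, z, sigma) for a, x in zip(alphas, svs))
    quad = sum(ai * aj * gaussian_kernel(xi, xj, sigma)
               for ai, xi in zip(alphas, svs)
               for aj, xj in zip(alphas, svs))
    return 1.0 - 2.0 * cross + quad
```

    A new point $z$ would then be flagged as lying outside the current boundary when `sq_dist_to_center(z, ...)` exceeds the common distance of the support vectors.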

    Inferring Transcriptional Interactions by the Optimal Integration of ChIP-chip and Knock-out Data

    How to combine heterogeneous data sources for reliable prediction of transcriptional regulation is a challenge. Here we present a simple but powerful method to integrate chromatin immunoprecipitation (ChIP)-chip and knock-out data. Since these two types of data provide complementary (physical and functional) information about transcription, a method combining them is expected to achieve high detection rates and very low false positive rates. We seek the optimal integration of these two data types using the hypergeometric distribution. We evaluate our method on yeast data and compare our predictions with YEASTRACT, high-quality ChIP-chip data, and the literature. The results show that, even using low-quality ChIP-chip data, our method uncovers more relations than those previously inferred from high-quality data. Furthermore, our method achieves a low false positive rate. We find experimental and computational evidence in the literature for most of the transcription factor (TF)-gene relations uncovered by our method.
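    A common way to use the hypergeometric distribution when combining two gene lists is to ask how surprising their overlap is. The Python sketch below assumes that approach and is not necessarily the authors' exact procedure. For one TF, it treats genes below a ChIP-chip p-value cutoff as bound and genes below a knock-out p-value cutoff as responsive, scores their overlap by the hypergeometric upper-tail probability, and scans a grid of cutoffs for the most significant pair. The function names, the per-gene p-value inputs, and the cutoff grid are all hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked items, n draws)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def best_thresholds(chip_p, ko_p, cutoffs):
    """Hypothetical cutoff scan: return (p, chip_cutoff, ko_cutoff, overlap) for
    the pair of cutoffs whose target overlap is least likely by chance."""
    genes = chip_p.keys() & ko_p.keys()   # genes measured in both data sets
    N = len(genes)
    best = None
    for tc in cutoffs:
        chip = {g for g in genes if chip_p[g] <= tc}   # bound by the TF
        for tk in cutoffs:
            ko = {g for g in genes if ko_p[g] <= tk}   # respond to the knock-out
            overlap = chip & ko
            p = hypergeom_sf(len(overlap), N, len(chip), len(ko))
            if best is None or p < best[0]:
                best = (p, tc, tk, overlap)
    return best
```

    Because the scan tests many cutoff pairs, the minimum p-value it returns is optimistic and would need a multiple-testing correction before being read as a significance level.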

    A Simple and Efficient Lattice Summation Method for Metallic Electrodes in Constant Potential Molecular Dynamics Simulation

    The constant potential molecular dynamics simulation method proposed by Siepmann and Sprik and later reformulated by Reed (SR-CPM) has been widely employed to investigate metallic electrolyte/electrode interfaces, especially for conducting nanochannels with complex connectivity, e.g., carbide-derived carbon or graphene-assembled membranes. This work substantially extends the seminal SR-CPM approach. First, we introduce two numerical techniques that determine electrode atom charges with an order-of-magnitude improvement in computational efficiency over widely employed methods. The first technique dramatically accelerates the calculation of the Ewald interaction matrix $\mathbf{E}$ by taking advantage of existing highly optimised electrostatic codes. The second introduces a new preconditioner for the conjugate gradient method, considerably speeding up the solution of the linear system that determines the electrode atomic charges. Our improved SR-CPM, implemented in the LAMMPS package, can handle extra-large systems, e.g., over 8.1 million electrode atoms. Second, after demonstrating the importance of the electroneutrality constraint, we propose a two-step method that enforces electroneutrality in a post-treatment step, applicable to both matrix and iterative techniques. Third, we provide a rigorous theoretical analysis of the adjustable parameter $\alpha_i$ (namely the atomic Hubbard-U, $U_i^0$), which is chosen arbitrarily in many SR-CPM simulations. We propose that the optimised $\alpha_i$ or $U_i^0$ should compensate for the electrical potential/energy discrepancy between the discrete atomistic model and the continuum limit. Analytical, optimal $\alpha_i^0$ values are derived for a series of 2D materials.
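    The two solver ideas can be illustrated on a small dense system. Assume the electrode charges $\mathbf{q}$ solve $\mathbf{E}\mathbf{q} = \mathbf{v}$ with $\mathbf{E}$ symmetric positive definite, as conjugate gradient requires, where $\mathbf{v}$ (a hypothetical name) is assumed to collect the applied electrode potentials and the electrolyte's contribution. The abstract does not specify the new preconditioner, so the Python sketch below uses a plain Jacobi (diagonal) preconditioner as a stand-in. It then enforces electroneutrality with one common two-solve correction, $\mathbf{q} = \mathbf{E}^{-1}\mathbf{v} - \lambda\,\mathbf{E}^{-1}\mathbf{1}$ with $\lambda$ chosen so that $\sum_i q_i = 0$, which amounts to shifting every electrode potential by the same constant. This is not the authors' LAMMPS implementation, and all function names are hypothetical.

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg(A, b, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned conjugate gradient for a symmetric positive definite A."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]   # M^{-1} = diag(A)^{-1} (stand-in)
    x = [0.0] * n
    r = list(b)                                    # residual b - A x for x = 0
    if dot(r, r) ** 0.5 < tol:
        return x
    z = [d * ri for d, ri in zip(inv_diag, r)]     # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        step = rz / dot(p, Ap)
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

def electroneutral_charges(E, v):
    """Hypothetical two-solve scheme: shift all potentials uniformly so sum(q) = 0."""
    q0 = pcg(E, v)                  # unconstrained charges E^{-1} v
    u = pcg(E, [1.0] * len(v))      # charge response to a unit potential shift
    lam = sum(q0) / sum(u)          # sum(u) = 1^T E^{-1} 1 > 0 for SPD E
    return [a - lam * b for a, b in zip(q0, u)]
```

    In a production code $\mathbf{E}$ is large and its products come from the Ewald machinery rather than a dense list of lists; the dense form here only keeps the sketch self-contained.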