Fast Incremental SVDD Learning Algorithm with the Gaussian Kernel
Support vector data description (SVDD) is a machine learning technique that
is used for single-class classification and outlier detection. The idea of SVDD
is to find a set of support vectors that defines a boundary around data. When
dealing with online or large data, existing batch SVDD methods have to be rerun
in each iteration. We propose an incremental learning algorithm for SVDD that
uses the Gaussian kernel. This algorithm builds on the observation that all
support vectors on the boundary have the same distance to the center of the sphere
in a higher-dimensional feature space as mapped by the Gaussian kernel
function. Each iteration involves only the existing support vectors and the new
data point. Moreover, the algorithm is based solely on matrix manipulations;
the support vectors and their corresponding Lagrange multipliers $\alpha_i$
are automatically selected and determined in each iteration. It can be seen
that the complexity of our algorithm in each iteration is only $O(k^2)$, where
$k$ is the number of support vectors. Experimental results on some real data
sets indicate that FISVDD demonstrates significant gains in efficiency with
almost no loss in either outlier detection accuracy or objective function
value.
Comment: 18 pages, 1 table, 4 figures
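The observation behind the algorithm, that boundary support vectors are equidistant from the center in feature space, can be illustrated with a minimal sketch. This is not the FISVDD update itself; it only evaluates the standard kernel expression for the squared distance to the center $a = \sum_i \alpha_i \phi(x_i)$. The function names, the toy points, and the bandwidth `sigma` are assumptions made for illustration.

```python
import math

def gaussian_kernel(x, y, sigma):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def squared_distance_to_center(z, support_vectors, alphas, sigma):
    """Squared feature-space distance from phi(z) to a = sum_i alpha_i phi(x_i)."""
    cross = sum(a * gaussian_kernel(x, z, sigma)
                for x, a in zip(support_vectors, alphas))
    center_norm = sum(ai * aj * gaussian_kernel(xi, xj, sigma)
                      for xi, ai in zip(support_vectors, alphas)
                      for xj, aj in zip(support_vectors, alphas))
    # K(z, z) = 1 for the Gaussian kernel, so the first term is constant.
    return 1.0 - 2.0 * cross + center_norm

# Hypothetical toy data: two support vectors with equal multipliers.
svs = [(0.0, 0.0), (2.0, 0.0)]
alphas = [0.5, 0.5]
sigma = 1.0

d0 = squared_distance_to_center(svs[0], svs, alphas, sigma)
d1 = squared_distance_to_center(svs[1], svs, alphas, sigma)
d_mid = squared_distance_to_center((1.0, 0.0), svs, alphas, sigma)
```

In this toy case the two support vectors sit at the same distance from the center (`d0` equals `d1`), while the midpoint between them lies strictly inside the boundary (`d_mid` is smaller).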
Inferring Transcriptional Interactions by the Optimal Integration of ChIP-chip and Knock-out Data
Combining heterogeneous data sources for reliable prediction of transcriptional regulation remains a challenge. Here we present an easy but powerful method to integrate chromatin immunoprecipitation (ChIP)-chip and knock-out data. Because these two types of data provide complementary (physical and functional) information about transcription, a method that combines them is expected to achieve high detection rates and very low false positive rates. We seek the optimal integration of these two data types using the hypergeometric distribution. We evaluate our method on yeast data and compare our predictions with YEASTRACT, high-quality ChIP-chip data, and the literature. The results show that, even with low-quality ChIP-chip data, our method uncovers more relations than were previously inferred from high-quality data. Furthermore, our method achieves a low false positive rate. We find experimental and computational evidence in the literature for most transcription factor (TF)-gene relations uncovered by our method.
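The abstract does not say exactly how the hypergeometric distribution enters the integration. As a hedged illustration only, the sketch below shows the standard hypergeometric tail test for the overlap between two gene sets, for example genes bound by a TF in ChIP-chip and genes affected by knocking out that TF. The function name and all counts are hypothetical.

```python
from math import comb

def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    return favourable / total

# Hypothetical counts: 10 genes in total, 4 bound in ChIP-chip,
# 3 affected by the knock-out, 2 of which are also bound.
p_value = hypergeom_tail(population=10, successes=4, draws=3, observed=2)
```

A small tail probability indicates that the overlap between the physical and functional evidence is unlikely to arise by chance.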
A Simple and Efficient Lattice Summation Method for Metallic Electrodes in Constant Potential Molecular Dynamics Simulation
The constant potential molecular dynamics simulation method proposed by
Siepmann and Sprik and reformulated later by Reed (SR-CPM) has been widely
employed to investigate metallic electrolyte/electrode interfaces,
especially for conducting nanochannels with complex connectivity, *e.g.*,
carbide-derived carbon or graphene-assembled membranes. This work makes
substantial extensions of this seminal SR-CPM approach. First, we introduce two
numerical techniques to determine electrode atom charges with an order of
magnitude improvement in computational efficiency compared with widely
employed methods. The first numerical technique dramatically accelerates the
calculation of the Ewald interaction matrix by taking advantage
of existing highly optimised electrostatic codes. The second technique
introduces a new preconditioner for the conjugate gradient method to
considerably speed up the solution of the linear equation system
that determines electrode atomic charges. Our improved SR-CPM implemented in
the LAMMPS package can handle extra-large systems, *e.g.*, over 8.1 million
electrode atoms. Second, after demonstrating the importance of the
electroneutrality constraint, we propose a two-step method to enforce
electroneutrality in a subsequent post-treatment step, applicable to both matrix
and iterative techniques. Third, we provide a solid theoretical analysis of
the adjustable parameter (namely the atomic Hubbard-U),
which is arbitrarily selected in many SR-CPM simulation practices. We propose
that the optimised parameter should compensate for the electrical
potential/energy discrepancy between the discrete atomistic model and the
continuum limit. The analytical and optimal values are derived
for a series of 2D materials
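
The two numerical ingredients named above, a preconditioned conjugate gradient solve and a post-treatment step that enforces electroneutrality, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the preconditioner here is plain diagonal (Jacobi) scaling as a stand-in for the new preconditioner, which the abstract does not describe, and the neutrality correction is the textbook Lagrange-multiplier form, which may differ from the paper's two-step method. The matrix `A`, the vector `b`, and all function names are hypothetical.

```python
def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def jacobi_pcg(A, b, tol=1e-12, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A with Jacobi preconditioning."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

def neutral_charges(A, b):
    """Step 1: unconstrained solve. Step 2: shift along A^{-1} 1 so charges sum to zero."""
    q0 = jacobi_pcg(A, b)
    d = jacobi_pcg(A, [1.0] * len(b))
    shift = sum(q0) / sum(d)
    return [qi - shift * di for qi, di in zip(q0, d)]

# Hypothetical 3-atom system with a symmetric, diagonally dominant matrix.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
q = neutral_charges(A, b)
residual = [ri - bi for ri, bi in zip(matvec(A, q), b)]
```

After the correction the charges sum to zero, and the residual `A q - b` is the same constant on every atom, which is the uniform potential shift introduced by the constraint.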