Deep Stacked Stochastic Configuration Networks for Lifelong Learning of Non-Stationary Data Streams
The concept of SCN offers a fast framework with a universal approximation
guarantee for lifelong learning of non-stationary data streams. Its adaptive
scope selection property enables proper random generation of hidden unit
parameters, advancing beyond conventional randomized approaches constrained
by a fixed scope of random parameters. This paper proposes a deep stacked
stochastic configuration network (DSSCN) for continual learning of
non-stationary data streams, which makes two major contributions: 1) DSSCN
features a self-constructing methodology for the deep stacked network
structure, where hidden units and hidden layers are extracted automatically
from continuously generated data streams; 2) the concept of SCN is developed
to randomly assign the inverse covariance matrix of the multivariate Gaussian
function in the hidden node addition step, bypassing its computationally
prohibitive tuning phase. Numerical evaluation and comparison with prominent
data stream algorithms under two procedures, periodic hold-out and prequential
test-then-train, demonstrate the advantage of the proposed methodology.
Comment: This paper has been published in Information Science
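The prequential test-then-train procedure named in the abstract can be illustrated with a minimal sketch. Each sample is first used to test the current model and only then to update it. The `MajorityClassLearner` class, its `predict`/`partial_fit` methods and the toy stream are hypothetical stand-ins chosen to keep the loop runnable; they are not DSSCN and not the paper's evaluation code.

```python
from collections import Counter


class MajorityClassLearner:
    """Toy incremental learner (an assumption, used only to run the loop)."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # Most frequent label seen so far; None before any training.
        return self.counts.most_common(1)[0][0] if self.counts else None

    def partial_fit(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(stream, learner):
    """Test on each sample first, then train on it (test-then-train)."""
    correct = total = 0
    for x, y in stream:
        if learner.predict(x) == y:   # test step uses the model as it stands
            correct += 1
        total += 1
        learner.partial_fit(x, y)     # train step happens only afterwards
    return correct / total if total else 0.0


stream = [(0, "a"), (1, "a"), (2, "b"), (3, "a")]
accuracy = prequential_accuracy(stream, MajorityClassLearner())
print(accuracy)
```

Because every sample is scored before the model has seen it, the running accuracy reflects performance on unseen data without needing a separate hold-out set.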
Stochastic Configuration Machines: FPGA Implementation
Neural networks for industrial applications generally have additional
constraints such as response speed, memory size and power usage. Randomized
learners can address some of these issues. However, hardware solutions can
provide better resource reduction whilst maintaining the model's performance.
Stochastic configuration networks (SCNs) are a prime choice in industrial
applications due to their merits and feasibility for data modelling. Stochastic
Configuration Machines (SCMs) extend this to focus on reducing the memory
constraints by limiting the randomized weights to a binary value with a scalar
for each node and using a mechanism model to improve the learning performance
and result interpretability. This paper aims to implement SCM models on a field
programmable gate array (FPGA) and introduce binary-coded inputs to the
algorithm. Results are reported for two benchmark and two industrial datasets,
including SCM with single-layer and deep architectures.
Comment: 19 pages, 9 figures, 8 tables
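As a rough illustration of why binary weights with one scalar per node reduce memory, the sketch below computes a hidden layer from sign weights and compares storage against 32-bit floats. The function name, the sigmoid activation, the layer sizes and the bit-packing scheme are all assumptions made for illustration; the abstract does not specify them, and this is not the SCM learning algorithm or its FPGA implementation.

```python
import numpy as np


def binarized_hidden_layer(x, signs, scales, biases):
    """Hidden output with {-1, +1} weights and one scalar per node.

    x: (n_samples, n_features), signs: (n_features, n_nodes),
    scales and biases: (n_nodes,). Sigmoid activation is an assumption.
    """
    pre_activation = (x @ signs) * scales + biases
    return 1.0 / (1.0 + np.exp(-pre_activation))


rng = np.random.default_rng(0)
n_features, n_nodes = 8, 4

signs = rng.choice(np.array([-1, 1], dtype=np.int8), size=(n_features, n_nodes))
scales = rng.uniform(0.1, 1.0, size=n_nodes)
biases = rng.uniform(-1.0, 1.0, size=n_nodes)
x = rng.normal(size=(3, n_features))

h = binarized_hidden_layer(x, signs, scales, biases)

# Storage comparison: one bit per weight plus a float32 scalar per node,
# versus a float32 for every weight.
packed = np.packbits(signs > 0, axis=0)
binary_bytes = packed.nbytes + scales.astype(np.float32).nbytes
float_bytes = n_features * n_nodes * 4
print(binary_bytes, float_bytes)
```

With these toy sizes the packed representation needs 20 bytes against 128 bytes for full-precision weights; the gap widens as the number of input features grows.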
SOMEA: self-organizing map based extraction algorithm for DNA motif identification with heterogeneous model
Background: Discrimination of transcription factor binding sites (TFBS) from background sequences plays a key role in computational motif discovery. Current clustering-based algorithms employ a homogeneous model, which assumes that motifs and background signals can be equivalently characterized. This assumption has limitations because the two kinds of sequence signals have distinct properties.
Results: This paper aims to develop a Self-Organizing Map (SOM) based clustering algorithm for extracting binding sites in DNA sequences. Our framework is based on a novel intra-node soft competitive procedure to achieve maximum discrimination of motifs from background signals in datasets. The intra-node competition is based on an adaptive weighting technique on two different signal models to better represent these two classes of signals. Using several real and artificial datasets, we compared our proposed method with several motif discovery tools. Compared to SOMBRERO, a state-of-the-art SOM based motif discovery tool, our algorithm achieves significant improvements in the average precision rates (i.e., about 27%) on the real datasets without compromising its sensitivity. Our method also compared favourably with other motif discovery tools.
Conclusions: Motif discovery with a model-based clustering framework should consider the use of a heterogeneous model to represent the two classes of signals in DNA sequences. Such a heterogeneous model can achieve better signal discrimination than the homogeneous model.
Stochastic Configuration Machines for Industrial Artificial Intelligence
Real-time predictive modelling with a desired accuracy is highly sought after
in industrial artificial intelligence (IAI), where neural networks play a key
role. Neural networks in IAI require powerful, high-performance computing
devices to process large volumes of floating-point data. Based on stochastic
configuration networks (SCNs), this paper proposes a new randomized learner
model, termed stochastic configuration machines (SCMs), to stress effective
modelling and data size saving that are useful and valuable for industrial
applications. Compared to SCNs and random vector functional-link (RVFL) nets
with binarized implementation, the model storage of SCMs can be significantly
compressed while retaining favourable prediction performance. Besides the
architecture of the SCM learner model and its learning algorithm, as an
important part of this contribution, we also provide a theoretical basis on the
learning capacity of SCMs by analysing the model's complexity. Experimental
studies are carried out over some benchmark datasets and three industrial
applications. The results demonstrate that SCM has great potential for dealing
with industrial data analytics.
Comment: 23 pages, 7 figures, 12 tables
MISCORE: Mismatch-Based Matrix Similarity Scores for DNA Motif Detection
To detect or discover motifs in DNA sequences, two important concepts in existing computational approaches are the motif model and the similarity score. One motif model, the position frequency matrix (PFM), has been widely employed to search for putative motifs. Detection and discovery of motifs can be done by comparing k-mers with a motif model, or by clustering k-mers according to some criteria. In the past, information content-based similarity scores have been widely used in searching tools. In this paper, we present a mismatch-based matrix similarity score (namely, MISCORE) for motif searching and discovery. The proposed MISCORE can be biologically interpreted as an evolutionary metric for predicting whether a k-mer is a motif member. Weighting factors, which are meaningful for biological data mining practice, are introduced in the MISCORE. The effectiveness of the MISCORE is investigated by exploring its separability, recognizability and robustness. It is compared with three well-known information content-based matrix similarity scores, and the results show that MISCORE works well
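To make the idea of scoring a k-mer against a PFM by mismatches concrete, here is a minimal sketch. The scoring rule (the average, over positions, of one minus the frequency of the k-mer's base) is a simplified stand-in chosen for illustration; the abstract does not give the MISCORE formula or its weighting factors, so `build_pfm`, `mismatch_score` and the example sites are all hypothetical.

```python
ALPHABET = "ACGT"


def build_pfm(sites):
    """Position frequency matrix as a list of {base: frequency} columns."""
    k = len(sites[0])
    pfm = []
    for i in range(k):
        column = [site[i] for site in sites]
        pfm.append({base: column.count(base) / len(sites) for base in ALPHABET})
    return pfm


def mismatch_score(kmer, pfm):
    """Average per-position mismatch: 0 means a perfect match to every site,
    1 means the k-mer's base never occurs at any position (illustrative rule)."""
    return sum(1.0 - pfm[i][base] for i, base in enumerate(kmer)) / len(kmer)


sites = ["ACGT", "ACGA", "ACGT", "TCGT"]
pfm = build_pfm(sites)

consensus_score = mismatch_score("ACGT", pfm)   # close to the motif
background_score = mismatch_score("GTAC", pfm)  # unrelated k-mer
print(consensus_score, background_score)
```

Under this rule a lower score means the k-mer is more motif-like, which is the opposite orientation to information content-based scores, where higher values indicate a better match.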
Optimization of MISCORE-based Motif Identification Systems
Identification of motifs in DNA sequences using classification techniques is one computational approach to discovering novel binding sites. In previous work [16], we proposed a simple and effective method for motif detection using a single crisp rule governed by a mismatch-based matrix similarity score (MISCORE). In this paper, we consider the problem of finding a suitable motif cut-off value for MISCORE-based motif identification systems using a cost-sensitivity metric. We utilize phylogenetic footprinting data to estimate the parameters in the cost function. We also extend the MISCORE to include entropy, weighting each motif model position to minimize the false positive rate. The performance evaluation is done using artificial and real DNA sequences. The results demonstrate the feasibility and usefulness of our proposed approach for model-based cut-off value estimation
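One way to picture cut-off selection under a cost-sensitive criterion is the sketch below, which scans candidate thresholds and keeps the one with the lowest weighted error count. The cost function, the cost values and the example scores are assumptions made for illustration; the abstract does not state the paper's cost function or how its parameters are estimated from phylogenetic footprinting data.

```python
def best_cutoff(motif_scores, background_scores, cost_fn=1.0, cost_fp=1.0):
    """Return (cutoff, cost) minimizing cost_fn * FN + cost_fp * FP.

    Scores are mismatch-style (lower means more motif-like), so a k-mer is
    called a motif when its score is <= cutoff. Costs are hypothetical.
    """
    candidates = sorted(set(motif_scores) | set(background_scores))
    best = None
    for cutoff in candidates:
        false_negatives = sum(s > cutoff for s in motif_scores)
        false_positives = sum(s <= cutoff for s in background_scores)
        cost = cost_fn * false_negatives + cost_fp * false_positives
        if best is None or cost < best[1]:
            best = (cutoff, cost)
    return best


motif_scores = [0.10, 0.15, 0.20, 0.40]
background_scores = [0.35, 0.60, 0.70, 0.90]

# Penalizing false positives twice as heavily as false negatives.
cutoff, cost = best_cutoff(motif_scores, background_scores,
                           cost_fn=1.0, cost_fp=2.0)
print(cutoff, cost)
```

With false positives weighted more heavily, the selected cut-off sits below the lowest background score and accepts one missed motif rather than one false alarm.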
Realization of Generalized RBF Network
Neural classifiers have been widely used in many application areas. This paper describes a generalized neural classifier based on the radial basis function network. The contributions of this work are: i) an improvement on the standard radial basis function network architecture, ii) a new cost function for classification, iii) a feature subset selection algorithm for hidden units, and iv) optimization of the neural classifier using a genetic algorithm with the new cost function. Comparative studies of the proposed neural classifier on a protein classification problem are given
Computational Discovery of Motifs Using Hierarchical Clustering Techniques
Discovery of motifs plays a key role in understanding gene regulation in organisms. Existing tools for motif discovery show weaknesses in reliability and scalability, so more advanced algorithms for this problem would be useful. This paper aims to develop data mining techniques for discovering motifs. A mismatch-based hierarchical clustering algorithm is proposed, which employs three heuristic rules for classifying clusters and a post-processing step for ranking and refining the clusters. Our algorithm is evaluated and compared with existing tools using two sets of DNA sequences. Results demonstrate that the proposed techniques outperform MEME, AlignACE and SOMBRERO for most of the testing datasets
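The general idea of clustering k-mers hierarchically by mismatch count can be sketched as follows. This is a plain agglomerative procedure with Hamming distance and average linkage, stopping at a hypothetical `max_distance` threshold; the linkage choice, the threshold and the example k-mers are assumptions, and the paper's three heuristic rules and post-processing step are not reproduced here.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length k-mers."""
    return sum(x != y for x, y in zip(a, b))


def average_linkage(c1, c2):
    """Mean pairwise mismatch count between two clusters."""
    return sum(hamming(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def cluster_kmers(kmers, max_distance):
    """Merge the closest pair of clusters until none is within max_distance."""
    clusters = [[k] for k in kmers]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_distance:
            break                      # remaining clusters are too dissimilar
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


kmers = ["ACGT", "ACGA", "TCGT", "GGCC", "GGCA"]
clusters = cluster_kmers(kmers, max_distance=2)
print(clusters)
```

The pairwise search makes this sketch cubic in the number of k-mers, so it is only meant to show the merging logic, not to scale to real sequence sets.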
SOMIX: Motifs Discovery in Gene Regulatory Sequences Using Self-Organizing Maps
We present a clustering algorithm called Self-Organizing Map Neural Network with mixed signals discrimination (SOMIX) to discover binding sites in a set of regulatory regions. Our framework integrates a novel intra-node soft competitive procedure in each node model to achieve maximum discrimination of motifs from background signals. The intra-node competition is based on an adaptive weighting technique on two different signal models: a position-specific scoring matrix and a Markov chain. Simulations on real and artificial datasets showed that SOMIX could achieve significant performance improvement in terms of sensitivity and specificity over SOMBRERO, a well-known SOM based motif discovery tool. SOMIX also compared favourably with other popular motif discovery tools