BETULA: Numerically Stable CF-Trees for BIRCH Clustering
BIRCH is a widely known approach for clustering that has
influenced much subsequent research and commercial products. The key
contribution of BIRCH is the Clustering Feature tree (CF-Tree), which is a
compressed representation of the input data. As new data arrives, the tree is
eventually rebuilt to increase the compression. Afterward, the leaves of the
tree are used for clustering. Because of the data compression, this method is
very scalable. The idea has been adopted, for example, for k-means, data stream,
and density-based clustering.
Clustering features used by BIRCH are simple summary statistics that can
easily be updated with new data: the number of points, the linear sums, and the
sum of squared values. Unfortunately, how the sum of squares is then used in
BIRCH is prone to catastrophic cancellation.
We introduce a replacement cluster feature that does not have this numeric
problem, that is not much more expensive to maintain, and which makes many
computations simpler and hence more efficient. These cluster features can also
easily be used in other work derived from BIRCH, such as algorithms for
streaming data. In the experiments, we demonstrate the numerical problem and
compare the performance of the original algorithm with that of the improved
cluster features.
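The cancellation problem described in this abstract can be illustrated with a minimal sketch. It assumes one-dimensional data and compares a variance computed from the classic summary (count, linear sum, sum of squares) with one computed from a summary that stores the mean and the sum of squared deviations, updated with Welford's method. The function names are hypothetical, and the second summary is an illustration of the general idea, not necessarily the exact cluster feature the paper proposes.

```python
def variance_from_sums(points):
    """Variance from a BIRCH-style summary: count, linear sum, sum of squares."""
    n, linear_sum, square_sum = 0, 0.0, 0.0
    for x in points:
        n += 1
        linear_sum += x
        square_sum += x * x
    # SS/n - (LS/n)^2 subtracts two large, nearly equal numbers,
    # so most significant digits cancel when the mean is far from zero.
    return square_sum / n - (linear_sum / n) ** 2


def variance_from_deviations(points):
    """Variance from a summary of count, mean, and sum of squared deviations."""
    n, mean, sq_dev = 0, 0.0, 0.0
    for x in points:
        n += 1
        delta = x - mean
        mean += delta / n
        sq_dev += delta * (x - mean)  # Welford's incremental update
    return sq_dev / n


# Small spread around a large offset; the true population variance is 22.5.
data = [1e9 + v for v in (4.0, 7.0, 13.0, 16.0)]
naive = variance_from_sums(data)
stable = variance_from_deviations(data)
print("sum-of-squares variance:", naive)
print("deviation-based variance:", stable)
```

With double-precision floats, the squared values near 1e18 can no longer represent the small spread of the data, so the first function loses the variance entirely, while the second keeps all quantities on the scale of the deviations.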
Comparison of Network Intrusion Detection Performance Using Feature Representation
P. 463-475. Intrusion detection is essential for the security of the components
of any network. For that reason, several strategies, including those based
on machine learning, can be used in Intrusion Detection Systems (IDS) to
identify the increasing attempts to gain unauthorized access with malicious
purposes. Anomaly detection has been applied successfully to
numerous domains and might help to identify unknown attacks. However,
there are existing issues such as high error rates or large dimensionality
of data that make its deployment difficult in real-life scenarios. Representation
learning allows new latent features of data to be estimated in a
low-dimensionality space. In this work, anomaly detection is performed
using a previous feature learning stage in order to compare these methods
for the detection of intrusions in network traffic. For that purpose,
four different anomaly detection algorithms are applied to recent network
datasets using two different feature learning methods: principal
component analysis and autoencoders. Several evaluation metrics such
as accuracy, F1 score or ROC curves are used for comparing their performance.
The experimental results show an improvement for two of the
anomaly detection methods using autoencoders and no significant variations
for the linear feature transformation.
Preliminary genetic evidence of two different populations of Opisthorchis viverrini in Lao PDR
Opisthorchis viverrini is a major public health concern in Southeast Asia. Various reports have suggested that this parasite may represent a species complex, with genetic structure in the region perhaps being dictated by geographical factors and different species of intermediate hosts. We used four microsatellite loci to analyze O. viverrini adult worms originating from six species of cyprinid fish in Thailand and Lao PDR. Two distinct O. viverrini populations were observed. In Ban Phai, Thailand, only one subgroup occurred, hosted by two different fish species. Both subgroups occurred in fish from That Luang, Lao PDR, but were represented to very different degrees among the fish hosts there. Our data suggest that, although geographical separation is more important than fish host specificity in influencing genetic structure, it is possible that two species of Opisthorchis, with little interbreeding, are present near Vientiane in Lao PDR
Data Stream Clustering for Real-Time Anomaly Detection: An Application to Insider Threats
Insider threat detection is an emergent concern for academia, industries, and governments due to the growing number of insider incidents in recent years. The continuous streaming of unbounded data coming from various sources in an organisation, typically at high velocity, leads to a typical Big Data computational problem. The malicious insider threat refers to anomalous behaviour(s) (outliers) that deviate from the normal baseline of a data stream. The absence of previously logged activities executed by users shapes the insider threat detection mechanism into an unsupervised anomaly detection approach over a data stream. A common shortcoming in the existing data mining approaches to detect insider threats is the high number of false alarms/positives (FPs). To handle the Big Data issue and to address the shortcoming, we propose a streaming anomaly detection approach, namely Ensemble of Random subspace Anomaly detectors In Data Streams (E-RAIDS), for insider threat detection. E-RAIDS learns an ensemble of p established outlier detection techniques [Micro-cluster-based Continuous Outlier Detection (MCOD) or Anytime Outlier Detection (AnyOut)] which employ clustering over continuous data streams. Each of the p models learns from a random feature subspace to detect local outliers, which might not be detected over the whole feature space. E-RAIDS introduces an aggregate component that combines the results from the p feature subspaces, in order to confirm whether to generate an alarm at each window iteration. The merit of E-RAIDS is that it defines a survival factor and a vote factor to address the shortcoming of a high number of FPs. Experiments on E-RAIDS-MCOD and E-RAIDS-AnyOut are carried out on synthetic data sets including malicious insider threat scenarios generated at Carnegie Mellon University, to test the effectiveness of voting feature subspaces, and the capability to detect (more than one)-behaviour-all-threat in real-time.
The results show that E-RAIDS-MCOD reports the highest F1 measure and the fewest false alarms (FP = 0) compared to E-RAIDS-AnyOut, and that it detects approximately all the insider threats in real-time
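The idea of voting across random feature subspaces can be sketched as follows. This is a simplified illustration and not the E-RAIDS implementation: a plain z-score test stands in for MCOD or AnyOut, the data is a single static window instead of a stream, `min_votes` only loosely plays the role of the vote factor, and the survival factor is not modelled. All function and parameter names are hypothetical.

```python
import random
import statistics


def zscore_outliers(window, features, threshold=2.0):
    """Stand-in detector: flag rows that are extreme in any of the given features."""
    flagged = set()
    for f in features:
        column = [row[f] for row in window]
        mean = statistics.fmean(column)
        std = statistics.pstdev(column)
        if std == 0:
            continue  # constant feature, nothing to flag
        for i, value in enumerate(column):
            if abs(value - mean) / std > threshold:
                flagged.add(i)
    return flagged


def subspace_vote(window, n_models=5, subspace_size=2, min_votes=3, seed=0):
    """Run one detector per random feature subspace and keep rows with enough votes."""
    rng = random.Random(seed)
    n_features = len(window[0])
    votes = [0] * len(window)
    for _ in range(n_models):
        features = rng.sample(range(n_features), subspace_size)
        for i in zscore_outliers(window, features):
            votes[i] += 1
    return [i for i, count in enumerate(votes) if count >= min_votes]


# Nine ordinary rows plus one row (index 9) that is extreme in every feature.
window = [
    [float(i % 3), float(i % 2), float(i % 4), float((i * 2) % 3)]
    for i in range(9)
] + [[100.0, 100.0, 100.0, 100.0]]
alarms = subspace_vote(window)
print("rows raising an alarm:", alarms)
```

Requiring several subspaces to agree before raising an alarm is one simple way to suppress flags that appear in only a single subspace, which is the intuition behind using votes to reduce false positives.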
Optimization of interneuron function by direct coupling of cell migration and axonal targeting
Neural circuit assembly relies on the precise synchronization of developmental processes, such as cell migration and axon targeting, but the cell-autonomous mechanisms coordinating these events remain largely unknown. Here we found that different classes of interneurons use distinct routes of migration to reach the embryonic cerebral cortex. Somatostatin-expressing interneurons that migrate through the marginal zone develop into Martinotti cells, one of the most distinctive classes of cortical interneurons. For these cells, migration through the marginal zone is linked to the development of their characteristic layer 1 axonal arborization. Altering the normal migratory route of Martinotti cells by conditional deletion of Mafb—a gene that is preferentially expressed by these cells—cell-autonomously disrupts axonal development and impairs the function of these cells in vivo. Our results suggest that migration and axon targeting programs are coupled to optimize the assembly of inhibitory circuits in the cerebral cortex
Transcriptional Responses of Cultured Rat Sympathetic Neurons during BMP-7-Induced Dendritic Growth
Dendrites are the primary site of synapse formation in the vertebrate nervous system; however, relatively little is known about the molecular mechanisms that regulate the initial formation of primary dendrites. Embryonic rat sympathetic neurons cultured under defined conditions extend a single functional axon, but fail to form dendrites. Addition of bone morphogenetic proteins (BMPs) triggers these neurons to extend multiple dendrites without altering axonal growth or cell survival. We used this culture system to examine differential gene expression patterns in naïve vs. BMP-treated sympathetic neurons in order to identify candidate genes involved in regulation of primary dendritogenesis. To determine the critical transcriptional window during BMP-induced dendritic growth, morphometric analysis of microtubule-associated protein (MAP-2)-immunopositive processes was used to quantify dendritic growth in cultures exposed to the transcription inhibitor actinomycin-D added at varying times after addition of BMP-7. BMP-7-induced dendritic growth was blocked when transcription was inhibited within the first 24 hr after adding exogenous BMP-7. Thus, total RNA was isolated from sympathetic neurons exposed to three different experimental conditions: (1) no BMP-7 treatment; (2) treatment with BMP-7 for 6 hr; and (3) treatment with BMP-7 for 24 hr. Affymetrix oligonucleotide microarrays were used to identify differential gene expression under these three culture conditions. BMP-7 significantly regulated 56 unique genes at 6 hr and 185 unique genes at 24 hr. Bioinformatic analyses implicate both established and novel genes and signaling pathways in primary dendritogenesis. This study provides a unique dataset that will be useful in generating testable hypotheses regarding transcriptional control of the initial stages of dendritic growth.
Since BMPs selectively promote dendritic growth in central neurons as well, these findings may be generally applicable to dendritic growth in other neuronal cell types