72 research outputs found

    BETULA: Numerically Stable CF-Trees for BIRCH Clustering

    BIRCH is a widely known clustering approach that has influenced much subsequent research and many commercial products. The key contribution of BIRCH is the Clustering Feature tree (CF-Tree), which is a compressed representation of the input data. As new data arrives, the tree is eventually rebuilt to increase the compression. Afterward, the leaves of the tree are used for clustering. Because of the data compression, this method is very scalable. The idea has been adopted, for example, for k-means, data stream, and density-based clustering. Clustering features used by BIRCH are simple summary statistics that can easily be updated with new data: the number of points, the linear sums, and the sum of squared values. Unfortunately, how the sum of squares is then used in BIRCH is prone to catastrophic cancellation. We introduce a replacement cluster feature that does not have this numeric problem, that is not much more expensive to maintain, and which makes many computations simpler and hence more efficient. These cluster features can also easily be used in other work derived from BIRCH, such as algorithms for streaming data. In the experiments, we demonstrate the numerical problem and compare the performance of the original algorithm with that of the improved cluster features.
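
    The cancellation problem described in this abstract can be illustrated with a minimal sketch. It compares a variance computed from the three sums the abstract lists (count, linear sum, sum of squares) with one computed from a count, a running mean, and a sum of squared deviations. The second representation is an assumption chosen for illustration (a Welford-style update); the abstract does not spell out the replacement feature, and the names `variance_from_sums` and `StableFeature` are hypothetical.

```python
def variance_from_sums(n, ls, ss):
    """Population variance from count, linear sum, and sum of squares: SS/n - (LS/n)^2."""
    return ss / n - (ls / n) ** 2


class StableFeature:
    """Illustrative feature: count, mean, and sum of squared deviations from the mean."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.s = 0.0  # sum of squared deviations from the current mean

    def add(self, x):
        # Welford-style incremental update; never forms large squared values.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.s += delta * (x - self.mean)

    def variance(self):
        return self.s / self.n


# Three points with a large offset; the true population variance is 2/3.
data = [1e9 + d for d in (0.0, 1.0, 2.0)]

naive = variance_from_sums(len(data), sum(data), sum(x * x for x in data))

feature = StableFeature()
for x in data:
    feature.add(x)
stable = feature.variance()

print("from sums:      ", naive)
print("from deviations:", stable)
```

    With values near 1e9, the squares are near 1e18, where neighbouring double-precision numbers are 128 apart, so the subtraction in `variance_from_sums` cannot resolve a variance below 1. The deviation-based update only ever works with small differences and does not suffer from this.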

    Comparison of Network Intrusion Detection Performance Using Feature Representation

    P. 463-475
    Intrusion detection is essential for the security of the components of any network. For that reason, several strategies can be used in Intrusion Detection Systems (IDS) to identify the increasing attempts to gain unauthorized access with malicious purposes, including those based on machine learning. Anomaly detection has been applied successfully to numerous domains and might help to identify unknown attacks. However, existing issues such as high error rates or the large dimensionality of the data make its deployment difficult in real-life scenarios. Representation learning makes it possible to estimate new latent features of data in a low-dimensional space. In this work, anomaly detection is performed after a feature learning stage in order to compare these methods for the detection of intrusions in network traffic. For that purpose, four different anomaly detection algorithms are applied to recent network datasets using two different feature learning methods: principal component analysis and autoencoders. Several evaluation metrics such as accuracy, F1 score, and ROC curves are used to compare their performance. The experimental results show an improvement for two of the anomaly detection methods when using autoencoders and no significant variations for the linear feature transformations.
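
    The pipeline described here, a feature learning stage followed by an anomaly detector, can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic, the detector (distance from the centroid in the standardized latent space) is a stand-in and not one of the four algorithms evaluated in the paper, and the function names `fit_pca` and `anomaly_score` are hypothetical.

```python
import numpy as np


def fit_pca(X, k):
    """Fit PCA via SVD; return the mean, top-k components, and their standard deviations."""
    mean = X.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(X - mean, full_matrices=False)
    std = singular_values[:k] / np.sqrt(len(X) - 1)
    return mean, vt[:k], std


def anomaly_score(X, mean, components, std):
    """Project onto the latent space, standardize, and score by distance from the centroid."""
    Z = (X - mean) @ components.T / std
    return np.linalg.norm(Z, axis=1)


rng = np.random.default_rng(0)

# Synthetic "normal traffic": 10 features, most of the variance in the first two.
scales = np.array([5.0, 5.0] + [0.1] * 8)
normal = rng.normal(size=(200, 10)) * scales

# Synthetic "attack": far from the normal data along the dominant features.
attack = np.zeros((1, 10))
attack[0, :2] = 40.0

mean, components, std = fit_pca(normal, k=2)
threshold = anomaly_score(normal, mean, components, std).max()
attack_score = anomaly_score(attack, mean, components, std)[0]

print("flagged as anomaly:", attack_score > threshold)
```

    Replacing `fit_pca` with a trained autoencoder's encoder would give the non-linear variant of the same two-stage design.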

    Preliminary genetic evidence of two different populations of Opisthorchis viverrini in Lao PDR

    Opisthorchis viverrini is a major public health concern in Southeast Asia. Various reports have suggested that this parasite may represent a species complex, with genetic structure in the region perhaps being dictated by geographical factors and different species of intermediate hosts. We used four microsatellite loci to analyze O. viverrini adult worms originating from six species of cyprinid fish in Thailand and Lao PDR. Two distinct O. viverrini populations were observed. In Ban Phai, Thailand, only one subgroup occurred, hosted by two different fish species. Both subgroups occurred in fish from That Luang, Lao PDR, but were represented to very different degrees among the fish hosts there. Our data suggest that, although geographical separation is more important than fish host specificity in influencing genetic structure, it is possible that two species of Opisthorchis, with little interbreeding, are present near Vientiane in Lao PDR

    Data Stream Clustering for Real-Time Anomaly Detection: An Application to Insider Threats

    Insider threat detection is an emergent concern for academia, industries, and governments due to the growing number of insider incidents in recent years. The continuous streaming of unbounded data coming from various sources in an organisation, typically in a high velocity, leads to a typical Big Data computational problem. The malicious insider threat refers to anomalous behaviour(s) (outliers) that deviate from the normal baseline of a data stream. The absence of previously logged activities executed by users shapes the insider threat detection mechanism into an unsupervised anomaly detection approach over a data stream. A common shortcoming in the existing data mining approaches to detect insider threats is the high number of false alarms/positives (FPs). To handle the Big Data issue and to address the shortcoming, we propose a streaming anomaly detection approach, namely Ensemble of Random subspace Anomaly detectors In Data Streams (E-RAIDS), for insider threat detection. E-RAIDS learns an ensemble of p established outlier detection techniques [Micro-cluster-based Continuous Outlier Detection (MCOD) or Anytime Outlier Detection (AnyOut)] which employ clustering over continuous data streams. Each model of the p models learns from a random feature subspace to detect local outliers, which might not be detected over the whole feature space. E-RAIDS introduces an aggregate component that combines the results from the p feature subspaces, in order to confirm whether to generate an alarm at each window iteration. The merit of E-RAIDS is that it defines a survival factor and a vote factor to address the shortcoming of high number of FPs. Experiments on E-RAIDS-MCOD and E-RAIDS-AnyOut are carried out, on synthetic data sets including malicious insider threat scenarios generated at Carnegie Mellon University, to test the effectiveness of voting feature subspaces, and the capability to detect (more than one)-behaviour-all-threat in real-time. 
The results show that E-RAIDS-MCOD reports the highest F1 measure and fewer false alarms (FP = 0) than E-RAIDS-AnyOut, and that it detects approximately all the insider threats in real time.
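
    The random-subspace voting idea in this abstract can be sketched in a few lines. This is an illustration only: the per-subspace detector is a simple z-score threshold rather than MCOD or AnyOut, the stream windows and the survival factor are omitted, the exact definition of the vote factor is not given in the abstract, and all parameter values and names (`count_votes`, `alarm`, `vote_factor`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical settings: p detectors, each over a random subset of the features.
n_features, p, subspace_size, vote_factor = 8, 5, 3, 2

# Synthetic baseline of normal behaviour, used to estimate per-feature statistics.
train = rng.normal(size=(500, n_features))
mu, sigma = train.mean(axis=0), train.std(axis=0)

# Each of the p models sees only its own random feature subspace.
subspaces = [
    rng.choice(n_features, size=subspace_size, replace=False) for _ in range(p)
]


def count_votes(x, z_threshold=4.0):
    """Count the subspace detectors that flag x (stand-in detector: max |z-score|)."""
    z = np.abs((x - mu) / sigma)
    return sum(bool(z[s].max() > z_threshold) for s in subspaces)


def alarm(x):
    """Aggregate component: raise an alarm when enough subspaces vote for an outlier."""
    return count_votes(x) >= vote_factor


# A local outlier: anomalous in feature 0 only, so only subspaces containing it vote.
local_outlier = np.zeros(n_features)
local_outlier[0] = 10.0

print("votes for local outlier:", count_votes(local_outlier), "of", p)
print("alarm:", alarm(local_outlier))
```

    Requiring several subspaces to agree before raising an alarm is one way to trade a little sensitivity for fewer false positives, which is the shortcoming the abstract targets.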

    Optimization of interneuron function by direct coupling of cell migration and axonal targeting

    Neural circuit assembly relies on the precise synchronization of developmental processes, such as cell migration and axon targeting, but the cell-autonomous mechanisms coordinating these events remain largely unknown. Here we found that different classes of interneurons use distinct routes of migration to reach the embryonic cerebral cortex. Somatostatin-expressing interneurons that migrate through the marginal zone develop into Martinotti cells, one of the most distinctive classes of cortical interneurons. For these cells, migration through the marginal zone is linked to the development of their characteristic layer 1 axonal arborization. Altering the normal migratory route of Martinotti cells by conditional deletion of Mafb—a gene that is preferentially expressed by these cells—cell-autonomously disrupts axonal development and impairs the function of these cells in vivo. Our results suggest that migration and axon targeting programs are coupled to optimize the assembly of inhibitory circuits in the cerebral cortex

    Transcriptional Responses of Cultured Rat Sympathetic Neurons during BMP-7-Induced Dendritic Growth

    Dendrites are the primary site of synapse formation in the vertebrate nervous system; however, relatively little is known about the molecular mechanisms that regulate the initial formation of primary dendrites. Embryonic rat sympathetic neurons cultured under defined conditions extend a single functional axon, but fail to form dendrites. Addition of bone morphogenetic proteins (BMPs) triggers these neurons to extend multiple dendrites without altering axonal growth or cell survival. We used this culture system to examine differential gene expression patterns in naïve vs. BMP-treated sympathetic neurons in order to identify candidate genes involved in regulation of primary dendritogenesis. To determine the critical transcriptional window during BMP-induced dendritic growth, morphometric analysis of microtubule-associated protein (MAP-2)-immunopositive processes was used to quantify dendritic growth in cultures exposed to the transcription inhibitor actinomycin-D added at varying times after addition of BMP-7. BMP-7-induced dendritic growth was blocked when transcription was inhibited within the first 24 hr after adding exogenous BMP-7. Thus, total RNA was isolated from sympathetic neurons exposed to three different experimental conditions: (1) no BMP-7 treatment; (2) treatment with BMP-7 for 6 hr; and (3) treatment with BMP-7 for 24 hr. Affymetrix oligonucleotide microarrays were used to identify differential gene expression under these three culture conditions. BMP-7 significantly regulated 56 unique genes at 6 hr and 185 unique genes at 24 hr. Bioinformatic analyses implicate both established and novel genes and signaling pathways in primary dendritogenesis. This study provides a unique dataset that will be useful in generating testable hypotheses regarding transcriptional control of the initial stages of dendritic growth. 
Since BMPs selectively promote dendritic growth in central neurons as well, these findings may be generally applicable to dendritic growth in other neuronal cell types.