    Unsupervised Intrusion Detection with Cross-Domain Artificial Intelligence Methods

    Cybercrime is a major concern for corporations, business owners, governments and citizens, and it continues to grow in spite of increasing investments in security and fraud prevention. The main challenges in this research field are detecting unknown attacks and reducing the false positive rate. The aim of this research work was to address both problems by leveraging four artificial intelligence techniques. The first technique is a novel unsupervised learning method based on skip-gram modeling. It was designed, developed and tested against a public dataset with popular intrusion patterns, achieving high accuracy and a low false positive rate without prior knowledge of attack patterns. The second technique is a novel unsupervised learning method based on topic modeling. It was applied to three related domains (network attacks, payment fraud, IoT malware traffic), and high accuracy was achieved in all three scenarios, even though the malicious activity differs significantly from one domain to another. The third technique is a novel unsupervised learning method based on deep autoencoders, with feature selection performed by a supervised method, random forest. The results showed that this technique can outperform other similar techniques. The fourth technique is based on an MLP neural network and is applied to alert reduction in fraud prevention. This method automates manual reviews previously done by human experts, without significantly impacting accuracy.
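The abstract does not describe the skip-gram method in detail. As a rough illustration only, the sketch below assumes each connection record has already been reduced to a categorical token (the token format, e.g. "tcp:80", and the helper names are hypothetical). It generates the (center, context) pairs that skip-gram training is built on and then, as a deliberate simplification, replaces learned embeddings with smoothed co-occurrence counts to score how unusual a new sequence is relative to reference traffic. It is not the author's method.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """(center, context) pairs for every token within `window` positions of each center."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def fit_context_model(sequences: Iterable[List[str]], window: int = 2):
    """Count context pairs in (assumed benign) reference traffic.
    Simplification: counts stand in for the embeddings a real skip-gram model would learn."""
    pair_counts, center_counts, vocab = Counter(), Counter(), set()
    for seq in sequences:
        vocab.update(seq)
        for center, ctx in skipgram_pairs(seq, window):
            pair_counts[(center, ctx)] += 1
            center_counts[center] += 1
    return pair_counts, center_counts, vocab

def anomaly_score(seq: List[str], model, window: int = 2, alpha: float = 1.0) -> float:
    """Mean negative log P(context | center) with add-alpha smoothing; higher = more unusual."""
    pair_counts, center_counts, vocab = model
    v = len(vocab) + 1  # +1 reserves probability mass for tokens never seen in training
    pairs = skipgram_pairs(seq, window)
    if not pairs:
        return 0.0
    nll = 0.0
    for center, ctx in pairs:
        p = (pair_counts[(center, ctx)] + alpha) / (center_counts[center] + alpha * v)
        nll -= math.log(p)
    return nll / len(pairs)
```

For example, `skipgram_pairs(["tcp:80", "tcp:443", "udp:53", "tcp:80"], window=1)` yields six pairs, starting with `("tcp:80", "tcp:443")`. In practice the alert threshold on the score would have to be chosen on held-out data.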

    Network analysis of the cellular circuits of memory

    Intuitively, memory is conceived as a collection of static images that we accumulate as we experience the world. But actually, memories are constantly changing throughout our lives, shaped by our ongoing experiences. Assimilating new knowledge without corrupting pre-existing memories is therefore a critical brain function. However, learning and memory interact: prior knowledge can proactively influence learning, and new information can retroactively modify memories of past events. The hippocampus is a brain region essential for learning and memory, but the network-level operations that underlie the continuous integration of new experiences into memory, segregating them as discrete traces while enabling their interaction, are unknown. Here I show a network mechanism by which two distinct memories interact. Hippocampal CA1 neuron ensembles were monitored in mice as they explored a familiar environment before and after forming a new place-reward memory in a different environment. Using a network science representation of the co-firing relationships among principal cells, I first found that new associative learning modifies the topology of the cells’ co-firing patterns representing the unrelated familiar environment. I further observed that these neuronal co-firing graphs evolved along three functional axes: the first segregated novelty; the second distinguished individual novel behavioural experiences; and the third revealed cross-memory interaction. Finally, I found that during this process, high-activity principal cells rapidly formed the core representation of each memory, whereas low-activity principal cells gradually joined co-activation motifs throughout individual experiences, enabling cross-memory interactions. These findings reveal an organizational principle of brain networks in which high- and low-activity cells are differentially recruited into coactivity motifs as building blocks for the flexible integration and interaction of memories.
In addition, I employ a set of manifold learning and related approaches to explore and characterise the complex neural population dynamics within CA1 that underlie simple exploration.
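The abstract does not say how co-firing is quantified. A minimal sketch, assuming spike counts binned per time window and Pearson correlation cut at a hypothetical threshold (not necessarily the measure used in the thesis), shows how such a co-firing graph and each cell's degree (number of co-firing partners) could be computed.

```python
import numpy as np

def cofiring_graph(counts: np.ndarray, threshold: float = 0.3) -> np.ndarray:
    """Boolean adjacency matrix linking cells whose binned activity is correlated.

    counts: array of shape (n_cells, n_time_bins) holding spike counts per bin.
    threshold: hypothetical cut-off on the Pearson correlation coefficient.
    """
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(counts)       # pairwise Pearson correlation between cells
    corr = np.nan_to_num(corr, nan=0.0)  # silent cells yield NaN; treat them as uncorrelated
    adj = corr > threshold
    np.fill_diagonal(adj, False)         # no self-loops
    return adj

def node_degree(adj: np.ndarray) -> np.ndarray:
    """Number of co-firing partners of each cell."""
    return adj.sum(axis=1)
```

Comparing `node_degree` between high- and low-activity cells (e.g. split by mean count per bin) is one simple way to probe the kind of differential recruitment described above.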

    Probabilistic Models for Genetic and Genomic Data with Missing Information

    Genetic and genomic data often contain unobservable or missing information. Probabilistic models such as mixture models and hidden Markov models (HMMs) have been widely used since the 1960s to infer unobserved information from observed information, demonstrating the versatility and importance of these models. Biological applications of mixture models include gene expression data, meta-analysis, disease mapping, epidemiology and pharmacology; applications of HMMs include gene finding, linkage analysis, phylogenetic analysis and identifying regions of identity-by-descent. An important statistical and informatics challenge posed by modern genetics is to understand the functional consequences of genetic variation and its relation to phenotypic variation. In the analysis of whole-exome sequencing data, predicting the impact of missense mutations on protein function is an important factor in identifying disease susceptibility mutations and determining their clinical importance when no independent data on their impact on disease are available. Beyond interpretation, identifying co-inherited regions in related individuals with Mendelian disorders can further narrow the search for disease susceptibility mutations. In this thesis, we develop two probabilistic models for genetic and genomic data with missing information: 1) a mixture model to estimate the posterior probability of functionality of missense mutations and 2) an HMM to identify co-inherited regions in the exomes of related individuals. The first application combines functional predictions from available computational (in silico) methods, which often disagree substantially, leaving the user with conflicting results when assessing the pathogenic impact of missense mutations on protein function.
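To illustrate how a mixture model can turn disagreeing predictor calls into a single posterior probability, the sketch below assumes a two-class latent model (functional vs. neutral) with conditionally independent binary predictors whose prior, sensitivities and specificities are already known. These parameters and the function name are hypothetical; in practice they would be estimated (for example by EM), and the thesis's model may be structured differently. A predictor with no output for a variant is simply skipped, which is how missing calls are marginalised out under this independence assumption.

```python
from typing import Optional, Sequence

def posterior_functional(calls: Sequence[Optional[int]], prior: float,
                         sensitivity: Sequence[float],
                         specificity: Sequence[float]) -> float:
    """P(functional | calls) under a two-class latent mixture.

    calls[k] is 1 (predictor k says damaging), 0 (benign) or None (no prediction).
    sensitivity[k] = P(call 1 | functional); specificity[k] = P(call 0 | neutral).
    """
    like_f = prior          # P(functional) * P(calls | functional)
    like_n = 1.0 - prior    # P(neutral) * P(calls | neutral)
    for x, se, sp in zip(calls, sensitivity, specificity):
        if x is None:
            continue  # a missing call is marginalised out: its two outcome terms sum to 1
        like_f *= se if x == 1 else 1.0 - se
        like_n *= 1.0 - sp if x == 1 else sp
    return like_f / (like_f + like_n)
```

For one predictor with sensitivity 0.9 and specificity 0.8 and a prior of 0.5, a "damaging" call gives 0.45 / (0.45 + 0.10) = 9/11, about 0.82.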
The second application considers extensions of a first-order HMM to include conditional emission probabilities that vary as a function of minor allele frequency and a second-order dependence structure between observed variant calls. We apply these models to whole-exome sequencing data and show how they can be used to identify disease susceptibility mutations. As disease-gene identification projects increasingly use next-generation sequencing, the probabilistic models developed in this thesis help identify relevant disease-causing mutations and associate them with human disorders. The purpose of this thesis is to demonstrate that probabilistic models can contribute to more accurate and dependable inference based on genetic and genomic data with missing information.
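As a baseline for the HMM described above, the sketch below decodes identity-by-descent (IBD) segments with a plain first-order, two-state HMM and the Viterbi algorithm. It deliberately omits the thesis's extensions (emissions depending on minor allele frequency, second-order dependence). The observation coding (1 = the relatives' genotypes at a site are compatible with a shared allele, 0 = incompatible) and all probabilities are hypothetical.

```python
import math
from typing import Dict, List, Sequence

def viterbi(obs: Sequence[int], states: Sequence[str],
            start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[int, float]]) -> List[str]:
    """Most probable hidden-state path of a first-order HMM, computed in log space."""
    # V[t][s]: best log-probability of any state path ending in state s at position t
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: V[t - 1][r] + math.log(trans_p[r][s]))
            V[t][s] = V[t - 1][prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state
    path = [max(states, key=lambda s: V[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical parameters for illustration only
STATES = ("IBD", "nonIBD")
START = {"IBD": 0.5, "nonIBD": 0.5}
TRANS = {"IBD": {"IBD": 0.95, "nonIBD": 0.05},
         "nonIBD": {"IBD": 0.05, "nonIBD": 0.95}}
EMIT = {"IBD": {0: 0.01, 1: 0.99},     # incompatible genotypes are rare inside IBD (errors)
        "nonIBD": {0: 0.3, 1: 0.7}}
```

With these settings, five incompatible sites followed by twenty compatible ones decode as five `nonIBD` positions followed by an `IBD` segment: the sticky transitions keep short runs of evidence from flipping the state at every site.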

    Bayesian Gaussian Process Models: PAC-Bayesian Generalisation Error Bounds and Sparse Approximations

    Non-parametric models and techniques enjoy a growing popularity in the field of machine learning, and among these Bayesian inference for Gaussian process (GP) models has recently received significant attention. We feel that GP priors should be part of the standard toolbox for constructing models relevant to machine learning in the same way as parametric linear models are, and the results in this thesis help to remove some obstacles on the way towards this goal. In the first main chapter, we provide a distribution-free finite sample bound on the difference between generalisation and empirical (training) error for GP classification methods. While the general theorem (the PAC-Bayesian bound) is not new, we give a much simplified and somewhat generalised derivation and point out the underlying core technique (convex duality) explicitly. Furthermore, the application to GP models is novel (to our knowledge). A central feature of this bound is that its quality depends crucially on task knowledge being encoded faithfully in the model and prior distributions, so there is a mutual benefit between a sharp theoretical guarantee and empirically well-established statistical practices. Extensive simulations on real-world classification tasks indicate an impressive tightness of the bound, in spite of the fact that many previous bounds for related kernel machines fail to give non-trivial guarantees in this practically relevant regime. In the second main chapter, sparse approximations are developed to address the problem of the unfavourable scaling of most GP techniques with large training sets. Due to its high importance in practice, this problem has received a lot of attention recently. We demonstrate the tractability and usefulness of simple greedy forward selection with information-theoretic criteria previously used in active learning (or sequential design) and develop generic schemes for automatic model selection with many (hyper)parameters. 
We suggest two new generic schemes and evaluate some of their variants on large real-world classification and regression tasks. These schemes and their underlying principles (which are clearly stated and analysed) can be applied to obtain sparse approximations for a wide range of GP models far beyond the special cases we studied here.
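For readers unfamiliar with PAC-Bayesian bounds, one common statement (the "PAC-Bayes-kl" form) says that, with probability at least 1 - δ over an i.i.d. training sample of size n, every posterior Q satisfies kl(ê(Q) ‖ e(Q)) ≤ (KL(Q ‖ P) + ln((n + 1)/δ)) / n. Here ê(Q) and e(Q) are the empirical and true error of the Gibbs classifier, KL(Q ‖ P) is the divergence between posterior and prior, and kl is the KL divergence between Bernoulli distributions. The exact constants in the thesis may differ. The sketch below turns this inequality into an explicit upper bound on e(Q) by inverting kl with bisection; the function names and the values of KL(Q ‖ P), n and δ are hypothetical.

```python
import math

def binary_kl(q: float, p: float) -> float:
    """kl(q || p) between Bernoulli(q) and Bernoulli(p), using 0 * log 0 = 0."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # keep the logarithms finite
    out = 0.0
    if q > 0:
        out += q * math.log(q / p)
    if q < 1:
        out += (1.0 - q) * math.log((1.0 - q) / (1.0 - p))
    return out

def kl_inverse_upper(q: float, c: float, tol: float = 1e-10) -> float:
    """Largest p in [q, 1] with kl(q || p) <= c; kl is increasing in p there, so bisect."""
    lo, hi = q, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_kl(q, mid) <= c:
            lo = mid
        else:
            hi = mid
    return lo

def pac_bayes_bound(emp_err: float, kl_qp: float, n: int, delta: float = 0.05) -> float:
    """Upper bound on the true Gibbs error holding with probability >= 1 - delta."""
    c = (kl_qp + math.log((n + 1) / delta)) / n
    return kl_inverse_upper(emp_err, c)
```

For instance, an empirical error of 0.1 with KL(Q ‖ P) = 50, n = 10000 and δ = 0.05 gives a bound of roughly 0.13. The bound tightens as n grows and loosens as the posterior moves further from the prior, which is why encoding task knowledge faithfully in the prior matters for its quality.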

    The Telecommunications and Data Acquisition Progress Report 42-77

    Activities in space communication, radio navigation, radio science, and ground-based astronomy are reported. Advanced systems for the Deep Space Network and its Ground-Communications Facility are discussed, including station control and system technology. Network sustaining as well as data and information systems are covered. Studies of geodynamics, investigations of the microwave spectrum, and the search for extraterrestrial intelligence are reported.

    PHYSTAT-LHC Workshop on Statistical Issues for LHC Physics

    A PHYSTAT workshop on the topic of statistical issues for LHC physics was held at CERN. The workshop focused on issues related to discovery that we hope will be relevant to the LHC. These proceedings contain written versions of nearly all the talks, several of which were given by professional statisticians. The talks ranged from general overviews to descriptions of searches for specific particles. The treatment of background uncertainties figured prominently. Many of the talks describing search strategies for new effects should be of interest not only to particle physicists but also to scientists in other fields.