    Machine Learning Applied for Spectra Classification

    Markov Models for Network-Behavior Modeling and Anonymization

    Modern network security research has demonstrated a clear need for open sharing of traffic datasets between organizations, a need that has so far been superseded by the challenge of removing sensitive content beforehand. Network Data Anonymization (NDA) is emerging as a field dedicated to this problem, with its main direction focusing on removal of identifiable artifacts that might pierce privacy, such as usernames and IP addresses. However, recent research has demonstrated that more subtle statistical artifacts, also present, may yield fingerprints that are just as differentiable as the former. This result highlights certain shortcomings in current anonymization frameworks, particularly their neglect of the behavioral idiosyncrasies of network protocols, applications, and users. Recent anonymization results have shown that the extent to which utility and privacy can be obtained is mainly a function of the information in the data that one is aware and not aware of. This paper leverages the predictability of network behavior in our favor to augment existing frameworks through a new machine-learning-driven anonymization technique. Our approach substitutes individual identities with group identities, where members are grouped by behavioral similarity, essentially providing anonymity-by-crowds in a statistical mix-net. We derive time-series models for network traffic behavior that quantifiably model the discriminative features of network "behavior" and introduce a kernel-based framework for anonymity that fits together naturally with network-data modeling.
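
    The abstract above does not spell out its time-series models. As a rough illustration only, the following sketch estimates a first-order Markov transition matrix from a single sequence of discrete traffic states. The function name, the state labels, and the example sequence are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def transition_matrix(states, alphabet):
    """Estimate first-order Markov transition probabilities from one sequence.

    Returns a nested dict: matrix[a][b] is the estimated P(next = b | current = a).
    Rows for states that never occur as a source are left at 0.0.
    """
    # Count every adjacent (current, next) pair in the sequence.
    counts = Counter(zip(states, states[1:]))
    matrix = {}
    for a in alphabet:
        row_total = sum(counts[(a, b)] for b in alphabet)
        matrix[a] = {
            b: (counts[(a, b)] / row_total if row_total else 0.0)
            for b in alphabet
        }
    return matrix


# Hypothetical per-host sequence of coarse traffic states (assumption).
seq = ["dns", "http", "http", "dns", "http", "ssh"]
P = transition_matrix(seq, ["dns", "http", "ssh"])
```

    Hosts whose estimated matrices are similar could then be placed in the same group, which is one way to read the group-identity idea in the abstract. The grouping step itself is not shown here.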

    Bayesian Classification of Flight Calls with a Novel Dynamic Time Warping Kernel

    In this paper we propose a probabilistic classification algorithm with a novel Dynamic Time Warping (DTW) kernel to automatically recognize flight calls of different species of birds. The performance of the method on a real-world dataset of warbler (Parulidae) flight calls is competitive with human expert recognition levels and outperforms other classifiers trained on a variety of feature extraction approaches. In addition, we offer a novel and intuitive DTW kernel formulation which is positive semi-definite, in contrast with previous work. Finally, we obtain promising results with a larger dataset of multiple species that we can handle efficiently due to the explicit multiclass probit likelihood of the proposed approach.
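
    For readers unfamiliar with DTW, the sketch below computes the classic dynamic-programming DTW distance between two one-dimensional sequences. It is not the paper's kernel: the abstract does not give the positive semi-definite formulation, so none is attempted here. The function name and the absolute-difference local cost are assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences.

    Uses |x - y| as the local cost (an assumption) and allows the three
    standard moves: insertion, deletion, and match.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

    For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero because the repeated `2` can be absorbed by warping. A distance of this kind is not in itself a valid kernel, which is the gap the abstract says its formulation addresses.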

    ON MARKOV AND HIDDEN MARKOV MODELS WITH APPLICATIONS TO TRAJECTORIES

    Markov and hidden Markov models (HMMs) provide a special angle to characterize trajectories using their state transition patterns. Distinct from Markov models, HMMs assume that an unobserved sequence governs the observed sequence, and the Markovian property is imposed on the hidden chain rather than the observed one. In the first part of this dissertation, we develop a model for HMMs with exponential family distributions and extend it to incorporate covariates. We call it HMM-GLM, for which we propose a joint model selection method. The proposed selection criterion is tailored for HMM-GLM, aiming at a more accurate approximation of the Kullback-Leibler divergence; we seek improvement over the widely used Akaike information criterion. The second and third parts of this dissertation are about clustering trajectories with HMMs and Markov mixture models. The research interest for HMM clustering is to develop a less computationally expensive and more interpretable algorithm for the HMM sequence clustering problem, based on the emission and transition features of the chains. We propose an efficient clustering method that uses Bhattacharyya affinity to measure the pairwise similarity between sequences and applies a spectral clustering algorithm to obtain the cluster assignment. The computational efficiency comes from the fact that our method avoids iterative computation of the affinity for a pair of sequences. Meanwhile, both simulation and empirical studies show that the proposed algorithm maintains good performance compared to other similar methods. In the third part of the dissertation, we address a study of the course of children and adolescents with bipolar disorder. Measuring and making sense of the fluctuations in different moods over time is challenging. We use a Markov mixture model with different transition matrices to find homogeneous clusters and capture different longitudinal mood change patterns. We also conduct a simulation study to investigate the performance of the model when there are violations of model assumptions. The results show that this model is fairly robust in the tested situations. We find that the clusters separate out those who tend to stay in a mood state from those who fluctuate between mood states more frequently.
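
    The abstract does not define its Bhattacharyya affinity for HMM sequences. As a simplified illustration, the sketch below computes the standard Bhattacharyya coefficient between discrete probability distributions and assembles a pairwise affinity matrix of the kind a spectral clustering step would take as input. The function names and the use of plain discrete distributions as per-sequence summaries are assumptions, not the dissertation's method.

```python
import math


def bhattacharyya_coefficient(p, q):
    """Bhattacharyya coefficient of two discrete distributions on the same support.

    Equals 1.0 for identical distributions and 0.0 for disjoint supports.
    """
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def affinity_matrix(dists):
    """Pairwise affinities computed in closed form, with no iterative fitting per pair."""
    n = len(dists)
    return [
        [bhattacharyya_coefficient(dists[i], dists[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical per-sequence state-occupancy distributions (assumption).
summaries = [[0.5, 0.5], [0.5, 0.5], [1.0, 0.0]]
A = affinity_matrix(summaries)
```

    The resulting symmetric matrix `A` could be passed to any spectral clustering routine that accepts a precomputed affinity. That step is omitted here.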

    Modelling Digital Media Objects

    An Unsupervised Cluster: Learning Water Customer Behavior Using Variation of Information on a Reconstructed Phase Space

    The unsupervised clustering algorithm described in this dissertation addresses the need to divide a population of water utility customers into groups based on their similarities and differences, using only the measured flow data collected by water meters. After clustering, the groups represent customers with similar consumption behavior patterns and provide insight into ‘normal’ and ‘unusual’ customer behavior patterns. This research focuses on individually metered water utility customers and includes both residential and commercial customer accounts serviced by utilities within North America. The contributions of this dissertation not only represent a novel academic work, but also solve a practical problem for the utility industry. This dissertation introduces a method of agglomerative clustering using information-theoretic distance measures on Gaussian mixture models within a reconstructed phase space. The clustering method accommodates a utility’s limited human, financial, computational, and environmental resources. The proposed weighted variation of information distance measure for comparing Gaussian mixture models emphasizes behaviors whose statistical distributions are more compact over behaviors with large variation, and it contributes a novel addition to existing comparison options.