17,272 research outputs found

    A Two-step Statistical Approach for Inferring Network Traffic Demands (Revises Technical Report BUCS-2003-003)

    Full text link
    Accurate knowledge of traffic demands in a communication network enables or enhances a variety of traffic engineering and network management tasks of paramount importance for operational networks. Directly measuring a complete set of these demands is prohibitively expensive because of the huge amounts of data that must be collected and the performance impact that such measurements would impose on the regular behavior of the network. As a consequence, we must rely on statistical techniques to produce estimates of actual traffic demands from partial information. The performance of such techniques is however limited due to their reliance on limited information and the high amount of computations they incur, which limits their convergence behavior. In this paper we study a two-step approach for inferring network traffic demands. First we elaborate and evaluate a modeling approach for generating good starting points to be fed to iterative statistical inference techniques. We call these starting points informed priors since they are obtained using actual network information such as packet traces and SNMP link counts. Second we provide a very fast variant of the EM algorithm which extends its computation range, increasing its accuracy and decreasing its dependence on the quality of the starting point. Finally, we evaluate and compare alternative mechanisms for generating starting points and the convergence characteristics of our EM algorithm against a recently proposed Weighted Least Squares approach.National Science Foundation (ANI-0095988, EIA-0202067, ITR ANI-0205294

    Nonparametric Feature Extraction from Dendrograms

    Full text link
    We propose feature extraction from dendrograms in a nonparametric way. The Minimax distance measures correspond to building a dendrogram with single linkage criterion, with defining specific forms of a level function and a distance function over that. Therefore, we extend this method to arbitrary dendrograms. We develop a generalized framework wherein different distance measures can be inferred from different types of dendrograms, level functions and distance functions. Via an appropriate embedding, we compute a vector-based representation of the inferred distances, in order to enable many numerical machine learning algorithms to employ such distances. Then, to address the model selection problem, we study the aggregation of different dendrogram-based distances respectively in solution space and in representation space in the spirit of deep representations. In the first approach, for example for the clustering problem, we build a graph with positive and negative edge weights according to the consistency of the clustering labels of different objects among different solutions, in the context of ensemble methods. Then, we use an efficient variant of correlation clustering to produce the final clusters. In the second approach, we investigate the sequential combination of different distances and features sequentially in the spirit of multi-layered architectures to obtain the final features. Finally, we demonstrate the effectiveness of our approach via several numerical studies

    A Bayesian approach for inferring neuronal connectivity from calcium fluorescent imaging data

    Full text link
    Deducing the structure of neural circuits is one of the central problems of modern neuroscience. Recently-introduced calcium fluorescent imaging methods permit experimentalists to observe network activity in large populations of neurons, but these techniques provide only indirect observations of neural spike trains, with limited time resolution and signal quality. In this work we present a Bayesian approach for inferring neural circuitry given this type of imaging data. We model the network activity in terms of a collection of coupled hidden Markov chains, with each chain corresponding to a single neuron in the network and the coupling between the chains reflecting the network's connectivity matrix. We derive a Monte Carlo Expectation--Maximization algorithm for fitting the model parameters; to obtain the sufficient statistics in a computationally-efficient manner, we introduce a specialized blockwise-Gibbs algorithm for sampling from the joint activity of all observed neurons given the observed fluorescence data. We perform large-scale simulations of randomly connected neuronal networks with biophysically realistic parameters and find that the proposed methods can accurately infer the connectivity in these networks given reasonable experimental and computational constraints. In addition, the estimation accuracy may be improved significantly by incorporating prior knowledge about the sparseness of connectivity in the network, via standard L1_1 penalization methods.Comment: Published in at http://dx.doi.org/10.1214/09-AOAS303 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    PowerSpy: Location Tracking using Mobile Device Power Analysis

    Full text link
    Modern mobile platforms like Android enable applications to read aggregate power usage on the phone. This information is considered harmless and reading it requires no user permission or notification. We show that by simply reading the phone's aggregate power consumption over a period of a few minutes an application can learn information about the user's location. Aggregate phone power consumption data is extremely noisy due to the multitude of components and applications that simultaneously consume power. Nevertheless, by using machine learning algorithms we are able to successfully infer the phone's location. We discuss several ways in which this privacy leak can be remedied.Comment: Usenix Security 201

    Active Learning of Multiple Source Multiple Destination Topologies

    Get PDF
    We consider the problem of inferring the topology of a network with MM sources and NN receivers (hereafter referred to as an MM-by-NN network), by sending probes between the sources and receivers. Prior work has shown that this problem can be decomposed into two parts: first, infer smaller subnetwork components (i.e., 11-by-NN's or 22-by-22's) and then merge these components to identify the MM-by-NN topology. In this paper, we focus on the second part, which had previously received less attention in the literature. In particular, we assume that a 11-by-NN topology is given and that all 22-by-22 components can be queried and learned using end-to-end probes. The problem is which 22-by-22's to query and how to merge them with the given 11-by-NN, so as to exactly identify the 22-by-NN topology, and optimize a number of performance metrics, including the number of queries (which directly translates into measurement bandwidth), time complexity, and memory usage. We provide a lower bound, N2\lceil \frac{N}{2} \rceil, on the number of 22-by-22's required by any active learning algorithm and propose two greedy algorithms. The first algorithm follows the framework of multiple hypothesis testing, in particular Generalized Binary Search (GBS), since our problem is one of active learning, from 22-by-22 queries. The second algorithm is called the Receiver Elimination Algorithm (REA) and follows a bottom-up approach: at every step, it selects two receivers, queries the corresponding 22-by-22, and merges it with the given 11-by-NN; it requires exactly N1N-1 steps, which is much less than all (N2)\binom{N}{2} possible 22-by-22's. Simulation results over synthetic and realistic topologies demonstrate that both algorithms correctly identify the 22-by-NN topology and are near-optimal, but REA is more efficient in practice

    Understanding Internet topology: principles, models, and validation

    Get PDF
    Building on a recent effort that combines a first-principles approach to modeling router-level connectivity with a more pragmatic use of statistics and graph theory, we show in this paper that for the Internet, an improved understanding of its physical infrastructure is possible by viewing the physical connectivity as an annotated graph that delivers raw connectivity and bandwidth to the upper layers in the TCP/IP protocol stack, subject to practical constraints (e.g., router technology) and economic considerations (e.g., link costs). More importantly, by relying on data from Abilene, a Tier-1 ISP, and the Rocketfuel project, we provide empirical evidence in support of the proposed approach and its consistency with networking reality. To illustrate its utility, we: 1) show that our approach provides insight into the origin of high variability in measured or inferred router-level maps; 2) demonstrate that it easily accommodates the incorporation of additional objectives of network design (e.g., robustness to router failure); and 3) discuss how it complements ongoing community efforts to reverse-engineer the Internet

    Data based identification and prediction of nonlinear and complex dynamical systems

    Get PDF
    We thank Dr. R. Yang (formerly at ASU), Dr. R.-Q. Su (formerly at ASU), and Mr. Zhesi Shen for their contributions to a number of original papers on which this Review is partly based. This work was supported by ARO under Grant No. W911NF-14-1-0504. W.-X. Wang was also supported by NSFC under Grants No. 61573064 and No. 61074116, as well as by the Fundamental Research Funds for the Central Universities, Beijing Nova Programme.Peer reviewedPostprin
    corecore