17,272 research outputs found
A Two-step Statistical Approach for Inferring Network Traffic Demands (Revises Technical Report BUCS-2003-003)
Accurate knowledge of traffic demands in a communication network enables or enhances a variety of traffic engineering and network management tasks of paramount importance for operational networks. Directly measuring a complete set of these demands is prohibitively expensive because of the huge amounts of data that must be collected and the performance impact that such measurements would impose on the regular behavior of the network. As a consequence, we must rely on statistical techniques to produce estimates of actual traffic demands from partial information. The performance of such techniques is however limited due to their reliance on limited information and the high amount of computations they incur, which limits their convergence behavior. In this paper we study a two-step approach for inferring network traffic demands. First we elaborate and evaluate a modeling approach for generating good starting points to be fed to iterative statistical inference techniques. We call these starting points informed priors since they are obtained using actual network information such as packet traces and SNMP link counts. Second we provide a very fast variant of the EM algorithm which extends its computation range, increasing its accuracy and decreasing its dependence on the quality of the starting point. Finally, we evaluate and compare alternative mechanisms for generating starting points and the convergence characteristics of our EM algorithm against a recently proposed Weighted Least Squares approach.National Science Foundation (ANI-0095988, EIA-0202067, ITR ANI-0205294
Nonparametric Feature Extraction from Dendrograms
We propose feature extraction from dendrograms in a nonparametric way. The
Minimax distance measures correspond to building a dendrogram with single
linkage criterion, with defining specific forms of a level function and a
distance function over that. Therefore, we extend this method to arbitrary
dendrograms. We develop a generalized framework wherein different distance
measures can be inferred from different types of dendrograms, level functions
and distance functions. Via an appropriate embedding, we compute a vector-based
representation of the inferred distances, in order to enable many numerical
machine learning algorithms to employ such distances. Then, to address the
model selection problem, we study the aggregation of different dendrogram-based
distances respectively in solution space and in representation space in the
spirit of deep representations. In the first approach, for example for the
clustering problem, we build a graph with positive and negative edge weights
according to the consistency of the clustering labels of different objects
among different solutions, in the context of ensemble methods. Then, we use an
efficient variant of correlation clustering to produce the final clusters. In
the second approach, we investigate the sequential combination of different
distances and features sequentially in the spirit of multi-layered
architectures to obtain the final features. Finally, we demonstrate the
effectiveness of our approach via several numerical studies
A Bayesian approach for inferring neuronal connectivity from calcium fluorescent imaging data
Deducing the structure of neural circuits is one of the central problems of
modern neuroscience. Recently-introduced calcium fluorescent imaging methods
permit experimentalists to observe network activity in large populations of
neurons, but these techniques provide only indirect observations of neural
spike trains, with limited time resolution and signal quality. In this work we
present a Bayesian approach for inferring neural circuitry given this type of
imaging data. We model the network activity in terms of a collection of coupled
hidden Markov chains, with each chain corresponding to a single neuron in the
network and the coupling between the chains reflecting the network's
connectivity matrix. We derive a Monte Carlo Expectation--Maximization
algorithm for fitting the model parameters; to obtain the sufficient statistics
in a computationally-efficient manner, we introduce a specialized
blockwise-Gibbs algorithm for sampling from the joint activity of all observed
neurons given the observed fluorescence data. We perform large-scale
simulations of randomly connected neuronal networks with biophysically
realistic parameters and find that the proposed methods can accurately infer
the connectivity in these networks given reasonable experimental and
computational constraints. In addition, the estimation accuracy may be improved
significantly by incorporating prior knowledge about the sparseness of
connectivity in the network, via standard L penalization methods.Comment: Published in at http://dx.doi.org/10.1214/09-AOAS303 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
PowerSpy: Location Tracking using Mobile Device Power Analysis
Modern mobile platforms like Android enable applications to read aggregate
power usage on the phone. This information is considered harmless and reading
it requires no user permission or notification. We show that by simply reading
the phone's aggregate power consumption over a period of a few minutes an
application can learn information about the user's location. Aggregate phone
power consumption data is extremely noisy due to the multitude of components
and applications that simultaneously consume power. Nevertheless, by using
machine learning algorithms we are able to successfully infer the phone's
location. We discuss several ways in which this privacy leak can be remedied.Comment: Usenix Security 201
Active Learning of Multiple Source Multiple Destination Topologies
We consider the problem of inferring the topology of a network with
sources and receivers (hereafter referred to as an -by- network), by
sending probes between the sources and receivers. Prior work has shown that
this problem can be decomposed into two parts: first, infer smaller subnetwork
components (i.e., -by-'s or -by-'s) and then merge these components
to identify the -by- topology. In this paper, we focus on the second
part, which had previously received less attention in the literature. In
particular, we assume that a -by- topology is given and that all
-by- components can be queried and learned using end-to-end probes. The
problem is which -by-'s to query and how to merge them with the given
-by-, so as to exactly identify the -by- topology, and optimize a
number of performance metrics, including the number of queries (which directly
translates into measurement bandwidth), time complexity, and memory usage. We
provide a lower bound, , on the number of
-by-'s required by any active learning algorithm and propose two greedy
algorithms. The first algorithm follows the framework of multiple hypothesis
testing, in particular Generalized Binary Search (GBS), since our problem is
one of active learning, from -by- queries. The second algorithm is called
the Receiver Elimination Algorithm (REA) and follows a bottom-up approach: at
every step, it selects two receivers, queries the corresponding -by-, and
merges it with the given -by-; it requires exactly steps, which is
much less than all possible -by-'s. Simulation results
over synthetic and realistic topologies demonstrate that both algorithms
correctly identify the -by- topology and are near-optimal, but REA is
more efficient in practice
Understanding Internet topology: principles, models, and validation
Building on a recent effort that combines a first-principles approach to modeling router-level connectivity with a more pragmatic use of statistics and graph theory, we show in this paper that for the Internet, an improved understanding of its physical infrastructure is possible by viewing the physical connectivity as an annotated graph that delivers raw connectivity and bandwidth to the upper layers in the TCP/IP protocol stack, subject to practical constraints (e.g., router technology) and economic considerations (e.g., link costs). More importantly, by relying on data from Abilene, a Tier-1 ISP, and the Rocketfuel project, we provide empirical evidence in support of the proposed approach and its consistency with networking reality. To illustrate its utility, we: 1) show that our approach provides insight into the origin of high variability in measured or inferred router-level maps; 2) demonstrate that it easily accommodates the incorporation of additional objectives of network design (e.g., robustness to router failure); and 3) discuss how it complements ongoing community efforts to reverse-engineer the Internet
Data based identification and prediction of nonlinear and complex dynamical systems
We thank Dr. R. Yang (formerly at ASU), Dr. R.-Q. Su (formerly at ASU), and Mr. Zhesi Shen for their contributions to a number of original papers on which this Review is partly based. This work was supported by ARO under Grant No. W911NF-14-1-0504. W.-X. Wang was also supported by NSFC under Grants No. 61573064 and No. 61074116, as well as by the Fundamental Research Funds for the Central Universities, Beijing Nova Programme.Peer reviewedPostprin
- …