Evaluating Overfit and Underfit in Models of Network Community Structure
A common data mining task on networks is community detection, which seeks an
unsupervised decomposition of a network into structural groups based on
statistical regularities in the network's connectivity. Although many methods
exist, the No Free Lunch theorem for community detection implies that each
makes some kind of tradeoff, and no algorithm can be optimal on all inputs.
Thus, different algorithms will over- or underfit on different inputs, finding
more, fewer, or just different communities than is optimal, and evaluation
methods that use a metadata partition as a ground truth will produce misleading
conclusions about general accuracy. Here, we present a broad evaluation of over-
and underfitting in community detection, comparing the behavior of 16
state-of-the-art community detection algorithms on a novel and structurally
diverse corpus of 406 real-world networks. We find that (i) algorithms vary
widely both in the number of communities they find and in their corresponding
composition, given the same input, (ii) algorithms can be clustered into
distinct high-level groups based on similarities of their outputs on real-world
networks, and (iii) these differences induce wide variation in accuracy on link
prediction and link description tasks. We introduce a new diagnostic for
evaluating overfitting and underfitting in practice, and use it to roughly
divide community detection methods into general and specialized learning
algorithms. Across methods and inputs, Bayesian techniques based on the
stochastic block model and a minimum description length approach to
regularization represent the best general learning approach, but can be
outperformed under specific circumstances. These results introduce both a
theoretically principled approach to evaluating over- and underfitting in models
of network community structure and a realistic benchmark by which new methods
may be evaluated and compared.
Comment: 22 pages, 13 figures, 3 tables
BeSpaceD: Towards a Tool Framework and Methodology for the Specification and Verification of Spatial Behavior of Distributed Software Component Systems
In this report, we present work towards a framework for modeling and checking
the behavior of spatially distributed component systems. The design goals of our
framework are the ability to model spatial behavior in a component-oriented,
simple, and intuitive way; the ability to analyse and verify systems
automatically; and integration with other modeling and verification
tools. We present examples, the verification steps necessary to prove
properties such as range coverage or the absence of collisions between
components, and technical details.
Scenic: A Language for Scenario Specification and Scene Generation
We propose a new probabilistic programming language for the design and
analysis of perception systems, especially those based on machine learning.
Specifically, we consider the problems of training a perception system to
handle rare events, testing its performance under different conditions, and
debugging failures. We show how a probabilistic programming language can help
address these problems by specifying distributions encoding interesting types
of inputs and sampling these to generate specialized training and test sets.
More generally, such languages can be used for cyber-physical systems and
robotics to write environment models, an essential prerequisite to any formal
analysis. In this paper, we focus on systems like autonomous cars and robots,
whose environment is a "scene", a configuration of physical objects and agents.
We design a domain-specific language, Scenic, for describing "scenarios" that
are distributions over scenes. As a probabilistic programming language, Scenic
allows assigning distributions to features of the scene, as well as
declaratively imposing hard and soft constraints over the scene. We develop
specialized techniques for sampling from the resulting distribution, taking
advantage of the structure provided by Scenic's domain-specific syntax.
Finally, we apply Scenic in a case study on a convolutional neural network
designed to detect cars in road images, improving its performance beyond that
achieved by state-of-the-art synthetic data generation methods.
Comment: 41 pages, 36 figures. Full version of a PLDI 2019 paper (extending UC
Berkeley EECS Department Tech Report No. UCB/EECS-2018-8
Scalable Mining of Common Routes in Mobile Communication Network Traffic Data
A probabilistic method for inferring common routes from mobile communication network traffic data is presented. Besides providing mobility information, which is valuable in many application areas, the method has the dual purpose of enabling efficient coarse-graining as well as anonymisation by mapping individual sequences onto common routes. The approach is to represent spatial trajectories by Cell ID sequences that are grouped into routes using locality-sensitive hashing and graph clustering. Using an evaluation set of GPS-tagged data, the method is demonstrated to be scalable and to group sequences accurately.
Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age
Simultaneous Localization and Mapping (SLAM) consists in the concurrent
construction of a model of the environment (the map), and the estimation of the
state of the robot moving within it. The SLAM community has made astonishing
progress over the last 30 years, enabling large-scale real-world applications,
and witnessing a steady transition of this technology to industry. We survey
the current state of SLAM. We start by presenting what is now the de-facto
standard formulation for SLAM. We then review related work, covering a broad
set of topics including robustness and scalability in long-term mapping, metric
and semantic representations for mapping, theoretical performance guarantees,
active SLAM and exploration, and other new frontiers. This paper simultaneously
serves as a position paper and tutorial to those who are users of SLAM. By
looking at the published research with a critical eye, we delineate open
challenges and new research issues that still deserve careful scientific
investigation. The paper also contains the authors' take on two questions that
often animate discussions during robotics conferences: Do robots need SLAM? and
Is SLAM solved?
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
In recent years, deep learning (DL), a re-branding of neural networks (NNs),
has risen to the top in numerous areas, namely computer vision (CV), speech
recognition, natural language processing, etc. Whereas remote sensing (RS)
possesses a number of unique challenges, primarily related to sensors and
applications, inevitably RS draws from many of the same theories as CV; e.g.,
statistics, fusion, and machine learning, to name a few. This means that the RS
community should be aware of, if not at the leading edge of, advancements
like DL. Herein, we provide the most comprehensive survey of state-of-the-art
RS DL research. We also review recent new developments in the DL field that can
be used in DL for RS. Namely, we focus on theories, tools and challenges for
the RS community. Specifically, we focus on unsolved challenges and
opportunities as they relate to (i) inadequate data sets, (ii)
human-understandable solutions for modelling physical phenomena, (iii) Big
Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and
learning algorithms for spectral, spatial and temporal data, (vi) transfer
learning, (vii) an improved theoretical understanding of DL systems, (viii)
high barriers to entry, and (ix) training and optimizing the DL.
Comment: 64 pages, 411 references. To appear in Journal of Applied Remote
Sensing
Practical applications of probabilistic model checking to communication protocols
Probabilistic model checking is a formal verification technique for the analysis of systems that exhibit stochastic behaviour. It has been successfully employed in an extremely wide array of application domains including, for example, communication and multimedia protocols, security and power management. In this chapter we focus on the applicability of these techniques to the analysis of communication protocols. An analysis of the performance of such systems must successfully incorporate several crucial aspects, including concurrency between multiple components, real-time constraints and randomisation. Probabilistic model checking, in particular using probabilistic timed automata, is well suited to such an analysis. We provide an overview of this area, with emphasis on an industrially relevant case study: the IEEE 802.3 (CSMA/CD) protocol. We also discuss two contrasting approaches to the implementation of probabilistic model checking, namely those based on numerical computation and those based on discrete-event simulation. Using results from the two tools PRISM and APMC, we summarise the advantages, disadvantages and trade-offs associated with these techniques.