Scalable Approach to Uncertainty Quantification and Robust Design of Interconnected Dynamical Systems
Development of robust dynamical systems and networks, such as autonomous
aircraft systems capable of accomplishing complex missions, faces challenges
due to dynamically evolving uncertainties: model uncertainty, the necessity to
operate in hostile, cluttered urban environments, and the distributed, dynamic
nature of communication and computation resources. Model-based robust design is
difficult because of the complexity of the hybrid dynamic models, which couple
continuous vehicle dynamics with discrete models of computation and
communication, and because of the sheer size of the problem. We overview recent
advances in methodology and tools to model, analyze, and design robust
autonomous aerospace systems operating in uncertain environments, with stress
on efficient uncertainty quantification and robust design, using case studies
of missions that include model-based target tracking and search, and trajectory
planning in uncertain urban environments. To show that the methodology is
generally applicable to uncertain dynamical systems, we also show examples of
applying the new methods to efficient uncertainty quantification of energy
usage in buildings and to stability assessment of interconnected power
networks.
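As a minimal, self-contained illustration of sampling-based uncertainty quantification (a toy Monte Carlo sketch, not the scalable methods surveyed above; the scalar decay-rate model and all names here are our own assumptions):

```python
import math
import random

def simulate(decay, x0=1.0, t=1.0):
    """Closed-form solution of the scalar linear system x' = -decay * x."""
    return x0 * math.exp(-decay * t)

def monte_carlo_uq(n_samples=10_000, seed=0):
    """Propagate a uniform uncertainty in the decay rate to the state at t=1
    by plain Monte Carlo sampling."""
    rng = random.Random(seed)
    outputs = [simulate(rng.uniform(0.5, 1.5)) for _ in range(n_samples)]
    mean = sum(outputs) / n_samples
    var = sum((y - mean) ** 2 for y in outputs) / (n_samples - 1)
    return mean, math.sqrt(var)

mean, std = monte_carlo_uq()
print(f"E[x(1)] ~ {mean:.3f}, std ~ {std:.3f}")
```

The point of the more efficient methods referenced in the abstract is precisely to avoid the slow convergence of this brute-force estimator on large interconnected systems.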
Entropy and Graph Based Modelling of Document Coherence using Discourse Entities: An Application
We present two novel models of document coherence and their application to
information retrieval (IR). Both models approximate document coherence using
discourse entities, e.g. the subject or object of a sentence. Our first model
views text as a Markov process generating sequences of discourse entities
(entity n-grams); we use the entropy of these entity n-grams to approximate the
rate at which new information appears in text, reasoning that as more new words
appear, the topic increasingly drifts and text coherence decreases. Our second
model extends the work of Guinaudeau & Strube [28] that represents text as a
graph of discourse entities, linked by different relations, such as their
distance or adjacency in text. We use several graph topology metrics to
approximate different aspects of the discourse flow that can indicate
coherence, such as the average clustering or betweenness of discourse entities
in text. Experiments with several instantiations of these models show that: (i)
our models perform on a par with two other well-known models of text coherence
even without any parameter tuning, and (ii) reranking retrieval results
according to their coherence scores gives notable performance gains, confirming
a relation between document coherence and relevance. This work contributes two
novel models of document coherence whose application to IR complements recent
work on integrating document cohesiveness or comprehensibility into
ranking [5, 56].
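The first model's entropy computation can be illustrated with a toy sketch (the entity sequences below are invented for illustration, not the paper's data): a coherent text reuses discourse entities, so its entity n-grams have lower entropy than those of a topically drifting text:

```python
import math
from collections import Counter

def ngram_entropy(entities, n=2):
    """Shannon entropy (in bits) of the empirical n-gram distribution
    over a sequence of discourse entities."""
    grams = [tuple(entities[i:i + n]) for i in range(len(entities) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A repetitive (coherent) entity sequence vs. one where new entities keep
# appearing (topic drift, per the model's reasoning).
coherent = ["cat", "cat", "mat", "cat", "cat", "mat", "cat", "cat"]
drifting = ["cat", "dog", "car", "sun", "map", "pen", "cup", "ink"]
print(ngram_entropy(coherent) < ngram_entropy(drifting))  # True
```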
Learning Large-Scale Bayesian Networks with the sparsebn Package
Learning graphical models from data is an important problem with wide
applications, ranging from genomics to the social sciences. Nowadays datasets
often have upwards of thousands---sometimes tens or hundreds of thousands---of
variables and far fewer samples. To meet this challenge, we have developed a
new R package called sparsebn for learning the structure of large, sparse
graphical models with a focus on Bayesian networks. While there are many
existing software packages for this task, this package focuses on the unique
setting of learning large networks from high-dimensional data, possibly with
interventions. As such, the methods provided place a premium on scalability and
consistency in a high-dimensional setting. Furthermore, in the presence of
interventions, the methods implemented here achieve the goal of learning a
causal network from data. Additionally, the sparsebn package is fully
compatible with existing software packages for network analysis.
Comment: To appear in the Journal of Statistical Software; 39 pages, 7 figures
Manitest: Are classifiers really invariant?
Invariance to geometric transformations is a highly desirable property of
automatic classifiers in many image recognition tasks. Nevertheless, it is
unclear to what extent state-of-the-art classifiers are invariant to basic
transformations such as rotations and translations. This is mainly due to the
lack of general methods that properly measure such an invariance. In this
paper, we propose a rigorous and systematic approach for quantifying the
invariance to geometric transformations of any classifier. Our key idea is to
cast the problem of assessing a classifier's invariance as the computation of
geodesics along the manifold of transformed images. We propose the Manitest
method, built on the efficient Fast Marching algorithm to compute the
invariance of classifiers. Our new method quantifies in particular the
importance of data augmentation for learning invariance from data, and the
increased invariance of convolutional neural networks with depth. We foresee
that the proposed generic tool for measuring invariance to a large class of
geometric transformations and arbitrary classifiers will have many applications
for evaluating and comparing classifiers based on their invariance, and will
help improve the invariance of existing classifiers.
Comment: BMVC 201
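As a much-simplified illustration of the underlying idea (a one-parameter transformation group and a toy planar classifier, not Manitest's Fast Marching computation on the manifold of transformed images), one can measure the smallest rotation that changes a decision:

```python
import math

def classify(point):
    """Toy linear classifier on the plane (a stand-in for an image classifier)."""
    x, y = point
    return 1 if x + 0.5 * y > 0 else 0

def rotate(point, theta):
    """Rotate a 2-D point by angle theta (radians) about the origin."""
    x, y = point
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

def invariance_radius(point, max_angle=math.pi, step=0.01):
    """Smallest rotation angle (in either direction) that flips the
    classifier's decision; larger values mean more invariance."""
    base = classify(point)
    theta = step
    while theta <= max_angle:
        if classify(rotate(point, theta)) != base or \
           classify(rotate(point, -theta)) != base:
            return theta
        theta += step
    return max_angle  # decision never flips within the search range

print(round(invariance_radius((1.0, 0.0)), 2))  # smallest flipping angle, ~atan(2)
```

Manitest generalizes this one-dimensional search to multi-parameter transformation groups by computing geodesic distances with Fast Marching.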
Towards Structural Classification of Proteins based on Contact Map Overlap
A multitude of measures have been proposed to quantify the similarity between
protein 3-D structures. Among these measures, contact map overlap (CMO)
maximization has received sustained attention over the past decade because it
offers a fine estimation of the natural homology relation between proteins.
Despite this large involvement of the bioinformatics and computer science
communities, the performance of known algorithms remains modest: due to the
complexity of the problem, they stall on relatively small instances and are
not applicable to large-scale comparison. This paper offers a clear improvement
over past methods in this respect. We present a new integer programming model
for CMO and propose an exact branch-and-bound (B&B) algorithm whose bounds are
computed by solving a Lagrangian relaxation. The efficiency of the approach is
demonstrated on a
popular small benchmark (Skolnick set, 40 domains). On this set our algorithm
significantly outperforms the best existing exact algorithms while also
providing lower and upper bounds of better quality. Some hard CMO instances have been
solved for the first time and within reasonable time limits. From the values of
the running time and the relative gap (relative difference between upper and
lower bounds), we obtained the correct classification for this test set. These
encouraging results led us to design a harder benchmark to better assess the
classification capability of our approach. We constructed a large-scale set of
300 protein domains (a subset of the ASTRAL database) that we call Proteus
300. Using the relative gap of each of the 44,850 pairs as a similarity
measure, we obtained a classification in very good agreement with SCOP. Our
algorithm thus provides a powerful classification tool for large structure
databases.
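To make the objects concrete, here is a toy sketch (our illustration, not the authors' integer programming model or B&B algorithm): building a contact map from C-alpha coordinates and computing CMO by brute force over order-preserving alignments, which is feasible only for tiny instances:

```python
from itertools import combinations

def contact_map(coords, threshold=7.5):
    """Set of non-adjacent residue pairs (i, j) with i + 2 <= j whose
    C-alpha atoms lie within `threshold` of each other."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    n = len(coords)
    return {(i, j) for i in range(n) for j in range(i + 2, n)
            if dist(coords[i], coords[j]) <= threshold}

def cmo_brute_force(cm_a, n_a, cm_b, n_b):
    """Maximum number of shared contacts over all order-preserving
    residue alignments -- exponential, tiny instances only."""
    best = 0
    for k in range(1, min(n_a, n_b) + 1):
        for rows in combinations(range(n_a), k):
            for cols in combinations(range(n_b), k):
                m = dict(zip(rows, cols))  # increasing, hence non-crossing
                shared = sum(1 for (i, j) in cm_a
                             if i in m and j in m and (m[i], m[j]) in cm_b)
                best = max(best, shared)
    return best

# Two tiny, identical contact maps share both contacts under the identity alignment.
print(cmo_brute_force({(0, 2), (1, 3)}, 4, {(0, 2), (1, 3)}, 4))  # 2
```

The combinatorial explosion visible here is exactly why the exact approach in the paper relies on branch-and-bound with Lagrangian-relaxation bounds rather than enumeration.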
Maiter: An Asynchronous Graph Processing Framework for Delta-based Accumulative Iterative Computation
A myriad of graph-based algorithms in machine learning and data mining require
processing relational data iteratively. These algorithms are implemented in a
large-scale distributed environment in order to scale to massive data sets. To
accelerate these large-scale graph-based iterative computations, we propose
delta-based accumulative iterative computation (DAIC). Different from
traditional iterative computations, which iteratively update the result based
on the result from the previous iteration, DAIC updates the result by
accumulating the "changes" between iterations. With DAIC, we can process only
the "changes" and skip negligible updates. Furthermore, we can perform DAIC
asynchronously to bypass the high-cost synchronous barriers in heterogeneous
distributed environments. Based on the DAIC model, we design and implement an
asynchronous graph processing framework, Maiter. We evaluate Maiter on a local
cluster as well as on the Amazon EC2 cloud. The results show that Maiter achieves
as much as 60x speedup over Hadoop and outperforms other state-of-the-art
frameworks.
Comment: ScienceCloud 2012, TKDE 201
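The delta-based update can be illustrated on PageRank. The sketch below is our own simplified, synchronous rendering of the DAIC idea (Maiter additionally runs it asynchronously across a distributed cluster): each node accumulates incoming "changes" into its result and propagates only its own delta, rather than recomputing the full rank vector each iteration:

```python
def delta_pagerank(graph, damping=0.85, tol=1e-10):
    """PageRank via delta-based accumulative iteration: the result is the
    running sum of accumulated deltas, and only deltas flow between nodes."""
    rank = {v: 0.0 for v in graph}
    delta = {v: 1.0 - damping for v in graph}
    while max(delta.values()) > tol:
        new_delta = {v: 0.0 for v in graph}
        for v, d in delta.items():
            rank[v] += d          # accumulate the change into the result
            out = graph[v]
            if out:
                share = damping * d / len(out)
                for w in out:     # propagate only the delta downstream
                    new_delta[w] += share
        delta = new_delta
    return rank

# Tiny directed graph; assumes every node has at least one out-link.
g = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
ranks = delta_pagerank(g)
print(ranks)
```

Because deltas shrink geometrically, nodes whose pending change falls below the tolerance simply stop generating work, which is the "skip negligible updates" property described above.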
Effective and Efficient Similarity Index for Link Prediction of Complex Networks
Predictions of missing links in incomplete networks, such as protein-protein
interaction networks, or of likely but not yet existent links in evolving
networks, such as friendship networks in web society, can serve as a guideline
for further experiments or as valuable information for web users. In
this paper, we introduce a local path index to estimate the likelihood of the
existence of a link between two nodes. We propose a network model with
controllable density and noise strength in link generation, and we collect
data from six real networks. Extensive numerical simulations on both modeled
networks and real networks demonstrated the high effectiveness and efficiency
of the local path index compared with two well-known and widely used indices,
the common neighbors and the Katz index. Indeed, the local path index provides
predictions competitively accurate with the Katz index while requiring much
less CPU time and memory space, and is therefore a strong candidate for
practical applications in data mining of huge networks.
Comment: 8 pages, 5 figures, 3 tables
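The abstract does not spell the index out; in the follow-up literature the local path index is usually defined as S = A^2 + eps * A^3, i.e. common-neighbour counts plus a damped count of length-3 paths. A sketch under that assumption (the toy graph is ours):

```python
import numpy as np

def local_path_index(adj, eps=0.01):
    """Local path similarity S = A^2 + eps * A^3: common neighbours plus a
    damped contribution from length-3 paths between each node pair."""
    a2 = adj @ adj
    return a2 + eps * (a2 @ adj)

# Toy undirected graph with edges 0-1, 1-2, 0-2, 2-3.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
scores = local_path_index(adj)
# The (currently absent) link 0-3 scores one common neighbour (node 2)
# plus the damped length-3 path 0-1-2-3.
print(scores[0, 3])
```

Truncating at A^3 is what keeps the index cheap relative to the Katz index, which sums damped walk counts of all lengths.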