JigsawNet: Shredded Image Reassembly using Convolutional Neural Network and Loop-based Composition
This paper proposes a novel algorithm to reassemble an arbitrarily shredded
image to its original state. Existing reassembly pipelines commonly consist of
a local matching stage and a global composition stage. In the local stage, a
key challenge in fragment reassembly is to reliably compute and identify
correct pairwise matching, for which most existing algorithms use handcrafted
features, and hence, cannot reliably handle complicated puzzles. We build a
deep convolutional neural network to detect the compatibility of a pairwise
stitching, and use it to prune the computed pairwise matches. To improve the
network's efficiency and accuracy, we restrict the CNN's computation to the
stitching region and apply a boosted training strategy. In the global composition
stage, we replace the commonly adopted greedy edge-selection strategies with two
new loop-closure-based search algorithms. Extensive experiments show that
our algorithm significantly outperforms existing methods on solving various
puzzles, especially challenging ones with many fragment pieces.
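The loop-closure idea can be illustrated with a minimal sketch: if the pairwise alignments around a cycle of fragments are all correct, composing their rigid transforms must come back to (approximately) the identity, so inconsistent loops expose wrong matches. The transforms and tolerance below are our own toy values, not JigsawNet's actual pipeline.

```python
import math

def rigid(theta, tx, ty):
    """3x3 homogeneous 2D rigid transform: rotation by theta, then translation."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def invert(m):
    """Inverse of a rigid transform: transpose the rotation, negate the rotated translation."""
    (r00, r01, tx), (r10, r11, ty), _ = m
    return [[r00, r10, -(r00 * tx + r10 * ty)],
            [r01, r11, -(r01 * tx + r11 * ty)],
            [0.0, 0.0, 1.0]]

def loop_consistent(transforms, tol=1e-6):
    """A cycle of pairwise alignments is plausible only if composing its
    transforms yields (approximately) the identity."""
    acc = rigid(0.0, 0.0, 0.0)          # identity
    for t in transforms:
        acc = matmul(acc, t)
    ident = rigid(0.0, 0.0, 0.0)
    return all(abs(acc[i][j] - ident[i][j]) < tol
               for i in range(3) for j in range(3))

t1 = rigid(math.pi / 2, 1.0, 0.0)      # fragment A -> B alignment
t2 = rigid(math.pi / 2, 0.0, 2.0)      # fragment B -> C alignment
closing = invert(matmul(t1, t2))       # C -> A edge that closes the loop exactly
```

A loop containing a wrong pairwise match (e.g. ending with `invert(t1)` instead of `closing`) fails the check, which is the signal the global stage can use to discard it.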
SwiftCloud: Fault-Tolerant Geo-Replication Integrated all the Way to the Client Machine
Client-side logic and storage are increasingly used in web and mobile
applications to improve response time and availability. Current approaches tend
to be ad-hoc and poorly integrated with the server-side logic. We present a
principled approach to integrate client- and server-side storage. We support
mergeable and strongly consistent transactions that target either client or
server replicas and provide access to causally-consistent snapshots
efficiently. In the presence of infrastructure faults, a client-assisted
failover solution allows client execution to resume immediately and seamlessly
access consistent snapshots without waiting. We implement this approach in
SwiftCloud, the first transactional system to bring geo-replication all the way
to the client machine. Example applications show that our programming model is
useful across a range of application areas. Our experimental evaluation shows
that SwiftCloud provides better fault tolerance and at the same time can
improve both latency and throughput by up to an order of magnitude, compared to
classical geo-replication techniques.
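The mergeable objects that such a client-side replication scheme relies on can be sketched with a grow-only counter CRDT: each replica updates locally without coordination, and any two replicas converge by a commutative merge. The replica ids and the API below are illustrative, not SwiftCloud's actual interface.

```python
class GCounter:
    """Grow-only counter CRDT: one entry per replica; merge is pointwise max,
    so merges commute and replicas converge regardless of order."""
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

client = GCounter("client-1")
server = GCounter("dc-eu")
client.increment(3)   # applied locally, no round trip to the data centre
server.increment(2)   # concurrent server-side update
client.merge(server)  # exchanging state in either direction converges
server.merge(client)
```

Because merge is idempotent and commutative, a client that fails over to a different data centre can re-merge its local state without losing or duplicating updates.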
Spectral Methods for Learning Multivariate Latent Tree Structure
This work considers the problem of learning the structure of multivariate
linear tree models, which include a variety of directed tree graphical models
with continuous, discrete, and mixed latent variables such as linear-Gaussian
models, hidden Markov models, Gaussian mixture models, and Markov evolutionary
trees. The setting is one where we only have samples from certain observed
variables in the tree, and our goal is to estimate the tree structure (i.e.,
the graph of how the underlying hidden variables are connected to each other
and to the observed variables). We propose the Spectral Recursive Grouping
algorithm, an efficient and simple bottom-up procedure for recovering the tree
structure from independent samples of the observed variables. Our finite sample
size bounds for exact recovery of the tree structure reveal certain natural
dependencies on statistical and structural properties of the
underlying joint distribution. Furthermore, our sample complexity guarantees
have no explicit dependence on the dimensionality of the observed variables,
making the algorithm applicable to many high-dimensional settings. At the heart
of our algorithm is a spectral quartet test for determining the relative
topology of a quartet of variables from second-order statistics.
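The flavour of the quartet test can be shown on a toy scalar model. For four observed variables there are three possible pairings; in a linear tree, the pairing that matches the true topology maximizes the product of the cross-covariances of its two pairs (with vector-valued variables this becomes a product of singular values of cross-covariance matrices, and the actual test adds a reliability threshold). The synthetic model below is our own illustration, not one of the paper's benchmarks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-Gaussian tree: hidden h1 -- h2, observed children
# a, b under h1 and c, d under h2.
n, rho = 50_000, 0.5
h1 = rng.normal(size=n)
h2 = rho * h1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
a = h1 + 0.1 * rng.normal(size=n)
b = h1 + 0.1 * rng.normal(size=n)
c = h2 + 0.1 * rng.normal(size=n)
d = h2 + 0.1 * rng.normal(size=n)

def score(x, y, u, v):
    """Score of the pairing {x,y} | {u,v}: product of the two cross-covariances."""
    return abs(np.cov(x, y)[0, 1] * np.cov(u, v)[0, 1])

pairings = {"ab|cd": score(a, b, c, d),
            "ac|bd": score(a, c, b, d),
            "ad|bc": score(a, d, b, c)}
best = max(pairings, key=pairings.get)
```

Here the correct pairing scores roughly 1 while the wrong ones score roughly rho² ≈ 0.25, so the test recovers the quartet topology from second-order statistics alone.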
Convergent types for shared memory
Master's dissertation in Computer Science.
It is well-known that consistency in shared-memory concurrent programming comes at
the price of degraded performance and scalability. Some existing solutions to this
problem come with high complexity and are not programmer friendly.
We present a simple and well-defined approach to obtain relevant results for shared-memory
environments by relaxing synchronization. For that, we look into Mergeable
Data Types (MDTs), data structures analogous to Conflict-Free Replicated Data Types
(CRDTs) but designed to perform in shared memory.
CRDTs were the first formal approach to develop a solid theoretical study of eventual
consistency in distributed systems, addressing the trade-offs exposed by the CAP
theorem and providing high availability. With CRDTs, updates are unsynchronized, and
replicas eventually converge to a correct common state. However, CRDTs are not designed
to perform in shared memory: in large-scale distributed systems the merge cost is
negligible compared to network-mediated synchronization. We therefore migrated the
concept, building on the already existing Mergeable Data Types by formally defining a
programming model that we named Global-Local View. Furthermore, we created a portfolio of MDTs
and demonstrated that in appropriate scenarios we can benefit largely from the model.
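A Global-Local View counter can be sketched as follows: each thread updates a private local view with no synchronization and only periodically merges it into the shared global state. The class and method names are our own, not the dissertation's API.

```python
class MergeableCounter:
    """Sketch of a Global-Local View mergeable counter for shared memory."""
    def __init__(self):
        self.global_value = 0

    def local_view(self):
        return LocalView(self)

class LocalView:
    def __init__(self, shared):
        self.shared = shared
        self.delta = 0            # buffered, thread-private updates

    def increment(self, n=1):
        self.delta += n           # no lock taken here

    def value(self):
        # local read: global snapshot plus the thread's own pending delta
        return self.shared.global_value + self.delta

    def merge(self):
        # the only step that would need synchronization in a real version
        self.shared.global_value += self.delta
        self.delta = 0

shared = MergeableCounter()
v1, v2 = shared.local_view(), shared.local_view()
v1.increment(10)
v2.increment(5)
v1.merge()
v2.merge()
```

The pay-off mirrors the CRDT setting: updates are cheap and contention-free, at the price of local reads observing a slightly stale global value until the next merge.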
A graph-based cell tracking algorithm with few manually tunable parameters and automated segmentation error correction
Automatic cell segmentation and tracking makes it possible to gain quantitative insights into the processes driving cell migration. To investigate new data with minimal manual effort, cell tracking algorithms should be easy to apply and should reduce manual curation time by automatically correcting segmentation errors. Current cell tracking algorithms, however, are either easy to apply to new data sets but lack automatic segmentation error correction, or have a vast set of parameters that needs either manual tuning or annotated data for parameter tuning. In this work, we propose a tracking algorithm with only a few manually tunable parameters and automatic segmentation error correction. Moreover, no training data is needed. We compare the performance of our approach to three well-performing tracking algorithms from the Cell Tracking Challenge on data sets with simulated, degraded segmentation, including false negatives and over- and under-segmentation errors. Our tracking algorithm can correct false negatives and over- and under-segmentation errors, as well as a mixture of the aforementioned segmentation errors. On data sets with under-segmentation errors or a mixture of segmentation errors our approach performs best. Moreover, without requiring additional manual tuning, our approach ranks several times in the top 3 of the 6th edition of the Cell Tracking Challenge.
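The false-negative correction idea can be illustrated with a toy linker: allow a track to bridge a one-frame gap so a missed detection does not split it in two. This greedy one-dimensional sketch, with made-up distance thresholds, is only a stand-in for the paper's graph formulation.

```python
def link_tracks(frames, max_dist=5.0):
    """Greedy nearest-neighbour linking of 1-D detections across frames,
    allowing a one-frame gap so a missed detection (false negative) does
    not split a track into two."""
    tracks = []  # each track: {"positions": [...], "last_frame": int}
    for t, detections in enumerate(frames):
        unmatched = list(detections)
        for track in tracks:
            gap = t - track["last_frame"]
            if gap > 2 or not unmatched:     # bridge at most one missed frame
                continue
            pos = track["positions"][-1]
            best = min(unmatched, key=lambda d: abs(d - pos))
            if abs(best - pos) <= max_dist * gap:
                track["positions"].append(best)
                track["last_frame"] = t
                unmatched.remove(best)
        for d in unmatched:                  # leftover detections start new tracks
            tracks.append({"positions": [d], "last_frame": t})
    return tracks

# Frame 2 misses the detection of the slow cell; gap closing keeps one track.
frames = [[0.0, 10.0], [1.0, 10.5], [10.8], [3.0, 11.0]]
tracks = link_tracks(frames)
```

Without the gap-closing branch the missed detection in frame 2 would break the first cell into two tracks; with it, both cells keep a single trajectory.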
Dynamic Package Interfaces - Extended Version
A hallmark of object-oriented programming is the ability to perform
computation through a set of interacting objects. A common manifestation of
this style is the notion of a package, which groups a set of commonly used
classes together. A challenge in using a package is to ensure that a client
follows the implicit protocol of the package when calling its methods.
Violations of the protocol can cause a runtime error or latent invariant
violations. These protocols can extend across different, potentially
unboundedly many, objects, and are specified informally in the documentation.
As a result, ensuring that a client does not violate the protocol is hard.
We introduce dynamic package interfaces (DPI), a formalism to explicitly
capture the protocol of a package. The DPI of a package is a finite set of
rules that together specify how any set of interacting objects of the package
can evolve through method calls and under what conditions an error can happen.
We have developed a dynamic tool that automatically computes an approximation
of the DPI of a package, given a set of abstraction predicates. A key property
of DPI is that the unbounded number of configurations of objects of a package
are summarized finitely in an abstract domain. This uses the observation that
many packages behave monotonically: the semantics of a method call over a
configuration does not essentially change if more objects are added to the
configuration. We have exploited monotonicity and have devised heuristics to
obtain succinct yet general DPIs. We have used our tool to compute DPIs for
several commonly used Java packages with complex protocols, such as JDBC,
HashSet, and ArrayList.
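The flavour of a DPI can be shown with a minimal runtime monitor: a finite set of rules maps (abstract state, method) pairs to a next state, and any call outside the rules is a protocol violation. The cursor-style protocol below is our own toy example, not a DPI computed by the authors' tool.

```python
class ProtocolMonitor:
    """Finite rule set over abstract states, in the spirit of a DPI: calls
    that have no rule for the current state are protocol violations."""
    RULES = {
        ("fresh",  "open"):  "open",
        ("open",   "next"):  "open",
        ("open",   "close"): "closed",
        ("closed", "close"): "closed",   # closing twice is tolerated
    }

    def __init__(self):
        self.state = "fresh"
        self.errors = []

    def call(self, method):
        nxt = self.RULES.get((self.state, method))
        if nxt is None:
            self.errors.append(f"protocol violation: {method}() in state {self.state}")
        else:
            self.state = nxt

m = ProtocolMonitor()
for op in ["open", "next", "close", "next"]:   # 'next' after 'close' is illegal
    m.call(op)
```

A real DPI generalizes this picture from one object to unboundedly many interacting objects, summarized finitely through abstraction predicates.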
Hillview: A trillion-cell spreadsheet for big data
Hillview is a distributed spreadsheet for browsing very large datasets that
cannot be handled by a single machine. As a spreadsheet, Hillview provides a
high degree of interactivity that permits data analysts to explore information
quickly along many dimensions while switching visualizations on a whim. To
provide the required responsiveness, Hillview introduces visualization
sketches, or vizketches, as a simple idea to produce compact data
visualizations. Vizketches combine algorithmic techniques for data
summarization with computer graphics principles for efficient rendering. While
simple, vizketches are effective at scaling the spreadsheet by parallelizing
computation, reducing communication, providing progressive visualizations, and
offering precise accuracy guarantees. Using Hillview running on eight servers,
we can navigate and visualize datasets of tens of billions of rows and
trillions of cells, well beyond the published capabilities of competing
systems.
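The core property that lets a vizketch parallelize is that per-partition summaries combine by a simple merge. A fixed-bucket histogram, sketched below with our own toy API (not Hillview's), has exactly this shape: each server summarizes its partition, and partial results add pointwise.

```python
def histogram_sketch(values, lo, hi, buckets):
    """Fixed-bucket histogram over one data partition; a toy vizketch.
    The bucket count is chosen for rendering (roughly one bucket per pixel)."""
    counts = [0] * buckets
    width = (hi - lo) / buckets
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), buckets - 1)] += 1
    return counts

def merge(a, b):
    """Sketches over disjoint partitions combine by pointwise addition,
    so the computation parallelizes across servers and can stream
    progressively as partitions arrive."""
    return [x + y for x, y in zip(a, b)]

part1 = histogram_sketch([1, 2, 2, 7], lo=0, hi=10, buckets=5)
part2 = histogram_sketch([3, 8, 9], lo=0, hi=10, buckets=5)
combined = merge(part1, part2)
```

Because the merged result equals the histogram of the full data set, partial merges can be rendered immediately and refined as more partitions report in, which is what makes progressive visualization cheap.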
Allocation of Excitation Signals for Generic Identifiability of Linear Dynamic Networks
A recent research direction in data-driven modeling is the identification of
dynamic networks, in which measured vertex signals are interconnected by
dynamic edges represented by causal linear transfer functions. The major
question addressed in this paper is where to allocate external excitation
signals such that a network model set becomes generically identifiable when
measuring all vertex signals. To tackle this synthesis problem, a novel graph
structure, referred to as a directed pseudotree, is introduced, and the
generic identifiability of a network model set can be characterized by a set of
disjoint directed pseudotrees that cover all the parameterized edges of an
extended graph, which includes the correlation structure of the
process noises. Thereby, an algorithmic procedure is devised, aiming to
decompose the extended graph into a minimal number of disjoint pseudotrees,
whose roots then provide the appropriate locations for excitation signals.
Furthermore, the proposed approach can be adapted using the notion of
anti-pseudotrees to solve a dual problem, that is, to select a minimal
number of measurement signals for generic identifiability of the overall
network, under the assumption that all the vertices are excited.
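The decomposition step can be caricatured with a greedy sketch: assign each directed edge to the first group in which its head vertex has no incoming edge yet, so every group is a disjoint union of rooted trees, and each group's roots are candidate excitation locations. This is our own simplification of the paper's procedure; real directed pseudotrees also admit a cycle, and the paper's algorithm aims at a provably small number of groups.

```python
def greedy_pseudotree_cover(edges):
    """Greedily assign each directed edge (u, v) to the first group in which
    v has no incoming edge yet, keeping in-degree at most one per group
    (a toy relaxation of the paper's pseudotree decomposition)."""
    groups = []  # each group: {"edges": [...], "covered_heads": set()}
    for u, v in edges:
        for g in groups:
            if v not in g["covered_heads"]:
                g["edges"].append((u, v))
                g["covered_heads"].add(v)
                break
        else:
            groups.append({"edges": [(u, v)], "covered_heads": {v}})
    return groups

def roots(group):
    """Vertices with no incoming edge inside the group: candidate
    locations for external excitation signals."""
    return {u for u, _ in group["edges"]} - group["covered_heads"]

# Toy network with edges 1->2, 1->3, 2->3, 2->4.
groups = greedy_pseudotree_cover([(1, 2), (1, 3), (2, 3), (2, 4)])
```

On this toy network the greedy pass produces two groups whose roots, vertices 1 and 2, are where external excitation would be placed.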