10,693 research outputs found
Interacting with Large Distributed Datasets using Sketch
We present Sketch, a distributed software infrastructure for building interactive tools for exploring large datasets, distributed across multiple machines. We have built three sophisticated applications using this framework: a billion-row spreadsheet, a distributed log browser, and a distributed- systems performance debugging tool. Sketch applications allow interactive and responsive exploration of complex distributed datasets, scaling gracefully to large system sizes. The conflicting constraints of large-scale data and small timescales required by human interaction are difficult to satisfy simultaneously. Sketch exploits a sweet spot in this trade-off by exploiting the observation that the precision of a data view is limited by the resolution of the user?s screen. The system pushes data reduction operations to the data sources. The core Sketch abstraction provides a narrow programming interface; Sketch clients construct a distributed application by stacking modular components with identical interfaces, each providing a useful feature: network transparency, concurrency, fault-tolerance, straggler avoidance, round-trip reduction, distributed aggregation
Finding Mutated Subnetworks Associated with Survival in Cancer
Next-generation sequencing technologies allow the measurement of somatic
mutations in a large number of patients from the same cancer type. One of the
main goals in analyzing these mutations is the identification of mutations
associated with clinical parameters, such as survival time. This goal is
hindered by the genetic heterogeneity of mutations in cancer, due to the fact
that genes and mutations act in the context of pathways. To identify mutations
associated with survival time it is therefore crucial to study mutations in the
context of interaction networks.
In this work we study the problem of identifying subnetworks of a large
gene-gene interaction network that have mutations associated with survival. We
formally define the associated computational problem by using a score for
subnetworks based on the test statistic of the log-rank test, a widely used
statistical test for comparing the survival of two populations. We show that
the computational problem is NP-hard and we propose a novel algorithm, called
Network of Mutations Associated with Survival (NoMAS), to solve it. NoMAS is
based on the color-coding technique, that has been previously used in other
applications to find the highest scoring subnetwork with high probability when
the subnetwork score is additive. In our case the score is not additive;
nonetheless, we prove that under a reasonable model for mutations in cancer
NoMAS does identify the optimal solution with high probability. We test NoMAS
on simulated and cancer data, comparing it to approaches based on single gene
tests and to various greedy approaches. We show that our method does indeed
find the optimal solution and performs better than the other approaches.
Moreover, on two cancer datasets our method identifies subnetworks with
significant association to survival when none of the genes has significant
association with survival when considered in isolation.Comment: This paper was selected for oral presentation at RECOMB 2016 and an
abstract is published in the conference proceeding
Why (and How) Networks Should Run Themselves
The proliferation of networked devices, systems, and applications that we
depend on every day makes managing networks more important than ever. The
increasing security, availability, and performance demands of these
applications suggest that these increasingly difficult network management
problems be solved in real time, across a complex web of interacting protocols
and systems. Alas, just as the importance of network management has increased,
the network has grown so complex that it is seemingly unmanageable. In this new
era, network management requires a fundamentally new approach. Instead of
optimizations based on closed-form analysis of individual protocols, network
operators need data-driven, machine-learning-based models of end-to-end and
application performance based on high-level policy goals and a holistic view of
the underlying components. Instead of anomaly detection algorithms that operate
on offline analysis of network traces, operators need classification and
detection algorithms that can make real-time, closed-loop decisions. Networks
should learn to drive themselves. This paper explores this concept, discussing
how we might attain this ambitious goal by more closely coupling measurement
with real-time control and by relying on learning for inference and prediction
about a networked application or system, as opposed to closed-form analysis of
individual protocols
Interactive Search and Exploration in Online Discussion Forums Using Multimodal Embeddings
In this paper we present a novel interactive multimodal learning system,
which facilitates search and exploration in large networks of social multimedia
users. It allows the analyst to identify and select users of interest, and to
find similar users in an interactive learning setting. Our approach is based on
novel multimodal representations of users, words and concepts, which we
simultaneously learn by deploying a general-purpose neural embedding model. We
show these representations to be useful not only for categorizing users, but
also for automatically generating user and community profiles. Inspired by
traditional summarization approaches, we create the profiles by selecting
diverse and representative content from all available modalities, i.e. the
text, image and user modality. The usefulness of the approach is evaluated
using artificial actors, which simulate user behavior in a relevance feedback
scenario. Multiple experiments were conducted in order to evaluate the quality
of our multimodal representations, to compare different embedding strategies,
and to determine the importance of different modalities. We demonstrate the
capabilities of the proposed approach on two different multimedia collections
originating from the violent online extremism forum Stormfront and the
microblogging platform Twitter, which are particularly interesting due to the
high semantic level of the discussions they feature
A survey of comics research in computer science
Graphical novels such as comics and mangas are well known all over the world.
The digital transition started to change the way people are reading comics,
more and more on smartphones and tablets and less and less on paper. In the
recent years, a wide variety of research about comics has been proposed and
might change the way comics are created, distributed and read in future years.
Early work focuses on low level document image analysis: indeed comic books are
complex, they contains text, drawings, balloon, panels, onomatopoeia, etc.
Different fields of computer science covered research about user interaction
and content generation such as multimedia, artificial intelligence,
human-computer interaction, etc. with different sets of values. We propose in
this paper to review the previous research about comics in computer science, to
state what have been done and to give some insights about the main outlooks
Expanded Parts Model for Semantic Description of Humans in Still Images
We introduce an Expanded Parts Model (EPM) for recognizing human attributes
(e.g. young, short hair, wearing suit) and actions (e.g. running, jumping) in
still images. An EPM is a collection of part templates which are learnt
discriminatively to explain specific scale-space regions in the images (in
human centric coordinates). This is in contrast to current models which consist
of a relatively few (i.e. a mixture of) 'average' templates. EPM uses only a
subset of the parts to score an image and scores the image sparsely in space,
i.e. it ignores redundant and random background in an image. To learn our
model, we propose an algorithm which automatically mines parts and learns
corresponding discriminative templates together with their respective locations
from a large number of candidate parts. We validate our method on three recent
challenging datasets of human attributes and actions. We obtain convincing
qualitative and state-of-the-art quantitative results on the three datasets.Comment: Accepted for publication in IEEE Transactions on Pattern Analysis and
Machine Intelligence (TPAMI
- …