192 research outputs found
The number of clone orderings
AbstractThis article provides an exponential generating function and a recurrence relation for computing the number of topologically distinct clone orderings.Denote the number of maps for n clones by c(n) and define the exponential generating function C(x)=∑n=0∞c(n)xnn!.We show c(1) = 1, c(2) = 2, c(3) = 10, and, for n > 3, that c(n) = (4n−5)c(n−1)−(4n−7) c(n−2) + (n−2)c(n−3). We show c(n)∼ e3828n(4ne)n We also prove that C(x) = exp (1+ 2x−1−4x)4
Discovering Hierarchical Process Models: an Approach Based on Events Clustering
Process mining is a field of computer science that deals with discovery and
analysis of process models based on automatically generated event logs.
Currently, many companies use this technology for optimization and improving
their processes. However, a discovered process model may be too detailed,
sophisticated and difficult for experts to understand. In this paper, we
consider the problem of discovering a hierarchical business process model from
a low-level event log, i.e., the problem of automatic synthesis of more
readable and understandable process models based on information stored in event
logs of information systems.
Discovery of better structured and more readable process models is
intensively studied in the frame of process mining research from different
perspectives. In this paper, we present an algorithm for discovering
hierarchical process models represented as two-level workflow nets. The
algorithm is based on predefined event ilustering so that the cluster defines a
sub-process corresponding to a high-level transition at the top level of the
net. Unlike existing solutions, our algorithm does not impose restrictions on
the process control flow and allows for concurrency and iteration
Automatic layout and visualization of biclusters
BACKGROUND: Biclustering has emerged as a powerful algorithmic tool for analyzing measurements of gene expression. A number of different methods have emerged for computing biclusters in gene expression data. Many of these algorithms may output a very large number of biclusters with varying degrees of overlap. There are no systematic methods that create a two-dimensional layout of the computed biclusters and display overlaps between them. RESULTS: We develop a novel algorithm for laying out biclusters in a two-dimensional matrix whose rows (respectively, columns) are rows (respectively, columns) of the original dataset. We display each bicluster as a contiguous submatrix in the layout. We allow the layout to have repeated rows and/or columns from the original matrix as required, but we seek a layout of the smallest size. We also develop a web-based search interface for the user to query the genes and samples of interest and visualise the layout of biclusters matching the queries. CONCLUSION: We demonstrate the usefulness of our approach on gene expression data for two types of leukaemia and on protein-DNA binding data for two growth conditions in Saccharomyces cerevisiae. The software implementing the layout algorithm is available at
How Hard Is It to Control an Election by Breaking Ties?
We study the computational complexity of controlling the result of an
election by breaking ties strategically. This problem is equivalent to the
problem of deciding the winner of an election under parallel universes
tie-breaking. When the chair of the election is only asked to break ties to
choose between one of the co-winners, the problem is trivially easy. However,
in multi-round elections, we prove that it can be NP-hard for the chair to
compute how to break ties to ensure a given result. Additionally, we show that
the form of the tie-breaking function can increase the opportunities for
control. Indeed, we prove that it can be NP-hard to control an election by
breaking ties even with a two-stage voting rule.Comment: Revised and expanded version including longer proofs and additional
result
Castell: a heterogeneous cmp architecture scalable to hundreds of processors
Technology improvements and power constrains have taken multicore architectures to dominate
microprocessor designs over uniprocessors. At the same time, accelerator based architectures
have shown that heterogeneous multicores are very efficient and can provide high throughput for
parallel applications, but with a high-programming effort. We propose Castell a scalable chip
multiprocessor architecture that can be programmed as uniprocessors, and provides the high
throughput of accelerator-based architectures.
Castell relies on task-based programming models that simplify software development. These
models use a runtime system that dynamically finds, schedules, and adds hardware-specific features
to parallel tasks. One of these features is DMA transfers to overlap computation and data
movement, which is known as double buffering. This feature allows applications on Castell
to tolerate large memory latencies and lets us design the memory system focusing on memory
bandwidth.
In addition to provide programmability and the design of the memory system, we have used
a hierarchical NoC and added a synchronization module. The NoC design distributes memory
traffic efficiently to allow the architecture to scale. The synchronization module is a consequence
of the large performance degradation of application for large synchronization latencies.
Castell is mainly an architecture framework that enables the definition of domain-specific
implementations, fine-tuned to a particular problem or application. So far, Castell has been
successfully used to propose heterogeneous multicore architectures for scientific kernels, video
decoding (using H.264), and protein sequence alignment (using Smith-Waterman and clustalW).
It has also been used to explore a number of architecture optimizations such as enhanced DMA
controllers, and architecture support for task-based programming models.
ii
Scalable detection of semantic clones
Several techniques have been developed for identifying similar code fragments in programs. These similar fragments, referred to as code clones, can be used to identify redundant code, locate bugs, or gain insight into program design. Existing scalable approaches to clone detection are limited to finding program fragments that are similar only in their contiguous syntax. Other, semantics-based approaches are more resilient to differences in syntax, such as reordered statements, related statements interleaved with other unrelated statements, or the use of semantically equivalent control structures. However, none of these techniques have scaled to real world code bases. These approaches capture semantic information from Program Dependence Graphs (PDGs), program representations that encode data and control dependencies between statements and predicates. Our definition of a code clone is also based on this representation: we consider program fragments with isomorphic PDGs to be clones. In this paper, we present the first scalable clone detection algorithm based on this definition of semantic clones. Our insight is the reduction of the difficult graph similarity problem to a simpler tree similarity problem by mapping carefully selected PDG subgraphs to their related structured syntax. We efficiently solve the tree similarity problem to create a scalable analysis. We have implemented this algorithm in a practical tool and performed evaluations on several million-line open source projects, including the Linux kernel. Compared with previous approaches, our tool locates significantly more clones, which are often more semantically interesting than simple copied and pasted code fragments
Inferring clonal evolution of tumors from single nucleotide somatic mutations
High-throughput sequencing allows the detection and quantification of
frequencies of somatic single nucleotide variants (SNV) in heterogeneous tumor
cell populations. In some cases, the evolutionary history and population
frequency of the subclonal lineages of tumor cells present in the sample can be
reconstructed from these SNV frequency measurements. However, automated methods
to do this reconstruction are not available and the conditions under which
reconstruction is possible have not been described.
We describe the conditions under which the evolutionary history can be
uniquely reconstructed from SNV frequencies from single or multiple samples
from the tumor population and we introduce a new statistical model, PhyloSub,
that infers the phylogeny and genotype of the major subclonal lineages
represented in the population of cancer cells. It uses a Bayesian nonparametric
prior over trees that groups SNVs into major subclonal lineages and
automatically estimates the number of lineages and their ancestry. We sample
from the joint posterior distribution over trees to identify evolutionary
histories and cell population frequencies that have the highest probability of
generating the observed SNV frequency data. When multiple phylogenies are
consistent with a given set of SNV frequencies, PhyloSub represents the
uncertainty in the tumor phylogeny using a partial order plot. Experiments on a
simulated dataset and two real datasets comprising tumor samples from acute
myeloid leukemia and chronic lymphocytic leukemia patients demonstrate that
PhyloSub can infer both linear (or chain) and branching lineages and its
inferences are in good agreement with ground truth, where it is available
A de novo Genome of a Chinese Radish Cultivar
AbstractHere, we report a high-quality draft genome of a Chinese radish (Raphanus sativus) cultivar. This draft contains 387.73Mb of assembled scaffolds, 83.93% of the scaffolds were anchored onto nine pseudochromosomes and 95.09% of 43 240 protein-coding genes were functionally annotated. 184.75Mb (47.65%) of repeat sequences was identified in the assembled genome. By comparative analyses of the radish genome against 10 other plant genomes, 2 275 genes in 780 gene families were found unique to R. sativus. This genome is a good reference for genomic study and of great value for genetic improvement of radish
First Class Copy & Paste
The Subtext project seeks to make programming fundamentally easier by altering the nature of programming languages and tools. This paper defines an operational semantics for an essential subset of the Subtext language. It also presents a fresh approach to the problems of mutable state, I/O, and concurrency.Inclusions reify copy & paste edits into persistent relationships that propagate changes from their source into their destination. Inclusions formulate a programming language in which there is no distinction between a programÂs representation and its execution. Like spreadsheets, programs are live executions within a persistent runtime, and programming is direct manipulation of these executions via a graphical user interface. There is no need to encode programs into source text.Mutation of state is effected by the computation of hypothetical recursive variants of the state, which can then be lifted into new versions of the state. Transactional concurrency is based upon queued single-threaded execution. Speculative execution of queued hypotheticals provides concurrency as a semantically transparent implementation optimization
- …