Sum-of-squares lower bounds for planted clique
Finding cliques in random graphs and the closely related "planted" clique
variant, where a clique of size k is planted in a random G(n, 1/2) graph, have
been the focus of substantial study in algorithm design. Despite much effort,
the best known polynomial-time algorithms only solve the problem for k ~
sqrt(n).
In this paper we study the complexity of the planted clique problem under
algorithms from the Sum-of-squares hierarchy. We prove the first average case
lower bound for this model: for almost all graphs in G(n,1/2), r rounds of the
SOS hierarchy cannot find a planted k-clique unless k > n^{1/2r} (up to
logarithmic factors). Thus, for any constant number of rounds planted cliques
of size n^{o(1)} cannot be found by this powerful class of algorithms. This is
shown via an integrality gap for the natural formulation of the maximum clique
problem on random graphs for SOS and Lasserre hierarchies, which in turn follows
from degree lower bounds for the Positivstellensatz proof system.
We follow the usual recipe for such proofs. First, we introduce a natural
"dual certificate" (also known as a "vector-solution" or "pseudo-expectation")
for the given system of polynomial equations representing the problem for every
fixed input graph. Then we show that the matrix associated with this dual
certificate is PSD (positive semi-definite) with high probability over the
choice of the input graph. This requires the use of certain tools. One is the
theory of association schemes, and in particular the eigenspaces and
eigenvalues of the Johnson scheme. Another is a combinatorial method we develop
to compute (via traces) norm bounds for certain random matrices whose entries
are highly dependent; we hope this method will be useful elsewhere.
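The planted clique setup described above can be made concrete with a small simulation. The sketch below is not the paper's sum-of-squares analysis; it only builds a G(n, 1/2) graph with a planted k-clique and applies the simple "take the k highest-degree vertices" baseline, which is known to work only when k is well above sqrt(n). The function names and the parameters n=400, k=200 are illustrative assumptions.

```python
import random


def planted_clique_graph(n, k, seed=0):
    """Sample G(n, 1/2) and plant a clique on k randomly chosen vertices."""
    rng = random.Random(seed)
    clique = set(rng.sample(range(n), k))
    adj = [[False] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            # Clique pairs are always joined; every other pair is a fair coin flip.
            if (u in clique and v in clique) or rng.random() < 0.5:
                adj[u][v] = adj[v][u] = True
    return adj, clique


def top_degree_guess(adj, k):
    """Baseline heuristic (an assumption here): guess the k highest-degree vertices."""
    degrees = [sum(row) for row in adj]
    ranked = sorted(range(len(adj)), key=lambda v: degrees[v], reverse=True)
    return set(ranked[:k])


adj, planted = planted_clique_graph(n=400, k=200)
guess = top_degree_guess(adj, k=200)
print(len(guess & planted), "of", len(planted), "planted vertices recovered")
```

With k this large the planted vertices stand out by degree alone; as k shrinks toward sqrt(n) the degree gap falls within the random fluctuations and this baseline stops working, which is the regime the lower bound addresses.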
Application of diffracted waves analysis to edge detection on 3D and 4D seismic data
Over the past several decades, diffracted wave analysis has been applied to problems such as detection of small objects in seismic and GPR data, enhanced fault imaging, and detection of the edges of geological bodies. Recently we showed, using 2D mathematical modelling, the potential of diffracted wave analysis for detecting CO2 leakage. In this paper we illustrate the application of several diffracted wave imaging techniques, based on the phase reversal phenomenon, to the detection of object edges on 2D data, and we generalize the approach to 3D acquisition geometry. The proposed technique was tested on physical modelling data and on 4D seismic data acquired as part of the CO2CRC Otway project conducted by the Cooperative Research Centre for Greenhouse Gases Technologies (CO2CRC).
A New Simulated Annealing Algorithm for the Multiple Sequence Alignment Problem: The approach of Polymers in a Random Media
We propose a probabilistic algorithm to solve the Multiple Sequence
Alignment problem. The algorithm is a Simulated Annealing (SA) that exploits
the representation of the Multiple Alignment between D sequences as a
directed polymer in D dimensions. Within this representation we can easily
track the evolution in the configuration space of the alignment through local
moves of low computational cost. At variance with other probabilistic
algorithms proposed to solve this problem, our approach allows for the creation
and deletion of gaps without extra computational cost. The algorithm was tested
aligning proteins from the kinases family. When D=3 the results are consistent
with those obtained using a complete algorithm. For D>3, where the complete
algorithm fails, we show that our algorithm still converges to reasonable
alignments. Moreover, we study the space of solutions obtained and show that
depending on the number of sequences aligned the solutions are organized in
different ways, suggesting a possible source of errors for progressive
algorithms. Comment: 7 pages and 11 figures
Structural and Functional Organization of the Vestibular Apparatus in Rats Subjected to Weightlessness for 19.5 Days Aboard the Kosmos-782 Satellite
The vestibular apparatus was investigated in rats subjected to weightlessness for 19.5 days. The vestibular apparatus was removed and its sections were fixed in a glutaraldehyde solution for investigation by light and electron microscopy. Structural and functional changes were noted in the otolith portions of the ear, with the otolith particles clinging to the utricular receptor surface and with the peripheral arrangement of the nucleolus in the nuclei of the receptor cells. It is possible that increased edema of the vestibular tissue resulted in the destruction of some receptor cells and in changes in the form and structure of the otolith. In the horizontal crista, the cupula was separated
Anyui Volcano in Chukotka: Age, structure, peculiarities of rocks' composition and eruptions
The study of lavas and pyroclastics from Anyui Volcano made it possible to reconstruct the succession of its eruption events. The age of the eruption is estimated by isotopic methods to be 0.248 ± 0.030 Ma. It is established that the last episode of volcanic activity in northeastern Russia occurred 0.2‒0.5 Ma ago (in its continental part, 0.2‒0.3 Ma ago). This episode is chronologically close to the last peak in activation of volcanism in the Arctic and Subarctic regions. The absence of features indicating glacial influence on lavas from Anyui Volcano provides grounds for an assumption that no significant glaciations took place in the continental areas of western Chukotka during the last 250 ka.
Scaling metagenome sequence assembly with probabilistic de Bruijn graphs
Deep sequencing has enabled the investigation of a wide range of
environmental microbial ecosystems, but the high memory requirements for
de novo assembly of short-read shotgun sequencing data from these complex
populations are an increasingly large practical barrier. Here we introduce a
memory-efficient graph representation with which we can analyze the k-mer
connectivity of metagenomic samples. The graph representation is based on a
probabilistic data structure, a Bloom filter, that allows us to efficiently
store assembly graphs in as little as 4 bits per k-mer, albeit inexactly. We
show that this data structure accurately represents DNA assembly graphs in low
memory. We apply this data structure to the problem of partitioning assembly
graphs into components as a prelude to assembly, and show that this reduces the
overall memory requirements for de novo assembly of metagenomes. On one
soil metagenome assembly, this approach achieves a nearly 40-fold decrease in
the maximum memory requirements for assembly. This probabilistic graph
representation is a significant theoretical advance in storing assembly graphs
and also yields immediate leverage on metagenomic assembly.
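As a rough illustration of the idea of storing a k-mer graph in a Bloom filter, the following minimal sketch keeps only a bit array and recovers edges implicitly by querying the four possible successor k-mers. It is not the authors' implementation: the class name `KmerBloomFilter`, the SHA-256-based hashing, and the sizes used are assumptions chosen for readability, not for memory efficiency.

```python
import hashlib


class KmerBloomFilter:
    """Set of k-mers with possible false positives but no false negatives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, kmer):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, kmer):
        for p in self._positions(kmer):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, kmer):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(kmer))


def load_kmers(bloom, sequence, k):
    """Insert every k-mer of the sequence into the filter."""
    for i in range(len(sequence) - k + 1):
        bloom.add(sequence[i:i + k])


def successors(bloom, kmer):
    """Successor k-mers that the filter reports as present (may include false positives)."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bloom]


bloom = KmerBloomFilter(num_bits=1024, num_hashes=3)
load_kmers(bloom, "ACGTACGGACT", k=4)
print(successors(bloom, "ACGT"))
```

Because the graph is never stored explicitly, memory depends only on the size of the bit array; the price is that a query can occasionally report a k-mer that was never inserted, which is the inexactness the abstract refers to.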
Expected length of the longest common subsequence for large alphabets
We consider the length L of the longest common subsequence of two n-character
words chosen uniformly and independently at random over a k-ary alphabet.
Subadditivity arguments yield that the expected value of L, when normalized by
n, converges to a constant C_k. We prove a conjecture of Sankoff and Mainville
from the early 80's claiming that C_k\sqrt{k} goes to 2 as k goes to infinity. Comment: 14 pages, 1 figure, LaTeX
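The normalized quantity in this abstract can be explored numerically. The sketch below computes the longest common subsequence length with the standard dynamic program and averages L/n over random word pairs, scaled by sqrt(k). It is only a finite-n Monte Carlo estimate under assumed parameters (the function names, `n`, `k`, and `trials` are illustrative), so it approximates C_k sqrt(k) from below and says nothing rigorous about the limit.

```python
import random


def lcs_length(a, b):
    """Length of the longest common subsequence, using O(len(b)) memory."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def estimate_scaled_constant(n, k, trials, seed=0):
    """Monte Carlo estimate of (E[L] / n) * sqrt(k) for words of length n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = [rng.randrange(k) for _ in range(n)]
        b = [rng.randrange(k) for _ in range(n)]
        total += lcs_length(a, b)
    return (total / (trials * n)) * k ** 0.5


for k in (4, 16, 64):
    print(k, round(estimate_scaled_constant(n=300, k=k, trials=5), 3))
```

Since the expected LCS length is superadditive in n, E[L]/n at any fixed n is at most C_k, so larger n gives tighter estimates at quadratic cost per trial.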
Safe and complete contig assembly via omnitigs
Contig assembly is the first stage that most assemblers solve when
reconstructing a genome from a set of reads. Its output consists of contigs --
a set of strings that are promised to appear in any genome that could have
generated the reads. From the introduction of contigs 20 years ago, assemblers
have tried to obtain longer and longer contigs, but the following question was
never solved: given a genome graph (e.g. a de Bruijn, or a string graph),
what are all the strings that can be safely reported from it as contigs? In
this paper we finally answer this question, and also give a polynomial time
algorithm to find them. Our experiments show that these strings, which we call
omnitigs, are 66% to 82% longer on average than the popular unitigs, and 29% of
dbSNP locations have more neighbors in omnitigs than in unitigs. Comment: Full version of the paper in the proceedings of RECOMB 201
Parking functions, labeled trees and DCJ sorting scenarios
In genome rearrangement theory, one of the elusive questions raised in recent
years is the enumeration of rearrangement scenarios between two genomes. This
problem is related to the uniform generation of rearrangement scenarios, and
the derivation of tests of statistical significance of the properties of these
scenarios. Here we give an exact formula for the number of double-cut-and-join
(DCJ) rearrangement scenarios of co-tailed genomes. We also construct effective
bijections between the set of scenarios that sort a cycle and well-studied
combinatorial objects such as parking functions and labeled trees. Comment: 12 pages, 3 figures
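For readers unfamiliar with the combinatorial objects named above, the following sketch enumerates parking functions by brute force and compares the count with (n+1)^(n-1), the classical formula that also counts labeled trees on n+1 vertices (Cayley's formula). It illustrates only why a bijection between the two families is plausible; it does not implement the DCJ scenario count from the paper, and the function names are illustrative.

```python
from itertools import product


def is_parking_function(seq):
    """A sequence over 1..n is a parking function if its sorted form satisfies b_i <= i."""
    return all(b <= i for i, b in enumerate(sorted(seq), start=1))


def count_parking_functions(n):
    """Brute-force count over all n^n sequences; only practical for small n."""
    return sum(is_parking_function(s) for s in product(range(1, n + 1), repeat=n))


for n in range(1, 6):
    # Both columns should agree: brute-force count versus (n+1)^(n-1).
    print(n, count_parking_functions(n), (n + 1) ** (n - 1))
```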
Limited Lifespan of Fragile Regions in Mammalian Evolution
An important question in genome evolution is whether there exist fragile
regions (rearrangement hotspots) where chromosomal rearrangements are happening
over and over again. Although nearly all recent studies supported the existence
of fragile regions in mammalian genomes, the most comprehensive phylogenomic
study of mammals (Ma et al. (2006) Genome Research 16, 1557-1565) raised some
doubts about their existence. We demonstrate that fragile regions are subject
to a "birth and death" process, implying that fragility has limited
evolutionary lifespan. This finding implies that fragile regions migrate to
different locations in different mammals, explaining why there exist only a few
chromosomal breakpoints shared between different lineages. The birth and death
of fragile regions phenomenon reinforces the hypothesis that rearrangements are
promoted by matching segmental duplications and suggests putative locations of
the currently active fragile regions in the human genome.