238 research outputs found

    Genomic Scaffold Filling Revisited

    The genomic scaffold filling problem has attracted a lot of attention recently. The problem is to fill an incomplete sequence (scaffold) I into I', with respect to a complete reference genome G, such that the number of adjacencies between G and I' is maximized. The problem is NP-complete and APX-hard, and admits a 1.2-approximation. However, the sequence input I is not quite practical and does not fit most real datasets (where a scaffold is more often given as a list of contigs). In this paper, we revisit the genomic scaffold filling problem by considering this important case when (1) a scaffold S is given, the missing genes X = c(G) - c(S) can only be inserted in between the contigs, and the objective is to maximize the number of adjacencies between G and the filled S', and (2) a scaffold S is given, a subset of the missing genes X' ⊂ X = c(G) - c(S) can only be inserted in between the contigs, and the objective is still to maximize the number of adjacencies between G and the filled S''. For problem (1), we present a simple NP-completeness proof, then a factor-2 greedy approximation algorithm, and finally we show that the problem is FPT when each gene appears at most d times in G. For problem (2), we prove that the problem is W[1]-hard and then present a factor-2 FPT-approximation for the case when each gene appears at most d times in G.
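
The objective in this abstract, the number of adjacencies shared by G and the filled scaffold, can be illustrated with a minimal sketch. It assumes the common convention that an adjacency is an unordered pair of consecutive genes and that shared adjacencies are counted as a multiset intersection; the function names and the example sequences are hypothetical and not taken from the paper.

```python
from collections import Counter


def adjacency_multiset(genome):
    """Multiset of unordered pairs of consecutive genes (assumed definition)."""
    return Counter(tuple(sorted(pair)) for pair in zip(genome, genome[1:]))


def common_adjacencies(reference, filled):
    """Size of the multiset intersection of the two adjacency multisets."""
    shared = adjacency_multiset(reference) & adjacency_multiset(filled)
    return sum(shared.values())


# Hypothetical example: genes are single letters, repetitions are allowed.
reference = list("abcdab")
filled = list("abdcab")
print(common_adjacencies(reference, filled))
```

A filling algorithm would search over insertions of the missing genes to maximize this count; the sketch only evaluates one candidate.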

    Beyond Adjacency Maximization: Scaffold Filling for New String Distances

    In Genomic Scaffold Filling, one aims at polishing in silico a draft genome, called a scaffold. The scaffold is given in the form of an ordered set of gene sequences, called contigs. This is done by comparing the scaffold with an already complete reference genome from a close species. More precisely, given a scaffold S, a reference genome G and a score function f() between two genomes, the aim is to complete S by adding the missing genes from G so that the resulting complete genome S* optimizes f(S*, G). In this paper, we extend a model of Jiang et al. [CPM 2016] (i) by allowing the insertion of strings instead of single characters (i.e., some groups of genes may be forced to be inserted together) and (ii) by considering two alternative score functions: the first generalizes the notion of common adjacencies by maximizing the number of common k-mers between S* and G (k-Mer Scaffold Filling), the second aims at minimizing the number of breakpoints between S* and G (Min-Breakpoint Scaffold Filling). We study these problems from the parameterized complexity point of view, providing fixed-parameter (FPT) algorithms for both problems. In particular, we show that k-Mer Scaffold Filling is FPT with respect to the number of additional k-mers realized by the completion of S; this answers an open question of Jiang et al. [CPM 2016]. We also show that Min-Breakpoint Scaffold Filling is FPT with respect to a parameter combining the number of missing genes, the number of gene repetitions and the target distance.

    On the Approximability of the Exemplar Adjacency Number Problem for Genomes with Gene Repetitions

    In this paper, we apply a measure, exemplar adjacency number, which complements and extends the well-studied breakpoint distance between two permutations, to measure the similarity between two genomes (or in general, between any two sequences drawn from the same alphabet). For two genomes drawn from the same set of n gene families and containing gene repetitions, we consider the corresponding Exemplar Adjacency Number problem (EAN), in which we delete duplicated genes from both genomes such that the resultant exemplar genomes (permutations) G and H have the maximum adjacency number. We obtain the following results. First, we prove that the one-sided 2-repetitive EAN problem, i.e., when one of the two genomes is given exemplar and each gene occurs in the other genome at most twice, can be linearly reduced from the Maximum Independent Set problem. This implies that EAN does not admit any -approximation algorithm, for any , unless P = NP. This hardness result also implies that EAN, parameterized by the optimal solution value, is W[1]-hard. Secondly, we show that the two-sided 2-repetitive EAN problem has an -approximation algorithm, which is tight up to a constant factor.

    Connected Coordinated Motion Planning with Bounded Stretch

    We consider the problem of connected coordinated motion planning for a large collective of simple, identical robots: From a given start grid configuration of robots, we need to reach a desired target configuration via a sequence of parallel, collision-free robot motions, such that the set of robots induces a connected grid graph at all integer times. The objective is to minimize the makespan of the motion schedule, i.e., to reach the new configuration in a minimum amount of time. We show that this problem is NP-complete, even for deciding whether a makespan of 2 can be achieved, while it is possible to check in polynomial time whether a makespan of 1 can be achieved. On the algorithmic side, we establish simultaneous constant-factor approximation for two fundamental parameters, by achieving constant stretch for constant scale. Scaled shapes (which arise by increasing all dimensions of a given object by the same multiplicative factor) have been considered in previous seminal work on self-assembly, often with unbounded or logarithmic scale factors; we provide methods for a generalized scale factor, bounded by a constant. Moreover, our algorithm achieves a constant stretch factor: If mapping the start configuration to the target configuration requires a maximum Manhattan distance of d, then the total duration of our overall schedule is O(d), which is optimal up to constant factors. Comment: 28 pages, 18 figures, full version of an extended abstract that appeared in the proceedings of the 32nd International Symposium on Algorithms and Computation (ISAAC 2021); revised version (more details added, and typing errors corrected)
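
Two quantities from this abstract can be made concrete with a short sketch: whether a configuration induces a connected grid graph, and the maximum Manhattan distance d between start and target positions. The sketch assumes 4-neighbor connectivity and labeled robots with a fixed start-to-target pairing; both function names are hypothetical, and this is not the paper's scheduling algorithm.

```python
from collections import deque


def is_connected(cells):
    """Check that a set of grid cells induces a connected grid graph (4-neighborhood)."""
    cells = set(cells)
    if not cells:
        return True
    start = next(iter(cells))
    seen = {start}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for neighbor in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if neighbor in cells and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(cells)


def max_manhattan(start, target):
    """Maximum Manhattan distance d over robots, with robot i going from start[i] to target[i]."""
    return max(abs(sx - tx) + abs(sy - ty)
               for (sx, sy), (tx, ty) in zip(start, target))


# Hypothetical two-robot instance.
start = [(0, 0), (1, 0)]
target = [(2, 1), (1, 4)]
print(is_connected(start), max_manhattan(start, target))
```

If each robot moves at most one cell per time step, no schedule can finish in fewer than d steps, which is why a makespan of O(d) is optimal up to constant factors.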

    From Multiview Image Curves to 3D Drawings

    Reconstructing 3D scenes from multiple views has made impressive strides in recent years, chiefly by correlating isolated feature points, intensity patterns, or curvilinear structures. In the general setting - without controlled acquisition, abundant texture, curves and surfaces following specific models or limiting scene complexity - most methods produce unorganized point clouds, meshes, or voxel representations, with some exceptions producing unorganized clouds of 3D curve fragments. Ideally, many applications require structured representations of curves, surfaces and their spatial relationships. This paper presents a step in this direction by formulating an approach that combines 2D image curves into a collection of 3D curves, with topological connectivity between them represented as a 3D graph. This results in a 3D drawing, which is complementary to surface representations in the same sense as a 3D scaffold complements a tent taut over it. We evaluate our results against ground truth on synthetic and real datasets. Comment: Expanded ECCV 2016 version with tweaked figures and including an overview of the supplementary material available at multiview-3d-drawing.sourceforge.ne

    Phylogenetic assembly of paleogenomes integrating ancient DNA data

    Luhmann N. Phylogenetic assembly of paleogenomes integrating ancient DNA data. Bielefeld: Universität Bielefeld; 2017. In comparative genomics, reconstructing the genomes of ancestral species in a given phylogeny is an important problem in order to analyze genome evolution over time. The diversity of present-day genomes in terms of local mutations and genome rearrangements makes it possible to shed light on the dynamics of evolutionary processes that led from a common ancestor to a set of extant genomes. This speciation history is depicted in a phylogenetic tree. Comparative genome reconstruction methods aim to infer genomic features such as an order of markers (e.g. genes) for extinct species at internal nodes of the tree by applying different evolutionary models, relying only on the information available for the extant genomes at the leaves of the phylogenetic tree. Recently, the steady progress in sequencing technologies led to the emergence of the field of paleogenomics, where the study of ancient DNA (aDNA) found in conserved organic material is moving rapidly towards the sequencing and analysis of complete paleogenomes. Such "genetic time travel" allows direct insight into specific phases of the evolution of specific genomes that are not only implicitly inferred from extant DNA sequences. However, as DNA is naturally degraded over time after the death of an organism and environmental conditions interfere with the conservation of DNA material, an assembly of these paleogenomes is usually fragmented, preventing a detailed analysis of genome rearrangements along the branches of the phylogenetic tree. In this thesis, we aim to combine the study of aDNA and comparative ancestral reconstruction in a phylogenetic framework.
The comparison with extant related genomes can naturally assist in scaffolding a fragmented aDNA assembly, while the aDNA sequencing data can be included as an additional source of information for comparative reconstruction methods to improve the reconstructions of all related genomes in the phylogenetic tree. Our first focus is on integrative methods to reconstruct marker orders globally in a phylogeny under the assumption of parsimony. An underlying rearrangement model can describe the evolutionary operations that occurred along the edges of the tree. However, as much as complex rearrangement scenarios can give insights into underlying biological mechanisms during evolution, from a computational point of view the ancestral reconstruction problem under rearrangement distances is NP-hard. One exception is the Single-Cut-or-Join (SCJ) distance, which uses a marker order-based representation of the involved genomes to model the cut and join of marker adjacencies as evolutionary operations. We build upon this rearrangement model and describe parsimony-based reconstruction methods aiming to minimize the SCJ distance in the tree. In addition, we require the reconstructed solutions to be consistent, such that they represent linear or circular regions of the ancestral genome. Our first polynomial-time method is based on the Sankoff-Rousseau algorithm and directly includes an aDNA assembly graph at one internal node of the tree. We show that including branch lengths in the underlying tree can avoid ambiguity in practice. Our second approach follows a more general strategy and includes the aDNA sequencing data as local weights for adjacencies next to the SCJ distance in the objective. We describe a fixed-parameter-tractable algorithm that also allows sampling of co-optimal solutions.
Finally, we describe an approach to fill gaps between potentially adjacent markers using aDNA data to reconstruct the complete genome sequence of a paleogenome guided by the related extant genome sequences. In addition, this approach enables us to select the adjacencies that are supported by the sequencing information from sets of conflicting adjacencies. We evaluate our proposed models and algorithms on simulated and biological data. In particular, we integrate two aDNA sequencing data sets for ancient strains of the pathogen Yersinia pestis, which is understood to be the cause of several pandemics in medieval times. We show that the combination of aDNA sequencing reads and a parsimonious reconstruction in the phylogenetic tree reduces the fragmentation of an initial aDNA assembly substantially and explore alternative reconstructions to emphasize reliably reconstructed regions of the ancient genomes.
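
The Single-Cut-or-Join distance named in this abstract has a simple closed form when each genome is viewed as a set of adjacencies between marker extremities: it is the size of the symmetric difference of the two sets. The sketch below assumes linear chromosomes given as lists of signed integers, with each marker occurring once per genome; the representation and function names are illustrative choices, not the thesis implementation.

```python
def extremities(marker):
    """Return the (left, right) extremities of a signed marker: tail/head, swapped if reversed."""
    name = abs(marker)
    tail, head = (name, "t"), (name, "h")
    return (tail, head) if marker > 0 else (head, tail)


def adjacencies(chromosomes):
    """Set of adjacencies (unordered pairs of extremities) over linear chromosomes."""
    result = set()
    for chromosome in chromosomes:
        for left_marker, right_marker in zip(chromosome, chromosome[1:]):
            right_end = extremities(left_marker)[1]
            left_end = extremities(right_marker)[0]
            result.add(frozenset((right_end, left_end)))
    return result


def scj_distance(genome_a, genome_b):
    """Number of cuts plus joins: size of the symmetric difference of the adjacency sets."""
    return len(adjacencies(genome_a) ^ adjacencies(genome_b))


# Hypothetical example: inverting marker 2 breaks two adjacencies and creates two new ones.
print(scj_distance([[1, 2, 3]], [[1, -2, 3]]))
```

Because the distance decomposes over individual adjacencies, each adjacency can be treated as an independent binary character on the tree, which is what makes parsimony under SCJ tractable in contrast to the other rearrangement distances mentioned above.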

    28th Annual Symposium on Combinatorial Pattern Matching : CPM 2017, July 4-6, 2017, Warsaw, Poland

    Peer reviewed

    Mechanical stimulation of mesenchymal stem/stromal cells in a bioreactor system: An approach to mobilize cells into scaffolds

    Articular cartilage (AC) is a viscoelastic avascular tissue mainly composed of chondrocytes embedded in a rich extracellular matrix that covers the joints and supports load distribution of the joints. The absence of vessels restricts its regenerative capability. Hence, joint motion facilitates nutrient deposition and cell waste disposal. Mechanical stimulation contributes to the homeostasis of functional AC by supporting delivery of nutrients, cytokines and growth factors between the distant chondrocytes. Current techniques to treat AC defects still fail to entirely heal and to achieve a native-like AC. As the knee joint has neighboring niches of stem cells, we hypothesized that mechanical stimulation might enhance the mobilization of endogenous mesenchymal stem/stromal cells (MSCs) from nearby niches such as the bone marrow (BM), when the subchondral bone is opened. To test this hypothesis, we developed a compression bioreactor system in vitro for simultaneous application of mechanical stimulation and cell cultivation. This study aimed to evaluate the role of dynamic mechanical stimulation in mobilizing MSCs toward scaffolds in a bioreactor system. The novel mechanical system for evaluating mobilization of MSCs in a 3D context in vitro consisted of a) a compression bioreactor able to induce loading on scaffolds, b) custom-made software for settings management and data recording, c) cell loading experiments, and d) 3D image-based biological evaluation. The mechanical stimulation acted on an acellular scaffold made of alginate, alginate functionalized with laminin-521 (alginate-Ln) or collagen-I (col-I), and a cell reservoir containing porcine or human BM-MSCs (pBM-MSCs and hBM-MSCs, respectively) below it.
The mechanical loading program was set up as 10% strain relative to the original height of the scaffold, 24 hours at 0.3 Hz, using a dynamic continuous or intermittent loading regime, with breaks of 10 seconds every 180 cycles when intermittent loading was used. Supporting our hypothesis, we found that intermittent mechanical stimulation induced the mobilization of hBM-MSCs in col-I scaffolds 10-fold compared to the unloaded control (245 ± 42 viable cells/mm³ vs. 22 ± 6 viable cells/mm³, respectively; p-value < 0.0001), and pBM-MSCs were mobilized 4-fold in the alginate-Ln scaffold when intermittently loaded (194 ± 39 cells/mm³ vs. 48 ± 21 cells/mm³ for the unloaded control). In addition, we found that the bioreactor was able to stimulate the scaffolds and the cells for 23.99 ± 0.94 hours in 137.72 ± 13.21 periods, exerting compression with vertical piston displacements of 230.08 ± 54.07 μm, force of 1.08 ± 0.13 N for hBM-MSCs and force-amplitude of 1.86 ± 1.46 N for pBM-MSCs. Remarkably, the viability of mobilized cells was not compromised by intermittent mechanical loading, as evaluated with an optimized and validated protocol for cell counting and viability detection in 3D. As a first step to induce cartilage regeneration in situ, this study shows acellular scaffolds enriched with viable MSCs after mechanical stimulation, and provides a useful tool to better understand the regeneration of AC in situ.

    A Rearrangement Distance for Fully-Labelled Trees

    The problem of comparing trees representing the evolutionary histories of cancerous tumors has turned out to be crucial, since there is a variety of different methods which typically infer multiple possible trees. In a departure from the widely studied setting of classical phylogenetics, where trees are leaf-labelled, tumoral trees are fully labelled, i.e., every vertex has a label. In this paper we provide a rearrangement distance measure between two fully-labelled trees. This notion originates from two operations: one which modifies the topology of the tree, the other which permutes the labels of the vertices, hence leaving the topology unaffected. While we show that the distance between two trees in terms of each such operation alone can be decided in polynomial time, the more general notion of distance when both operations are allowed is NP-hard to decide. Despite this result, we show that it is fixed-parameter tractable, and we give a 4-approximation algorithm when one of the trees is binary.