1,348 research outputs found

    Average-case analysis of perfect sorting by reversals (Journal Version)

    Perfect sorting by reversals, a problem originating in computational genomics, is the process of sorting a signed permutation to either the identity or the reversed identity permutation by a sequence of reversals that do not break any common interval. Bérard et al. (2007) use strong interval trees to describe an algorithm for sorting signed permutations by reversals. Combinatorial properties of this family of trees are essential to the analysis of the algorithm. Here, we use the expected value of certain tree parameters to prove that the average run-time of the algorithm is at worst polynomial and that, for sufficiently long permutations, the sorting algorithm runs in polynomial time with probability one. Furthermore, our analysis of the subclass of commuting scenarios yields precise results on the average length of a reversal and the average number of reversals. Comment: A preliminary version of this work appeared in the proceedings of Combinatorial Pattern Matching (CPM) 2009. See arXiv:0901.2847; Discrete Mathematics, Algorithms and Applications, vol. 3(3), 201
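
    To make the vocabulary of this abstract concrete, here is a minimal Python sketch of a signed reversal and of the two sorting targets. It is not the algorithm of Bérard et al.; the function names are hypothetical, and the "breaks" test assumes the usual reading that a reversal breaks an interval when the two position ranges overlap without one containing the other.

```python
def reverse_signed(perm, i, j):
    """Reverse the segment perm[i..j] (0-based, inclusive) and flip its signs."""
    segment = [-x for x in reversed(perm[i:j + 1])]
    return perm[:i] + segment + perm[j + 1:]


def is_sorted_target(perm):
    """True if perm is the identity or the reversed identity (all signs negative)."""
    identity = list(range(1, len(perm) + 1))
    reversed_identity = [-x for x in reversed(identity)]
    return perm == identity or perm == reversed_identity


def breaks_interval(rev, interval):
    """Assumed definition: the position ranges overlap but neither contains the other."""
    (i, j), (a, b) = rev, interval
    disjoint = j < a or b < i
    nested = (a <= i and j <= b) or (i <= a and b <= j)
    return not (disjoint or nested)


# One reversal of positions 1..2 sorts this signed permutation to the identity.
print(reverse_signed([1, -3, -2, 4], 1, 2))
```

    In this toy case the reversed segment coincides with the common interval {2, 3}, so under the assumed definition the reversal does not break it.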

    Easy identification of generalized common and conserved nested intervals

    In this paper we explain how to easily compute gene clusters, formalized by classical or generalized nested common or conserved intervals, among a set of K genomes represented as K permutations. A b-nested common (resp. conserved) interval I of size |I| is either an interval of size 1 or a common (resp. conserved) interval that contains another b-nested common (resp. conserved) interval of size at least |I|-b. When b=1, this corresponds to the classical notion of nested interval. We exhibit two simple algorithms that output all b-nested common or conserved intervals between K permutations in O(Kn+nocc) time, where nocc is the total number of such intervals. We also explain how to count all b-nested intervals in O(Kn) time. To do so, we propose new properties of the family of conserved intervals
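
    The recursive definition of a b-nested common interval can be illustrated with a brute-force Python sketch. This is not one of the O(Kn+nocc) algorithms from the paper; it simply enumerates windows of the first permutation, and the function names are hypothetical. It assumes that "another" interval means a strictly smaller one.

```python
def common_intervals(perms):
    """Windows (i, j) of perms[0] whose elements are contiguous in every permutation."""
    ref = perms[0]
    n = len(ref)
    # Position of each element in every other permutation.
    positions = [{g: k for k, g in enumerate(p)} for p in perms[1:]]
    found = set()
    for i in range(n):
        lo = [n] * len(positions)
        hi = [-1] * len(positions)
        for j in range(i, n):
            g = ref[j]
            for t, pos in enumerate(positions):
                lo[t] = min(lo[t], pos[g])
                hi[t] = max(hi[t], pos[g])
            # Contiguous everywhere iff each position span equals the window width.
            if all(h - l == j - i for l, h in zip(lo, hi)):
                found.add((i, j))
    return found


def b_nested_intervals(perms, b=1):
    """Apply the recursive definition, visiting common intervals by increasing size."""
    nested = set()
    for i, j in sorted(common_intervals(perms), key=lambda w: w[1] - w[0]):
        size = j - i + 1
        if size == 1 or any(
            i <= a and c <= j and size - b <= c - a + 1 < size
            for a, c in nested
        ):
            nested.add((i, j))
    return nested


perms = [[1, 2, 3, 4, 5], [3, 2, 1, 5, 4]]
print(sorted(b_nested_intervals(perms, b=1)))
print(sorted(b_nested_intervals(perms, b=2)))
```

    On this example the whole permutation, window (0, 4), is a common interval but is not 1-nested, because it contains no nested interval of size 4. It becomes 2-nested through the window (0, 2).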

    Coordinated RNA-Seq and peptidomics identify neuropeptides and G-protein coupled receptors (GPCRs) in the large pine weevil Hylobius abietis, a major forestry pest

    Hylobius abietis (Linnaeus), or large pine weevil (Coleoptera, Curculionidae), is a pest of European coniferous forests. To gain an understanding of the functional physiology of this species, we assembled a de novo transcriptome of H. abietis from sequence data obtained by next-generation sequencing. In particular, we identified genes encoding neuropeptides, peptide hormones and their putative G-protein coupled receptors (GPCRs) to gain insights into neuropeptide-modulated processes. The transcriptome was assembled de novo from pooled paired-end sequence reads obtained from RNA from whole adults, gut and central nervous system tissue samples. Data analysis was performed on the transcripts obtained from the assembly, including annotation, gene ontology and functional assignment as well as transcriptome completeness assessment and KEGG pathway analysis. Pipelines were created using bioinformatics tools and techniques for prediction and identification of neuropeptides and neuropeptide receptors. Peptidomic analysis was also carried out using a combination of MALDI-TOF and Q-Exactive Orbitrap mass spectrometry to confirm the identified neuropeptides. In total, 41 putative neuropeptide families were identified in H. abietis, including Adipokinetic hormone (AKH), CAPA and DH31. Neuropeptide F, which has not yet been identified in the model beetle T. castaneum, was identified. Additionally, 24 putative neuropeptide and 9 leucine-rich repeat containing G protein coupled receptor-encoding transcripts were determined using both alignment and non-alignment methods. This information, submitted to the NCBI sequence read archive repository (SRA accession: SRP133355), can now be used to inform understanding of neuropeptide-modulated physiology and behaviour in H. abietis, and to develop specific neuropeptide-based tools for H. abietis control

    Genetic rearrangements in Pseudomonas amygdali pathovar aesculi shape coronatine plasmids

    Plant pathogenic Pseudomonas species use multiple classes of toxins and virulence factors during host infection. The genes encoding these pathogenicity factors are often located on plasmids and other mobile genetic elements, suggesting that they are acquired through horizontal gene transfer to confer an evolutionary advantage for successful adaptation to host infection. However, the genetic rearrangements that have led to mobilization of the pathogenicity genes are not fully understood. In this study, we have sequenced and analyzed the complete genome sequences of four Pseudomonas amygdali pv. aesculi (Pae), which infect European horse chestnut trees (Aesculus hippocastanum) and belong to phylogroup 3 of the P. syringae species complex. The four investigated genomes contain six groups of plasmids that all encode pathogenicity factors. Effector genes were found to be mostly associated with insertion sequence elements, suggesting that virulence genes are generally mobilized and potentially undergo horizontal gene transfer after transfer to a conjugative plasmid. We show that the biosynthetic gene cluster encoding the phytotoxin coronatine was recently transferred from a chromosomal location to a mobilizable plasmid that subsequently formed a co-integrate with a conjugative plasmid

    A Methodological Framework for the Reconstruction of Contiguous Regions of Ancestral Genomes and Its Application to Mammalian Genomes

    The reconstruction of ancestral genome architectures and gene orders from homologies between extant species is a long-standing problem, considered by both cytogeneticists and bioinformaticians. A comparison of the two approaches was recently investigated and discussed in a series of papers, sometimes with diverging points of view regarding the performance of these two approaches. We describe a general methodological framework for reconstructing ancestral genome segments from conserved syntenies in extant genomes. We show that this problem, from a computational point of view, is naturally related to physical mapping of chromosomes and benefits from using combinatorial tools developed in this scope. We develop this framework into a new reconstruction method considering conserved gene clusters with similar gene content, mimicking principles used in most cytogenetic studies, although on a different kind of data. We implement and apply it to datasets of mammalian genomes. We perform intensive theoretical and experimental comparisons with other bioinformatics methods for ancestral genome segments reconstruction. We show that the method that we propose is stable and reliable: it gives convergent results using several kinds of data at different levels of resolution, and all predicted ancestral regions are well supported. The results ultimately come very close to those of cytogenetics studies. This suggests that the comparison of methods for ancestral genome reconstruction should include the algorithmic aspects of the methods as well as the disciplinary differences in data acquisition

    Analysis of local genome rearrangement improves resolution of ancestral genomic maps in plants

    Rubert D, Martinez FHV, Stoye J, Dörr D. Analysis of local genome rearrangement improves resolution of ancestral genomic maps in plants. BMC Genomics. 2020;21(Suppl. 2): 273. Background: Computationally inferred ancestral genomes play an important role in many areas of genome research. We present an improved workflow for the reconstruction from highly diverged genomes such as those of plants. Results: Our work relies on an established workflow in the reconstruction of ancestral plants, but improves several steps of this process. Instead of using gene annotations for inferring the genome content of the ancestral sequence, we identify genomic markers through a process called genome segmentation. This enables us to reconstruct the ancestral genome from hundreds of thousands of markers rather than the tens of thousands of annotated genes. We also introduce the concept of local genome rearrangement, through which we refine syntenic blocks before they are used in the reconstruction of contiguous ancestral regions. With the enhanced workflow at hand, we reconstruct the ancestral genome of eudicots, a major sub-clade of flowering plants, using whole genome sequences of five modern plants. Conclusions: Our reconstructed genome is highly detailed, yet its layout agrees well with that reported in Badouin et al. (2017). Using local genome rearrangement, not only the marker-based, but also the gene-based reconstruction of the eudicot ancestor exhibited increased genome content, evidencing the power of this novel concept

    Approximate Search for Known Gene Clusters in New Genomes Using PQ-Trees

    We define a new problem in comparative genomics, denoted PQ-Tree Search, that takes as input a PQ-tree T representing the known gene orders of a gene cluster of interest, a gene-to-gene substitution scoring function h, integer parameters d_T and d_S, and a new genome S. The objective is to identify in S approximate new instances of the gene cluster that could vary from the known gene orders by genome rearrangements that are constrained by T, by gene substitutions that are governed by h, and by gene deletions and insertions that are bounded from above by d_T and d_S, respectively. We prove that the PQ-Tree Search problem is NP-hard and propose a parameterized algorithm that solves the optimization variant of PQ-Tree Search in O^*(2^{γ}) time, where γ is the maximum degree of a node in T and O^* is used to hide factors polynomial in the input size. The algorithm is implemented as a search tool, denoted PQFinder, and applied to search for instances of chromosomal gene clusters in plasmids, within a dataset of 1,487 prokaryotic genomes. We report on 29 chromosomal gene clusters that are rearranged in plasmids, where the rearrangements are guided by the corresponding PQ-tree. One of these results, coding for a heavy metal efflux pump, is further analysed to exemplify how PQFinder can be harnessed to reveal interesting new structural variants of known gene clusters

    Ancestral paralogs and pseudoparalogs and their role in the emergence of the eukaryotic cell

    Gene duplication is a crucial mechanism of evolutionary innovation. A substantial fraction of eukaryotic genomes consists of paralogous gene families. We assess the extent of ancestral paralogy, which dates back to the last common ancestor of all eukaryotes, and examine the origins of the ancestral paralogs and their potential roles in the emergence of the eukaryotic cell complexity. A parsimonious reconstruction of ancestral gene repertoires shows that 4137 orthologous gene sets in the last eukaryotic common ancestor (LECA) map back to 2150 orthologous sets in the hypothetical first eukaryotic common ancestor (FECA) [paralogy quotient (PQ) of 1.92]. Analogous reconstructions show significantly lower levels of paralogy in prokaryotes, 1.19 for archaea and 1.25 for bacteria. The only functional class of eukaryotic proteins with a significant excess of paralogous clusters over the mean includes molecular chaperones and proteins with related functions. Almost all genes in this category underwent multiple duplications during early eukaryotic evolution. In structural terms, the most prominent sets of paralogs are superstructure-forming proteins with repetitive domains, such as WD-40 and TPR. In addition to the true ancestral paralogs which evolved via duplication at the onset of eukaryotic evolution, numerous pseudoparalogs were detected, i.e. homologous genes that apparently were acquired by early eukaryotes via different routes, including horizontal gene transfer (HGT) from diverse bacteria. The results of this study demonstrate a major increase in the level of gene paralogy as a hallmark of the early evolution of eukaryotes
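
    The paralogy quotient reported above can be checked with a short Python calculation. The sketch assumes that PQ is simply the ratio of orthologous gene sets in the descendant (LECA) to those in the ancestor (FECA), which is what the figures in the abstract suggest; the function name is hypothetical.

```python
def paralogy_quotient(descendant_sets, ancestor_sets):
    """Assumed definition: descendant orthologous sets per ancestral orthologous set."""
    return descendant_sets / ancestor_sets


# Figures quoted in the abstract: 4137 LECA sets mapping back to 2150 FECA sets.
pq = paralogy_quotient(4137, 2150)
print(round(pq, 2))
```

    Under this assumption the ratio rounds to the quoted value of 1.92.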

    Finding Nested Common Intervals Efficiently

    In this paper, we study the problem of efficiently finding gene clusters formalized by nested common intervals between two genomes represented either as permutations or as sequences. Considering permutations, we give several algorithms whose running time depends on the size of the actual output rather than the output in the worst case. Indeed, we first provide a straightforward O(n^3) time algorithm for finding all nested common intervals. We reduce this complexity by providing an O(n^2) time algorithm computing an irredundant output. Finally, we show, by providing a third algorithm, that finding only the maximal nested common intervals can be done in linear time. Considering sequences, we provide solutions (modifications of previously defined algorithms and a new algorithm) for different variants of the problem, depending on the treatment one wants to apply to duplicated genes