10 research outputs found

    A double classification tree search algorithm for index SNP selection

    BACKGROUND: In population-based studies, it is generally recognized that single nucleotide polymorphism (SNP) markers are not independent. Rather, they are carried by haplotypes, groups of SNPs that tend to be coinherited. It is thus possible to choose a much smaller number of SNPs to use as indices for identifying haplotypes or haplotype blocks in genetic association studies. We refer to these characteristic SNPs as index SNPs. In order to reduce costs and work, a minimum number of index SNPs that can distinguish all SNP and haplotype patterns should be chosen. Unfortunately, this is an NP-complete problem, requiring brute force algorithms that are not feasible for large data sets. RESULTS: We have developed a double classification tree search algorithm to generate index SNPs that can distinguish all SNP and haplotype patterns. This algorithm runs very rapidly and generates very good, though not necessarily minimum, sets of index SNPs, as is to be expected for such NP-complete problems. CONCLUSIONS: A new algorithm for index SNP selection has been developed. A webserver for index SNP selection is available a
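To make the selection problem concrete, here is a minimal greedy sketch in Python. It is not the double classification tree search described in the abstract. It is a simple stand-in heuristic that only separates distinct haplotypes, and the function name `greedy_index_snps` and the string encoding of haplotypes are assumptions made for illustration.

```python
from itertools import combinations


def greedy_index_snps(haplotypes):
    """Pick SNP positions that tell all distinct haplotypes apart.

    haplotypes: list of equal-length strings, one character per SNP
    (hypothetical encoding, e.g. "0"/"1" alleles).
    Returns a sorted list of column indices; not guaranteed to be minimum.
    """
    distinct = sorted(set(haplotypes))
    n_snps = len(distinct[0]) if distinct else 0
    # Pairs of haplotypes that the chosen SNPs do not yet distinguish.
    unresolved = set(combinations(range(len(distinct)), 2))
    chosen = []
    while unresolved:
        def separated(j):
            # Unresolved pairs whose alleles differ at SNP j.
            return {(a, b) for a, b in unresolved
                    if distinct[a][j] != distinct[b][j]}

        # Greedy step: take the SNP that separates the most unresolved pairs.
        best = max(range(n_snps), key=lambda j: len(separated(j)))
        chosen.append(best)
        unresolved -= separated(best)
    return sorted(chosen)
```

Because distinct haplotypes differ in at least one position, every greedy step resolves at least one pair, so the loop terminates. The result can be larger than the true minimum, which is consistent with the NP-completeness noted in the abstract.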

    On the threshold-width of graphs

    The 𝒢-width of a class of graphs 𝒢 is defined as follows. A graph G has 𝒢-width k if there are k independent sets N_1, ..., N_k in G such that G can be embedded into a graph H in 𝒢 such that for every edge e in H which is not an edge in G, there exists an i such that both endpoints of e are in N_i. For the class TH of threshold graphs we show that TH-width is NP-complete and we present fixed-parameter algorithms. We also show that for each k, graphs of TH-width at most k are characterized by a finite collection of forbidden induced subgraphs.
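For readers unfamiliar with the class TH, the following Python sketch checks whether a graph is a threshold graph. It uses the standard characterization that a graph is threshold exactly when it can be emptied by repeatedly deleting an isolated or a dominating vertex. It does not compute TH-width, and the adjacency-dictionary input format and the name `is_threshold_graph` are assumptions for illustration.

```python
def is_threshold_graph(adj):
    """Return True if the undirected simple graph is a threshold graph.

    adj: dict mapping each vertex to the set of its neighbours
    (hypothetical input format).
    """
    # Work on a copy so the caller's graph is left untouched.
    g = {v: set(ns) for v, ns in adj.items()}
    while g:
        n = len(g)
        # Look for an isolated vertex (degree 0) or a dominating one (degree n - 1).
        pick = next((v for v, ns in g.items() if len(ns) in (0, n - 1)), None)
        if pick is None:
            return False
        # Delete the vertex and remove it from its neighbours' sets.
        for u in g.pop(pick):
            g[u].discard(pick)
    return True
```

A path on four vertices, for example, has neither an isolated nor a dominating vertex, so the check rejects it immediately.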

    Recognition Algorithm for Probe Interval 2-Trees

    Recognition of probe interval graphs has been studied extensively. Recognition algorithms for probe interval graphs fall into two types: partitioned and non-partitioned. A partitioned recognition algorithm takes the probe and nonprobe partition of the vertices as part of the input, whereas a non-partitioned algorithm does not. Partitioned probe interval graphs can be recognized in time linear in the number of edges, whereas non-partitioned probe interval graphs can be recognized in polynomial time. Here we present a non-partitioned recognition algorithm for 2-trees, an extension of trees, that are probe interval graphs. We show that this algorithm runs in O(m) time, where m is the number of edges of a 2-tree. Currently there is no algorithm that performs as well for this problem.

    Recognition Algorithms for 2-Tree Probe Interval Graphs

    This thesis examines a particular family of graphs, here called 2-tree Probe Interval Graphs, and the problem of recognizing whether a given graph has the properties required to belong to it. For these graphs, we develop an algorithm, implemented as a script, that recursively applies criteria to an input graph given by its matrix representation in order to check the 2-path, and outputs either success if the graph is a 2-tree Probe Interval Graph or failure if it is not. After the creation of this algorithm, a complexity analysis will be developed, together with an implementation of different search criteria intended to reduce the complexity by some polynomial factor. The recognition for our set of graphs follows the conceptual idea that triangles are built upon each other by adding one vertex and two edges to a previous triangle in the graph. Each new triangle is added to an existing triangle and recursively builds the graph, where the new vertex neighbors exactly two vertices of an existing triangle, creating a recursively defined 2-path.
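The triangle-by-triangle construction described above can be reversed to test whether a graph is a 2-tree at all. The Python sketch below does only that: it repeatedly removes a degree-2 vertex whose two neighbours are adjacent until a single edge remains. It does not test the probe interval property or the 2-path criteria of the thesis, it makes no attempt at linear running time, and the name `is_two_tree` and the adjacency-dictionary input are assumptions for illustration.

```python
def is_two_tree(adj):
    """Return True if the undirected simple graph is a 2-tree.

    adj: dict mapping each vertex to the set of its neighbours
    (hypothetical input format).
    """
    # Work on a copy so the caller's graph is left untouched.
    g = {v: set(ns) for v, ns in adj.items()}
    if len(g) < 2:
        return False
    while len(g) > 2:
        pick = None
        for v, ns in g.items():
            if len(ns) == 2:
                a, b = ns
                # v closes a triangle with the edge (a, b), so it can be
                # the most recently added vertex of the construction.
                if b in g[a]:
                    pick = v
                    break
        if pick is None:
            return False
        for u in g.pop(pick):
            g[u].discard(pick)
    # The construction starts from a single edge.
    a, b = g
    return b in g[a]
```

Two triangles sharing an edge pass this check, while a 4-cycle fails because none of its degree-2 vertices has adjacent neighbours.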

    Identification and functional analysis of nematode resistance genes

    Pine wilt disease (PWD), caused by the pinewood nematode (PWN; Bursaphelenchus xylophilus), damages and kills pine trees and is causing serious economic damage worldwide. Although the ecological mechanism of infestation is well described, the plant's molecular response to the pathogen is not well known. This is due mainly to the lack of genomic information and the complexity of the disease. High-throughput sequencing is now an efficient approach for detecting the expression of genes in non-model organisms, thus providing valuable information in spite of the lack of a genome sequence. In an attempt to unravel genes potentially involved in the pine defense against the pathogen, we hereby report the high-throughput comparative sequence analysis of infested and non-infested stems of Pinus pinaster (very susceptible to PWN) and Pinus pinea (less susceptible to PWN). High-throughput sequencing allowed the identification of several candidate genes that may be involved in the response to the PWN. With regard to gene function, the majority of the sequence functions identified were associated with protein metabolism and carbohydrate metabolism. However, a significant fraction of sequences associated with RNA metabolism were also highly represented. The sequences most commonly found in Pinus pinaster were transcription repressors and a translation machinery component: aminoacyl-tRNA synthetase. Cellulose synthase is also important in the disease response, as this gene was up-regulated in infested Pinus pinaster.
KEGG analysis revealed that the pathways most commonly found in this study were the pentose pathway, the pathway for glucuronate interconversion, the pathway for phenylalanine metabolism, amino acid, sugar and nucleotide metabolism, phenylpropanoid biosynthesis, methane metabolism, and the citrate cycle (TCA cycle).
Pine wilt disease, caused by the pinewood nematode (PWN; Bursaphelenchus xylophilus), causes irreversible damage, killing pine trees and causing serious economic losses. Although the infection mechanism is well described, the plant's molecular response to the pathogen is not well known. This is due mainly to the lack of genomic information and the complexity of the disease. High-throughput sequencing is currently an efficient route for detecting expressed genes in non-model organisms, thus providing valuable information. In an attempt to discover genes potentially involved in the pine's defense against the pathogen, a whole-transcriptome analysis was carried out on sequences from infected and non-infected stem samples of Pinus pinaster (very susceptible to the pinewood nematode) and Pinus pinea (less susceptible to the pinewood nematode), and their profiles were compared at the transcription level. Pyrosequencing allowed the identification of several candidate genes that may be associated with the response to the PWN. With regard to gene function, the most predominantly identified functions were associated with protein metabolism and carbohydrate metabolism. However, a significant fraction of sequences associated with RNA metabolism were also highly represented. The sequences most commonly found in Pinus pinaster were transcription repressors and a translation component: aminoacyl-tRNA synthetase. Cellulose synthase is also important in the disease response, since this gene was up-regulated in infested Pinus pinaster.
KEGG analysis reveals that the metabolic pathways most commonly represented in this study are related to the pentose pathway, glucuronate interconversions, phenylalanine metabolism, amino acid and sugar metabolism, phenylpropanoid biosynthesis, methane metabolism, and the citrate (citric acid) cycle.

    LTC: a novel algorithm to improve the efficiency of contig assembly for physical mapping in complex genomes

    BACKGROUND: Physical maps are the substrate of genome sequencing and map-based cloning and their construction relies on the accurate assembly of BAC clones into large contigs that are then anchored to genetic maps with molecular markers. High Information Content Fingerprinting has become the method of choice for large and repetitive genomes such as those of maize, barley, and wheat. However, the high level of repeated DNA present in these genomes requires the application of very stringent criteria to ensure a reliable assembly with the FingerPrinted Contig (FPC) software, which often results in short contig lengths (of 3-5 clones before merging) as well as an unreliable assembly in some difficult regions. Difficulties can originate from a non-linear topological structure of clone overlaps, low power of clone ordering algorithms, and the absence of tools to identify sources of gaps in Minimal Tiling Paths (MTPs). RESULTS: To address these problems, we propose a novel approach that: (i) reduces the rate of false connections and Q-clones by using a new cutoff calculation method; (ii) obtains reliable clusters robust to the exclusion of single clone or clone overlap; (iii) explores the topological contig structure by considering contigs as networks of clones connected by significant overlaps; (iv) performs iterative clone clustering combined with ordering and order verification using re-sampling methods; and (v) uses global optimization methods for clone ordering and Band Map construction. The elements of this new analytical framework called Linear Topological Contig (LTC) were applied on datasets used previously for the construction of the physical map of wheat chromosome 3B with FPC. The performance of LTC vs. FPC was compared also on the simulated BAC libraries based on the known genome sequences for chromosome 1 of rice and chromosome 1 of maize. CONCLUSIONS: The results show that compared to other methods, LTC enables the construction of highly reliable and longer contigs (5-12 clones before merging), the detection of "weak" connections in contigs and their "repair", and the elongation of contigs obtained by other assembly methods.

    Dynamic representation of consecutive-ones matrices and interval graphs

    2015 Spring. Includes bibliographical references. We give an algorithm for updating a consecutive-ones ordering of a consecutive-ones matrix when a row or column is added or deleted. When the addition of the row or column would result in a matrix that does not have the consecutive-ones property, we return a well-known minimal forbidden submatrix for the consecutive-ones property, known as a Tucker submatrix, which serves as a certificate of correctness of the output in this case, in O(n log n) time. The ability to return such a certificate within this time bound is one of the new contributions of this work. Using this result, we obtain an O(n) algorithm for updating an interval model of an interval graph when an edge or vertex is added or deleted. This matches the bounds obtained by a previous dynamic interval-graph recognition algorithm due to Crespelle. We improve on Crespelle's result by producing an easy-to-check certificate, known as a Lekkerkerker-Boland subgraph, when a proposed change to the graph results in a graph that is not an interval graph. Our algorithm takes O(n log n) time to produce this certificate. The ability to return such a certificate within this time bound is the second main contribution of this work.
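As a reminder of what the consecutive-ones property asks for, the Python sketch below searches for a column ordering in which the ones of every row are contiguous. It is a brute-force check over all column permutations, usable only on tiny matrices, and is not the dynamic algorithm of the dissertation. The function names and the list-of-lists matrix format are assumptions for illustration.

```python
from itertools import permutations


def row_is_consecutive(row):
    """True if the nonzero entries of the row form one contiguous block."""
    ones = [i for i, x in enumerate(row) if x]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)


def consecutive_ones_ordering(matrix):
    """Return a column order giving consecutive ones in every row, or None.

    matrix: list of equal-length 0/1 rows (hypothetical input format).
    Tries every permutation, so the running time is factorial in the
    number of columns.
    """
    if not matrix:
        return ()
    n_cols = len(matrix[0])
    for order in permutations(range(n_cols)):
        if all(row_is_consecutive([row[j] for j in order]) for row in matrix):
            return order
    return None
```

The 3 x 3 matrix with rows 110, 011, 101 has no such ordering, since each pair of columns would have to be adjacent, so the search returns None for it.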