    On Longest Repeat Queries Using GPU

    Repeat finding in strings has important applications in subfields such as computational biology. The challenge of finding the longest repeats covering particular string positions was recently proposed and solved by İleri et al., using optimal O(n) time and space in total, where n is the string size. However, their solution can only find the leftmost longest repeat for each of the n string positions. It is also not known how to parallelize their solution. In this paper, we propose a new solution for longest repeat finding which, although theoretically suboptimal in time, is conceptually simpler and, in practice, runs faster and uses less memory than the optimal solution. Further, our solution can find all longest repeats of every string position, while still maintaining a faster processing speed and lower memory usage. Moreover, our solution is parallelizable in the shared memory architecture (SMA), enabling it to take advantage of modern multi-processor computing platforms such as general-purpose graphics processing units (GPUs). We have implemented both the sequential and parallel versions of our solution. Experiments with both biological and non-biological data show that our sequential and parallel solutions are faster than the optimal solution by a factor of 2 to 3.5 and 6 to 14, respectively, and use less memory space. Comment: 14 pages
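
To make the problem statement in this abstract concrete, here is a minimal Python sketch of the leftmost-longest-repeat query. It is not the authors' algorithm, nor the O(n) solution of İleri et al. It is a deliberately naive baseline built on a sorted list of suffixes, and the function name `longest_repeats` and its return format are assumptions made for illustration.

```python
def longest_repeats(s):
    """For each position k of s, return (start, length) of the leftmost
    longest repeated substring covering k, or None if no repeat covers k."""
    n = len(s)
    # Naive suffix array: sort suffix start positions lexicographically.
    sa = sorted(range(n), key=lambda i: s[i:])

    def lcp(i, j):
        # Length of the longest common prefix of suffixes s[i:] and s[j:].
        length = 0
        while i + length < n and j + length < n and s[i + length] == s[j + length]:
            length += 1
        return length

    # lr[i] = length of the longest repeat that starts at position i.
    # It equals the largest LCP between suffix i and its neighbours
    # in sorted order.
    lr = [0] * n
    for r in range(1, n):
        shared = lcp(sa[r - 1], sa[r])
        lr[sa[r - 1]] = max(lr[sa[r - 1]], shared)
        lr[sa[r]] = max(lr[sa[r]], shared)

    # A repeat starting at i covers positions i .. i + lr[i] - 1.
    result = [None] * n
    for i in range(n):  # ascending i with strict '>' keeps the leftmost on ties
        for k in range(i, i + lr[i]):
            if result[k] is None or lr[i] > result[k][1]:
                result[k] = (i, lr[i])
    return result


print(longest_repeats("abab"))  # [(0, 2), (0, 2), (2, 2), (2, 2)]
```

This baseline takes quadratic time or worse, so it is only useful for checking answers on small inputs. Reporting all longest repeats per position, as the abstract describes, would mean collecting every start `i` that ties for the maximum instead of keeping only the first.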

    High-Level Expression of Various Apolipoprotein (a) Isoforms by "Transferrinfection". The Role of Kringle IV Sequences in the Extracellular Association with Low-Density Lipoprotein

    Characterization of the assembly of lipoprotein(a) [Lp(a)] is of fundamental importance to understanding the biosynthesis and metabolism of this atherogenic lipoprotein. Since no established cell lines exist that express Lp(a) or apolipoprotein(a) [apo(a)], a "transferrinfection" system for apo(a) was developed utilizing adenovirus receptor- and transferrin receptor-mediated DNA uptake into cells. Using this method, different apo(a) cDNA constructions of variable length, due to the presence of 3, 5, 7, 9, 15, or 18 internal kringle IV sequences, were expressed in COS-7 cells or CHO cells. All constructions contained kringle IV-36, which includes the only unpaired cysteine residue (Cys-4057) in apo(a). r-Apo(a) was synthesized as a precursor and secreted as mature apolipoprotein into the medium. When medium containing r-apo(a) with 9, 15, or 18 kringle IV repeats was mixed with normal human plasma LDL, stable complexes formed that had a buoyant density typical of Lp(a). Association was substantially decreased if Cys-4057 on r-apo(a) was replaced by Arg by site-directed mutagenesis or if Cys-4057 was chemically modified. Lack of association was also observed with r-apo(a) containing only 3, 5, or 7 kringle IV repeats without "unique kringle IV sequences", although Cys-4057 was present in all of these constructions. Synthesis and secretion of r-apo(a) were not dependent on its sialic acid content. r-Apo(a) was expressed even more efficiently in sialylation-defective CHO cells than in wild-type CHO cells. In transfected CHO cells defective in the addition of N-acetylglucosamine, apo(a) secretion was found to be decreased by 50%. Extracellular association with LDL was not affected by the carbohydrate moiety of r-apo(a), indicating a protein-protein interaction between r-apo(a) and apoB. These results show that, besides kringle IV-36, other kringle IV sequences are necessary for the extracellular association of r-apo(a) with LDL. Changes in the carbohydrate moiety of apo(a), however, do not affect complex formation.

    Molecular biology techniques as a tool for detection and characterisation of Mycobacterium avium subsp. paratuberculosis

    Mycobacterium avium subsp. paratuberculosis (M. paratuberculosis) is the causative agent of paratuberculosis, also known as Johne’s disease, a chronic intestinal infection in cattle and other ruminants. Paratuberculosis is characterised by diarrhea and weight loss that occur after a period of a few months up to several years without any clinical signs. The considerable economic losses to dairy and beef cattle producers are caused by reduced milk production and poor reproductive performance in subclinically infected animals. Early diagnosis of infected cattle is essential to prevent the spread of the disease. Efforts have been made to eradicate paratuberculosis by using a detection and cull strategy, but eradication is hampered by the lack of suitable and sensitive diagnostic methods. This thesis, based on five scientific investigations, describes the development of different DNA amplification strategies for detection and characterisation of M. paratuberculosis. Various ways to pre-treat bacterial cultures, tissue specimens and fecal samples prior to PCR analysis were investigated. Internal positive PCR control molecules were developed and used in PCR analyses to improve the reliability and to facilitate the interpretation of the results. The sensitivity of the final methods was found to approximate that of culture and allowed detection of the low numbers of M. paratuberculosis expected to be found in subclinically infected animals. Genomic DNA of a Swedish mycobacterial isolate, incorrectly identified by PCR as M. paratuberculosis, was characterised. The isolate was closely related to M. cookii and harboured one copy of a DNA segment with 94% similarity to IS900, the target sequence used in diagnostic PCR for detection of M. paratuberculosis. This finding highlighted the urgency of developing or evaluating PCR systems based on genes other than IS900. A PCR-based fingerprinting method using primers targeting the enterobacterial intergenic consensus sequence (ERIC) and the IS900 sequence was developed and successfully used to distinguish M. paratuberculosis from closely related mycobacteria, including the above-mentioned mycobacterial isolate. In conclusion, the molecular biology techniques developed in these studies have proved useful for accelerating the diagnostic detection and characterisation of M. paratuberculosis.

    Protein Repeats from First Principles

    Some natural proteins display recurrent structural patterns. Despite being highly similar at the tertiary structure level, repeating patterns within a single repeat protein can be extremely variable at the sequence level. We use a mathematical definition of a repetition and investigate the occurrences of these in sequences of different protein families. We found that long stretches of perfect repetitions are infrequent in individual natural proteins, even for those which are known to fold into structures of recurrent structural motifs. We found that natural repeat proteins are indeed repetitive in their families, exhibiting abundant stretches of 6 amino acids or longer that are perfect repetitions in the reference family. We provide a systematic quantification for this repetitiveness. We show that this form of repetitiveness is not exclusive to repeat proteins, but also occurs in globular domains. A by-product of this work is a fast quantification of the likelihood of a protein to belong to a family.
    Affiliation: Turjanski, Pablo Guillermo. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina.
    Affiliation: Parra, Rodrigo Gonzalo. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales; Argentina.
    Affiliation: Espada, Rocío. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales; Argentina.
    Affiliation: Becher, Veronica Andrea. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina.
    Affiliation: Ferreiro, Diego. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales; Argentina.

    Measuring Plant Genetic Diversity Using Inter-Simple Sequence Repeats (ISSRs)

    Simple sequence repeats (SSRs) have great utility as they are conserved and present in all eukaryotic genomes. Here we report the use of a simple PCR with fluorescently-labelled primers to amplify inter-SSR markers (ISSRs) for diversity assessments. The use of ISSR markers does not rely upon specific genetic sequence information or prolonged method development, and the markers may be measured rapidly using automated equipment. The major restriction of the ISSR method is at the analysis stage: because the markers are dominant, heterozygotes cannot be distinguished from homozygotes at a locus. We obtained ISSR data from ca. 60 phenotypically characterised Capsella bursa-pastoris L. Medic (shepherd's purse) accessions that had been isolated from a diverse mix of arable field sites throughout the UK. We developed mathematical scripts for use with the free statistical software tool R (http://www.rproject.org/) that processed the molecular data in a binary format to estimate genetic diversity (using the Jaccard coefficient), and that related genotype to the plant phenotypic and environmental (site-specific) traits. The methodology established has the power to predict the relationship between environmental and plant morphological characteristics.
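
The Jaccard coefficient mentioned in this abstract can be illustrated with a short sketch. The authors report using R scripts, which are not reproduced here. The Python below is a stand-in under stated assumptions: each accession is scored as a binary band profile (1 = band present, 0 = absent), and the accession names and profiles are made-up illustrative data, not results from the study.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient of two equal-length binary (0/1) band profiles:
    bands present in both, divided by bands present in at least one."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Convention chosen for this sketch: two empty profiles score 0.0.
    return shared / either if either else 0.0


def pairwise_jaccard(profiles):
    """Return {(name1, name2): coefficient} for every pair of accessions."""
    return {
        (p, q): jaccard(profiles[p], profiles[q])
        for p, q in combinations(sorted(profiles), 2)
    }


# Hypothetical band profiles for three accessions (illustrative only).
profiles = {
    "acc_A": [1, 1, 0, 1, 0],
    "acc_B": [1, 0, 0, 1, 0],
    "acc_C": [0, 0, 1, 0, 1],
}

for pair, value in pairwise_jaccard(profiles).items():
    print(pair, round(value, 3))
```

Because shared absences are ignored, the coefficient suits dominant markers such as ISSRs, where a missing band carries less information than a present one.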