
    Reconstituting typeset Marriage Registers using simple software tools

    In a world of fully integrated software applications, which can seem daunting to develop and to maintain, it is sometimes useful to recall that a system of loosely-linked software components can provide surprisingly powerful and flexible methods for software development. This paper describes a project which aims to retypeset a series of volumes from the Phillimore Marriage Registers, first published in England around the turn of the last century. The source material is plain text derived from running Optical Character Recognition (OCR) on a set of page scans taken from the original printed volumes. The regular, tabular structure of the Register pages allows us to automate the re-typesetting process. The UNIX troff software and its tbl preprocessor are used for the typesetting itself, but a series of simple awk-based software tools, all of them parsers and code generators of one sort or another, is used to bring about the OCR-to-troff transformation. By re-parsing the generated troff codes it is possible to produce a surname index as a supplement to the retypeset volume. Moreover, this second-stage parsing has been invaluable in discovering subtle ‘typos’ in the automatically generated material. With small adjustments to this parser it would be possible to output the complete marriage entries in standard XML or GEDCOM notations.

    Accurate and exact CNV identification from targeted high-throughput sequence data

    Background: Massively parallel sequencing of barcoded DNA samples significantly increases screening efficiency for clinically important genes. Short read aligners are well suited to single nucleotide and indel detection. However, methods for CNV detection from targeted enrichment are lacking. We present a method combining coverage with map information for the identification of deletions and duplications in targeted sequence data. Results: Sequencing data is first scanned for gains and losses using a comparison of normalized coverage data between samples. CNV calls are confirmed by testing for a signature of sequences that span the CNV breakpoint. With our method, CNVs can be identified regardless of whether breakpoints are within regions targeted for sequencing. For CNVs where at least one breakpoint is within targeted sequence, exact CNV breakpoints can be identified. In a test data set of 96 subjects sequenced across ~1 Mb genomic sequence using multiplexing technology, our method detected mutations as small as 31 bp, predicted quantitative copy count, and had a low false-positive rate. Conclusions: Application of this method allows for identification of gains and losses in targeted sequence data, providing comprehensive mutation screening when combined with a short read aligner.

    Effects of a recombinant gene expression on ColE1-like plasmid segregation in Escherichia coli

    Background: Segregation of expression plasmids leads to loss of recombinant DNA from transformed bacterial cells due to the irregular distribution of plasmids between the daughter cells during cell division. Under non-selective conditions this segregational instability results in a heterogeneous population of cells, where the non-productive plasmid-free cells overgrow the plasmid-bearing cells, thus decreasing the yield of recombinant protein. Among the factors affecting segregational plasmid instability are the plasmid design, plasmid copy-number, host cell genotype, and fermentation conditions. This study aims to investigate the influence of transcription and translation on the segregation of recombinant plasmids designed for constitutive gene expression in Escherichia coli LE392 under glucose-limited continuous cultivation. To this end, a series of pBR322-based plasmids carrying a synthetic human interferon-gamma (hIFNγ) gene placed under the control of different regulatory elements (promoter and ribosome-binding sites) were used as a model. Results: Bacterial growth and product formation kinetics of transformed E. coli LE392 cells cultivated continuously were described by a structured kinetic model proposed by Lee et al. (1985). The obtained results demonstrated that both transcription and translation efficiency strongly affected plasmid segregation. Segregation of the plasmid with a deleted promoter did not exceed 5% after 190 h of cultivation. The observed high plasmid stability was not related to an increase in the plasmid copy-number. An inverse correlation between the yield of recombinant protein (as modulated by using different ribosome binding sites) and segregational plasmid stability (determined by the above model) was also observed. Conclusions: Switching off transcription of the hIFNγ gene has a stabilising effect on ColE1-like plasmids against segregation, which is not associated with an increase in the plasmid copy-number. Increased constitutive gene expression has a negative effect on segregational plasmid stability. The kinetic model proposed by Lee et al. (1985) was appropriate for describing E. coli cell growth and recombinant product formation in chemostat cultivations.

    Modification of EGF-Like Module 1 of Thrombospondin-1, an Animal Extracellular Protein, by O-Linked N-Acetylglucosamine

    Thrombospondin-1 (TSP-1) is known to be subject to three unusual carbohydrate modifications: C-mannosylation, O-fucosylation, and O-glucosylation. We now describe a fourth: O-β-N-acetylglucosaminylation. Previously, O-β-N-acetylglucosamine (O-β-GlcNAc) was found on a threonine in the loop between the fifth and sixth cysteines of the 20th epidermal growth factor (EGF)-like module of Drosophila Notch. A BLAST search based on the Drosophila Notch loop sequence identified a number of human EGF-like modules that contain a similar sequence, including EGF-like module 1 of TSP-1 and its homolog, TSP-2. TSP-1, which has a potentially modifiable serine in the loop, reacted in immuno-blots with the CTD110.6 anti-O-GlcNAc antibody. Antibody reactivity was diminished by treatment of TSP-1 with β-N-acetylhexosaminidase. TSP-2, which lacks a potentially modifiable serine/threonine in the loop, did not react with CTD110.6. Analysis of tandem modules of TSP-1 localized reactivity of CTD110.6 to EGF-like module 1. Top-down mass spectrometric analysis of EGF-like module 1 demonstrated the expected modifications with glucose (+162 Da) and xylose (+132 Da) separately from modification with N-acetyl hexosamine (+203 Da). Mass spectrometric sequence analysis localized the +203-Da modification to Ser580 in the sequence 575CPPGYSGNGIQC586. These results demonstrate that O-β-N-acetylglucosaminylation can occur on secreted extracellular matrix proteins as well as on cell surface proteins.

    Nucleotide Discrimination with DNA Immobilized in the MspA Nanopore

    Nanopore sequencing has the potential to become a fast and low-cost DNA sequencing platform. An ionic current passing through a small pore would directly map the sequence of single stranded DNA (ssDNA) driven through the constriction. The pore protein, MspA, derived from Mycobacterium smegmatis, has a short and narrow channel constriction ideally suited for nanopore sequencing. To study MspA's ability to resolve nucleotides, we held ssDNA within the pore using a biotin-NeutrAvidin complex. We show that homopolymers of adenine, cytosine, thymine, and guanine in MspA exhibit much larger current differences than in α-hemolysin. Additionally, methylated cytosine is distinguishable from unmethylated cytosine. We establish that single nucleotide substitutions within homopolymer ssDNA can be detected when held in MspA's constriction. Using genomic single nucleotide polymorphisms, we demonstrate that single nucleotides within random DNA can be identified. Our results indicate that MspA has a high signal-to-noise ratio and the single nucleotide sensitivity desired for nanopore sequencing devices.

    Consensus over Random Graph Processes: Network Borel-Cantelli Lemmas for Almost Sure Convergence

    Distributed consensus computation over random graph processes is considered. The random graph process is defined as a sequence of random variables which take values from the set of all possible digraphs over the node set. At each time step, every node updates its state based on a Bernoulli trial, independent in time and among different nodes: either averaging among the neighbor set generated by the random graph, or sticking with its current state. Connectivity-independence and arc-independence are introduced to capture the fundamental influence of the random graphs on the consensus convergence. Necessary and/or sufficient conditions are presented on the success probabilities of the Bernoulli trials for the network to reach a global almost sure consensus, with some sharp threshold established revealing a consensus zero-one law. Convergence rates are established by lower and upper bounds of the ε-computation time. We also generalize the concepts of connectivity/arc independence to their analogues from the *-mixing point of view, so that our results apply to a very wide class of graphical models, including the majority of random graph models in the literature, e.g., Erdős-Rényi, gossiping, and Markovian random graphs. We show that under *-mixing, our convergence analysis continues to hold and the corresponding almost sure consensus conditions are established. Finally, we further investigate almost sure finite-time convergence of random gossiping algorithms, and prove that the Bernoulli trials play a key role in ensuring finite-time convergence. These results add to the understanding of the interplay between random graphs, random computations, and convergence probability for distributed information processing. Comment: IEEE Transactions on Information Theory, In Press

    Disparate oxidant gene expression of airway epithelium compared to alveolar macrophages in smokers

    Background: The small airway epithelium and alveolar macrophages are exposed to oxidants in cigarette smoke, leading to epithelial dysfunction and macrophage activation. In this context, we asked: what is the transcriptome of oxidant-related genes in small airway epithelium and alveolar macrophages, and does their response to inhaled cigarette smoke differ substantially? Methods: Using microarray analysis, with TaqMan RT-PCR confirmation, we assessed oxidant-related gene expression in small airway epithelium and alveolar macrophages from the same healthy nonsmoker and smoker individuals. Results: Of 155 genes surveyed, 87 (56%) were expressed in both cell populations in nonsmokers, with higher expression in alveolar macrophages (43%) compared to airway epithelium (24%). In smokers, there were 15 genes (10%) up-regulated and 7 genes (5%) down-regulated in airway epithelium, but only 3 (2%) up-regulated and 2 (1%) down-regulated in alveolar macrophages. Pathway analysis of airway epithelium showed oxidant pathways dominated, but in alveolar macrophages immune pathways dominated. Conclusion: Thus, the response of different cell-types with an identical genome exposed to the same stress of smoking is different; responses of alveolar macrophages are more subdued than those of airway epithelium. These findings are consistent with the observation that, while the small airway epithelium is vulnerable, alveolar macrophages are not "diseased" in response to smoking. Trial Registration: ClinicalTrials.gov ID: NCT00224185 and NCT00224198

    Copy Number Variation Affecting the Photoperiod-B1 and Vernalization-A1 Genes Is Associated with Altered Flowering Time in Wheat (Triticum aestivum)

    The timing of flowering during the year is an important adaptive character affecting reproductive success in plants and is critical to crop yield. Flowering time has been extensively manipulated in crops such as wheat (Triticum aestivum L.) during domestication, and this enables them to grow productively in a wide range of environments. Several major genes controlling flowering time have been identified in wheat, with mutant alleles having sequence changes such as insertions, deletions or point mutations. We investigated genetic variants in commercial varieties of wheat that regulate flowering by altering photoperiod response (Ppd-B1 alleles) or vernalization requirement (Vrn-A1 alleles) and for which no candidate mutation was found within the gene sequence. Genetic and genomic approaches showed that in both cases alleles conferring altered flowering time had an increased copy number of the gene and altered gene expression. Alleles with an increased copy number of Ppd-B1 confer an early-flowering, day-neutral phenotype and have arisen independently at least twice. Plants with an increased copy number of Vrn-A1 have an increased requirement for vernalization, so that longer periods of cold are required to potentiate flowering. The results suggest that copy number variation (CNV) plays a significant role in wheat adaptation.

    Complex Reorganization and Predominant Non-Homologous Repair Following Chromosomal Breakage in Karyotypically Balanced Germline Rearrangements and Transgenic Integration

    We defined the genetic landscape of balanced chromosomal rearrangements at nucleotide resolution by sequencing 141 breakpoints from cytogenetically-interpreted translocations and inversions. We confirm that the recently described phenomenon of “chromothripsis” (massive chromosomal shattering and reorganization) is not unique to cancer cells but also occurs in the germline, where it can resolve to a karyotypically balanced state with frequent inversions. We detected a high incidence of complex rearrangements (19.2%) and substantially less reliance on microhomology (31%) than previously observed in benign CNVs. We compared these results to experimentally-generated DNA breakage-repair by sequencing seven transgenic animals, and revealed extensive rearrangement of the transgene and host genome with similar complexity to human germline alterations. Inversion is the most common rearrangement, suggesting that a combined mechanism involving template switching and non-homologous repair mediates the formation of balanced complex rearrangements that are viable, stably replicated and transmitted unaltered to subsequent generations.