
    ClustalXeed: a GUI-based grid computation version for high performance and terabyte size multiple sequence alignment

    Background: There is an increasing demand to assemble and align large-scale biological sequence data sets. Commonly used multiple sequence alignment (MSA) programs are still limited in their ability to handle very large numbers of sequences because they lack a scalable high-performance computing (HPC) environment with greatly extended data storage capacity. Results: We designed ClustalXeed, a software system for multiple sequence alignment with incremental improvements over previous versions of the ClustalX and ClustalW-MPI software. The primary advantage of ClustalXeed over other multiple sequence alignment software is its ability to align a large family of protein or nucleic acid sequences. To solve the conventional memory-dependency problem, ClustalXeed uses both physical random access memory (RAM) and a distributed file-allocation system for distance matrix construction and pair-align computation. The computational efficiency of the disk-storage system was markedly improved by implementing an efficient load-balancing algorithm called the "idle node-seeking task algorithm" (INSTA). The new editing option and the graphical user interface (GUI) provide ready access to a parallel-computing environment for users who seek fast and easy alignment of large DNA and protein sequence sets. Conclusions: ClustalXeed can now process large biological sequence data sets that were not tractable with any other parallel or serial MSA program. The main developments are: (1) the ability to tackle larger sequence alignment problems than previous systems, through markedly improved storage handling; (2) an efficient task load-balancing algorithm, INSTA, which improves overall processing times for multiple sequence alignment when input sequences have non-uniform lengths; and (3) support for both single PCs and distributed cluster systems.
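    The abstract does not describe INSTA's internals, so the following is only a generic illustration of why handing each new task to whichever node is idle tends to beat a fixed up-front split when task sizes vary, as they do for pair-align jobs over sequences of non-uniform length. It is a small simulation, not ClustalXeed code; the cost model (pairwise cost proportional to the product of the two sequence lengths) and all function names are assumptions.

```python
import heapq
from itertools import combinations

def pair_costs(lengths):
    """Assumed cost model: aligning sequences i and j costs len_i * len_j,
    as for quadratic-time dynamic-programming pairwise alignment."""
    return [lengths[i] * lengths[j] for i, j in combinations(range(len(lengths)), 2)]

def static_makespan(costs, n_nodes):
    """Static round-robin split: task k goes to node k % n_nodes regardless of load."""
    loads = [0] * n_nodes
    for k, c in enumerate(costs):
        loads[k % n_nodes] += c
    return max(loads)

def idle_node_makespan(costs, n_nodes):
    """Dynamic dispatch: each task, in submission order, goes to the node that becomes idle first."""
    free_at = [(0, node) for node in range(n_nodes)]  # (time the node becomes idle, node id)
    heapq.heapify(free_at)
    for c in costs:
        t, node = heapq.heappop(free_at)
        heapq.heappush(free_at, (t + c, node))
    return max(t for t, _ in free_at)

costs = pair_costs([1000, 1000, 100, 100])
print(static_makespan(costs, 2), idle_node_makespan(costs, 2))
```

    With four sequences of lengths 1000, 1000, 100 and 100 on two nodes, the round-robin split finishes after 1,200,000 cost units and idle-node dispatch after 1,000,000, because the node busy with the single large job is not handed any further work.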

    Reconstructing ‘the Alcoholic’: Recovering from Alcohol Addiction and the Stigma this Entails

    Public perception of alcohol addiction is frequently negative, whilst an important part of recovery is the construction of a positive sense of self. To explore how this might be achieved, we investigated how people who self-identify as in recovery from alcohol problems view themselves and their difficulties with alcohol, and how they make sense of others’ responses to their addiction. Semi-structured interviews with six individuals who had been in recovery for between 5 and 35 years and in contact with Alcoholics Anonymous were analysed using Interpretative Phenomenological Analysis. The participants were acutely aware of stigmatising images of ‘alcoholics’ and described having struggled with a considerable dilemma in accepting this identity themselves. However, to some extent they were able to resist stigma by conceiving of an ‘aware alcoholic self’ which was divorced from their previously unaware self and formed the basis for a new, more knowing and valued identity.

    Glucose-induced down regulation of thiamine transporters in the kidney proximal tubular epithelium produces thiamine insufficiency in diabetes

    Increased renal clearance of thiamine (vitamin B1) occurs in experimental and clinical diabetes, producing thiamine insufficiency that is mediated by impaired tubular re-uptake and linked to the development of diabetic nephropathy. We studied the mechanism of impaired renal re-uptake of thiamine in diabetes. Immunohistochemistry of normal human kidney sections showed intense, polarised staining of the thiamine transporter proteins THTR-1 and THTR-2 at the apical (luminal) membranes of proximal tubules in the cortex, and uniform, diffuse staining of THTR-1 and THTR-2 throughout cells of the collecting ducts in the medulla. Human primary proximal tubule epithelial cells were incubated at low (5 mmol/l) and high (26 mmol/l) glucose concentrations. At high glucose concentration, expression of THTR-1 and THTR-2 decreased (transporter mRNA: −76% and −53%, respectively, p<0.001; transporter protein: −77% and −83%, respectively, p<0.05), concomitant with decreased expression of the transcription factor specificity protein-1. High glucose concentration also produced a 37% decrease in apical-to-basolateral transport of thiamine across cell monolayers. Intensification of glycemic control corrected the increased fractional excretion of thiamine in experimental diabetes. We conclude that glucose-induced decreased expression of thiamine transporters in the tubular epithelium may mediate renal mishandling of thiamine in diabetes. This is a novel mechanism of thiamine insufficiency linked to diabetic nephropathy.
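    Fractional excretion, the quantity reported as corrected by tighter glycemic control, is conventionally the fraction of the filtered load that appears in urine, with creatinine clearance standing in for the glomerular filtration rate: FE_x = (U_x × P_Cr) / (P_x × U_Cr) × 100%. The sketch below applies that standard definition; the study's own assay details are not given in the abstract, and the example concentrations are hypothetical illustrative numbers, not data from the paper.

```python
def fractional_excretion(u_x: float, p_x: float, u_cr: float, p_cr: float) -> float:
    """Fractional excretion of substance x, in percent.

    u_x, p_x: urine and plasma concentrations of x (same units as each other).
    u_cr, p_cr: urine and plasma creatinine concentrations (same units as each other).
    FE_x = (U_x * P_Cr) / (P_x * U_Cr) * 100
    """
    if p_x <= 0 or u_cr <= 0:
        raise ValueError("plasma x and urine creatinine concentrations must be positive")
    return 100.0 * (u_x * p_cr) / (p_x * u_cr)

# Hypothetical, illustrative concentrations only (not data from the study).
print(fractional_excretion(u_x=2.0, p_x=0.05, u_cr=8000.0, p_cr=80.0))
```

    With these made-up values the function evaluates to 40 (percent): 40% of the filtered substance escapes re-uptake. A rise in this figure is what impaired tubular re-uptake would produce.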

    The N-terminal intrinsically disordered domain of mgm101p is localized to the mitochondrial nucleoid.

    The mitochondrial genome maintenance gene, MGM101, is essential for yeasts that depend on mitochondrial DNA replication. It was previously found in Saccharomyces cerevisiae that the carboxy-terminal two-thirds of Mgm101p contains a functional core. Furthermore, this region shows a high level of amino acid sequence conservation across widely diverse species. By contrast, the amino-terminal region, which is also essential for function, shows no recognizable conservation. Using a bioinformatic approach, we find that the functional core from yeast and the corresponding region of Mgm101p from the coral Acropora millepora have an ordered structure, while the N-terminal domains of the yeast and coral sequences are predicted to be disordered. To examine whether the ordered and disordered domains of Mgm101p have specific or general functions, we made chimeric proteins from yeast and coral by swapping the two regions. Using an in vivo assay in S. cerevisiae, we find that the ordered domain of A. millepora can functionally replace the yeast core region, but the disordered domain of the coral protein cannot substitute for its yeast counterpart. Mgm101p is found in the mitochondrial nucleoid along with enzymes and proteins involved in mtDNA replication. By attaching green fluorescent protein (GFP) to the N-terminal disordered domain of yeast Mgm101p, we find that GFP is still directed to the mitochondrial nucleoid, where full-length Mgm101p-GFP is targeted.

    Accurate reconstruction of insertion-deletion histories by statistical phylogenetics

    The Multiple Sequence Alignment (MSA) is a computational abstraction that represents a partial summary of either indel history or structural similarity. Taking the former view (indel history), it is possible to use formal automata theory to generalize the phylogenetic likelihood framework for finite substitution models (Dayhoff's probability matrices and Felsenstein's pruning algorithm) to arbitrary-length sequences. In this paper, we report results of a simulation-based benchmark of several methods for reconstructing indel history. The methods tested include a relatively new algorithm for statistical marginalization of MSAs that sums over a stochastically sampled ensemble of the most probable evolutionary histories. For mammalian evolutionary parameters on several different trees, the single most likely history sampled by our algorithm appears less biased than histories reconstructed by other MSA methods. The algorithm can also be used for alignment-free inference, where the MSA is explicitly summed out of the analysis. As an illustration of our method, we discuss reconstruction of the evolutionary histories of human protein-coding genes.
    Comment: 28 pages, 15 figures. arXiv admin note: text overlap with arXiv:1103.434
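    Felsenstein's pruning algorithm, named above as part of the finite-substitution likelihood framework that the paper generalizes, computes the likelihood of one alignment column on a fixed tree by passing conditional likelihoods from the leaves up to the root. A minimal sketch for DNA under the Jukes-Cantor (JC69) model follows; the model choice, the nested-tuple tree encoding and the function names are illustrative assumptions, and the sketch handles substitutions only, not the indels that the paper's automata-based extension addresses.

```python
import math

BASES = "ACGT"

def jc69_prob(i: int, j: int, t: float) -> float:
    """Jukes-Cantor probability of state j at the end of a branch of length t
    (expected substitutions per site), given state i at its start."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def partials(node, site):
    """Conditional likelihoods L(node, state) for one alignment column.

    A leaf is a name (str); an internal node is (left, t_left, right, t_right).
    site maps each leaf name to its observed base.
    """
    if isinstance(node, str):
        return [1.0 if b == site[node] else 0.0 for b in BASES]
    left, t_left, right, t_right = node
    l_left, l_right = partials(left, site), partials(right, site)
    result = []
    for i in range(4):
        # Sum over the unknown state at each child, weighted by the branch transition probability.
        from_left = sum(jc69_prob(i, j, t_left) * l_left[j] for j in range(4))
        from_right = sum(jc69_prob(i, j, t_right) * l_right[j] for j in range(4))
        result.append(from_left * from_right)
    return result

def site_likelihood(tree, site) -> float:
    """Weight the root partials by the JC69 stationary distribution (uniform 1/4)."""
    return sum(0.25 * x for x in partials(tree, site))

tree = (("human", 0.1, "mouse", 0.2), 0.05, "dog", 0.3)
print(site_likelihood(tree, {"human": "A", "mouse": "A", "dog": "G"}))
```

    The recursion visits each node once per column, so the cost is linear in the number of leaves rather than exponential in the number of unobserved ancestral states.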

    PPCAS: Implementation of a Probabilistic Pairwise Model for Consistency-Based Multiple Alignment in Apache Spark

    Large-scale data processing techniques, currently known as Big Data, are used to manage the huge amounts of data generated by sequencers. Although these techniques have significant advantages, few biological applications have adopted them. In bioinformatics, Multiple Sequence Alignment (MSA) tools are widely applied in evolutionary and phylogenetic analysis and in homology and domain structure prediction. Highly rated MSA tools, such as MAFFT, ProbCons and T-Coffee (TC), apply probabilistic consistency as a step prior to the progressive alignment stage in order to improve final accuracy. In this paper, a novel approach named PPCAS (Probabilistic Pairwise model for Consistency-based multiple alignment in Apache Spark) is presented. PPCAS is based on the MapReduce processing paradigm, which enables large datasets to be processed, with the aim of improving the performance and scalability of the original algorithm. This work was supported by the MEyC-Spain [contract TIN2014-53234-C2-2-R].
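    In ProbCons-style probabilistic consistency, each pairwise posterior match-probability matrix is re-estimated through every sequence z in the set S: P′_xy = (1/|S|) Σ_z P_xz P_zy, with P_xx taken as the identity. The sketch below expresses that update as a map step (one partial product per x, y, z) and a reduce step (sum over z), which is one natural way to fit it to MapReduce. It runs in a single Python process; the decomposition, the dict layout and the function names are assumptions, not PPCAS's actual Spark implementation.

```python
from collections import defaultdict

def matmul(a, b):
    """Dense matrix product of nested lists (rows of a times columns of b)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def posterior(P, lengths, x, y):
    """Posterior match-probability matrix for (x, y), |x| rows by |y| columns.
    P_xx is the identity; a missing (x, y) entry is read as the transpose of (y, x)."""
    if x == y:
        n = lengths[x]
        return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    if (x, y) in P:
        return P[(x, y)]
    return [list(col) for col in zip(*P[(y, x)])]

def map_pair(P, lengths, x, y):
    """Map step: emit one partial product P_xz @ P_zy for every sequence z in S."""
    for z in lengths:
        yield (x, y), matmul(posterior(P, lengths, x, z), posterior(P, lengths, z, y))

def reduce_pair(partials, n_seqs):
    """Reduce step: sum the partial products and divide by |S|."""
    total = partials[0]
    for m in partials[1:]:
        total = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(total, m)]
    return [[v / n_seqs for v in row] for row in total]

def consistency_transform(P, lengths):
    """One round of the consistency transformation for every pair x < y."""
    names = sorted(lengths)
    grouped = defaultdict(list)  # stands in for the shuffle that groups values by key
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            for key, partial in map_pair(P, lengths, x, y):
                grouped[key].append(partial)
    return {key: reduce_pair(parts, len(names)) for key, parts in grouped.items()}

P = {("a", "b"): [[0.5]], ("a", "c"): [[0.8]], ("b", "c"): [[0.6]]}
print(consistency_transform(P, {"a": 1, "b": 1, "c": 1}))
```

    Because every (x, y, z) product is independent, the map step parallelizes across workers, and only the final sums per pair need to be gathered.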

    Optimizing substitution matrix choice and gap parameters for sequence alignment

    Background: While substitution matrices can readily be computed from reference alignments, it is challenging to compute optimal or approximately optimal gap penalties. It is also not well understood which substitution matrices are the most effective when the goal is alignment accuracy rather than homolog recognition. Here a new parameter optimization procedure, POP, is described and applied to the problems of optimizing gap penalties and selecting substitution matrices for pair-wise global protein alignments. Results: POP is compared with a recent method by Kim and Kececioglu and is found to achieve 0.2% to 1.3% higher accuracy on pair-wise benchmarks extracted from BALIBASE. The VTML matrix series is shown to be the most accurate on several global pair-wise alignment benchmarks, with VTML200 giving the best or close to the best performance in all tests. BLOSUM matrices are found to be slightly inferior, even with the marginal improvements in the bug-fixed RBLOSUM series. The PAM series is significantly worse, giving accuracies typically 2% lower than VTML. Integer rounding is found to degrade accuracy slightly. No evidence is found that selecting a matrix based on sequence divergence improves accuracy, suggesting that the use of this heuristic in CLUSTALW may be ineffective. Using VTML200 is found to improve the accuracy of CLUSTALW by 8% on BALIBASE and 5% on PREFAB. Conclusion: The hypothesis that more accurate alignments of distantly related sequences may be achieved using low-identity matrices is shown to be false for commonly used matrix types. Source code and test data are freely available from the author's web site at http://www.drive5.com/pop.
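    POP tunes the parameters that a global pairwise aligner consumes: a substitution matrix plus affine gap penalties. The sketch below shows where those parameters enter, using Gotoh's affine-gap dynamic programming. The toy match/mismatch scorer stands in for a real matrix such as VTML200, whose values are not reproduced here, and the gap-cost convention (open + (L − 1) × extend for a gap of length L) is an assumption, since tools differ on it.

```python
NEG = float("-inf")

def global_affine_score(a, b, score, gap_open, gap_extend):
    """Gotoh's algorithm: best global alignment score of a and b, where a gap of
    length L costs gap_open + (L - 1) * gap_extend (penalties given as positive numbers)."""
    n, m = len(a), len(b)
    M = [[NEG] * (m + 1) for _ in range(n + 1)]  # a[i-1] aligned to b[j-1]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]  # a[i-1] aligned to a gap
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]  # b[j-1] aligned to a gap
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -gap_open - (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = -gap_open - (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = score(a[i - 1], b[j - 1])
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - gap_open, X[i - 1][j] - gap_extend, Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open, Y[i][j - 1] - gap_extend, X[i][j - 1] - gap_open)
    return max(M[n][m], X[n][m], Y[n][m])

def toy_score(x, y):
    """Hypothetical stand-in for a substitution matrix: +2 match, -1 mismatch."""
    return 2 if x == y else -1

print(global_affine_score("ACGT", "AGT", toy_score, gap_open=3, gap_extend=1))
```

    A procedure like POP would repeatedly call such an aligner with candidate (matrix, gap_open, gap_extend) settings and score the resulting alignments against reference alignments; that outer search loop is not shown here.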

    The genome sequence of Trypanosoma brucei gambiense, causative agent of chronic Human African Trypanosomiasis

    Background: Trypanosoma brucei gambiense is the causative agent of chronic Human African Trypanosomiasis, or sleeping sickness, a disease endemic across often poor and rural areas of Western and Central Africa. We have previously published the genome sequence of a T. b. brucei isolate and have now employed a comparative genomics approach to understand the scale of genomic variation between T. b. gambiense and the reference genome. We sought to identify features that were uniquely associated with T. b. gambiense and its ability to infect humans. Methods and findings: An improved high-quality draft genome sequence for the group 1 T. b. gambiense DAL 972 isolate was produced using a whole-genome shotgun strategy. Comparison with T. b. brucei showed that sequence identity averages 99.2% in coding regions and that gene order is largely collinear. However, variation associated with segmental duplications and tandem gene arrays suggests some reduction of functional repertoire in T. b. gambiense DAL 972. A comparison of the variant surface glycoproteins (VSGs) in T. b. brucei with all T. b. gambiense sequence reads showed that the essential structural repertoire of VSG domains is conserved across T. brucei. Conclusions: This study provides the first estimate of intraspecific genomic variation within T. brucei and so has important consequences for future population genomics studies. We have shown that the T. b. gambiense genome corresponds closely with the reference, which should therefore be an effective scaffold for any T. brucei genome sequence data. As the VSG repertoire is also well conserved, it may be feasible to describe the total diversity of variant antigens. While we describe several as yet uncharacterized gene families with predicted cell surface roles that were expanded in number in T. b. brucei, no T. b. gambiense-specific gene was identified outside of the subtelomeres that could explain the ability to infect humans.
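    The 99.2% figure is a sequence-identity statistic over aligned coding regions. How such a figure is computed depends on conventions the abstract does not state (for example, whether gap columns count and whether genes are pooled or averaged), so the sketch below shows one common choice, which is an assumption: identity over gap-free columns, pooled across gene pairs by summing matches and compared columns. The function names are hypothetical.

```python
from typing import Iterable, Tuple

def identity_counts(aln_a: str, aln_b: str) -> Tuple[int, int]:
    """Return (matches, compared) over columns where neither row has a gap '-'."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(aln_a.upper(), aln_b.upper()) if x != "-" and y != "-"]
    matches = sum(1 for x, y in compared if x == y)
    return matches, len(compared)

def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of one aligned pair, ignoring gapped columns."""
    matches, compared = identity_counts(aln_a, aln_b)
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared

def pooled_identity(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pool across aligned gene pairs: total matches / total compared columns."""
    total_matches = total_compared = 0
    for a, b in pairs:
        m, c = identity_counts(a, b)
        total_matches += m
        total_compared += c
    if total_compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * total_matches / total_compared

print(percent_identity("ATGC-A", "ATGAAA"))
```

    Pooling weights long genes more heavily than a plain mean of per-gene identities would, so the two summaries can differ when identity varies with gene length.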