2,117 research outputs found

    Towards a lightweight generic computational grid framework for biological research

    Background: An increasing number of scientific research projects require access to large-scale computational resources. This is particularly true in the biological field, whether to facilitate the analysis of large high-throughput data sets, or to perform large numbers of complex simulations – a characteristic of the emerging field of systems biology. Results: In this paper we present a lightweight generic framework for combining disparate computational resources at multiple sites (ranging from local computers and clusters to established national Grid services). A detailed guide describing how to set up the framework is available from the following URL: http://igrid-ext.cryst.bbk.ac.uk/portal_guide/. Conclusion: This approach is particularly (but not exclusively) appropriate for large-scale biology projects with multiple collaborators working at different national or international sites. The framework is relatively easy to set up, hides the complexity of Grid middleware from the user, and provides access to resources through a single, uniform interface. It has been developed as part of the European ImmunoGrid project.

    A text-mining system for extracting metabolic reactions from full-text articles

    Background: Increasingly, biological text mining research is focusing on the extraction of complex relationships relevant to the construction and curation of biological networks and pathways. However, one important category of pathway, metabolic pathways, has been largely neglected. Here we present a relatively simple method for extracting metabolic reaction information from free text that scores different permutations of assigned entities (enzymes and metabolites) within a given sentence based on the presence and location of stemmed keywords. This method extends an approach that has proved effective in the context of the extraction of protein–protein interactions. Results: When evaluated on a set of manually curated metabolic pathways using standard performance criteria, our method performs surprisingly well. Precision and recall rates are comparable to those previously achieved for the well-known protein–protein interaction extraction task. Conclusions: We conclude that automated metabolic pathway construction is more tractable than has often been assumed, and that (as in the case of protein–protein interaction extraction) relatively simple text-mining approaches can prove surprisingly effective. It is hoped that these results will provide an impetus to further research and act as a useful benchmark for judging the performance of more sophisticated methods that are yet to be developed.
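The abstract above describes scoring permutations of tagged enzymes and metabolites in a sentence by the presence and location of stemmed keywords. The following is only a minimal sketch of that general idea, not the authors' method: the keyword stems, the weights, the location rule and the ordering bonus are all assumptions made for illustration, and the function names are hypothetical.

```python
from itertools import permutations

# Hypothetical keyword stems and weights, chosen only for illustration.
KEYWORD_STEMS = {"catalys": 2.0, "catalyz": 2.0, "convert": 2.0, "produc": 1.5}


def keyword_weight(token):
    """Return the weight of the first keyword stem the token starts with, else 0."""
    lowered = token.lower()
    for stem, weight in KEYWORD_STEMS.items():
        if lowered.startswith(stem):
            return weight
    return 0.0


def score_assignment(tokens, enzyme, substrate, product):
    """Score one (enzyme, substrate, product) assignment of token indices."""
    lo = min(enzyme, substrate, product)
    hi = max(enzyme, substrate, product)
    score = 0.0
    for i, token in enumerate(tokens):
        weight = keyword_weight(token)
        # Assumed location rule: keywords inside the span covered by the
        # three entities count in full, keywords outside it count half.
        score += weight if lo < i < hi else 0.5 * weight
    # Assumed bonus when the substrate is mentioned before the product.
    if substrate < product:
        score += 1.0
    return score


def rank_reactions(tokens, enzyme_positions, metabolite_positions):
    """Score every enzyme with every ordered metabolite pair, best first."""
    results = []
    for enzyme in enzyme_positions:
        for substrate, product in permutations(metabolite_positions, 2):
            score = score_assignment(tokens, enzyme, substrate, product)
            results.append((score, enzyme, substrate, product))
    return sorted(results, key=lambda r: r[0], reverse=True)


if __name__ == "__main__":
    sentence = "Hexokinase converts glucose to glucose-6-phosphate".split()
    # Entity positions would normally come from a named entity tagger.
    for score, e, s, p in rank_reactions(sentence, [0], [2, 4]):
        print(score, sentence[e], sentence[s], "->", sentence[p])
```

In this toy sentence the two candidate assignments differ only in which metabolite is treated as the substrate, so the ordering bonus decides the ranking; a real system would need a proper stemmer and a curated keyword list.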

    Utilities for high-throughput analysis of B-cell clonal lineages

    There are at present few tools available to assist with the determination and analysis of B-cell lineage trees from next-generation sequencing data. Here we present two utilities that support automated large-scale analysis and the creation of publication-quality results. The tools are available on the web and can also be downloaded for integration into an automated pipeline. Critically, and in contrast to previously published tools, these utilities can be used with any suitable phylogenetic inference method and with any antibody germline library, and hence are species-independent.

    Investigating substitutions in antibody–antigen complexes using molecular dynamics: a case study with broad-spectrum, Influenza A antibodies

    In studying the binding of host antibodies to the surface antigens of pathogens, the structural and functional characterization of antibody–antigen complexes by X-ray crystallography and binding assay is important. However, the characterization requires experiments that are typically time consuming and expensive: thus, many antibody–antigen complexes are under-characterized. For vaccine development and disease surveillance, it is often vital to assess the impact of amino acid substitutions on antibody binding. For example, are there antibody substitutions capable of improving binding without a loss of breadth, or antigen substitutions that lead to antigenic escape? The questions cannot be answered reliably from sequence variation alone, exhaustive substitution assays are usually impractical, and alanine scans provide at best an incomplete identification of the critical residue–residue interactions. Here, we show that, given an initial structure of an antibody bound to an antigen, molecular dynamics simulations using the energy method molecular mechanics with Generalized Born surface area (MM/GBSA) can model the impact of single amino acid substitutions on antibody–antigen binding energy. We apply the technique to three broad-spectrum antibodies to influenza A hemagglutinin and examine both previously characterized and novel variant strains observed in the human population that may give rise to antigenic escape. We find that in some cases the impact of a substitution is local, while in others it causes a reorientation of the antibody with wide-ranging impact on residue–residue interactions: this explains, in part, why the change in chemical properties of a residue can be, on its own, a poor predictor of overall change in binding energy. Our estimates are in good agreement with experimental results; indeed, they approximate the degree of agreement between different experimental techniques. Simulations were performed on commodity computer hardware; hence, this approach has the potential to be widely adopted by those undertaking infectious disease research. Novel aspects of this research include the application of MM/GBSA to investigate binding between broadly binding antibodies and a viral glycoprotein; the development of an approach for visualizing substrate–ligand interactions; and the use of experimental assay data to rescale our predictions, allowing us to make inferences about absolute, as well as relative, changes in binding energy.

    A realistic assessment of methods for extracting gene/protein interactions from free text

    Background: The automated extraction of gene and/or protein interactions from the literature is one of the most important targets of biomedical text mining research. In this paper we present a realistic evaluation of gene/protein interaction mining relevant to potential non-specialist users. Hence we have specifically avoided methods that are complex to install or require reimplementation, and we coupled our chosen extraction methods with a state-of-the-art biomedical named entity tagger. Results: Our results show: that performance across different evaluation corpora is extremely variable; that the use of tagged (as opposed to gold standard) gene and protein names has a significant impact on performance, with a drop in F-score of over 20 percentage points being commonplace; and that a simple keyword-based benchmark algorithm when coupled with a named entity tagger outperforms two of the tools most widely used to extract gene/protein interactions. Conclusion: In terms of availability, ease of use and performance, the potential non-specialist user community interested in automatically extracting gene and/or protein interactions from free text is poorly served by current tools and systems. The public release of extraction tools that are easy to install and use, and that achieve state-of-the-art levels of performance, should be treated as a high priority by the biomedical text mining community.
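For readers unfamiliar with the F-score mentioned above, the sketch below shows how precision, recall and the balanced F-score (F1) are commonly computed for an interaction extraction task. It assumes interactions are undirected pairs of names and that exact matching against a gold standard is used; the abstract does not state the matching criteria of the study, and the example pairs are invented.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall and balanced F-score for two sets of items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def pair(a, b):
    """Represent an undirected interaction so that (a, b) equals (b, a)."""
    return frozenset((a, b))


if __name__ == "__main__":
    # Invented example data, for illustration only.
    gold = [pair("A", "B"), pair("A", "C"), pair("B", "D"), pair("C", "D")]
    predicted = [pair("B", "A"), pair("A", "C"), pair("B", "C")]
    p, r, f = precision_recall_f1(predicted, gold)
    print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Because F1 is a harmonic mean, it is pulled towards the lower of precision and recall, which is one reason errors introduced by an entity tagger can reduce it sharply.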

    An intuitionistic approach to scoring DNA sequences against transcription factor binding site motifs

    Background: Transcription factors (TFs) control transcription by binding to specific regions of DNA called transcription factor binding sites (TFBSs). The identification of TFBSs is a crucial problem in computational biology and includes the subtask of predicting the location of known TFBS motifs in a given DNA sequence. It has previously been shown that, when scoring matches to known TFBS motifs, interdependencies between positions within a motif should be taken into account. However, this remains a challenging task owing to the fact that sequences similar to those of known TFBSs can occur by chance with a relatively high frequency. Here we present a new method for matching sequences to TFBS motifs based on intuitionistic fuzzy sets (IFS) theory, an approach that has been shown to be particularly appropriate for tackling problems that embody a high degree of uncertainty. Results: We propose SCintuit, a new scoring method for measuring sequence-motif affinity based on IFS theory. Unlike existing methods that consider dependencies between positions, SCintuit is designed to prevent overestimation of less conserved positions of TFBSs. For a given pair of bases, SCintuit is computed not only as a function of their combined probability of occurrence, but also taking into account the individual importance of each single base at its corresponding position. We used SCintuit to identify known TFBSs in DNA sequences. Our method provides excellent results when dealing with both synthetic and real data, outperforming the sensitivity and the specificity of two existing methods in all the experiments we performed. Conclusions: The results show that SCintuit improves on the prediction quality of the existing approaches for TFs without compromising sensitivity. In addition, we show how SCintuit can be successfully applied to real research problems. In this study the reliability of the IFS theory for motif discovery tasks is proven.
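To make the subtask of locating a known motif in a DNA sequence concrete, the sketch below implements a standard position weight matrix (PWM) log-odds scan. This is deliberately not SCintuit: a PWM treats motif positions as independent, which is exactly the simplification that dependency-aware methods such as the one described above aim to move beyond. The example binding sites, the pseudocount and the uniform background frequency are assumptions for illustration.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from aligned, equal-length sites."""
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        # Start from the pseudocount so unseen bases get a finite score.
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append(
            {base: math.log2((counts[base] / total) / background) for base in BASES}
        )
    return pwm


def scan(sequence, pwm):
    """Score every window of the sequence; returns (start, score) pairs."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Positions are scored independently and summed (the PWM assumption).
        score = sum(pwm[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits


if __name__ == "__main__":
    # Invented example sites resembling a TATA-like motif.
    sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
    pwm = build_pwm(sites)
    hits = scan("GGCTATAATGC", pwm)
    best_start, best_score = max(hits, key=lambda hit: hit[1])
    print(best_start, round(best_score, 3))
```

A threshold on the window score would normally be used to call binding sites; choosing it is where the chance-match problem mentioned in the abstract becomes important.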

    Dynamical principles in neuroscience

    Dynamical modeling of neural systems and brain functions has a history of success over the last half century. This includes, for example, the explanation and prediction of some features of neural rhythmic behaviors. Many interesting dynamical models of learning and memory based on physiological experiments have been suggested over the last two decades. Dynamical models even of consciousness now exist. Usually these models and results are based on traditional approaches and paradigms of nonlinear dynamics including dynamical chaos. Neural systems are, however, an unusual subject for nonlinear dynamics for several reasons: (i) Even the simplest neural network, with only a few neurons and synaptic connections, has an enormous number of variables and control parameters. These make neural systems adaptive and flexible, and are critical to their biological function. (ii) In contrast to traditional physical systems described by well-known basic principles, first principles governing the dynamics of neural systems are unknown. (iii) Many different neural systems exhibit similar dynamics despite having different architectures and different levels of complexity. (iv) The network architecture and connection strengths are usually not known in detail and therefore the dynamical analysis must, in some sense, be probabilistic. (v) Since nervous systems are able to organize behavior based on sensory inputs, the dynamical modeling of these systems has to explain the transformation of temporal information into combinatorial or combinatorial-temporal codes, and vice versa, for memory and recognition. In this review these problems are discussed in the context of addressing the stimulating questions: What can neuroscience learn from nonlinear dynamics, and what can nonlinear dynamics learn from neuroscience? This work was supported by NSF Grant No. NSF/EIA-0130708, and Grant No. PHY 0414174; NIH Grant No. 1 R01 NS50945 and Grant No. NS40110; MEC BFI2003-07276, and Fundación BBVA.

    Flexibility and intrinsic disorder are conserved features of hepatitis C virus E2 glycoprotein

    The glycoproteins of hepatitis C virus, E1E2, are unlike any other viral fusion machinery yet described, and are the current focus of immunogen design in HCV vaccine development, making E1E2 both scientifically and medically important. We used pre-existing, but fragmentary, structures to model a complete ectodomain of the major glycoprotein E2 from three strains of HCV. We then performed molecular dynamics simulations to explore the conformational landscape of E2, revealing a number of important features. Despite high sequence divergence, and subtle differences in the models, E2 from different strains behaves similarly, possessing a stable core flanked by highly flexible regions, some of which perform essential functions such as receptor binding. Comparison with sequence data suggests that this consistent behaviour is conferred by a network of conserved residues that act as hinge and anchor points throughout E2. The variable regions (HVR-1, HVR-2 and VR-3) exhibit particularly high flexibility, and bioinformatic analysis suggests that HVR-1 is a putative intrinsically disordered protein region. Dynamic cross-correlation analyses demonstrate intramolecular communication and suggest that specific regions, such as HVR-1, can exert influence throughout E2. To support our computational approach we performed small-angle X-ray scattering with purified E2 ectodomain; these data were consistent with our MD experiments, suggesting a compact globular core with peripheral flexible regions. This work captures the dynamic behaviour of E2 and has direct relevance to the interaction of HCV with cell-surface receptors and neutralising antibodies.

    Vigorous lateral export of the meltwater outflow from beneath an Antarctic ice shelf

    The instability and accelerated melting of the Antarctic Ice Sheet are among the foremost elements of contemporary global climate change [1, 2]. The increased freshwater output from Antarctica is important in determining sea level rise [1], the fate of Antarctic sea ice and its effect on the Earth’s albedo [4, 5], ongoing changes in global deep-ocean ventilation [6], and the evolution of Southern Ocean ecosystems and carbon cycling [7, 8]. A key uncertainty in assessing and predicting the impacts of Antarctic Ice Sheet melting concerns the vertical distribution of the exported meltwater. This is usually represented by climate-scale models [3–5, 9] as a near-surface freshwater input to the ocean, yet measurements around Antarctica reveal the meltwater to be concentrated at deeper levels [10–14]. Here we use observations of the turbulent properties of the meltwater outflows from beneath a rapidly melting Antarctic ice shelf to identify the mechanism responsible for the depth of the meltwater. We show that the initial ascent of the meltwater outflow from the ice shelf cavity triggers a centrifugal overturning instability that grows by extracting kinetic energy from the lateral shear of the background oceanic flow. The instability promotes vigorous lateral export, rapid dilution by turbulent mixing, and finally settling of meltwater at depth. We use an idealized ocean circulation model to show that this mechanism is relevant to a broad spectrum of Antarctic ice shelves. Our findings demonstrate that the mechanism producing meltwater at depth is a dynamically robust feature of Antarctic melting that should be incorporated into climate-scale models.