79 research outputs found

    An Exact Algorithm for Side-Chain Placement in Protein Design

    Computational protein design aims at constructing novel or improved functions on the structure of a given protein backbone and has important applications in the pharmaceutical and biotechnical industry. The underlying combinatorial side-chain placement problem consists of choosing a side-chain placement for each residue position such that the resulting overall energy is minimum. The choice of the side-chain then also determines the amino acid for this position. Many algorithms for this NP-hard problem have been proposed in the context of homology modeling, which, however, reach their limits when faced with large protein design instances. In this paper, we propose a new exact method for the side-chain placement problem that works well even for large instance sizes as they appear in protein design. Our main contribution is a dedicated branch-and-bound algorithm that combines tight upper and lower bounds resulting from a novel Lagrangian relaxation approach for side-chain placement. Our experimental results show that our method outperforms alternative state-of-the-art exact approaches and makes it possible to optimally solve large protein design instances routinely.

    Exploiting physico-chemical properties in string kernels

    Background: String kernels are commonly used for the classification of biological sequences, nucleotide as well as amino acid sequences. Although string kernels are already very powerful, when it comes to amino acids they have a major shortcoming. They ignore an important piece of information when comparing amino acids: the physico-chemical properties such as size, hydrophobicity, or charge. This information is very valuable, especially when training data is less abundant. There have been only very few approaches so far that aim at combining these two ideas. Results: We propose new string kernels that combine the benefits of physico-chemical descriptors for amino acids with the ones of string kernels. The benefits of the proposed kernels are assessed on two problems: MHC-peptide binding classification using position specific kernels and protein classification based on the substring spectrum of the sequences. Our experiments demonstrate that the incorporation of amino acid properties in string kernels yields improved performance compared to standard string kernels and to previously proposed non-substring kernels. Conclusions: In summary, the proposed modifications, in particular the combination with the RBF substring kernel, consistently yield improvements without affecting the computational complexity. The proposed kernels therefore appear to be the kernels of choice for any protein sequence-based inference. Availability: Data sets, code and additional information are available from http://www.fml.tuebingen.mpg.de/raetsch/suppl/aask. Implementations of the developed kernels are available as part of the Shogun toolbox.

    Inferring latent task structure for Multitask Learning by Multiple Kernel Learning

    Background: The lack of sufficient training data is the limiting factor for many Machine Learning applications in Computational Biology. If data is available for several different but related problem domains, Multitask Learning algorithms can be used to learn a model based on all available information. In Bioinformatics, many problems can be cast into the Multitask Learning scenario by incorporating data from several organisms. However, combining information from several tasks requires careful consideration of the degree of similarity between tasks. Our proposed method simultaneously learns or refines the similarity between tasks along with the Multitask Learning classifier. This is done by formulating the Multitask Learning problem as Multiple Kernel Learning, using the recently published q-Norm MKL algorithm. Results: We demonstrate the performance of our method on two problems from Computational Biology. First, we show that our method is able to improve performance on a splice site dataset with given hierarchical task structure by refining the task relationships. Second, we consider an MHC-I dataset, for which we assume no knowledge about the degree of task relatedness. Here, we are able to learn the task similarities ab initio along with the Multitask classifiers. In both cases, we outperform baseline methods that we compare against. Conclusions: We present a novel approach to Multitask Learning that is capable of learning task similarity along with the classifiers. The framework is very general, as it allows prior knowledge about task relationships to be incorporated if available, but is also able to identify task similarities in the absence of such prior information. Both variants show promising results in applications from Computational Biology.

    Genomic and Geographic Context for the Evolution of High-Risk Carbapenem-Resistant Enterobacter cloacae Complex Clones ST171 and ST78

    Recent reports have established the escalating threat of carbapenem-resistant Enterobacter cloacae complex (CREC). Here, we demonstrate that CREC has evolved as a highly antibiotic-resistant rather than highly virulent nosocomial pathogen. Applying genomics and Bayesian phylogenetic analyses to a 7-year collection of CREC isolates from a northern Manhattan hospital system and to a large set of publicly available, geographically diverse genomes, we demonstrate clonal spread of a single clone, ST171. We estimate that two major clades of epidemic ST171 diverged prior to 1962, subsequently spreading in parallel from the Northeastern to the Mid-Atlantic and Midwestern United States and demonstrating links to international sites. Acquisition of carbapenem and fluoroquinolone resistance determinants by both clades preceded widespread use of these drugs in the mid-1980s, suggesting that antibiotic pressure contributed substantially to its spread. Despite a unique mobile repertoire, ST171 isolates showed decreased virulence in vitro. While a second clone, ST78, substantially contributed to the emergence of CREC, it encompasses diverse carbapenemase-harboring plasmids, including a potentially hypertransmissible IncN plasmid, also present in other sequence types. Rather than heightened virulence, CREC demonstrates lineage-specific, multifactorial adaptations to nosocomial environments coupled with a unique potential to acquire and disseminate carbapenem resistance genes. These findings indicate a need for robust surveillance efforts that are attentive to the potential for local and international spread of high-risk CREC clones. IMPORTANCE: Carbapenem-resistant Enterobacter cloacae complex (CREC) has emerged as a formidable nosocomial pathogen. While sporadic acquisition of plasmid-encoded carbapenemases has been implicated as a major driver of CREC, ST171 and ST78 clones demonstrate epidemic potential. However, a lack of reliable genomic references and rigorous statistical analyses has left many gaps in knowledge regarding the phylogenetic context and evolutionary pathways of successful CREC. Our reconstruction of recent ST171 and ST78 evolution represents a significant addition to current understanding of CREC and the directionality of its spread from the Eastern United States to the northern Midwestern United States with links to international collections. Our results indicate that the remarkable ability of E. cloacae to acquire and disseminate cross-class antibiotic resistance rather than virulence determinants, coupled with its ability to adapt under conditions of antibiotic pressure, likely led to the wide dissemination of CREC.

    Vaccination with designed neopeptides induces intratumoral, cross-reactive CD4+ T cell responses in glioblastoma

    Purpose: The low mutational load of some cancers is considered one reason for the difficulty of developing effective tumor vaccines. To overcome this problem, we developed a strategy to design neopeptides through single amino acid mutation to enhance their immunogenicity. Experimental Design: Exome and RNA sequencing as well as in silico HLA-binding predictions to autologous HLA molecules were used to identify candidate neopeptides. Subsequently, in silico HLA-anchor placements were used to deduce putative T cell receptor (TCR) contacts of peptides. Single amino acids of TCR-contacting residues were then mutated by amino acid replacements. Overall, 175 peptides were synthesized, and sets of 25 each, containing both peptides designed to bind to HLA class I and II molecules, were applied in the vaccination. Upon development of a tumor recurrence, the tumor-infiltrating lymphocytes (TILs) were characterized in detail at both the bulk and clonal level. Results: The immune response of peripheral blood T cells to vaccine peptides, including natural peptides and designed neopeptides, gradually increased with repetitive vaccination, but remained low. In contrast, at the time of tumor recurrence, CD8+ TILs and CD4+ TILs responded to 45% and 100% of the vaccine peptides, respectively. Further, TIL-derived CD4+ T cell clones showed strong responses and tumor cell lysis not only against the designed neopeptide but also against the unmutated natural peptides of the tumor. Conclusions: Turning tumor self-peptides into foreign antigens by introduction of designed mutations is a promising strategy to induce strong intratumoral CD4+ T cell responses in a cold tumor like glioblastoma.

    NGS-pipe: a flexible, easily extendable, and highly configurable framework for NGS analysis

    Next-generation sequencing is now an established method in genomics, and massive amounts of sequencing data are being generated on a regular basis. Analysis of the sequencing data is typically performed by lab-specific in-house solutions, but the agreement of results from different facilities is often low. General standards for quality control, reproducibility, and documentation are missing. We developed NGS-pipe, a flexible, transparent, and easy-to-use framework for the design of pipelines to analyze whole-exome, whole-genome, and transcriptome sequencing data. NGS-pipe facilitates the harmonization of genomic data analysis by supporting quality control, documentation, reproducibility, parallelization, and easy adaptation to other NGS experiments. Availability: https://github.com/cbg-ethz/NGS-pipe

    BALL - biochemical algorithms library 1.3

    Background: The Biochemical Algorithms Library (BALL) is a comprehensive rapid application development framework for structural bioinformatics. It provides an extensive C++ class library of data structures and algorithms for molecular modeling and structural bioinformatics. Using BALL as a programming toolbox not only greatly reduces application development times but also helps ensure stability and correctness by avoiding the error-prone reimplementation of complex algorithms and replacing them with calls into a library that has been well tested by a large number of developers. In the ten years since its original publication, BALL has seen a substantial increase in functionality and numerous other improvements. Results: Here, we discuss BALL's current functionality and highlight the key additions and improvements: support for additional file formats, molecular edit-functionality, new molecular mechanics force fields, novel energy minimization techniques, docking algorithms, and support for cheminformatics. Conclusions: BALL is available for all major operating systems, including Linux, Windows, and MacOS X. It is available free of charge under the Lesser GNU Public License (LGPL). Parts of the code are distributed under the GNU Public License (GPL). BALL is available as source code and binary packages from the project web site at http://www.ball-project.org. Recently, it has been accepted into the Debian project; integration into further distributions is currently being pursued.

    Characterization of a Novel Orthomyxo-like Virus Causing Mass Die-Offs of Tilapia

    Tilapia are an important global food source due to their omnivorous diet, tolerance for high-density aquaculture, and relative disease resistance. Since 2009, tilapia aquaculture has been threatened by mass die-offs in farmed fish in Israel and Ecuador. Here we report evidence implicating a novel orthomyxo-like virus in these outbreaks. The tilapia lake virus (TiLV) has a 10-segment, negative-sense RNA genome. The largest segment, segment 1, contains an open reading frame with weak sequence homology to the influenza C virus PB1 subunit. The other nine segments showed no homology to other viruses but have conserved, complementary sequences at their 5′ and 3′ termini, consistent with the genome organization found in other orthomyxoviruses. In situ hybridization indicates TiLV replication and transcription at sites of pathology in the liver and central nervous system of tilapia with disease.

    Generation of Priority Research Questions to Inform Conservation Policy and Management at a National Level

    Integrating knowledge from across the natural and social sciences is necessary to effectively address societal tradeoffs between human use of biological diversity and its preservation. Collaborative processes can change the ways decision makers think about scientific evidence, enhance levels of mutual trust and credibility, and advance the conservation policy discourse. Canada has responsibility for a large fraction of some major ecosystems, such as boreal forests, Arctic tundra, wetlands, and temperate and Arctic oceans. Stressors to biological diversity within these ecosystems arise from activities of the country's resource-based economy, as well as external drivers of environmental change. Effective management is complicated by incongruence between ecological and political boundaries and conflicting perspectives on social and economic goals. Many knowledge gaps about stressors and their management might be reduced through targeted, timely research. We identify 40 questions that, if addressed or answered, would advance research that has a high probability of supporting development of effective policies and management strategies for species, ecosystems, and ecological processes in Canada. A total of 396 candidate questions drawn from natural and social science disciplines were contributed by individuals with diverse organizational affiliations. These were collaboratively winnowed to 40 by our team of collaborators. The questions emphasize understanding ecosystems, the effects and mitigation of climate change, coordinating governance and management efforts across multiple jurisdictions, and examining relations between conservation policy and the social and economic well-being of Aboriginal peoples. The questions we identified provide potential links between evidence from the conservation sciences and formulation of policies for conservation and resource management. Our collaborative process of communication and engagement between scientists and decision makers for generating and prioritizing research questions at a national level could be a model for similar efforts beyond Canada.