297 research outputs found

    Efficient Training of Graph-Regularized Multitask SVMs

    We present an optimization framework for graph-regularized multi-task SVMs based on the primal formulation of the problem. Previous approaches employ a so-called multi-task kernel (MTK) and are thus inapplicable when the number of training examples n is large (in practice they are limited to n < 20,000, even for just a few tasks). In this paper, we present a primal optimization criterion that allows for general loss functions, and we derive its dual representation. Building on the work of Hsieh et al. [1,2], we derive an algorithm for optimizing the large-margin objective and prove its convergence. Our computational experiments show a speedup of up to three orders of magnitude over LibSVM and SVMLight on several standard benchmarks as well as on challenging data sets from the application domain of computational biology. Combining our optimization methodology with the COFFIN large-scale learning framework [3], we are able to train a multi-task SVM using over 1,000,000 training points stemming from 4 different tasks. An efficient C++ implementation of our algorithm is being made publicly available as part of the SHOGUN machine learning toolbox [4].
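
    The dual coordinate descent approach of Hsieh et al. that this work builds on can be sketched for the simpler single-task case. The following is a minimal sketch, assuming a linear SVM with hinge (L1) loss and no separate bias term (a constant feature can be appended to each example instead). It is not the graph-regularized multi-task algorithm of the paper and does not use the SHOGUN API; the function name `dcd_linear_svm` and its parameters are hypothetical.

```python
import numpy as np


def dcd_linear_svm(X, y, C=1.0, n_epochs=50, tol=1e-6, seed=0):
    """Dual coordinate descent for an L1-loss (hinge) linear SVM without bias.

    X: (n, d) array of examples; y: (n,) array of +1/-1 labels.
    Returns the primal weight vector w = sum_i alpha_i * y_i * x_i.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)                     # dual variables, boxed in [0, C]
    w = np.zeros(d)                         # primal vector kept in sync with alpha
    q_diag = np.einsum("ij,ij->i", X, X)    # Q_ii = x_i . x_i
    for _ in range(n_epochs):
        max_pg = 0.0
        for i in rng.permutation(n):        # visit coordinates in random order
            if q_diag[i] == 0.0:
                continue
            g = y[i] * X[i].dot(w) - 1.0    # partial derivative of the dual objective
            # Projected gradient: ignore directions that would leave the box [0, C].
            if alpha[i] == 0.0:
                pg = min(g, 0.0)
            elif alpha[i] == C:
                pg = max(g, 0.0)
            else:
                pg = g
            max_pg = max(max_pg, abs(pg))
            if pg != 0.0:
                old = alpha[i]
                alpha[i] = min(max(old - g / q_diag[i], 0.0), C)  # exact 1-D minimization, clipped
                w += (alpha[i] - old) * y[i] * X[i]               # O(d) update of w
        if max_pg < tol:                    # all projected gradients near zero: converged
            break
    return w
```

    Each coordinate update touches a single training example and updates w incrementally, so one pass over the data costs roughly as much as one pass of computing inner products, which is what makes this family of solvers attractive for large n.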

    Chikungunya risk assessment for Europe: recommendations for action

    Since March 2005, 255 000 cases of chikungunya fever are estimated to have occurred on the island of Réunion, a French overseas department in the Indian Ocean [1]. A huge increase in estimated cases occurred at the end of December 2005, culminating in an estimated peak incidence of more than 40 000 cases in week 5 of 2006 [2]. Since then, the estimated weekly incidence has trended downward, although an estimated 3000 new cases per week have still occurred since week 13 of 2006. In total, 213 deaths have been linked to the disease [1]. In Mayotte, the nearby French territorial collectivity, 5834 cases have been notified [3]. Chikungunya cases have also been reported on other islands in the Indian Ocean, and imported cases have been confirmed in several European countries.

    jFuzzyLogic: a Java Library to Design Fuzzy Logic Controllers According to the Standard for Fuzzy Control Programming

    Fuzzy Logic Controllers are a specific model of Fuzzy Rule Based Systems suited to engineering applications in which classic control strategies do not achieve good results or in which it is too difficult to obtain a mathematical model. Recently, the International Electrotechnical Commission published a standard for fuzzy control programming in part 7 of the IEC 61131 norm in order to offer a well-defined common understanding of the basic means with which to integrate fuzzy control applications in control systems. In this paper, we introduce an open source Java library called jFuzzyLogic, which offers a fully functional and complete implementation of a fuzzy inference system according to this standard, providing a programming interface and an Eclipse plugin to easily write and test code for fuzzy control applications. A case study is given to illustrate the use of jFuzzyLogic.
    Funding: McGill University, Genome Quebec; Spanish Government TIN2011-28488; Andalusian Government P10-TIC-685
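
    To illustrate what a fuzzy inference system of this kind computes, here is a minimal Mamdani-style sketch with one input and one output. It assumes triangular membership functions, MIN activation, MAX accumulation and centre-of-gravity defuzzification, which are settings commonly chosen in FCL function blocks; the variable names, linguistic terms and rule base are hypothetical, and this is not the jFuzzyLogic API.

```python
import numpy as np


def trimf(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b (a < b < c)."""
    x = np.asarray(x, dtype=float)
    rising = (x - a) / (b - a)
    falling = (c - x) / (c - b)
    return np.clip(np.minimum(rising, falling), 0.0, 1.0)


# Hypothetical controller: temperature deviation (degC, negative = too cold) -> heater power (%).
INPUT_TERMS = {"cold": (-10.0, -5.0, 0.0), "ok": (-5.0, 0.0, 5.0), "hot": (0.0, 5.0, 10.0)}
OUTPUT_TERMS = {"low": (0.0, 25.0, 50.0), "medium": (25.0, 50.0, 75.0), "high": (50.0, 75.0, 100.0)}
# Each rule reads: IF deviation IS <input term> THEN power IS <output term>.
RULES = [("cold", "high"), ("ok", "medium"), ("hot", "low")]


def infer(deviation, resolution=1001):
    """Return the defuzzified heater power, or None if no rule fires."""
    universe = np.linspace(0.0, 100.0, resolution)  # discretized output universe
    aggregated = np.zeros_like(universe)
    for in_term, out_term in RULES:
        firing = float(trimf(deviation, *INPUT_TERMS[in_term]))  # rule activation degree
        consequent = trimf(universe, *OUTPUT_TERMS[out_term])
        # MIN activation clips the consequent; MAX accumulation merges the rules.
        aggregated = np.maximum(aggregated, np.minimum(firing, consequent))
    total = aggregated.sum()
    if total == 0.0:
        return None
    return float((universe * aggregated).sum() / total)  # centre of gravity
```

    For a deviation of 0 only the "ok" rule fires, so the output is the centre of the "medium" term (50%); at -2.5 the "cold" and "ok" rules fire equally and the output moves toward higher power.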

    Methods to study splicing from high-throughput RNA Sequencing data

    The development of novel high-throughput sequencing (HTS) methods for RNA (RNA-Seq) has provided a very powerful means to study splicing under multiple conditions at unprecedented depth. However, the complexity of the information to be analyzed has turned this into a challenging task. In the last few years, a plethora of tools has been developed, allowing researchers to process RNA-Seq data to study the expression of isoforms and splicing events, and their relative changes under different conditions. We provide an overview of the methods available to study splicing from short-read RNA-Seq data. We group the methods according to the different questions they address: 1) Assignment of the sequencing reads to their likely gene of origin. This is addressed by methods that map reads to the genome and/or to the available gene annotations. 2) Recovery of the sequence of splicing events and isoforms. This is addressed by transcript reconstruction and de novo assembly methods. 3) Quantification of events and isoforms. Either after reconstructing transcripts or using an annotation, many methods estimate the expression level or the relative usage of isoforms and/or events. 4) Providing an isoform or event view of differential splicing or expression. These include methods that compare relative event/isoform abundance or isoform expression across two or more conditions. 5) Visualization of splicing regulation. Various tools facilitate the visualization of RNA-Seq data in the context of alternative splicing. In this review, we do not describe the specific mathematical models behind each method. Rather, our aim is to provide an overview that could serve as an entry point for users who need to decide on a suitable tool for a specific analysis. We also attempt to classify the tools according to the operations they perform, to facilitate the comparison and choice of methods.
    Comment: 31 pages, 1 figure, 9 tables. Small corrections added
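
    As a concrete example of the event-level quantification in points 3) and 4), the sketch below computes a percent-spliced-in (PSI) value for a cassette exon from junction read counts, and the change in PSI between two conditions. It assumes the simple estimator PSI = I / (I + E), with inclusion reads divided by the two inclusion junctions and exclusion reads by the single skipping junction; the function names, the minimum-coverage threshold and the example counts are hypothetical and not taken from any specific tool covered by the review.

```python
from typing import Optional, Tuple


def psi(inclusion_reads: int, exclusion_reads: int,
        n_inclusion_junctions: int = 2, n_exclusion_junctions: int = 1,
        min_reads: int = 10) -> Optional[float]:
    """Percent spliced in (as a fraction in [0, 1]) for a cassette exon.

    Counts are normalized by the number of junctions that can produce them,
    because an included exon is supported by two junctions but a skipped exon by one.
    Returns None when coverage is too low for a stable estimate.
    """
    if inclusion_reads + exclusion_reads < max(min_reads, 1):
        return None
    inc = inclusion_reads / n_inclusion_junctions
    exc = exclusion_reads / n_exclusion_junctions
    return inc / (inc + exc)


def delta_psi(condition_a: Tuple[int, int], condition_b: Tuple[int, int]) -> Optional[float]:
    """PSI(b) - PSI(a), each condition given as (inclusion, exclusion) read counts."""
    a, b = psi(*condition_a), psi(*condition_b)
    if a is None or b is None:
        return None
    return b - a
```

    With 40 inclusion and 20 exclusion reads, both normalized counts equal 20, giving PSI = 0.5; real tools typically add replicate handling and statistical testing on top of such point estimates.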

    De-Novo Discovery of Differentially Abundant Transcription Factor Binding Sites Including Their Positional Preference

    Transcription factors are a main component of gene regulation, as they activate or repress gene expression by binding to specific binding sites in promoters. The de-novo discovery of transcription factor binding sites in target regions obtained by wet-lab experiments is a challenging problem in computational biology that has not yet been fully solved. Here, we present a de-novo motif discovery tool called Dispom for finding differentially abundant transcription factor binding sites that models existing positional preferences of binding sites and adjusts the length of the motif in the learning process. Evaluating Dispom, we find that its prediction performance is superior to that of existing tools for de-novo motif discovery on 18 benchmark data sets with planted binding sites, and on a metazoan compendium based on experimental data from microarray, ChIP-chip, ChIP-DSL, and DamID as well as Gene Ontology data. Finally, we apply Dispom to find binding sites differentially abundant in promoters of auxin-responsive genes extracted from Arabidopsis thaliana microarray data, and we find a motif that can be interpreted as a refined auxin-responsive element predominantly positioned in the 250-bp region upstream of the transcription start site. Using an independent data set of auxin-responsive genes, we find in genome-wide predictions that the refined motif is more specific for auxin-responsive genes than the canonical auxin-responsive element. In general, Dispom can be used to find differentially abundant motifs in sequences of any origin. However, the positional distribution learned by Dispom is especially beneficial if all sequences are aligned to some anchor point, such as the transcription start site in the case of promoter sequences. We demonstrate that the combination of searching for differentially abundant motifs and inferring a position distribution from the data is beneficial for de-novo motif discovery. We make the tool freely available as a component of the open-source Java framework Jstacs and as a stand-alone application at http://www.jstacs.de/index.php/Dispom
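
    The idea of combining a motif model with a positional preference can be illustrated with a toy scanner: each candidate site is scored by its position weight matrix (PWM) log-odds against a uniform background plus the log of a prior weight over start positions. This is a minimal sketch under those assumptions only. Dispom learns the motif, its length and the positional distribution from the data, whereas here they are fixed by hand, and the function and parameter names are hypothetical rather than part of Jstacs.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # uniform background model


def best_site(sequence: str, pwm: List[Dict[str, float]],
              position_prior: Callable[[int], float]) -> Optional[Tuple[int, float]]:
    """Return (start, score) of the best-scoring motif occurrence, or None.

    pwm holds one base -> probability dict per motif column (all probabilities > 0);
    position_prior(start) is a non-negative weight for a site starting at `start`.
    """
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        prior = position_prior(start)
        if prior <= 0.0:
            continue  # positions with zero prior weight can never host a site
        window = sequence[start:start + width]
        log_odds = sum(math.log(col[base] / BACKGROUND[base]) for col, base in zip(pwm, window))
        score = log_odds + math.log(prior)  # motif evidence plus positional preference
        if best is None or score > best[1]:
            best = (start, score)
    return best


def consensus_pwm(consensus: str, p_match: float = 0.97) -> List[Dict[str, float]]:
    """Toy PWM placing p_match on the consensus base and spreading the rest evenly."""
    p_other = (1.0 - p_match) / 3
    return [{b: (p_match if b == c else p_other) for b in "ACGT"} for c in consensus]
```

    With two identical "TATA" occurrences in one sequence, a uniform prior cannot distinguish them, while a prior that increases toward the 3' end (for example, toward a transcription start site) selects the downstream occurrence.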

    Considerations for best practices in studies of fiber or other dietary components and the intestinal microbiome

    Am J Physiol Endocrinol Metab 315: E1087–E1097, 2018. First published August 21, 2018; doi:10.1152/ajpendo.00058.2018.
    A 2-day workshop organized by the National Institutes of Health and U.S. Department of Agriculture included 16 presentations focused on the role of diet in alterations of the gastrointestinal microbiome, primarily that of the colon. Although thousands of research projects have been funded by U.S. federal agencies to study the intestinal microbiome of humans and a variety of animal models, only a minority address dietary effects, and only a small subset are described in sufficient detail to allow a study to be reproduced. Whereas standards are being developed for many aspects of microbiome studies, such as sample collection, nucleic acid extraction, and data handling, none has been proposed for the dietary component; this workshop therefore focused on that specific point. It is important to foster rigor in design and reproducibility of published studies to maintain high quality and to enable designs that can be compared in systematic reviews. Speakers addressed the influence of the structure of the fermentable carbohydrate on the microbiota and the variables to consider in the design of studies using animals, in vitro models, and human subjects. For all types of studies, strengths and weaknesses of various designs were highlighted, and for human studies, controlled feeding and observational designs were compared. Because published best-diet formulations for specific research questions are lacking, the main recommendation is to describe dietary ingredients and treatments in as much detail as possible to allow reproduction by other scientists.

    Generating Explainable and Effective Data Descriptors Using Relational Learning: Application to Cancer Biology

    The key to success in machine learning is the use of effective data representations. The success of deep neural networks (DNNs) is based on their ability to utilize multiple neural network layers, and big data, to learn how to convert simple input representations into richer internal representations that are effective for learning. However, these internal representations are sub-symbolic and difficult to explain. In many scientific problems, explainable models are required, and the input data is semantically complex and unsuitable for DNNs. This is true of the fundamental problem of understanding the mechanism of cancer drugs, which requires complex background knowledge about the functions of genes/proteins, their cells, and the molecular structure of the drugs. This background knowledge cannot be compactly expressed propositionally, and requires at least the expressive power of Datalog. Here we demonstrate the use of relational learning to generate new data descriptors in such semantically complex background knowledge. These new descriptors are effective: adding them to standard propositional learning methods significantly improves prediction accuracy. They are also explainable, and add to our understanding of cancer. Our approach can readily be expanded to include other complex forms of background knowledge, and combines the generality of relational learning with the efficiency of standard propositional learning.
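
    To make the idea of relationally derived descriptors concrete, the sketch below turns Datalog-style ground facts into Boolean features that a standard propositional learner could consume. It evaluates one hand-written clause template, targets_function_F(Drug) :- targets(Drug, Protein), has_function(Protein, F), for every function F present in the data. The predicates, drugs and proteins are hypothetical, and the approach described above uses relational learning to find such descriptors rather than enumerating a single fixed template.

```python
from typing import Dict, Iterable, Set, Tuple


def relational_features(drugs: Iterable[str],
                        targets: Set[Tuple[str, str]],
                        has_function: Set[Tuple[str, str]]) -> Dict[str, Dict[str, bool]]:
    """Evaluate targets_function_F(D) :- targets(D, P), has_function(P, F) for every F.

    Returns one row of Boolean features per drug, ready for a propositional learner.
    """
    functions = sorted({f for _, f in has_function})
    rows = {}
    for drug in drugs:
        proteins = {p for d, p in targets if d == drug}  # bindings of P for this drug
        rows[drug] = {
            f"targets_function_{f}": any((p, f) in has_function for p in proteins)
            for f in functions
        }
    return rows


# Hypothetical background knowledge expressed as ground facts.
TARGETS = {("drugA", "P1"), ("drugA", "P2"), ("drugB", "P3")}
HAS_FUNCTION = {("P1", "kinase"), ("P2", "transporter"), ("P3", "kinase")}
```

    Each resulting column can be appended to an existing feature table, which mirrors the idea of adding relational descriptors to standard propositional methods.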

    Expression of Colonization Factor CS5 of Enterotoxigenic Escherichia coli (ETEC) Is Enhanced In Vivo and by the Bile Component Na Glycocholate Hydrate

    Enterotoxigenic Escherichia coli (ETEC) is an important cause of acute watery diarrhoea in developing countries. Colonization factors (CFs) on the bacterial surface mediate adhesion to the small intestinal epithelium. Two of the most common CFs worldwide are coli surface antigens 5 and 6 (CS5, CS6). In this study, we investigated the expression of CS5 and CS6 in vivo, and the effects of bile and sodium bicarbonate, both present in the human gut, on the expression of CS5. Five CS5+CS6 ETEC isolates from adult Bangladeshi patients with acute diarrhoea were studied. The level of transcription from the CS5 operon was approximately 100-fold higher than from the CS6 operon in ETEC bacteria recovered directly from diarrhoeal stool without sub-culturing (in vivo). The glyco-conjugated primary bile salt sodium glycocholate hydrate (NaGCH) induced phenotypic expression of CS5 in a dose-dependent manner and caused a 100-fold up-regulation of CS5 mRNA levels; this is the first description of NaGCH as an enteropathogenic virulence inducer. The relative transcription levels from the CS5 and CS6 operons in the presence of bile or NaGCH in vitro were similar to those in vivo. Another bile salt, sodium deoxycholate (NaDC), previously reported to induce enteropathogenic virulence, also induced expression of CS5, whereas sodium bicarbonate did not.

    Sequence-specific antimicrobials using efficiently delivered RNA-guided nucleases

    Current antibiotics tend to be broad spectrum, leading to indiscriminate killing of commensal bacteria and accelerated evolution of drug resistance. Here, we use CRISPR-Cas technology to create antimicrobials whose spectrum of activity is chosen by design. RNA-guided nucleases (RGNs) targeting specific DNA sequences are delivered efficiently to microbial populations using bacteriophage or bacteria carrying plasmids transmissible by conjugation. The DNA targets of RGNs can be undesirable genes or polymorphisms, including antibiotic resistance and virulence determinants in carbapenem-resistant Enterobacteriaceae and enterohemorrhagic Escherichia coli. Delivery of RGNs significantly improves survival in a Galleria mellonella infection model. We also show that RGNs enable modulation of complex bacterial populations by selective knockdown of targeted strains based on genetic signatures. RGNs constitute a class of highly discriminatory, customizable antimicrobials that enact selective pressure at the DNA level to reduce the prevalence of undesired genes, minimize off-target effects and enable programmable remodeling of microbiota.
    Funding: National Institutes of Health (U.S.) (New Innovator Award 1DP2OD008435); National Centers for Systems Biology (U.S.) (Grant 1P50GM098792); United States. Defense Threat Reduction Agency (HDTRA1-14-1-0007); Massachusetts Institute of Technology. Institute for Soldier Nanotechnologies (W911NF13D0001); National Institute of General Medical Sciences (U.S.) (Interdepartmental Biotechnology Training Program 5T32 GM008334); Fonds de la recherche en sante du Quebec (Master's Training Award)