
    Memory-Efficient Topic Modeling

    As one of the simplest probabilistic topic modeling techniques, latent Dirichlet allocation (LDA) has found many important applications in text mining, computer vision and computational biology. Recent training algorithms for LDA can be interpreted within a unified message passing framework. However, message passing requires storing previous messages, which takes a large amount of memory that grows linearly with the number of documents or the number of topics. High memory usage is therefore often a major problem for topic modeling of massive corpora containing a large number of topics. To reduce the space complexity, we propose a novel algorithm for training LDA that does not store previous messages: tiny belief propagation (TBP). The basic idea of TBP is to relate message passing algorithms to non-negative matrix factorization (NMF) algorithms, absorbing the message update into the message passing process and thus avoiding the need to store previous messages. Experimental results on four large data sets confirm that TBP performs comparably to, or even better than, current state-of-the-art training algorithms for LDA while consuming much less memory. TBP can perform topic modeling when massive corpora cannot fit in the computer memory, for example, extracting thematic topics from a 7 GB PUBMED corpus on a common desktop computer with 2 GB of memory.
    Comment: 20 pages, 7 figures
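
    To illustrate the NMF side of the connection the abstract draws, the sketch below implements the classic Lee and Seung multiplicative updates for KL-divergence NMF. It is the generic NMF algorithm, not TBP, and the names kl_divergence and nmf_kl are hypothetical. What it shows is the memory pattern the abstract alludes to: the ratio V / (WH) is recomputed from the current factors on every sweep, so no per-(row, column, topic) table of previous values is kept.

```python
"""Illustrative KL-divergence NMF (Lee and Seung multiplicative updates).

This is the generic NMF algorithm, not TBP itself; all names are hypothetical.
"""
import math
import random


def kl_divergence(V, W, H):
    """Generalized KL divergence D(V || WH) for dense nested-list matrices."""
    total = 0.0
    for i, row in enumerate(V):
        for j, v in enumerate(row):
            wh = sum(W[i][k] * H[k][j] for k in range(len(H)))
            if v > 0:
                total += v * math.log(v / wh)
            total += wh - v
    return total


def nmf_kl(V, K, iters=100, seed=0, eps=1e-12):
    """Factor a non-negative matrix V (rows x cols) into W (rows x K) and H (K x cols)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(V), len(V[0])
    # Positive initialization; multiplicative updates never make an entry negative.
    W = [[rng.random() + 0.1 for _ in range(K)] for _ in range(n_rows)]
    H = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(K)]

    def ratio():
        # V / (WH), recomputed from the current factors on every call. Only this
        # rows x cols matrix is materialized; no per-(row, col, k) table is stored.
        return [[V[i][j] / (sum(W[i][k] * H[k][j] for k in range(K)) + eps)
                 for j in range(n_cols)] for i in range(n_rows)]

    for _ in range(iters):
        R = ratio()
        w_col_sums = [sum(W[i][k] for i in range(n_rows)) for k in range(K)]
        H = [[H[k][j] * sum(W[i][k] * R[i][j] for i in range(n_rows)) / (w_col_sums[k] + eps)
              for j in range(n_cols)] for k in range(K)]
        R = ratio()  # refresh the ratio with the updated H before updating W
        h_row_sums = [sum(H[k]) for k in range(K)]
        W = [[W[i][k] * sum(H[k][j] * R[i][j] for j in range(n_cols)) / (h_row_sums[k] + eps)
              for k in range(K)] for i in range(n_rows)]
    return W, H
```

    Loosely, if the rows of V are words and the columns are documents, W plays the role of topic-word weights and H of document-topic weights; this analogy is ours, offered only as orientation.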

    Systems Biology Graphical Notation: Entity Relationship language Level 1

    Standard graphical representations have played a crucial role in science and engineering throughout the last century. Without electrical symbolism, it is very likely that our industrial society would not have evolved at the same pace. Similarly, specialised notations such as the Feynman notation or process flow diagrams greatly helped the adoption of concepts in their own fields. With the advent of Systems Biology, and more recently of Synthetic Biology, the need for precise and unambiguous descriptions of biochemical interactions has become more pressing. While some ideas, including a few detailed proposals, have been advanced over the last decade, no actual community standard has emerged. The Systems Biology Graphical Notation (SBGN) is a graphical representation crafted over several years by a community of biochemists, modellers and computer scientists. Three orthogonal and complementary languages have been created: the Process Descriptions, the Entity Relationships and the Activity Flows. Using these three idioms, a scientist can represent any network of biochemical interactions, which can then be interpreted in an unambiguous way. The set of symbols used is limited and the grammar quite simple, to allow its use in textbooks and its teaching directly in high schools. The first level of the SBGN Entity Relationship language has been publicly released. Shared by the communities of biochemists, genomic scientists, theoreticians and computational biologists, SBGN languages will foster efficient storage, exchange and reuse of information on signalling pathways, metabolic networks and gene regulatory maps.

    Systems Biology Graphical Notation: Activity Flow language Level 1

    Standard graphical representations have played a crucial role in science and engineering throughout the last century. Without electrical symbolism, it is very likely that our industrial society would not have evolved at the same pace. Similarly, specialized notations such as the Feynman notation or process flow diagrams greatly helped the adoption of concepts in their own fields. With the advent of Systems Biology, and more recently of Synthetic Biology, the need for precise and unambiguous descriptions of biochemical interactions has become more pressing. While some ideas, including a few detailed proposals, have been advanced over the last decade, no actual community standard has emerged. The Systems Biology Graphical Notation (SBGN) is a graphical representation crafted over several years by a community of biochemists, modellers and computer scientists. Three orthogonal and complementary languages have been created: the Process Descriptions, the Entity Relationships and the Activity Flows. Using these three idioms, a scientist can represent any network of biochemical interactions, which can then be interpreted in an unambiguous way. The set of symbols used is limited and the grammar quite simple, to allow its use in settings ranging from textbooks and high school teaching to peer-reviewed articles in scientific journals. The first level of the SBGN Activity Flow language has been publicly released. Shared by the communities of biochemists, genomic scientists, theoreticians and computational biologists, SBGN languages will foster efficient storage, exchange and reuse of information on signaling pathways, metabolic networks and gene regulatory maps.
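
    The Activity Flow language describes influences between biological activities. As a hedged illustration only, the sketch below models a toy influence graph with three influence arc kinds we assume here (positive, negative, unknown; the full specification defines more, such as necessary stimulation) and composes their signs along a path. The names Influence, Arc and path_sign, the node labels, and the sign-composition rule are hypothetical choices for this example, not rules quoted from the specification.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Influence(Enum):
    POSITIVE = "positive influence"
    NEGATIVE = "negative influence"
    UNKNOWN = "unknown influence"


@dataclass(frozen=True)
class Arc:
    source: str
    target: str
    kind: Influence


def path_sign(arcs, path) -> Optional[int]:
    """Net effect along a path of activity nodes: +1, -1, or None if any step is unknown.

    Assumes at most one arc per ordered (source, target) pair.
    """
    lookup = {(arc.source, arc.target): arc.kind for arc in arcs}
    sign = 1
    for src, tgt in zip(path, path[1:]):
        kind = lookup.get((src, tgt))
        if kind is None:
            raise KeyError(f"no influence arc from {src!r} to {tgt!r}")
        if kind is Influence.UNKNOWN:
            return None
        if kind is Influence.NEGATIVE:
            sign = -sign  # each inhibitory step flips the net sign
    return sign


# Hypothetical toy map; node names are placeholders, not a curated pathway.
arcs = [
    Arc("receptor activity", "kinase activity", Influence.POSITIVE),
    Arc("kinase activity", "repressor activity", Influence.NEGATIVE),
    Arc("repressor activity", "gene expression", Influence.NEGATIVE),
]
print(path_sign(arcs, ["receptor activity", "kinase activity",
                       "repressor activity", "gene expression"]))  # prints 1
```

    Two inhibitory steps cancel, so the toy path has a net positive effect; returning None for an unknown influence keeps the sketch from guessing a sign it cannot justify.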

    Systems Biology Graphical Notation: Entity Relationship language Level 1 (Version 1.2)

    Standard graphical representations have played a crucial role in science and engineering throughout the last century. Without electrical symbolism, it is very likely that our industrial society would not have evolved at the same pace. Similarly, specialised notations such as the Feynman notation or process flow diagrams greatly helped the adoption of concepts in their own fields. With the advent of Systems Biology, and more recently of Synthetic Biology, the need for precise and unambiguous descriptions of biochemical interactions has become more pressing. While some ideas, including a few detailed proposals, have been advanced over the last decade, no actual community standard has emerged. The Systems Biology Graphical Notation (SBGN) is a graphical representation crafted over several years by a community of biochemists, modellers and computer scientists. Three orthogonal and complementary languages have been created: the Process Descriptions, the Entity Relationships and the Activity Flows. Using these three idioms, a scientist can represent any network of biochemical interactions, which can then be interpreted in an unambiguous way. The set of symbols used is limited and the grammar quite simple, to allow its use in textbooks and its teaching directly in high schools. The current document presents version 1.2 of the first level of the SBGN Entity Relationship language. Shared by the communities of biochemists, genomic scientists, theoreticians and computational biologists, SBGN languages will foster efficient storage, exchange and reuse of information on signaling pathways, metabolic networks and gene regulatory maps.

    Virtual Fetal Pig Dissection As An Agent Of Knowledge Acquisition And Attitudinal Change In Female High School Biology Students

    One way to determine whether all students can learn through the use of computers is to teach a lesson entirely via computers and compare the results with those obtained when the same lesson is taught in a traditional manner. This study attempted to determine whether a virtual fetal pig dissection can serve as a viable alternative to an actual dissection for females enrolled in high school biology classes, by comparing knowledge acquisition and attitudinal change between the experimental (virtual dissection) and control (actual dissection) groups. Two hundred twenty-four students enrolled in biology classes at a suburban all-girl parochial high school participated in this study. Students at an all-girl high school were chosen because research shows gender differences in science competency and computer usage that may mask the performance of females on computer-based tasks in a science laboratory exercise. Students who completed the virtual dissection scored significantly higher on the practical and objective tests used to measure knowledge acquisition. Attitudinal change was measured with pre- and post-surveys of the students' attitudes toward dissections, computer usage in the classroom, and biology. The virtual dissection group showed significant results: a positive gain score in attitude toward dissections and a negative gain score in attitude toward virtual dissections. Attitudinal changes toward computers and biology were not significant. A purposefully selected sample of the students was interviewed, and a sample of the students' daily dissection journals was collected, as data highlighting their thoughts and feelings about the dissection experience. Further research is suggested to determine whether a virtual laboratory experience can substitute for actual dissections or may instead serve as an enhancement to them.
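
    The abstract compares pre/post gain scores between two groups but does not say which statistical test was used. As a generic sketch only, assuming Welch's two-sample t-test (our choice, not necessarily the study's), the code below computes per-student gain scores and Welch's t statistic with Welch-Satterthwaite degrees of freedom. The function names and all scores are made up for illustration and are not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def gain_scores(pre, post):
    """Per-student gain score: post-survey score minus pre-survey score."""
    if len(pre) != len(post):
        raise ValueError("pre and post scores must be paired, one per student")
    return [after - before for before, after in zip(pre, post)]


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two independent samples."""
    vx = variance(x) / len(x)  # squared standard error of mean(x); variance() divides by n - 1
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Made-up attitude scores for illustration only (not data from the study).
virtual_gain = gain_scores(pre=[3, 2, 4, 3], post=[4, 4, 5, 4])
actual_gain = gain_scores(pre=[3, 3, 2, 4], post=[3, 4, 2, 4])
t, df = welch_t(virtual_gain, actual_gain)
print(f"t = {t:.2f}, df = {df:.1f}")  # prints: t = 2.83, df = 6.0
```

    Turning t into a p-value needs the t-distribution CDF; in practice `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test and returns the p-value directly.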

    Systems Biology Graphical Notation: Process Description language Level 1

    Standard graphical representations have played a crucial role in science and engineering throughout the last century. Without electrical symbolism, it is very likely that our industrial society would not have evolved at the same pace. Similarly, specialised notations such as the Feynman notation or process flow diagrams greatly helped the adoption of concepts in their own fields. With the advent of Systems Biology, and more recently of Synthetic Biology, the need for precise and unambiguous descriptions of biochemical interactions has become more pressing. While some ideas, including a few detailed proposals, have been advanced over the last decade, no actual community standard has emerged. The Systems Biology Graphical Notation (SBGN) is a graphical representation crafted over several years by a community of biochemists, modellers and computer scientists. Three orthogonal and complementary languages have been created: the Process Diagrams, the Entity Relationship Diagrams and the Activity Flow Diagrams. Using these three idioms, a scientist can represent any network of biochemical interactions, which can then be interpreted in an unambiguous way. The set of symbols used is limited and the grammar quite simple, to allow its use in textbooks and its teaching directly in high schools. The first level of the SBGN Process Diagram language has been publicly released. Software support for SBGN Process Diagrams was developed concurrently with the specification to speed up public adoption. Shared by the communities of biochemists, genomic scientists, theoreticians and computational biologists, SBGN languages will foster efficient storage, exchange and reuse of information on signalling pathways, metabolic networks and gene regulatory maps.
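
    To make "interpreted in an unambiguous way" concrete, here is a minimal sketch, assuming a simplified subset of Process Description glyphs (simple chemical, macromolecule, process) and arcs (consumption, production, catalysis), with direction rules we state ourselves: consumption and catalysis point from an entity pool node to a process, and production points from a process to an entity pool node. It is neither SBGN-ML nor any SBGN library's API; Glyph, ARC_RULES and validate are hypothetical names.

```python
from dataclasses import dataclass

# Simplified subset of PD glyph classes covered by this sketch.
ENTITY_POOL_CLASSES = {"unspecified entity", "simple chemical", "macromolecule"}
PROCESS_CLASSES = {"process"}

# Direction rules for a few PD arc classes: (source group, target group).
ARC_RULES = {
    "consumption": ("entity pool", "process"),
    "production": ("process", "entity pool"),
    "catalysis": ("entity pool", "process"),
}


@dataclass(frozen=True)
class Glyph:
    glyph_id: str
    cls: str


def group_of(glyph):
    """Map a glyph class to the coarse group used by ARC_RULES."""
    if glyph.cls in ENTITY_POOL_CLASSES:
        return "entity pool"
    if glyph.cls in PROCESS_CLASSES:
        return "process"
    return "unknown"


def validate(glyphs, arcs):
    """Return a list of error messages; an empty list means every arc obeys ARC_RULES."""
    by_id = {g.glyph_id: g for g in glyphs}
    errors = []
    for arc_cls, source, target in arcs:
        if arc_cls not in ARC_RULES:
            errors.append(f"unsupported arc class {arc_cls!r}")
            continue
        if source not in by_id or target not in by_id:
            errors.append(f"{arc_cls} arc refers to a missing glyph ({source!r} -> {target!r})")
            continue
        expected = ARC_RULES[arc_cls]
        actual = (group_of(by_id[source]), group_of(by_id[target]))
        if actual != expected:
            errors.append(f"{arc_cls} arc {source!r} -> {target!r} connects {actual}, expected {expected}")
    return errors


# Hexokinase step: glucose + ATP -> glucose 6-phosphate + ADP, catalysed by hexokinase.
glyphs = [
    Glyph("glc", "simple chemical"), Glyph("atp", "simple chemical"),
    Glyph("g6p", "simple chemical"), Glyph("adp", "simple chemical"),
    Glyph("hk", "macromolecule"), Glyph("p1", "process"),
]
arcs = [
    ("consumption", "glc", "p1"), ("consumption", "atp", "p1"),
    ("production", "p1", "g6p"), ("production", "p1", "adp"),
    ("catalysis", "hk", "p1"),
]
print(validate(glyphs, arcs))  # prints []
```

    Reversing a production arc (entity pool to process) would be reported as an error, which is the kind of check a restricted grammar makes possible.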

    Systems Biology Graphical Notation: Process Diagram Level 1

    Standard graphical representations have played a crucial role in science and engineering throughout the last century. Without electrical symbolism, it is very likely that our industrial society would not have evolved at the same pace. Similarly, specialised notations such as the Feynman notation or process flow diagrams greatly helped the adoption of concepts in their own fields. With the advent of Systems Biology, and more recently of Synthetic Biology, the need for precise and unambiguous descriptions of biochemical interactions has become more pressing. While some ideas, including a few detailed proposals, have been advanced over the last decade, no actual community standard has emerged. The Systems Biology Graphical Notation (SBGN) is a graphical representation crafted over several years by a community of biochemists, modellers and computer scientists. Three orthogonal and complementary languages have been created: the Process Diagrams, the Entity Relationship Diagrams and the Activity Flow Diagrams. Using these three idioms, a scientist can represent any network of biochemical interactions, which can then be interpreted in an unambiguous way. The set of symbols used is limited and the grammar quite simple, to allow its use in textbooks and its teaching directly in high schools. The first level of the SBGN Process Diagram language has been publicly released. Software support for SBGN Process Diagrams was developed concurrently with the specification to speed up public adoption. Shared by the communities of biochemists, genomic scientists, theoreticians and computational biologists, SBGN languages will foster efficient storage, exchange and reuse of information on signalling pathways, metabolic networks and gene regulatory maps.

    An Introduction to Programming for Bioscientists: A Python-based Primer

    Computing has revolutionized the biological sciences over the past several decades, such that virtually all contemporary research in the biosciences utilizes computer programs. The computational advances have come on many fronts, spurred by fundamental developments in hardware, software, and algorithms. These advances have influenced, and even engendered, a phenomenal array of bioscience fields, including molecular evolution and bioinformatics; genome-, proteome-, transcriptome- and metabolome-wide experimental studies; structural genomics; and atomistic simulations of cellular-scale molecular assemblies as large as ribosomes and intact viruses. In short, much of post-genomic biology is increasingly becoming a form of computational biology. The ability to design and write computer programs is among the most indispensable skills that a modern researcher can cultivate. Python has become a popular programming language in the biosciences, largely because (i) its straightforward semantics and clean syntax make it a readily accessible first language; (ii) it is expressive and well-suited to object-oriented programming, as well as other modern paradigms; and (iii) the many available libraries and third-party toolkits extend the functionality of the core language into virtually every biological domain (sequence and structure analyses, phylogenomics, workflow management systems, etc.). This primer offers a basic introduction to coding, via Python, and it includes concrete examples and exercises to illustrate the language's usage and capabilities; the main text culminates with a final project in structural bioinformatics. A suite of Supplemental Chapters is also provided. Starting with basic concepts, such as that of a 'variable', the Chapters methodically advance the reader to the point of writing a graphical user interface to compute the Hamming distance between two DNA sequences.
    Comment: 65 pages total, including 45 pages of text, 3 figures, 4 tables, numerous exercises, and 19 pages of Supporting Information; currently in press at PLOS Computational Biology
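
    The Supplemental Chapters build up to a graphical interface around a Hamming distance computation. The sketch below shows only that core computation in plain Python, with no GUI toolkit; it is not the primer's own code, and the function name hamming_distance is our own. It counts the positions at which two equal-length sequences differ, treating upper- and lower-case bases as equal.

```python
def hamming_distance(seq1: str, seq2: str) -> int:
    """Count positions at which two equal-length sequences differ (case-insensitive)."""
    if len(seq1) != len(seq2):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(1 for a, b in zip(seq1.upper(), seq2.upper()) if a != b)


print(hamming_distance("GATTACA", "GACTATA"))  # prints 2
```

    Raising an error on unequal lengths, rather than silently truncating as `zip` alone would, keeps the function faithful to the definition of Hamming distance.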