Manifolds.jl: An Extensible Julia Framework for Data Analysis on Manifolds
For data given on a nonlinear space, like angles, symmetric positive
matrices, the sphere, or the hyperbolic space, there is often enough structure
to form a Riemannian manifold. We present the Julia package Manifolds.jl,
providing a fast and easy to use library of Riemannian manifolds and Lie
groups. We introduce a common interface, available in ManifoldsBase.jl, with
which new manifolds, applications, and algorithms can be implemented. We
demonstrate the utility of Manifolds.jl using Bézier splines, an optimization
task on manifolds, and a principal component analysis on nonlinear data. In a
benchmark, Manifolds.jl outperforms existing packages in Matlab or Python by
several orders of magnitude and is about twice as fast as a comparable package
implemented in C++.
Comparative analysis of carboxysome shell proteins
Carboxysomes are metabolic modules for CO2 fixation that are found in all cyanobacteria and some chemoautotrophic bacteria. They comprise a semi-permeable proteinaceous shell that encapsulates ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) and carbonic anhydrase. Structural studies are revealing the integral role of the shell protein paralogs to carboxysome form and function. The shell proteins are composed of two domain classes: those with the bacterial microcompartment (BMC; Pfam00936) domain, which oligomerize to form (pseudo)hexamers, and those with the CcmL/EutN (Pfam03319) domain, which form pentamers in carboxysomes. These two shell protein types are proposed to be the basis for the carboxysome’s icosahedral geometry. The shell proteins are also thought to allow the flux of metabolites across the shell through the small pores formed at their hexameric/pentameric symmetry axes. In this review, we describe bioinformatic and structural analyses that highlight the important primary, tertiary, and quaternary structural features of these conserved shell subunits. In the future, further understanding of these molecular building blocks may provide the basis for enhancing CO2 fixation in other organisms or creating novel biological nanostructures.
Incorporating Genomics and Bioinformatics across the Life Sciences Curriculum
Undergraduate life sciences education needs an overhaul, as clearly described in the National Research Council of the National Academies’ publication BIO 2010: Transforming Undergraduate Education for Future Research Biologists. Among BIO 2010’s top recommendations is the need to involve students in working with real data and tools that reflect the nature of life sciences research in the 21st century [1]. Education research studies support the importance of utilizing primary literature, designing and implementing experiments, and analyzing results in the context of a bona fide scientific question [1–12] in cultivating the analytical skills necessary to become a scientist. Incorporating these basic scientific methodologies in undergraduate education leads to increased undergraduate and post-graduate retention in the sciences [13–16]. Toward this end, many undergraduate teaching organizations offer training and suggestions for faculty to update and improve their teaching approaches to help students learn as scientists, through design and discovery (e.g., Council of Undergraduate Research [www.cur.org] and Project Kaleidoscope [www.pkal.org]).
Author response: Determination of ubiquitin fitness landscapes under different chemical stresses in a classroom setting
Frequency effects in linear discriminative learning
Word frequency is a strong predictor in most lexical processing tasks. Thus,
any model of word recognition needs to account for how word frequency effects
arise. The Discriminative Lexicon Model (DLM; Baayen et al., 2018a, 2019)
models lexical processing with linear mappings between words' forms and their
meanings. So far, the mappings can either be obtained incrementally via
error-driven learning, a computationally expensive process able to capture
frequency effects, or in an efficient, but frequency-agnostic solution
modelling the theoretical endstate of learning (EL) where all words are learned
optimally. In this study we show how an efficient, yet frequency-informed
mapping between form and meaning can be obtained (Frequency-informed learning;
FIL). We find that FIL well approximates an incremental solution while being
computationally much cheaper. FIL shows a relatively low type- and high
token-accuracy, demonstrating that the model is able to process most word
tokens encountered by speakers in daily life correctly. We use FIL to model
reaction times in the Dutch Lexicon Project (Keuleers et al., 2010) and find
that FIL predicts well the S-shaped relationship between frequency and the mean
of reaction times but underestimates the variance of reaction times for low
frequency words. FIL is also better able to account for priming effects in an
auditory lexical decision task in Mandarin Chinese (Lee, 2007), compared to EL.
Finally, we used ordered data from CHILDES (Brown, 1973; Demuth et al., 2006)
to compare mappings obtained with FIL and incremental learning. The mappings
are highly correlated, but with FIL some nuances based on word ordering effects
are lost. Our results show how frequency effects in a learning model can be
simulated efficiently, and raise questions about how to best account for
low-frequency words in cognitive models.
Comment: 32 pages, 12 figures, 3 tables; revised version
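The contrast drawn in this abstract between a frequency-agnostic endstate mapping (EL) and a frequency-informed mapping (FIL) can be illustrated with a small sketch. The abstract does not state how FIL is computed, so the code below makes an assumption: EL is treated as an ordinary least-squares mapping from a form matrix `C` to a semantic matrix `S`, and the frequency-informed variant as a least-squares fit in which each word's error is weighted by its frequency. All names (`fit_mapping`, `C`, `S`, `freqs`) and the toy data are hypothetical and are not taken from the paper or from any DLM software.

```python
def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, a[i])) + list(map(float, b[i])) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: cues are not linearly independent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_mapping(C, S, freqs=None):
    """Least-squares mapping F with C @ F ~ S.

    With freqs=None every word counts equally (the frequency-agnostic case).
    With freqs given, word i's squared error is weighted by freqs[i]
    (an assumed stand-in for a frequency-informed solution).
    """
    if freqs is None:
        freqs = [1.0] * len(C)
    Ct = transpose(C)
    # Scale each word's column of C^T by that word's frequency.
    CtP = [[c * f for c, f in zip(row, freqs)] for row in Ct]
    # Normal equations: (C^T P C) F = C^T P S
    return solve(matmul(CtP, C), matmul(CtP, S))


# Toy data: three words, two form cues, one semantic dimension.
# The third word conflicts with the first two, so no mapping fits all three.
C = [[1, 0], [0, 1], [1, 1]]
S = [[1], [1], [0]]

F_el = fit_mapping(C, S)                        # all words weighted equally
F_fil = fit_mapping(C, S, freqs=[1, 1, 100])    # third word is very frequent

pred_el = matmul(C, F_el)
pred_fil = matmul(C, F_fil)
```

In this toy setting the weighted fit predicts the frequent third word much more accurately than the unweighted fit does, at the cost of the two rare words. This mirrors the low type accuracy and high token accuracy described in the abstract, but only under the weighting assumption stated above.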
Bacterial phyla tree with distribution of BMC locus types.
The classified BMC locus types, excluding satellite and satellite-like loci, denoted as colored shapes, are adjacent to the phyla in which they appear. For a given phylum, the shape of the triangular wedge represents sequence diversity; the nearest edge represents the shortest branch length from the phylum node to a leaf, while the farthest edge represents the longest branch length from the phylum node to a leaf. Phyla marked with an asterisk (*) are not in NR but contain BMC loci; the data were retrieved from IMG (Materials and Methods). Phylum tree based on [52] with expansion by Christian Rinke.
A Taxonomy of Bacterial Microcompartment Loci Constructed by a Novel Scoring Method
Bacterial microcompartments (BMCs) are proteinaceous organelles involved in both autotrophic and heterotrophic metabolism. All BMCs share homologous shell proteins but differ in their complement of enzymes; these are typically encoded adjacent to shell protein genes in genetic loci, or operons. To enable the identification and prediction of functional (sub)types of BMCs, we developed LoClass, an algorithm that finds putative BMC loci and inventories, weights, and compares their constituent Pfam domains to construct a locus similarity network and predict locus (sub)types. In addition to using LoClass to analyze sequences in the Non-redundant Protein Database, we compared predicted BMC loci found in seven candidate bacterial phyla (six from single-cell genomic studies) to the LoClass taxonomy. Together, these analyses resulted in the identification of 23 different types of BMCs encoded in 30 distinct locus (sub)types found in 23 bacterial phyla. These include the two carboxysome types and a divergent set of metabolosomes, BMCs that share a common catalytic core and process distinct substrates via specific signature enzymes. Furthermore, many Candidate BMCs were found that lack one or more core metabolosome components, including one that is predicted to represent an entirely new paradigm for BMC-associated metabolism, joining the carboxysome and metabolosome. By placing these results in a phylogenetic context, we provide a framework for understanding the horizontal transfer of these loci, a starting point for studies aimed at understanding the evolution of BMCs. This comprehensive taxonomy of BMC loci, based on their constituent protein domains, foregrounds the functional diversity of BMCs and provides a reference for interpreting the role of BMC gene clusters encoded in isolate, single cell, and metagenomic data.
Many loci encode ancillary functions such as transporters or genes for cofactor assembly; this expanded vocabulary of BMC-related functions should be useful for design of genetic modules for introducing BMCs in bioengineering applications.
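The general idea of inventorying, weighting, and comparing the domains of each locus to build a similarity network can be sketched as follows. This is not the LoClass scoring method, which the abstract does not specify; it is a stand-in that assumes a weighted Jaccard similarity and a fixed edge threshold. The shell domain identifiers PF00936 and PF03319 correspond to the BMC and CcmL/EutN domains named in this listing, while the locus names, the `ENZ_*` placeholders, the weights, and the threshold are all hypothetical.

```python
from itertools import combinations


def weighted_jaccard(domains_a, domains_b, weights):
    """Weighted Jaccard similarity between two sets of domain identifiers.

    Domains missing from `weights` default to a weight of 1.0.
    """
    shared = sum(weights.get(d, 1.0) for d in domains_a & domains_b)
    total = sum(weights.get(d, 1.0) for d in domains_a | domains_b)
    return shared / total if total else 0.0


def similarity_network(loci, weights, threshold):
    """Return {(locus_1, locus_2): score} for all pairs scoring >= threshold."""
    edges = {}
    for (name_1, doms_1), (name_2, doms_2) in combinations(sorted(loci.items()), 2):
        score = weighted_jaccard(doms_1, doms_2, weights)
        if score >= threshold:
            edges[(name_1, name_2)] = score
    return edges


# Hypothetical inventory: each locus is the set of domains it encodes.
loci = {
    "locus_A": {"PF00936", "PF03319", "ENZ_X"},
    "locus_B": {"PF00936", "PF03319", "ENZ_X", "ENZ_Y"},
    "locus_C": {"PF00936", "PF03319", "ENZ_Z"},
}

# Shell domains occur in every locus, so they are down-weighted here
# (an assumption); the enzyme placeholders carry most of the signal.
weights = {"PF00936": 0.1, "PF03319": 0.1, "ENZ_X": 1.0, "ENZ_Y": 0.5, "ENZ_Z": 1.0}

edges = similarity_network(loci, weights, threshold=0.5)
```

With these made-up weights, only `locus_A` and `locus_B` are linked, because they share an enzyme placeholder, whereas the shell domains common to all three loci contribute little. Clustering such a network into locus (sub)types would be a further step that is not shown here.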
