89 research outputs found
A Statistical Model of Protein Sequence Similarity and Function Similarity Reveals Overly-Specific Function Predictions
BACKGROUND: Predicting protein function from primary sequence is an important open problem in modern biology. Not only are there many thousands of proteins of unknown function, but current approaches for predicting function must also be improved upon. One problem in particular is overly-specific function predictions, which we address here with a new statistical model of the relationship between protein sequence similarity and protein function similarity. METHODOLOGY: Our statistical model is based on sets of proteins with experimentally validated functions and numeric measures of function specificity and function similarity derived from the Gene Ontology. The model predicts the similarity of function between two proteins given their amino acid sequence similarity measured by statistics from the BLAST sequence alignment algorithm. A novel aspect of our model is that it predicts the degree of function similarity shared between two proteins over a continuous range of sequence similarity, facilitating prediction of function with an appropriate level of specificity. SIGNIFICANCE: Our model shows nearly exact function similarity for proteins with high sequence similarity (bit score >244.7, e-value >1e-62, non-redundant NCBI protein database (NRDB)) and only small likelihood of specific function match for proteins with low sequence similarity (bit score <54.6, e-value <1e-05, NRDB). For sequence similarity ranges in between, our annotation model shows an increasing relationship between function similarity and sequence similarity, but with considerable variability. We applied the model to a large set of proteins of unknown function, and predicted functions for thousands of these proteins ranging from general to very specific. We also applied the model to a data set of proteins with previously assigned, specific functions that were electronically based. We show that, on average, these prior function predictions are more specific (quite possibly overly-specific) than predictions from our model, which is based on proteins with experimentally determined function
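As a rough illustration of how the two BLAST statistics quoted in this abstract relate to each other, the standard Karlin-Altschul relation gives the expected number of chance hits as E = N * 2^(-S), where S is the bit score and N is the effective search space. The sketch below is not the authors' statistical model; the function names and the search-space value used in the usage note are assumptions chosen only for illustration.

```python
def evalue_from_bit_score(bit_score: float, search_space: float) -> float:
    """Expected number of chance hits for a given bit score.

    Uses the standard relation E = N * 2**(-S), where N is the
    effective search space (query length times database length).
    """
    return search_space * 2.0 ** (-bit_score)


def search_space_from(bit_score: float, evalue: float) -> float:
    """Invert the same relation to recover the implied search space."""
    return evalue * 2.0 ** bit_score
```

For any fixed search space, for example a hypothetical N = 4e11, a higher bit score always yields a smaller e-value under this relation.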
Why Should We Care About The Gender Difference in Classroom Participation?
It has been commonly understood and empirically demonstrated that co-ed undergraduate and graduate classrooms tend to see greater participation in course discussions from male students than from female students. Despite their equal ability to contribute substantial comments on course topics, female students still do so less frequently. In this paper, I present a normative argument for why we should care about this occurrence because of its impact on facets of a woman’s personal and professional development
MOPED: Model Organism Protein Expression Database
Large numbers of mass spectrometry proteomics studies are being conducted to understand all types of biological processes. The size and complexity of proteomics data hinder efforts to easily share, integrate, query and compare the studies. The Model Organism Protein Expression Database (MOPED, http://moped.proteinspire.org) is a new and expanding proteomics resource that enables rapid browsing of protein expression information from publicly available studies on humans and model organisms. MOPED is designed to simplify the comparison and sharing of proteomics data for the greater research community. MOPED uniquely provides protein level expression data, meta-analysis capabilities and quantitative data from standardized analysis. Data can be queried for specific proteins, browsed based on organism, tissue, localization and condition and sorted by false discovery rate and expression. MOPED empowers users to visualize their own expression data and compare it with existing studies. Further, MOPED links to various protein and pathway databases, including GeneCards, Entrez, UniProt, KEGG and Reactome. The current version of MOPED contains over 43 000 proteins with at least one spectral match and more than 11 million high certainty spectra
Dynamic Proteomic Analysis of Pancreatic Mesenchyme Reveals Novel Factors That Enhance Human Embryonic Stem Cell to Pancreatic Cell Differentiation
Current approaches in human embryonic stem cell (hESC) to pancreatic beta cell differentiation have largely been based on knowledge gained from developmental studies of the epithelial pancreas, while the potential roles of other supporting tissue compartments have not been fully explored. One such tissue is the pancreatic mesenchyme that supports epithelial organogenesis throughout embryogenesis. We hypothesized that detailed characterization of the pancreatic mesenchyme might result in the identification of novel factors not used in current differentiation protocols. Supplementing existing hESC differentiation conditions with such factors might create a more comprehensive simulation of normal development in cell culture. To validate our hypothesis, we took advantage of a novel transgenic mouse model to isolate the pancreatic mesenchyme at distinct embryonic and postnatal stages for subsequent proteomic analysis. Refined sample preparation and analysis conditions across four embryonic and prenatal time points resulted in the identification of 21,498 peptides with high-confidence mapping to 1,502 proteins. Expression analysis of pancreata confirmed the presence of three potentially important factors in cell differentiation: Galectin-1 (LGALS1), Neuroplastin (NPTN), and the Laminin α-2 subunit (LAMA2). Two of the three factors (LGALS1 and LAMA2) increased expression of pancreatic progenitor transcript levels in a published hESC to beta cell differentiation protocol. In addition, LAMA2 partially blocks cell-culture-induced beta cell dedifferentiation. In summary, we provide evidence that proteomic analysis of supporting tissues such as the pancreatic mesenchyme allows for the identification of potentially important factors guiding hESC to pancreas differentiation
On the Energy Spectra of GeV/TeV Cosmic Ray Leptons
Recent observations of cosmic ray electrons from several instruments have
revealed various degrees of deviation in the measured electron energy
distribution from a simple power-law, in a form of an excess around TeV
energies. An even more prominent deviation has been observed in the fraction of
cosmic ray positrons around 100 GeV energies. In this paper we show that the
observed excesses in the electron spectrum may be easily reproduced without
invoking any unusual sources other than the general diffuse Galactic components
of cosmic rays. The primary physical effect involved is the Klein-Nishina
suppression of the electron cooling rate around TeV energies. With a very
reasonable choice of the model parameters characterizing the local interstellar
medium, we can reproduce the most recent observations by Fermi and HESS
experiments. We also find that the high positron fraction increasing with energy,
as claimed by the PAMELA experiment, cannot be explained in our model with the
conservative set of the model parameters. We are able, however, to reproduce
the PAMELA results assuming high values of the starlight and interstellar gas
densities, which would be more appropriate for vicinities of supernova
remnants. A possible solution to this problem may be that cosmic rays undergo
most of their interactions near their sources due to the efficient trapping in
the far upstream of supernova shocks by self-generated, cosmic ray-driven
turbulence.
Comment: 31 pages, accepted for publication in ApJ (abstract abridged for arXiv)
Vertebrate Natural History Notes from Arkansas, 2020
Smaller details of natural history often go undocumented to science if those details are not parts of larger studies, but small details can provide insights that lead to interesting questions about ecological relationships or environmental change. We have compiled recent important observations of distribution and reproduction of fishes and mammals. Included are new distributional records of mammals, and observations of reproduction in several mammals for which few data exist in Arkansas. A rare record of the Long-tailed weasel, a species of special concern in Arkansas, is documented from Newton Co. We also provide evidence that Seminole bats likely reproduce in Arkansas
The United States of America and Scientific Research
To gauge the current commitment to scientific research in the United States of America (US), we compared federal research funding (FRF) with the US gross domestic product (GDP) and industry research spending during the past six decades. In order to address the recent globalization of scientific research, we also focused on four key indicators of research activities: research and development (R&D) funding, total science and engineering doctoral degrees, patents, and scientific publications. We compared these indicators across three major population and economic regions: the US, the European Union (EU) and the People's Republic of China (China) over the past decade. We discovered a number of interesting trends with direct relevance for science policy. The level of US FRF has varied between 0.2% and 0.6% of the GDP during the last six decades. Since the 1960s, the US FRF contribution has fallen from twice that of industrial research funding to roughly equal. Also, in the last two decades, the portion of the US government R&D spending devoted to research has increased. Although well below the US and the EU in overall funding, the current growth rate for R&D funding in China greatly exceeds that of both. Finally, the EU currently produces more science and engineering doctoral graduates and scientific publications than the US in absolute terms, but not per capita. This study's aim is to facilitate a serious discussion of key questions by the research community and federal policy makers. In particular, our results raise two questions with respect to: a) the increasing globalization of science: “What role is the US playing now, and what role will it play in the future of international science?”; and b) the ability to produce beneficial innovations for society: “How will the US continue to foster its strengths?”
Application and computation of likelihood methods for regression with measurement error
This thesis advocates the use of maximum likelihood analysis for generalized
regression models with measurement error in a single explanatory variable. This will be
done first by presenting a computational algorithm and the numerical details for carrying
out this algorithm on a wide variety of models. The computational methods will be based
on the EM algorithm in conjunction with the use of Gauss-Hermite quadrature to
approximate integrals in the E-step. Second, this thesis will demonstrate the relative
superiority of likelihood-ratio tests and confidence intervals over those based on
asymptotic normality of estimates and standard errors, and that likelihood methods may
be more robust in these situations than previously thought. The ability to carry out
likelihood analysis under a wide range of distributional assumptions, along with the
advantages of likelihood ratio inference and the encouraging robustness results make
likelihood analysis a practical option worth considering in regression problems with
explanatory variable measurement error
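The quadrature step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the unobserved quantity is normally distributed, uses a fixed 3-point Gauss-Hermite rule, and the name normal_expectation is hypothetical. It is not the thesis's algorithm, only an example of how an E-step integral of the form E[g(X)] might be approximated by a weighted sum.

```python
import math

# 3-point Gauss-Hermite rule (physicists' convention, weight exp(-x^2)).
# These nodes and weights integrate polynomials up to degree 5 exactly.
NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
WEIGHTS = (
    math.sqrt(math.pi) / 6.0,
    2.0 * math.sqrt(math.pi) / 3.0,
    math.sqrt(math.pi) / 6.0,
)


def normal_expectation(g, mu=0.0, sigma=1.0):
    """Approximate E[g(X)] for X ~ Normal(mu, sigma^2).

    The change of variable x = mu + sqrt(2) * sigma * t turns the
    normal density into the Gauss-Hermite weight exp(-t^2) / sqrt(pi).
    """
    total = sum(
        w * g(mu + math.sqrt(2.0) * sigma * t)
        for t, w in zip(NODES, WEIGHTS)
    )
    return total / math.sqrt(math.pi)
```

For instance, normal_expectation(lambda x: x ** 2, mu=1.0, sigma=2.0) approximates the second moment mu^2 + sigma^2. In practice more nodes would be used when g is not a low-degree polynomial.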
Handbook of uncertainty quantification
The topic of Uncertainty Quantification (UQ) has witnessed massive developments in response to the promise of achieving risk mitigation through scientific prediction. It has led to ideas from mathematics, statistics and engineering being integrated and used not only to lend credence to predictive assessments of risk but also to design actions (by engineers, scientists and investors) that are consistent with risk aversion. The objective of this Handbook is to facilitate the dissemination of the forefront of UQ ideas to their audiences. We recognize that these audiences are varied, with interests ranging from theory to application, and from research to development and even execution