129 research outputs found

    Enhancing navigation in biomedical databases by community voting and database-driven text classification

    Background: The breadth of biological databases and their information content continue to increase exponentially. Unfortunately, our ability to query such sources is still often suboptimal. Here, we introduce and apply community voting, database-driven text classification, and visual aids as a means to incorporate distributed expert knowledge, to automatically classify database entries, and to efficiently retrieve them. Results: Using a previously developed peptide database as an example, we compared several machine learning algorithms in their ability to classify abstracts of published literature results into categories relevant to peptide research, such as related or not related to cancer, angiogenesis, molecular imaging, etc. Ensembles of bagged decision trees met the requirements of our application best. No other algorithm consistently performed better in comparative testing. Moreover, we show that the algorithm produces meaningful class probability estimates, which can be used to visualize the confidence of automatic classification during the retrieval process. To allow viewing long lists of search results enriched by automatic classifications, we added a dynamic heat map to the web interface. We take advantage of community knowledge by enabling users to cast votes in Web 2.0 style in order to correct automated classification errors, which triggers reclassification of all entries. We used a novel framework in which the database "drives" the entire vote aggregation and reclassification process to increase speed while conserving computational resources and keeping the method scalable. In our experiments, we simulate community voting by adding various levels of noise to nearly perfectly labelled instances, and show that, under such conditions, classification can be improved significantly. Conclusion: Using PepBank as a model database, we show how to build a classification-aided retrieval system that gathers training data from the community, is completely controlled by the database, scales well with concurrent change events, and can be adapted to add text classification capability to other biomedical databases. The system can be accessed at http://pepbank.mgh.harvard.edu.

    Epidemiology of and surveillance for postpartum infections.

    We screened automated ambulatory medical records, hospital and emergency room claims, and pharmacy records of 2,826 health maintenance organization (HMO) members who gave birth over a 30-month period. Full-text ambulatory records were reviewed for the 30-day postpartum period to confirm infection status for a weighted sample of cases. The overall postpartum infection rate was 6.0%, with rates of 7.4% following cesarean section and 5.5% following vaginal delivery. Rehospitalization; cesarean delivery; antistaphylococcal antibiotics; diagnosis codes for mastitis, endometritis, and wound infection; and ambulatory blood or wound cultures were important predictors of infection. Use of automated information routinely collected by HMOs and insurers allows efficient identification of postpartum infections not detected by conventional surveillance.

    A joint individual-based model coupling growth and mortality reveals that tree vigor is a key component of tropical forest dynamics

    Tree vigor is often used as a covariate when tree mortality is predicted from tree growth in tropical forest dynamic models, but it is rarely explicitly accounted for in a coherent modeling framework. We quantify tree vigor at the individual tree level, based on the difference between expected and observed growth. The available methods to join nonlinear tree growth and mortality processes are not commonly used by forest ecologists, so we develop an inference methodology based on an MCMC approach, allowing us to sample the parameters of the growth and mortality model according to their posterior distribution using the joint model likelihood. We apply our framework to a set of data on the 20-year dynamics of a forest in Paracou, French Guiana, taking advantage of functional trait-based growth and mortality models already developed independently. Our results showed that growth and mortality are intimately linked and that the vigor estimator is an essential predictor of mortality, highlighting that trees growing more than expected have a far lower probability of dying. Our joint model methodology is sufficiently generic to be used to join two longitudinal and punctual linked processes and thus may be applied to a wide range of growth and mortality models. In the context of global changes, such joint models are urgently needed in tropical forests to analyze, and then predict, the effects of the ongoing changes on the tree dynamics in hyperdiverse tropical forests.

    Evolved orthogonal ribosome purification for in vitro characterization

    We developed orthogonal ribosome-mRNA pairs in which the orthogonal ribosome (O-ribosome) specifically translates the orthogonal mRNA and the orthogonal mRNA is not a substrate for cellular ribosomes. O-ribosomes have been used to create new cellular circuits to control gene expression in new ways, they have been used to provide new information about the ribosome, and they form a crucial part of foundational technologies for genetic code expansion and encoded and evolvable polymer synthesis in cells. The production of O-ribosomes in the cell makes it challenging to study the properties of O-ribosomes in vitro, because no method exists to purify functional O-ribosomes from cellular ribosomes and other cellular components. Here we present a method for the affinity purification of O-ribosomes, via tagging of the orthogonal 16S ribosomal RNA. We demonstrate that the purified O-ribosomes are pure by primer extension assays, and structurally homogenous by gel electrophoresis and sucrose gradients. We demonstrate the utility of this purification method by providing a preliminary in vitro characterization of Ribo-X, an O-ribosome previously evolved for enhanced unnatural amino acid incorporation in response to amber codons. Our data suggest that the basis of Ribo-X's in vivo activity is a decreased affinity for RF1.

    Application of multiple statistical tests to enhance mass spectrometry-based biomarker discovery

    Background: Mass spectrometry-based biomarker discovery has long been hampered by the difficulty in reconciling lists of discriminatory peaks identified by different laboratories for the same diseases studied. We describe a multi-statistical analysis procedure that combines several independent computational methods. This approach capitalizes on the strengths of each to analyze the same high-resolution mass spectral data set to discover consensus differential mass peaks that should be robust biomarkers for distinguishing between disease states. Results: The proposed methodology was applied to a pilot narcolepsy study using logistic regression, hierarchical clustering, t-test, and CART. Consensus, differential mass peaks with high predictive power were identified across three of the four statistical platforms. Based on the diagnostic accuracy measures investigated, the performance of the consensus-peak model was a compromise between logistic regression and CART, which produced better models than hierarchical clustering and t-test. However, consensus peaks confer a higher level of confidence in their ability to distinguish between disease states since they do not represent peaks that are a result of biases to a particular statistical algorithm. Instead, they were selected as differential across differing data distribution assumptions, demonstrating their true discriminatory potential. Conclusion: The methodology described here is applicable to any high-resolution MALDI mass spectrometry-derived data set with minimal mass drift, which is essential for peak-to-peak comparison studies. Four statistical approaches with differing data distribution assumptions were applied to the same raw data set to obtain consensus peaks that were found to be statistically differential between the two groups compared. These consensus peaks demonstrated high diagnostic accuracy when used to form a predictive model as evaluated by receiver operating characteristics curve analysis. They should demonstrate a higher discriminatory ability as they are not biased to a particular algorithm. Thus, they are prime candidates for downstream identification and validation efforts.

    Annotating genes and genomes with DNA sequences extracted from biomedical articles

    Motivation: Increasing rates of publication and DNA sequencing make the problem of finding relevant articles for a particular gene or genomic region more challenging than ever. Existing text-mining approaches focus on finding gene names or identifiers in English text. These are often not unique and do not identify the exact genomic location of a study.

    An on-bead tailing/ligation approach for sequencing resin-bound RNA libraries

    Nucleic acids possess the unique property of being enzymatically amplifiable, and have therefore been a popular choice for the combinatorial selection of functional sequences, such as aptamers or ribozymes. However, amplification typically requires known sequence segments that serve as primer binding sites, which can be limiting for certain applications, like the screening of on-bead libraries. Here, we report a method to amplify and sequence on-bead RNA libraries that requires not more than five known nucleotides. A key element is the attachment of the starting nucleoside to the synthesis resin via the nucleobase, which leaves the 3′-OH group accessible to subsequent enzymatic manipulations. After split-and-mix synthesis of the oligonucleotide library and deprotection, a poly(A)-tail can be efficiently added to this free 3′-hydroxyl terminus by Escherichia coli poly(A) polymerase that serves as an anchored primer binding site for reverse transcription. The cDNA is joined to a DNA adapter by T4 DNA ligase. PCR amplification yielded single-band products that could be cloned and sequenced starting from individual polystyrene beads. The method described here makes the selection of functional RNAs from on-bead RNA libraries more attractive due to increased flexibility in library design, higher yields of full-length sequence on bead and robust sequence determination.

    Structure of shocks in Burgers turbulence with Lévy noise initial data

    We study the structure of the shocks for the inviscid Burgers equation in dimension 1 when the initial velocity is given by Lévy noise, or equivalently when the initial potential is a two-sided Lévy process $\psi_0$. When $\psi_0$ is abrupt in the sense of Vigon or has bounded variation with $\limsup_{|h| \downarrow 0} h^{-2} \psi_0(h) = \infty$, we prove that the set of points with zero velocity is regenerative, and that in the latter case this set is equal to the set of Lagrangian regular points, which is non-empty. When $\psi_0$ is abrupt we show that the shock structure is discrete. When $\psi_0$ is eroded we show that there are no rarefaction intervals. Comment: 22 pages.

    Do the rich get richer? Varying effects of tree species identity and diversity on the richness of understory taxa

    Understory herbs and soil invertebrates play key roles in soil formation and nutrient cycling in forests. Studies suggest that diversity in the canopy and in the understory are positively associated, but these studies often confound the effects of tree species diversity with those of tree species identity and abiotic conditions. We combined extensive field sampling with structural equation modeling to evaluate the simultaneous effects of tree diversity on the species diversity of understory herbs, beetles, and earthworms. The diversity of earthworms and saproxylic beetles was directly and positively associated with tree diversity, presumably because species of both these taxa specialize on certain species of trees. Tree identity also strongly affected diversity in the understory, especially for herbs, likely as a result of interspecific differences in canopy light transmittance or litter decomposition rates. Our results suggest that changes in forest management will disproportionately affect certain understory taxa. For instance, changes in canopy diversity will affect the diversity of earthworms and saproxylic beetles more than changes in tree species composition, whereas the converse would be expected for understory herbs and detritivorous beetles. We conclude that the effects of tree diversity on understory taxa can vary from positive to negative and may affect biogeochemical cycling in temperate forests. Thus, maintaining high diversity in temperate forests can promote the diversity of multiple taxa in the understory.