
152 research outputs found

Complet+: a computationally scalable method to improve completeness of large-scale protein sequence clustering

Author: Nguyen Rachel
Polikar Robi
Rosen Gail L.
Sokhansanj Bahrad A.
Publication venue: Rowan Digital Works
Publication date: 02/02/2023

A major challenge for clustering algorithms is to balance the trade-off between homogeneity, i.e., the degree to which an individual cluster includes only related sequences, and completeness, i.e., the degree to which related sequences are kept together rather than broken up into multiple clusters. Most algorithms are conservative in grouping sequences with other sequences. Remote homologs may fail to be clustered together and instead form unnecessarily distinct clusters. The resulting clusters have high homogeneity but completeness that is too low. We propose Complet+, a computationally scalable post-processing method to increase the completeness of clusters without an undue cost in homogeneity. Complet+ effectively merges closely-related clusters of proteins that have verified structural relationships in the SCOPe classification scheme, improving the completeness of clustering results at little cost to homogeneity. Applying Complet+ to clusters obtained using MMseqs2’s clusterupdate achieves an increased V-measure of 0.09 and 0.05 at the SCOPe superfamily and family levels, respectively. Complet+ also creates more biologically representative clusters, as shown by a substantial increase in Adjusted Mutual Information (AMI) and Adjusted Rand Index (ARI) metrics when comparing predicted clusters to biological classifications. Complet+ similarly improves clustering metrics when applied to other methods, such as CD-HIT and linclust. Finally, we show that Complet+ runtime scales linearly with respect to the number of clusters being post-processed on a COG dataset of over 3 million sequences.

Rowan University

Fizzy: feature subset selection for metagenomics

Author: Gail L. Rosen
Gregory Ditzler
J. Calvin Morrison
Yemin Lan
Publication venue: Springer Nature
Publication date: 01/01/2015

BACKGROUND: Some of the current software tools for comparative metagenomics provide ecologists with the ability to investigate and explore bacterial communities using α- & β-diversity. Feature subset selection, a sub-field of machine learning, can also provide unique insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can obtain the operational taxonomic units (OTUs), or functional features, that have a high level of influence on the condition being studied. For example, in a previous study we used information-theoretic feature selection to understand the differences between protein family abundances that best discriminate between age groups in the human gut microbiome. RESULTS: We have developed a new Python command line tool for microbial ecologists, compatible with the widely adopted BIOM format, that implements information-theoretic subset selection methods for biological data formats. We demonstrate the software tool's capabilities on publicly available datasets. CONCLUSIONS: We have made the software implementation of Fizzy available to the public under the GNU GPL license. The standalone implementation can be found at http://github.com/EESI/Fizzy.

Crossref

Springer - Publisher Connector

PubMed Central

The University of Arizona

Rowan University

NBC update: The addition of viral and fungal databases to the Naïve Bayes classification tool

Author: A Jumpponen
AS Amend
D Hansen
DC Richter
DL Taylor
F Meyer
Gail L Rosen
GL Rosen
IC Andersson
M Ghannoum
M Hamady
P Baldrian
Q Wang
RA Edwards
RE Ley
RH Nilson
SF Chen
Tze Yee Lim
Publication venue: BioMed Central
Publication date: 01/01/2012

Background: Classifying the fungal and viral content of a sample is an important component of analyzing microbial communities in environmental media. Therefore, a method to classify any fragment from these organisms' DNA should be implemented. Results: We update the naïve Bayes classification (NBC) tool to classify reads originating from viral and fungal organisms. NBC classifies a fungal dataset similarly to Basic Local Alignment Search Tool (BLAST) and the Ribosomal Database Project (RDP) classifier. We also show NBC's similarities and differences to RDP on a fungal large subunit (LSU) ribosomal DNA dataset. For viruses in the training database, strain classification accuracy is 98%, while for those reads originating from sequences not in the database, the order-level accuracy is 78%, where order indicates the taxonomic level in the tree of life. Conclusions: In addition to being competitive with other available classifiers, NBC has the potential to handle reads originating from any location in the genome. We recommend using the Bacteria/Archaea, Fungal, and Virus databases separately due to algorithmic biases towards long genomes. The tool is publicly available at: http://nbc.ece.drexel.edu.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Using the RDP Classifier to Predict Taxonomic Novelty and Reduce the Search Space for Finding Novel Organisms

Author: A Brady
A Ulrich
AA Navarette
B Pace
D Wu
E Stackebrandt
EJ Biers
Gail L. Rosen
GL Rosen
H Zhang
J Peplies
Jack Anthony Gilbert
James R. Cole
JM Janda
JR Cole
L Li
Mignard
ML Sogin
PCY Woo
PE Galand
Q Wang
Qiong Wang
R Sandberg
RE Ley
RT Jones
TJ Sharpton
V Lazarevic
W Li
Yemin Lan
Publication venue: Public Library of Science
Publication date: 05/03/2012

BACKGROUND: Currently, the naïve Bayesian classifier provided by the Ribosomal Database Project (RDP) is one of the most widely used tools to classify 16S rRNA sequences, mainly collected from environmental samples. We show that RDP has 97+% assignment accuracy and is fast for 250 bp and longer reads when the read originates from a taxon known to the database. Because most environmental samples will contain organisms from taxa whose 16S rRNA genes have not been previously sequenced, we aim to benchmark how well the RDP classifier and other competing methods can discriminate these novel taxa from known taxa. PRINCIPAL FINDINGS: Because each fragment is assigned a score (containing likelihood or confidence information such as the bootstrap score in the RDP classifier), we "train" a threshold to discriminate between novel and known organisms and observe its performance on a test set. The threshold that we determine tends to be conservative (low sensitivity but high specificity) for naïve Bayesian methods. Nonetheless, our method performs better with the RDP classifier than the other methods tested, measured by the f-measure and the area-under-the-curve on the receiver operating characteristic of the test set. By constraining the database to well-represented genera, sensitivity improves 3-15%. Finally, we show that the detector is a good predictor to determine novel abundant taxa (especially for finer levels of taxonomy where novelty is more likely to be present). CONCLUSIONS: We conclude that selecting a read-length-appropriate RDP bootstrap score can significantly reduce the search space for identifying novel genera and higher levels in taxonomy. In addition, having a well-represented database significantly improves performance while having genera that are "highly" similar does not make a significant improvement. On a real dataset from an Amazon Terra Preta soil sample, we show that the detector can predict (or correlates to) whether novel sequences will be assigned to new taxa when the RDP database "doubles" in the future.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Cerebrospinal fluid levels of opioid peptides in fibromyalgia and chronic low back pain

Author: A Rosen
Daniel J Clauw
DX Lui
E Lapossy
F Wolfe
Gail Whalen
H Brooks
H Vaeroy
HA Smythe
James N Baraniuk
JE Ware
Jill Cunningham
JN Baraniuk
K Fukuda
L Terenius
LA Bradley
M Spetea
MB Yunus
RC Coghill
RH Gracely
S Lyrenaes
T Giesecke
T Yoshimasa
Z Liu
Publication venue: BioMed Central
Publication date: 01/12/2004

BACKGROUND: The mechanism(s) of nociceptive dysfunction and potential roles of opioid neurotransmitters are unresolved in the chronic pain syndromes of fibromyalgia and chronic low back pain. METHODS: History and physical examinations, tender point examinations, and questionnaires were used to identify 14 fibromyalgia, 10 chronic low back pain and 6 normal control subjects. Lumbar punctures were performed. Met-enkephalin-Arg(6)-Phe(7) (MEAP) and nociceptin immunoreactive materials were measured in the cerebrospinal fluid by radioimmunoassays. RESULTS: Fibromyalgia (117.6 pg/ml; 85.9 to 149.4; mean, 95% C.I.; p = 0.009) and low back pain (92.3 pg/ml; 56.9 to 127.7; p = 0.049) groups had significantly higher MEAP than the normal control group (35.7 pg/ml; 15.0 to 56.5). MEAP was inversely correlated to systemic pain thresholds. Nociceptin was not different between groups. Systemic Complaints questionnaire responses were significantly ranked as fibromyalgia > back pain > normal. SF-36 domains demonstrated severe disability for the low back pain group, intermediate results in fibromyalgia, and high function in the normal group. CONCLUSIONS: Fibromyalgia was distinguished by higher cerebrospinal fluid MEAP, systemic complaints, and manual tender points; intermediate SF-36 scores; and lower pain thresholds compared to the low back pain and normal groups. MEAP and systemic pain thresholds were inversely correlated in low back pain subjects. Central nervous system opioid dysfunction may contribute to pain in fibromyalgia.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Candidate Variants in DNA Replication and Repair Genes in Early-Onset Renal Cell Carcinoma Patients Referred for Germline Testing

Author: Andrake Mark D.
Arora Sanjeevani
Chen David Y.T.
Daly Mary B.
Demidova Elena V.
Dunbrack Roland L.
Golemis Erica A.
Hall Michael J.
Hartman Tiffiney R.
Kelow Simon
Kent Tatiana
Pomerantz Richard T.
Rosen Gail L.
Serebriiskii Ilya G.
Virtucio James
Vlasenkova Ramilia
Publication venue: Jefferson Digital Commons
Publication date: 24/04/2023

Background: Early-onset renal cell carcinoma (eoRCC) is typically associated with pathogenic germline variants (PGVs) in RCC familial syndrome genes. However, most eoRCC patients lack PGVs in familial RCC genes and their genetic risk remains undefined. Methods: Here, we analyzed biospecimens from 22 eoRCC patients that were seen at our institution for genetic counseling and tested negative for PGVs in RCC familial syndrome genes. Results: Analysis of whole-exome sequencing (WES) data found enrichment of candidate pathogenic germline variants in DNA repair and replication genes, including multiple DNA polymerases. Induction of DNA damage in peripheral blood monocytes (PBMCs) significantly elevated numbers of γH2AX foci, a marker of double-stranded breaks, in PBMCs from eoRCC patients versus PBMCs from matched cancer-free controls. Knockdown of candidate variant genes in Caki RCC cells increased γH2AX foci. Immortalized patient-derived B cell lines bearing the candidate variants in DNA polymerase genes (POLD1, POLH, POLE, POLK) had DNA replication defects compared to control cells. Renal tumors carrying these DNA polymerase variants were microsatellite stable but had a high mutational burden. Direct biochemical analysis of the variant Pol δ and Pol η polymerases revealed defective enzymatic activities. Conclusions: Together, these results suggest that constitutional defects in DNA repair underlie a subset of eoRCC cases. Screening patient lymphocytes to identify these defects may provide insight into mechanisms of carcinogenesis in a subset of genetically undefined eoRCCs. Evaluation of DNA repair defects may also provide insight into the cancer initiation mechanisms for subsets of eoRCCs and lay the foundation for targeting DNA repair vulnerabilities in eoRCC.

Jefferson Digital Commons

An Economic Approach to the Law of Evidence

Author: A Kim
Christine See
Colin Camerer
Faigman
Gail S Goodman
John D Jackson
Kelman
Kerameus
Richard A. Posner
Rosen V. Ciba-Geigy Corp
Saks
See
See Daniel
See Deanne
See Susan
Vernon L See
W Daniel
Publication venue: Elsevier BV
Publication date: 01/01/1999

Crossref

Combining gene prediction methods to improve metagenomic gene annotation

Author: A Delcher
CB Burge
E Birney
ER Mardis
G Parra
Gail L Rosen
H Noguchi
J Besemer
J Handelsman
JE Allen
KJ Hoff
L Kuncheva
L Taher
L Xu
M Stanke
MG Reese
ML Metzker
N Yok
Non G Yok
PJ Turnbaugh
R Polikar
RJ Taft
SE Ahnert
SF Altschul
SP Shah
T Yada
V Pavlovic
W Zhu
Publication venue: Springer Science and Business Media LLC
Publication date

Crossref