5,071 research outputs found

Insights into Analogy Completion from the Biomedical Domain

Author: Fosler-Lussier Eric
Lai Albert M
Newman-Griffis Denis
Publication date: 01/01/2017

Analogy completion has been a popular task in recent years for evaluating the semantic properties of word embeddings, but the standard methodology makes a number of assumptions about analogies that do not always hold, either in recent benchmark datasets or when expanding into other domains. Through an analysis of analogies in the biomedical domain, we identify three assumptions: that of a Single Answer for any given analogy, that the pairs involved describe the Same Relationship, and that each pair is Informative with respect to the other. We propose modifying the standard methodology to relax these assumptions by allowing for multiple correct answers, reporting MAP and MRR in addition to accuracy, and using multiple example pairs. We further present BMASS, a novel dataset for evaluating linguistic regularities in biomedical embeddings, and demonstrate that the relationships described in the dataset pose significant semantic challenges to current word embedding methods. Comment: Accepted to BioNLP 2017. (10 pages
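The abstract above proposes reporting MAP and MRR alongside accuracy when an analogy may have several correct answers. A minimal sketch of how those three scores could be computed is given here; the function names and the toy queries are hypothetical illustrations, not taken from the paper or from BMASS.

```python
def reciprocal_rank(ranked, correct):
    """Return 1/rank of the first correct answer, or 0.0 if none is retrieved."""
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in correct:
            return 1.0 / rank
    return 0.0


def average_precision(ranked, correct):
    """Average the precision at each rank where a correct answer appears."""
    hits = 0
    precision_sum = 0.0
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in correct:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(correct) if correct else 0.0


def evaluate(queries):
    """queries: non-empty list of (ranked_candidates, set_of_correct_answers)."""
    n = len(queries)
    # Accuracy only credits a query when the top-ranked candidate is correct.
    accuracy = sum(1 for ranked, correct in queries
                   if ranked and ranked[0] in correct) / n
    mrr = sum(reciprocal_rank(r, c) for r, c in queries) / n
    mean_ap = sum(average_precision(r, c) for r, c in queries) / n
    return {"accuracy": accuracy, "mrr": mrr, "map": mean_ap}


# Hypothetical toy data: a ranked prediction list and the set of
# acceptable answers for each analogy (made up for illustration).
toy_queries = [
    (["aspirin", "ibuprofen", "naproxen"], {"ibuprofen", "naproxen"}),
    (["fever", "cough", "rash"], {"fever"}),
]
scores = evaluate(toy_queries)
```

In this toy case the first query counts as a miss for accuracy, yet it still earns partial credit under MRR and MAP because both acceptable answers appear lower in the ranking.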

arXiv.org e-Print Archive

Crossref

White Rose Research Online

Jointly Embedding Entities and Text with Distant Supervision

Author: Fosler-Lussier Eric
Lai Albert M.
Newman-Griffis Denis
Publication date: 01/01/2018

Learning representations for knowledge base entities and concepts is becoming increasingly important for NLP applications. However, recent entity embedding methods have relied on structured resources that are expensive to create for new domains and corpora. We present a distantly-supervised method for jointly learning embeddings of entities and text from an unannotated corpus, using only a list of mappings between entities and surface forms. We learn embeddings from open-domain and biomedical corpora, and compare against prior methods that rely on human-annotated text or large knowledge graph structure. Our embeddings capture entity similarity and relatedness better than prior work, both in existing biomedical datasets and a new Wikipedia-based dataset that we release to the community. Results on analogy completion and entity sense disambiguation indicate that entities and words capture complementary information that can be effectively combined for downstream use. Comment: 12 pages; Accepted to 3rd Workshop on Representation Learning for NLP (Repl4NLP 2018). Code at https://github.com/OSU-slatelab/JE

arXiv.org e-Print Archive

Crossref

White Rose Research Online

Enabling qualitative research data sharing using a natural language processing pipeline for deidentification: Moving beyond HIPAA Safe Harbor identifiers

Author: DuBois James M
Gupta Aditi
Lai Albert
Ma Xiaoteng
Mozersky Jessica
Walsh Heidi
Publication venue: Digital Commons@Becker
Publication date: 01/07/2021

OBJECTIVE: Sharing health research data is essential for accelerating the translation of research into actionable knowledge that can impact health care services and outcomes. Qualitative health research data are rarely shared due to the challenge of deidentifying text and the potential risks of participant reidentification. Here, we establish and evaluate a framework for deidentifying qualitative research data using automated computational techniques including removal of identifiers that are not considered HIPAA Safe Harbor (HSH) identifiers but are likely to be found in unstructured qualitative data. MATERIALS AND METHODS: We developed and validated a pipeline for deidentifying qualitative research data using automated computational techniques. An in-depth analysis and qualitative review of different types of qualitative health research data were conducted to inform and evaluate the development of a natural language processing (NLP) pipeline using named-entity recognition, pattern matching, dictionary, and regular expression methods to deidentify qualitative texts. RESULTS: We collected 2 datasets with 1.2 million words derived from over 400 qualitative research data documents. We created a gold-standard dataset with 280K words (70 files) to evaluate our deidentification pipeline. The majority of identifiers in qualitative data are non-HSH and not captured by existing systems. Our NLP deidentification pipeline had a consistent F1-score of ∼0.90 for both datasets. CONCLUSION: The results of this study demonstrate that NLP methods can be used to identify both HSH identifiers and non-HSH identifiers. Automated tools to assist researchers with the deidentification of qualitative data will be increasingly important given the new National Institutes of Health (NIH) data-sharing mandate
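The abstract above mentions pattern matching, dictionary, and regular expression methods as parts of the deidentification pipeline. A minimal sketch of how such rule-based steps might be combined is given here. The patterns, the dictionary entry, and the placeholder labels are all hypothetical and are not taken from the authors' pipeline, and the named-entity recognition step is left out entirely.

```python
import re

# Hypothetical regular expressions for a few identifier types.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}

# Hypothetical dictionary of study-specific terms that rules would miss.
DICTIONARY = {
    "ORG": ["Springfield Clinic"],
}


def deidentify(text):
    """Replace regex and dictionary matches with bracketed category labels."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    for label, terms in DICTIONARY.items():
        for term in terms:
            # Escape the term so it is matched literally, ignoring case.
            text = re.sub(re.escape(term), f"[{label}]", text,
                          flags=re.IGNORECASE)
    return text


sample = ("Call me at 555-123-4567 or jane@example.org "
          "after 3/14/2020 at Springfield Clinic.")
redacted = deidentify(sample)
```

Replacing each match with a category label rather than deleting it keeps the surrounding sentence readable, which matters for qualitative data that will be reread by other researchers.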

Digital Commons@Becker

PubMed Central

Diffusion MR Characteristics Following Concurrent Radiochemotherapy Predicts Progression-Free and Overall Survival in Newly Diagnosed Glioblastoma.

Author: Chang Warren
Cloughesy Timothy F
Ellingson Benjamin M
Hardy Anthony J
Harris Robert J
Lai Albert
Leu Kevin
Mody Reema R
Nghiemphu Phioanh L
Pope Whitney B
Publication venue: eScholarship, University of California
Publication date: 01/09/2015

The standard of care for newly diagnosed glioblastoma (GBM) is surgery, then radiotherapy (RT) with concurrent temozolomide (TMZ), followed by adjuvant TMZ. We hypothesized that patients with low diffusivity, measured using apparent diffusion coefficient (ADC) histogram analysis after RT+TMZ and prior to adjuvant TMZ, would have significantly shorter progression-free survival (PFS) and overall survival (OS). To test this hypothesis, we evaluated 120 patients with newly diagnosed GBM receiving RT+TMZ followed by adjuvant TMZ. MRI was performed after completion of RT+TMZ, prior to initiation of adjuvant TMZ. A double Gaussian mixed model was used to describe the ADC histograms within the enhancing tumor, where ADCL and ADCH were defined as the mean ADC value of the lower and higher Gaussian distribution, respectively. An ADCL value of 1.0 μm²/ms and an ADCH value of 1.6 μm²/ms were used to stratify patients into high and low risk categories. Results suggest that patients with low ADCL had significantly shorter PFS (Cox Hazard Ratio = 0.12, P = 0.0006). OS was significantly shorter with low ADCL tumors, showing a median OS of 407 vs. 644 days (Cox Hazard Ratio = 0.31, P = 0.047). ADCH was not predictive of PFS or OS when accounting for age and ADCL. In summary, newly diagnosed glioblastoma patients with low ADCL after completion of RT+TMZ are likely to progress and die earlier than patients with higher ADCL. Results suggest that ADC histogram analysis may be useful for patient risk stratification following completion of RT+TMZ

eScholarship - University of California

Validation of vessel size imaging (VSI) in high-grade human gliomas using magnetic resonance imaging, image-guided biopsies, and quantitative immunohistochemistry.

Author: Chakhoyan Ararat
Cloughesy Timothy F
Ellingson Benjamin M
Everson Richard G
Lai Albert
Leu Kevin
Liau Linda M
Nathanson David A
Nghiemphu Phioanh L
Pope Whitney B
Prins Robert M
Salamon Noriko
Yao Jingwen
Yong William
Publication venue: eScholarship, University of California
Publication date: 01/02/2019

To evaluate the association between a vessel size index (VSIMRI) derived from dynamic susceptibility contrast (DSC) perfusion imaging using a custom spin-and-gradient echo echoplanar imaging (SAGE-EPI) sequence and quantitative estimates of vessel morphometry based on immunohistochemistry from image-guided biopsy samples. The current study evaluated both relative cerebral blood volume (rCBV) and VSIMRI in eleven patients with high-grade glioma (7 WHO grade III and 4 WHO grade IV). Following 26 MRI-guided glioma biopsies in these 11 patients, we evaluated tissue morphometry, including vessel density and average radius, using an automated procedure based on the endothelial cell marker CD31 to highlight tumor vasculature. Measures of rCBV and VSIMRI were then compared to histological measures. We demonstrate good agreement between VSI measured by MRI and histology; VSIMRI = 13.67 μm and VSIHistology = 12.60 μm, with slight overestimation of VSIMRI in grade III patients compared to histology. rCBV showed a moderate but significant correlation with vessel density (r = 0.42, p = 0.03), and a correlation was also observed between VSIMRI and VSIHistology (r = 0.49, p = 0.01). The current study supports the hypothesis that vessel size measures using MRI accurately reflect vessel caliber within high-grade gliomas, while traditional measures of rCBV are correlated with vessel density and not vessel caliber

Directory of Open Access Journals

eScholarship - University of California

Topology of the conceptual network of language

Author: A.-L. Barabási
Adilson E. Motter
Alessandro P. S. de Moura
C. Koch
D.J. Watts
K.E. Stephan
M. Sigman
Partha Dasgupta
R. Albert
R.F.I. Cancho
S.H. Strogatz
S.N. Dorogovtsev
V. Latora
Ying-Cheng Lai
Publication venue: American Physical Society (APS)
Publication date: 26/06/2002

We define two words in a language to be connected if they express similar concepts. The network of connections among the many thousands of words that make up a language is important not only for the study of the structure and evolution of languages, but also for cognitive science. We study this issue quantitatively, by mapping out the conceptual network of the English language, with the connections being defined by the entries in a Thesaurus dictionary. We find that this network presents a small-world structure, with an amazingly small average shortest path, and appears to exhibit an asymptotic scale-free feature with algebraic connectivity distribution. Comment: 4 pages, 2 figures, Revte
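The abstract above characterizes the conceptual network by its average shortest path. A minimal sketch of that measurement on a toy thesaurus is given here. The word entries are invented for illustration, and the linking rule (two words are connected when one appears in the other's entry) is an assumption about how thesaurus entries might be turned into edges.

```python
from collections import deque


def build_graph(thesaurus):
    """Build an undirected graph linking each headword to its listed entries."""
    graph = {}
    for word, entries in thesaurus.items():
        graph.setdefault(word, set())
        for entry in entries:
            graph[word].add(entry)
            graph.setdefault(entry, set()).add(word)
    return graph


def bfs_distances(graph, source):
    """Breadth-first search: hop counts from source to every reachable word."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def average_shortest_path(graph):
    """Mean distance over all ordered pairs of distinct, reachable words."""
    total = 0
    pairs = 0
    for source in graph:
        for target, d in bfs_distances(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical toy thesaurus (not real dictionary data).
toy_thesaurus = {
    "big": ["large", "huge"],
    "large": ["great"],
    "great": ["grand"],
}
toy_graph = build_graph(toy_thesaurus)
mean_path = average_shortest_path(toy_graph)
```

Running one breadth-first search per word costs time proportional to the number of words times the number of links, so a network with many thousands of words may call for sampling source words instead of visiting all of them.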

arXiv.org e-Print Archive

Crossref

Machine learning for modeling the progression of Alzheimer disease dementia using clinical data: A systematic literature review

Author: Gupta Aditi
Kumar Sayantan
Lai Albert M
Oh Inez
Payne Philip R O
Schindler Suzanne
Publication venue: Digital Commons@Becker
Publication date: 01/07/2021

OBJECTIVE: Alzheimer disease (AD) is the most common cause of dementia, a syndrome characterized by cognitive impairment severe enough to interfere with activities of daily life. We aimed to conduct a systematic literature review (SLR) of studies that applied machine learning (ML) methods to clinical data derived from electronic health records in order to model risk for progression of AD dementia. MATERIALS AND METHODS: We searched for articles published between January 1, 2010, and May 31, 2020, in PubMed, Scopus, ScienceDirect, IEEE Xplore Digital Library, Association for Computing Machinery Digital Library, and arXiv. We used predefined criteria to select relevant articles and summarized them according to key components of ML analysis such as data characteristics, computational algorithms, and research focus. RESULTS: There has been a considerable rise over the past 5 years in the number of research papers using ML-based analysis for AD dementia modeling. We reviewed 64 relevant articles in our SLR. The results suggest that the majority of existing research has focused on predicting progression of AD dementia using publicly available datasets containing both neuroimaging and clinical data (neurobehavioral status exam scores, patient demographics, neuroimaging data, and laboratory test values). DISCUSSION: Identifying individuals at risk for progression of AD dementia could potentially help to personalize disease management to plan future care. Clinical data consisting of both structured data tables and clinical notes can be effectively used in ML-based approaches to model risk for AD dementia progression. Data sharing and reproducibility of results can enhance the impact, adaptation, and generalizability of this research

Digital Commons@Becker

Ribosomal Proteins RPS11 and RPS20, Two Stress-Response Markers of Glioblastoma Stem Cells, Are Novel Predictors of Poor Prognosis in Glioblastoma Patients.

Author: Chen Zugen
Cloughesy Timothy F
Lai Albert
Liau Linda M
Lucey Gregory M
Mareninov Sergey
Menjivar Jimmy C
Nelson Stanley F
Shabihkhani Maryam
Telesca Donatello
Tso Cho-Lea
Tso Jonathan L
Wei Bowen
Yang Shuai
Yong William H
Publication venue: eScholarship, University of California
Publication date: 01/01/2015

Glioblastoma stem cells (GSC), co-exhibiting a tumor-initiating capacity and a radio-chemoresistant phenotype, are a compelling cell model for explaining tumor recurrence. We have previously characterized patient-derived, treatment-resistant GSC clones (TRGC) that survived radiochemotherapy. Compared to glucose-dependent, treatment-sensitive GSC clones (TSGC), TRGC exhibited reduced glucose dependence that favors the fatty acid oxidation pathway as their energy source. Using comparative genome-wide transcriptome analysis, a series of defense signatures associated with TRGC survival were identified and verified by siRNA-based gene knockdown experiments that led to loss of cell integrity. In this study, we investigate the prognostic value of defense signatures in glioblastoma (GBM) patients using gene expression analysis with Probeset Analyzer (131 GBM) and The Cancer Genome Atlas (TCGA) data, and protein expression with a tissue microarray (50 GBM), yielding the first TRGC-derived prognostic biomarkers for GBM patients. Ribosomal protein S11 (RPS11) and RPS20, individually and together, consistently predicted poor survival of newly diagnosed primary GBM tumors when overexpressed at the RNA or protein level [RPS11: Hazard Ratio (HR) = 11.5, p<0.001; RPS20: HR = 4.5, p = 0.03; RPS11+RPS20: HR = 17.99, p = 0.001]. The prognostic significance of RPS11 and RPS20 was further supported by whole tissue section RPS11 immunostaining (27 GBM; HR = 4.05, p = 0.01) and TCGA gene expression data (578 primary GBM; RPS11: HR = 1.19, p = 0.06; RPS20: HR = 1.25, p = 0.02; RPS11+RPS20: HR = 1.43, p = 0.01). Moreover, tumors that exhibited unmethylated O-6-methylguanine-DNA methyltransferase (MGMT) or wild-type isocitrate dehydrogenase 1 (IDH1) were associated with higher RPS11 expression levels [corr (IDH1, RPS11) = 0.64, p = 0.03; corr (MGMT, RPS11) = 0.52, p = 0.04]. These data indicate that increased expression of RPS11 and RPS20 predicts shorter patient survival.
The study also suggests that TRGC are clinically relevant cells that represent resistant tumorigenic clones from patient tumors and that their properties, at least in part, are reflected in poor-prognosis GBM. The screening of TRGC signatures may represent a novel alternative strategy for identifying new prognostic biomarkers

Directory of Open Access Journals

PubMed Central

eScholarship - University of California