Two to Five Truths in Non-Negative Matrix Factorization
In this paper, we explore the role of matrix scaling on a matrix of counts
when building a topic model using non-negative matrix factorization. We present
a scaling inspired by the normalized Laplacian (NL) for graphs that can greatly
improve the quality of a non-negative matrix factorization. The results
parallel those in the spectral graph clustering work of \cite{Priebe:2019},
where the authors proved adjacency spectral embedding (ASE) spectral clustering
was more likely to discover core-periphery partitions and Laplacian Spectral
Embedding (LSE) was more likely to discover affinity partitions. In text
analysis, non-negative matrix factorization (NMF) is typically used on a matrix
of co-occurrence ``contexts'' and ``terms'' counts. The matrix scaling inspired
by LSE gives significant improvement for text topic models in a variety of
datasets. We illustrate the dramatic difference that matrix scaling in NMF can
make to the quality of a topic model on three datasets where human
annotation is available. Using the adjusted Rand index (ARI), a measure of
cluster similarity, we see an increase of 50\% for Twitter data and over 200\%
for a newsgroup dataset versus using counts, which is the analogue of ASE. For
clean data, such as those from the Document Understanding Conference, NL gives
over 40\% improvement over ASE. We conclude with some analysis of this
phenomenon and some connections of this scaling with other matrix scaling
methods.
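The abstract does not state the scaling formula, so the following is only a minimal sketch under an assumption: by analogy with the normalized Laplacian of a graph, the count matrix A is scaled as D_r^{-1/2} A D_c^{-1/2}, where D_r and D_c hold the row and column sums. The function names `nl_scale` and `adjusted_rand_index` are hypothetical and are not taken from the paper. The second function computes the standard adjusted Rand index, the measure the abstract uses to compare clusterings with human annotation.

```python
import math
from collections import Counter


def nl_scale(counts):
    """Scale a count matrix as D_r^{-1/2} A D_c^{-1/2}.

    ASSUMPTION: this form is chosen by analogy with the normalized
    Laplacian; the paper's exact scaling may differ.
    """
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    scaled = []
    for i, row in enumerate(counts):
        new_row = []
        for j, value in enumerate(row):
            denom = math.sqrt(row_sums[i] * col_sums[j])
            # Empty rows or columns are left at zero instead of dividing by 0.
            new_row.append(value / denom if denom > 0 else 0.0)
        scaled.append(new_row)
    return scaled


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    total_pairs = math.comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the labelings trivially agree
    joint = Counter(zip(labels_a, labels_b))
    sum_joint = sum(math.comb(c, 2) for c in joint.values())
    sum_a = sum(math.comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(math.comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put all items together
    return (sum_joint - expected) / (max_index - expected)
```

In this sketch, the scaled matrix would be passed to any NMF routine in place of the raw counts, and the resulting topic assignments compared with the human labels through `adjusted_rand_index`.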
Proportion and characteristics of secondary progressive multiple sclerosis in five European registries using objective classifiers
Background: Assigning a course of secondary progressive multiple sclerosis (MS) (SPMS) may be difficult, and the proportion of persons with SPMS varies between reports. An objective method for disease course classification may give a better estimation of the relative proportions of relapsing-remitting MS (RRMS) and SPMS and may identify situations where SPMS is underreported. Materials and methods: Data were obtained for 61,900 MS patients from MS registries in the Czech Republic, Denmark, Germany, Sweden, and the United Kingdom (UK), including date of birth, sex, SP conversion year, visits with an Expanded Disability Status Scale (EDSS) score, MS onset and diagnosis date, relapses, and disease-modifying treatment (DMT) use. We included RRMS or SPMS patients with at least one visit between January 2017 and December 2019 if ≥ 18 years of age. We applied three objective methods: a set of SPMS clinical trial inclusion criteria ("EXPAND criteria") modified for a real-world evidence setting, a modified version of the MSBase algorithm, and a recently published decision tree-based algorithm. Results: The clinically assigned proportion of SPMS varied from 8.7% (Czechia) to 34.3% (UK). Objective classifiers estimated the proportion of SPMS from 15.1% (Germany by the EXPAND criteria) to 58.0% (UK by the decision tree method). Because the classifiers require different numbers of EDSS scores, they varied in the proportion of patients they were able to classify, from 18% (UK by the MSBase algorithm) to 100% (the decision tree algorithm for all registries). Objectively classified SPMS patients were older, converted to SPMS later, and had higher EDSS at index date and at conversion. More objectively classified SPMS patients were on DMTs compared to the clinically assigned. Conclusion: SPMS appears to be systematically underdiagnosed in MS registries. Reclassified patients were more commonly on DMTs.
Estimating the percentage of patients who might benefit from proton beam therapy instead of X-ray radiotherapy
SCRIB and PUF60 Are Primary Drivers of the Multisystemic Phenotypes of the 8q24.3 Copy-Number Variant.
Copy-number variants (CNVs) represent a significant interpretative challenge, given that each CNV typically affects the dosage of multiple genes. Here we report on five individuals with coloboma, microcephaly, developmental delay, short stature, and craniofacial, cardiac, and renal defects who harbor overlapping microdeletions on 8q24.3. Fine mapping localized a commonly deleted 78 kb region that contains three genes: SCRIB, NRBP2, and PUF60. In vivo dissection of the CNV showed discrete contributions of the planar cell polarity effector SCRIB and the splicing factor PUF60 to the syndromic phenotype, and the combinatorial suppression of both genes exacerbated some, but not all, phenotypic components. Consistent with these findings, we identified an individual with microcephaly, short stature, intellectual disability, and heart defects with a de novo c.505C>T variant leading to a p.His169Tyr change in PUF60. Functional testing of this allele in vivo and in vitro showed that the mutation perturbs the relative dosage of two PUF60 isoforms and, subsequently, the splicing efficiency of downstream PUF60 targets. These data inform the functions of two genes not previously associated with human genetic disease and demonstrate how CNVs can exhibit complex genetic architecture, with the phenotype being the amalgam of both discrete dosage dysfunction of single transcripts and binary genetic interactions.