17 research outputs found

    Sparse topic modeling via spectral decomposition and thresholding

    Full text link
    The probabilistic Latent Semantic Indexing model assumes that the expectation of the corpus matrix is low-rank and can be written as the product of a topic-word matrix and a word-document matrix. In this paper, we study the estimation of the topic-word matrix under the additional assumption that the ordered entries of its columns rapidly decay to zero. This sparsity assumption is motivated by the empirical observation that the word frequencies in a text often adhere to Zipf's law. We introduce a new spectral procedure for estimating the topic-word matrix that thresholds words based on their corpus frequencies, and show that its $\ell_1$-error rate under our sparsity assumption depends on the vocabulary size $p$ only via a logarithmic term. Our error bound is valid for all parameter regimes and in particular for the setting where $p$ is extremely large; this high-dimensional setting is commonly encountered but has not been adequately addressed in prior literature. Furthermore, our procedure also accommodates datasets that violate the separability assumption, which is necessary for most prior approaches in topic modeling. Experiments with synthetic data confirm that our procedure is computationally fast and allows for consistent estimation of the topic-word matrix in a wide variety of parameter regimes. Our procedure also performs well relative to well-established methods when applied to a large corpus of research paper abstracts, as well as the analysis of single-cell and microbiome data where the same statistical model is relevant but the parameter regimes are vastly different.
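The low-rank structure described in this abstract can be illustrated with a small simulation. The sketch below is not the paper's procedure: the factor names `A` and `W`, their shapes, the Zipf-like column construction, and the cutoff `1 / p` are all assumptions made for illustration, and the thresholding step is a generic frequency filter followed by a plain SVD.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sizes: vocabulary, topics, documents, words per document.
p, K, n, N = 200, 3, 50, 500

# Per-topic word distributions (p x K); each column decays like 1/rank,
# mimicking Zipf's law, with the word order shuffled per topic.
ranks = np.arange(1, p + 1)
A = np.stack([rng.permutation(1.0 / ranks) for _ in range(K)], axis=1)
A /= A.sum(axis=0, keepdims=True)

# Per-document topic weights (K x n); each column lies on the simplex.
W = rng.dirichlet(np.ones(K), size=n).T

# Expected corpus matrix: every column is a word distribution, rank <= K.
M = A @ W

# Observed word frequencies: multinomial counts divided by document length.
X = np.stack([rng.multinomial(N, M[:, j]) for j in range(n)], axis=1) / N

# Generic frequency filter (an assumption, not the paper's rule):
# keep words whose average corpus frequency exceeds 1 / p.
keep = X.mean(axis=1) > 1.0 / p

# Spectral step on the retained rows only.
U, s, Vt = np.linalg.svd(X[keep], full_matrices=False)
```

Because the retained matrix has far fewer rows than the full vocabulary, the SVD works on a much smaller problem, which is the intuition behind thresholding rare words when the vocabulary is very large.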

    Words and thoughts of Joy in the Miracles de Nostre Dame by Gautier de Coinci and in the first Vie des Pères.

    No full text
    The Miracles de Nostre Dame by Gautier de Coinci and the anonymous first Vie des Pères, two collections of “tales of salvation” written in French in the first third of the thirteenth century, both treat the attainment of heavenly joy as the ultimate goal of their heroes’ spiritual progression. Assuming that the emotion of joy might be the point where the pastoral and the literary projects of these texts meet, this dissertation investigates the goals the authors attach to the written expression of this emotion. Although inspired by research in the history of emotions, this work does not study joy as an autonomous concept; rather, it analyzes the dynamics between joy and the act of writing in order to show how writing about joy reveals a particular conception of literature, its purpose and its transmission. The quest for joy and the words chosen to express it are studied first as the subject of the text, illustrating the intellectual and spiritual currents to which the two authors belong. They are then analyzed as a creative dynamic, which makes it possible to contrast the poetic conceptions of the two authors, who hold different notions of how spiritual joy can be achieved through literary work. Lastly, the written expression of joy is considered as a dynamic of transmission: each storyteller sees his audience differently, which prompts a reflection on the figure of the reader constructed in each text, as well as on the political and social issues at stake, whether the authors themselves were aware of them or later readers took them up.

    Deep Generative Modeling for Volume Reconstruction in Cryo-Electron Microscopy

    Full text link
    Recent breakthroughs in high-resolution imaging of biomolecules in solution with cryo-electron microscopy (cryo-EM) have unlocked new doors for the reconstruction of molecular volumes, thereby promising further advances in biology, chemistry, and pharmacological research. Next-generation volume reconstruction algorithms that combine generative modeling with end-to-end unsupervised deep learning techniques have shown promising preliminary results, but still face considerable technical and theoretical hurdles when applied to experimental cryo-EM images. In light of the proliferation of such methods, we propose here a critical review of recent advances in the field of deep generative modeling for cryo-EM volume reconstruction. The present review aims to (i) unify and compare these new methods using a consistent statistical framework, (ii) present them using a terminology familiar to machine learning researchers and computational biologists with no specific background in cryo-EM, and (iii) provide the necessary perspective on current advances to highlight their relative strengths and weaknesses, along with outstanding bottlenecks and avenues for improvements in the field. This review might also raise the interest of computer vision practitioners, as it highlights significant limits of deep generative models in low signal-to-noise regimes, therefore emphasizing a need for new theoretical and methodological developments.

    geomstats: a Python Package for Riemannian Geometry in Machine Learning

    No full text
    Preprint NIPS2018. We introduce geomstats, a Python package that performs computations on manifolds such as hyperspheres, hyperbolic spaces, spaces of symmetric positive definite matrices and Lie groups of transformations. We provide efficient and extensively unit-tested implementations of these manifolds, together with useful Riemannian metrics and associated Exponential and Logarithm maps. The corresponding geodesic distances provide a range of intuitive choices of Machine Learning loss functions. We also give the corresponding Riemannian gradients. The operations implemented in geomstats are available with different computing backends such as numpy, tensorflow and keras. We have enabled GPU implementation and integrated geomstats manifold computations into the keras deep learning framework. This paper also presents a review of manifolds in machine learning and an overview of the geomstats package with examples demonstrating its use for efficient and user-friendly Riemannian geometry.
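To make the Exponential map, Logarithm map and geodesic distance mentioned in this abstract concrete, here is a minimal sketch for the unit hypersphere written directly in NumPy. It does not use the geomstats API; the function names `sphere_exp`, `sphere_log` and `sphere_dist` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def sphere_exp(base, tangent):
    """Move from `base` along the great circle with initial velocity `tangent`.

    `base` is a unit vector and `tangent` is assumed orthogonal to it.
    """
    norm = np.linalg.norm(tangent)
    if norm < 1e-12:
        return base.copy()
    return np.cos(norm) * base + np.sin(norm) * tangent / norm

def sphere_log(base, point):
    """Tangent vector at `base` pointing toward `point` along the geodesic.

    Not defined for antipodal points; this sketch returns zeros whenever
    the direction cannot be determined.
    """
    cos_angle = np.clip(np.dot(base, point), -1.0, 1.0)
    residual = point - cos_angle * base  # component of `point` orthogonal to `base`
    res_norm = np.linalg.norm(residual)
    if res_norm < 1e-12:
        return np.zeros_like(base)
    return np.arccos(cos_angle) * residual / res_norm

def sphere_dist(a, b):
    """Geodesic (great-circle) distance between two unit vectors."""
    return np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))

north = np.array([0.0, 0.0, 1.0])
east = np.array([1.0, 0.0, 0.0])
velocity = sphere_log(north, east)
round_trip = sphere_exp(north, velocity)
```

In this example the norm of `velocity` equals the geodesic distance between the two points, and `round_trip` recovers `east` up to floating-point error, which is the sense in which the two maps are inverse to each other away from antipodal points.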