30 research outputs found

    Neural information extraction from natural language text

    Natural language processing (NLP) deals with building computational techniques that allow computers to automatically analyze and meaningfully represent human language. With the exponential growth of data in this digital era, NLP-based systems enable us to easily access relevant information through a wide range of applications, such as web search engines and voice assistants. To achieve this, decades of research have focused on techniques at the intersection of NLP and machine learning. In recent years, deep learning techniques have exploited the expressive power of Artificial Neural Networks (ANNs) and achieved state-of-the-art performance on a wide range of NLP tasks. A key property of Deep Neural Networks (DNNs) is that they automatically extract complex features from input data, providing an alternative to manual handcrafted feature engineering. Besides ANNs, Probabilistic Graphical Models (PGMs), a coupling of graph theory and probabilistic methods, can describe the causal structure between the random variables of a system and capture a principled notion of uncertainty. Given these complementary characteristics, DNNs and PGMs can be advantageously combined to build powerful neural models that capture the underlying complexity of data. Traditional machine-learning-based NLP systems employed shallow computational methods (e.g., SVMs or logistic regression) and relied on handcrafted features, a process that is time-consuming, complex, and often incomplete. Deep learning and neural-network-based methods, by contrast, have recently shown superior results on various NLP tasks, such as machine translation, text classification, named-entity recognition, relation extraction, and textual similarity, because they automatically learn effective feature representations from training data.

    This dissertation focuses on two NLP tasks: relation extraction and topic modeling. The former aims at identifying semantic relationships between entities or nominals within a sentence or document; successfully extracting these relationships contributes greatly to building structured knowledge bases, which are useful in downstream NLP applications such as web search, question answering, and recommendation engines. The latter aims at understanding the thematic structures underlying a collection of documents. Topic modeling is a popular text-mining tool for automatically analyzing a large collection of documents and understanding their topical semantics without actually reading them; in doing so, it generates word clusters (i.e., topics) and document representations useful in document understanding and information retrieval, respectively. Both tasks are built upon the quality of the representations learned from text. In this dissertation, we develop task-specific neural models for representation learning, coupled with relation extraction and topic modeling in the supervised and unsupervised machine learning paradigms, respectively. More specifically, we make the following contributions:

    1. Neural Relation Extraction: We first propose a novel recurrent neural network based table-filling architecture that jointly performs entity and relation extraction within sentences. We then extend the scope to relationships between entities across sentence boundaries and present a novel dependency-based neural network architecture. These two contributions lie in the supervised paradigm of machine learning. We further contribute a robust relation extractor for settings constrained by the lack of labeled data, based on a novel weakly supervised bootstrapping technique, and we explore the interpretability of recurrent neural networks to explain their predictions for the relation extraction task.

    2. Neural Topic Modeling: Besides the supervised architectures, we develop unsupervised neural models that learn meaningful document representations within topic modeling frameworks. We first propose a novel dynamic topic model that captures topics over time. Next, for static topic models without temporal dependencies, we present neural topic modeling architectures that exploit external knowledge, i.e., word embeddings, to address data sparsity. We further develop neural topic models that incorporate knowledge transfer using both word embeddings and latent topics from many sources. Finally, we show that neural topic modeling can be improved by introducing language structure (e.g., word ordering and local syntactic and semantic information) to address the bag-of-words limitations of traditional topic models.

    The proposed class of neural NLP models is based on techniques at the intersection of PGMs, deep learning, and ANNs. Neural relation extraction employs neural networks to learn representations, typically at the sentence level and without access to the broader document context, whereas topic models have access to statistical information across documents. We therefore combine these two complementary learning paradigms in a neural composite model, consisting of a neural topic model and a neural language model, which jointly learns thematic structures in a document collection via the topic model and word relations within sentences via the language model. Overall, the contributions of this dissertation extend NLP-based systems for relation extraction and topic modeling with state-of-the-art performance.
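    To make the table-filling idea above concrete, the following is a minimal, illustrative sketch, not the dissertation's actual architecture: a bidirectional RNN encodes the sentence, the diagonal cells of an n-by-n table are scored with entity tags, and each off-diagonal cell (i, j) is scored with a relation label from the concatenated representations of tokens i and j. All layer sizes, label-set sizes, and the pairing scheme are assumptions made for illustration.

```python
# Illustrative sketch of table-filling for joint entity and relation extraction.
# Not the dissertation's model; sizes and label sets are placeholders.
import torch
import torch.nn as nn

class TableFillingSketch(nn.Module):
    def __init__(self, vocab_size, emb_dim=50, hidden=64,
                 n_entity_tags=5, n_relations=4):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.entity_head = nn.Linear(2 * hidden, n_entity_tags)    # diagonal cells
        self.relation_head = nn.Linear(4 * hidden, n_relations)    # cells (i, j), i != j

    def forward(self, token_ids):                  # token_ids: (batch, n)
        h, _ = self.rnn(self.emb(token_ids))       # (batch, n, 2*hidden)
        n = h.size(1)
        entity_logits = self.entity_head(h)        # (batch, n, n_entity_tags)
        # Concatenate the representations of every token pair to score relations.
        hi = h.unsqueeze(2).expand(-1, -1, n, -1)  # (batch, n, n, 2*hidden)
        hj = h.unsqueeze(1).expand(-1, n, -1, -1)  # (batch, n, n, 2*hidden)
        relation_logits = self.relation_head(torch.cat([hi, hj], dim=-1))
        return entity_logits, relation_logits      # typically trained with cross-entropy on both

# Example forward pass on a dummy 6-token sentence:
model = TableFillingSketch(vocab_size=100)
ent, rel = model(torch.randint(0, 100, (1, 6)))
print(ent.shape, rel.shape)   # torch.Size([1, 6, 5]) torch.Size([1, 6, 6, 4])
```

    In such a setup, both heads would typically be trained jointly, with cross-entropy losses over the annotated table cells supervising the entity tags and relation labels at the same time.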

    Unravelling the Genomic Landscape of Metastatic Prostate Cancer

    The flexibility of myosin 7a

    Myosin 7a is a molecular motor found in the hair cells of the ear and the photoreceptor cells of the eye. It comprises an actin-binding motor domain; a lever, composed of five IQ motifs (which can potentially bind five light chains) followed by a single alpha-helical (SAH) domain; and a tail composed of two MyTH4-FERM domains. The lever is an essential mechanical element in myosin 7a function, but an understanding of its mechanical properties, and how these derive from its substructure, is lacking. It has been observed in vitro that myosin 7a is able to regulate its activity through a head-tail interaction, yet how the flexibility of the lever sub-domains allows the molecule to fold up is not completely understood.

    To address this, the first aim of this study was to look for evidence of novel light chain binding partners of myosin 7a, which revealed calmodulin to be the preferred light chain. The second aim was to study the structure and flexibility of the lever of full-length myosin 7a using single-particle image processing of negative-stain electron microscopy (EM) images. Image averaging revealed the lever to be much shorter than expected, and there was evidence of thermally driven flexing at the motor-lever junction; a stiffness of 78 pN·nm·rad⁻² was inferred for this flexing, which represents a significant compliance in the head. Analysis of lever bending, by monitoring the decay of tangent-tangent correlations along the lever shapes, yielded a persistence length of 38 ± 3 nm. Finally, to probe the flexibility of myosin 7a, long-timescale molecular dynamics (MD) simulations were compared with a novel coarse-grained (CG) simulation technique, Fluctuating Finite Element Analysis (FFEA), which treats proteins as viscoelastic continua subject to thermal noise. FFEA simulations are computationally much cheaper than corresponding all-atom MD simulations and can therefore run long enough for myosin 7a to explore its full range of configurations. Flexibility data extracted from the all-atom MD simulations gave a bending stiffness of 60.5 pN·nm² for the SAH domain, with reasonable overlap of the major modes of motion between the all-atom and CG simulations.
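    The persistence-length measurement described above rests on the worm-like-chain relation that the mean tangent-tangent correlation decays exponentially with contour separation. The following is a minimal sketch of that kind of analysis, assuming digitized lever contours as (N, 2) coordinate arrays in nm; the function names, the synthetic test data, and the single-exponential fit are illustrative assumptions rather than the study's actual pipeline.

```python
# Minimal sketch: persistence length from the decay of tangent-tangent correlations.
# Not the study's code; contours and fitting choices are illustrative.
import numpy as np
from scipy.optimize import curve_fit

def tangent_correlation(contours, step):
    """Mean tangent-tangent correlation versus contour separation (nm)."""
    corr = {}
    for xy in contours:                                    # xy: (N, 2) coordinates in nm
        t = np.diff(xy, axis=0)
        t /= np.linalg.norm(t, axis=1, keepdims=True)      # unit tangent vectors
        for lag in range(1, len(t)):
            c = np.sum(t[:-lag] * t[lag:], axis=1).mean()  # <cos(theta)> at this lag
            corr.setdefault(lag * step, []).append(c)
    seps = sorted(corr)
    return np.array(seps), np.array([np.mean(corr[s]) for s in seps])

def synthetic_contour(n=40, step=1.0, lp=38.0, rng=None):
    """Synthetic worm-like-chain-like contour whose correlation decays as exp(-s/lp)."""
    angles = np.cumsum(rng.normal(0.0, np.sqrt(2.0 * step / lp), n))
    t = np.column_stack([np.cos(angles), np.sin(angles)])
    return np.vstack([np.zeros(2), np.cumsum(step * t, axis=0)])

rng = np.random.default_rng(1)
contours = [synthetic_contour(rng=rng) for _ in range(500)]
seps, mean_corr = tangent_correlation(contours, step=1.0)

# Fit <cos(theta)> = exp(-s/Lp); the exact prefactor in the exponent depends on
# whether the traced shapes are treated as 2D-equilibrated chains or 3D projections.
popt, _ = curve_fit(lambda s, lp: np.exp(-s / lp), seps, mean_corr, p0=[30.0])
print(f"fitted persistence length ≈ {popt[0]:.1f} nm")
```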

    Integrative characterization and development of molecular tools in the bacterium "Mesoplasma florum"

    The emergence of synthetic biology marks the beginning of a new era in which it will be possible to modify and reprogram entire genomes to meet specific needs. This field of research is therefore expected to play a leading role in the development of new technologies aimed at tackling some of the greatest challenges of the 21st century, such as antibiotic multiresistance, renewable energy production, and the treatment of diseases like cancer and diabetes. Our current ability to program predictable cellular behaviors is nevertheless very limited, mainly because commonly used model organisms possess a complexity that exceeds our analytical capabilities and because the fundamental rules governing the overall functioning of cells remain poorly understood. Owing to their remarkably small genomes, bacteria of the class Mollicutes are particularly interesting candidates for dissecting the complete functioning of cells through the integrative approaches of systems biology and synthetic genomics. Most of these microorganisms, however, are characterized by a parasitic lifestyle, reduced metabolic capabilities, and relatively slow growth requiring complex culture media; together with the lack of efficient genetic tools, these characteristics considerably restrict their manipulation in the laboratory.

    Some Mollicutes nevertheless stand out as model organisms for advancing synthetic biology and systems biology. This is the case for Mesoplasma florum, a bacterium closely related to the mycoplasmas of the Mycoplasma mycoides group (mycoides cluster). Unlike most mycoplasmas, M. florum has no known pathogenicity and grows rapidly under laboratory conditions. Moreover, M. florum has a genome of only 793,224 base pairs and 685 protein-coding sequences, placing this bacterium among the simplest self-replicating organisms known to date. Despite these considerable advantages, until very recently only a few studies had specifically explored the biology of M. florum, even though its discovery dates back nearly 40 years. Thus, at the start of my doctorate, several important aspects of this microorganism remained to be defined. For example, virtually no quantitative data on the physiology of this bacterium were available in the literature at that time, and no study of its gene expression had yet been undertaken. In addition, few if any molecular tools were available to modify the M. florum genome, which constituted a major technical limitation to studying the biology of this organism and restricted its use as a cellular chassis for microbial engineering and the development of biotechnological applications.

    To address this problem, I first developed a flexible and inexpensive continuous culture system for growing M. florum under controlled, stable, and highly reproducible conditions. This apparatus offers several operating modes to accommodate different laboratory needs, and we have made its design details fully available to the scientific community. By reducing physiological fluctuations of the cells, this culture system decreases experimental variation in M. florum studies and thereby yields data that are more easily interpreted and compared across experiments. I then developed the first plasmids specifically designed to replicate in M. florum. Based on the chromosomal origin of replication, these plasmids made it possible to test the functionality of different antibiotic selection markers and to establish several transformation methods for this bacterium. Owing to their natural tendency to recombine with the chromosome, these plasmids also served as the foundation for the technique developed by our laboratory to clone the complete M. florum genome in yeast. This yeast strain can now serve as a platform to efficiently modify the M. florum genome and then transplant it into a recipient cell.

    Finally, I carried out an in-depth characterization of this quasi-minimal bacterium by combining different experimental methods and integrative approaches. This integrative characterization includes the measurement of several physical and physiological properties of M. florum, including its doubling time, cell diameter, and cell dry mass, as well as the definition of its macromolecular fractions. I also performed the first transcriptome and proteome analyses of this microorganism in order to define its transcriptional units, estimate the absolute molecular abundances of each expressed transcript and protein, and assess the overall importance of predicted cellular functions. Beyond increasing our fundamental knowledge of various aspects of M. florum biology, these characterization efforts will serve as a foundation for the development of a genome-scale model describing the metabolism of this bacterium. Together, these efforts aim to acquire the knowledge and molecular tools needed to turn M. florum into a simplified, highly characterized platform specifically designed to explore the rules governing genome organization and plasticity, as well as the cellular mechanisms underlying cell function. Such a platform has the potential to transform synthetic biology into a logical, predictable, and reproducible discipline, making rational and efficient genome prototyping possible in order to produce bacterial strains capable of accomplishing well-defined tasks.
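    As an aside on one of the quantitative measurements mentioned above, a doubling time is typically obtained from a log-linear fit of a growth proxy against time during exponential phase. The short sketch below illustrates that calculation with invented placeholder readings; it is not code or data from the thesis.

```python
# Hypothetical sketch: doubling time from a log-linear fit of exponential-phase growth.
# The readings below are invented placeholders, not measurements from the thesis.
import numpy as np

def doubling_time(time_h, density):
    """Doubling time (h) from a log2-linear fit over exponential-phase points."""
    slope, _ = np.polyfit(time_h, np.log2(density), 1)  # doublings per hour
    return 1.0 / slope

# Placeholder exponential-phase readings (hours, arbitrary density units)
t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
od = np.array([0.05, 0.08, 0.13, 0.21, 0.33])
print(f"estimated doubling time ≈ {doubling_time(t, od):.2f} h")
```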

    ACARORUM CATALOGUS IX. Acariformes, Acaridida, Schizoglyphoidea (Schizoglyphidae), Histiostomatoidea (Histiostomatidae, Guanolichidae), Canestrinioidea (Canestriniidae, Chetochelacaridae, Lophonotacaridae, Heterocoptidae), Hemisarcoptoidea (Chaetodactylidae, Hyadesiidae, Algophagidae, Hemisarcoptidae, Carpoglyphidae, Winterschmidtiidae)

    The 9th volume of the series Acarorum Catalogus contains lists of the mites of 13 families, 225 genera, and 1268 species of the superfamilies Schizoglyphoidea, Histiostomatoidea, Canestrinioidea, and Hemisarcoptoidea. Most of these mites live on insects or other animals (as parasites, phoretics, or commensals); some inhabit rotten plant material, dung, or fungi. Mites of the families Chetochelacaridae and Lophonotacaridae are specialised to live with myriapods (Diplopoda). The peculiar aquatic or intertidal mites of the families Hyadesiidae and Algophagidae are also included.