4 research outputs found

A Simple Regularization-based Algorithm for Learning Cross-Domain Word Embeddings

Authors: Yang Wei, Lu Wei, Zheng Vincent W.
Publication date: 01/01/2019

Learning word embeddings has received a significant amount of attention recently. Often, word embeddings are learned in an unsupervised manner from a large collection of text. The genre of the text typically plays an important role in the effectiveness of the resulting embeddings. How to effectively train word embedding models using data from different domains remains an underexplored problem. In this paper, we present a simple yet effective method for learning word embeddings based on text from different domains. We demonstrate the effectiveness of our approach through extensive experiments on various downstream NLP tasks. Comment: 7 pages, accepted by EMNLP 201

arXiv.org e-Print Archive

Publikationer från KTH

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Integrative high-throughput study of arsenic hyper-accumulation in Pteris vittata

Author: Wu Qiong
Publication venue: Purdue University (bepress)
Publication date: 01/01/2014

Arsenic is a natural contaminant in soil and groundwater, which raises considerable concerns about food safety and human health worldwide. The fern Pteris vittata (Chinese brake fern) is the first identified arsenic hyperaccumulator [1]. It and its close relatives have an unparalleled ability to tolerate arsenic and feature a unique arsenic metabolism. The focus of the research presented in this thesis is to elucidate the fundamentals of arsenic tolerance and hyper-accumulation in Pteris vittata through high-throughput technology and bioinformatics tools. The transcriptome of the P. vittata gametophyte under arsenate stress was obtained using RNA-Seq technology and Trinity de novo assembly. Functional annotation of the transcriptome was performed using BLAST search, Gene Ontology term assignment, Eukaryotic Orthologous Groups (KOG) classification, and pathway analysis. Differentially expressed genes induced by arsenic stress were identified, which revealed several key players in arsenic hyper-accumulation. As part of the effort to annotate differentially expressed genes, literature on plant arsenic tolerance was collected and built into a searchable database using the Textpresso text-mining tool, which greatly facilitates the retrieval of biological facts involving arsenic-related genes. In addition, an SVM-based named-entity recognition system was constructed to identify new references to genes in the literature. The results provide excellent sequence resources for the study of arsenic tolerance in P. vittata, and establish a platform for integrative study using data of multiple types.

Purdue E-Pubs