Search CORE

381 research outputs found

Learning Intonation Rules for Concept to Speech Generation

Author: McKeown Kathleen
Pan Shimei
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/1998
Field of study

In this paper, we report on an effort to provide a general-purpose spoken language generation tool for Concept-to-Speech (CTS) applications by extending a widely used text generation package, FUF/SURGE, with an intonation generation component. As a first step, we applied machine learning and statistical models to learn intonation rules based on the semantic and syntactic information typically represented in FUF/SURGE at the sentence level. The results of this study are a set of intonation rules learned automatically which can be directly implemented in our intonation generation component. Through 5-fold cross-validation, we show that the learned rules achieve around 90% accuracy for break index, boundary tone and phrase accent and 80% accuracy for pitch accent. Our study is unique in its use of features produced by language generation to control intonation. The methodology adopted here can be employed directly when more discourse/pragmatic information is to be considered in the future

CiteSeerX

Crossref

Columbia University Academic Commons

Neural information extraction from natural language text

Author: Gupta Pankaj
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 26/09/2019
Field of study

Natural language processing (NLP) deals with building computational techniques that allow computers to automatically analyze and meaningfully represent human language. With an exponential growth of data in this digital era, the advent of NLP-based systems has enabled us to easily access relevant information via a wide range of applications, such as web search engines, voice assistants, etc. To achieve it, a long-standing research for decades has been focusing on techniques at the intersection of NLP and machine learning. In recent years, deep learning techniques have exploited the expressive power of Artificial Neural Networks (ANNs) and achieved state-of-the-art performance in a wide range of NLP tasks. Being one of the vital properties, Deep Neural Networks (DNNs) can automatically extract complex features from the input data and thus, provide an alternative to the manual process of handcrafted feature engineering. Besides ANNs, Probabilistic Graphical Models (PGMs), a coupling of graph theory and probabilistic methods have the ability to describe causal structure between random variables of the system and capture a principled notion of uncertainty. Given the characteristics of DNNs and PGMs, they are advantageously combined to build powerful neural models in order to understand the underlying complexity of data. Traditional machine learning based NLP systems employed shallow computational methods (e.g., SVM or logistic regression) and relied on handcrafting features which is time-consuming, complex and often incomplete. However, deep learning and neural network based methods have recently shown superior results on various NLP tasks, such as machine translation, text classification, namedentity recognition, relation extraction, textual similarity, etc. These neural models can automatically extract an effective feature representation from training data. This dissertation focuses on two NLP tasks: relation extraction and topic modeling. The former aims at identifying semantic relationships between entities or nominals within a sentence or document. Successfully extracting the semantic relationships greatly contributes in building structured knowledge bases, useful in downstream NLP application areas of web search, question-answering, recommendation engines, etc. On other hand, the task of topic modeling aims at understanding the thematic structures underlying in a collection of documents. Topic modeling is a popular text-mining tool to automatically analyze a large collection of documents and understand topical semantics without actually reading them. In doing so, it generates word clusters (i.e., topics) and document representations useful in document understanding and information retrieval, respectively. Essentially, the tasks of relation extraction and topic modeling are built upon the quality of representations learned from text. In this dissertation, we have developed task-specific neural models for learning representations, coupled with relation extraction and topic modeling tasks in the realms of supervised and unsupervised machine learning paradigms, respectively. More specifically, we make the following contributions in developing neural models for NLP tasks: 1. Neural Relation Extraction: Firstly, we have proposed a novel recurrent neural network based architecture for table-filling in order to jointly perform entity and relation extraction within sentences. Then, we have further extended our scope of extracting relationships between entities across sentence boundaries, and presented a novel dependency-based neural network architecture. The two contributions lie in the supervised paradigm of machine learning. Moreover, we have contributed in building a robust relation extractor constrained by the lack of labeled data, where we have proposed a novel weakly-supervised bootstrapping technique. Given the contributions, we have further explored interpretability of the recurrent neural networks to explain their predictions for the relation extraction task. 2. Neural Topic Modeling: Besides the supervised neural architectures, we have also developed unsupervised neural models to learn meaningful document representations within topic modeling frameworks. Firstly, we have proposed a novel dynamic topic model that captures topics over time. Next, we have contributed in building static topic models without considering temporal dependencies, where we have presented neural topic modeling architectures that also exploit external knowledge, i.e., word embeddings to address data sparsity. Moreover, we have developed neural topic models that incorporate knowledge transfers using both the word embeddings and latent topics from many sources. Finally, we have shown improving neural topic modeling by introducing language structures (e.g., word ordering, local syntactic and semantic information, etc.) that deals with bag-of-words issues in traditional topic models. The class of proposed neural NLP models in this section are based on techniques at the intersection of PGMs, deep learning and ANNs. Here, the task of neural relation extraction employs neural networks to learn representations typically at the sentence level, without access to the broader document context. However, topic models have access to statistical information across documents. Therefore, we advantageously combine the two complementary learning paradigms in a neural composite model, consisting of a neural topic and a neural language model that enables us to jointly learn thematic structures in a document collection via the topic model, and word relations within a sentence via the language model. Overall, our research contributions in this dissertation extend NLP-based systems for relation extraction and topic modeling tasks with state-of-the-art performances

Digitale Hochschulschriften der LMU

Neural information extraction from natural language text

Author: Gupta Pankaj
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 26/09/2019
Field of study

BGSU Football Program October 14, 2006

Author: Bowling Green State University. Department of Athletics
Publication venue: ScholarWorks@BGSU
Publication date: 14/10/2006
Field of study

Football program: Bowling Green State University vs. Eastern Michigan University, Doyt Perry Field, Homecoming, October 14, 2006.https://scholarworks.bgsu.edu/football_programs/1260/thumbnail.jp

Bowling Green State University: ScholarWorks@BGSU

Tree-Adjoining Grammars and Lexicalized Grammars

Author: Joshi Aravind K
Schabes Yves
Publication venue: ScholarlyCommons
Publication date: 01/03/1991
Field of study

In this paper, we will describe a tree generating system called tree-adjoining grammar(TAG)and state some of the recent results about TAGs. The work on TAGS is motivated by linguistic considerations. However, a number of formal results have been established for TAGs, which we believe, would be of interest to researchers in tree grammars and tree automata. After giving a short introduction to TAG, we briefly state these results concerning both the properties of the string sets and tree sets (Section 2). We will also describe the notion of lexicalization of grammars (Section 3) and investigate the relationship of lexicalization to context-free grammars (CFGs) and TAGS (Section 4)

ScholarlyCommons@Penn

Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization

Author: Ammanabrolu Prithviraj
Bauckhage Christian
Brantley Kianté
Choi Yejin
Hajishirzi Hannaneh
Hessel Jack
Ramamurthy Rajkumar
Sifa Rafet
Publication venue
Publication date: 03/10/2022
Field of study

We tackle the problem of aligning pre-trained large language models (LMs) with human preferences. If we view text generation as a sequential decision-making problem, reinforcement learning (RL) appears to be a natural conceptual framework. However, using RL for LM-based generation faces empirical challenges, including training instability due to the combinatorial action space, as well as a lack of open-source libraries and benchmarks customized for LM alignment. Thus, a question rises in the research community: is RL a practical paradigm for NLP? To help answer this, we first introduce an open-source modular library, RL4LMs (Reinforcement Learning for Language Models), for optimizing language generators with RL. The library consists of on-policy RL algorithms that can be used to train any encoder or encoder-decoder LM in the HuggingFace library (Wolf et al. 2020) with an arbitrary reward function. Next, we present the GRUE (General Reinforced-language Understanding Evaluation) benchmark, a set of 6 language generation tasks which are supervised not by target strings, but by reward functions which capture automated measures of human preference.GRUE is the first leaderboard-style evaluation of RL algorithms for NLP tasks. Finally, we introduce an easy-to-use, performant RL algorithm, NLPO (Natural Language Policy Optimization)} that learns to effectively reduce the combinatorial action space in language generation. We show 1) that RL techniques are generally better than supervised methods at aligning LMs to human preferences; and 2) that NLPO exhibits greater stability and performance than previous policy gradient methods (e.g., PPO (Schulman et al. 2017)), based on both automatic and human evaluation.Comment: Preprint. Under review. Code found at https://github.com/allenai/rl4lms and Project website at https://rl4lms.apps.allenai.org

arXiv.org e-Print Archive

The Cowl - v.63 - n.7 - Nov 5, 1998

Author
Publication venue: DigitalCommons@Providence
Publication date: 05/11/1998
Field of study

The Cowl - student newspaper of Providence College. Volume 63, Number 7 - Nov 5, 1998. 24 pages

DigitalCommons@Providence