71 research outputs found
Personalized Emphasis Framing for Persuasive Message Generation
In this paper, we present a study on personalized emphasis framing which can
be used to tailor the content of a message to enhance its appeal to different
individuals. With this framework, we directly model content selection decisions
based on a set of psychologically-motivated domain-independent personal traits
including personality (e.g., extraversion and conscientiousness) and basic
human values (e.g., self-transcendence and hedonism). We also demonstrate how
the analysis results can be used in automated personalized content selection
for persuasive message generation
Modeling Prosody Automatically in Concept-to-Speech Generation
A Concept-to-Speech (CTS) Generator is a system which integrates language generation with speech synthesis and produces speech from semantic representations. This is in contrast to Text-to-Speech (TTS) systems where speech is produced from text. CTS systems have an advantage over TTS because of the availability of semantic and pragmatic information, which are considered crucial for prosody generation, a process which models the variations in pitch, tempo and rhythm. My goal is to build a CTS system which produces more natural and intelligible speech than TTS. The CTS system is being developed as part of MAGIC (Dalal et al. 1996), a multimedia presentation generation system for health-care domain
Learning Domain-Specific Word Embeddings from Sparse Cybersecurity Texts
Word embedding is a Natural Language Processing (NLP) technique that
automatically maps words from a vocabulary to vectors of real numbers in an
embedding space. It has been widely used in recent years to boost the
performance of a vari-ety of NLP tasks such as Named Entity Recognition,
Syntac-tic Parsing and Sentiment Analysis. Classic word embedding methods such
as Word2Vec and GloVe work well when they are given a large text corpus. When
the input texts are sparse as in many specialized domains (e.g.,
cybersecurity), these methods often fail to produce high-quality vectors. In
this pa-per, we describe a novel method to train domain-specificword embeddings
from sparse texts. In addition to domain texts, our method also leverages
diverse types of domain knowledge such as domain vocabulary and semantic
relations. Specifi-cally, we first propose a general framework to encode
diverse types of domain knowledge as text annotations. Then we de-velop a novel
Word Annotation Embedding (WAE) algorithm to incorporate diverse types of text
annotations in word em-bedding. We have evaluated our method on two
cybersecurity text corpora: a malware description corpus and a Common
Vulnerability and Exposure (CVE) corpus. Our evaluation re-sults have
demonstrated the effectiveness of our method in learning domain-specific word
embeddings
LDAExplore: Visualizing Topic Models Generated Using Latent Dirichlet Allocation
We present LDAExplore, a tool to visualize topic distributions in a given
document corpus that are generated using Topic Modeling methods. Latent
Dirichlet Allocation (LDA) is one of the basic methods that is predominantly
used to generate topics. One of the problems with methods like LDA is that
users who apply them may not understand the topics that are generated. Also,
users may find it difficult to search correlated topics and correlated
documents. LDAExplore, tries to alleviate these problems by visualizing topic
and word distributions generated from the document corpus and allowing the user
to interact with them. The system is designed for users, who have minimal
knowledge of LDA or Topic Modelling methods. To evaluate our design, we run a
pilot study which uses the abstracts of 322 Information Visualization papers,
where every abstract is considered a document. The topics generated are then
explored by users. The results show that users are able to find correlated
documents and group them based on topics that are similar
Designing a speech corpus for instancebased spoken language generation
In spoken language applications such as conversation systems where not only the speech waveforms but also the content of the speech (the text) need to be generated automatically, a Concept-to-Speech (CTS) system is needed. In this paper, we address several issues on designing a speech corpus to facilitate an instance-based integrated CTS framework. Both the instance-based CTS generation approach and the corpus design process have not been addressed systematically in previous researches
- …