
    Identifying prosodic prominence patterns for English text-to-speech synthesis

    This thesis proposes to improve and enrich the expressiveness of English Text-to-Speech (TTS) synthesis by identifying and generating natural patterns of prosodic prominence. In most state-of-the-art TTS systems, the prediction from text of prosodic prominence relations between words in an utterance relies on features that only loosely account for the combined effects of syntax, semantics, word informativeness and salience on prosodic prominence. To improve prosodic prominence prediction, we first follow the classic approach in which prosodic prominence patterns are flattened into binary sequences of pitch-accented and pitch-unaccented words. We propose and motivate statistical and syntactic-dependency-based features that are complementary to the most predictive features proposed in previous work on automatic pitch accent prediction, and show their utility on both read and spontaneous speech. Different accentuation patterns can be associated with the same sentence; such variability raises the question of how to evaluate pitch accent predictors when multiple patterns are allowed. We carry out a study of prosodic symbol variability on a speech corpus in which different speakers read the same text, and propose an information-theoretic definition of the optionality of symbolic prosodic events that leads to a novel evaluation metric in which prosodic variability is incorporated as a factor affecting prediction accuracy. We additionally propose a method to take advantage of the optionality of prosodic events in unit-selection speech synthesis. To better account for the tight links between the prosodic prominence of a word and the discourse/sentence context, part of this thesis goes beyond the accent/no-accent dichotomy and is devoted to a novel task, the automatic detection of contrast, where contrast is meant as an Information Structure relation that ties two words that explicitly contrast with each other.
This task is mainly motivated by the fact that contrastive words tend to be prosodically marked with particularly prominent pitch accents. The identification of contrastive word pairs is achieved by combining lexical information, syntactic information (which mainly aims to identify the syntactic parallelism that often activates contrast) and semantic information (mainly drawn from the WordNet semantic lexicon) within a Support Vector Machine classifier. Once we have identified patterns of prosodic prominence, we propose methods to incorporate such information in TTS synthesis and test its impact on synthetic speech naturalness through large-scale perceptual experiments. The results of these experiments cast some doubts on the utility of a simple accent/no-accent distinction in Hidden Markov Model based speech synthesis, while highlighting the importance of contrastive accents.
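As a rough illustration of the kind of feature combination described above, the sketch below scores a candidate word pair with lexical, syntactic-parallelism and semantic (antonymy) cues. The feature set, the toy antonym lexicon and the linear weights are all hypothetical stand-ins: the thesis uses a trained Support Vector Machine and the full WordNet lexicon, not this hand-weighted scorer.

```python
# Illustrative sketch (not the thesis implementation) of contrast detection
# features. The antonym list and weights below are invented for illustration.

TOY_ANTONYMS = {("hot", "cold"), ("buy", "sell"), ("up", "down")}
ANTONYM_PAIRS = {tuple(sorted(p)) for p in TOY_ANTONYMS}

def contrast_features(w1, pos1, dep1, w2, pos2, dep2):
    """Features for a candidate word pair: shared POS and shared dependency
    role approximate syntactic parallelism; antonymy stands in for the
    WordNet-style semantic cue."""
    pair = tuple(sorted((w1.lower(), w2.lower())))
    return {
        "same_pos": float(pos1 == pos2),       # lexical/syntactic cue
        "same_dep_role": float(dep1 == dep2),  # syntactic parallelism cue
        "antonyms": float(pair in ANTONYM_PAIRS),  # semantic cue
    }

# Hypothetical linear weights standing in for a trained SVM decision function.
WEIGHTS = {"same_pos": 0.5, "same_dep_role": 1.0, "antonyms": 2.0}
BIAS = -1.5

def is_contrastive(feats):
    score = sum(WEIGHTS[k] * v for k, v in feats.items()) + BIAS
    return score > 0.0

f = contrast_features("hot", "ADJ", "amod", "cold", "ADJ", "amod")
print(is_contrastive(f))  # True: antonymy plus parallelism cues fire
```

A real classifier would learn the weights from annotated contrastive pairs rather than fixing them by hand.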

    Exploratory visual text analytics in the scientific literature domain


    Web knowledge bases

    Knowledge is key to natural language understanding. References to specific people, places and things in text are crucial to resolving ambiguity and extracting meaning. Knowledge Bases (KBs) codify this information for automated systems — enabling applications such as entity-based search and question answering. This thesis explores the idea that sites on the web may act as a KB, even if that is not their primary intent. Dedicated KBs like Wikipedia are a rich source of entity information, but are built and maintained at an ongoing cost in human effort. As a result, they are generally limited in terms of the breadth and depth of knowledge they index about entities. Web knowledge bases offer a distributed solution to the problem of aggregating entity knowledge. Social networks aggregate content about people, news sites describe events with tags for organizations and locations, and a diverse assortment of web directories aggregate statistics and summaries for long-tail entities notable within niche movie, musical and sporting domains. We aim to develop the potential of these resources for both web-centric entity Information Extraction (IE) and structured KB population. We first investigate the problem of Named Entity Linking (NEL), where systems must resolve ambiguous mentions of entities in text to their corresponding node in a structured KB. We demonstrate that entity disambiguation models derived from inbound web links to Wikipedia are able to complement and in some cases completely replace the role of resources typically derived from the KB. Building on this work, we observe that any page on the web which reliably disambiguates inbound web links may act as an aggregation point for entity knowledge. To uncover these resources, we formalize the task of Web Knowledge Base Discovery (KBD) and develop a system to automatically infer the existence of KB-like endpoints on the web.
While extending our framework to multiple KBs increases the breadth of available entity knowledge, we must still consolidate references to the same entity across different web KBs. We investigate this task of Cross-KB Coreference Resolution (KB-Coref) and develop models for efficiently clustering coreferent endpoints across web-scale document collections. Finally, assessing the gap between unstructured web knowledge resources and those of a typical KB, we develop a neural machine translation approach which transforms entity knowledge between unstructured textual mentions and traditional KB structures. The web has great potential as a source of entity knowledge. In this thesis we aim to first discover, distill and finally transform this knowledge into forms which will ultimately be useful in downstream language understanding tasks.
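The link-based disambiguation idea described above can be sketched as ranking candidate entities by how often a given anchor text links to them. The anchor-count table and entity names below are invented for illustration; a real system would estimate these counts from web-scale inbound links to Wikipedia or other discovered KB endpoints.

```python
# Minimal sketch of link-probability entity disambiguation, in the spirit
# of using inbound web links as a disambiguation resource. The counts are
# hypothetical: counts[anchor][entity] = number of web links with that
# anchor text pointing at that entity's page.
counts = {
    "jaguar": {"Jaguar_Cars": 120, "Jaguar_(animal)": 60,
               "Jacksonville_Jaguars": 20},
}

def link_probability(anchor):
    """Estimate P(entity | anchor) from inbound-link counts."""
    c = counts.get(anchor.lower(), {})
    total = sum(c.values())
    return {e: n / total for e, n in c.items()} if total else {}

def link_entity(mention):
    """Resolve a mention to its most frequently linked entity (NIL if unseen)."""
    dist = link_probability(mention)
    return max(dist, key=dist.get) if dist else "NIL"

print(link_entity("Jaguar"))  # Jaguar_Cars under these made-up counts
```

In practice such commonness priors are combined with context features, but even alone they form a strong NEL baseline.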

    Assessing mental wellbeing in urban areas using social media data: understanding when and where urbanites stress and de-stress

    Are Americans more stressed out by living in dense, urbanized areas or in less dense, car-oriented areas? To answer this question, can we use people's expressions of stress in different environments to understand what kinds of spaces help them de-stress? This study uses the stress levels of geolocated tweets to answer such questions and to resolve the longstanding disparities between the fields of psychology and urban planning about the mental health impacts of cities. This is important because more than 75 percent of Americans are moderately stressed. Long-term stress is associated with mental health disorders, including sleeplessness, anxiety, and depression. Additionally, chronic stress is linked to physical ailments, including high blood pressure, cardiovascular diseases, and diabetes. The psychology literature claims that urban areas witness elevated levels of mental health problems, manifested as stress, mood disorders, and anxiety issues. Density, crowding, traffic, crime, and pollution are identified as stressors associated with urban living conditions. Contesting this claim, the urban planning literature positions stress in the context of the longer commutes, lack of accessibility, and social isolation that come with suburban living conditions. Urban planners and urban designers have advocated for density. With rapid urbanization, 60 percent of the world population will live in urban areas by 2030, making it crucial for urban planners to address these disparities to support the mental wellbeing of urbanites. This research uses a multi-head attention transformer model to classify tweets (token sequences), and assesses the stress levels of custom-defined ten-acre assessment grids within the city areas of Atlanta and Boston. The assessed stress level of these assessment grids is called the mental wellbeing score (MWS). The mental wellbeing score is defined in this research as a measure of `mental wellbeing' of any given grid (a higher score is better).
Using this measure, the research investigates the relationship between mental wellbeing and built environment characteristics in urban areas to uncover the impact of long-term stress triggered by the conditions of the built environment in urban settings. In summary, the results of the exploration shed light on three critical aspects: 1. The mental wellbeing score increases with increasing urbanness. 2. The mental wellbeing score increases with the diversity of escape facilities, including green parks, open spaces, and other points of interest. 3. The mental wellbeing score is positively impacted by accessible high-density spaces with high symbolic value. The research also investigates the impact of safety perception and socio-economic status on mental wellbeing scores. The results show that addressing socio-economic disparity and crime, and investing in green infrastructure, can improve the mental wellbeing of urbanites. The methods and findings of the research show that 'urban areas' can positively impact mental health if designed appropriately. Furthermore, this study can empower urban planners and policymakers to develop tools to assess the mental wellbeing of urbanites, adjust infrastructure needs, and improve the urban amenities that support mental wellbeing.
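The grid-aggregation step described above can be sketched as bucketing geolocated tweets into fixed-size cells and scoring each cell. The cell size, the precomputed stress probabilities (standing in for the transformer classifier's output), and the 1 - mean(stress) scoring rule are all simplifying assumptions, not the study's exact method.

```python
# Rough sketch of grid-level wellbeing scoring from classified tweets.
# CELL_DEG is a hypothetical stand-in for a ~10-acre grid cell.
import math

CELL_DEG = 0.002

def cell_of(lat, lon):
    """Map a coordinate to its (row, col) grid cell."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))

def wellbeing_scores(tweets):
    """tweets: iterable of (lat, lon, stress) with stress in [0, 1], e.g. a
    classifier's probability of the 'stressed' class. Returns {cell: MWS}
    where a higher score indicates better assessed wellbeing."""
    by_cell = {}
    for lat, lon, stress in tweets:
        by_cell.setdefault(cell_of(lat, lon), []).append(stress)
    return {c: 1.0 - sum(s) / len(s) for c, s in by_cell.items()}

# Invented example tweets: two near one Atlanta location, one in Boston.
tweets = [(33.7490, -84.3885, 0.9), (33.7495, -84.3881, 0.7),
          (42.3601, -71.0589, 0.1)]
scores = wellbeing_scores(tweets)
```

A real pipeline would project coordinates into an equal-area grid rather than degree-based cells, which vary in size with latitude.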

    Analyzing Granger causality in climate data with time series classification methods

    Attribution studies in climate science aim to scientifically ascertain the influence of natural or anthropogenic factors on climatic variations. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships, while utilizing traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested.
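The traditional autoregressive baseline mentioned above can be illustrated with a minimal lag-1 Granger test: x "Granger-causes" y if adding lagged x to an autoregression of y significantly reduces the residual sum of squares. The synthetic data and the single lag below are illustrative choices, not the article's experimental setup.

```python
# Minimal lag-1 Granger-causality check with ordinary least squares.
import numpy as np

def granger_f(x, y, eps=1e-12):
    """F-statistic: does lagged x help predict y beyond lagged y?"""
    y_t, y_lag, x_lag = y[1:], y[:-1], x[:-1]
    ones = np.ones_like(y_lag)
    # Restricted model: y_t ~ const + y_{t-1}
    Xr = np.column_stack([ones, y_lag])
    rss_r = np.sum((y_t - Xr @ np.linalg.lstsq(Xr, y_t, rcond=None)[0]) ** 2)
    # Unrestricted model: y_t ~ const + y_{t-1} + x_{t-1}
    Xu = np.column_stack([ones, y_lag, x_lag])
    rss_u = np.sum((y_t - Xu @ np.linalg.lstsq(Xu, y_t, rcond=None)[0]) ** 2)
    n, q = len(y_t), 1  # q = number of restrictions
    return ((rss_r - rss_u) / q) / (rss_u / (n - 3) + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.zeros(500)
for t in range(1, 500):  # y is driven by lagged x by construction
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()
```

On this synthetic series the F-statistic for the x-to-y direction comes out far larger than for the reverse direction, as expected; real attribution studies use multiple lags and formal significance thresholds.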

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines. From a socio-economic perspective, we inventory the impact and legal consequences of these technical advances and point out future directions of research.

    Gesture in Automatic Discourse Processing

    Computers cannot fully understand spoken language without access to the wide range of modalities that accompany speech. This thesis addresses the particularly expressive modality of hand gesture, and focuses on building structured statistical models at the intersection of speech, vision, and meaning. My approach is distinguished in two key respects. First, gestural patterns are leveraged to discover parallel structures in the meaning of the associated speech. This differs from prior work that attempted to interpret individual gestures directly, an approach that was prone to a lack of generality across speakers. Second, I present novel, structured statistical models for multimodal language processing, which enable learning about gesture in its linguistic context, rather than in the abstract. These ideas find successful application in a variety of language processing tasks: resolving ambiguous noun phrases, segmenting speech into topics, and producing keyframe summaries of spoken language. In all three cases, the addition of gestural features -- extracted automatically from video -- yields significantly improved performance over a state-of-the-art text-only alternative. This marks the first demonstration that hand gesture improves automatic discourse processing.

    From user-generated text to insight context-aware measurement of social impacts and interactions using natural language processing

    Recent improvements in information and communication technologies have contributed to an increasingly globalized and connected world. The digital data that are created as the result of people's online activities and interactions consist of different types of personal and social information that can be used to extract and understand people's implicit or explicit beliefs, ideas, and biases. This thesis leverages methods and theories from natural language processing and the social sciences to study and analyze the manifestations of various attributes and signals, namely social impacts, personal values, and moral traits, in user-generated texts. This work provides a comprehensive understanding of people's viewpoints, social values, and interactions, and makes the following contributions. First, we present a study that combines review mining and impact assessment to provide an extensive discussion of the different types of impact that information products, namely documentary films, can have on people. We first establish a novel impact taxonomy and demonstrate that, with a rigorous analysis of user-generated texts and a theoretically grounded codebook, classification schema, and prediction model, we can detect multiple types of (self-reported) impact in texts and show that people's language can help in gaining insights about their opinions, socio-cultural information, and emotional states. Furthermore, the results of our analyses show that documentary films can shift people's perceptions and cognitions regarding different societal issues, e.g., climate change, and that using a combination of informative features (linguistic, syntactic, and psychological), we can predict impact in sentences with high accuracy. Second, we investigate the relationship between principles of human morality and the expression of stances in user-generated text data, namely tweets.
More specifically, we first introduce and expand the Moral Foundations Dictionary and operationalize moral values to enhance the measurement of social effects. In addition, we provide a detailed explanation of how morality and stance are associated in user-generated texts. Through extensive analysis, we show that discussions related to various social issues have distinctive moral and lexical profiles, and that leveraging moral values as an additional feature can lead to measurable improvements in the prediction accuracy of stance analysis. Third, we utilize the representation of emotional and moral states in texts to study people's interactions in two different social networks. Moreover, we first expand the analysis of structural balance to include direction and multi-level balance assessment (triads, subgroups, and the whole network). Our results show that analyzing different levels of networks and using various linguistic cues can provide a more inclusive view of people and the stability of their interactions; we found that, unlike sentiments, moral statuses in discussions stay balanced throughout the networks even in the presence of tension. Overall, this thesis aims to contribute to the emerging field of "social" NLP and broadens its scope by (1) utilizing a combination of novel taxonomies, datasets, and tools to examine user-generated texts and (2) providing more comprehensive insights about human language, cultures, and experiences.
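The triad level of the structural balance analysis can be sketched for an undirected signed network: a triad is balanced when the product of its three edge signs is positive. The tiny signed graph below is invented for illustration; the thesis extends this idea to directed edges, subgroups, and whole-network balance.

```python
# Sketch of signed-triad balance checking (undirected case only).
from itertools import combinations

def balanced_fraction(edges):
    """edges: {(u, v): +1 or -1} on an undirected signed graph.
    Returns the fraction of complete triads whose sign product is positive."""
    sign = {frozenset(e): s for e, s in edges.items()}
    nodes = {n for e in edges for n in e}
    # A triad counts only if all three of its edges are present.
    triads = [t for t in combinations(sorted(nodes), 3)
              if all(frozenset(p) in sign for p in combinations(t, 2))]
    if not triads:
        return 0.0
    balanced = sum(
        1 for t in triads
        if sign[frozenset((t[0], t[1]))]
           * sign[frozenset((t[1], t[2]))]
           * sign[frozenset((t[0], t[2]))] > 0)
    return balanced / len(triads)

# Invented network: one all-positive (balanced) triad, and one triad with a
# single negative edge (unbalanced).
edges = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
         ("b", "d"): -1, ("c", "d"): 1}
print(balanced_fraction(edges))  # 0.5: one balanced triad out of two
```

Edge signs here would come from sentiment or moral-status cues extracted from the text of each interaction.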

    Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

    On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.