    Holistic corpus-based dialectology

    This paper is concerned with sketching future directions for corpus-based dialectology. We advocate a holistic approach to the study of geographically conditioned linguistic variability, and we present a suitable methodology, 'corpus-based dialectometry', in exactly this spirit. Specifically, we argue that in order to live up to the potential of the corpus-based method, practitioners need to (i) abandon their exclusive focus on individual linguistic features in favor of the study of feature aggregates, (ii) draw on computationally advanced multivariate analysis techniques (such as multidimensional scaling, cluster analysis, and principal component analysis), and (iii) aid the interpretation of empirical results by marshalling state-of-the-art data visualization techniques. To exemplify this line of analysis, we present a case study which explores the joint frequency variability of 57 morphosyntactic features in 34 dialects from all over Great Britain.
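
    The aggregate, multivariate workflow advocated above lends itself to a short illustration: represent each dialect as a frequency vector over many features, derive one distance matrix from all features jointly, then apply multidimensional scaling and cluster analysis. The Python sketch below does exactly that on randomly generated frequencies; only its dimensions (34 dialects, 57 features) mirror the case study, and nothing else is taken from the paper's data.

        # A minimal sketch of corpus-based dialectometry, assuming normalized
        # per-dialect feature frequencies; the data here is invented.
        import numpy as np
        from scipy.spatial.distance import pdist, squareform
        from scipy.cluster.hierarchy import linkage, fcluster
        from sklearn.manifold import MDS

        rng = np.random.default_rng(0)
        n_dialects, n_features = 34, 57
        freq = rng.random((n_dialects, n_features))  # hypothetical feature frequencies

        # Aggregation step: one distance per dialect pair over all features
        # at once, instead of inspecting features one by one.
        dist = squareform(pdist(freq, metric="euclidean"))

        # Multidimensional scaling: project the distance matrix into 2D,
        # e.g. for plotting dialects on a map-like plane.
        coords = MDS(n_components=2, dissimilarity="precomputed",
                     random_state=0).fit_transform(dist)

        # Cluster analysis: group dialects into candidate dialect areas.
        labels = fcluster(linkage(pdist(freq), method="ward"),
                          t=4, criterion="maxclust")
        print(coords[:3], labels)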

    Bringing Structure into Summaries: Crowdsourcing a Benchmark Corpus of Concept Maps

    Concept maps can be used to concisely represent important information and to bring structure into large document collections. We therefore study a variant of multi-document summarization that produces summaries in the form of concept maps. However, suitable evaluation datasets for this task are currently missing. To close this gap, we present a newly created corpus of concept maps that summarize heterogeneous collections of web documents on educational topics. It was created using a novel crowdsourcing approach that allows us to efficiently determine important elements in large document collections. We release the corpus along with a baseline system and a proposed evaluation protocol to enable further research on this variant of summarization. (Published at EMNLP 2017.)
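
    Since the summaries studied here take the form of concept maps rather than running text, a small data-structure sketch may help fix the format: a concept map is a set of propositions, each linking two concepts with a labeled relation. The propositions below are invented examples; the corpus itself and its evaluation protocol are not reproduced.

        # Hedged sketch of a concept map as a set of labeled propositions.
        from dataclasses import dataclass

        @dataclass(frozen=True)
        class Proposition:
            source: str    # a concept, typically a noun phrase from the documents
            relation: str  # a linking phrase labeling the edge
            target: str

        concept_map = {
            Proposition("concept maps", "represent", "important information"),
            Proposition("concept maps", "bring structure into", "document collections"),
        }

        def concepts(cmap):
            """All distinct nodes (concepts) appearing in the map."""
            return {p.source for p in cmap} | {p.target for p in cmap}

        print(sorted(concepts(concept_map)))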

    Lexical typology through similarity semantics: Toward a semantic map of motion verbs

    This paper discusses a multidimensional probabilistic semantic map of lexical motion verb stems based on data collected from parallel texts (viz. translations of the Gospel according to Mark) for 100 languages from all continents. The cross-linguistic diversity of lexical semantics in motion verbs is illustrated in detail for the domain of 'go', 'come', and 'arrive' type contexts. It is argued that the theoretical bases underlying probabilistic semantic maps from exemplar data are the isomorphism hypothesis (given any two meanings and their corresponding forms in any particular language, more similar meanings are more likely to be expressed by the same form in any language), similarity semantics (similarity is more basic than identity), and exemplar semantics (exemplar meaning is more fundamental than abstract concepts).
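
    The construction of a probabilistic semantic map from exemplar data can be sketched concretely: treat each occurrence context as an exemplar, measure how often two contexts are expressed by the same verb form across languages (the isomorphism hypothesis serving as the similarity measure), and scale the resulting distances into a plane. The data below is invented and far smaller than the 100-language sample.

        # Minimal sketch: rows are contexts (e.g. verses), columns are languages,
        # cells are the (hypothetical) verb stems used in each context.
        import numpy as np
        from sklearn.manifold import MDS

        data = np.array([
            ["go",     "ir",     "aller"],
            ["go",     "ir",     "venir"],
            ["come",   "venir",  "venir"],
            ["arrive", "llegar", "arriver"],
        ])

        n = data.shape[0]
        sim = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                # Proportion of languages using the same form for contexts i and j.
                sim[i, j] = np.mean(data[i] == data[j])

        dist = 1.0 - sim  # shared-form probability turned into a distance
        coords = MDS(n_components=2, dissimilarity="precomputed",
                     random_state=0).fit_transform(dist)
        print(coords)  # 2D semantic map coordinates for the four contexts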

    Roles of the Average Voice in Speaker-adaptive HMM-based Speech Synthesis

    In speaker-adaptive HMM-based speech synthesis, there are typically a few speakers for whom the output synthetic speech sounds worse than that of other speakers, despite having the same amount of adaptation data from within the same corpus. This paper investigates these fluctuations in quality and concludes that as the mel-cepstral distance from the average voice becomes larger, the MOS naturalness scores generally become worse. Although this negative correlation is not that strong, it suggests a way to improve the training and adaptation strategies. We also draw comparisons between our findings and the work of other researchers regarding "vocal attractiveness."
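
    The reported relationship can be made concrete with a short sketch: compute each adapted voice's mel-cepstral distance (MCD) from the average voice and correlate it with the MOS naturalness score. The MCD formula below is the conventional one; the per-speaker numbers are invented, not the paper's measurements.

        # Hedged sketch of the MCD-versus-MOS analysis described above.
        import numpy as np
        from scipy.stats import pearsonr

        def mel_cepstral_distance(mcep_a, mcep_b):
            """Frame-averaged MCD in dB between two aligned mel-cepstral
            sequences, excluding the 0th (energy) coefficient."""
            diff = mcep_a[:, 1:] - mcep_b[:, 1:]
            return np.mean((10.0 / np.log(10.0))
                           * np.sqrt(2.0 * np.sum(diff ** 2, axis=1)))

        # Hypothetical per-speaker values: distance from the average voice
        # and the corresponding mean opinion scores.
        mcd = np.array([4.1, 4.8, 5.6, 6.2, 6.9])
        mos = np.array([3.9, 3.7, 3.4, 3.1, 3.0])

        r, p = pearsonr(mcd, mos)
        print(f"Pearson r = {r:.2f} (p = {p:.3f})")  # negative r: larger MCD, worse MOS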

    Ad Hoc Microphone Array Calibration: Euclidean Distance Matrix Completion Algorithm and Theoretical Guarantees

    This paper addresses the problem of ad hoc microphone array calibration where only partial information about the distances between microphones is available. We construct a matrix consisting of the pairwise distances and propose to estimate the missing entries using a novel Euclidean distance matrix (EDM) completion algorithm that alternates between low-rank matrix completion and projection onto the Euclidean distance space. This approach confines the recovered matrix to the EDM cone at each iteration of the matrix completion algorithm. Theoretical guarantees on calibration performance are obtained for random and locally structured missing entries as well as for measurement noise on the known distances. This study elucidates the links between the calibration error and the number of microphones, the noise level, and the ratio of missing distances. Thorough experiments on real data recordings and simulated setups demonstrate these theoretical insights. The proposed Euclidean distance matrix completion algorithm achieves a significant improvement over state-of-the-art techniques for ad hoc microphone array calibration. (In press, Signal Processing; available online August 1, 2014: http://www.sciencedirect.com/science/article/pii/S0165168414003508)
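
    A simplified stand-in for the alternating scheme described above: iterate a low-rank approximation of the squared-distance matrix (an EDM of points in R^d has rank at most d + 2) with a projection enforcing basic EDM structure, restoring the measured entries each round. This is a hedged sketch under those assumptions, not the paper's exact algorithm or its theoretical guarantees.

        import numpy as np

        def complete_edm(D_obs, mask, d=3, iters=200):
            """D_obs: squared distances (0 where unknown); mask: True where known."""
            D = D_obs.copy()
            for _ in range(iters):
                # Low-rank step: keep the d + 2 dominant eigencomponents.
                w, V = np.linalg.eigh(D)
                top = np.argsort(np.abs(w))[::-1][: d + 2]
                D = V[:, top] @ np.diag(w[top]) @ V[:, top].T
                # Projection toward the EDM cone's necessary conditions:
                # symmetry, nonnegativity, zero diagonal.
                D = np.maximum((D + D.T) / 2.0, 0.0)
                np.fill_diagonal(D, 0.0)
                # Data consistency: re-impose the measured distances.
                D[mask] = D_obs[mask]
            return D

        # Toy setup: 8 hypothetical microphones with part of the distances missing.
        rng = np.random.default_rng(1)
        X = rng.random((8, 3))
        G = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)  # true squared EDM
        mask = rng.random(G.shape) < 0.6
        mask = mask | mask.T                                 # keep the mask symmetric
        np.fill_diagonal(mask, True)
        D_hat = complete_edm(np.where(mask, G, 0.0), mask)
        print(np.max(np.abs(D_hat - G)))                     # completion error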

    PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks

    Unsupervised text embedding methods, such as Skip-gram and Paragraph Vector, have been attracting increasing attention due to their simplicity, scalability, and effectiveness. However, compared to sophisticated deep learning architectures such as convolutional neural networks, these methods usually yield inferior results when applied to particular machine learning tasks. One possible reason is that these text embedding methods learn the representation of text in a fully unsupervised way, without leveraging the labeled information available for the task. Although the learned low-dimensional representations are applicable to many different tasks, they are not particularly tuned for any task. In this paper, we fill this gap by proposing a semi-supervised representation learning method for text data, which we call predictive text embedding (PTE). Predictive text embedding utilizes both labeled and unlabeled data to learn the embedding of text. The labeled information and different levels of word co-occurrence information are first represented as a large-scale heterogeneous text network, which is then embedded into a low-dimensional space through a principled and efficient algorithm. This low-dimensional embedding not only preserves the semantic closeness of words and documents but also has strong predictive power for the particular task. Compared to recent supervised approaches based on convolutional neural networks, predictive text embedding is comparable or better in effectiveness, much more efficient, and has fewer parameters to tune. (Published at KDD 2015.)
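
    At its core, this family of network embeddings trains with LINE-style edge sampling: vertices of the heterogeneous network (words, documents, labels) share embedding tables, and each step pulls a sampled edge's endpoints together while pushing random negatives apart. The sketch below is a deliberately tiny, hedged rendition of that loop with invented data and hyperparameters, not the released PTE implementation.

        import numpy as np

        rng = np.random.default_rng(0)
        vocab = ["w:good", "w:bad", "w:movie", "d:doc1", "l:positive"]
        edges = [("w:good", "w:movie"), ("w:good", "d:doc1"), ("d:doc1", "l:positive")]
        idx = {v: i for i, v in enumerate(vocab)}

        dim, lr, neg_k = 16, 0.05, 2
        U = rng.normal(scale=0.1, size=(len(vocab), dim))  # vertex embeddings
        C = rng.normal(scale=0.1, size=(len(vocab), dim))  # context embeddings

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        for _ in range(2000):
            u, v = edges[rng.integers(len(edges))]  # sample an edge from the network
            # One positive target plus neg_k random negatives (negative sampling).
            targets = [(idx[v], 1.0)] + [(int(rng.integers(len(vocab))), 0.0)
                                         for _ in range(neg_k)]
            for t, label in targets:
                u_vec = U[idx[u]].copy()
                g = sigmoid(u_vec @ C[t]) - label  # gradient of the logistic loss
                U[idx[u]] -= lr * g * C[t]
                C[t] -= lr * g * u_vec

        print(U[idx["w:good"]] @ C[idx["w:movie"]])  # linked pair: score should grow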

    Using Gabmap

    Gabmap is a freely available, open-source web application that analyzes data on language variation, e.g. varying words for the same concepts, varying pronunciations of the same words, or varying frequencies of syntactic constructions in transcribed conversations. Gabmap is an integrated part of CLARIN (see e.g. http://portal.clarin.nl). This article summarizes Gabmap's basic functionality, adds material on some new features, and reports on the range of uses to which Gabmap has been put. Gabmap is modestly successful, and its popularity underscores the fact that the study of language variation has crossed a watershed concerning the acceptability of automated language analysis. Automated analysis not only improves researchers' efficiency, it also improves the replicability of their analyses and allows them to focus on the inferences to be drawn from the analyses and on other, more abstract aspects of the study.
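
    Gabmap's pronunciation comparisons rest on edit (Levenshtein) distance between transcriptions, so a minimal sketch of that measure may be useful. Normalizing by the longer string's length is a common simplification (Gabmap itself normalizes by alignment length), and the transcriptions below are invented.

        def levenshtein(a, b):
            """Classic dynamic-programming edit distance between two strings."""
            m, n = len(a), len(b)
            d = [[0] * (n + 1) for _ in range(m + 1)]
            for i in range(m + 1):
                d[i][0] = i
            for j in range(n + 1):
                d[0][j] = j
            for i in range(1, m + 1):
                for j in range(1, n + 1):
                    d[i][j] = min(d[i - 1][j] + 1,  # deletion
                                  d[i][j - 1] + 1,  # insertion
                                  d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))  # substitution
            return d[m][n]

        # Two hypothetical dialect pronunciations of the same word.
        x, y = "hʊs", "haʊs"
        print(levenshtein(x, y) / max(len(x), len(y)))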