    Segmenting broadcast news streams using lexical chains

    In this paper we propose a coarse-grained NLP approach to text segmentation based on the analysis of lexical cohesion within text. Most work in this area has focused on the discovery of textual units that discuss subtopic structure within documents. In contrast, our segmentation task requires the discovery of topical units of text, i.e. distinct news stories from broadcast news programmes. Our system, SeLeCT, first builds a set of lexical chains in order to model the discourse structure of the text. A boundary detector is then used to search for breaking points in this structure, indicated by patterns of cohesive strength and weakness within the text. We evaluate this technique on a test set of concatenated CNN news story transcripts and compare it with an established statistical approach to segmentation called TextTiling.

    Research methods and intelligibility studies

    This paper first briefly reviews the concept of intelligibility as it has been employed in both English as a Lingua Franca (ELF) and world Englishes (WE) research. It then examines the findings of the Lingua Franca Core (LFC), a list of phonological features that empirical research has shown to be important for safeguarding mutual intelligibility between non-native speakers of English. The main point of the paper is to analyse these findings and demonstrate that many of them can be explained if three perspectives (linguistic, psycholinguistic and historical-variationist) are taken. This demonstration aims to increase the explanatory power of the concept of intelligibility by providing some theoretical background. An implication for ELF research is that at the phonological level, internationally intelligible speakers have a large number of features in common, regardless of whether they are non-native speakers or native speakers. An implication for WE research is that taking a variety-based, rather than a features-based, view of phonological variation and its connection with intelligibility is likely to be unhelpful, as intelligibility depends to some extent on the phonological features of individual speakers, rather than on the varieties per se.

    Text mining without document context

    We consider a challenging clustering task: the clustering of multi-word terms without document co-occurrence information in order to form coherent groups of topics. For this task, we developed a methodology taking as input multi-word terms and the lexico-syntactic relations between them. Our clustering algorithm, named CPCL, is implemented in the TermWatch system. We compared CPCL to other existing clustering algorithms, namely hierarchical and partitioning (k-means, k-medoids). This out-of-context clustering task led us to adapt multi-word term representation for statistical methods and also to refine an existing cluster evaluation metric, the editing distance, in order to evaluate the methods. Evaluation was carried out on a list of multi-word terms from the genomic field which comes with a hand-built taxonomy. Results showed that while k-means and k-medoids obtained good scores on the editing distance, they were very sensitive to term length. CPCL, on the other hand, obtained a better cluster homogeneity score and was less sensitive to term length. CPCL also showed good adaptability for handling very large and sparse matrices.

    The semantic drift of quotations in blogspace: a case study in short-term cultural evolution

    First revision (major) for Cognitive Science. We present an empirical case study which connects psycholinguistics with the field of cultural evolution, in order to test for the existence of cultural attractors in the evolution of quotations. Such attractors have been proposed as a useful concept for understanding cultural evolution in relation to individual cognition, but their existence has been hard to test. We focus on the transformation of quotations when they are copied from blog to blog or media website: by coding words with a number of well-studied lexical features, we show that the way words are substituted in quotations is consistent (1) with the hypothesis of cultural attractors, and (2) with known effects of the word features. In particular, words known to be harder to recall in lists have a higher tendency to be substituted, and words easier to recall are produced instead. Our results support the hypothesis that cultural attractors can result from the combination of individual cognitive biases in the interpretation and reproduction of representations.

    Phonological features of Hong Kong English : patterns of variation and effects on local acceptability

    The changing dynamics of international communication in English have led to an intense questioning of the relevance of native-speaker pronunciation models in language teaching and testing. In addition, the World Englishes approach to local varieties has increased their level of recognition. Both of these developments suggest that English pronunciation models need to be reviewed, and Hong Kong represents an interesting case study. Although it has been claimed that Hong Kong English is at the ‘nativization’ stage, the existence of exonormative attitudes towards English is also well known. Two important questions arise from this inherent tension, neither of which has been intensively addressed in previous studies. Firstly, although many of the features of Hong Kong English pronunciation have been described, patterns of inter-speaker variation have not been investigated in detail. Secondly, the attitudes of Hong Kong English users towards the phonological features of their own variety have not been studied in ways that take account of such variation. This dissertation addresses both of these questions by being features-based in approach and using local listeners to evaluate accent samples. After an initial review of the features of Hong Kong English pronunciation, a preliminary study surveys the occurrence of consonantal phonological features within a mini-corpus of speech samples taken from local television programmes. Its findings are presented in the form of an implicational scale, which not only shows the relative frequencies with which different features occurred, but also indicates the existence of implicational patterns of co-occurrence. In the main study, twelve authentic accent samples (eleven Hong Kong speakers and one British speaker) were presented to 52 first-year undergraduate students for evaluation as to their acceptability, defined here as acceptability for pedagogical purposes.
    Multivariate statistical analysis showed, firstly, that phonological ‘errors’, as marked by the student listeners, were the most important measured factor in determining the acceptability scores, and secondly, that only certain types of ‘error’ or ‘feature’ had significant effects. These features were either related to L1 transfer or involved other salient phenomena such as idiosyncratic alterations to syllable structure. The explanatory part of the study includes acceptability as one of the factors determining feature persistence, in an ‘ecological’ or ‘evolutionary’ model of L2 phonology acquisition and development that combines the findings of the preliminary and main studies. Among the other factors that determine feature persistence or disappearance, salience, intelligibility and markedness are invoked as important influences. The acceptability data also has pedagogical implications, in that local listeners did not give the British accent the highest acceptability rating. This contrasts with the findings of previous studies regarding the pedagogical acceptability of the Hong Kong English accent. However, the features-based approach indicates that only certain types of local accent were acceptable to these listeners, and that these accents were more, rather than less, ‘native-like’. In various ways, the study contributes to an understanding of accent variation and acceptability within a new variety of English.

    On slips of the pen

    No abstract