47 research outputs found

CANELC: constructing an e-language corpus

Author: Allan S.
Baron A.
Baron N.
Bates E.
Baym N.
Biber D.
Borau K.
Boyd D.
Carter R.A.
Chafe W.L.
Condon S.L.
Crystal D.
Danet B.
Dawn Knight
Halliday M.A.K.
Herring S.C.
Honeycutt C.
Horn C.
Hymes D.
Jones Q.
Klimt B.
Ko K.
Labov W.
Myers B.
Myers G.
Orasan C.
Puschmann C.
Rheingold H.
Ronald Carter
Shortis T.
Sutherland J.
Svenja Adolphs
Thurlow C.
Wilson A.
Zappavigna M.
Publication venue: 'Edinburgh University Press'
Publication date: 01/01/2014

This paper reports on the construction of CANELC: the Cambridge and Nottingham e-language Corpus [3]. CANELC is a one million word corpus of digital communication in English, taken from online discussion boards, blogs, tweets, emails and SMS messages. The paper outlines the approaches used when planning the corpus: obtaining consent, collecting the data and compiling the corpus database. This is followed by a detailed analysis of some of the patterns of language used in the corpus. The analysis includes a discussion of the key words and phrases used as well as the common themes and semantic associations connected with the data. These discussions form the basis of an investigation of how e-language operates in both similar and different ways to spoken and written records of communication (as evidenced by the BNC, the British National Corpus).

[3] CANELC stands for Cambridge and Nottingham e-language Corpus. This corpus has been built as part of a collaborative project between The University of Nottingham and Cambridge University Press, with whom sole copyright of the annotated corpus resides. CANELC comprises one million words of digital English taken from SMS messages, blogs, tweets, discussion board content and private/business emails. Plans to extend the corpus are under discussion. The legal dimension to corpus 'ownership' of some forms of unannotated data is a complex one and is under constant review. At the present time the annotated corpus is only available to authors and researchers working for CUP and is not more generally available.

Nottingham ePrints

Nottingham eTheses

Crossref

Online Research @ Cardiff

Repository@Nottingham

Effects of Fluency Training on the Application of Linguistic Operations in Writing

Author: A. Gelderen Van
A. Reber
Amos van Gelderen
B. VanPatten
C. Bereiter
C. Doughty
C. Sanz
D. Alamargot
D. McCutchen
D.B. Willingham
E. Gatbonton
F. Christensen
F. O’Hare
G. Hillocks
J. Fitzgerald
J. Gein Van de
J. Stevens
J.C. Mellon
J.H. Hulstijn
J.M. Norris
J.R. Anderson
J.R. Hayes
L. Flower
L. Squire
M. Jacobs
M. Long
M.H. Long
N. Ellis
N.A. Chenoweth
N.C. Ellis
P. Robinson
P. Snellings
R. DeKeyser
R. Graaff De
R. Schmidt
R. Schoonen
R.T. Kellogg
Ron Oostdam
S. Graham
S. Krashen
U. Schuurs
W.L. Chafe
Publication venue: 'Springer Science and Business Media LLC'

Crossref

The Old and Thee, uh, New

Author: Arnold J.E.
Barr D.J.
Chafe W.L.
Rossion B.
Publication venue: 'Wiley'

Crossref

Observation of cell shapes in wood cross-sections during water adsorption by confocal laser-scanning microscopy (CLSM)

Author: Chafe S.C.
James W.L.
Kollmann F.
McIntosh D.C.
Nakato K.
Pentoney R.E.
Sakagami H.
Publication venue: 'Walter de Gruyter GmbH'

Crossref

Tag Clouds for Displaying Semantics: The Case of Filmscripts

Author: Adam Ganz
Benzécri J.-P.
Chafe W.L.
Fionn Murtagh
Gladwell M.
Josiane Mothe
Kurt Englmeier
Stewart McKie
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 23/05/2009

We relate tag clouds to other forms of visualization, including planar or reduced dimensionality mapping, and Kohonen self-organizing maps. Using a modified tag cloud visualization, we incorporate other information into it, including text sequence and most pertinent words. Our notion of word pertinence goes beyond just word frequency and instead takes a word in a mathematical sense as located at the average of all of its pairwise relationships. We capture semantics through context, taken as all pairwise relationships. Our domain of application is that of filmscript analysis. The analysis of filmscripts, always important for cinema, is experiencing a major gain in importance in the context of television. Our objective in this work is to visualize the semantics of filmscript, and beyond filmscript any other partially structured, time-ordered sequence of text segments. In particular we develop an innovative approach to plot characterization.

Comment: 23 pages, 7 figures

arXiv.org e-Print Archive

Crossref

De Montfort University Open Research Archive