Character encoding in corpus construction.

McEnery, A. M.; Xiao, R. Z.

oai:eprints.lancs.ac.uk:60

Authors: A. M. McEnery, R. Z. Xiao
Publication date: 1 January 2005
Publisher: AHDS

Abstract

This chapter first briefly reviews the history of character encoding. It then discusses standard and non-standard native encoding systems and evaluates the efforts to unify these character codes. We then discuss Unicode and the various Unicode Transformation Formats (UTFs). In conclusion, we recommend that Unicode (specifically UTF-8) be used in corpus construction.

Last updated on 02/07/2012.

This paper was published in Lancaster E-Prints.
