289 research outputs found
UNICODE
We review the coding of multi-language text in digital form using the Unicode standard, with special attention to
the UTF-8 variant, which is the most convenient variant for coding Latin text. We also give a short tutorial on
using UTF-8 in Microsoft Word, Netscape Composer and the text editor Kate. Standard Unicode fonts are
recommended so that texts can be easily transferred from one computer to another or published
on the Internet
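As an illustration of why UTF-8 suits Latin text, the following minimal Python sketch prints how many bytes UTF-8 uses per character. It is not taken from the reviewed paper; the helper name `utf8_lengths` and the sample string are assumptions chosen for the example.

```python
def utf8_lengths(text: str) -> list[tuple[str, int]]:
    """Return (character, number of UTF-8 bytes) for each character in text.

    Hypothetical helper for illustration only.
    """
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


if __name__ == "__main__":
    # Basic Latin letter, accented Latin letter, and the euro sign.
    for ch, n in utf8_lengths("aé€"):
        print(f"U+{ord(ch):04X} {ch!r}: {n} byte(s)")
```

Unaccented Latin letters occupy a single byte, identical to ASCII, while accented letters take two bytes and most other symbols three or more.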
Duncode Characters Shorter
This paper investigates the employment of various encoders in text
transformation, converting characters into bytes. It discusses local encoders
such as ASCII and GB-2312, which encode specific characters into shorter bytes,
and universal encoders like UTF-8 and UTF-16, which can encode the complete
Unicode set with greater space requirements and are gaining widespread
acceptance. Other encoders, including SCSU, BOCU-1, and binary encoders,
however, lack self-synchronizing capabilities. Duncode is introduced as an
innovative encoding method that aims to encode the entire Unicode character set
with high space efficiency, akin to local encoders. It has the potential to
compress multiple characters of a string into a Duncode unit using fewer bytes.
Despite offering less self-synchronizing identification information, Duncode
surpasses UTF-8 in terms of space efficiency. The application is available at
https://github.com/laohur/duncode. Additionally, we have developed a
benchmark for evaluating character encoders across different languages. It
encompasses 179 languages and can be accessed at
https://github.com/laohur/wiki2txt
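The trade-off described in this abstract, between compact local encoders and self-synchronizing universal ones, can be sketched with Python's standard codecs. This does not implement Duncode; the function names `encoded_sizes` and `char_start` and the sample strings are assumptions made for the illustration.

```python
def encoded_sizes(text: str, encodings=("gb2312", "utf-8", "utf-16-le")) -> dict[str, int]:
    """Byte length of text under each encoding (hypothetical helper)."""
    return {enc: len(text.encode(enc)) for enc in encodings}


def char_start(data: bytes, offset: int) -> int:
    """Back up from an arbitrary offset to the first byte of a UTF-8 character.

    UTF-8 continuation bytes always match the bit pattern 10xxxxxx, which is
    what makes the encoding self-synchronizing.
    """
    while offset > 0 and (data[offset] & 0xC0) == 0x80:
        offset -= 1
    return offset


if __name__ == "__main__":
    print(encoded_sizes("中文"))  # local encoder vs. universal encoders
    data = "中文".encode("utf-8")
    print(char_start(data, 4))    # offset 4 falls inside the second character
```

For this two-character Chinese sample, GB-2312 needs four bytes and UTF-8 six, while `char_start` shows that a UTF-8 decoder can recover a character boundary from any byte offset without scanning from the beginning.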
Internet X.509 Public Key Infrastructure Operational Protocols -- LDAPv3
This document describes the features of the Lightweight Directory Access Protocol v3 that are needed in order to support a public key infrastructure based on X.509 certificates and CRLs
Anforderungsanalyse zur Mehrsprachigkeit eines Web-Content-Management-Systems
'Think global act local!' A well-known saying that has not lost its validity on the World Wide Web. As globalization advances, the need grows for an international, multilingual web presence that is localized for each target group. The provider of a global web site faces various problems and tasks. Building a global web site means, among other things, recognizing cultural differences and taking them into account in the e-business strategy. The aim of this working paper is to derive basic requirements for the multilingual capability of a web site and, following from these, for a WCMS. The second chapter presents the implications of globalization for a web site in order to derive requirements and procedures for its design. Building on this, the basic structure of WCMS and the ways in which a WCMS can support the design of a multilingual web site are presented. The third chapter develops the basic requirements for a multilingual WCMS. To this end, the task-specific requirements for a multilingual web site, and those derived from them for a WCMS, are described. Finally, the technology-specific requirements are examined in more detail
The Open Navigation Surface Project
Many hydrographic and oceanographic agencies have moved or are moving towards gridded bathymetric products. However, there is no accepted format to allow these grids to be exchanged while maintaining data and metadata integrity. This paper describes the Open Navigation Surface (ONS) Project, which aims to fill this gap. The ONS Project is an open-source software project designed to provide a freely available, portable source-code library to encapsulate gridded bathymetric surfaces with associated uncertainty values. The data file format is called a Bathymetric Attributed Grid (BAG). The BAG is developed and maintained by the ONS Working Group (ONSWG), and the source code is available via the ONS website
Chinese localisation of Evergreen: an open source integrated library system
Purpose - The purpose of this paper is to investigate various issues related to Chinese language localisation in Evergreen, an open source integrated library system (ILS).
Design/methodology/approach - A Simplified Chinese version of Evergreen was implemented and tested and various issues such as encoding, indexing, searching, and sorting specifically associated with Simplified Chinese language were investigated.
Findings - The paper finds that Unicode eases a lot of ILS development problems. However, having another language version of an ILS does not simply require the translation from one language to another. Indexing, searching, sorting and other locale related issues should be tackled not only language by language, but locale by locale.
Practical implications - Most of the issues that have arisen during this project will be found with other ILS-like systems.
Originality/value - This paper provides insights into issues of, and various solutions to, indexing, searching, and sorting in the Chinese language in an ILS. These issues and the solutions may be applicable to other digital library systems such as institutional repositories