
160 research outputs found

An annotated bibliography on document processing

Author: Vliet J.C. (Hans) van
Warmer J.B.
Publication venue: CWI
Publication date: 01/01/1986

Data DNA: The Next Generation of Statistical Metadata

Author: Cynthia M. Taeuber
Daniel W. Gillman
Laura Smith
Publication venue: Brookings Institution Press
Publication date: 03/03/2007

Describes the components of a complete statistical metadata system and suggests ways to create and structure metadata for better access and understanding of data sets by diverse users

IssueLab

Conjunctive programming: An interactive approach to software system synthesis

Author: Tausworthe Robert C.

This report introduces a technique of software documentation called conjunctive programming and discusses its role in the development and maintenance of software systems. The report also describes the conjoin tool, an adjunct to assist practitioners. Aimed at supporting software reuse while conforming with conventional development practices, conjunctive programming is defined as the extraction, integration, and embellishment of pertinent information obtained directly from an existing database of software artifacts, such as specifications, source code, configuration data, link-edit scripts, utility files, and other relevant information, into a product that achieves desired levels of detail, content, and production quality. Conjunctive programs typically include automatically generated tables of contents, indexes, cross references, bibliographic citations, tables, and figures (including graphics and illustrations). This report presents an example of conjunctive programming by documenting the use and implementation of the conjoin program

NASA Technical Reports Server

Multiple Format Dynamic Document Generation of Project Gutenberg Texts

Author: Perkins Brandon Dean
Publication venue: University of North Carolina at Chapel Hill
Publication date: 01/04/2003

Project Gutenberg consists of over seven thousand eBooks in various and inconsistent file formats. While most books are accessible to all through ASCII format, this is not necessarily the preferred format for all users of Project Gutenberg texts. The proposed solution to this problem consists of multiple parts that together form a comprehensive package that can be implemented in a production environment. This project demonstrates how converting Project Gutenberg eBooks to XML (eXtensible Markup Language), using a standard DTD (Document Type Definition) or XML Schema, creates the opportunity for all available texts to automatically become available in multiple formats by applying stylesheets to the XML. These formats can include, but are not limited to, HTML (HyperText Markup Language), plain text, or PDF (Portable Document Format). This should provide a framework for future Project Gutenberg collection development

Carolina Digital Repository
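
The single-source, multiple-output idea in the abstract above can be illustrated with a minimal sketch. It assumes a tiny made-up book markup (the `book`, `chapter`, `heading`, and `para` elements are hypothetical, not the DTD used in the project), and plain Python functions stand in for the stylesheets, since the standard library has no XSLT processor.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical minimal book markup; not the DTD used in the original project.
BOOK_XML = """<book>
  <title>Example Title</title>
  <chapter>
    <heading>Chapter 1</heading>
    <para>First paragraph.</para>
    <para>Second &amp; last paragraph.</para>
  </chapter>
</book>"""


def to_html(root):
    """Render the parsed book as a simple HTML page."""
    parts = ["<html><body>", f"<h1>{escape(root.findtext('title', ''))}</h1>"]
    for chapter in root.iter("chapter"):
        parts.append(f"<h2>{escape(chapter.findtext('heading', ''))}</h2>")
        for para in chapter.iter("para"):
            parts.append(f"<p>{escape(para.text or '')}</p>")
    parts.append("</body></html>")
    return "\n".join(parts)


def to_text(root):
    """Render the same parsed book as plain text."""
    lines = [root.findtext("title", "").upper(), ""]
    for chapter in root.iter("chapter"):
        lines.append(chapter.findtext("heading", ""))
        for para in chapter.iter("para"):
            lines.append("  " + (para.text or ""))
        lines.append("")
    return "\n".join(lines)


# One XML source, two output formats.
root = ET.fromstring(BOOK_XML)
html_output = to_html(root)
text_output = to_text(root)
```

Adding another output format in this sketch means adding one more rendering function; the XML source stays untouched.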

X-Databases - The Integration of XML into Enterprise Database Management Systems

Author: Davis Leah
Publication venue: University of North Carolina at Chapel Hill
Publication date: 01/10/2000

An examination of how the eXtensible Markup Language (XML) and database management systems (DBMS) fit together, and current approaches to providing database technologies that support XML. Analysis of how XML is being deployed in four classes of XML Database (X-Database) applications provides a basis for understanding the direction of X-Database technology and associated standards. In a simple implementation, an XML Document Type Definition (DTD) is mapped to relational structures, and XML data are stored in a DBMS (Oracle8i). Sample queries are presented to retrieve XML from the database. A middleware tool (XSQL Java Servlet) is used to transform query results into records on a Web page. The results demonstrate that relational databases require data to be rigidly mapped to relational structures. The paper concludes by exploring future challenges to integrating XML and DTDs with X-Databases, which establishes the need for a more "native" integration approach

Carolina Digital Repository
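
As a rough illustration of the mapping step described in the abstract above, the sketch below shreds a small XML document into relational tables and rebuilds XML from a query. Everything here is an assumption for illustration: the `orders`/`item` markup and table layout are invented, and SQLite stands in for Oracle8i and the XSQL servlet used in the paper.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document; the element and attribute names are invented.
ORDERS_XML = """<orders>
  <order id="1" customer="Ada">
    <item sku="A-100" qty="2"/>
    <item sku="B-200" qty="1"/>
  </order>
  <order id="2" customer="Grace">
    <item sku="A-100" qty="5"/>
  </order>
</orders>"""

# Each element type becomes a table; nesting becomes a foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
    CREATE TABLE items (
        order_id INTEGER REFERENCES orders(id),
        sku TEXT NOT NULL,
        qty INTEGER NOT NULL
    );
""")

for order in ET.fromstring(ORDERS_XML).iter("order"):
    order_id = int(order.get("id"))
    conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, order.get("customer")))
    for item in order.iter("item"):
        conn.execute(
            "INSERT INTO items VALUES (?, ?, ?)",
            (order_id, item.get("sku"), int(item.get("qty"))),
        )


def order_to_xml(conn, order_id):
    """Rebuild one <order> element from the relational rows."""
    customer = conn.execute(
        "SELECT customer FROM orders WHERE id = ?", (order_id,)
    ).fetchone()[0]
    elem = ET.Element("order", id=str(order_id), customer=customer)
    rows = conn.execute(
        "SELECT sku, qty FROM items WHERE order_id = ? ORDER BY sku", (order_id,)
    )
    for sku, qty in rows:
        ET.SubElement(elem, "item", sku=sku, qty=str(qty))
    return ET.tostring(elem, encoding="unicode")


xml_out = order_to_xml(conn, 1)
```

The fixed table layout reflects the rigidity the abstract mentions: any element not anticipated by the schema has nowhere to go without altering the tables.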

Digital document imaging systems: An overview and guide


This is an aid to NASA managers in planning the selection of a Digital Document Imaging System (DDIS) as a possible solution for document information processing and storage. Intended to serve as a manager's guide, this document contains basic information on digital imaging systems, technology, equipment standards, issues of interoperability and interconnectivity, and issues related to selecting appropriate imaging equipment based upon well defined needs

NASA Technical Reports Server

Monikanavainen palveluntuotanto (Multichannel service production)

Author: Ristimäki Juha
Publication date: 01/01/2002

Aaltodoc Publication Archive

NetPDL: An Extensible XML-Based Language for Packet Header Description

Author: BALDI M
RISSO F.
Publication venue: Elsevier BV

Although several applications need to know the format of network packets to perform their tasks, until now each application has used its own packet description database. This paper addresses this problem by proposing NetPDL, an XML-based language for describing packet headers, which has the potential to enable a common, application-independent protocol description database that can be shared among several applications. Further, common functionality related to the protocol database can be implemented in a library, which can serve as a basic building block for implementing networking applications

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)
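
The idea of a shared, application-independent header description from the abstract above can be sketched as follows. The XML vocabulary here (`protocol`, `field`, `name`, `size`) is a hypothetical stand-in and is not actual NetPDL syntax; the sketch only shows a generic decoder driven by an external description instead of hard-coded offsets.

```python
import xml.etree.ElementTree as ET

# Hypothetical description format; these element and attribute names
# are invented for illustration and are not NetPDL syntax.
UDP_DESCRIPTION = """<protocol name="udp">
  <field name="src_port" size="2"/>
  <field name="dst_port" size="2"/>
  <field name="length" size="2"/>
  <field name="checksum" size="2"/>
</protocol>"""


def load_fields(description_xml):
    """Read (field name, size in bytes) pairs from the XML description."""
    root = ET.fromstring(description_xml)
    return [(f.get("name"), int(f.get("size"))) for f in root.iter("field")]


def decode_header(fields, data):
    """Decode consecutive big-endian unsigned fields from raw bytes."""
    decoded, offset = {}, 0
    for name, size in fields:
        chunk = data[offset:offset + size]
        if len(chunk) < size:
            raise ValueError(f"packet too short for field {name!r}")
        decoded[name] = int.from_bytes(chunk, "big")
        offset += size
    return decoded


fields = load_fields(UDP_DESCRIPTION)
packet = bytes.fromhex("04d2003500080000")  # a bare 8-byte UDP header
header = decode_header(fields, packet)
```

Because `decode_header` knows nothing about UDP itself, supporting another fixed-layout header in this sketch only requires a new description document.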

Topics in Language Resources for Translation and Localisation (Chapter : Standardising the Management and the Representation of Multilingual Data : the Multi Lingual Information Framework)

Author: Bellalem Nadia
Cruz-Lara Samuel
Ducret Julien
Krammer Isabelle
Publication venue: John Benjamins Publishing Company
Publication date: 01/01/2008

Due to the critical role that normalization plays during the translation and localization processes, we propose here to analyze some standards, as well as the related software tools that are used by professional translators and by several automatic translating services. We will first point out the importance of normalization within the translation and localization activities. Next, we will introduce a methodology of standardization, whose objective is to harmonize the management and the representation of multilingual data. Without a doubt, the control of the interoperability between the industrial standards currently used for localization [XLIFF], translation memory [TMX], or with some recent initiatives such as the internationalization tag set [ITS], constitutes a major objective for a coherent and global management of multilingual data. The Multi Lingual Information Framework MLIF [ISO AWI 24616] is based on a methodology of standardization resulting from the ISO (sub-committees TC37/SC3 "Computer Applications for Terminology" and SC4 "Language Resources Management"). MLIF aims at proposing a high-level abstract specification platform for a computer-oriented representation of multilingual data within a large variety of applications such as translation memories, localization, computer-aided translation, multimedia, or electronic document management

INRIA a CCSD electronic archive server

WAQS : a web-based approximate query system

Author: Chang George Jyh-Shian
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/2001

The Web is often viewed as a gigantic database that holds vast stores of information and provides ubiquitous accessibility to end-users. Since its inception, the Internet has experienced explosive growth both in the number of users and the amount of content available on it. However, searching for information on the Web has become increasingly difficult. Although query languages have long been part of database management systems, the standard query language, Structured Query Language (SQL), is not suitable for Web content retrieval. In this dissertation, a new technique for document retrieval on the Web is presented. This technique is designed to allow detailed retrieval and hence reduce the number of matches returned by typical search engines. The main objective of this technique is to allow the query to be based not just on keywords but also on the location of the keywords within the logical structure of a document. In addition, the technique provides approximate search capabilities based on the notion of Distance and Variable Length Don't Cares. The proposed techniques have been implemented in a system called the Web-Based Approximate Query System, which contains an SQL-like query language called the Web-Based Approximate Query Language. The Web-Based Approximate Query Language has also been integrated with EnviroDaemon, an environmental domain-specific search engine. It provides EnviroDaemon with more detailed searching capabilities than keyword-based search alone. Implementation details, technical results and future work are presented in this dissertation

Digital Commons @ New Jersey Institute of Technology (NJIT)