The LaTeX project: A case study of open-source software
This is a case study of TeX, typesetting software developed by Donald E. Knuth in the late 1970s. Released under an open-source license, it has become a reference in scientific publishing: TeX is now used to typeset and publish much of the world's scientific literature in physics and mathematics. This case study is part of a wider effort by academics to understand the open-source phenomenon. That development model resembles the organization of knowledge production in academia: there is no set organization with a hierarchy, but free collaboration that is coordinated spontaneously and winds up generating complex products that are the property of all who can understand their functioning. The case study was conducted by gathering qualitative data via interviews with TeX developers and quantitative data on the TeX community -- the program's code, the software included in the TeX distribution, the newsgroups dedicated to the software, and many other indicators of the evolution and activity of that open-source project. The case study is aimed at economists who want to develop models to understand and analyze the open-source phenomenon. It is also geared towards policy-makers who would like to encourage or regulate open source, and towards open-source developers who wonder which strategies make an open-source project successful.
Keywords: TeX, LaTeX, case study, open source, software, innovation, organisational structure, economic history, knowledge production, knowledge diffusion.
Algebraic specification of documents
According to recent research, nearly 95 percent of corporate information is
stored in documents.
Further studies indicate that companies spend between 6 and 10 percent of their
gross revenues printing and distributing documents in several ways:
Web and CD-ROM publishing, database storage and retrieval, and printing.
In this context, documents exist in many different formats, from plain ASCII files
to internal database or word-processor formats.
It is clear that document reusability and low-cost maintenance will be two important issues in the near future.
Most available document processors are purpose-oriented, reducing the
flexibility and reusability of documents.
Considerable time is wasted adapting the same text to different purposes.
For example, you may want to have the same document as an article,
as a set of slides, or as a poster; or you may have a dictionary document
producing both a book and a word list for a spell-checker.
This conversion could be done automatically from the first version of the
document if it complies with certain standard requirements.
The key idea is to keep a complete separation between syntax and
semantics. In this way we produce an abstract description, separating conceptual
issues from those concerned with use.
This note proposes a few guidelines to build a system to solve the
above problem.
Such a system should be an algebraic based environment and provide
facilities for:
- Document type definitions;
- Definition of functions over document types;
- Document definitions as algebraic terms.
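As a rough illustration of these three facilities (the types and functions below are hypothetical sketches, not part of CAMILA or the proposed system), a document type can be modelled as an algebraic data type, and a format conversion as a function over that type:

```python
from dataclasses import dataclass

# Document type definitions: a document is an algebraic term
# built from constructors, independent of any presentation.
@dataclass
class Section:
    title: str
    body: str

@dataclass
class Article:
    title: str
    sections: list[Section]

@dataclass
class Slide:
    heading: str
    bullets: list[str]

# A function over document types: the same abstract document
# rendered as a slide deck instead of an article.
def to_slides(doc: Article) -> list[Slide]:
    return [Slide(heading=s.title, bullets=s.body.split(". "))
            for s in doc.sections]

# A document definition as an algebraic term.
paper = Article("Algebraic documents",
                [Section("Motivation", "Documents are reused. Formats differ.")])
slides = to_slides(paper)
```

Merging, translating, or extracting portions of documents would, in the same spirit, be further functions over these types.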
This approach (rooted in the tradition of constructive algebraic
specification) will allow for a homogeneous environment in which to
deal with operations such as merging documents, converting formats,
translating documents, extracting different kinds of
information (to set up information repositories, databases, or semantic
networks) or portions of documents (as happens, for instance, in
literate programming), and some other, less traditional actions,
such as mail reply or memo production.
We intend to use CAMILA (a specification language and prototyping
environment developed at Universidade do Minho by the Computer Science
group) to develop the above-mentioned system.
Document semantics: Two approaches
SGML introduced the DTD (Document Type Definition) idea to formally describe document syntax and structure.
One of its main characteristics is that it is purely declarative
and fully independent of the document's future processing (typesetting,
formatting, translation/transformation).
In this context, SGML has become the international standard to follow.
Sooner or later, a document has to be processed. In order to do that, we
need to associate semantics with the document's structure.
In a compiler context, we normally separate semantics into two kinds: static and
dynamic.
Drawing a parallel with document processing, we can think of the
document's decorated tree (as recognized by an SGML analyzer) as the
static semantics, and the document tree's transformation and/or reaction
as the dynamic semantics.
Pursuing this idea, we present and discuss a study of the
relationship between SGML, the DAST (Decorated Abstract
Syntax Tree), and algebraic specification tools, in order to better
understand how to formally process documents in general and how to
specify and build generic document processing tools.
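The static/dynamic split can be sketched as follows (all names are illustrative, not taken from the paper's tools): the decorated tree produced by the analyzer is a plain data structure, the static semantics; a transformation over that tree supplies the dynamic semantics.

```python
from dataclasses import dataclass, field

# Static semantics: the decorated tree an SGML analyzer would
# produce -- each node carries its element type, attributes,
# children, and any text content.
@dataclass
class Node:
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    text: str = ""

# Dynamic semantics: a transformation over the decorated tree,
# here a toy rendering of the tree into plain text.
def render(node: Node) -> str:
    if node.tag == "para":
        return node.text + "\n"
    return "".join(render(c) for c in node.children)

doc = Node("doc", children=[Node("para", text="Hello"),
                            Node("para", text="World")])
output = render(doc)
```

A different processing tool (a format converter, an information extractor) would be a different function over the same static tree.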
Literate Statistical Practice
Literate Statistical Practice (LSP; Rossini, 2001) describes an approach for creating self-documenting statistical results. It applies literate programming (Knuth, 1992) and related techniques in a natural fashion to the practice of statistics. In particular, documentation, specifications, and descriptions of results are written concurrently with the writing and evaluation of statistical programs. We discuss how and where LSP can be integrated into practice and illustrate this with an example derived from an actual statistical consulting project. The approach is simplified through the use of a comprehensive, open-source toolset incorporating Noweb (Ramsey, 1994), Emacs Speaks Statistics (ESS; Rossini et al., 2002), Sweave (Leisch, 2002), and R (Ihaka and Gentleman, 1996). We conclude with an assessment of LSP for the construction of reproducible, auditable, and comprehensible statistical analyses.
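As a toy illustration of the "tangle" step that literate-programming tools such as Noweb perform (this regex-based sketch is our own, not Noweb's actual implementation), named code chunks can be extracted from a document that mixes prose and code:

```python
import re

# A literate document in a noweb-like notation: prose surrounds
# a named code chunk delimited by <<name>>= ... @
literate_doc = """\
We compute a mean below.
<<mean>>=
def mean(xs):
    return sum(xs) / len(xs)
@
The chunk above is documentation-first code.
"""

# Toy "tangle": pull every named chunk out of the document,
# leaving the surrounding documentation behind.
def tangle(doc: str) -> dict[str, str]:
    chunks = {}
    for name, body in re.findall(r"<<(.+?)>>=\n(.*?)\n@", doc, re.S):
        chunks[name] = body
    return chunks

code = tangle(literate_doc)["mean"]
```

The complementary "weave" step would typeset the prose and code together; Sweave extends the same idea by executing the chunks and splicing their results into the document.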
Conjunctive programming: An interactive approach to software system synthesis
This report introduces a technique of software documentation called conjunctive programming and discusses its role in the development and maintenance of software systems. The report also describes the conjoin tool, an adjunct to assist practitioners. Aimed at supporting software reuse while conforming to conventional development practices, conjunctive programming is defined as the extraction, integration, and embellishment of pertinent information obtained directly from an existing database of software artifacts -- such as specifications, source code, configuration data, link-edit scripts, utility files, and other relevant information -- into a product that achieves desired levels of detail, content, and production quality. Conjunctive programs typically include automatically generated tables of contents, indexes, cross references, bibliographic citations, tables, and figures (including graphics and illustrations). This report presents an example of conjunctive programming by documenting the use and implementation of the conjoin program.
An investigation of an implementation of SGML-based publishing of a graduate thesis
The Standard Generalized Markup Language (SGML) has been the International Organization for Standardization (ISO) published standard for text interchange for nearly a decade. Since 1986, SGML-based publishing has been successfully implemented in many fields, notably those industries with massive and mission-critical publishing operations such as the military, legal, medical, and heavy industries. SGML-based publishing differs from the WYSIWYG paradigm of desktop publishing in that an SGML document contains descriptive, structural markup rather than specific formatting markup. Specific markup describes the appearance of a document and is usually a proprietary code, which makes the document difficult to re-use or interchange between different systems. The structurally generic markup codes in an SGML document allow the fullest exploitation of the information. An SGML document exhibits more re-usability than a document created and stored in a proprietary formatting code. In many cases, workflow and production are greatly improved by the implementation of SGML-based publishing. Historical and anecdotal case studies of many applications clearly delineate the benefits of an SGML-based publishing system. And certainly, the boom in Web publishing has spurred interest in enabling publishing systems with multi-output functionality. However, implementation is associated with high costs: the acquisition of new tools and new skills is a costly investment. A careful cost-benefit analysis must determine whether the current publishing needs would be satisfied by moving to SGML; increased productivity is the measure by which a move to SGML is justified. The purpose of this thesis project is to investigate the relative benefits and requirements of a simple SGML-based publishing implementation. The graduate thesis used by most of the School of Printing Management and Sciences at the Rochester Institute of Technology was used as an example.
The author has expanded the requirements for the publication process of a graduate thesis with factors which do not exist in reality. The required output has been expanded from print alone to include publishing on the World Wide Web (WWW) in the Hypertext Markup Language (HTML), and in a proprietary electronic browser format, such as Folio Views, for inclusion in a searchable collection of graduate theses on CD-ROM. A proposed set of tools and methods is discussed in order to clarify the requirements of such an SGML implementation.
TEI and LMF crosswalks
The present paper explores various arguments in favour of making the Text
Encoding Initiative (TEI) guidelines an appropriate serialisation for ISO
standard 24613:2008 (LMF, Lexical Markup Framework). It also identifies the
issues that would have to be resolved in order to reach an appropriate
implementation of these ideas, in particular in terms of informational
coverage. We show how the customisation facilities offered by the TEI
guidelines can provide an adequate background, not only to cover missing
components within the current Dictionary chapter of the TEI guidelines, but
also to allow specific lexical projects to deal with local constraints. We
expect this proposal to be a basis for a future ISO project in the context of
the ongoing revision of LMF.
TBX goes TEI -- Implementing a TBX basic extension for the Text Encoding Initiative guidelines
This paper presents an attempt to customise the TEI (Text Encoding
Initiative) guidelines in order to offer the possibility of incorporating TBX
(TermBase eXchange) based terminological entries within any kind of TEI
document. After presenting the general historical, conceptual and technical
contexts, we describe the various design choices we had to make while creating
this customisation, which in turn led us to make various changes in the
actual TBX serialisation. Keeping in mind the objective of providing the TEI
guidelines with, once again, an onomasiological model, we try to identify the best
compromise in maintaining both isomorphism with the existing TBX Basic
standard and the characteristics of the TEI framework.