
    NLP and the Humanities: The Revival of an Old Liaison

    This paper presents an overview of some emerging trends in the application of NLP in the domain of the so-called Digital Humanities and discusses the role and nature of metadata, the annotation layer that is so characteristic of documents that play a role in the scholarly practices of the humanities. It is explained how metadata are the key to the added value of techniques such as text and link mining, and an outline is given of what measures could be taken to increase the chances for a bright future for the old ties between NLP and the humanities. There is no data like metadata.

    The Hypertext Internet Connection: E-mail, Online Search, Gopher

    In this paper we show how to handle and organize the large amount of information accessible through the Internet or other public communication networks in a hypertext environment. The C(K)onstance-Hypertext-System (KHS) uses typed units to indicate the differences in the content and structure of information, comprising text, forms, images and pointers to external information. We show how to embed Internet services, which usually require rather different interaction styles, such as point-to-point communication (e-mail), query formulation (online databases) or browsing (Gopher), into the uniform interaction model of the KHS. The integration of Internet services in an open hypertext environment produces value-adding effects which are also discussed. (DIPF/Orig.)

    Design issues in the production of hyper‐books and visual‐books

    This paper describes an ongoing research project in the area of electronic books. After a brief overview of the state of the art in this field, two new forms of electronic book are presented: hyper‐books and visual‐books. A flexible environment allows them to be produced in a semi‐automatic way starting from different sources: electronic texts (as input for hyper‐books) and paper books (as input for visual‐books). The translation process is driven by the philosophy of preserving the book metaphor in order to guarantee that electronic information is presented in a familiar way. Another important feature of our research is that hyper‐books and visual‐books are conceived not as isolated objects but as entities within an electronic library, which inherits most of the features of a paper‐based library but introduces a number of new properties resulting from its non‐physical nature.

    HTML Macros -- Easing the Construction and Maintenance of Web Texts

    Authoring and maintaining large collections of Web texts is a cumbersome, error-prone and time-consuming business. Ongoing development of courseware for the High Performance Computing Consortium (HPCC) TLTP has only helped to emphasise these problems. Courseware requires the application of a coherent document layout (templates) for each page, and also the use of standard icons with a consistent functionality, in order to create a constant look and feel throughout the material. This provides the user with an environment where he or she can access new pages and instantly recognise the format used, making the extraction of the information on the page much quicker and less immediately confusing. This paper describes a system that was developed at UKC to provide a solution to the above problems via the introduction of HTML macros. These macros can be used to provide a standard document layout with a consistent look and feel, as well as tools to ease user navigation. The software is written in Perl, and achieves macro expansion and replacement using the Common Gateway Interface (CGI) and filtering the HTML source. Using macros in your HTML results in your document source code being shorter, more robust, and more powerful. Webs of documents can be built extremely fast and maintenance is made much simpler. Keywords: Authoring, Automation Tools, Perl filters for HTML, Teaching and learning on the Web.
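    The core idea of the paper, expanding macros by filtering the HTML source, can be sketched in a few lines. The original system used Perl and CGI; the sketch below uses Python, and the macro names and comment-style macro syntax are hypothetical illustrations, not taken from the paper.

    ```python
    import re

    # Hypothetical macro table: each name expands to a standard HTML fragment,
    # giving every page the same look and feel from a single definition.
    MACROS = {
        "HEADER": '<div class="banner"><img src="logo.gif" alt="Course banner"></div>',
        "NAVBAR": '<p><a href="prev.html">Prev</a> | <a href="next.html">Next</a></p>',
    }

    def expand_macros(html: str) -> str:
        """Replace each <!--#MACRO NAME--> occurrence with its expansion."""
        def repl(match: re.Match) -> str:
            name = match.group(1)
            return MACROS.get(name, match.group(0))  # leave unknown macros untouched
        return re.sub(r"<!--#MACRO (\w+)-->", repl, html)

    page = "<html><body><!--#MACRO HEADER--><h1>Lesson 1</h1><!--#MACRO NAVBAR--></body></html>"
    print(expand_macros(page))
    ```

    Changing one entry in the macro table restyles every page that is filtered through the expander, which is the maintenance win the abstract describes.
    
    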

    Balancing SoNaR: IPR versus Processing Issues in a 500-Million-Word Written Dutch Reference Corpus

    In the Low Countries, a major reference corpus for written Dutch is being built. We discuss the interplay between data acquisition and data processing during the creation of the SoNaR Corpus. Based on developments in traditional corpus compiling and new web harvesting approaches, SoNaR is designed to contain 500 million words, balanced over 36 text types including both traditional and new media texts. Besides its balanced design, every text sample included in SoNaR will have its IPR issues settled to the largest extent possible. This data collection task presents many challenges because every decision taken on the level of text acquisition has ramifications for the level of processing and the general usability of the corpus. As far as the traditional text types are concerned, each text brings its own processing requirements and issues. For new media texts - SMS, chat - the problem is even more complex; issues such as anonymity, recognizability and citation right all present problems that have to be tackled. The solutions actually lead to the creation of two corpora: a gigaword SoNaR, IPR-cleared for research purposes, and the smaller - of commissioned size - more privacy-compliant SoNaR, IPR-cleared for commercial purposes as well.
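    Balancing a corpus of a fixed target size over text types amounts to turning design proportions into per-type word quotas. The sketch below illustrates the arithmetic; the text types and proportions are invented for illustration and are not SoNaR's actual design figures.

    ```python
    # Toy corpus-balancing sketch: distribute a 500-million-word target
    # over text types according to illustrative design proportions.
    TARGET_WORDS = 500_000_000

    proportions = {
        "newspapers": 0.30,
        "books": 0.25,
        "web_texts": 0.25,
        "sms_and_chat": 0.10,
        "legal_texts": 0.10,
    }

    quotas = {t: round(TARGET_WORDS * p) for t, p in proportions.items()}
    assert sum(quotas.values()) == TARGET_WORDS  # quotas exhaust the target

    for text_type, quota in quotas.items():
        print(f"{text_type}: {quota:,} words")
    ```

    In practice, as the abstract notes, each acquisition decision feeds back into processing, so real quotas are constrained by what can actually be IPR-cleared per type, not just by design proportions.
    
    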

    Towards a Framework for Developing Mobile Agents for Managing Distributed Information Resources

    Distributed information management tools allow users to author, disseminate, discover and manage information within large-scale networked environments, such as the Internet. Agent technology provides the flexibility and scalability necessary to develop such distributed information management applications. We present a layered organisation that is shared by the specific applications that we build. Within this organisation we describe an architecture where mobile agents can move across distributed environments, integrate with local resources and other mobile agents, and communicate their results back to the user.

    Industrial-Strength Documentation for ACL2

    The ACL2 theorem prover is a complex system. Its libraries are vast. Industrial verification efforts may extend this base with hundreds of thousands of lines of additional modeling tools, specifications, and proof scripts. High quality documentation is vital for teams that are working together on projects of this scale. We have developed XDOC, a flexible, scalable documentation tool for ACL2 that can incorporate the documentation for ACL2 itself, the Community Books, and an organization's internal formal verification projects, and which has many features that help to keep the resulting manuals up to date. Using this tool, we have produced a comprehensive, publicly available ACL2+Books Manual that brings better documentation to all ACL2 users. We have also developed an extended manual for use within Centaur Technology that extends the public manual to cover Centaur's internal books. We expect that other organizations using ACL2 will wish to develop similarly extended manuals. Comment: In Proceedings ACL2 2014, arXiv:1406.123

    Building application dependent hypertexts

    The Konstanz Hypertext System offers a domain-specific development environment for the construction of large hypertexts. Through their flexibility, the structuring means employed in the Konstanz Hypertext System offer an instrument which permits one to respond directly to application-specific demands in the construction of hypertexts. Especially the integration of information obtained from external resources is emphasized. After a discussion of the information sources which can be connected to the KHS, a short introduction to the hypertext model of the KHS is provided. The role of structuring means in the integration of external information is pointed out. The scope of possible applications and the flexibility of the system are demonstrated by three comprehensive examples: resource discovery in online databases, management of electronic mail, and the compilation of an issue of an electronic journal. (DIPF/Orig.)

    Hypermedia Information Systems in Industry

    The requirements for industrial strength hypermedia are well known. If hypermedia applications are to be used successfully in the industrial environment, then considerable effort is required to integrate them with the organisation’s current business practices. This implies that any proposed model must be simple to maintain and implement, as well as bringing real benefits to the organisation as a whole. This article discusses the development of such a system, and its implementation and evaluation to support manufacturing operations at Pirelli Cables, Eastleigh.

    Digitometric Services for Open Archives Environments

    We describe “digitometric” services and tools that add value to open-access eprint archives using the Open Archives Initiative (OAI) Protocol for Metadata Harvesting. Celestial is an OAI cache and gateway tool. Citebase Search enhances OAI-harvested metadata with linked references harvested from the full-text to provide a web service for citation navigation and research impact analysis. Digitometrics builds on data harvested using OAI to provide advanced visualisation and hypertext navigation for the research community. Together these services provide a modular, distributed architecture for building a “semantic web” for the research literature.
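    Services like those above consume OAI-PMH responses, which carry metadata records in Dublin Core (`oai_dc`) XML. A minimal sketch of extracting fields from such a record is shown below; the sample record is hand-written for illustration, not real harvested data, though the namespace URIs are the standard OAI-PMH/Dublin Core ones.

    ```python
    import xml.etree.ElementTree as ET

    # Hand-written example of an oai_dc metadata record, shaped like the
    # payload of an OAI-PMH GetRecord/ListRecords response.
    SAMPLE = """<record xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <metadata>
        <oai_dc:dc>
          <dc:title>Digitometric Services for Open Archives Environments</dc:title>
          <dc:creator>Example Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>"""

    DC = "{http://purl.org/dc/elements/1.1/}"

    def extract_titles(xml_text: str) -> list[str]:
        """Return the text of every dc:title element in the record."""
        root = ET.fromstring(xml_text)
        return [el.text for el in root.iter(DC + "title")]

    print(extract_titles(SAMPLE))
    ```

    A harvester such as Celestial would fetch batches of these records over HTTP (using the protocol's `ListRecords` verb and `resumptionToken` paging) and feed the extracted fields into downstream services like citation linking.
    
    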