    SWI-Prolog and the Web

    Whereas Prolog is commonly seen as a component in a Web application that is either embedded or communicates using a proprietary protocol, we propose an architecture in which Prolog communicates with the other components of a Web application using the standard HTTP protocol. By avoiding embedding in external Web servers, development and deployment become much easier. To support this architecture, in addition to the transfer protocol, we must also support parsing, representing and generating the key Web document types such as HTML, XML and RDF. This paper motivates the design decisions in the libraries and extensions to Prolog for handling Web documents and protocols. The design has been guided by the requirement to handle large documents efficiently. The described libraries support a wide range of Web applications, from HTML and XML documents to Semantic Web RDF processing. To appear in Theory and Practice of Logic Programming (TPLP). Comment: 31 pages, 24 figures and 2 tables.
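
    As a concrete illustration of the architecture the abstract describes, the following is a minimal SWI-Prolog sketch of a handler served directly from Prolog over plain HTTP using its http libraries; the handler name, path and port are placeholders, not code from the paper.

        % Minimal sketch: Prolog itself is the HTTP server, no embedding needed.
        :- use_module(library(http/thread_httpd)).   % multi-threaded HTTP server
        :- use_module(library(http/http_dispatch)).  % URL -> handler dispatching
        :- use_module(library(http/html_write)).     % HTML generation

        % Register a handler for /hello (path and predicate name are illustrative).
        :- http_handler(root(hello), hello_page, []).

        % Start the server, e.g. ?- server(8080).
        server(Port) :-
            http_server(http_dispatch, [port(Port)]).

        % Reply with a generated HTML document rather than a string template.
        hello_page(_Request) :-
            reply_html_page(title('Hello from Prolog'),
                            [ h1('Hello from Prolog'),
                              p('Served over standard HTTP, no embedding required.')
                            ]).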

    Multiple hierarchies: new aspects of an old solution

    In this paper, we present the Multiple Annotation approach, which solves two problems: the problem of annotating overlapping structures, and the problem that occurs when documents should be annotated according to different, possibly heterogeneous tag sets. This approach has many advantages: it is based on XML, the modeling of alternative annotations is possible, each level can be viewed separately, and new levels can be added at any time. The files can be regarded as an interrelated unit, with the text serving as the implicit link. Two representations of the information contained in the multiple files (one in Prolog and one in XML) are described. These representations serve as a basis for several applications.
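
    The paper's actual Prolog representation is not reproduced here; the sketch below merely illustrates the idea of separate annotation levels kept as separate fact bases and implicitly linked through character offsets into the shared base text. All predicate names, identifiers and offsets are invented.

        % Hypothetical encoding: each annotation level is its own set of facts;
        % the base text (character offsets) is the implicit link between levels.

        % element(Id, Level, Name, Start, End): one annotated span per fact.
        element(s1, syntax, sentence,  0, 42).
        element(n1, syntax, np,       20, 35).
        element(l1, layout, line,      0, 27).   % line break falls inside the np
        element(l2, layout, line,     28, 42).

        % Two spans from different levels overlap properly
        % (they intersect, but neither contains the other).
        overlaps(A, B) :-
            element(A, LevA, _, SA, EA),
            element(B, LevB, _, SB, EB),
            LevA \== LevB,
            SA < SB, SB < EA, EA < EB.

        % ?- overlaps(X, Y).
        % X = l1, Y = n1.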

    Development of Use Cases, Part I

    For determining the requirements and constructs appropriate for a Web query language, or in fact any language, use cases are essential. The W3C has published two sets of use cases for XML and RDF query languages. In this article, solutions for these use cases are presented using Xcerpt, a novel Web and Semantic Web query language that combines access to standard Web data such as XML documents with access to Semantic Web metadata such as RDF resource descriptions, together with reasoning abilities and rules familiar from logic programming. To the best knowledge of the authors, this is the first in-depth study of how to solve use cases for accessing XML and RDF in a single language. Integrated access to data and metadata has been recognized by industry and academia as one of the key challenges in data processing for the next decade. This article is a contribution towards addressing this challenge by demonstrating, along practical and recognized use cases, the usefulness of reasoning abilities, rules, and semistructured query languages for accessing both data (XML) and metadata (RDF).
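
    Xcerpt's own term-pattern syntax is not reproduced here; purely as an illustration of the kind of rule-based, integrated data/metadata access the abstract alludes to, the Prolog sketch below joins facts extracted from an XML document with RDF-style triples about the same resources. All predicates, identifiers and values are invented placeholders.

        % Invented data: xml_item/3 stands for facts derived from an XML document,
        % rdf/3 for triples from an RDF description of the same resources.
        xml_item(doc1, 'Title One', 2008).
        xml_item(doc2, 'Title Two', 2003).
        rdf(doc1, creator, 'A. Author').
        rdf(doc2, creator, 'B. Author').

        % One rule answers a query that needs both data (title, year from XML)
        % and metadata (creator from RDF), i.e. the integrated access the
        % use cases call for.
        recent_work_by(Author, Title) :-
            xml_item(Doc, Title, Year),
            Year >= 2005,
            rdf(Doc, creator, Author).

        % ?- recent_work_by(A, T).
        % A = 'A. Author', T = 'Title One'.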

    Recent development in XML-IR

    The Web is characterized by a huge amount of heterogeneous data sources, which have different media support and format representation. Because XML can represent files of different formats, it can play an important role in IR, since it is becoming a standard form for data representation and exchange over the Web. Under this assumption, the problem of querying heterogeneous sources can be reduced to the problem of querying XML data sources. This paper shows the influence of XML on IR techniques and methodologies during the last five years through a survey of over 400 papers published in different conferences and journals.

    Non-hierarchical Structures: How to Model and Index Overlaps?

    Overlap is a common phenomenon seen when structural components of a digital object are neither disjoint nor nested inside each other. Overlapping components resist reduction to a structural hierarchy, and tree-based indexing and query processing techniques cannot be used for them. Our solution to this data modeling problem is TGSA (Tree-like Graph for Structural Annotations), a novel extension of the XML data model for non-hierarchical structures. We introduce an algorithm for constructing TGSA from annotated documents; the algorithm can efficiently process non-hierarchical structures and is associated with formal proofs ensuring that the transformation of the document to the data model is valid. To enable high-performance query analysis in large data repositories, we further introduce an extension of XML pre-post indexing for non-hierarchical structures, which can process both reachability and overlapping relationships. Comment: The paper has been accepted at the Balisage 2014 conference.
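
    TGSA's actual index is defined in the paper; the sketch below only recalls the classic XML pre/post numbering that the abstract says is being extended, plus a hypothetical overlap relation stored alongside it, to show how reachability and overlap queries can coexist over one structure. Node names, numbers and the overlap_edge/2 relation are illustrative only.

        % node(Id, Pre, Post): pre-order and post-order ranks for the tree a(b(c,d), e).
        node(a, 1, 5).
        node(b, 2, 3).
        node(c, 3, 1).
        node(d, 4, 2).
        node(e, 5, 4).

        % Classic pre/post reachability: A is an ancestor of D iff A starts earlier
        % in pre-order and finishes later in post-order.
        ancestor(A, D) :-
            node(A, PreA, PostA),
            node(D, PreD, PostD),
            PreA < PreD,
            PostA > PostD.

        % Hypothetical extra relation for spans that resist the hierarchy;
        % a TGSA-like index has to answer both kinds of query.
        overlap_edge(d, e).

        related(X, Y) :- ancestor(X, Y).
        related(X, Y) :- overlap_edge(X, Y).
        related(X, Y) :- overlap_edge(Y, X).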

    Description of the LTG system used for MUC-7

    The basic building blocks in our MUC system are reusable text handling tools which we have been developing and using for a number of years at the Language Technology Group. They are modular tools with stream input/output; each tool does a very specific job, but can be combined with other tools in a Unix pipeline. Different combinations of the same tools can thus be used in a pipeline for completing different tasks. Our architecture imposes an additional constraint on the input/output streams: they should have a common syntactic format. For this common format we chose the eXtensible Markup Language (XML). XML is an official, simplified version of the Standard Generalised Markup Language (SGML), simplified to make processing easier [3]. We were involved in the development of the XML standard, building on our expertise in the design of our own Normalised SGML (NSL) and NSL tool LT NSL [10], and our XML tool LT XML [11]. A detailed comparison of this SGML-oriented architecture with more traditional database-oriented architectures can be found in [9]. A tool in our architecture is thus a piece of software which uses an API for all its access to XML and SGML data and performs a particular task: exploiting markup which has previously been added by other tools, removing markup, or adding new markup to the stream(s) without destroying the previously added markup.
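
    The LT tools themselves are not shown in the abstract; as a rough sketch of the stream-in/stream-out contract it describes, the following program (written with SWI-Prolog's sgml libraries, not the LTG API) reads an XML document from standard input, adds one attribute to the root element, and writes the result to standard output so that such stages can be chained in a Unix pipeline.

        :- use_module(library(sgml)).        % XML parsing
        :- use_module(library(sgml_write)).  % XML serialisation

        % One hypothetical pipeline stage. Assumes the input is a single root
        % element with no leading comments or processing instructions.
        main :-
            load_xml(stream(user_input), [element(Name, Attrs, Content)], []),
            xml_write(user_output,
                      element(Name, [stage=done|Attrs], Content),
                      []),
            halt.

        % Usage, roughly:  cat in.xml | swipl -g main stage.pl > out.xml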

    ATLAS: A flexible and extensible architecture for linguistic annotation

    We describe a formal model for annotating linguistic artifacts, from which we derive an application programming interface (API) to a suite of tools for manipulating these annotations. The abstract logical model provides for a range of storage formats and promotes the reuse of tools that interact through this API. We focus first on "Annotation Graphs", a graph model for annotations on linear signals (such as text and speech) indexed by intervals, for which efficient database storage and querying techniques are applicable. We note how a wide range of existing annotated corpora can be mapped to this annotation graph model. This model is then generalized to encompass a wider variety of linguistic "signals", including both naturally occurring phenomena (as recorded in images, video, multi-modal interactions, etc.) and the derived resources that are increasingly important to the engineering of natural language processing systems (such as word lists, dictionaries, aligned bilingual corpora, etc.). We conclude with a review of the current efforts towards implementing key pieces of this architecture. Comment: 8 pages, 9 figures.
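
    The ATLAS API itself is not given in the abstract; the sketch below is only a toy rendering of the annotation-graph idea as Prolog facts: nodes anchored to offsets on a signal, labelled arcs between them, and a query over the resulting spans. All identifiers, offsets and labels are invented.

        % anchor(Node, Signal, Offset): a node anchored at a time offset (seconds)
        % in a recorded signal.
        anchor(n0, utt1, 0.00).
        anchor(n1, utt1, 0.35).
        anchor(n2, utt1, 0.71).

        % arc(Id, From, To, Type, Label): a labelled annotation spanning two anchors.
        arc(w1, n0, n1, word,   the).
        arc(w2, n1, n2, word,   cat).
        arc(p1, n0, n2, phrase, np).

        % All annotations of a given type whose span covers time point T.
        covering(Type, Label, T) :-
            arc(_, From, To, Type, Label),
            anchor(From, _, Start),
            anchor(To, _, End),
            Start =< T,
            T =< End.

        % ?- covering(word, L, 0.5).
        % L = cat.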

    EDI - XML Standards and Technologies in the Agri-Food Industry

    Due to globalisation, new technological developments and the complexity of food supply processes, the European food sector is becoming increasingly complex. Consumers’ trust in food, triggered and affected by a number of food crises, is low. Today, consumers increasingly expect safe, high-quality food and demand information about the origin of their food. The economic health of the food industry can also be greatly affected by food crises; therefore, efficient and effective mechanisms are required to assist the food industry in tracking and tracing products along the food chain. In this paper, we discuss the criteria for an efficient and effective traceability system from an IT perspective (mainly data exchange) and identify key requirements for ICT-enabled traceability.

    The multimedia documentation of endangered and minority languages: a thesis presented in partial fulfilment of the requirements for the degree of Master of Philosophy in Linguistics at Massey University

    This thesis examines the impending loss of linguistic diversity in the world and advocates a change in emphasis in linguistic research towards the documentation of minority and endangered languages. Various models for documentation are examined, along with some of the ethical issues involved in linguistic research amongst small groups, and a new model is proposed. The new model is centred on the collection of a wide variety of high-quality data, but includes the collection of other related materials that will be of particular use and interest to the ethnic community. The collected data and other materials are then structured as an internet-ready multimedia documentation designed for use by the ethnic community as its primary audience, while still catering for the needs of linguistic researchers worldwide. A pilot project is carried out using the model.