210 research outputs found

    Breadth-first serialisation of trees and rational languages

    We present here the notion of breadth-first signature and its relationship with numeration system theory. It is the serialisation into an infinite word of an ordered infinite tree of finite degree. We study which class of languages corresponds to which class of words and, more specifically, using a known construction from numeration system theory, we prove that the signatures of rational languages are substitutive sequences. Comment: 15 pages.
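    The serialisation itself is easy to make concrete: visit the nodes of an ordered tree breadth-first and emit each node's degree. Below is a minimal Python sketch of this idea (our own illustration, not the paper's construction), applied to the prefix tree of the rational language a*b*; the eventually repetitive pattern in the output is the kind of regularity the substitutivity result captures.

        from collections import deque
        from itertools import islice

        def bf_signature(root, children):
            # Serialise an ordered tree of finite degree into a word:
            # visit nodes breadth-first and emit each node's degree.
            queue = deque([root])
            while queue:
                node = queue.popleft()
                kids = children(node)
                yield len(kids)
                queue.extend(kids)

        # Prefix tree of the rational language a*b*: a node is a word
        # a^i b^j; while still inside a*, a node extends by 'a' or 'b',
        # afterwards only by 'b'.
        def children(word):
            if 'b' in word:
                return [word + 'b']
            return [word + 'a', word + 'b']

        print(list(islice(bf_signature('', children), 15)))
        # [2, 2, 1, 2, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1]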

    Regular Rooted Graph Grammars

    This thesis investigates a pragmatic approach to the typing, static analysis and static optimisation of Web query languages, in particular the Web query language Xcerpt. The approach is pragmatic in the sense that no restrictions are placed, for decidability or efficiency reasons, on the types a user can model; instead, precision is given up during type checking where necessary. On the dynamic side, pragmatics means using types not only to ensure the validity of the objects operated on, but also to influence query selection based on types. A typing language for graph-structured data on the Web is introduced. The graphs in question are so-called rooted graphs, built from a spanning tree and cross-references; the typing language is based on regular tree grammars extended with typed references. Besides ordered data in the spirit of XML, unordered data (in the spirit of the Xcerpt data model or RDF) can be modelled using regular expressions under unordered interpretation; this approach is new. An operational semantics for ordered and unordered types is given, based on specialised regular tree automata and counting constraints (themselves based on Presburger arithmetic formulae). Static type checking and inference of Xcerpt query and construct terms are introduced, as well as the optimisation of Xcerpt query terms based on schema information.
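    As a rough illustration of the grammar formalism underlying this approach (a toy over ranked trees; Xcerpt's actual type language adds typed references, unordered interpretation and counting constraints), membership of a tree in the language of a regular tree grammar can be decided by structural recursion. A hypothetical Python sketch:

        # A toy regular tree grammar: each type (nonterminal) maps to
        # alternatives of the form (label, child_types). A tree, encoded
        # as (label, children), has type T if some alternative for T
        # matches its label and, recursively, its children.
        GRAMMAR = {
            "Book":    [("book", ("Title", "Authors"))],
            "Title":   [("title", ())],
            # A non-empty author list, encoded by nesting.
            "Authors": [("author", ()), ("author", ("Authors",))],
        }

        def has_type(tree, t, grammar=GRAMMAR):
            label, children = tree
            return any(
                label == lab and len(children) == len(kinds)
                and all(has_type(c, k, grammar)
                        for c, k in zip(children, kinds))
                for lab, kinds in grammar[t]
            )

        doc = ("book", (("title", ()), ("author", (("author", ()),))))
        print(has_type(doc, "Book"))  # True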

    Automatic sequences: from rational bases to trees

    The nth term of an automatic sequence is the output of a deterministic finite automaton fed with the representation of n in a suitable numeration system. In this paper, instead of considering automatic sequences built on a numeration system with a regular numeration language, we consider those built on languages associated with trees having periodic labeled signatures and, in particular, rational base numeration systems. We obtain two main characterizations of these sequences. The first one is concerned with r-block substitutions where r morphisms are applied periodically. In particular, we provide examples of such sequences that are not morphic. The second characterization involves the factors, or subtrees of finite height, of the tree associated with the numeration system and decorated by the terms of the sequence. Comment: 25 pages, 15 figures.
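    The defining mechanism is easy to demonstrate for the classical case of a regular numeration language: compute the base-k digits of n and feed them, most significant digit first, to a DFA. A short Python sketch (our example; the Thue-Morse sequence is 2-automatic, its automaton tracking the parity of 1s in the binary expansion):

        def automatic_term(n, base, delta, output, start=0):
            # n-th term of a k-automatic sequence: run the base-k digits
            # of n (most significant first) through a DFA and read the
            # output attached to the final state.
            digits = []
            while n:
                n, d = divmod(n, base)
                digits.append(d)
            state = start
            for d in reversed(digits):
                state = delta[state][d]
            return output[state]

        # Thue-Morse: the state is the parity of the 1s seen so far.
        delta = {0: {0: 0, 1: 1}, 1: {0: 1, 1: 0}}
        print([automatic_term(n, 2, delta, output=[0, 1]) for n in range(12)])
        # [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1]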

    Automatic sequences in rational base numeration systems (and even more)

    The nth term of an automatic sequence is the output of a deterministic finite automaton fed with the representation of n in a suitable numeration system. Here, instead of considering automatic sequences built on a numeration system with a regular numeration language, we consider those built on languages associated with trees having periodic labeled signatures and, in particular, rational base numeration systems. We obtain two main characterizations of these sequences. The first one is concerned with r-block substitutions where r morphisms are applied periodically. In particular, we provide examples of such sequences that are not morphic. The second characterization involves the factors, or subtrees of finite height, of the tree associated with the numeration system and decorated by the terms of the sequence.
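    In the rational base case the numeration language itself is no longer regular, but the digit strings are still produced by a simple greedy algorithm. A Python sketch of the standard base-p/q construction (function names are ours), shown here for base 3/2:

        from fractions import Fraction

        def rational_base_digits(n, p, q):
            # Greedy base-p/q representation of a non-negative integer:
            # repeatedly take a = (q*n) mod p, then n <- (q*n - a) / p.
            # Digits are produced least significant first.
            digits = []
            while n > 0:
                a = (q * n) % p
                digits.append(a)
                n = (q * n - a) // p
            return digits[::-1] or [0]

        def value(digits, p, q):
            # n = sum_i (a_i / q) * (p/q)^i, i indexed from the least
            # significant digit.
            return sum(Fraction(a, q) * Fraction(p, q) ** i
                       for i, a in enumerate(reversed(digits)))

        for n in range(1, 8):
            ds = rational_base_digits(n, 3, 2)
            assert value(ds, 3, 2) == n
            print(n, ''.join(map(str, ds)))
        # 1 -> 2, 2 -> 21, 3 -> 210, 4 -> 212, 5 -> 2101, 6 -> 2120, 7 -> 2122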

    Trust on the semantic web

    The Semantic Web is a vision to create a “web of knowledge”: an extension of the Web as we know it, creating an information space usable by machines in very rich ways. The technologies which make up the Semantic Web allow machines to reason across information gathered from the Web, presenting only relevant results and inferences to the user. Users of the Web in its current form assess the credibility of the information they gather in a number of different ways. If processing happens without the user being able to check the source and credibility of each piece of information used, the user must be able to trust that the machine has used trustworthy information at each step of the processing. The machine should therefore be able to automatically assess the credibility of each piece of information it gathers from the Web. A case study on advanced checks for website credibility is presented; the site examined is found to be credible, despite failing many of the checks presented. A website with a backend based on RDF technologies was constructed, yielding a better understanding of RDF technologies and good knowledge of the RAP and Redland RDF application frameworks. The second aim of constructing the website was to gather information for testing various trust metrics; however, the website did not gain widespread support, and not enough data was gathered for this. Techniques for presenting RDF data to users were also developed during website development, and these are discussed. Experiences in gathering RDF data are presented next: a scutter was successfully developed, and the data smushed to create a database in which uniquely identifiable objects were linked, even where gathered from different sources. Finally, the use of digital signatures as a means of linking an author and content produced by that author is presented. RDF/XML canonicalisation is discussed in the provision of ideal cryptographic checking of RDF graphs, rather than simply checking at the document level. The notion of canonicalisation on the semantic, structural and syntactic levels is proposed, and a combination of an existing canonicalisation algorithm and a restricted RDF/XML dialect is presented as a solution to the RDF/XML canonicalisation problem. We conclude that a trusted Semantic Web is possible, with buy-in from publishing and consuming parties.
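    The smushing step can be sketched with rdflib (assuming that library; the data below is illustrative): subjects that share a value of an inverse functional property such as foaf:mbox are folded onto one canonical node.

        from rdflib import Graph, Namespace

        FOAF = Namespace("http://xmlns.com/foaf/0.1/")

        g = Graph()
        # Two hypothetical sources describing the same person under
        # different URIs.
        g.parse(data="""
            @prefix foaf: <http://xmlns.com/foaf/0.1/> .
            <http://a.example/#me> foaf:name "Ann" ;
                foaf:mbox <mailto:ann@example.org> .
            <http://b.example/ann> foaf:knows <http://b.example/bob> ;
                foaf:mbox <mailto:ann@example.org> .
        """, format="turtle")

        # Group subjects by foaf:mbox: the property is inverse
        # functional in FOAF, so equal mailboxes mean the same person.
        by_mbox = {}
        for person, mbox in g.subject_objects(FOAF.mbox):
            by_mbox.setdefault(mbox, set()).add(person)

        # Rewrite every alias onto one canonical node per mailbox.
        for aliases in by_mbox.values():
            canonical, *rest = sorted(aliases)
            for alias in rest:
                for p, o in list(g.predicate_objects(alias)):
                    g.add((canonical, p, o))
                    g.remove((alias, p, o))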

    Robert Louis Stevenson’s South Seas Writing: Its Production and Context within the Victorian Study of Culture

    The thesis is split into two parts. The first part investigates the production, reception, and reconstruction of Robert Louis Stevenson's South Seas writing during his travels there in the years 1888-91, and its subsequent publication as In the South Seas in 1896. The second part concentrates on the findings that Stevenson makes during his Pacific travels in respect of the discourses of culture that shape the academic sciences of his day. The writing that Stevenson produces for his 'Big Book' on the cultures of the Pacific Islands is among the least examined of all his works. His book of travel, In the South Seas, is published posthumously and contains material that is presented in a way not intended by the author. In the present study, the reasons for this situation are investigated, and the work that the author intends to produce is recovered on the basis of the plans, notes, and photographs which remain from the period of his travels. The reconstructed work is then compared with the published volume of In the South Seas to show the extent to which textual mutilation and re-editing have significantly altered the meaning of Stevenson's original writing. In the South Seas is a text that is re-shaped by his editor in order to satisfy what is seen as a Victorian readership's desire for sentimental voyaging. Stevenson's writing is then read within the context of the Victorian study of culture as represented by Edward Burnett Tylor, to show his engagement with and criticism of some of Tylor's theories. Finally, the South Seas writing is framed within the perspective of Stevenson's reading of G. W. F. Hegel, showing the extent to which his observations on Pacific landscapes and cultures are informed by Hegel's discussion of the antinomies of Immanuel Kant.

    Engineering a semantic web trust infrastructure

    The ability to judge the trustworthiness of information is an important and challenging problem in the field of Semantic Web research. In this thesis, we take an end-to-end look at the challenges posed by trust on the Semantic Web, and present contributions in three areas: a Semantic Web identity vocabulary, a system for bootstrapping trust environments, and a framework for trust-aware information management. Typically, Semantic Web agents, which consume and produce information, are not described with sufficient information to permit those interacting with them to make good judgements of trustworthiness. A descriptive vocabulary for agent identity is required to enable effective inter-agent discourse and the growth of trust and reputation within the Semantic Web; we therefore present such a foundational identity ontology for describing web-based agents. It is anticipated that the Semantic Web will suffer from a trust network bootstrapping problem. In this thesis, we propose a novel approach which harnesses open data to bootstrap trust in new trust environments. This approach brings together public records published by a range of trusted institutions in order to encourage trust in identities within new environments. Information integrity and provenance are both critical prerequisites for well-founded judgements of information trustworthiness. We propose a modification to the RDF Named Graph data model in order to address serious representational limitations of the named graph proposal, which affect the ability to cleanly represent claims and provenance records. Next, we propose a novel graph-based approach for recording the provenance of derived information. This approach offers computational and memory savings while maintaining the ability to answer graph-level provenance questions. In addition, it allows new optimisations such as strategies to avoid needless repeat computation, and a delta-based storage strategy which avoids data duplication.
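    The named-graph arrangement the thesis builds on can be sketched with rdflib's Dataset (graph names and vocabulary below are illustrative, not the thesis's modified model): claims live inside a named graph, while provenance is recorded as ordinary triples about that graph's name.

        from rdflib import Dataset, Literal, Namespace, URIRef

        EX = Namespace("http://example.org/")
        claims = URIRef("http://example.org/graphs/claims-1")

        ds = Dataset()
        g = ds.graph(claims)               # the named graph of claims
        g.add((EX.alice, EX.worksFor, EX.acme))

        # Provenance about the claims graph itself, kept in a
        # separate graph.
        prov = ds.graph(URIRef("http://example.org/graphs/provenance"))
        prov.add((claims, EX.assertedBy, EX.alice))
        prov.add((claims, EX.retrievedOn, Literal("2010-01-01")))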

    Interactive global illumination on the CPU

    Computing realistic physically-based global illumination in real time remains one of the major goals in the fields of rendering and visualisation; one that has not yet been achieved due to its inherent computational complexity. This thesis focuses on CPU-based interactive global illumination approaches, with the aim of developing generalisable, hardware-agnostic algorithms. Interactive ray tracing relies on spatial and cache coherence to achieve interactive rates, which conflicts with the needs of global illumination solutions, which require a large number of incoherent secondary rays to be computed. Methods that reduce the total number of rays to be processed, such as selective rendering, were investigated to determine how best they can be utilised. The impact that selective rendering has on interactive ray tracing was analysed and quantified, and two novel global illumination algorithms were developed, with the structured methodology used presented as a framework. Adaptive Interleaved Sampling is a generalisable approach that combines interleaved sampling with an adaptive approach, using efficient component-specific adaptive guidance methods to drive the computation. Results of up to 11 frames per second were demonstrated for multiple components, including participating media. Temporal Instant Caching is a caching scheme for accelerating the computation of diffuse interreflections to interactive rates. This approach achieved frame rates exceeding 9 frames per second for the majority of scenes. Validation of the results for both approaches showed little perceptual difference when comparing against a gold-standard path-traced image. Further research into caching led to the development of a new wait-free data access control mechanism for sharing the irradiance cache among multiple rendering threads on a shared-memory parallel system. By not serialising accesses to the shared data structure, irradiance values were shared among all threads without any overhead or contention when reading and writing simultaneously. This new approach achieved efficiencies between 77% and 92% for 8 threads when calculating static images and animations. This work demonstrates that, due to the flexibility of the CPU, CPU-based algorithms remain a valid and competitive choice for achieving global illumination interactively, and an alternative to the generally brute-force GPU-centric algorithms.
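    For context, the query that such a shared irradiance cache answers can be sketched as follows: a Python rendition of the classic Ward-style interpolation (this illustrates the standard scheme, not the thesis's wait-free access mechanism).

        import math

        def cache_weight(p, n, rec_p, rec_n, rec_r):
            # A record's weight falls off with distance (scaled by the
            # record's harmonic mean distance rec_r) and with normal
            # divergence.
            dist = math.dist(p, rec_p)
            ndot = max(0.0, min(1.0, sum(a * b for a, b in zip(n, rec_n))))
            denom = dist / rec_r + math.sqrt(1.0 - ndot)
            return float("inf") if denom == 0.0 else 1.0 / denom

        def query(cache, p, n, alpha=4.0):
            # Blend the irradiance of all records whose weight exceeds
            # 1/alpha; None signals a miss (a new record is computed).
            total_w = total_e = 0.0
            for rec_p, rec_n, rec_r, irradiance in cache:
                w = cache_weight(p, n, rec_p, rec_n, rec_r)
                if w > 1.0 / alpha:
                    total_w += w
                    total_e += w * irradiance
            return total_e / total_w if total_w > 0.0 else None

        cache = [((0, 0, 0), (0, 0, 1), 1.0, 0.8)]   # one cached record
        print(query(cache, (0.1, 0.0, 0.0), (0.0, 0.0, 1.0)))  # reuses it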

    Terminological Methods in Lexicography: Conceptualising, Organising, and Encoding Terms in General Language Dictionaries

    General language dictionaries show inconsistencies in terms of uniformity and scientific rigour in the treatment of specialised lexicographic content. By analysing the presence and treatment of terms in general language dictionaries, we propose a more uniform and scientifically rigorous treatment of this content, considering the need to compile and align future lexical resources according to interoperable standards. We begin from the premise that the treatment of lexical items, whether lexical units (words in general) or terminological units (terms, or words belonging to particular subject fields), must be differentiated, and we resort to terminological methods to treat dictionary terms. Our approach assumes that terminology, in its dual dimension, both linguistic and conceptual, and lexicography, as interdisciplinary domains, can be complementary. Thus, we present theoretical objectives (improvement of the metalanguage and of lexicographic description based on terminological assumptions) and practical ones (consistent representation of lexicographic data) that aim to facilitate the organisation, description and consistent modelling of lexicographic components, namely the hierarchisation of domain labels, which are identification markers for specialised lexicon. We also want to facilitate the drafting of definitions, which can be optimised and elaborated with greater scientific precision by following a terminological approach to the treatment of terms. We analysed the dictionaries developed by three academic institutions: the Academia das Ciências de Lisboa, the Real Academia Española and the Académie Française, which represent a valuable legacy of the European academic lexicographic tradition. The initial analysis includes an exhaustive survey and comparison of the domain labels used, a discussion of the options chosen, and a comparative study of the treatment of terms. We then developed a methodological proposal for the treatment of terms in general language dictionaries, exemplified with terms from two domains, GEOLOGY and FOOTBALL, taken from the 2001 edition of the dictionary of the Academia das Ciências de Lisboa. We revised the selected terms according to the terminological principles defended here, giving rise to revised and new specialised senses for the first digital edition of this dictionary. We represent and annotate the data using the TEI Lex-0 specifications, a TEI (Text Encoding Initiative) subset for encoding lexicographic data. We also highlight the importance of hierarchical domain labels, rather than a flat list of domains, which benefit data organisation, correspondence and possible future alignments between different lexicographic resources.
    Our investigation revealed the following: a) the structural models of lexical resources are complex and contain information of diverse kinds; b) domain labels in general language dictionaries are flat, unbalanced, inconsistent and often outdated, and need to be hierarchised in order to organise specialised knowledge; c) the criteria adopted for marking terms and the formulae used in definitions are disparate; d) the treatment of terms is heterogeneous and formulated in different ways, so terminological methods can help lexicographers draft definitions; e) the application of interdisciplinary terminological and lexicographic methods, and of standards, is advantageous because it allows the construction of structured, conceptually organised, linguistically accurate and interoperable lexical databases. In short, we seek to contribute to the urgent issue of solving problems that affect the sharing, alignment and linking of lexicographic data.
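    What such an encoding looks like can be sketched in Python (the entry and the domain path are illustrative, not taken from the dictionary): a minimal TEI Lex-0-style entry carrying a hierarchical domain label, checked for well-formedness.

        import xml.etree.ElementTree as ET

        # A minimal TEI Lex-0-style entry: the <usg type="domain"> label
        # carries a hierarchical domain path rather than a flat value.
        entry = """
        <entry xml:id="basalto" xmlns="http://www.tei-c.org/ns/1.0">
          <form type="lemma"><orth>basalto</orth></form>
          <sense>
            <usg type="domain">Earth Sciences &gt; Geology</usg>
            <def>Dark, fine-grained volcanic rock.</def>
          </sense>
        </entry>
        """
        ET.fromstring(entry)  # raises ParseError if not well-formed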