36 research outputs found

    Approximate text generation from non-hierarchical representations in a declarative framework

    This thesis is on Natural Language Generation. It describes a linguistic realisation system that translates the semantic information encoded in a conceptual graph into an English sentence. The use of a non-hierarchically structured semantic representation (conceptual graphs) and of approximate matching between semantic structures allows us to investigate a more general version of the sentence generation problem, one in which we are not pre-committed to a choice of the syntactically prominent elements in the initial semantics. We show how the semantic structure is declaratively related to a linguistically motivated syntactic representation: we use D-Tree Grammars, which stem from work on Tree-Adjoining Grammars. The declarative specification of the mapping between semantics and syntax allows different processing strategies to be exploited. A number of generation strategies have been considered: a pure top-down strategy, and a chart-based generation technique which allows partially successful computations to be reused in other branches of the search space. Having a generator with increased paraphrasing power, as a consequence of using non-hierarchical input and approximate matching, raises the issue of whether certain 'better' paraphrases can be generated before others. We investigate preference-based processing in the context of generation.
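    As a hedged toy illustration of the point about non-hierarchical input (a minimal Python sketch, not the thesis's D-Tree Grammar machinery; all names are invented): because the semantics is a flat set of relations with no designated root, the generator is free to make either participant syntactically prominent, which is exactly what yields the extra paraphrases.

        # Flat, non-hierarchical semantics: a set of relations, no designated root.
        semantics = {("love", "agent", "john"), ("love", "patient", "mary")}

        def realise(sem):
            args = {role: filler for _, role, filler in sem}
            agent = args["agent"].capitalize()
            patient = args["patient"].capitalize()
            # Two mappings of the same flat semantics onto syntax:
            yield f"{agent} loves {patient}."        # agent made prominent
            yield f"{patient} is loved by {agent}."  # patient made prominent

        for sentence in realise(semantics):
            print(sentence)

    A preference-based generator, as discussed above, would then rank such paraphrases and emit the preferred one first.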

    Optimization of feature learning through grammar-guided genetic programming

    Tese de Mestrado, Ciência de Dados, 2022, Universidade de Lisboa, Faculdade de Ciências. Machine Learning (ML) is becoming more prominent in daily life. A key aspect of ML is Feature Engineering (FE), which can entail a long and tedious process; the automation of FE, known as Feature Learning (FL), can therefore be highly rewarding. FL methods need not only achieve high prediction performance, but should also produce interpretable models. Many current high-performance ML methods that can be considered FL methods, such as Neural Networks and PCA, lack interpretability. A popular ML method used for FL that produces interpretable models is Genetic Programming (GP), with multiple successful applications and methods such as M3GP. In this thesis, I present two new GP-based FL methods, namely M3GP with Domain Knowledge (DK-M3GP) and DK-M3GP with feature Aggregation (DKA-M3GP). Both use grammars to enhance the search process of GP, in a method called Grammar-Guided GP (GGGP). DK-M3GP uses grammars to incorporate domain knowledge in the search process. In particular, I use DK-M3GP to define which solutions are humanly valid, in this case by disallowing arithmetic operations on categorical features. For example, multiplying the postal code of an individual by their wage is not deemed sensible and is thus disallowed. In DKA-M3GP, I use grammars to include a feature aggregation method in the search space. This method can be used on time series and panel datasets to aggregate the target value of historic data based on a known feature value of a new data point. For example, if I want to predict the number of bikes seen daily in a city, it is interesting to know how many were seen on average in the last week. Furthermore, DKA-M3GP allows filtering the aggregation on some other feature value; for example, we can include the average number of bikes seen on past Sundays. I evaluated my FL methods on two ML problems in two environments: first the independent FL process, and then the FL steps within four ML pipelines. Independently, DK-M3GP shows a two-fold advantage over normal M3GP: better interpretability in general, and higher prediction performance on one problem. DKA-M3GP has a much better prediction performance than M3GP on one problem, and a slightly better one on the other. Within the ML pipelines, the methods performed well on one of the two problems. Overall, my methods show potential for FL. Both methods are implemented in Genetic Engine, an individual-representation-independent GGGP framework created as part of this thesis. Genetic Engine is implemented entirely in Python and shows competitive performance with the mature GGGP framework PonyGE2.
Artificial Intelligence (AI) and its subset Machine Learning (ML) are becoming more important to our lives with every passing day. Both areas are present in our daily lives in applications such as automatic speech recognition, self-driving cars, and image recognition and object detection. ML has been applied successfully in many areas, such as healthcare, finance and marketing. In a supervised setting, ML models are trained on data and subsequently used to predict the behaviour of future data. The combination of steps carried out to build a fully trained and evaluated ML model is called an ML pipeline, or simply a pipeline.
All pipelines follow mandatory steps, namely data retrieval, cleaning and manipulation; feature selection and construction; model selection and parameter optimization; and, finally, model evaluation. Building ML pipelines is a challenging task, with specifics that depend on the problem domain. There are challenges on the design side, in hyperparameter optimization, and on the implementation side. When designing pipelines, choices must be made about which components to use and in which order. Even for ML experts, designing pipelines is a tedious task. Design choices require ML experience and knowledge of the problem domain, which makes pipeline construction a resource-intensive process. After a pipeline is designed, its parameters must be optimized to improve its performance. Parameter optimization generally requires sequentially executing and evaluating the pipeline, which involves high costs. On the implementation side, developers may introduce bugs during the development process. These bugs can cost time and money to fix and, if undetected, can compromise the robustness and correctness of the model or introduce performance problems. To work around these design and implementation problems, a new line of research called AutoML (Automated Machine Learning) has emerged. AutoML aims to automate the design of ML pipelines, their parameter optimization, and their implementation. An important part of ML pipelines is how the features of the data are manipulated. Data manipulation has many aspects, gathered under the umbrella term Feature Engineering (FE). In short, FE aims to improve the quality of the solution space by selecting the most important features and constructing new, relevant ones. However, this is a resource-intensive process, so its automation is a highly rewarding sub-area of AutoML. In this thesis, I define Feature Learning (FL) as the area of automated FE. An important metric of FE, and therefore of FL, is the interpretability of the learned features. Interpretability, which falls within the area of Explainable AI (XAI), refers to how easy it is to understand the meaning of a feature. Several AI scandals, such as racist and sexist models, have led the European Union to propose legislation on models that lack interpretability. Many classical, and therefore widely used, methods lack interpretability, giving rise to the renewed interest in XAI. Current FL research treats existing feature values without relating them to their semantic meaning. For example, engineering a feature that represents the multiplication of a person's postal code by their age is not a logical use of the postal code. Although postal codes can be represented as integers, they should be treated as categorical values. Preventing this kind of interaction between features improves pipeline performance, since it reduces the search space of possible features to those that make semantic sense. Moreover, this process results in features that are intrinsically interpretable. In this way, knowledge about the problem domain prevents the engineering of meaningless features during the FE process.
Another aspect of FL not usually considered in existing methods is the aggregation of the values of a single feature across several data entities. For example, consider a credit card fraud dataset. The average amount of a card's previous transactions is potentially an interesting feature to include, as it conveys the meaning of a 'normal' transaction. However, this is generally not directly inferable with existing FL methods. I refer to this FL method as entity aggregation, or simply aggregation. Finally, despite the unpredictable nature of real-life datasets, existing methods mostly require features with homogeneous data. This forces data scientists to pre-process the dataset, often by turning categories into integers or applying some kind of encoding, such as one-hot encoding. However, as discussed above, this can reduce the interpretability and the performance of the pipeline. Genetic Programming (GP), an ML method, is also used for FL and allows the creation of models that are more interpretable than those of most traditional methods. GP is a search-based method that evolves programs or, in the case of FL, mappings between feature spaces. Existing GP-based FL methods do not incorporate the three aforementioned aspects: domain knowledge, aggregation, and support for heterogeneous data types. Some approaches incorporate parts of these aspects, mainly by using grammars to guide the search process. The goal of this work is to explore whether GP can use grammars to improve the quality of FL, in terms of either predictive performance or interpretability. First, we built Genetic Engine, a Grammar-Guided GP (GGGP) framework. Genetic Engine is an easy-to-use GGGP framework that can express complex grammars. We show that Genetic Engine performs well compared with the state-of-the-art Python framework, PonyGE2. Second, I propose two new GGGP-based FL methods implemented in Genetic Engine. Both methods extend M3GP, the state-of-the-art GP-based FL method. The first incorporates domain knowledge and is called M3GP with Domain Knowledge (DK-M3GP); it restricts the behaviour of features by allowing only sensible interactions, by means of conditions and statements. The second method extends DK-M3GP by introducing aggregation into the search space, and is called DK-M3GP with Aggregation (DKA-M3GP). DKA-M3GP makes full use of Genetic Engine's ease of implementation, as it requires a complex grammar. In this work, DK-M3GP and DKA-M3GP were evaluated against traditional GP, M3GP and numerous classical FL methods on two ML problems. The new approaches were evaluated both as standalone FL methods and as part of a larger pipeline. As standalone FL methods, both show good predictive performance on at least one of the two problems. As part of a pipeline, they show little advantage over classical methods in predictive performance.
After analysing the results, a possible explanation lies in the overfitting of the FL methods to the fitness function and the training dataset. In this work, I also discuss the improvement in interpretability obtained by incorporating domain knowledge into the search process. A preliminary evaluation of DK-M3GP indicates that, using the Expression Size (ES) complexity measure, an improvement in interpretability is achievable. However, I also found that this complexity measure may not be the most suitable, because the tree-like structure of the features constructed by DK-M3GP inflates the ES. I believe a more sophisticated interpretability evaluation method would make this apparent.
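    A minimal sketch of the DK-M3GP idea in Python (assuming an illustrative grammar and feature names, not the thesis's Genetic Engine API): the grammar's types separate numeric from categorical features, so a nonsensical solution such as postal_code * wage is simply not derivable.

        import random

        NUMERIC = ["wage", "age"]                 # illustrative numeric features
        CATEGORICAL = ["postal_code", "weekday"]  # illustrative categoricals

        GRAMMAR = {
            "<feature>": [["<num_expr>"], ["<cond>"]],
            "<num_expr>": [["<num_var>"],
                           ["(", "<num_expr>", " + ", "<num_expr>", ")"],
                           ["(", "<num_expr>", " * ", "<num_expr>", ")"]],
            # Categoricals may only be tested for equality, never multiplied.
            "<cond>": [["(", "<cat_var>", " == ", "<cat_var>", ")"]],
            "<num_var>": [[v] for v in NUMERIC],
            "<cat_var>": [[v] for v in CATEGORICAL],
        }

        def derive(symbol, depth=0, max_depth=4):
            """Randomly expand a symbol; past max_depth, take the first
            (recursion-free) option so derivations always terminate."""
            if symbol not in GRAMMAR:
                return symbol  # terminal
            options = GRAMMAR[symbol]
            option = options[0] if depth >= max_depth else random.choice(options)
            return "".join(derive(s, depth + 1, max_depth) for s in option)

        random.seed(0)
        print(derive("<feature>"))  # e.g. "(wage + age)", never "postal_code * wage"

    DKA-M3GP would extend such a grammar with aggregation non-terminals (for example, the mean target value over past rows sharing a weekday), enlarging the search space while keeping every derivable feature semantically sensible.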

    XMG: eXtensible MetaGrammar

    In this article, we introduce eXtensible MetaGrammar (XMG), a framework for specifying tree-based grammars such as Feature-Based Lexicalised Tree-Adjoining Grammars (FB-LTAG) and Interaction Grammars (IG). We argue that XMG displays three features which facilitate both grammar writing and fast prototyping of tree-based grammars. Firstly, XMG is fully declarative. For instance, it permits a declarative treatment of diathesis that markedly departs from the procedural lexical rules often used to specify tree-based grammars. Secondly, the XMG language has a high notational expressivity in that it supports multiple linguistic dimensions, inheritance and a sophisticated treatment of identifiers. Thirdly, XMG is extensible in that its computational architecture facilitates the extension to other linguistic formalisms. We explain how this architecture naturally supports the design of three linguistic formalisms, namely FB-LTAG, IG, and Multi-Component Tree-Adjoining Grammar (MC-TAG). We further show how it permits a straightforward integration of additional mechanisms such as linguistic and formal principles. To further illustrate the declarativity, notational expressivity and extensibility of XMG, we describe the methodology used to specify an FB-LTAG for French augmented with a unification-based compositional semantics. This illustrates how XMG facilitates both the modelling of the tree fragment hierarchies required to specify tree-based grammars and the modelling of a syntax/semantics interface between semantic representations and syntactic trees. Finally, we briefly report on several grammars for French, English and German that were implemented using XMG, and compare XMG to other existing grammar specification frameworks for tree-based grammars.
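    The following is a hedged Python sketch of the core metagrammar idea, not XMG's actual concrete syntax: small tree fragments are described separately and then combined by identifying nodes that carry the same name, which is how a compact specification can yield a large tree-based grammar.

        def combine(frag1, frag2):
            """Unify two tree fragments: nodes with the same name are
            identified, and their child lists are concatenated."""
            merged = {node: list(children) for node, children in frag1.items()}
            for node, children in frag2.items():
                merged.setdefault(node, []).extend(children)
            return merged

        SUBJECT = {"S": ["NP_subj", "VP"]}    # canonical-subject fragment
        TRANSITIVE = {"VP": ["V", "NP_obj"]}  # transitive-verb fragment

        tree = combine(SUBJECT, TRANSITIVE)
        print(tree)  # {'S': ['NP_subj', 'VP'], 'VP': ['V', 'NP_obj']}

    In XMG itself, such combinations are written declaratively, as conjunctions and disjunctions of fragment classes with inheritance, and the compiler computes the resulting tree families.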

    Entwurf und Implementation einer auf Graph-Grammatiken beruhenden Sprache zur Funktions-Struktur-Modellierung von Pflanzen

    Increasing biological knowledge requires ever more elaborate methods to translate that knowledge into executable model descriptions, and increasing computational power makes it possible to actually execute these descriptions. Such simulations help to validate, extend and question the knowledge. For plant modelling, the well-established formal description language of Lindenmayer systems reaches its limits as a method to concisely represent current knowledge and to conveniently assist in current research. On the one hand, it is well suited to represent the structural and geometric aspects of plant models (of which units a plant is composed, how these are connected, and where they are located in 3D space); on the other hand, its use for describing functional aspects (which internal processes take place in the plant structure, and how these interact with the structure) is not as convenient as desirable. This can be traced back to the underlying representation of structure as a linear chain of units, while the intrinsic nature of the structure is a tree or even a graph. Therefore, we propose to use graphs and graph grammars as a basis for plant modelling that combines structural and functional aspects. In the first part of this thesis, we develop the necessary theoretical framework. Starting with a presentation of the state of the art concerning Lindenmayer systems and graph grammars, we develop the formalism of relational growth grammars as a variant of graph grammars. We show that this formalism has a natural embedding of Lindenmayer systems which keeps all relevant properties, but represents branched structures directly as axial trees rather than as linear chains with indirect encoding of branches. In the second part, we develop the main practical result, the XL programming language, an extension of the Java programming language with very general rule-based features. Short examples illustrate the application of the new language features. We describe the built-in pattern-matching algorithm of the implemented run-time system for the XL programming language, and we sketch a possible implementation of an XL compiler. The third part is an application of relational growth grammars and the XL programming language. We show how the general XL interfaces can be customized for relational growth grammars. On top of this customization, several examples from a variety of disciplines demonstrate the usefulness of the developed formalism and language for describing plant growth, especially functional-structural plant models, but also artificial life, architecture and interactive games. Some examples operate on custom graphs such as XML DOM trees or the scene graphs of commercial 3D modellers, while the majority use the 3D modelling platform GroIMP, a software package developed in conjunction with this thesis. The appendix gives an overview of the GroIMP software and illustrates the practical usage of its plug-in for relational growth grammars.
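    As a point of reference for the embedding mentioned above, here is a minimal Lindenmayer system in Python (the rules are the classic branching example from the L-system literature, not taken from the thesis): a bracketed string is rewritten in parallel, and the brackets encode branches only indirectly, which is precisely the limitation that relational growth grammars remove by rewriting graphs directly.

        RULES = {"X": "F[+X][-X]FX", "F": "FF"}  # classic branching L-system

        def rewrite(axiom, steps):
            """Apply all rules in parallel, as L-systems require."""
            s = axiom
            for _ in range(steps):
                s = "".join(RULES.get(c, c) for c in s)
            return s

        print(rewrite("X", 2))
        # The brackets [ ] encode a tree inside a linear string; a relational
        # growth grammar instead manipulates the tree (or graph) itself.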

    An incremental clustering and associative learning architecture for intelligent robotics

    The ability to learn from the environment and memorise the acquired knowledge is essential for robots to become autonomous and versatile artificial companions. This thesis proposes a novel learning and memory architecture for robots, which performs associative learning and recall of sensory and actuator patterns. The approach avoids the inclusion of task-specific expert knowledge and can deal with any kind of multi-dimensional real-valued data, apart from being tolerant to noise and supporting incremental learning. The proposed architecture integrates two machine learning methods: a topology learning algorithm that performs incremental clustering, and an associative memory model that learns relationship information based on the co-occurrence of inputs. The evaluations of both the topology learning algorithm and the associative memory model involved the memorisation of high-dimensional visual data as well as the association of symbolic data, presented simultaneously and sequentially. Moreover, the document analyses the results of two experiments in which the entire architecture was evaluated regarding its associative and incremental learning capabilities. One experiment comprised an incremental learning task with visual patterns and text labels, which was performed both in a simulated scenario and with a real robot. In a second experiment a robot learned to recognise visual patterns in the form of road signs and associated them with different configurations of its arm joints. The thesis also discusses several learning-related aspects of the architecture and highlights strengths and weaknesses of the proposed approach. The developed architecture and corresponding findings contribute to the domains of machine learning and intelligent robotics.
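    A hedged sketch of the co-occurrence-based association described above (a toy numpy model, not the thesis's actual architecture): patterns presented together strengthen their connection, and recall returns the partner most strongly associated with a cue.

        import numpy as np

        class AssociativeMemory:
            def __init__(self, n_a, n_b):
                self.w = np.zeros((n_a, n_b))  # association weights

            def learn(self, a, b):
                """Hebbian-style update: co-active units strengthen their link."""
                self.w += np.outer(a, b)

            def recall(self, a):
                """Given a cue from space A, return the best-associated B pattern."""
                scores = a @ self.w
                return (scores == scores.max()).astype(float)

        mem = AssociativeMemory(4, 3)
        visual = np.array([1.0, 0.0, 1.0, 0.0])  # e.g. cluster activations for an image
        label = np.array([0.0, 1.0, 0.0])        # one-hot text label
        mem.learn(visual, label)
        print(mem.recall(visual))  # -> [0. 1. 0.]

    In the full architecture, the incremental clustering stage would supply the cue vectors, mapping novel sensory inputs onto previously learned clusters before association takes place.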

    Space station data system analysis/architecture study. Task 2: Options development DR-5. Volume 1: Technology options

    The second task in the Space Station Data System (SSDS) Analysis/Architecture Study is the development of an information base that will support the conduct of trade studies and provide sufficient data to make key design/programmatic decisions. This volume identifies the preferred options in the technology category and characterizes these options with respect to performance attributes, constraints, cost, and risk. The technology category includes advanced materials, processes, and techniques that can be used to enhance the implementation of SSDS design structures. The specific areas discussed are mass storage, including space and ground on-line storage and off-line storage; man/machine interface; data processing hardware, including flight computers and advanced/fault-tolerant computer architectures; and software, including data compression algorithms, on-board high-level languages, and software tools. Also discussed are artificial intelligence applications and hard-wire communications.

    Interpretación tabular de autómatas para lenguajes de adjunción de árboles

    Tree adjoining grammars are an extension of context-free grammars that use trees instead of productions as the primary representing structure and are considered adequate to describe most of the syntactic phenomena occurring in natural languages. These grammars generate the class of tree adjoining languages, which is equivalent to the class of languages generated by linear indexed grammars and other mildly context-sensitive formalisms. In the first part of this dissertation, we introduce the problem of parsing tree adjoining grammars and linear indexed grammars, creating, for both formalisms, a continuum from simple pure bottom-up algorithms to complex predictive algorithms and showing which transformations must be applied to each one in order to obtain the next one in the continuum. In the second part, we define several models of automata that accept exactly the class of tree adjoining languages, and we propose techniques for their efficient execution. The use of automata for parsing is interesting because it allows us to separate the problem of defining a parsing algorithm from the problem of executing it, while also simplifying correctness proofs. We have considered the following types of automata:
• Top-down and bottom-up embedded push-down automata, two extensions of push-down automata working on nested stacks. A new definition is provided in which the finite-state control has been eliminated and several kinds of normalized transitions have been defined, preserving the equivalence with tree adjoining languages.
• Logical push-down automata restricted to the case of tree adjoining languages. Depending on the set of allowed transitions, we obtain three different types of automata, each specialized for a family of parsing strategies.
• Linear indexed automata: left-oriented and right-oriented, to describe parsing strategies in which adjunctions are recognized top-down and bottom-up, respectively, and strongly-driven, to define parsing strategies recognizing adjunctions top-down and/or bottom-up.
• 2-stack automata, an extension of push-down automata working on a pair of stacks: a master stack driving the parsing process, and an auxiliary stack restricting the set of transitions that can be applied at a given moment. Strongly-driven 2-stack automata can be used to describe bottom-up, top-down or mixed parsing strategies for tree adjoining languages with respect to the recognition of adjunctions; bottom-up 2-stack automata are specifically designed for parsing strategies recognizing adjunctions bottom-up.
Compilation schemata for these models of automata have been defined. A compilation schema allows us to obtain the set of transitions corresponding to the implementation of a given parsing strategy for a given grammar. All the presented automata can be executed in polynomial time with respect to the length of the input string by applying tabulation techniques. A tabular technique makes it possible to interpret an automaton by manipulating collapsed representations of configurations (called items) instead of actual configurations. Items are stored in a table in order to be reused, avoiding redundant computations. Finally, we have studied the relations among the different classes of automata, the main difference being the storage structure used: embedded stacks, index lists or coupled stacks. According to the strategies that can be implemented, we can distinguish three kinds of automata: bottom-up automata, including bottom-up embedded push-down automata, bottom-up restricted logical push-down automata, right-oriented linear indexed automata and bottom-up 2-stack automata; top-down automata, including (top-down) embedded push-down automata, top-down restricted logical push-down automata and left-oriented linear indexed automata; and general automata, including strongly-driven linear indexed automata and strongly-driven 2-stack automata.
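    The tabulation idea is generic and can be sketched briefly (a hedged Python sketch with a toy item form, not one of the dissertation's compilation schemata): items are collapsed configurations, a table records every item ever derived, and an agenda ensures each item is expanded only once.

        def tabular_closure(axioms, step):
            """Close `axioms` under `step`, where step(item, table) returns
            the items directly derivable from `item` given the table."""
            table = set()
            agenda = list(axioms)
            while agenda:
                item = agenda.pop()
                if item in table:
                    continue  # already processed: reuse, do not recompute
                table.add(item)
                agenda.extend(step(item, table))
            return table

        # Toy deduction standing in for real parser items (which would carry
        # dotted rules plus stack information): from (i, j) derive (i, j+1).
        items = tabular_closure({(0, 0)},
                                lambda it, t: [(it[0], it[1] + 1)] if it[1] < 3 else [])
        print(sorted(items))  # [(0, 0), (0, 1), (0, 2), (0, 3)]

    Polynomial running time then follows because the number of distinct items is polynomial in the length of the input and each item is processed only once.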

    Symbolic Proofs for Lattice-Based Cryptography

    Symbolic methods have been used extensively for proving the security of cryptographic protocols in the Dolev-Yao model, and more recently for proving the security of cryptographic primitives and constructions in the computational model. However, existing methods for proving the security of cryptographic constructions in the computational model often require significant expertise and interaction, or are fairly limited in scope and expressivity. This paper introduces a symbolic approach for proving the security of cryptographic constructions based on the Learning With Errors assumption (Regev, STOC 2005). Such constructions are instances of lattice-based cryptography and are extremely important due to their potential role in post-quantum cryptography. Following (Barthe, Grégoire and Schmidt, CCS 2015), our approach combines a computational logic and deducibility problems, a standard tool for representing the adversary's knowledge in the Dolev-Yao model. The computational logic is used to capture (indistinguishability-based) security notions and drive the security proofs, whereas deducibility problems are used as side-conditions to control that the rules of the logic are applied correctly. We then use AutoLWE, an implementation of the logic, to deliver very short or even automatic proofs of several emblematic constructions, including CPA-PKE (Gentry et al., STOC 2008), (Hierarchical) Identity-Based Encryption (Agrawal et al., Eurocrypt 2010), Inner Product Encryption (Agrawal et al., Asiacrypt 2011), and CCA-PKE (Micciancio et al., Eurocrypt 2012). The main technical novelty beyond AutoLWE is a set of (semi-)decision procedures for deducibility problems, using extensions of Gröbner basis computations for subalgebras in the (non-)commutative setting (instead of ideals in the commutative setting). Our procedures cover the theory of matrices, which is required for lattice-based assumptions, as well as the theory of non-commutative rings, fields, and Diffie-Hellman exponentiation, in its standard, bilinear and multilinear forms. Additionally, AutoLWE supports oracle-relative assumptions, which are used specifically to apply (advanced forms of) the Leftover Hash Lemma, an information-theoretic tool widely used in lattice-based proofs.
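    For readers unfamiliar with deducibility problems, here is a hedged toy sketch in Python (a symmetric-encryption Dolev-Yao theory, far simpler than AutoLWE's matrix and ring theories): the adversary's knowledge is saturated under projection and decryption rules, and a term is deducible exactly when saturation reaches it.

        def deducible(knowledge, goal):
            """Terms are atoms (strings) or tuples ("pair", a, b) / ("enc", m, k)."""
            known = set(knowledge)
            changed = True
            while changed:
                changed = False
                for t in list(known):
                    derived = []
                    if isinstance(t, tuple) and t[0] == "pair":
                        derived += [t[1], t[2]]          # projections
                    if isinstance(t, tuple) and t[0] == "enc" and t[2] in known:
                        derived.append(t[1])             # decrypt with a known key
                    for d in derived:
                        if d not in known:
                            known.add(d)
                            changed = True
            return goal in known

        # enc(secret, k) together with k lets the adversary learn secret:
        print(deducible({("enc", "secret", "k"), "k"}, "secret"))  # True

    In AutoLWE the analogous questions are posed over matrices and non-commutative rings, which is why Gröbner-basis-style procedures are needed in place of this simple saturation.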

    Anomaly detection & object classification using multi-spectral LiDAR and sonar

    In this thesis, we present the theory of high-dimensional signal approximation of multi-frequency signals. We also present both linear and non-linear compressive sensing (CS) algorithms that generate encoded representations of time-correlated single photon counting (TCSPC) light detection and ranging (LiDAR) data, side-scan sonar (SSS) and synthetic aperture sonar (SAS) data. The main contributions of this thesis are summarised as follows:
1. Research is carried out studying full-waveform (FW) LiDARs, in particular the capture, storage and processing of TCSPC data.
2. FW-LiDARs are capable of capturing large quantities of photon-counting data in real time, yet real-time processing of the raw LiDAR waveforms has not been widely exploited. This thesis answers some of the fundamental questions: can semantic information be extracted and encoded from raw multi-spectral FW-LiDAR signals, and can these encoded representations then be used for object segmentation and classification?
3. Research is carried out into signal approximation and compressive sensing techniques, their limitations and their application domains.
4. Research is also carried out in 3D point cloud processing, combining geometric features with material spectra (a spectral-depth representation) for object segmentation and classification.
5. Extensive experiments have been carried out with publicly available datasets, e.g. the Washington RGB Image and Depth (RGB-D) dataset [108], the YaleB face dataset [110] (http://vision.ucsd.edu/~leekc/ExtYaleDatabase/), real-world multi-frequency aerial laser scans (ALS), and an underwater multi-frequency (16 wavelengths) TCSPC dataset collected using custom-built targets especially for this thesis. The ALS dataset was captured in collaboration with Carbomap Ltd., Edinburgh, UK, during a trial in Austria using commercial-off-the-shelf (COTS) sensors.
6. The multi-spectral measurements were made underwater on targets with different shapes and materials. A novel spectral-depth representation is presented with strong discrimination characteristics on target signatures. Several custom-made and realistically scaled exemplars with known and unknown targets have been investigated using a multi-spectral single photon counting LiDAR system.
7. We also present a new approach to peak modelling and classification for waveform-enabled LiDAR systems; not all existing approaches perform peak modelling and classification simultaneously in real time. This was tested on both simulated waveform-enabled LiDAR data and the real ALS data.
This PhD also led to an industrial secondment at Carbomap, Edinburgh, where some of the waveform modelling algorithms were implemented in C++ and CUDA on Nvidia TX1 boards for real-time performance.
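    As a hedged illustration of the peak-detection step that underlies peak modelling (a toy numpy sketch, not the thesis's real-time algorithm): returns appear as local maxima in the photon-count histogram, well above the Poisson noise floor.

        import numpy as np

        def find_peaks(hist, threshold):
            """Return bin indices that are local maxima above `threshold`."""
            peaks = []
            for i in range(1, len(hist) - 1):
                if hist[i] > threshold and hist[i] >= hist[i - 1] and hist[i] > hist[i + 1]:
                    peaks.append(i)
            return peaks

        # Toy TCSPC waveform: two returns (e.g. foliage and seabed) over noise.
        rng = np.random.default_rng(0)
        bins = np.arange(200)
        wave = rng.poisson(2, 200).astype(float)
        wave += 40 * np.exp(-0.5 * ((bins - 60) / 3.0) ** 2)   # first return
        wave += 25 * np.exp(-0.5 * ((bins - 140) / 3.0) ** 2)  # second return
        print(find_peaks(wave, threshold=10))  # approximately [60, 140]

    A full pipeline would then fit a parametric pulse model to each detected peak and classify the peaks, which the thesis's approach does simultaneously and in real time.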

    15th SC@RUG 2018 proceedings 2017-2018
