7 research outputs found

    Using the MGGI Methodology for Category-based Language Modeling in Handwritten Marriage Licenses Books

    © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

    Handwritten marriage licenses books have been used for centuries by ecclesiastical and secular institutions to register marriages. The information contained in these historical documents is useful for demography studies and genealogical research, among others. Despite the generally simple structure of the text in these documents, automatic transcription and semantic information extraction are difficult due to the distinct and evolving vocabulary, which is composed mainly of proper names that change over time. In previous works we studied the use of category-based language models both to improve the automatic transcription accuracy and to ease the extraction of semantic information. Here we analyze the main causes of the semantic errors observed in previous results and apply a Grammatical Inference technique known as MGGI to improve the semantic accuracy of the language model obtained. Using this language model, full handwritten text recognition experiments have been carried out, with results supporting the interest of the proposed approach.

    This work has been partially supported through the European Union’s H2020 grant READ (Ref: 674943), the European project ERC-2010-AdG-20100407-269796, the MINECO/FEDER, UE projects TIN2015-70924-C2-1-R and TIN2015-70924-C2-2-R, and the Ramon y Cajal Fellowship RYC-2014-16831.

    Romero Gómez, V.; Fornes, A.; Vidal Ruiz, E.; Sánchez Peiró, JA. (2016). Using the MGGI Methodology for Category-based Language Modeling in Handwritten Marriage Licenses Books. IEEE. https://doi.org/10.1109/ICFHR.2016.0069

    Dynamic Protocol Reverse Engineering a Grammatical Inference Approach

    Round trip engineering of software from source code and reverse engineering of software from binary files have both been extensively studied, and the state-of-practice has documented tools and techniques. Forward engineering of protocols has also been extensively studied and there are firmly established techniques for generating correct protocols. While observation of protocol behavior for performance testing has been studied and techniques established, reverse engineering of protocol control flow from observations of protocol behavior has not received the same level of attention. State-of-practice in reverse engineering the control flow of computer network protocols consists mostly of ad hoc approaches. We examine state-of-practice tools and techniques used in three open source projects: Pidgin, Samba, and rdesktop. We examine techniques proposed by computational learning researchers for grammatical inference. We propose to extend the state-of-art by inferring protocol control flow using grammatical inference inspired techniques to reverse engineer automata representations from captured data flows. We present evidence that grammatical inference is applicable to the problem domain under consideration.

    Symbolic and connectionist learning techniques for grammatical inference

    This thesis is structured in four parts for a total of ten chapters. The first part, introduction and review (Chapters 1 to 4), presents an extensive state-of-the-art review of both symbolic and connectionist GI methods, which also serves to state most of the basic material needed to describe later the contributions of the thesis. These contributions constitute the contents of the remaining parts (Chapters 5 to 10). The second part, contributions on symbolic and connectionist techniques for regular grammatical inference (Chapters 5 to 7), describes the contributions related to the theory and methods for regular GI, which include other lateral subjects such as the representation of finite-state machines (FSMs) in recurrent neural networks (RNNs). The third part of the thesis, augmented regular expressions and their inductive inference, comprises Chapters 8 and 9. The augmented regular expressions (or AREs) are defined and proposed as a new representation for a subclass of CSLs that does not contain all the context-free languages but a large class of languages capable of describing patterns with symmetries and other (context-sensitive) structures of interest in pattern recognition problems. The fourth part of the thesis includes only Chapter 10: conclusions and future research. Chapter 10 summarizes the main results obtained and points out the lines of further research that should be followed both to deepen some of the theoretical aspects raised and to facilitate the application of the developed GI tools to real-world problems in the area of computer vision.

    Sixth NASTRAN (R) Users' Colloquium

    Papers are presented on NASTRAN programming and substructuring methods, as well as on fluids and thermal applications. Specific applications and capabilities of NASTRAN were also delineated, along with general auxiliary programs.

    Recombinant expression and analysis of tetraspanin extracellular-2 domains

    Tetraspanins are a superfamily of membrane proteins which span the membrane four times; they are found predominantly at the cell surface but are also located on intracellular vesicles. Tetraspanins (with a few exceptions) do not have conventional receptor-ligand functionality and instead form lateral associations with other molecules within the membrane. Binding partners include, but are not limited to, MHC proteins, integrins, signalling proteins and other members of the tetraspanin superfamily. This large network of interactions has led to the idea of tetraspanin enriched microdomains (TEMs) or the tetraspanin web, in which tetraspanins function by bringing together proteins to form functional clusters which allow processes (such as signal transduction or adhesion) to take place more efficiently. Due to the numerous diverse binding partners, tetraspanins have been implicated in a number of cellular and pathological processes. Despite tetraspanins being involved in fundamental physiological processes, relatively little is known about the function of individual members. Difficulties arise as only a few monoclonal antibodies are available to the native proteins and mouse knockouts often show only a mild phenotypic change. Our group and others have used recombinant human EC2 domains in the form of GST fusion proteins to assess tetraspanin involvement in several processes. This region is thought to confer specificity on individual tetraspanin members and when recombinant versions are added to cells exogenously, they have been shown to modulate different cellular events including adhesion, migration and fusion. Due to a number of inherent drawbacks associated with bacterially expressed recombinant EC2 domains (such as LPS contamination and inferior folding), the initial aim of this work was to express the recombinant proteins in a mammalian host.
Despite multiple attempts to express the proteins in mammalian or insect cells using different vector systems, this was not successful, although it was demonstrated that DNA was integrated into the host genome and that EC2 encoding mRNA was expressed. Following this, it was decided to focus on bacterial expression and use the EC2 domains generated to further our understanding of tetraspanin involvement in IgE mediated degranulation. This is a critical first step in Type I Hypersensitivity and, although several tetraspanin family members have been implicated in this pathway, their exact involvement is not yet clear. Here, recombinant EC2 domains of tetraspanins CD9, CD63, CD81 and CD151 were used for the first time in conjunction with RBL-2H3 cells (a cell line commonly used as a model for mast cell degranulation) to examine tetraspanin involvement in IgE mediated degranulation. These particular tetraspanins were selected because past studies have implicated these members in mast cell activation. However, despite an anti-CD63 antibody being able to down regulate degranulation, in this instance the EC2 domains did not exhibit any modulating effect on this form of degranulation in RBL-2H3 cells. The activity of these particular recombinant proteins was demonstrated in two other functional assays: bacterial adhesion to endothelial cells and bacterial induced giant cell formation. Later work sought to characterise the EC2 proteins in terms of their secondary structure, LPS content and their ability to bind to cells, with the hope of elucidating their mechanism of action. The binding of each EC2 domain to two cell lines was examined: RBL-2H3, where EC2 domains show no effect on degranulation, and HEC-1B cells, where EC2 domains were previously shown to exhibit biological activity by reducing bacterial adhesion.
At the highest concentrations utilised, the EC2 domains were shown to bind to both cell lines significantly more than the GST control protein, but attempts to examine the specificity of the EC2 interaction with the cells by competitive inhibition gave inconclusive results. Whilst this may have been due to technical issues, it is tempting to speculate that this indicates cellular interactions that do not follow conventional binding mechanisms. The LPS content of the EC2 domains was shown not to correlate with their ability to modulate bacterial-induced cell fusion. Finally, to facilitate structural and future functional studies, attempts were made to optimise the removal of the GST tag from the fusion proteins. CD spectroscopy was then performed and indicated that the EC2 domains of CD9 and CD81 had 50% and 52% α-helical structure, respectively, as expected for these proteins. Although the initial aims of producing mammalian EC2 domains were not fulfilled, the domains were successfully produced in bacteria and used in RBL degranulation assays. Later work indicated that LPS contamination was not the causative component of EC2 preparations and confirmed some level of secondary structure. Furthermore, the apparent lack of effect of recombinant EC2 domains in RBL-2H3 activation may shed light on tetraspanin interaction with the high affinity IgE receptor.