
    Mathematical Formula Recognition and Automatic Detection and Translation of Algorithmic Components into Stochastic Petri Nets in Scientific Documents

    A large percentage of documents in scientific and engineering disciplines include mathematical formulas and/or algorithms. Examining the mathematical formulas in technical documents, we focus on the associations among mathematical operations, their syntactic correctness, and the mapping of these components into attributed graphs and Stochastic Petri Nets (SPNs). We also introduce a formal language to generate mathematical formulas and evaluate their syntactic correctness. The main contribution of this work is the automatic segmentation of mathematical documents and the parsing and analysis of the detected algorithmic components. To achieve this, we present a synergy of methods: string parsing according to mathematical rules, formal language modeling, optical analysis of technical documents in the form of images, structural analysis of the text in those images, and mapping to graphs and Stochastic Petri Nets. Finally, for the recognition of algorithms, we enrich our rule-based model with machine learning techniques to obtain better results.
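
    As a rough illustration of the kind of syntactic check described above, the sketch below defines a toy recursive-descent parser for a minimal formula language and reports whether an input string is well formed. The grammar, token set, and function names are illustrative assumptions; the abstract does not give the paper's actual formal language.

```python
# Minimal sketch: checking the syntactic correctness of a formula string with a
# toy recursive-descent parser. The grammar (expr -> term (("+"|"-") term)*,
# term -> factor (("*"|"/") factor)*, factor -> NUMBER | NAME | "(" expr ")")
# is an assumption for illustration, not the formal language used in the paper.
import re

TOKEN = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|([A-Za-z]\w*)|([()+\-*/]))")

def tokenize(s):
    s = s.strip()
    tokens, pos = [], 0
    while pos < len(s):
        m = TOKEN.match(s, pos)
        if not m:
            raise ValueError(f"illegal character at position {pos}")
        tokens.append(m.group(m.lastindex))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def eat(self, tok=None):
        cur = self.peek()
        if cur is None or (tok is not None and cur != tok):
            raise ValueError(f"unexpected token {cur!r}")
        self.i += 1
        return cur

    def expr(self):
        self.term()
        while self.peek() in ("+", "-"):
            self.eat()
            self.term()

    def term(self):
        self.factor()
        while self.peek() in ("*", "/"):
            self.eat()
            self.factor()

    def factor(self):
        if self.peek() == "(":
            self.eat("(")
            self.expr()
            self.eat(")")
        else:
            tok = self.eat()
            if not re.fullmatch(r"\d+(?:\.\d+)?|[A-Za-z]\w*", tok):
                raise ValueError(f"expected operand, got {tok!r}")

def is_well_formed(formula):
    """Return True if the whole string parses under the toy grammar."""
    try:
        p = Parser(tokenize(formula))
        p.expr()
        return p.peek() is None   # all tokens must be consumed
    except ValueError:
        return False

print(is_well_formed("(a + b) * 3"))   # True
print(is_well_formed("a + * b"))       # False
```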

    Learning Semantic Correspondences in Technical Documentation

    We consider the problem of translating high-level textual descriptions to formal representations in technical documentation, as part of an effort to model the meaning of such documentation. We focus specifically on the problem of learning translational correspondences between text descriptions and grounded representations in the target documentation, such as formal representations of functions or code templates. Our approach exploits the parallel nature of such documentation, i.e., the tight coupling between the high-level text and the low-level representations we aim to learn. Data is collected by mining technical documents for such parallel text-representation pairs, which we use to train a simple semantic parsing model. We report new baseline results on sixteen novel datasets, including the standard library documentation for nine popular programming languages across seven natural languages, and a small collection of Unix utility manuals.
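
    To make the parallel-data idea concrete, here is a minimal sketch of mining (text description, formal representation) pairs from docstring-style documentation and using them with a trivial word-overlap retrieval baseline. The DOCS entries, mine_pairs, and translate are hypothetical illustrations, not the authors' datasets or model.

```python
# Minimal sketch (assumed format, not the paper's pipeline): mine parallel
# (text description, formal representation) pairs from API documentation and
# use word overlap as a trivial retrieval-style "semantic parsing" baseline.
from collections import Counter

# Hypothetical documentation: formal signature -> high-level description.
DOCS = {
    "max(iterable) -> item": "Return the largest item in an iterable.",
    "min(iterable) -> item": "Return the smallest item in an iterable.",
    "sorted(iterable) -> list": "Return a new sorted list from the items in an iterable.",
}

def mine_pairs(docs):
    """Turn the documentation into (description tokens, representation) pairs."""
    return [(desc.lower().split(), sig) for sig, desc in docs.items()]

def translate(query, pairs):
    """Map a textual query to the representation with the highest word overlap."""
    q = Counter(query.lower().split())
    def overlap(pair):
        words, _ = pair
        return sum((q & Counter(words)).values())
    return max(pairs, key=overlap)[1]

pairs = mine_pairs(DOCS)
print(translate("find the largest item in a collection", pairs))
# -> "max(iterable) -> item"
```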

    Object-oriented GUI design of a modeling environment for logical discrete event systems

    This thesis concerns the design and implementation of part of the object-oriented human-machine interface of a modeling environment for reactive systems called MELODIES (Modeling Environment for LOgical Discrete Event Systems). The environment supports the design, analysis, simulation, and control of discrete event systems. The architecture of the interface is based on ideas borrowed from the Model-View-Controller (MVC) design pattern and from the JSP Model 2 Architecture paradigm. The result is a clean separation between the human-machine interface, the data structures, and the underlying functions. Adopting an object-oriented approach, such as the one supported by VisualAge for Java 4.0, rather than the file-oriented approach supported by many development environments (for example, JBuilder for Java 3.0), provides greater usability and a better organization of the modeling artifacts. To allow the interface to be deployed on different platforms and to ensure fast execution, the Qt 3.0 toolkit and the C++ language, with its STL library, were used for the implementation. In addition, XML was chosen as the data representation language to allow an eventual deployment of MELODIES on the Web.
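
    To make the separation of concerns concrete, here is a minimal, language-agnostic sketch of the Model-View-Controller idea the thesis borrows. The class names and the toy discrete-event model are illustrative assumptions, not MELODIES code (which is written in C++/Qt).

```python
# Minimal MVC sketch (illustrative only; MELODIES itself is C++/Qt).
# The model holds the discrete-event data, the view renders it, and the
# controller mediates user actions, keeping the three concerns separate.

class EventModel:
    """Model: the data structure, with no knowledge of the UI."""
    def __init__(self):
        self.events = []

    def add_event(self, name):
        self.events.append(name)

class ConsoleView:
    """View: renders the model; here a console stand-in for a Qt widget."""
    def render(self, model):
        print("Events:", ", ".join(model.events) or "(none)")

class Controller:
    """Controller: translates user actions into model updates and view refreshes."""
    def __init__(self, model, view):
        self.model, self.view = model, view

    def on_user_adds_event(self, name):
        self.model.add_event(name)
        self.view.render(self.model)

ctrl = Controller(EventModel(), ConsoleView())
ctrl.on_user_adds_event("machine_start")
ctrl.on_user_adds_event("machine_stop")
```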

    Composable Distributed Access Control and Integrity Policies for Query-Based Wireless Sensor Networks

    An expected requirement of wireless sensor networks (WSNs) is the support of a vast number of users while granting only limited access privileges. Although WSN nodes have severe resource constraints, WSNs will need to restrict access to data, enforcing security policies that protect the data within the network. To date, WSN security has largely been based on encryption and authentication schemes. The WSN Authorization Specification Language (WASL) is specified and implemented using tools coded in Java. WASL is a mechanism-independent policy language that can specify arbitrary, composable security policies. The construction, hybridization, and composition of well-known security models is demonstrated and shown to preserve security, while allowing modifications that permit inter-network accesses with no more impact on the WSN nodes than any other policy update. Using WASL and a naive data compression scheme, a multi-level security policy for a 1000-node network requires 66 bytes of memory per node, which can reasonably be distributed throughout a WSN. Compiling a variety of policy compositions is shown to be feasible on a notebook-class computer like that expected to perform typical WSN management responsibilities.
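
    The abstract does not give WASL syntax, but the idea of composing security models can be sketched as follows. The encoding of a policy as a predicate over (subject level, object level, action), the shared level lattice, and the conjunctive composition rule are all assumptions for illustration, not WASL itself.

```python
# Minimal sketch of composable access-control policies (assumed encoding;
# WASL's actual syntax and semantics are not shown in the abstract).
# A policy is a predicate over (subject_level, object_level, action);
# composition here is conjunctive: an access is allowed only if every
# composed policy allows it.

def bell_lapadula(subject, obj, action):
    """Confidentiality: no read up, no write down."""
    if action == "read":
        return subject >= obj
    if action == "write":
        return subject <= obj
    return False

def biba(subject, obj, action):
    """Integrity: no read down, no write up."""
    if action == "read":
        return subject <= obj
    if action == "write":
        return subject >= obj
    return False

def compose(*policies):
    """Conjunctive composition: all policies must permit the access."""
    return lambda s, o, a: all(p(s, o, a) for p in policies)

hybrid = compose(bell_lapadula, biba)
print(hybrid(2, 2, "read"))   # True: same level satisfies both models
print(hybrid(3, 1, "read"))   # False: reading down violates Biba
print(hybrid(1, 3, "write"))  # False: writing up violates Biba
```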

    Cyber physical complex networks, modeling, analysis, and control

    This research examines various attributes of complex networks: modeling, sensing, estimation, safety analysis, and control. In this study, formal languages and finite automata are used for modeling incident management processes, and safety properties are checked in order to verify the system. This method introduces a systematic approach to incident management protocols that are otherwise governed by mostly unsystematic algorithms. A portion of the data used in this study was collected by means of radar and loop detectors, and a weighted t-statistic methodology is developed in order to validate these detectors. The detector data is then used to extract travel time information, and travel time reliability is investigated. Classical reliability measures are examined and compared with the new entropy-based reliability measure proposed in this study. The entropy-based measure is more consistent with the classical definition of travel time reliability than traditional measures; furthermore, it measures uncertainty directly from the full distribution of the examined random variable, whereas previously developed reliability measures use only the first and second moments. Various approaches to measuring network reliability are also investigated. Finally, a feedback linearization control scheme is developed for a ramp meter modeled as a switched system using Godunov's conditions at the boundaries. This study demonstrates the advantages of implementing a feedback linearized control scheme with recursive real-time parameter estimation over the commonly practiced velocity-based thresholds.
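
    As a rough sketch of the contrast drawn above between moment-based and distribution-based reliability, the code below compares the coefficient of variation of a travel-time sample with a Shannon entropy computed from its histogram. The synthetic lognormal sample, the binning, and the use of plain Shannon entropy are assumptions; the dissertation's exact entropy measure is not given in the abstract.

```python
# Minimal sketch: moment-based vs. entropy-based travel-time reliability.
# The sample, histogram binning, and plain Shannon entropy are illustrative
# assumptions; the dissertation's exact measure is not reproduced here.
import math
import random

random.seed(0)
# Hypothetical travel times (minutes) on one corridor.
travel_times = [random.lognormvariate(math.log(12), 0.3) for _ in range(1000)]

def coefficient_of_variation(x):
    """Classical moment-based measure: std / mean (uses only two moments)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return math.sqrt(var) / mean

def shannon_entropy(x, bins=20):
    """Entropy of the empirical distribution: uses its full shape, not just moments."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in x:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    probs = [c / len(x) for c in counts if c]
    return -sum(p * math.log2(p) for p in probs)

print(f"coefficient of variation: {coefficient_of_variation(travel_times):.3f}")
print(f"entropy (bits, 20 bins):  {shannon_entropy(travel_times):.3f}")
# Lower values of either quantity indicate more reliable (less uncertain) travel times.
```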

    Subtyping with Generics: A Unified Approach

    Reusable software increases programmers' productivity and reduces repetitive code and software bugs. Variance is a key programming language mechanism for writing reusable software. Variance is concerned with the interplay of parametric polymorphism (i.e., templates, generics) and subtype (inclusion) polymorphism. Parametric polymorphism enables programmers to write abstract types and is known to enhance the readability, maintainability, and reliability of programs. Subtyping promotes software reuse by allowing code to be applied to a larger set of terms. Integrating parametric and subtype polymorphism while maintaining type safety is a difficult problem. Existing variance mechanisms enable greater subtyping between parametric types, but they suffer from severe deficiencies: they are unable to express several common type abstractions, they can cause a proliferation of types and redundant code, and they are difficult for programmers to use due to their inherent complexity. This dissertation aims to improve variance mechanisms in programming languages supporting parametric polymorphism. To address the shortcomings of current mechanisms, I combine two popular approaches, definition-site variance and use-site variance, in a single programming language. I have developed formal languages, or calculi, for reasoning about variance. The calculi are example languages supporting both definition-site and use-site variance, and they enable stating precise properties that can be proved rigorously. The VarLang calculus demonstrates fundamental issues in variance from a language-neutral perspective. The VarJ calculus illustrates realistic complications by modeling a mainstream programming language, Java. VarJ supports not only both use-site and definition-site variance but also language features with complex interactions with variance, such as F-bounded polymorphism and wildcard capture. A mapping from Java to VarLang was implemented in software that infers definition-site variance for Java. Large, standard Java libraries (e.g., Oracle's JDK 1.6) were analyzed with this software to compute metrics measuring the benefits of adding definition-site variance to Java, which supports only use-site variance. Applying this technique to six Java generic libraries shows that 21-47% (depending on the library) of generic definitions are inferred to have single variance; 7-29% of method signatures can be relaxed through this inference, and up to 100% of existing wildcard annotations are unnecessary and can be elided. Although the VarJ calculus proposes how to extend Java with definition-site variance, no mainstream language currently supports both definition-site and use-site variance. To assist programmers in using both notions with existing technology, I developed a refactoring tool that refactors Java code by inferring definition-site variance and adding wildcard annotations. This tool is practical and immediately applicable: it assumes no changes to the Java type system while taking into account all its intricacies. The system allows users to select declarations (variables, method parameters, return types, etc.) to generalize and considers declarations not declared in available source code. I evaluated the technique on six Java generic libraries and found that 34% of available declarations of variant type signatures can be generalized, i.e., relaxed with more general wildcard types. On average, 146 other declarations need to be updated when a declaration is generalized, showing that this refactoring would be too tedious and error-prone to perform manually. The result of applying this refactoring is a more general interface that supports greater software reuse.
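
    Python's typing module supports only definition-site variance, but it is enough to illustrate the notion discussed above. The classes below are a minimal sketch (not the VarLang/VarJ calculi or the Java refactoring tool), showing how declaring a type parameter covariant at the definition site makes Source[Dog] usable where Source[Animal] is expected.

```python
# Minimal sketch of definition-site variance using Python's typing module
# (illustration only; the dissertation's calculi and tooling target Java,
# which instead offers use-site variance via wildcards).
from typing import Generic, TypeVar

T_co = TypeVar("T_co", covariant=True)   # declared covariant at the definition site

class Animal:
    pass

class Dog(Animal):
    def bark(self) -> str:
        return "woof"

class Source(Generic[T_co]):
    """A read-only producer: safe to treat Source[Dog] as Source[Animal]."""
    def __init__(self, item: T_co) -> None:
        self._item = item

    def get(self) -> T_co:
        return self._item

def first_animal(src: "Source[Animal]") -> Animal:
    return src.get()

dogs: "Source[Dog]" = Source(Dog())
# Accepted by a type checker because T_co is covariant: Source[Dog] <: Source[Animal].
print(first_animal(dogs).__class__.__name__)   # Dog
```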

    Neural and Computational Principles of Real-World Sequence Processing

    We constantly process sequential information in our day-to-day lives, from listening to a piece of music (a stream of notes) and watching a movie (a series of scenes) to having conversations with the people around us (a stream of syllables, words, and sentences). What are the neural and computational principles underlying this ubiquitous cognitive process? In this thesis, I first review the background and prior studies on the neural and computational mechanisms of real-life sequence processing and present our research questions. I then present four research projects that address those questions. By combining neuroimaging data analysis and computational modeling, I discovered the neural phenomena of integrating and forgetting temporal information during naturalistic sequence processing in the human cerebral cortex. Furthermore, I identified computational principles (e.g., hierarchical architecture) and processes (e.g., dynamical context gating) that help explain the neural state changes observed during naturalistic processing. These neural and computational findings not only validate the existing components of hierarchical temporal integration theory but also rule out alternative models and propose important new elements of the theory, including context gating at event boundaries. I next explored the computations for natural language processing in brains and machines, by (1) applying our neuroscience-inspired methods to examine the timescale and functional organization of neural network language models, thereby revealing their own architecture for processing information over multiple timescales; and by (2) investigating the context and entity representations in two neural networks with brain-inspired architectures, thereby revealing a gap between brain-inspired and performance-optimized architectures. Finally, I discuss the position and contributions of our findings in the field and some future directions.
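
    The "dynamical context gating" idea mentioned above can be caricatured with a toy leaky integrator that flushes most of its context at event boundaries. The update rule and parameters below are illustrative assumptions, not the thesis's actual models or data.

```python
# Toy sketch of temporal integration with context gating at event boundaries
# (illustrative assumptions only; not the models analyzed in the thesis).
# Between boundaries the state is a leaky integration of the input stream;
# at an event boundary the accumulated context is largely flushed.

def integrate_with_gating(inputs, boundaries, leak=0.8, flush=0.05):
    """Return the context state after each input.

    inputs     : sequence of scalar inputs
    boundaries : set of indices where an event boundary occurs
    leak       : how much prior context is retained per step
    flush      : how much context survives an event boundary
    """
    state, trajectory = 0.0, []
    for t, x in enumerate(inputs):
        if t in boundaries:
            state *= flush                       # gate: drop most of the old context
        state = leak * state + (1 - leak) * x    # leaky temporal integration
        trajectory.append(round(state, 3))
    return trajectory

stream = [1.0] * 6 + [0.0] * 6
print(integrate_with_gating(stream, boundaries={6}))
# The state builds up over the first event, then collapses at the boundary.
```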

    Semantic analysis for improved multi-document summarization of text

    An excess of unstructured data is easily accessible in digital form, and this information overload places a heavy burden on society's capacity to analyze and act on it. Focused (i.e., topic-, query-, question-, or category-driven) multi-document summarization is an information reduction solution whose state of the art now demands exploring further techniques to model human summarization activity. Existing techniques have been mainly extractive, relying on distributional statistics and complex machine learning over corpora in order to perform closely to human summaries. These techniques are still in use, but the field now needs to move toward more abstractive approaches that model the human way of summarizing. A simple, inexpensive, and domain-independent system architecture is created for adding semantic analysis to the summarization process. The proposed system is novel in its use of a new semantic analysis metric to better score sentences for selection into a summary. It also simplifies the semantic processing of sentences to better capture likely semantically related information, reduce redundancy, and reduce complexity. The system is evaluated against participants in the Document Understanding Conference and the later Text Analysis Conference using ROUGE measures of n-gram recall between automated systems, human summaries, and gold-standard baseline summaries. The goal was to show that semantic analysis used for summarization can perform well while remaining simple and inexpensive, without significant loss of recall compared to the foundational baseline system. Current results show improvement over the gold-standard baseline when all factors of this work's semantic analysis technique are used in combination. These factors are the semantic cue word feature and semantic class weighting used to identify sentences carrying important information, together with the semantic triples clustering used to decompose natural language sentences to their most basic meaning and select the most important sentences. In competition against the gold-standard baseline system on the standardized summarization evaluation metric ROUGE, this work outperforms the baseline by more than ten position rankings. This work shows that semantic analysis and lightweight, open-domain techniques have potential. (Ph.D., Information Studies, Drexel University.)
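
    A minimal sketch of cue-word-based sentence scoring of the kind described above: the cue-word list, its weights, and the choice to keep the top-k sentences are assumptions, and the dissertation's semantic class weighting and triples clustering are not reproduced.

```python
# Minimal sketch: score sentences by (assumed) semantic cue words and select
# the top-k for an extractive summary. The cue list, weights, and k are
# illustrative; the dissertation's semantic class weighting and semantic
# triples clustering are not reproduced here.
import re

CUE_WORDS = {"significant": 2.0, "concluded": 2.0, "results": 1.5,
             "therefore": 1.0, "important": 1.0}

def score_sentence(sentence):
    """Sum the cue-word weights that appear in the sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return sum(CUE_WORDS.get(w, 0.0) for w in words)

def summarize(document, k=2):
    """Return the k highest-scoring sentences, in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    ranked = sorted(sentences, key=score_sentence, reverse=True)[:k]
    return " ".join(s for s in sentences if s in ranked)

doc = ("The study examined several corpora. The results were significant. "
       "Some minor issues were noted. Therefore the authors concluded the "
       "method is important.")
print(summarize(doc, k=2))
# -> "The results were significant. Therefore the authors concluded the method is important."
```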

    New resources and ideas for semantic parser induction

    In this thesis, we investigate the general topic of computational natural language understanding (NLU), which has as its goal the development of algorithms and other computational methods that support reasoning about natural language by the computer. Under the classical approach, NLU models work similarly to computer compilers (Aho et al., 1986) and include as a central component a semantic parser that translates natural language input (i.e., the compiler's high-level language) to lower-level formal languages that facilitate program execution and exact reasoning. Given the difficulty of building natural language compilers by hand, recent work has centered on semantic parser induction, i.e., using machine learning to learn semantic parsers and semantic representations from parallel data consisting of example text-meaning pairs (Mooney, 2007a). One inherent difficulty in this data-driven approach is finding the parallel data needed to train the target semantic parsing models, given that such data does not occur naturally "in the wild" (Halevy et al., 2009). Even when data is available, the amount of domain- and language-specific data and the nature of the available annotations might be insufficient for robust machine learning and for capturing the full range of NLU phenomena. Given these underlying resource issues, the semantic parsing field is in constant need of new resources and datasets, as well as novel learning techniques and task evaluations that make models more robust and adaptable to the many applications that require reliable semantic parsing. To address the main resource problem of finding parallel data, we investigate the idea of using source code libraries, or collections of code and text documentation, as a parallel corpus for semantic parser development, and we introduce 45 new datasets in this domain and a new and challenging text-to-code translation task. As a way of addressing the lack of domain- and language-specific parallel data, we then use these and other benchmark datasets to investigate training semantic parsers on multiple datasets, which helps semantic parsers to generalize across different domains and languages and to solve new tasks such as polyglot decoding and zero-shot translation (i.e., translating over and between multiple natural and formal languages and unobserved language pairs). Finally, to address the issue of insufficient annotations, we introduce a new learning framework called learning from entailment that uses entailment information (i.e., high-level inferences about whether the meaning of one sentence follows from another) as a weak learning signal to train semantic parsers to reason about the holes in their analysis and learn improved semantic representations. Taken together, this thesis contributes a wide range of new techniques and technical solutions that help build semantic parsing models with minimal amounts of training supervision and manual engineering effort, hence avoiding the resource issues described at the outset. We also introduce a diverse set of new NLU tasks for evaluating semantic parsing models, which we believe help to extend the scope and real-world applicability of semantic parsing and computational NLU.
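
    Training a single parser over multiple datasets, as described above, is often set up by tagging each example with its source dataset or language pair. The sketch below shows only that data-preparation step, with hypothetical examples and function names; it is not the thesis's actual training pipeline.

```python
# Minimal sketch of preparing multi-dataset training data for a "polyglot"
# semantic parser by prefixing each source with a dataset/language tag
# (hypothetical examples; this is only the data-preparation step, not the
# thesis's actual models or datasets).

def tag_examples(dataset_name, pairs):
    """Prefix each input with a tag identifying its dataset/language pair."""
    return [(f"<{dataset_name}> {text}", code) for text, code in pairs]

python_docs = [("return the length of an object", "len(obj)")]
java_docs = [("returns the length of the string", "String.length()")]

polyglot_corpus = tag_examples("python_en", python_docs) + \
                  tag_examples("java_en", java_docs)

for source, target in polyglot_corpus:
    print(f"{source!r:55} -> {target}")
# A single encoder-decoder trained on this mixed corpus can condition on the
# tag, enabling decoding into any of the formal languages it has seen.
```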