Generalizing input-driven languages: theoretical and practical benefits
Regular languages (RL) are the simplest family in Chomsky's hierarchy. Thanks
to their simplicity they enjoy various nice algebraic and logic properties that
have been successfully exploited in many application fields. Practically all of
their related problems are decidable, so that they support automatic
verification algorithms. Also, they can be recognized in real-time.
Context-free languages (CFL) are another major family well-suited to
formalize programming, natural, and many other classes of languages; their
increased generative power w.r.t. RL, however, causes the loss of several
closure properties and of the decidability of important problems; furthermore
they need complex parsing algorithms. Thus, various subclasses thereof have
been defined with different goals, spanning from efficient, deterministic
parsing to closure properties, logic characterization and automatic
verification techniques.
Among CFL subclasses, so-called structured ones, i.e., those where the
typical tree-structure is visible in the sentences, exhibit many of the
algebraic and logic properties of RL, whereas deterministic CFL have been
thoroughly exploited in compiler construction and other application fields.
After surveying and comparing the main properties of those various language
families, we go back to operator precedence languages (OPL), an old family
through which R. Floyd pioneered deterministic parsing, and we show that they
offer unexpected properties in two fields so far investigated in totally
independent ways: they enable parsing parallelization in a more effective way
than traditional sequential parsers, and exhibit the same algebraic and logic
properties so far obtained only for less expressive language families.
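To make the parsing side concrete, the sketch below shows a Floyd-style operator precedence recognizer for a toy grammar; the grammar, precedence matrix, and driver are illustrative assumptions, not constructions taken from the paper. Every decision depends only on the precedence relation between adjacent terminals, which is the locality that parallel parsing schemes for OPL build on.

```python
# A minimal sketch of Floyd-style operator precedence parsing for the toy grammar
# E -> E+E | E*E | (E) | n.  Precedence matrix and grammar are illustrative only.
PREC = {  # relations between terminals: '<' yields, '=' equal, '>' takes precedence
    ('$', '+'): '<', ('$', '*'): '<', ('$', 'n'): '<', ('$', '('): '<',
    ('+', '+'): '>', ('+', '*'): '<', ('+', 'n'): '<', ('+', '('): '<', ('+', ')'): '>', ('+', '$'): '>',
    ('*', '+'): '>', ('*', '*'): '>', ('*', 'n'): '<', ('*', '('): '<', ('*', ')'): '>', ('*', '$'): '>',
    ('n', '+'): '>', ('n', '*'): '>', ('n', ')'): '>', ('n', '$'): '>',
    ('(', '+'): '<', ('(', '*'): '<', ('(', 'n'): '<', ('(', '('): '<', ('(', ')'): '=',
    (')', '+'): '>', (')', '*'): '>', (')', ')'): '>', (')', '$'): '>',
}
RHS = {('E', '+', 'E'), ('E', '*', 'E'), ('(', 'E', ')'), ('n',)}  # known right-hand sides

def top_terminal(stack):
    return next(s for s in reversed(stack) if s != 'E')

def parse(tokens):
    """Return True iff `tokens` (e.g. ['n', '+', 'n']) is a sentence of the toy grammar."""
    stack, toks, i = ['$'], list(tokens) + ['$'], 0
    while True:
        a, b = top_terminal(stack), toks[i]
        if a == '$' and b == '$':
            return stack == ['$', 'E']
        rel = PREC.get((a, b))
        if rel in ('<', '='):                 # shift
            stack.append(b)
            i += 1
        elif rel == '>':                      # reduce: pop the handle, check it is a known RHS
            handle = []
            while True:
                sym = stack.pop()
                handle.append(sym)
                if sym != 'E' and PREC.get((top_terminal(stack), sym)) == '<':
                    break
            if stack[-1] == 'E':              # a nonterminal just left of the handle belongs to it
                handle.append(stack.pop())
            if tuple(reversed(handle)) not in RHS:
                return False
            stack.append('E')
        else:
            return False                      # no precedence relation defined: syntax error

print(parse(['n', '+', 'n', '*', 'n']))   # True
print(parse(['n', '+', '*', 'n']))        # False
```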
Trustworthy Formal Natural Language Specifications
Interactive proof assistants are computer programs carefully constructed to
check a human-designed proof of a mathematical claim with high confidence in
the implementation. However, this only validates truth of a formal claim, which
may have been mistranslated from a claim made in natural language. This is
especially problematic when using proof assistants to formally verify the
correctness of software with respect to a natural language specification. The
translation from informal to formal remains a challenging, time-consuming
process that is difficult to audit for correctness.
This paper shows that it is possible to build support for specifications
written in expressive subsets of natural language, within existing proof
assistants, consistent with the principles used to establish trust and
auditability in proof assistants themselves. We implement a means to provide
specifications in a modularly extensible formal subset of English, and have
them automatically translated into formal claims, entirely within the Lean
proof assistant. Our approach is extensible (placing no permanent restrictions
on grammatical structure), modular (allowing information about new words to be
distributed alongside libraries), and produces proof certificates explaining
how each word was interpreted and how the sentence's structure was used to
compute the meaning.
We apply our prototype to the translation of various English descriptions of
formal specifications from a popular textbook into Lean formalizations; all can
be translated correctly with a modest lexicon, with only minor modifications related to lexicon size.
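As a purely hypothetical illustration (the sentence, lexicon entries, and output below are invented here, not taken from the paper or its textbook examples), an English specification such as "reversing a list twice yields the original list" might be rendered by such a pipeline as a formal Lean claim along these lines:

```lean
-- Hypothetical translation target; the prototype's actual grammar, lexicon, and
-- generated statements may look quite different.
theorem reverse_twice (l : List Nat) : l.reverse.reverse = l := by
  simp   -- List.reverse_reverse is a simp lemma in Lean's standard library
```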
Stroke order normalization for improving recognition of online handwritten mathematical expressions
We present a technique based on stroke order normalization for improving recognition of online handwritten mathematical expressions (ME). The stroke-order-dependent system has lower time complexity than the stroke-order-free system, but it must incorporate special grammar rules to cope with stroke order variations. The stroke order normalization technique solves this problem, as well as the problem of unexpected stroke order variations, without increasing the time complexity of ME recognition.
In order to normalize stroke order, the X-Y cut method is modified, since its original form causes problems when structural components in an ME overlap. First, vertically ordered strokes are located by detecting vertical symbols and their upper/lower components, which are treated as MEs and reordered recursively. Second, unordered strokes on the left side of the vertical symbols are reordered as horizontally ordered strokes. Third, the remaining strokes are reordered recursively. The horizontally ordered strokes are reordered from left to right, and the vertically ordered strokes are reordered from top to bottom. Finally, the proposed stroke order normalization is combined with the stroke-order-dependent ME recognition system. Evaluations on the CROHME 2014 database show that the ME recognition system incorporating the stroke order normalization outperforms all other systems that use only CROHME 2014 for training, while the processing time is kept low.
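The sketch below is a heavily simplified illustration of this recursive reordering, not the authors' exact procedure: strokes are reduced to bounding boxes, a detector for vertical symbols (such as fraction bars) is assumed to run upstream, and the recursion orders horizontal material left to right and vertical groups top to bottom.

```python
# Simplified sketch of stroke order normalization by a modified X-Y cut.
# Strokes are bounding boxes; `is_vertical_symbol` stands in for the detector
# of fraction bars, sums, etc. mentioned in the abstract.
from dataclasses import dataclass

@dataclass
class Stroke:
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    is_vertical_symbol: bool = False   # e.g. a fraction bar detected upstream

def normalize(strokes):
    """Return strokes in a normalized order: left to right, vertical groups top to bottom."""
    if len(strokes) <= 1:
        return list(strokes)
    bars = [s for s in strokes if s.is_vertical_symbol]
    if not bars:
        return sorted(strokes, key=lambda s: s.xmin)        # plain horizontal order
    bar = min(bars, key=lambda s: s.xmin)                   # leftmost vertical symbol
    others = [s for s in strokes if s is not bar]
    left = [s for s in others if s.xmax < bar.xmin]         # strokes entirely left of the symbol
    above = [s for s in others if s not in left and s.ymax <= bar.ymin]   # upper component
    below = [s for s in others if s not in left and s.ymin >= bar.ymax]   # lower component
    rest = [s for s in others if s not in left and s not in above and s not in below]
    # Left part keeps horizontal order; the vertical group is emitted top to bottom,
    # each component normalized recursively; remaining strokes follow.
    return (sorted(left, key=lambda s: s.xmin)
            + normalize(above) + [bar] + normalize(below)
            + normalize(rest))
```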
Gradual computerisation and verification of mathematics: MathLang's path into Mizar
There are many proof checking tools that allow capturing mathematical knowledge in a formal representation. Those proof systems then allow automatic verification of the logical correctness of the captured knowledge. However, the process of encoding common mathematical documents in a chosen proof system is still labour-intensive and requires comprehensive knowledge of that system. This makes proof checking tools inaccessible to ordinary mathematicians. This thesis provides a solution for the computerisation of mathematical documents via a number of gradual steps using the MathLang framework. We express the full process of formalisation into the Mizar proof checker.
The first levels of this gradual computerisation path had been developed well before this PhD started.
The whole project, called MathLang, dates back to 2000, when F. Kamareddine and J.B. Wells started expressing their ideas for a novel approach to computerising mathematical texts. They mainly aimed at developing a mathematical framework flexible enough to connect existing, in many cases different, approaches to the computerisation of mathematics; to allow various degrees of formalisation (e.g., partial formalisation, full formalisation of chosen parts, or full formalisation of the entire document); and to be compatible with different mathematical foundations (e.g., type theory, set theory, category theory) and proof systems (e.g., Mizar, Isar, Coq, HOL, Vampire). The first two steps of the gradual formalisation were developed by F. Kamareddine, J.B. Wells and M. Maarek, with a small contribution by R. Lamar to the second step. In this thesis we develop the third level of the gradual path, which aims at capturing the rhetorical structure of mathematical documents. We have also integrated further steps of the gradual formalisation, whose final goal is the Mizar system.
We present in this thesis a full path of computerisation and formalisation of mathematical documents into the Mizar proof checker using the MathLang framework. The development of this method was driven by the experience of computerising a number of mathematical documents (covering different authoring styles).
Wittgenstein and the Grammar of Physics: A Study of Ludwig Wittgenstein's 1929-1930 Manuscripts and the Roots of His Later Philosophy
In 1929 Wittgenstein began to work on the first philosophical manuscripts he had kept since completing the Tractatus Logico-Philosophicus (TLP) in 1918. The impetus for this was his conviction that the logic of the TLP was flawed: it was unable to account for the fact that a proposition that assigns a single value on a continuum to a simple object thereby excludes all assignments of different values to the object (the color exclusion problem). Consequently, Wittgenstein's atomic propositions could not be logically independent of one another.
Initially he thought he could replace the logically perfect language of the TLP with a phenomenological language in which experiential propositions about various spaces (e.g., visual space) would form systems. The system described by a phenomenological language would be independent of the one described by ordinary physicalistic language; the physical world was known only by inference from the phenomenological. But he soon realized there was a fundamental error in this conception: phenomenology has to describe the same world as physics or it fails to provide a foundation for it. This suggests that there is only one world, and one language with two different modes of expression. Wittgenstein now comes to his fundamental insight: ordinary language is biased towards the description of physical objects and their relations, but it is our only method of expressing phenomenological or abstract concepts. Failure to recognize this difficulty leads to the misapplication of physicalistic concepts, i.e., to grammatical errors.
This insight had the following impact: (a) the task of philosophy is not to invent logical or phenomenological languages, but to understand the grammar of ordinary language; (b) the notion of a language of pure experience was itself connected with a false, physicalistic idea of the soul as an observer of a private world; (c) the conception of analysis that had guided philosophy since Frege was based on a misused metaphor taken from the grammar of physics: the analysis of an object into its parts or chemical constituents. These ideas reach their ultimate expression in Wittgenstein's Philosophical Investigations. Thus his later work actually begins with these 1929–30 manuscripts.
Mathematical Formula Recognition and Automatic Detection and Translation of Algorithmic Components into Stochastic Petri Nets in Scientific Documents
A great percentage of documents in scientific and engineering disciplines include mathematical formulas and/or algorithms. Exploring the mathematical formulas in technical documents, we focus on the associations among mathematical operations, their syntactical correctness, and the mapping of these components into attributed graphs and Stochastic Petri Nets (SPN). We also introduce a formal language to generate mathematical formulas and evaluate their syntactical correctness. The main contribution of this work is the automatic segmentation of mathematical documents for the parsing and analysis of detected algorithmic components. To achieve this, we present a synergy of methods, such as string parsing according to mathematical rules, formal language modeling, optical analysis of technical documents in the form of images, structural analysis of text in images, and graph and Stochastic Petri Net mapping. Finally, for the recognition of the algorithms, we enrich our rule-based model with machine learning techniques to obtain better results.
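As an illustration of checking the syntactic correctness of a linearised formula, here is a minimal recursive-descent checker over a toy grammar; the grammar and tokenizer are stand-ins for illustration, not the formal language introduced in the paper.

```python
# Minimal sketch: syntactic well-formedness of a linearised formula, checked by
# recursive descent over the toy grammar
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NAME | NUMBER | '(' expr ')'
import re

def tokenize(s):
    return re.findall(r"\d+|[A-Za-z]\w*|\S", s)   # numbers, identifiers, single symbols

def is_wellformed(formula):
    toks, pos = tokenize(formula), 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def eat(t):
        nonlocal pos
        if peek() == t:
            pos += 1
            return True
        return False

    def factor():
        nonlocal pos
        tok = peek()
        if tok == '(':
            pos += 1
            return expr() and eat(')')
        if tok is not None and re.fullmatch(r"\d+|[A-Za-z]\w*", tok):
            pos += 1
            return True
        return False

    def term():
        if not factor():
            return False
        while peek() in ('*', '/'):
            eat(peek())
            if not factor():
                return False
        return True

    def expr():
        if not term():
            return False
        while peek() in ('+', '-'):
            eat(peek())
            if not term():
                return False
        return True

    return expr() and pos == len(toks)

print(is_wellformed("a + b*(c - 2)"))   # True
print(is_wellformed("a + * b"))         # False
```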
Programming language complexity analysis and its impact on Checkmarx activities
Integrated Master's dissertation in Informatics Engineering.
Tools for programming language processing, like static analysers (for instance, a Static
Application Security Testing (SAST) tool, one of Checkmarx’s main products), must be
adapted to cope with a given input when the source programming language changes.
Complexity of the programming language is one of the key factors that deeply impacts the time needed to support it.
This Master's project proposes an approach for assessing language complexity by measuring, in a first stage, the complexity of the language's underlying context-free grammar (CFG).
From the analysis of concrete case studies, factors have been identified that make the support process more time-consuming, in particular in the stages of language recognition and of transformation to an abstract syntax tree (AST). Accordingly, in a second stage, a set of language features is analysed in order to take those factors, which also impact language processing, into account.
The main objective of the work reported here is to help development teams improve the estimation of the time and effort needed to adapt the SAST tool to a new programming language.
In this dissertation a tool is proposed that evaluates the complexity of a language based on a set of metrics classifying the complexity of its grammar, along with a set of language properties. The tool compares the complexity of the new language with that of previously supported languages to predict the effort required to process it.
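As a rough illustration of the first stage, the sketch below computes a few simple size metrics over a context-free grammar; the grammar representation and the metric set are assumptions made here for illustration, not the metrics actually used by the dissertation's tool.

```python
# Minimal sketch of grammar-size metrics over a CFG. A grammar maps each
# nonterminal to a list of alternative right-hand sides (lists of symbols).
def grammar_metrics(grammar):
    nonterminals = set(grammar)
    rhss = [rhs for alts in grammar.values() for rhs in alts]
    terminals = {sym for rhs in rhss for sym in rhs if sym not in nonterminals}
    return {
        "nonterminals": len(nonterminals),
        "terminals": len(terminals),
        "productions": len(rhss),
        "avg_rhs_length": sum(len(r) for r in rhss) / len(rhss),
        "max_alternatives": max(len(alts) for alts in grammar.values()),
    }

# Toy expression grammar, used only to exercise the metrics.
toy = {
    "Expr": [["Expr", "+", "Term"], ["Term"]],
    "Term": [["Term", "*", "Factor"], ["Factor"]],
    "Factor": [["(", "Expr", ")"], ["id"]],
}
print(grammar_metrics(toy))
```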
Do Neural Nets Learn Statistical Laws behind Natural Language?
The performance of deep learning in natural language processing has been
spectacular, but the reasons for this success remain unclear because of the
inherent complexity of deep learning. This paper provides empirical evidence of
its effectiveness and of a limitation of neural networks for language
engineering. Precisely, we demonstrate that a neural language model based on
long short-term memory (LSTM) effectively reproduces Zipf's law and Heaps' law,
two representative statistical properties underlying natural language. We
discuss the quality of reproducibility and the emergence of Zipf's law and
Heaps' law as training progresses. We also point out that the neural language
model has a limitation in reproducing long-range correlation, another
statistical property of natural language. This understanding could provide a
direction for improving the architectures of neural networks.
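For concreteness, the sketch below shows one standard way (assumed here, not necessarily the paper's exact methodology) to estimate the Zipf and Heaps exponents from a token sequence, so that a training corpus can be compared with text sampled from a language model.

```python
# Minimal sketch: Zipf's law (rank-frequency) and Heaps' law (vocabulary growth)
# exponents estimated by least-squares fits in log-log space.
from collections import Counter
import math

def _slope(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

def zipf_exponent(tokens):
    """Slope of log(frequency) vs log(rank); Zipf's law predicts a value near -1."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    return _slope(xs, ys)

def heaps_exponent(tokens):
    """Slope of log(vocabulary size) vs log(text length); typically around 0.5-0.8."""
    seen, xs, ys = set(), [], []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        xs.append(math.log(n))
        ys.append(math.log(len(seen)))
    return _slope(xs, ys)

tokens = "the cat sat on the mat and the dog sat on the log".split()
print(zipf_exponent(tokens), heaps_exponent(tokens))
```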