
    MECA: Mathematical Expression Based Post Publication Content Analysis

    Mathematical expressions (MEs) are critical abstractions in technical publications. While the volume of technical publications grows over time, few ME-centric applications have been developed, owing to the steep gap between the typesetting data in post-publication digital documents and the high-level technical semantics. As the number of technical publications accelerates every year, word-based information analysis technologies are inadequate for helping users discover, organize, and interrelate technical work efficiently and effectively. This dissertation presents a modeling framework and the associated algorithms, called the mathematical-centered post-publication content analysis (MECA) system, to address several critical issues in building a layered solution architecture for the recovery of high-level technical information. Overall, MECA consists of four layers of modeling work, starting from the extraction of MEs from Portable Document Format (PDF) files. Specifically, a weakly-supervised sequential typesetting Bayesian model is developed, using a concise font-value based feature space for Bayesian inference of ME vs. words for the rendering units separated by space. A Markov Random Field (MRF) model is designed to merge and correct the MEs identified from the rendering units, which are otherwise prone to fragmentation of large MEs. At the next layer, MECA aims at the recovery of ME semantics. The first step is the ME layout analysis, which disambiguates layout structures based on a Content-Constrained Spatial (CCS) global inference model to overcome local errors. It achieves high accuracy at low computing cost by using a parametric lognormal model for the feature distribution of typographic systems. The ME layout is parsed into ME semantics with a three-phase processing workflow to overcome a variety of semantic ambiguities. In the first phase, the ME layout is linearized into a token sequence, upon which the abstract syntax tree (AST) is constructed in the second phase using a probabilistic context-free grammar. Tree rewriting transforms the AST into ME objects in the third phase. Built upon the two layers of ME extraction and semantics modeling work, we next explore one of the bonding relationships between words and MEs: ME declarations, where the words and MEs are respectively the qualitative and quantitative (QuQn) descriptors of technical concepts. Conventional low-level PoS tagging and parsing tools perform poorly on this type of mixed word-ME (MWM) sentence. As such, we develop an MWM processing toolkit. A semi-automated weakly-supervised framework is employed to mine declaration templates from a large amount of unlabeled data so that the templates can be used for the detection of ME declarations. On the basis of the three low-level content extraction and prediction solutions, the MECA system can extract MEs, interpret their mathematical semantics, and identify their bonding declaration words. By analyzing the dependency among these elements in a paper, we can construct a QuQn map, which essentially represents the reasoning flow of a paper. Three case studies are conducted for QuQn map applications: differential content comparison of papers, publication trend generation, and interactive mathematical learning. Outcomes from these studies suggest that MECA is a highly practical content analysis technology based on a theoretically sound framework. Much more can be expanded and improved upon for the next generation of deep content analysis solutions.
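    The three-phase workflow described above (linearize to tokens, build an AST, rewrite the tree) can be illustrated with a small sketch. This is not the MECA implementation: a deterministic precedence-climbing parser stands in for the probabilistic context-free grammar, and the token pattern, the operator table `BINARY`, and the node labels `Power` and `Indexed` are all assumptions made for illustration.

```python
import re

# Hypothetical token pattern for a linearized ME: names, numbers, operators.
TOKEN = re.compile(r"\s*([A-Za-z]+|\d+(?:\.\d+)?|[-+*/^_()=])")

# Precedence and associativity per binary operator (an assumption of this sketch).
BINARY = {"=": (1, "right"), "+": (2, "left"), "-": (2, "left"),
          "*": (3, "left"), "/": (3, "left"),
          "^": (4, "right"), "_": (4, "right")}


def tokenize(text):
    """Phase 1 (illustrative): split a linearized expression into tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """Phase 2 (illustrative): build a nested-tuple AST by precedence climbing.

    A deterministic parser is used here in place of a probabilistic grammar.
    """
    pos = 0

    def atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = expr(1)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return node
        if tok in BINARY or tok == ")":
            raise ValueError(f"unexpected token {tok!r}")
        return tok

    def expr(min_prec):
        nonlocal pos
        left = atom()
        while pos < len(tokens) and tokens[pos] in BINARY:
            op = tokens[pos]
            prec, assoc = BINARY[op]
            if prec < min_prec:
                break
            pos += 1
            # Left-associative operators require a strictly tighter right side.
            right = expr(prec + 1 if assoc == "left" else prec)
            left = (op, left, right)
        return left

    tree = expr(1)
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree


def rewrite(node):
    """Phase 3 (illustrative): relabel layout operators as semantic nodes."""
    if isinstance(node, str):
        return node
    op, left, right = node
    label = {"^": "Power", "_": "Indexed"}.get(op, op)
    return (label, rewrite(left), rewrite(right))
```

    For example, `rewrite(parse(tokenize("E = m * c ^ 2")))` yields a tree whose root is the equality, with the superscript relabeled as a `Power` node. A probabilistic grammar would instead score competing parses, which matters when the same layout admits several readings.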

    Conceptual roles of data in program: analyses and applications

    Program comprehension is a prerequisite for many software evolution and maintenance tasks. Current research falls short in addressing how to build tools that use domain-specific knowledge to extract information that facilitates program comprehension. Such capabilities are critical for working with large and complex programs, where comprehension is often not possible without the help of domain-specific knowledge. Our research advances the state of the art in program analysis techniques based on domain-specific knowledge. Program artifacts, including variables and methods, are carriers of domain concepts that provide the key to understanding programs. Our program analysis is directed by domain knowledge stored as domain-specific rules. Our analysis is iterative and interactive. It is based on flexible inference rules and interchangeable, extensible information storage. We designed and developed a comprehensive software environment, SeeCORE, based on our knowledge-centric analysis methodology. The SeeCORE tool provides multiple views and abstractions to assist in understanding complex programs. The case studies demonstrate the effectiveness of our method. We demonstrate the flexibility of our approach by analyzing two legacy programs in distinct domains.

    Languages and Tools for Optimization of Large-Scale Systems

    Modeling and simulation are established techniques for solving design problems in a wide range of engineering disciplines today. Dedicated computer languages, such as Modelica, and efficient software tools are available. In this thesis, an extension of Modelica, Optimica, targeted at dynamic optimization of Modelica models is proposed. In order to demonstrate the Optimica extension, supporting software has been developed. This includes a modularly extensible Modelica compiler, the JModelica compiler, and an extension that also supports Optimica. A Modelica library for paper machine dryer section modeling, DryLib, has been developed. The classes in the library enable structured and hierarchical modeling of dryer sections at the application user level, while offering extensibility for the expert user. Based on DryLib, a parameter optimization problem, a model reduction problem, and an optimization-based control problem have been formulated and solved. A start-up optimization problem for a plate reactor has been formulated in Optimica, and solved by means of the Optimica compiler. In addition, the robustness properties of the start-up trajectories have been evaluated by means of Monte Carlo simulation. In many control systems, it is necessary to consider interaction with a user. In this thesis, a manual control scheme for an unstable inverted pendulum system, where the inputs are bounded, is presented. The proposed controller is based on the notion of reachability sets and guarantees semi-global stability for all references. An inverted pendulum on a two-wheeled robot has been developed. A distributed control system, including sensor processing algorithms and a stabilizing control scheme, has been implemented on three on-board embedded processors.

    Simple algorithm for judging equivalence of differential-algebraic equation systems

    Mathematical formulas play a prominent role in science, technology, engineering, and mathematics (STEM) documents; understanding STEM documents usually requires knowing the difference between equation groups containing multiple equations. When two equation groups can be transformed into the same form, we call the equation groups equivalent. Existing tools cannot judge the equivalence of two equation groups; thus, we develop an algorithm to judge such an equivalence using a computer algebra system. The proposed algorithm first eliminates variables that appear in only one of the two equation groups. It then checks the equivalence of the equations one by one: equations with identical algebraic solutions for the same variable are judged equivalent. If each equation in one equation group is equivalent to an equation in the other, the equation groups are judged equivalent; otherwise, they are judged non-equivalent. We generated 50 pairs of equation groups for evaluation. The proposed method accurately judged the equivalence of all pairs. This method is expected to facilitate comprehension of a large amount of mathematical information in STEM documents. Furthermore, this is a necessary step for machines to understand equations, including process models.
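    The pairwise check described in this abstract can be sketched for the special case of linear equations. This is only an illustration under stated assumptions, not the authors' algorithm: exact rational arithmetic replaces the computer algebra system, the variable-elimination step is omitted, and the dictionary representation with the constant stored under the key `"1"` is a convention invented for this sketch.

```python
from fractions import Fraction


def solve_for(equation, variable):
    """Solve a linear equation sum(coef * var) + const = 0 for one variable.

    An equation is a dict mapping variable names to coefficients, with the
    constant term under the key "1" (a convention of this sketch). Returns
    the solution in the same form, or None if the variable does not occur.
    """
    coef = Fraction(equation.get(variable, 0))
    if coef == 0:
        return None
    return {name: -Fraction(value) / coef
            for name, value in equation.items()
            if name != variable and value != 0}


def equations_equivalent(eq_a, eq_b):
    """Judge two equations equivalent if solving both for a shared variable
    gives the same expression."""
    shared = sorted((set(eq_a) & set(eq_b)) - {"1"})
    for variable in shared:
        sol_a = solve_for(eq_a, variable)
        sol_b = solve_for(eq_b, variable)
        if sol_a is not None and sol_b is not None:
            return sol_a == sol_b
    return False


def groups_equivalent(group_a, group_b):
    """Mirror the abstract's rule: every equation in one group must have an
    equivalent equation in the other group."""
    return all(any(equations_equivalent(a, b) for b in group_b)
               for a in group_a)


# 2x - 4y + 6 = 0 and x - 2y + 3 = 0 describe the same line.
group_a = [{"x": 2, "y": -4, "1": 6}]
group_b = [{"x": 1, "y": -2, "1": 3}]
group_c = [{"x": 1, "y": -2, "1": 4}]
```

    With these inputs, `groups_equivalent(group_a, group_b)` holds while `groups_equivalent(group_a, group_c)` does not. Nonlinear equations would need a real computer algebra system to solve and compare symbolic solutions.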

    Mathematical Formula Recognition and Automatic Detection and Translation of Algorithmic Components into Stochastic Petri Nets in Scientific Documents

    A large percentage of documents in scientific and engineering disciplines include mathematical formulas and/or algorithms. Exploring the mathematical formulas in technical documents, we focus on the associations among mathematical operations, their syntactic correctness, and the mapping of these components to attributed graphs and Stochastic Petri Nets (SPN). We also introduce a formal language to generate mathematical formulas and evaluate their syntactic correctness. The main contribution of this work is the automatic segmentation of mathematical documents for the parsing and analysis of detected algorithmic components. To achieve this, we combine several methods: string parsing according to mathematical rules, formal language modeling, optical analysis of technical documents in the form of images, structural analysis of text in images, and graph and Stochastic Petri Net mapping. Finally, for the recognition of algorithms, we enrich our rule-based model with machine learning techniques to obtain better results.

    gbeta - a Language with Virtual Attributes, Block Structure, and Propagating, Dynamic Inheritance

    A language design development process is presented which leads to a language, gbeta, with a tight integration of virtual classes, general block structure, and a multiple inheritance mechanism based on coarse-grained structural type equivalence. From this emerges the concept of propagating specialization. The power lies in the fact that a simple expression can have far-reaching but well-organized consequences, e.g., in one step causing the combination of families of classes, then by propagation the members of those families, and finally by propagation the methods of the members. Moreover, classes are first-class values which can be constructed at run-time, and it is possible to inherit from classes whether or not they are compile-time constants, and whether or not they were created dynamically. It is also possible to change the class and structure of an existing object at run-time, preserving object identity. Even though such dynamism is normally not seen in statically type-checked languages, these constructs have been integrated without compromising the static type safety of the language.