
    Program Analysis Scenarios in Rascal

    Rascal is a meta-programming language focused on the implementation of domain-specific languages and on the rapid construction of tools for software analysis and software transformation. In this paper we focus on the use of Rascal for software analysis. We illustrate a range of scenarios for building new software analysis tools through a number of examples, including one showing integration with an existing Maude-based analysis. We then focus on ongoing work on alias analysis and type inference for PHP, showing how Rascal is being used, and sketching a hypothetical solution in Maude. We conclude with a high-level discussion of the commonalities and differences between Rascal and Maude when applied to program analysis.
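
    The paper's own examples are written in Rascal; as a rough, hedged sketch of the relational style of analysis it describes (extract facts from a program's syntax tree, then query the resulting relation), here is a minimal Python analogue. The toy input program and the call-graph query are invented for illustration; Rascal provides this workflow, including transitive closure, as built-in language features.

        import ast, textwrap

        # Toy input program: a calls b, b calls c.
        source = textwrap.dedent("""
            def a(): b()
            def b(): c()
            def c(): pass
        """)

        # Extract a (caller, callee) relation from the AST.
        calls = set()
        for fn in ast.walk(ast.parse(source)):
            if isinstance(fn, ast.FunctionDef):
                for node in ast.walk(fn):
                    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                        calls.add((fn.name, node.func.id))

        def transitive_closure(rel):
            # Rascal offers this as a built-in postfix operator (r+); spelled out here.
            closure = set(rel)
            while True:
                extra = {(a, c) for (a, b) in closure for (b2, c) in closure if b == b2}
                if extra <= closure:
                    return closure
                closure |= extra

        print(sorted(transitive_closure(calls)))
        # [('a', 'b'), ('a', 'c'), ('b', 'c')]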

    Rascal: From Algebraic Specification to Meta-Programming

    Algebraic specification has a long tradition of bridging the gap between specification and programming by making specifications executable. Building on extensive experience in designing, implementing, and using specification formalisms based on algebraic specification and term rewriting (namely ASF and ASF+SDF), we are now focusing on taking the best concepts from algebraic specification and integrating them into a new programming language: Rascal. This language is easy to learn by non-experts but also scales to very large meta-programming applications. We explain the algebraic roots of Rascal and its main application areas: software analysis, software transformation, and the design and implementation of domain-specific languages. Some example applications in the domain of Model-Driven Engineering (MDE) are described to illustrate this.
    Comment: In Proceedings AMMSE 2011, arXiv:1106.596

    A heuristic-based approach to code-smell detection

    Encapsulation and data hiding are central tenets of the object-oriented paradigm. Deciding what data and behaviour to form into a class, and where to draw the line between its public and private details, can make the difference between a class that is an understandable, flexible and reusable abstraction and one which is not. This decision is a difficult one and may easily result in poor encapsulation, which can then have serious implications for a number of system qualities. It is often hard to identify such encapsulation problems within large software systems until they cause a maintenance problem (which is usually too late), and attempting to perform such analysis manually can be tedious and error prone. Two common encapsulation problems that can arise as a consequence of this decomposition process are data classes and god classes. Typically, these two problems occur together – data classes are lacking in functionality that has typically been sucked into an over-complicated and domineering god class. This paper describes the architecture of a tool, developed as a plug-in for the Eclipse IDE, which automatically detects data and god classes. The technique has been evaluated in a controlled study on two large open source systems which compares the tool's results to similar work by Marinescu, who employs a metrics-based approach to detecting such features. The study provides some valuable insights into the strengths and weaknesses of the two approaches.
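
    As a hedged sketch of the metrics-based side of this comparison: Marinescu-style detection strategies combine class-level metrics with thresholds. The metric names below (WMC, ATFD, TCC) are standard in that literature, but the threshold values, class names, and sample numbers are illustrative assumptions, not the tool's actual rules.

        from dataclasses import dataclass

        @dataclass
        class ClassMetrics:
            name: str
            wmc: int    # Weighted Methods per Class (summed method complexity)
            atfd: int   # Access To Foreign Data (fields of other classes used)
            tcc: float  # Tight Class Cohesion, in [0, 1]

        def is_god_class(m: ClassMetrics,
                         few: int = 3, very_high: int = 47,
                         one_third: float = 1 / 3) -> bool:
            # A god class centralises behaviour: it is complex, pulls in much
            # foreign data, and is internally incohesive. Thresholds are
            # illustrative stand-ins for the published detection strategy.
            return m.atfd > few and m.wmc >= very_high and m.tcc < one_third

        suspects = [ClassMetrics("OrderManager", wmc=60, atfd=12, tcc=0.1),
                    ClassMetrics("Point", wmc=4, atfd=0, tcc=0.9)]
        print([m.name for m in suspects if is_god_class(m)])  # ['OrderManager']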

    A Language for Specifying Compiler Optimizations for Generic Software

    Compiler optimization is important to software performance, and modern processor architectures make optimization even more critical. However, many modern software applications use libraries providing high levels of abstraction. Such libraries often hinder effective optimization—the libraries are difficult to analyze using current compiler technology. For example, high-level libraries often use dynamic memory allocation and indirectly expressed control structures, such as iterator-based loops. Programs using these libraries often cannot achieve an optimal level of performance. On the other hand, software libraries have also been recognized as potentially aiding in program optimization. One proposed implementation of library-based optimization is to allow the library author, or a library user, to define custom analyses and optimizations. However, only limited systems have been created to take advantage of this potential. One problem in creating a framework for defining new optimizations and analyses is how users are to specify them: implementing them by hand inside a compiler is difficult and prone to errors. Thus, a domain-specific language for library-based compiler optimizations would be beneficial. Many optimization specification languages have appeared in the literature, but they tend to be either limited in power or unnecessarily difficult to use. Therefore, I have designed, implemented, and evaluated the Pavilion language for specifying program analyses and optimizations, designed for library authors and users. These analyses and optimizations can be based on the implementation of a particular library, its use in a specific program, or on the properties of a broad range of types, expressed through concepts. The new system is intended to provide a high level of expressiveness, even though the intended users are unlikely to be compiler experts.
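
    Pavilion's concrete syntax is not shown in the abstract; purely as a hedged illustration of the underlying idea (a user-supplied rewrite rule keyed to a library idiom), here is a Python sketch that replaces the iterator-flavoured emptiness test len(xs) == 0 with the cheaper, idiomatic not xs. The rule and the sample program are invented for illustration.

        import ast

        class EmptinessRule(ast.NodeTransformer):
            # A library author might ship a rule like this: if the program
            # tests len(xs) == 0, rewrite it to not xs.
            def visit_Compare(self, node: ast.Compare) -> ast.AST:
                self.generic_visit(node)
                if (isinstance(node.left, ast.Call)
                        and isinstance(node.left.func, ast.Name)
                        and node.left.func.id == "len"
                        and len(node.ops) == 1 and isinstance(node.ops[0], ast.Eq)
                        and isinstance(node.comparators[0], ast.Constant)
                        and node.comparators[0].value == 0):
                    return ast.UnaryOp(op=ast.Not(), operand=node.left.args[0])
                return node

        tree = ast.parse("if len(items) == 0:\n    print('empty')")
        optimized = ast.fix_missing_locations(EmptinessRule().visit(tree))
        print(ast.unparse(optimized))  # the test becomes: if not items: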

    Linguistically-Informed Neural Architectures for Lexical, Syntactic and Semantic Tasks in Sanskrit

    The primary focus of this thesis is to make Sanskrit manuscripts more accessible to end-users through natural language technologies. The morphological richness, compounding, free word order, and low-resource nature of Sanskrit pose significant challenges for developing deep learning solutions. We identify four fundamental tasks which are crucial for developing robust NLP technology for Sanskrit: word segmentation, dependency parsing, compound type identification, and poetry analysis. The first task, Sanskrit Word Segmentation (SWS), is a fundamental text processing task for any other downstream application. However, it is challenging due to the sandhi phenomenon that modifies characters at word boundaries. Similarly, the existing dependency parsing approaches struggle with morphologically rich and low-resource languages like Sanskrit. Compound type identification is also challenging for Sanskrit due to the context-sensitive semantic relation between components. All these challenges result in sub-optimal performance in NLP applications like question answering and machine translation. Finally, Sanskrit poetry has not been extensively studied in computational linguistics. While addressing these challenges, this thesis makes various contributions: (1) The thesis proposes linguistically-informed neural architectures for these tasks. (2) We showcase the interpretability and multilingual extension of the proposed systems. (3) Our proposed systems report state-of-the-art performance. (4) Finally, we present a neural toolkit named SanskritShala, a web-based application that provides real-time analysis of input for various NLP tasks. Overall, this thesis contributes to making Sanskrit manuscripts more accessible by developing robust NLP technology and releasing various resources, datasets, and a web-based toolkit.
    Comment: Ph.D. dissertation
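
    As a toy, hedged illustration of why sandhi makes segmentation hard (the lexicon and the single euphonic rule below are invented stand-ins; the thesis attacks the real problem with linguistically-informed neural models, not dictionary search): at each candidate split point, the surface string may first need to be rewritten before lexicon lookup, because sandhi has fused the words together.

        # Toy lexicon and one invented sandhi rule: a surface "e" at a word
        # junction may arise from an underlying "a" + "i".
        LEXICON = {"rama", "iva", "ca", "eva"}
        UNDO_SANDHI = {"e": [("a", "i")]}

        def segment(s, path=()):
            if not s:
                yield path
                return
            for i in range(1, len(s) + 1):
                head, tail = s[:i], s[i:]
                if head in LEXICON:                  # plain split, no sandhi
                    yield from segment(tail, path + (head,))
                glue = s[i - 1]                      # try undoing sandhi here
                for left, right in UNDO_SANDHI.get(glue, []):
                    if head[:-1] + left in LEXICON:
                        yield from segment(right + tail, path + (head[:-1] + left,))

        print(list(segment("rameva")))  # [('rama', 'iva')] via e -> a + i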

    Preventing atomicity violations with contracts

    Concurrent programming is a difficult and error-prone task because the programmer must reason about multiple threads of execution and their possible interleavings. A concurrent program must synchronize the concurrent accesses to shared memory regions, but this is not enough to prevent all anomalies that can arise in a concurrent setting. The programmer can misidentify the scope of the regions of code that need to be atomic, resulting in atomicity violations and failing to ensure the correct behavior of the program. Executing a sequence of atomic operations may lead to incorrect results when these operations are co-related. In this case, the programmer may be required to enforce the sequential execution of those operations as a whole to avoid atomicity violations. This situation is especially common when the developer makes use of services from third-party packages or modules. This thesis proposes a methodology, based on the design-by-contract methodology, to specify which sequences of operations must be executed atomically. We developed an analysis that statically verifies that a client of a module respects its contract, allowing the programmer to identify the source of possible atomicity violations.
    Fundação para a Ciência e Tecnologia - research project Synergy-VM (PTDC/EIA-EIA/113613/2009)
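
    As a hedged sketch of the contract idea (the contract notation, trace format, and checker below are invented for illustration; the thesis verifies contracts statically, whereas this toy checks a recorded trace): a module names call sequences that must execute atomically, and a checker flags executions where another thread's operation interleaves such a sequence.

        # An invented contract: a check-then-act pair on a shared set must
        # run atomically, because the check's result is stale otherwise.
        CONTRACT = [("contains", "add")]

        # A client trace of (thread_id, operation). Thread 2 slips in between
        # thread 1's contains/add pair, which the contract forbids.
        trace = [(1, "contains"), (2, "add"), (1, "add")]

        def violations(trace, contracts):
            found = []
            for seq in contracts:
                progress = {}                  # per-thread position within seq
                for tid, op in trace:
                    # Crude rule for the sketch: any operation by another
                    # thread while a sequence is in progress is a conflict.
                    for other, pos in list(progress.items()):
                        if other != tid and pos > 0:
                            found.append((other, seq))
                            progress[other] = 0
                    pos = progress.get(tid, 0)
                    progress[tid] = (pos + 1 if op == seq[pos]
                                     else (1 if op == seq[0] else 0))
            return found

        print(violations(trace, CONTRACT))  # [(1, ('contains', 'add'))]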

    Proceedings of Monterey Workshop 2001 Engineering Automation for Software Intensive System Integration

    The 2001 Monterey Workshop on Engineering Automation for Software Intensive System Integration was sponsored by the Office of Naval Research, the Air Force Office of Scientific Research, the Army Research Office, and the Defense Advanced Research Projects Agency. It is our pleasure to thank the workshop advisors and sponsors for their vision of a principled engineering solution for software and for their tireless effort over many years in supporting a series of workshops to bring everyone together. This workshop is the 8th in a series of international workshops. The workshop was held at the Monterey Beach Hotel, Monterey, California, during June 18-22, 2001. The general theme of the workshop has been to present and discuss research that aims at increasing the practical impact of formal methods for software and systems engineering. The particular focus of this workshop was "Engineering Automation for Software Intensive System Integration". Previous workshops have focused on topics including "Real-time & Concurrent Systems", "Software Merging and Slicing", "Software Evolution", "Software Architecture", "Requirements Targeting Software", and "Modeling Software System Structures in a fastly moving scenario".
    Office of Naval Research
    Air Force Office of Scientific Research
    Army Research Office
    Defense Advanced Research Projects Agency
    Approved for public release; distribution unlimited.

    DigiMOF: A Database of Metal–Organic Framework Synthesis Information Generated via Text Mining

    The vastness of materials space, particularly that which is concerned with metal–organic frameworks (MOFs), creates the critical problem of performing efficient identification of promising materials for specific applications. Although high-throughput computational approaches, including the use of machine learning, have been useful in rapid screening and rational design of MOFs, they tend to neglect descriptors related to their synthesis. One way to improve the efficiency of MOF discovery is to data-mine published MOF papers to extract the materials informatics knowledge contained within journal articles. Here, by adapting the chemistry-aware natural language processing tool ChemDataExtractor (CDE), we generated an open-source database of MOFs focused on their synthetic properties: the DigiMOF database. Using the CDE web scraping package alongside the Cambridge Structural Database (CSD) MOF subset, we automatically downloaded 43,281 unique MOF journal articles, extracted 15,501 unique MOF materials, and text-mined over 52,680 associated properties including the synthesis method, solvent, organic linker, metal precursor, and topology. Additionally, we developed an alternative data extraction technique to obtain and transform the chemical names assigned to each CSD entry in order to determine linker types for each structure in the CSD MOF subset. This data enabled us to match MOFs to a list of known linkers provided by Tokyo Chemical Industry UK Ltd. (TCI) and analyze the cost of these important chemicals. This centralized, structured database reveals the MOF synthetic data embedded within thousands of MOF publications and contains further topology, metal type, accessible surface area, largest cavity diameter, pore limiting diameter, open metal sites, and density calculations for all 3D MOFs in the CSD MOF subset. The DigiMOF database and associated software are publicly available for other researchers to rapidly search for MOFs with specific properties, conduct further analysis of alternative MOF production pathways, and create additional parsers to search for additional desirable properties.
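
    As a rough, hedged sketch of the property-extraction step (the sentence and regex patterns below are invented for illustration; the actual DigiMOF pipeline uses ChemDataExtractor's chemistry-aware parsers, not ad hoc regular expressions): pull a solvent, temperature, and metal precursor out of a synthesis sentence into a structured record.

        import re

        sentence = ("ZIF-8 was synthesised solvothermally in DMF at 140 C "
                    "for 24 h using zinc nitrate as the metal precursor.")

        # Invented surface patterns standing in for chemistry-aware parsers.
        PATTERNS = {
            "solvent":         r"\bin\s+([A-Z][A-Za-z0-9-]*)\s+at\b",
            "temperature_C":   r"\bat\s+(\d+)\s*C\b",
            "metal_precursor": r"\busing\s+([a-z]+\s+[a-z]+)\s+as the metal precursor",
        }

        record = {"material": sentence.split()[0]}
        for field, pattern in PATTERNS.items():
            match = re.search(pattern, sentence)
            if match:
                record[field] = match.group(1)

        print(record)
        # {'material': 'ZIF-8', 'solvent': 'DMF', 'temperature_C': '140',
        #  'metal_precursor': 'zinc nitrate'}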

    H-CFA: a Simplified Approach for Pushdown Control Flow Analysis

    In control flow analysis (CFA), call/return mismatch is a problem that reduces analysis precision. So-called k-CFA uses bounded call-strings to obtain limited call/return matching, but it has a serious performance problem due to its coupling of call/return matching with the context-sensitivity of values. CFA2 and PDCFA are the first two algorithms that brought the pushdown (context-free reachability) approach to the CFA area, providing exact call/return matching. However, CFA2 and PDCFA both need significant engineering effort to implement. The abstracting abstract machine (AAM), a configurable framework for constructing abstract interpreters, introduces store-allocated continuations that make the soundness of abstract interpreters easy to obtain. Recently, two related approaches (AAC and P4F) provide call/return matching using AAM by modeling the call stack as a pushdown system. However, AAC incurs high overhead and is hard to understand, while P4F cannot compute monovariant analyses. To overcome these shortcomings, we developed a new method, h-CFA, to address the call/return mismatch problem. h-CFA records the program execution history during abstract interpretation and uses it to avoid the control flow merging that causes call/return mismatch. Our method uses AAM and is very easy to implement for ANF-style programs. ANF is a popular intermediate representation that converts all complex intra-procedural control flow to linear let-bindings and assigns a syntactic variable to each sub-expression. In addition, our method reveals an essential property of any pushdown CFA, which we exploited in the development of a static analyzer for JavaScript, named JsCFA. This application of the essential property avoids recording the program execution history, so source programs are no longer required to be in ANF. Meanwhile, JsCFA adopts a technique to solve the environment problem, or fake rebinding, which eliminates further defects of monovariant analysis. This, in cooperation with exact call/return matching, yields more precise analysis and better performance. Moreover, JsCFA supports a configurable interface for adding context-sensitivity to selected areas of a program. JsCFA applies the interface to improve analysis precision for runtime object extensions. Finally, we quantitatively evaluated the performance of JsCFA.
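
    As a hedged toy (not h-CFA itself, and far simpler than any real CFA) showing the call/return mismatch that pushdown analyses eliminate: a monovariant analysis keeps one abstract return value per function, so every call site of an identity function appears to receive the union of all arguments ever passed to it, while exact call/return matching routes each return only to its own caller. Both "analyses" below are deliberately degenerate stand-ins.

        def analyse_monovariant(call_sites):
            # One merged abstract return value for the whole function:
            # the join of every argument seen at any call site.
            ret = {arg for _, arg in call_sites}
            return {site: ret for site, _ in call_sites}

        def analyse_pushdown(call_sites):
            # Exact call/return matching: each return flows only to its caller.
            return {site: {arg} for site, arg in call_sites}

        calls = [("site1", "int"), ("site2", "str")]   # e.g. (id 1) and (id "a")
        print(analyse_monovariant(calls))  # both sites see {'int', 'str'} -- spurious
        print(analyse_pushdown(calls))     # site1: {'int'}, site2: {'str'}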