1,072 research outputs found

    A web assessment approach based on summarisation and visualisation

    The number of Web sites has increased noticeably, to roughly 224 million, over the last ten years, reflecting a rapid growth of information on the Internet. Although search engines help users filter for the information they want, results are normally presented as a very long list, and users have to visit each Web page to judge its relevance. As a result, a considerable amount of time has to be spent finding the required information. To address this issue, this paper proposes a Web assessment approach that provides an overview of the information on a Web site by integrating existing summarisation and visualisation techniques, namely text summarisation, tag clouds, Document Type View, and interactive features. This approach can reduce the time required to identify and search for information on the Web.
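
    As a rough illustration of the kind of building blocks such an approach combines (not the authors' implementation), the sketch below derives tag-cloud weights and a frequency-based extractive summary from a page's text; the stopword list and scoring are simplified assumptions.

```python
# Minimal sketch (not the authors' implementation) of two building blocks such
# an approach combines: tag-cloud weights and a frequency-based extractive
# summary. The stopword list and scoring are simplified assumptions.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "that"}

def tokenize(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def tag_cloud_weights(text, top_n=20):
    """Top_n words and their frequencies, usable as font-size weights in a tag cloud."""
    return Counter(tokenize(text)).most_common(top_n)

def summarise(text, max_sentences=3):
    """Score each sentence by the frequency of its words and keep the best ones."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(tokenize(text))
    ranked = sorted(sentences, key=lambda s: sum(freq[w] for w in tokenize(s)), reverse=True)
    kept = set(ranked[:max_sentences])
    return " ".join(s for s in sentences if s in kept)  # keep original sentence order

if __name__ == "__main__":
    page = ("Search engines return long result lists. Users must open each page. "
            "Summaries and tag clouds give an overview of a site before visiting it.")
    print(tag_cloud_weights(page, top_n=5))
    print(summarise(page, max_sentences=1))
```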

    Automated subject classification of textual web documents

    Prediction based task scheduling in distributed computing

    XML Matchers: approaches and challenges

    Schema Matching, i.e. the process of discovering semantic correspondences between concepts adopted in different data source schemas, has been a key topic in the Database and Artificial Intelligence research areas for many years. In the past, it was largely investigated for classical database models (e.g., E/R schemas, relational databases, etc.). In recent years, however, the widespread adoption of XML in the most disparate application fields has pushed a growing number of researchers to design XML-specific Schema Matching approaches, called XML Matchers, aimed at finding semantic matches between concepts defined in DTDs and XSDs. XML Matchers do not simply take well-known techniques originally designed for other data models and apply them to DTDs/XSDs; they exploit specific XML features (e.g., the hierarchical structure of a DTD/XSD) to improve the performance of the Schema Matching process. The design of XML Matchers is currently a well-established research area. The main goal of this paper is to provide a detailed description and classification of XML Matchers. We first describe to what extent the specificities of DTDs/XSDs affect the Schema Matching task. Then we introduce a template, called the XML Matcher Template, that describes the main components of an XML Matcher, their roles and behavior. We illustrate how each of these components has been implemented in some popular XML Matchers. We consider our XML Matcher Template as the baseline for objectively comparing approaches that, at first glance, might appear unrelated. The introduction of this template can be useful in the design of future XML Matchers. Finally, we analyze commercial tools implementing XML Matchers and introduce two challenging issues strictly related to this topic, namely XML source clustering and uncertainty management in XML Matchers. Comment: 34 pages, 8 tables, 7 figures.
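
    As a minimal sketch of the kind of hybrid matcher the survey classifies, the code below combines linguistic similarity of element names with overlap of their ancestor paths, i.e. the hierarchical structure that XML Matchers exploit; the weights, threshold and greedy matching strategy are illustrative assumptions, not a specific matcher from the paper.

```python
# Minimal sketch of a hybrid XML matcher: element names are compared
# linguistically while ancestor paths capture the hierarchical structure that
# DTD/XSD-aware matchers exploit. The weights, threshold and greedy matching
# strategy are illustrative assumptions, not a specific matcher from the paper.
from difflib import SequenceMatcher
import xml.etree.ElementTree as ET

def leaf_paths(xml_text):
    """Return 'a/b/c'-style paths for every leaf element of a schema-like tree."""
    root = ET.fromstring(xml_text)
    paths = []
    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        children = list(node)
        if not children:
            paths.append(path)
        for child in children:
            walk(child, path)
    walk(root, "")
    return paths

def name_sim(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def path_sim(p, q, alpha=0.7):
    """Blend leaf-name similarity with overlap of the ancestor paths."""
    pa, qa = p.split("/"), q.split("/")
    shared = len(set(pa[:-1]) & set(qa[:-1])) / max(len(pa) - 1, len(qa) - 1, 1)
    return alpha * name_sim(pa[-1], qa[-1]) + (1 - alpha) * shared

def match(schema_a, schema_b, threshold=0.3):
    """Greedy matching: best correspondence in schema_b for each leaf of schema_a."""
    targets = leaf_paths(schema_b)
    pairs = []
    for p in leaf_paths(schema_a):
        best = max(targets, key=lambda q: path_sim(p, q))
        score = path_sim(p, best)
        if score >= threshold:
            pairs.append((p, best, round(score, 2)))
    return pairs

if __name__ == "__main__":
    a = "<order><customer><name/><address/></customer></order>"
    b = "<purchase><client><fullName/><addr/></client></purchase>"
    print(match(a, b))
```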

    Tracking sub-page components in document workflows

    Documents go through numerous transformations and intermediate formats as they are processed from abstract markup into final printable form. This notion of a document workflow is well established, but it is common to find that ideas about document components, which might exist in the source code for the document, become completely lost within an amorphous, unstructured page of PDF before it is rendered. Given the importance of a component-based approach in Variable Data Printing (VDP), we have developed a collection of tools that allow information about the various transformations to be embedded at each stage in the workflow, together with a visualization tool that uses this embedded information to display the relationships between the various intermediate documents. In this paper, we demonstrate these tools in the context of an example document workflow, but the techniques described are widely applicable and would be easily adaptable to other workflows and for use in teaching tools that illustrate document component and VDP concepts.
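
    A minimal sketch of the underlying idea, not the authors' tools: each workflow stage stamps the component it transforms with a provenance record that a visualisation tool could later read; the stage labels and transformations are placeholders.

```python
# Minimal sketch of the general idea: each transformation stage stamps the
# component it processes with a provenance record, so the component's history
# can later be visualised. Generic illustration, not the tools in the paper.
from dataclasses import dataclass, field

@dataclass
class Component:
    name: str
    content: str
    history: list = field(default_factory=list)  # one record per workflow stage

def stage(label):
    """Wrap a transformation so it records itself in the component's history."""
    def decorate(fn):
        def run(component):
            component.content = fn(component.content)
            component.history.append(label)
            return component
        return run
    return decorate

@stage("markup -> intermediate format")
def to_intermediate(text):
    return text.upper()          # stand-in for a real transformation

@stage("intermediate format -> PDF page")
def to_pdf(text):
    return f"[PDF]{text}"        # stand-in for rendering

if __name__ == "__main__":
    c = Component("address-block", "name and address")
    to_pdf(to_intermediate(c))
    print(c.history)  # ['markup -> intermediate format', 'intermediate format -> PDF page']
```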

    Content vs metrics: Using language modeling to evaluate in-line source code comments for Python

    Undergraduate thesis submitted to the Department of Computer Science, Ashesi University, in partial fulfillment of the Bachelor of Science degree in Computer Science, May 2020. Documentation is vital to the understanding, maintenance and, ultimately, survival of software projects. Yet many software projects either lack documentation or are very poorly documented. This results in a gradual decline in the quality of the code and may require complete overhauls in extreme cases. It is therefore important to evaluate documentation to ensure that it conveys clear and meaningful ideas. While existing methods of evaluating documentation are metrics-based and look at the structure of documentation examples, this paper explores the possibility of evaluating documentation by assessing its contents. There is, however, no existing corpus of documentation suitable for natural language processing tasks. A corpus of Python function/method comments is therefore assembled, and a language modeling experiment is performed on it. The results of this experiment are mixed. While they show that it is possible to evaluate documentation by looking at its content as opposed to its structure, they also show that this approach may not necessarily be more accurate, with some lower-quality comments receiving higher probability than those of higher quality.
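
    A minimal sketch of the core experiment under simplifying assumptions: a toy bigram model with add-one smoothing stands in for whatever language model the thesis used, and a new comment is scored by perplexity, so a lower score means the comment looks more like the training corpus.

```python
# Minimal sketch: fit a toy bigram language model (add-one smoothing) on a
# corpus of in-line comments and score new comments by perplexity. This is a
# stand-in for the thesis experiment, not its actual model.
import math
from collections import Counter

def tokens(comment):
    return ["<s>"] + comment.lower().split() + ["</s>"]

class BigramModel:
    def __init__(self, corpus):
        self.unigrams, self.bigrams = Counter(), Counter()
        for comment in corpus:
            toks = tokens(comment)
            self.unigrams.update(toks)
            self.bigrams.update(zip(toks, toks[1:]))
        self.vocab = len(self.unigrams)

    def log_prob(self, comment):
        toks = tokens(comment)
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.unigrams[a] + self.vocab))
            for a, b in zip(toks, toks[1:])
        )

    def perplexity(self, comment):
        n = len(tokens(comment)) - 1          # number of bigrams scored
        return math.exp(-self.log_prob(comment) / n)

if __name__ == "__main__":
    corpus = ["returns the sum of the list", "raises valueerror if the input is empty"]
    model = BigramModel(corpus)
    print(model.perplexity("returns the sum of the values"))  # lower = more typical
    print(model.perplexity("asdf qwerty"))                     # higher = less typical
```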

    Head to head: Semantic similarity of multi-word terms

    Terms are linguistic signifiers of domain-specific concepts. Semantic similarity between terms refers to the corresponding distance in the conceptual space. In this study, we use lexico-syntactic information to define a vector space representation in which cosine similarity closely approximates semantic similarity between the corresponding terms. Given a multi-word term, each word is weighted according to its defining properties. In this context, the head noun is given the highest weight. Other words are weighted depending on their relations to the head noun. We formalized the problem as that of determining a topological ordering of a directed acyclic graph, which is based on constituency and dependency relations within a noun phrase. To counteract the errors associated with automatically inferred constituency and dependency relations, we implemented a heuristic approach to approximating the topological ordering. Different weights are assigned to different words based on their positions. Clustering experiments performed on such a vector space representation showed considerable improvement over the conventional bag-of-words representation. Specifically, it more consistently reflected semantic similarity between the terms. This was established by analyzing the differences between automatically generated dendrograms and manually constructed taxonomies. In conclusion, our method can be used to semi-automate taxonomy construction.
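
    A minimal sketch of the representation idea under simplifying assumptions: each word of a multi-word term receives a positional weight with the head noun weighted most heavily, and terms are compared by cosine similarity; the geometrically decaying weights stand in for the paper's graph-based ordering.

```python
# Minimal sketch: words of a multi-word term get positional weights (head noun
# first and heaviest), and terms are compared by cosine similarity. The decay
# weighting is a simplified stand-in for the paper's graph-based ordering.
import math
from collections import defaultdict

def term_vector(ordered_words, decay=0.5):
    """ordered_words: words sorted by importance, head noun first (assumed given)."""
    vec = defaultdict(float)
    for rank, word in enumerate(ordered_words):
        vec[word.lower()] += decay ** rank
    return vec

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = lambda x: math.sqrt(sum(val * val for val in x.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0

if __name__ == "__main__":
    # Head noun listed first in each term.
    carcinoma_lung = term_vector(["carcinoma", "lung", "small", "cell"])
    cancer_lung    = term_vector(["cancer", "lung"])
    cell_stem      = term_vector(["cell", "stem", "embryonic"])
    print(round(cosine(carcinoma_lung, cancer_lung), 2))  # share the 'lung' modifier
    print(round(cosine(carcinoma_lung, cell_stem), 2))    # weaker overlap
```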

    Formal Requirements-Based Programming for Complex Systems

    Computer science as a field has not yet produced a general method to mechanically transform complex computer system requirements into a provably equivalent implementation. Such a method would be one major step towards dealing with complexity in computing, yet it remains the elusive holy grail of system development. Currently available tools and methods that start with a formal model of a system and mechanically produce a provably equivalent implementation are valuable but not sufficient. The gap that such tools and methods leave unfilled is that the formal models cannot be proven to be equivalent to the system requirements as originated by the customer. For the classes of complex systems whose behavior can be described as a finite (but significant) set of scenarios, we offer a method for mechanically transforming requirements (expressed in restricted natural language, or appropriate graphical notations) into a provably equivalent formal model that can be used as the basis for code generation and other transformations. While other techniques are available, this method is unique in offering full mathematical tractability while using notations and techniques that are well known and well trusted. We illustrate the application of the method to an example procedure from the Hubble Robotic Servicing Mission, currently under study and in preliminary formulation at NASA Goddard Space Flight Center.
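
    A toy illustration, not the method described in the paper: scenarios written in a restricted "when X receives Y then become Z" form are compiled into a state-transition table of the kind a formal model or code generator could consume.

```python
# Toy illustration (not the paper's method) of compiling restricted
# natural-language scenarios into a state-transition table that a formal
# model or code generator could consume.
import re

SCENARIOS = [
    "when idle receives start then become arming",
    "when arming receives confirm then become active",
    "when active receives abort then become idle",
]

PATTERN = re.compile(r"when (\w+) receives (\w+) then become (\w+)")

def compile_scenarios(lines):
    """Build {(state, event): next_state} from the restricted scenario sentences."""
    table = {}
    for line in lines:
        state, event, nxt = PATTERN.match(line).groups()
        table[(state, event)] = nxt
    return table

def run(table, start, events):
    state = start
    for event in events:
        state = table.get((state, event), state)  # unknown events leave state unchanged
    return state

if __name__ == "__main__":
    table = compile_scenarios(SCENARIOS)
    print(run(table, "idle", ["start", "confirm"]))  # -> 'active'
```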