A web assessment approach based on summarisation and visualisation
The number of Web sites has grown noticeably to roughly 224 million in the last ten years, which means that the amount of information on the Internet is growing rapidly. Although search engines can help users filter for the information they want, the search result is normally presented as a very long list, and users have to visit each Web page to determine whether the result is appropriate. As a result, a considerable amount of time has to be spent on finding the required information. To address this issue, this paper proposes a Web assessment approach that provides an overview of the information on a Web site by integrating existing summarisation and visualisation techniques: text summarisation, tag cloud, Document Type View, and interactive features. This approach is capable of reducing the time required to identify and search for information on the Web.
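As a rough illustration of the tag cloud component mentioned in this abstract, the sketch below maps word frequencies to font sizes. It is not the paper's implementation: the function name `tag_cloud_sizes`, the stopword list, the linear scaling, and the size range are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the example.
STOPWORDS = frozenset({"the", "a", "of", "and", "to", "in"})

def tag_cloud_sizes(text, top_n=5, min_size=10, max_size=40):
    """Map the top_n most frequent words to font sizes (assumed linear scaling)."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    top = counts.most_common(top_n)
    if not top:
        return {}
    highest = top[0][1]
    lowest = top[-1][1]
    span = highest - lowest
    sizes = {}
    for word, count in top:
        # When all counts are equal, every word gets the maximum size.
        scale = (count - lowest) / span if span else 1.0
        sizes[word] = round(min_size + scale * (max_size - min_size))
    return sizes

print(tag_cloud_sizes("web web web search search page", top_n=3))
```

A real tag cloud would usually also apply stemming and a fuller stopword list; those steps are omitted here to keep the sketch short.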
XML Matchers: approaches and challenges
Schema Matching, i.e. the process of discovering semantic correspondences
between concepts adopted in different data source schemas, has been a key topic
in Database and Artificial Intelligence research areas for many years. In the
past, it was largely investigated especially for classical database models
(e.g., E/R schemas, relational databases, etc.). However, in the latest years,
the widespread adoption of XML in the most disparate application fields pushed
a growing number of researchers to design XML-specific Schema Matching
approaches, called XML Matchers, aiming at finding semantic matchings between
concepts defined in DTDs and XSDs. XML Matchers do not just take well-known
techniques originally designed for other data models and apply them on
DTDs/XSDs, but they exploit specific XML features (e.g., the hierarchical
structure of a DTD/XSD) to improve the performance of the Schema Matching
process. The design of XML Matchers is currently a well-established research
area. The main goal of this paper is to provide a detailed description and
classification of XML Matchers. We first describe to what extent the
specificities of DTDs/XSDs impact on the Schema Matching task. Then we
introduce a template, called XML Matcher Template, that describes the main
components of an XML Matcher, their role and behavior. We illustrate how each
of these components has been implemented in some popular XML Matchers. We
consider our XML Matcher Template as the baseline for objectively comparing
approaches that, at first glance, might appear as unrelated. The introduction
of this template can be useful in the design of future XML Matchers. Finally,
we analyze commercial tools implementing XML Matchers and introduce two
challenging issues strictly related to this topic, namely XML source clustering
and uncertainty management in XML Matchers. Comment: 34 pages, 8 tables, 7 figures
Tracking sub-page components in document workflows
Documents go through numerous transformations and intermediate formats as they are processed from abstract markup into final printable form. This notion of a document workflow is well established but it is common to find that ideas about document components, which might exist in the source code for the document, become completely lost within an amorphous, unstructured, page of PDF prior to being rendered. Given the importance of a component-based approach in Variable Data Printing (VDP) we have developed a collection of tools that allow information about the various transformations to be embedded at each stage in the workflow, together with a visualization tool that uses this embedded information to display the relationships between the various intermediate documents.
In this paper, we demonstrate these tools in the context of an example document workflow but the techniques described are widely applicable and would be easily adaptable to other workflows and for use in teaching tools to illustrate document component and VDP concepts
Content vs metrics: Using language modeling to evaluate in-line source code comments for Python
Undergraduate thesis submitted to the Department of Computer Science, Ashesi University, in partial fulfillment of Bachelor of Science degree in Computer Science, May 2020. Documentation is vital to the understanding, maintenance and, ultimately, survival of
software projects. And yet, a lot of software projects either lack documentation, or are
very poorly documented. This results in a gradual decline in the quality of the code
and may require complete overhauls in extreme cases. It is therefore important to evaluate
documentation to ensure that it conveys clear and meaningful ideas. While existing
methods of evaluating documentation are metrics based and look at the structure of documentation
examples, this paper explores the possibility of evaluating documentation by
assessing its contents. There is, however, a lack of an existing corpus of documentation
for natural language processing tasks. A corpus of Python function/method comments
is assembled, and a language modeling experiment is performed on them. The results of
this experiment are mixed. While they show that it is possible to evaluate documentation
by looking at its content as opposed to structure, they also show that this approach may
not necessarily be more accurate, with lower quality comment examples having higher
probability than those of higher quality.
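The abstract does not say which kind of language model was used. As a minimal sketch of the general idea, assuming a unigram model with add-one smoothing and whitespace tokenisation (both assumptions, as are the function names), a comment can be scored by its average log-probability per token:

```python
import math
from collections import Counter

def train_unigram(comments):
    """Count tokens over a corpus of comments (whitespace tokenisation assumed)."""
    counts = Counter(tok for c in comments for tok in c.lower().split())
    return counts, sum(counts.values())

def avg_log_prob(comment, counts, total):
    """Average add-one-smoothed log-probability per token of a comment."""
    tokens = comment.lower().split()
    if not tokens:
        return float("-inf")
    vocab = len(counts) + 1  # one extra slot reserved for unseen tokens
    log_probs = [math.log((counts[t] + 1) / (total + vocab)) for t in tokens]
    return sum(log_probs) / len(tokens)

# Toy corpus, invented for the example.
corpus = ["return the sum of two numbers", "return the list of names"]
counts, total = train_unigram(corpus)
print(avg_log_prob("return the sum", counts, total))
print(avg_log_prob("foo bar baz", counts, total))
```

A score like this rewards comments made of frequent tokens, which is one plausible reason a model could assign higher probability to lower quality comments, as the abstract reports.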
Head to head: Semantic similarity of multi-word terms
Terms are linguistic signifiers of domain-specific concepts. Semantic similarity between terms refers to the corresponding distance in the conceptual space. In this study, we use lexico-syntactic information to define a vector space representation in which cosine similarity closely approximates semantic similarity between the corresponding terms. Given a multi-word term, each word is weighted in terms of its defining properties. In this context, the head noun is given the highest weight. Other words are weighted depending on their relations to the head noun. We formalized the problem as that of determining a topological ordering of a directed acyclic graph, which is based on constituency and dependency relations within a noun phrase. To counteract the errors associated with automatically inferred constituency and dependency relations, we implemented a heuristic approach to approximating the topological ordering. Different weights are assigned to different words based on their positions. Clustering experiments performed on such a vector space representation showed considerable improvement over the conventional bag-of-words representation. Specifically, it more consistently reflected semantic similarity between the terms. This was established by analyzing the differences between automatically generated dendrograms and manually constructed taxonomies. In conclusion, our method can be used to semi-automate taxonomy construction.
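The position-based weighting described in this abstract can be sketched as follows. This is an illustration only: it assumes the head noun is the last word of an English noun phrase and that weights halve with each step away from the head. Neither the weights nor the function names come from the paper.

```python
import math
from collections import Counter

def term_vector(term, head_weight=1.0, decay=0.5):
    """Weight each word by its distance from the head (assumed to be the last word)."""
    words = term.lower().split()
    vec = Counter()
    for distance, word in enumerate(reversed(words)):
        vec[word] += head_weight * (decay ** distance)
    return vec

def bag_of_words(term):
    """Unweighted baseline: every word counts equally."""
    return Counter(term.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Terms sharing a head noun end up closer than terms sharing only a modifier.
print(cosine(term_vector("knee injury"), term_vector("shoulder injury")))
print(cosine(term_vector("knee injury"), term_vector("knee pain")))
```

Under the unweighted baseline both pairs score the same, because each shares exactly one of two words; the head-weighted vectors separate them.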
Formal Requirements-Based Programming for Complex Systems
Computer science as a field has not yet produced a general method to mechanically transform complex computer system requirements into a provably equivalent implementation. Such a method would be one major step towards dealing with complexity in computing, yet it remains the elusive holy grail of system development. Currently available tools and methods that start with a formal model of a system and mechanically produce a provably equivalent implementation are valuable but not sufficient. The gap that such tools and methods leave unfilled is that the formal models cannot be proven to be equivalent to the system requirements as originated by the customer. For the classes of complex systems whose behavior can be described as a finite (but significant) set of scenarios, we offer a method for mechanically transforming requirements (expressed in restricted natural language, or appropriate graphical notations) into a provably equivalent formal model that can be used as the basis for code generation and other transformations. While other techniques are available, this method is unique in offering full mathematical tractability while using notations and techniques that are well known and well trusted. We illustrate the application of the method to an example procedure from the Hubble Robotic Servicing Mission currently under study and preliminary formulation at NASA Goddard Space Flight Center.