
    Text categorization and similarity analysis: similarity measure, architecture and design

    This research looks at the most appropriate similarity measure to use for a document classification problem. The goal is to find a method that accurately identifies both semantically related documents and different versions of the same document. A necessary requirement is that the method is efficient in both speed and disk usage. Simhash is found to be the measure best suited to the application, and it can be combined with other software to increase accuracy. Pingar have provided an API that extracts the entities from a document and creates a taxonomy displaying their relationships; this extra information can be used to classify input documents accurately. Two algorithms incorporating the Pingar API are designed, and finally an efficient comparison algorithm is introduced to cut down the number of comparisons required.
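
The abstract above names Simhash as the chosen similarity measure but does not show it. The following is a minimal sketch of the general Simhash technique, not the report's implementation: the tokenizer, the use of MD5 as the per-token hash, the 64-bit fingerprint width, and the function names `tokens`, `simhash` and `hamming_distance` are all assumptions made for illustration. Documents whose fingerprints differ in only a few bits are treated as likely near-duplicates.

```python
import hashlib
import re
from collections import Counter

FINGERPRINT_BITS = 64  # assumed width; the source does not state one


def tokens(text):
    """Split text into lowercase alphanumeric tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def simhash(text):
    """Return a 64-bit Simhash fingerprint, weighting tokens by frequency."""
    weights = Counter(tokens(text))
    totals = [0] * FINGERPRINT_BITS
    for token, weight in weights.items():
        # Hash each token to 64 bits (first 8 bytes of its MD5 digest).
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:8], "big")
        # Each bit of the token hash votes +weight or -weight.
        for i in range(FINGERPRINT_BITS):
            if (token_hash >> i) & 1:
                totals[i] += weight
            else:
                totals[i] -= weight
    # A fingerprint bit is set where the votes are positive.
    fingerprint = 0
    for i, total in enumerate(totals):
        if total > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Because each document is reduced to a single fixed-width integer, comparing two documents costs one XOR and a bit count, which fits the speed and disk-usage requirements the abstract describes. A distance threshold for deciding that two documents are versions of each other would have to be tuned on real data.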

    SMIL State: an architecture and implementation for adaptive time-based web applications

    In this paper we examine adaptive time-based web applications (or presentations). These are interactive presentations where time dictates which parts of the application are presented (providing the major structuring paradigm), and that require interactivity and other dynamic adaptation. We investigate the current technologies available to create such presentations and their shortcomings, and suggest a mechanism for addressing these shortcomings. This mechanism, SMIL State, can be used to add user-defined state to declarative time-based languages such as SMIL or SVG animation, thereby enabling the author to create control flows that are difficult to realize within the temporal containment model of the host languages. In addition, SMIL State can be used as a bridging mechanism between languages, enabling easy integration of external components into the web application. Finally, SMIL State enables richer expressions for content control. This paper defines SMIL State in terms of an introductory example, followed by a detailed specification of the State model. Next, the implementation of this model is discussed. We conclude with a set of potential use cases, including dynamic content adaptation and delayed insertion of custom content such as advertisements. © 2009 Springer Science+Business Media, LLC

    Modeling the object-oriented software process: OPEN and the unified process

    A short introduction to software process modeling is presented, with particular attention to object-oriented modeling. Two major industrial process models are discussed: the OPEN model and the Unified Process model. Quality assurance in the Unified Process tool (formerly called Objectory) is reviewed in more detail.

    The Extraction of Community Structures from Publication Networks to Support Ethnographic Observations of Field Differences in Scientific Communication

    The scientific community of researchers in a research specialty is an important unit of analysis for understanding the field specific shaping of scientific communication practices. These scientific communities are, however, a challenging unit of analysis to capture and compare because they overlap, have fuzzy boundaries, and evolve over time. We describe a network analytic approach that reveals the complexities of these communities through examination of their publication networks in combination with insights from ethnographic field studies. We suggest that the structures revealed indicate overlapping sub-communities within a research specialty and we provide evidence that they differ in disciplinary orientation and research practices. By mapping the community structures of scientific fields we aim to increase confidence about the domain of validity of ethnographic observations as well as of collaborative patterns extracted from publication networks thereby enabling the systematic study of field differences. The network analytic methods presented include methods to optimize the delineation of a bibliographic data set in order to adequately represent a research specialty, and methods to extract community structures from this data. We demonstrate the application of these methods in a case study of two research specialties in the physical and chemical sciences. Comment: Accepted for publication in JASIS

    Text categorization and similarity analysis: implementation and evaluation

    This report covers the implementation of software that aims to identify document versions and semantically related documents. This is important due to the increasing amount of digital information. Key criteria were that the software was fast and required limited disk space. Previous research determined that the Simhash algorithm was the most appropriate for this application, so this method was implemented. The structure of each component was well defined, with the inputs and outputs constant, and the result was a software system that can have interchangeable parts if required.

    A document-like software visualization method for effective cognition of c-based software systems

    It is clear that maintenance is a crucial and very costly process in a software life cycle. Nowadays there are many software systems, particularly legacy systems, that are maintained from time to time as new requirements arise. One important source for understanding a software system before it is maintained is its documentation, particularly system documentation. Unfortunately, not all software systems developed or maintained are accompanied by reliable and up-to-date documents. In this case, source code will be the only reliable source for programmers. A number of studies have been carried out in order to assist cognition based on source code. One way is through tool automation via reverse engineering techniques, in which source code is parsed and the extracted information is visualized using certain visualization methods. Most software visualization methods use graphs as the main element to represent extracted software artifacts. Nevertheless, current methods tend to produce more complicated graphs and do not provide an explicit, document-like re-documentation environment. Hence, this thesis proposes a document-like software visualization method called DocLike Modularized Graph (DMG). The method is realized in a prototype tool named DocLike Viewer that targets C-based software systems. The main contribution of the DMG method is to provide an explicit structural re-documentation mechanism in the software visualization tool. In addition, the DMG method provides more levels of information abstraction via less complex graphs that include inter-module dependencies, inter-program dependencies, procedural abstraction and parameter passing. The DMG method was empirically evaluated based on the Goal/Question/Metric (GQM) paradigm, and the findings indicate that the method can improve productivity and quality in the aspect of cognition or program comprehension. A usability study was also conducted, and DocLike Viewer had the most positive responses from the software practitioners.