87 research outputs found

    Detection of subtle variations as consensus motifs

    Get PDF
    AbstractWe address the problem of detecting consensus motifs, that occur with subtle variations, across multiple sequences. These are usually functional domains in DNA sequences such as transcriptional binding factors or other regulatory sites. The problem in its generality has been considered difficult and various benchmark data serve as the litmus test for different computational methods. We present a method centered around unsupervised combinatorial pattern discovery. The parameters are chosen using a careful statistical analysis of consensus motifs. This method works well on the benchmark data and is general enough to be extended to a scenario where the variation in the consensus motif includes indels (along with mutations). We also present some results on detection of transcription binding factors in human DNA sequences

    Dealing with clones in software : a practical approach from detection towards management

    Get PDF
    Despite the fact that duplicated fragments of code also called code clones are considered one of the prominent code smells that may exist in software, cloning is widely practiced in industrial development. The larger the system, the more people involved in its development and the more parts developed by different teams result in an increased possibility of having cloned code in the system. While there are particular benefits of code cloning in software development, research shows that it might be a source of various troubles in evolving software. Therefore, investigating and understanding clones in a software system is important to manage the clones efficiently. However, when the system is fairly large, it is challenging to identify and manage those clones properly. Among the various types of clones that may exist in software, research shows detection of near-miss clones where there might be minor to significant differences (e.g., renaming of identifiers and additions/deletions/modifications of statements) among the cloned fragments is costly in terms of time and memory. Thus, there is a great demand of state-of-the-art technologies in dealing with clones in software. Over the years, several tools have been developed to detect and visualize exact and similar clones. However, usually the tools are standalone and do not integrate well with a software developer's workflow. In this thesis, first, a study is presented on the effectiveness of a fingerprint based data similarity measurement technique named 'simhash' in detecting clones in large scale code-base. Based on the positive outcome of the study, a time efficient detection approach is proposed to find exact and near-miss clones in software, especially in large scale software systems. The novel detection approach has been made available as a highly configurable and fully fledged standalone clone detection tool named 'SimCad', which can be configured for detection of clones in both source code and non-source code based data. Second, we show a robust use of the clone detection approach studied earlier by assembling its detection service as a portable library named 'SimLib'. This library can provide tightly coupled (integrated) clone detection functionality to other applications as opposed to loosely coupled service provided by a typical standalone tool. Because of being highly configurable and easily extensible, this library allows the user to customize its clone detection process for detecting clones in data having diverse characteristics. We performed a user study to get some feedback on installation and use of the 'SimLib' API (Application Programming Interface) and to uncover its potential use as a third-party clone detection library. Third, we investigated on what tools and techniques are currently in use to detect and manage clones and understand their evolution. The goal was to find how those tools and techniques can be made available to a developer's own software development platform for convenient identification, tracking and management of clones in the software. Based on that, we developed a clone-aware software development platform named 'SimEclipse' to promote the practical use of code clone research and to provide better support for clone management in software. Finally, we evaluated 'SimEclipse' by conducting a user study on its effectiveness, usability and information management. We believe that both researchers and developers would enjoy and utilize the benefit of using these tools in different aspect of code clone research and manage cloned code in software systems

    Automatic extraction of mobility activities in microblogs

    Get PDF
    Tese de Mestrado Integrado. Engenharia Informática e Computação. Faculdade de Engenharia. Universidade do Porto. 201

    Self-learning Anomaly Detection in Industrial Production

    Get PDF

    Toward an improved understanding of software change

    Get PDF
    Structural changes, including moving, renaming, merging and splitting are important design change decisions made by programmers. However, during the process of software evolution, this information often gets lost. Recovering instances of structural changes in the past, as well as understanding them, are essential for us to achieve a better understanding of how and why software changes. In this thesis, we propose an approach that helps to recover and understand the lost information of structural changes

    A disk-resident suffix tree index and generic framework for managing tunable indexes

    Get PDF
    This thesis introduces two related technologies. The first is a disk-resident index for biological sequence data, and the second is a framework and toolkit for the management of operational parameters for applications of which this index is typical. The Top-Compressed Suffix Tree is a novel data structure that can be used to provide a scalable, disk-resident index for large sequences. This data structure is based on the suffix tree, but has been designed to overcome the problems associated with using such structures on secondary memory. Top-Compressed Suffix Trees can be constructed incrementally, allowing indexes to be created that are larger than the amount of available main memory. Correspondingly, querying such an index only requires part of the data structure to be resident in main memory, thus allowing support for on-demand faulting and eviction of index sections during search. Such an index may be of great benefit to scientists requiring efficient access to vast repositories of genomic data. The Generic Index Development and Operation Framework (GIDOF) is a framework and toolkit that supports various tasks relating to the management of operational parameters. The performance of an index's implementation is typically influenced by several operational parameters parameters that must be tuned carefully if optimum performance is to be obtained. Indexes implemented using GIDOF can be structured in such a way that values of selected operational parameters can be adjusted; resulting in an index implementation that can be tuned to suit a given workload or system environment. This thesis presents a detailed description of the design of both the Top-Compressed Suffix Tree and the algorithms that operate over it. Extensive performance measurements are then presented and discussed, covering such aspects of index performance as construction time, average query performance and the size of the completed index. An overview of the GIDOF parameter model and toolkit is then given together with examples of how this framework can be used to manage tunable indexes, such as the Top-Compressed Suffix Tree

    Knowledge Based Systems: A Critical Survey of Major Concepts, Issues, and Techniques

    Get PDF
    This Working Paper Series entry presents a detailed survey of knowledge based systems. After being in a relatively dormant state for many years, only recently is Artificial Intelligence (AI) - that branch of computer science that attempts to have machines emulate intelligent behavior - accomplishing practical results. Most of these results can be attributed to the design and use of Knowledge-Based Systems, KBSs (or ecpert systems) - problem solving computer programs that can reach a level of performance comparable to that of a human expert in some specialized problem domain. These systems can act as a consultant for various requirements like medical diagnosis, military threat analysis, project risk assessment, etc. These systems possess knowledge to enable them to make intelligent desisions. They are, however, not meant to replace the human specialists in any particular domain. A critical survey of recent work in interactive KBSs is reported. A case study (MYCIN) of a KBS, a list of existing KBSs, and an introduction to the Japanese Fifth Generation Computer Project are provided as appendices. Finally, an extensive set of KBS-related references is provided at the end of the report

    Matching XML schemas by a new tree matching algorithm.

    Get PDF
    Schema matching is one of the key operations in XML-based information integration and exchanging applications. Automatically or semi-automatically matching XML schemas has attracted a lot of attentions in academia and industry due to the extensive adoption of XML techniques. One of the most difficult tasks in this problem is to identify structural relations between two XML schemas. This thesis builds an automatic XML schema matching system which generates two types of outputs: the element mappings and schema similarity. In this system, an XML schema is modeled as a tree. We have proposed a new tree matching algorithm to compute the structural relation by extracting the most similar common substructures. The algorithm has been designed to achieve a trade-off between matching optimality and time complexity. The experimental results show that this system can be used to match large XML schemas.Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .W366. Source: Masters Abstracts International, Volume: 43-05, page: 1758. Adviser: Jianguo Lu. Thesis (M.Sc.)--University of Windsor (Canada), 2004

    Designing Service-Oriented Chatbot Systems Using a Construction Grammar-Driven Natural Language Generation System

    Get PDF
    Service oriented chatbot systems are used to inform users in a conversational manner about a particular service or product on a website. Our research shows that current systems are time consuming to build and not very accurate or satisfying to users. We find that natural language understanding and natural language generation methods are central to creating an e�fficient and useful system. In this thesis we investigate current and past methods in this research area and place particular emphasis on Construction Grammar and its computational implementation. Our research shows that users have strong emotive reactions to how these systems behave, so we also investigate the human computer interaction component. We present three systems (KIA, John and KIA2), and carry out extensive user tests on all of them, as well as comparative tests. KIA is built using existing methods, John is built with the user in mind and KIA2 is built using the construction grammar method. We found that the construction grammar approach performs well in service oriented chatbots systems, and that users preferred it over other systems
    corecore