    DNA ANALYSIS USING GRAMMATICAL INFERENCE

    An accurate language definition capable of distinguishing between coding and non-coding DNA has important applications and analytical significance in computational biology. The method proposed here uses positive-sample grammatical inference and statistical information to infer languages for coding DNA. An algorithm is proposed that searches for an optimal subset of the input sequences from which to infer a regular grammar, optimizing a relevant accuracy metric. The algorithm does not guarantee finding the optimal subset; however, testing shows improvements in accuracy and performance over the base algorithm, and the languages inferred for components of DNA are consistently accurate. Using the proposed algorithm, languages are inferred for coding DNA with an average conditional probability over 80%. This shows that languages for components of DNA can be inferred and are useful independently of the process that created them; these languages can then be analyzed or used for other tasks in computational biology. To illustrate potential applications of regular grammars for DNA components, an inferred language for exon sequences is applied as a post-processing step to Hidden Markov model exon prediction, reducing the number of wrong exons detected and significantly improving the specificity of the model.
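
    To make the search concrete, here is a minimal Python sketch of the greedy subset-search idea, assuming a held-out validation set of coding and non-coding sequences; the grammar-inference step is stubbed out with a trivial acceptor, since the abstract does not give the actual inference algorithm or accuracy metric.

        def infer_acceptor(samples):
            """Stand-in for regular-grammar inference: a trivial acceptor
            recognizing exactly the positive samples. A real learner (e.g.
            a prefix-tree acceptor with state merging) would generalize."""
            accepted = set(samples)
            return lambda seq: seq in accepted

        def greedy_subset_search(candidates, coding_val, noncoding_val):
            """Greedily grow the training subset, keeping a sequence only
            if the language inferred from the enlarged subset scores better
            on validation data (coding accepted, non-coding rejected)."""
            def score(subset):
                accepts = infer_acceptor(subset)
                hits = sum(accepts(s) for s in coding_val)
                hits += sum(not accepts(s) for s in noncoding_val)
                return hits / (len(coding_val) + len(noncoding_val))

            subset, best = [], 0.0
            for seq in candidates:
                trial_score = score(subset + [seq])
                if trial_score > best:  # keep only improving additions
                    subset.append(seq)
                    best = trial_score
            return subset, best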

    The effectiveness of refactoring, based on a compatibility testing taxonomy and a dependency graph

    In this paper, we describe and then appraise a testing taxonomy proposed by van Deursen and Moonen (VD&M), based on the post-refactoring repeatability of tests. Four categories of refactoring are identified by VD&M, ranging from semantic-preserving to incompatible: for the former, no new tests are required, while for the latter, a completely new test set has to be developed. In our appraisal of the taxonomy, we stress the need for the inter-dependence of the refactoring categories to be considered when making refactoring decisions, and we base that need on a refactoring dependency graph developed as part of the research. We demonstrate that while incompatible refactorings may be harmful and time-consuming from a testing perspective, semantic-preserving refactorings can have equally unpleasant hidden ramifications despite their advantages; in fact, refactorings which fall into neither category have the most interesting properties. We support our results with empirical refactoring data drawn from seven Java open-source systems (OSS) and, from the same analysis, form a tentative categorization of code smells.
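
    As a small illustration of why that inter-dependence matters, the sketch below models refactorings as nodes of a dependency graph, each tagged with a taxonomy category; the intermediate category name and the example edges are placeholders, not VD&M's actual definitions.

        REFACTORINGS = {
            # name: (taxonomy category, refactorings it pulls in); invented
            "rename_method":    ("semantic-preserving", []),
            "move_method":      ("intermediate", []),
            "change_signature": ("incompatible", []),
            "inline_method":    ("semantic-preserving", ["change_signature"]),
        }

        ORDER = ["semantic-preserving", "intermediate", "incompatible"]

        def transitive_deps(name, table=REFACTORINGS):
            """All refactorings transitively pulled in by applying `name`."""
            seen, stack = set(), list(table[name][1])
            while stack:
                dep = stack.pop()
                if dep not in seen:
                    seen.add(dep)
                    stack.extend(table[dep][1])
            return seen

        def testing_impact(name, table=REFACTORINGS):
            """Worst category reachable through the graph: the real testing
            cost a seemingly harmless refactoring can incur."""
            cats = [table[name][0]]
            cats += [table[d][0] for d in transitive_deps(name)]
            return max(cats, key=ORDER.index)

        print(testing_impact("rename_method"))  # semantic-preserving
        print(testing_impact("inline_method"))  # incompatible, via its edge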

    Compatible Remediation on Vulnerabilities from Third-Party Libraries for Java Projects

    With the increasing disclosure of vulnerabilities in open-source software, software composition analysis (SCA) has been widely applied to reveal third-party libraries and their associated vulnerabilities in software projects. Beyond this revelation, SCA tools adopt various remediation strategies to fix vulnerabilities, and the quality of these strategies varies substantially. Ineffective remediation can induce side effects, such as compilation failures, which impede acceptance by users. According to our studies, existing SCA tools do not correctly handle users' concerns regarding the compatibility of remediated projects. To this end, we propose Compatible Remediation of Third-party libraries (CORAL) for Maven projects, which fixes vulnerabilities without breaking the projects. In our evaluation, CORAL not only fixed 87.56% of vulnerabilities, outperforming other tools (best: 75.32%), but also achieved a 98.67% successful compilation rate and a 92.96% successful unit test rate. Furthermore, we found that 78.45% of vulnerabilities in popular Maven projects could be fixed without breaking compilation, while the remaining 21.55% could either be fixed only by upgrades that break compilation or could not be fixed by upgrading at all.
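
    The remediation loop such a tool performs can be sketched as below; this is an illustration of the compatibility concern described above, not CORAL's published algorithm, and apply_upgrade is an assumed helper that rewrites the project's POM.

        import subprocess

        def builds_and_tests(project_dir):
            """Compatibility check: the project must still compile and pass
            its unit tests after the upgrade (a plain `mvn verify` here)."""
            result = subprocess.run(["mvn", "-q", "verify"], cwd=project_dir)
            return result.returncode == 0

        def remediate(dep, fixed_versions, apply_upgrade, project_dir):
            """Try fixed versions from the smallest bump upward and keep
            the first one that leaves the project working.

            fixed_versions -- versions clearing the vulnerability, sorted
                              by increasing size of the version bump
            apply_upgrade  -- assumed helper pinning `dep` to a version in
                              the POM
            """
            for version in fixed_versions:
                apply_upgrade(dep, version)
                if builds_and_tests(project_dir):
                    return version  # compatible remediation found
            return None             # only breaking fixes (or none) exist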

    Corpus Annotation for Parser Evaluation

    We describe a recently developed corpus annotation scheme for evaluating parsers that avoids shortcomings of current methods. The scheme encodes grammatical relations between heads and dependents, and has been used to mark up a new public-domain corpus of naturally occurring English text. We show how the corpus can be used to evaluate the accuracy of a robust parser, and relate the corpus to extant resources.
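
    Scoring a parser against such an annotation then reduces to set comparison over head-dependent relation tuples; the sketch below assumes a simple (relation, head, dependent) layout and illustrative relation names rather than the scheme's exact format.

        def score(gold, predicted):
            """Precision/recall/F1 over grammatical-relation tuples."""
            gold, predicted = set(gold), set(predicted)
            correct = len(gold & predicted)
            precision = correct / len(predicted) if predicted else 0.0
            recall = correct / len(gold) if gold else 0.0
            f1 = (2 * precision * recall / (precision + recall)
                  if precision + recall else 0.0)
            return precision, recall, f1

        gold = {("subj", "saw", "cat"), ("obj", "saw", "bird"),
                ("det", "cat", "the")}
        pred = {("subj", "saw", "cat"), ("obj", "saw", "bird"),
                ("det", "bird", "the")}
        print(score(gold, pred))  # two of three relations correct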

    The 3G standard-setting strategy and indigenous innovation policy in China: is TD-SCDMA a flagship?

    In the era of the “network economy”, industry and the public have witnessed several “battles for dominance” between two or more rival technologies, often involving well-known firms operating in highly visible industries. In this paper, we focus on the Chinese self-developed standard TD-SCDMA to understand the implications and aims of the nation's policy and strategy. The motivation for the research starts from an interesting fact we observed: TD-SCDMA is billed as a Chinese-made standard, yet the share of its core patents held by Chinese firms is still only about 7%, with most of the remainder held by foreign companies. The gap between this small share and the dream of a self-made standard implies a carefully plotted strategy. To understand it, we first ask why the Chinese government postponed its 3G decision again and again. We then probe why the standard-setting of TD-SCDMA attracted wide attention as a strategic tool to fulfill “indigenous innovation” and finally became part of national science and technology policy to increase international competitiveness. We use economic theory to understand the essence of the creation of TD-SCDMA and its relation to China's interests.

    COMPATIBILITY TESTING FOR COMPONENT-BASED SYSTEMS

    Many component-based systems are deployed in diverse environments, each with different components and with different component versions. To ensure the system builds correctly for all deployable combinations (or configurations), developers often perform compatibility testing by building their systems on various configurations. However, due to the large number of possible configurations, testing all of them is often infeasible; in practice, only a handful of popular configurations are tested, and as a result, errors can escape to the field. This problem is compounded when components evolve over time and when test resources are limited. To address these problems, in this dissertation I introduce a process, algorithms, and a tool called Rachet. First, I describe a formal modeling scheme for capturing the system configuration space, and a sampling criterion that determines the portion of the space to test. I describe an algorithm to sample configurations satisfying the sampling criterion and methods to test the sampled configurations. Second, I present an approach that incrementally tests compatibility between components, so as to accommodate component evolution. I describe methods to compute test obligations, and algorithms to produce configurations that test these obligations while attempting to reuse test artifacts. Third, I present an approach that prioritizes and tests configurations based on developers' preferences. Configurations are tested starting, by default, from the one most preferred by a developer, but cost-related factors are also considered to reduce overall testing time. The testing approaches are applied to two large-scale systems in the high-performance computing domain, and experimental results show that they can (1) identify compatibility between components effectively and efficiently, (2) make the process of compatibility testing more practical under constant component evolution, and (3) help developers achieve preferred compatibility results early in the overall testing process when time and resources are limited.
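
    As a rough illustration of the sampling idea only (Rachet's formal model and criterion are richer), the sketch below greedily picks configurations until every pair of component versions has been built together at least once: a simple two-way coverage criterion over a hypothetical component-version space.

        from itertools import combinations, product

        def pairwise_sample(space):
            """space maps component -> list of versions. Greedily pick
            configurations until every two-way pairing of component
            versions appears in at least one chosen configuration."""
            comps = sorted(space)
            all_configs = [dict(zip(comps, vs))
                           for vs in product(*(space[c] for c in comps))]

            def pairs_of(cfg):
                return {((a, cfg[a]), (b, cfg[b]))
                        for a, b in combinations(comps, 2)}

            uncovered = set().union(*(pairs_of(c) for c in all_configs))
            chosen = []
            while uncovered:
                # Standard greedy cover: take the configuration that
                # covers the most still-uncovered pairs.
                best = max(all_configs,
                           key=lambda c: len(pairs_of(c) & uncovered))
                chosen.append(best)
                uncovered -= pairs_of(best)
            return chosen

        space = {"compiler": ["gcc-9", "gcc-12"],
                 "mpi": ["mpich", "openmpi"],
                 "python": ["3.9", "3.11"]}
        for cfg in pairwise_sample(space):
            print(cfg)  # 4 sampled configurations instead of all 8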