DNA ANALYSIS USING GRAMMATICAL INFERENCE
An accurate language definition capable of distinguishing between coding and non-coding DNA has important applications and analytical significance to the field of computational biology. The method proposed here uses positive sample grammatical inference and statistical information to infer languages for coding DNA.
An algorithm is proposed for searching for an optimal subset of input sequences for the inference of regular grammars, optimizing a relevant accuracy metric. The algorithm does not guarantee finding the optimal subset; however, testing shows improvements in accuracy and performance over the baseline algorithm.
Testing shows that languages inferred for components of DNA are consistently accurate. Using the proposed algorithm, languages are inferred for coding DNA with an average conditional probability above 80%. This shows that languages for components of DNA can be inferred and are useful independently of the process that created them. These languages can then be analyzed or used for other tasks in computational biology.
To illustrate potential applications of regular grammars for DNA components, an inferred language for exon sequences is applied as post-processing to Hidden Markov model exon prediction, reducing the number of wrong exons detected and significantly improving the specificity of the model.
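The abstract does not spell out the inference procedure, but positive-sample regular inference typically starts from a prefix tree acceptor (PTA), which state-merging learners such as RPNI then generalize. A minimal sketch, using hypothetical coding-DNA fragments rather than the paper's data:

```python
# Minimal sketch: build a prefix tree acceptor (PTA) from positive samples.
# The PTA accepts exactly the training strings; state-merging learners
# (e.g. RPNI) would generalize it further. Sample sequences are hypothetical.

def build_pta(samples):
    """Return (transitions, accepting) for a prefix tree acceptor."""
    transitions = {}            # (state, symbol) -> state
    accepting = set()
    next_state = 1              # state 0 is the root
    for seq in samples:
        state = 0
        for symbol in seq:
            if (state, symbol) not in transitions:
                transitions[(state, symbol)] = next_state
                next_state += 1
            state = transitions[(state, symbol)]
        accepting.add(state)
    return transitions, accepting

def accepts(transitions, accepting, seq):
    """Follow the transition path for seq; accept only at a marked state."""
    state = 0
    for symbol in seq:
        if (state, symbol) not in transitions:
            return False
        state = transitions[(state, symbol)]
    return state in accepting

coding_samples = ["ATGGCC", "ATGGTT", "ATGAAA"]   # hypothetical exon fragments
trans, acc = build_pta(coding_samples)
print(accepts(trans, acc, "ATGGCC"))   # True: seen in training
print(accepts(trans, acc, "TTTTTT"))   # False: no matching path
```

Because the PTA accepts only the training set, the statistical information the paper mentions would govern which states a learner may merge when generalizing.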
The effectiveness of refactoring, based on a compatibility testing taxonomy and a dependency graph
In this paper, we describe and then appraise a testing taxonomy proposed by van Deursen and Moonen (VD&M) based on the post-refactoring repeatability of tests. Four categories of refactoring are identified by VD&M, ranging from semantic-preserving to incompatible, where, for the former, no new tests are required and, for the latter, a completely new test set has to be developed. In our appraisal of the taxonomy, we heavily stress the need for the inter-dependence of the refactoring categories to be considered when making refactoring decisions, and we base that need on a refactoring dependency graph developed as part of the research. We demonstrate that while incompatible refactorings may be harmful and time-consuming from a testing perspective, semantic-preserving refactorings can have equally unpleasant hidden ramifications despite their advantages. In fact, refactorings which fall into neither category have the most interesting properties. We support our results with empirical refactoring data drawn from seven Java open-source systems (OSS) and, from the same analysis, form a tentative categorization of code smells.
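The refactoring dependency graph at the heart of the appraisal can be pictured as a directed graph whose nodes carry their VD&M testing category; dependencies that cross category boundaries are exactly the cases where inter-dependence matters for refactoring decisions. A minimal sketch, with illustrative refactorings and category assignments (not the paper's data):

```python
# Sketch: a refactoring dependency graph (edge A -> B means "A depends on B"),
# with each node tagged by its VD&M testing category. Names and categories
# below are illustrative, not drawn from the paper.

deps = {
    "ExtractMethod": ["RenameMethod"],
    "RenameMethod": [],
    "MoveMethod": ["ExtractMethod"],
}
category = {
    "ExtractMethod": "semantic-preserving",
    "RenameMethod": "semantic-preserving",
    "MoveMethod": "incompatible",
}

def cross_category_edges(deps, category):
    """Dependencies whose endpoints fall in different testing categories --
    the cases where a refactoring decision must consider its neighbours."""
    return [(a, b) for a, succs in deps.items()
            for b in succs if category[a] != category[b]]

print(cross_category_edges(deps, category))   # [('MoveMethod', 'ExtractMethod')]
```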
Compatible Remediation on Vulnerabilities from Third-Party Libraries for Java Projects
With the increasing disclosure of vulnerabilities in open-source software, software composition analysis (SCA) has been widely applied to reveal third-party libraries and the associated vulnerabilities in software projects. Beyond this revelation, SCA tools adopt various remediation strategies to fix vulnerabilities, the quality of which varies substantially. However, ineffective remediation can induce side effects, such as compilation failures, which impede acceptance by users. According to our studies, existing SCA tools do not correctly handle users' concerns about the compatibility of remediated projects. To this end, we propose Compatible Remediation of Third-party libraries (CORAL) for Maven projects to fix vulnerabilities without breaking the projects. The evaluation showed that CORAL not only fixed 87.56% of vulnerabilities, outperforming other tools (best: 75.32%), but also achieved a 98.67% successful compilation rate and a 92.96% successful unit test rate. Furthermore, we found that 78.45% of vulnerabilities in popular Maven projects could be fixed without breaking the compilation, while the rest (21.55%) could either be fixed only by upgrades that break the compilation or could not be fixed by upgrading at all.
Comment: 11 pages, conference
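CORAL's algorithm is not detailed in the abstract; one common compatibility-first heuristic, sketched below under a semantic-versioning assumption, is to prefer the smallest fixed version that stays within the current major version before resorting to a major-version jump (which is more likely to break compilation):

```python
# Sketch of a compatibility-first remediation heuristic (not CORAL's actual
# algorithm): among versions that fix the vulnerability, prefer the smallest
# upgrade within the current major version, falling back to the smallest
# fixed version otherwise. Version strings are hypothetical.

def pick_upgrade(current, fixed_versions):
    """Choose a fix version, favouring same-major (likely compatible) upgrades."""
    cur = tuple(map(int, current.split(".")))
    fixed = sorted(tuple(map(int, v.split("."))) for v in fixed_versions)
    same_major = [v for v in fixed if v > cur and v[0] == cur[0]]
    chosen = same_major[0] if same_major else fixed[0]
    return ".".join(map(str, chosen))

print(pick_upgrade("2.9.1", ["2.9.3", "3.0.0"]))   # 2.9.3: same major, minimal
print(pick_upgrade("2.9.1", ["3.0.0", "3.1.0"]))   # 3.0.0: no same-major fix exists
```

The second case corresponds to the 21.55% of vulnerabilities the abstract mentions, where every available fix crosses a major-version boundary.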
Corpus Annotation for Parser Evaluation
We describe a recently developed corpus annotation scheme for evaluating parsers that avoids shortcomings of current methods. The scheme encodes grammatical relations between heads and dependents, and has been used to mark up a new public-domain corpus of naturally occurring English text. We show how the corpus can be used to evaluate the accuracy of a robust parser, and relate the corpus to extant resources.
Comment: 7 pages, LaTeX (uses eaclap.sty)
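The kind of parser evaluation such a scheme supports can be sketched as set comparison over (relation, head, dependent) triples, scoring parser output against the gold annotation; the relation labels below are illustrative, not the scheme's actual inventory:

```python
# Sketch: scoring parser output against gold grammatical-relation (GR)
# annotations, each relation a (type, head, dependent) triple.
# The sentence and relation labels are illustrative.

gold   = {("subj", "barked", "dog"), ("det", "dog", "the")}
parsed = {("subj", "barked", "dog"), ("obj", "barked", "the")}

correct = gold & parsed                    # triples the parser got exactly right
precision = len(correct) / len(parsed)     # fraction of output that is correct
recall = len(correct) / len(gold)          # fraction of gold that was found
print(f"precision={precision:.2f} recall={recall:.2f}")   # precision=0.50 recall=0.50
```

Matching on whole triples means a relation with the right words but the wrong label (here, `obj` for `det`) counts against both scores.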
The 3G standard-setting strategy and indigenous innovation policy in China: is TD-SCDMA a flagship?
In the era of the “network economy”, industry and the public have witnessed several “battles for dominance” between two or more rival technologies, often involving well-known firms operating in highly visible industries. In this paper, we focus on the Chinese self-developed standard TD-SCDMA to examine the implications and targets of the nation’s policy and strategy. The motivation for the research starts from an interesting fact we observed: TD-SCDMA is billed as a Chinese-made standard, yet the share of core patented technology held by Chinese firms is still only about 7%, with most of the remainder held by foreign companies. The gap between this small share and the dream of a self-made standard implies a carefully plotted strategy. To understand it, we first ask why the Chinese government postponed its 3G decision again and again. We then probe why the standard-setting of TD-SCDMA attracted wide attention as a strategic tool for “indigenous innovation” and finally became part of national science and technology policy to increase international competitiveness. We use economic theory to understand the essence of the creation of TD-SCDMA and its relation to China’s interests.
Keywords: 3G, standard, innovation, China
COMPATIBILITY TESTING FOR COMPONENT-BASED SYSTEMS
Many component-based systems are deployed in diverse environments, each with different components and with different component versions. To ensure the system builds correctly for all deployable combinations (or configurations), developers often perform compatibility testing by building their systems on various configurations. However, due to the large number of possible configurations, testing all configurations is often infeasible, and in practice, only a handful of popular configurations are tested; as a result, errors can escape to the field. This problem is compounded when components evolve over time and when test resources are limited.
To address these problems, in this dissertation I introduce a process, algorithms and a tool called Rachet. First, I describe a formal modeling scheme for capturing the system configuration space, and a sampling criterion that determines the portion of the space to test. I describe an algorithm to sample configurations satisfying the sampling criterion and methods to test the sampled configurations.
Second, I present an approach that incrementally tests compatibility between components, so as to accommodate component evolution. I describe methods to compute test obligations, and algorithms to produce configurations that test the obligations, attempting to reuse test artifacts.
Third, I present an approach that prioritizes and tests configurations based on developers' preferences. Configurations are tested, by default starting from the most preferred one as requested by a developer, but cost-related factors are also considered to reduce overall testing time.
The testing approaches presented are applied to two large-scale systems in the high-performance computing domain, and experimental results show that the approaches can (1) identify compatibility between components effectively and efficiently, (2) make the process of compatibility testing more practical under constant component evolution, and (3) help developers achieve preferred compatibility results early in the overall testing process when time and resources are limited.
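The sampling idea in the first contribution can be illustrated with a greedy pairwise-coverage criterion over a toy configuration space: pick configurations until every pair of component versions has been built together at least once. Rachet's actual model and criterion are richer, and the components below are hypothetical:

```python
# Sketch: greedy pairwise sampling of a component-configuration space --
# choose configurations until every pair of component versions is covered.
# Components and versions are hypothetical; Rachet's criterion is richer.

from itertools import combinations, product

space = {"compilerA": ["1.0", "2.0"], "libB": ["0.9", "1.1"], "mpiC": ["3.1", "3.2"]}
names = list(space)
all_configs = [dict(zip(names, vs)) for vs in product(*(space[n] for n in names))]

# Every (component, version) pair across distinct components must be tested.
pending = {((a, va), (b, vb))
           for a, b in combinations(names, 2)
           for va, vb in product(space[a], space[b])}
total_pairs = len(pending)

def covered(config, pairs):
    """Pairs from `pairs` that this configuration exercises."""
    return {((a, va), (b, vb)) for (a, va), (b, vb) in pairs
            if config[a] == va and config[b] == vb}

sample = []
while pending:                  # greedy: take the config covering most new pairs
    best = max(all_configs, key=lambda c: len(covered(c, pending)))
    sample.append(best)
    pending -= covered(best, pending)

print(f"{len(sample)} of {len(all_configs)} configurations cover all {total_pairs} pairs")
```

Even on this toy space, half of the configurations suffice, and the gap widens rapidly as the number of components and versions grows.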