The Co-Evolution of Test Maintenance and Code Maintenance through the lens of Fine-Grained Semantic Changes
Automatic testing is a widely adopted technique for improving software
quality. Software developers add, remove and update test methods and test
classes as part of the software development process as well as during the
evolution phase, following the initial release. In this work we conduct a large
scale study of 61 popular open source projects and report the relationships we
have established between test maintenance, production code maintenance, and
semantic changes (e.g., statement added, method removed) performed in
developers' commits.
We build predictive models, and show that the number of tests in a software
project can be well predicted by employing code maintenance profiles (i.e., how
many commits were performed in each of the maintenance activities: corrective,
perfective, adaptive). Our findings also reveal that more often than not,
developers perform code fixes without performing complementary test maintenance
in the same commit (e.g., update an existing test or add a new one). When
developers do perform test maintenance, it is likely to be affected by the
semantic changes they perform as part of their commit.
Our work is based on studying 61 popular open source projects, comprising
over 240,000 commits and over 16,000,000 semantic change type
instances, performed by over 4,000 software engineers.
Comment: postprint, ICSME 201
Too Trivial To Test? An Inverse View on Defect Prediction to Identify Methods with Low Fault Risk
Background. Test resources are usually limited and therefore it is often not
possible to completely test an application before a release. To cope with the
problem of scarce resources, development teams can apply defect prediction to
identify fault-prone code regions. However, defect prediction tends to have low
precision in cross-project prediction scenarios.
Aims. We take an inverse view on defect prediction and aim to identify
methods that can be deferred when testing because they contain hardly any
faults due to their code being "trivial". We expect that characteristics of
such methods might be project-independent, so that our approach could improve
cross-project predictions.
Method. We compute code metrics and apply association rule mining to create
rules for identifying methods with low fault risk. We conduct an empirical
study to assess our approach with six Java open-source projects containing
precise fault data at the method level.
Results. Our results show that inverse defect prediction can identify approx.
32-44% of a project's methods as having a low fault risk; on average, they
are about six times less likely to contain a fault than other methods. In
cross-project predictions with larger, more diversified training sets,
identified methods are even eleven times less likely to contain a fault.
Conclusions. Inverse defect prediction supports the efficient allocation of
test resources by identifying methods that can be treated with less priority in
testing activities and is well applicable in cross-project prediction
scenarios.
Comment: Submitted to PeerJ C
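The rule-based idea in the Method paragraph above can be illustrated with a minimal sketch. It assumes a hypothetical rule ("a method with at most 3 source lines and no branches is not faulty") and made-up method data; the metric names, thresholds, and records are illustrative assumptions, not the rules or data from the study. It computes the two standard association-rule measures, support and confidence, for that single rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    """One method with two code metrics and a fault label (all hypothetical)."""
    name: str
    sloc: int       # source lines of code
    branches: int   # number of branching statements
    faulty: bool    # whether a fault was recorded for this method


def rule_stats(methods, antecedent):
    """Return (support, confidence) of the rule `antecedent -> not faulty`.

    support    = fraction of all methods that match the antecedent AND are clean
    confidence = fraction of matching methods that are clean
    """
    matched = [m for m in methods if antecedent(m)]
    if not matched:
        return 0.0, 0.0
    clean = sum(1 for m in matched if not m.faulty)
    return clean / len(methods), clean / len(matched)


def is_trivial(m):
    # Hypothetical antecedent; the thresholds are assumptions for illustration.
    return m.sloc <= 3 and m.branches == 0


# Made-up records, only to exercise the computation.
METHODS = [
    Method("getName", 1, 0, False),
    Method("setName", 2, 0, False),
    Method("isEmpty", 1, 0, False),
    Method("toString", 3, 0, False),
    Method("parse", 40, 9, True),
    Method("merge", 25, 6, False),
    Method("resolve", 30, 7, True),
    Method("size", 1, 0, False),
]

support, confidence = rule_stats(METHODS, is_trivial)
print(f"support={support:.3f} confidence={confidence:.3f}")
```

A rule miner would enumerate many candidate antecedents and keep those whose support and confidence exceed chosen thresholds; this sketch evaluates only one hand-written candidate.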
Blink and it's done: Interactive queries on very large data
In this demonstration, we present BlinkDB, a massively parallel, sampling-based approximate query processing framework for running interactive queries on large volumes of data. The key observation in BlinkDB is that one can make reasonable decisions in the absence of perfect answers. BlinkDB extends the Hive/HDFS stack and can handle the same set of SPJA (selection, projection, join and aggregate) queries as supported by these systems. BlinkDB provides real-time answers along with statistical error guarantees, and can scale to petabytes of data and thousands of machines in a fault-tolerant manner. Our experiments using the TPC-H benchmark and on an anonymized real-world video content distribution workload from Conviva Inc. show that BlinkDB can execute a wide range of queries up to 150x faster than Hive on MapReduce and 10-150x faster than Shark (Hive on Spark) over tens of terabytes of data stored across 100 machines, all with an error of 2-10%.
National Science Foundation (U.S.) (CISE Expeditions Award CCF-1139158); QUALCOMM Inc.; Amazon.com (Firm); Google (Firm); SAP Corporation; Blue Goji; Cisco Systems, Inc.; Cloudera, Inc.; Ericsson, Inc.; General Electric Company; Hewlett-Packard Company; Intel Corporation; MarkLogic Corporation; Microsoft Corporation; NetApp; Oracle Corporation; Splunk Inc.; VMware, Inc.; United States. Defense Advanced Research Projects Agency (Contract FA8650-11-C-7136
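The general idea of answering an aggregate query from a sample and attaching an error bound can be sketched as follows. This is a minimal illustration using uniform random sampling and a normal-approximation confidence interval; it is not BlinkDB's implementation, and the function name, sample size, and synthetic data are assumptions made for the example.

```python
import math
import random
import statistics


def approximate_mean(values, sample_size, z=1.96, seed=0):
    """Estimate the mean of `values` from a uniform random sample.

    Returns (estimate, margin), where estimate +/- margin is an approximate
    95% confidence interval when z = 1.96 (normal approximation, ignoring
    the finite-population correction).
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)
    estimate = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, z * std_error


# Synthetic data standing in for a large table column (an assumption).
data_rng = random.Random(42)
data = [data_rng.gauss(100.0, 15.0) for _ in range(100_000)]

exact = statistics.fmean(data)                       # full scan
estimate, margin = approximate_mean(data, 1_000)     # reads 1% of the rows
print(f"exact={exact:.2f} estimate={estimate:.2f} +/- {margin:.2f}")
```

The trade-off is the one the abstract describes: the sampled query touches a small fraction of the rows and returns an answer with a quantified error instead of an exact result.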
Efficient caching algorithms for memory management in computer systems
As disk performance continues to lag behind that of memory systems and processors, fully utilizing memory to reduce disk accesses is a highly effective way to improve overall system performance. Furthermore, to serve the applications running on a computer in distributed systems, not only the local memory but also the memory on remote servers must be effectively managed to minimize I/O operations. The critical challenges in effective memory cache management include: (1) Insightfully understanding and quantifying the locality inherent in the memory access requests; (2) Effectively utilizing the locality information in replacement algorithms; (3) Intelligently placing and replacing data in the multi-level caches of a distributed system; (4) Ensuring that the overheads of the proposed schemes are acceptable.
This dissertation provides solutions and makes unique and novel contributions in application locality quantification, general replacement algorithms, low-cost replacement policy, thrashing protection, as well as multi-level cache management in a distributed system. First, the dissertation proposes a new method to quantify locality strength and to accurately identify the data with strong locality. It also provides a new replacement algorithm, which significantly outperforms existing algorithms. Second, considering the extremely low-cost requirements on replacement policies in virtual memory management, the dissertation proposes a policy that meets these requirements and considerably exceeds the performance of existing policies. Third, the dissertation provides an effective scheme to protect the system from thrashing when running memory-intensive applications. Finally, the dissertation provides a multi-level block placement and replacement protocol in a distributed client-server environment, exploiting non-uniform locality strengths in the I/O access requests.
The methodology used in this study includes careful application behavior characterization, system requirement analysis, algorithm design, trace-driven simulation, and system implementation. A main conclusion of the work is that there is still much room for innovation and significant performance improvement for the seemingly mature and stable policies that have been broadly used in current operating system design.
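As a point of reference for the trace-driven simulation mentioned above, the following minimal sketch replays a block-access trace against a plain LRU cache and counts hits and misses. LRU is used here only as a familiar baseline policy; it is not the replacement algorithm proposed in the dissertation, and the class name, function name, and trace are assumptions made for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used block."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.blocks = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def access(self, block_id):
        """Record one access; return True on a hit, False on a miss."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)    # evict least recently used
        self.blocks[block_id] = True
        return False


def simulate(trace, capacity):
    """Replay `trace` (an iterable of block ids) and return (hits, misses)."""
    cache = LRUCache(capacity)
    for block_id in trace:
        cache.access(block_id)
    return cache.hits, cache.misses


# A looping access pattern over 4 blocks, repeated 3 times (hypothetical trace).
loop_trace = [1, 2, 3, 4] * 3
print(simulate(loop_trace, 3))  # cache one block smaller than the loop
print(simulate(loop_trace, 4))  # cache large enough to hold the loop
```

With a cache one block smaller than the loop, LRU evicts each block just before it is needed again, so every access misses. Weak-locality patterns of this kind are a well-known motivation for replacement policies that look beyond recency.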