BenchPress: Analyzing Android App Vulnerability Benchmark Suites

Author: Mitra Joydeep
Narkar Aditya
Ranganath Venkatesh-Prasad
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 19/09/2019

In recent years, various benchmark suites have been developed to evaluate the efficacy of Android security analysis tools. The choice of benchmark suites used in tool evaluations is often based on the availability and popularity of suites and not on their characteristics and relevance. One reason for such choices is the lack of information about the characteristics and relevance of benchmark suites. In this context, we empirically evaluated four Android-specific benchmark suites: DroidBench, Ghera, IccBench, and UBCBench. For each benchmark suite, we identified the APIs used by the suite that were discussed on Stack Overflow in the context of Android app development and measured the usage of these APIs in a sample of 227K real-world apps (coverage). We also compared each pair of benchmark suites to identify the differences between them in terms of API usage. Finally, we identified security-related APIs used in real-world apps but not in any of the above benchmark suites to assess the opportunities to extend benchmark suites (gaps). The findings in this paper can help 1) Android security analysis tool developers choose benchmark suites that are best suited to evaluate their tools (informed by coverage and pairwise comparison) and 2) Android app vulnerability benchmark creators develop and extend benchmark suites (informed by gaps).
Comment: Updates based on AMobile 2019 review
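The three measures named in the abstract (coverage, pairwise comparison, and gaps) can be read as set operations over API names. The following is a minimal sketch under that assumption, not the authors' actual analysis pipeline: the function names, the toy inputs, and the representation of each app or suite as a plain set of API strings are all hypothetical.

```python
from typing import Dict, Set


def coverage(suite_apis: Set[str], app_apis: Dict[str, Set[str]]) -> Dict[str, float]:
    """For each API used by a suite, return the fraction of sampled apps using it."""
    total = len(app_apis)
    if total == 0:
        return {api: 0.0 for api in suite_apis}
    return {
        api: sum(api in used for used in app_apis.values()) / total
        for api in suite_apis
    }


def pairwise_difference(suite_a: Set[str], suite_b: Set[str]) -> Set[str]:
    """APIs used by suite_a but not by suite_b."""
    return suite_a - suite_b


def gaps(
    app_apis: Dict[str, Set[str]],
    suites: Dict[str, Set[str]],
    security_apis: Set[str],
) -> Set[str]:
    """Security-related APIs seen in apps but absent from every suite."""
    used_in_apps = set().union(*app_apis.values())
    used_in_suites = set().union(*suites.values())
    return (used_in_apps & security_apis) - used_in_suites


# Hypothetical toy data; real inputs would come from static analysis of APKs.
apps = {"app1": {"A", "B"}, "app2": {"B", "C"}}
suites = {"S1": {"A", "B"}, "S2": {"B"}}
print(coverage(suites["S1"], apps))
print(pairwise_difference(suites["S1"], suites["S2"]))
print(gaps(apps, suites, security_apis={"C", "D"}))
```

Sets keep each measure to a single expression, at the cost of ignoring how often an API is called within one app.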

arXiv.org e-Print Archive

Crossref

A benchmark suite for evaluating the performance of the WebODE Ontology Engineering Platform

Author: García-Castro Raúl
Gómez-Pérez A.
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/2004

Ontology tools play a key role in the development and maintenance of the Semantic Web. Hence, we need on the one hand to evaluate these tools objectively, in order to analyse whether they can deal with current and future requirements, and on the other hand to develop benchmark suites for performing these evaluations. In this paper, we describe the method we followed to design and implement a benchmark suite for evaluating the performance of the WebODE ontology engineering workbench, along with the conclusions obtained after using this benchmark suite to evaluate WebODE.

CiteSeerX

Archivo Digital UPM

Automatic generation of synthetic workloads for multicore systems

Author: Ganesan Karthik
Publication date: 11/07/2012

When designing a computer system, benchmark programs are used with cycle-accurate performance/power simulators and HDL-level simulators to evaluate novel architectural enhancements, perform design space exploration, understand the worst-case power characteristics of various designs and find performance bottlenecks. This research effort is directed towards automatically generating synthetic benchmarks to tackle three design challenges: 1) For most simulation-related purposes, full runs of modern real-world parallel applications such as the PARSEC and SPLASH suites cannot be used, as they take machine weeks on cycle-accurate and HDL-level simulators, incurring a prohibitively large time cost. 2) Some of these real-world applications are intellectual property and cannot be shared with processor vendors for design studies. 3) The most significant problem in the design stage is the complexity involved in fixing the maximum power consumption of a multicore design, called the Thermal Design Power (TDP). To fix this maximum power consumption at the most optimal point, designers typically hand-craft candidate code snippets called power viruses, but manually writing such maximum-power-consuming code snippets is very tedious. All of these challenges have led to the resurrection of synthetic benchmarks in the recent past as a promising solution. During the design stage of a multicore system, the availability of a framework to automatically generate system-level synthetic benchmarks for multicore systems will greatly simplify the design process and result in more confident design decisions.
The key idea behind such an adaptable benchmark synthesis framework is to identify the key characteristics of real-world parallel applications that affect the performance and power consumption of a real program and to create synthetic executable programs by varying the values of these characteristics. Firstly, with such a framework, one can generate miniaturized synthetic clones of large target (current and future) parallel applications, enabling an architect to use them with slow low-level simulation models (e.g., RTL models in VHDL/Verilog) and helping to tailor designs to the targeted applications. These synthetic benchmark clones can be distributed to architects and designers even if the original applications are intellectual property and not publicly available. Lastly, such a framework can be used to automatically create maximum-power-consuming code snippets to help in fixing the TDP, heat sinks, cooling system and other power-related features of the system. The workload cloning framework built using the proposed synthetic benchmark generation methodology is evaluated to show its superiority over the existing cloning methodologies for single-core systems by generating miniaturized clones for CPU2006 and ImplantBench workloads with an average error of only 2.9% in performance for up to five orders of magnitude of simulation speedup. The correlation coefficient predicting the sensitivity to design changes is 0.95 and 0.98 for performance and power consumption, respectively. The proposed framework is also evaluated by cloning parallel applications implemented with p-threads and OpenMP in the PARSEC benchmark suite. The average error in predicting performance is 4.87% and that in predicting power consumption is 2.73%. The correlation coefficient predicting the sensitivity to design changes is 0.92 for performance.
The efficacy of the proposed synthetic benchmark generation framework for power virus generation is evaluated on the SPARC, Alpha and x86 ISAs using full-system simulators and also using real hardware. The results show that the power viruses generated for single-core systems consume 14-41% more power than MPrime on the SPARC ISA. Similarly, the power viruses generated for multicore systems consume 45-98%, 40-89% and 41-56% more power than PARSEC workloads, multiple copies of MPrime, and multithreaded SPECjbb, respectively.
Electrical and Computer Engineerin

Texas ScholarWorks

Speeding-up model-based fault injection of deep-submicron CMOS fault models through dynamic and partially reconfigurable FPGAS

Author: Andrés Martínez David de
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 07/05/2008

Deep-submicron CMOS technologies are now fundamental to the development of modern computer-based systems, whose use greatly simplifies our daily lives in a wide variety of settings, such as e-government, e-commerce and e-banking, and ground and aerospace transportation. The continuous reduction in transistor size has made it possible to lower their power consumption and raise their operating frequency, thereby achieving higher overall performance. However, the same characteristics that improve system performance negatively affect its dependability. The use of small, low-power, high-speed transistors is increasing both the diversity of faults that can affect the system and their probability of occurrence. There is therefore great interest in developing new and efficient techniques for evaluating the dependability, in the presence of faults, of systems manufactured with submicron technologies. This problem can be addressed by deliberately introducing faults into the system, a technique known as fault injection. In this context, model-based injection is very attractive, since it allows the dependability of the system to be evaluated in the early stages of its development cycle, thereby reducing the cost of correcting errors. However, the simulation time of large and complex models often makes its application infeasible. This thesis focuses on the use of FPGA (Field-Programmable Gate Array) programmable logic devices to accelerate simulation-based fault injection experiments by implementing them in reconfigurable hardware.
To this end, existing research on FPGA-based fault injection is extended in two different directions: i) a study of existing submicron technologies is carried out to obtain a representative set of transient fault models
Andrés Martínez, DD. (2007). Speeding-up model-based fault injection of deep-submicron CMOS fault models through dynamic and partially reconfigurable FPGAS [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/1943 Palanci

Crossref

RiuNet

Answer Summarization for Technical Queries: Benchmark and New Approach

Author: Yang Chengran
Han DongGyun
He Junda
Lo David
Shi Jieke
Shi Yucen
Thung Ferdian
Xu Bowen
Yang Zhou
Zhang Ting
Zhou Xin
Publication date: 22/09/2022

Prior studies have demonstrated that approaches to generate an answer summary for a given technical query in Software Question and Answer (SQA) sites are desired. We find that existing approaches are assessed solely through user studies. There is a need for a benchmark with ground truth summaries to complement assessment through user studies. Unfortunately, no such benchmark exists for answer summarization for technical queries from SQA sites. To fill the gap, we manually construct a high-quality benchmark to enable automatic evaluation of answer summarization for technical queries for SQA sites. Using the benchmark, we comprehensively evaluate the performance of existing approaches and find that there is still considerable room for improvement. Motivated by the results, we propose a new approach, TechSumBot, with three key modules: 1) Usefulness Ranking module, 2) Centrality Estimation module, and 3) Redundancy Removal module. We evaluate TechSumBot both automatically (i.e., using our benchmark) and manually (i.e., via a user study). The results from both evaluations consistently demonstrate that TechSumBot outperforms the best performing baseline approaches from both SE and NLP domains by a large margin, i.e., 10.83%-14.90%, 32.75%-36.59%, and 12.61%-17.54% in terms of ROUGE-1, ROUGE-2, and ROUGE-L on automatic evaluation, and 5.79%-9.23% and 17.03%-17.68% in terms of average usefulness and diversity score on human evaluation. This highlights that the automatic evaluation of our benchmark can uncover findings similar to the ones found through user studies. More importantly, automatic evaluation has a much lower cost, especially when it is used to assess a new approach. We also conducted an ablation study, which demonstrates that each module in TechSumBot contributes to boosting the overall performance of TechSumBot.
Comment: Accepted by ASE 202
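The automatic evaluation above is reported in ROUGE-1, ROUGE-2, and ROUGE-L. As an illustration of what the n-gram variants measure, here is a simplified sketch of ROUGE-N as clipped n-gram overlap between a candidate summary and a ground-truth summary. It assumes lowercased whitespace tokenization with no stemming, so its numbers will not match an official ROUGE implementation, and it is not the authors' evaluation code. The function name `rouge_n` and the example strings are hypothetical.

```python
from collections import Counter
from typing import Dict


def rouge_n(candidate: str, reference: str, n: int = 1) -> Dict[str, float]:
    """Simplified ROUGE-N: clipped n-gram overlap precision, recall and F1."""

    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical candidate and ground-truth summaries.
print(rouge_n("use a list", "use a dict", n=1))
print(rouge_n("use a list", "use a dict", n=2))
```

ROUGE-L, which scores the longest common subsequence rather than fixed-length n-grams, is not covered by this sketch.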

arXiv.org e-Print Archive

Institutional Knowledge at Singapore Management University

Royal Holloway - Pure

Answer Summarization for Technical Queries: Benchmark and New Approach

Author: Han Donggyun
He Junda
Lo David
Shi Jieke
Shi Yucen
Thung Ferdian
Xu Bowen
Yang Chengran
Yang Zhou
Zhang Ting
Zhou Xin
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 05/01/2023

Royal Holloway - Pure