1,073 research outputs found

    Doing Research. Wissenschaftspraktiken zwischen Positionierung und Suchanfrage

    Research is increasingly thought of in terms of its results, not least because of the upheavals in the academic system. This volume, however, directs attention to the processes that make research results possible in the first place and give scholarship its contours. The title Doing Research points to the fact that research activity is shaped by specific positionings, partial perspectives, and search movements. Accordingly, all contributors engage reflexively with their own research practices. The starting point is abbreviations, the supposedly smallest units of scholarly negotiation and mutual understanding. Anchored in educational science, the social sciences, media studies, and art studies, the volume draws a multidimensional picture of contemporary research, with transdisciplinary points of connection between digitality and education. (DIPF/Orig.)

    Evaluation Methodologies in Software Protection Research

    Man-at-the-end (MATE) attackers have full control over the system on which the attacked software runs and try to break the confidentiality or integrity of assets embedded in the software. Both companies and malware authors want to prevent such attacks. This has driven an arms race between attackers and defenders, resulting in a plethora of different protection and analysis methods. However, it remains difficult to measure the strength of protections, because MATE attackers can reach their goals in many different ways and a universally accepted evaluation methodology does not exist. This survey systematically reviews the evaluation methodologies of papers on obfuscation, a major class of protections against MATE attacks. For 572 papers, we collected 113 aspects of their evaluation methodologies, ranging from sample set types and sizes through sample treatment to the measurements performed. We provide detailed insights into how the academic state of the art evaluates both the protections and the analyses performed on them. In summary, there is a clear need for better evaluation methodologies. We identify nine challenges for software protection evaluations, which represent threats to the validity, reproducibility, and interpretation of research results in the context of MATE attacks.

    Reliable Fix Patterns Inferred from Static Checkers for Automated Program Repair

    Fix pattern-based patch generation is a promising direction in automated program repair (APR). Notably, it has been demonstrated to produce more acceptable and correct patches than patches obtained with mutation operators through genetic programming. The performance of pattern-based APR systems, however, depends on the fix ingredients mined from fix changes in development histories. Unfortunately, collecting a reliable set of bug fixes from repositories can be challenging. In this article, we investigate the possibility of leveraging, in an APR scenario, fix patterns inferred from code changes that address violations detected by static analysis tools. To that end, we build a fix pattern-based APR tool, Avatar, which exploits fix patterns of static analysis violations as ingredients for generating patches that repair semantic bugs. Evaluated on four benchmarks (i.e., Defects4J, Bugs.jar, BEARS, and QuixBugs), Avatar demonstrates the feasibility of fixing semantic bugs with fix patterns inferred from patches for static analysis violations, correctly fixing 26 semantic bugs when implemented with the standard program repair pipeline. We also find that Avatar achieves performance comparable to that of closely related approaches in the literature. Compared with CoCoNut, Avatar fixes 18 new bugs in Defects4J and 3 new bugs in QuixBugs. Compared with HDRepair, JAID, and SketchFix, Avatar newly fixes 14 Defects4J bugs. In terms of the number of correctly fixed bugs, Avatar is also comparable to program repair tools under the standard fault localization setting and performs better than most of them. These results imply that Avatar is complementary to current program repair approaches. We further find that Avatar's bug-fixing performance varies with the fault localization tool it is configured with, and that stack trace information from failed test executions can be exploited to improve its bug-fixing performance, fixing more bugs with fewer generated patch candidates. Overall, our study highlights the relevance of static bug-finding tools as indirect contributors of fix ingredients for addressing code defects identified with functional test cases (i.e., dynamic information).
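
    A rough Python sketch of the generic repair loop such a pattern-based APR tool runs: mined fix patterns are tried on statements ranked by fault localization until a candidate passes the test suite. The pattern representation and all names below are assumptions made for illustration, not Avatar's actual implementation or API.

    # Illustrative sketch only; names and data structures are assumptions,
    # not Avatar's actual interface.
    from dataclasses import dataclass
    from typing import Callable, Iterable, List, Optional

    @dataclass
    class FixPattern:
        """A code-change template, e.g. mined from patches that fix static analysis violations."""
        name: str
        matches: Callable[[str], bool]   # does the pattern apply to this statement?
        rewrite: Callable[[str], str]    # produce the patched statement

    def repair(statements: List[str],
               suspicious: Iterable[int],          # statement indices ranked by fault localization
               patterns: List[FixPattern],
               tests_pass: Callable[[List[str]], bool]) -> Optional[List[str]]:
        """Return the first patched program that passes the test suite, or None."""
        for idx in suspicious:
            for pattern in patterns:
                if not pattern.matches(statements[idx]):
                    continue
                candidate = list(statements)
                candidate[idx] = pattern.rewrite(statements[idx])
                if tests_pass(candidate):
                    return candidate             # a test-adequate (plausible) patch
        return None

    if __name__ == "__main__":
        # Toy demonstration: apply a boundary-condition pattern to one suspicious line.
        program = ["total = 0", "if i > len(items):", "    total += 1"]
        relax_bound = FixPattern(
            name="relax-bound",
            matches=lambda s: "> len(" in s,
            rewrite=lambda s: s.replace("> len(", ">= len("),
        )
        # The test oracle here is a stand-in for running an actual test suite.
        fixed = repair(program, suspicious=[1], patterns=[relax_bound],
                       tests_pass=lambda p: ">= len(" in p[1])
        print(fixed)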

    Exploring annotations for deductive verification


    Towards an Understanding of Large Language Models in Software Engineering Tasks

    Large Language Models (LLMs) have drawn widespread attention and research interest due to their astounding performance in tasks such as text generation and reasoning. Derivative products, like ChatGPT, have been extensively deployed and are highly sought after. Meanwhile, the evaluation and optimization of LLMs for software engineering tasks, such as code generation, has become a research focus. However, there is still a lack of systematic research on the application and evaluation of LLMs in the field of software engineering. Therefore, this paper is the first to comprehensively investigate and collate the research and products combining LLMs with software engineering, aiming to answer two questions: (1) What are the current integrations of LLMs with software engineering? (2) Can LLMs effectively handle software engineering tasks? To find the answers, we collected related literature as extensively as possible from seven mainstream databases and selected 123 papers for analysis. We categorized these papers in detail and reviewed the current research status of LLMs from the perspective of seven major software engineering tasks, hoping this will help researchers better grasp research trends and address open issues when applying LLMs. We also organized and presented papers with evaluation content to reveal the performance and effectiveness of LLMs in various software engineering tasks, providing guidance for researchers and developers on optimization.

    Towards Human-centered Explainable AI: A Survey of User Studies for Model Explanations

    Explainable AI (XAI) is widely viewed as a sine qua non for ever-expanding AI research. A better understanding of the needs of XAI users, as well as human-centered evaluations of explainable models, are both a necessity and a challenge. In this paper, we explore how HCI and AI researchers conduct user studies in XAI applications based on a systematic literature review. After identifying and thoroughly analyzing 97 core papers with human-based XAI evaluations over the past five years, we categorize them along the measured characteristics of explanatory methods, namely trust, understanding, usability, and human-AI collaboration performance. Our research shows that XAI is spreading more rapidly in certain application domains, such as recommender systems, than in others, but that user evaluations are still rather sparse and incorporate hardly any insights from cognitive or social sciences. Based on a comprehensive discussion of best practices, i.e., common models, design choices, and measures in user studies, we propose practical guidelines for designing and conducting user studies for XAI researchers and practitioners. Lastly, this survey also highlights several open research directions, particularly linking psychological science and human-centered XAI.

    Farming out: a study.

    Farming is one of several ways of arranging for a group of individuals to perform work simultaneously. Farming is attractive: it is a simple concept, and yet it allocates work dynamically, balancing the load automatically. This gives rise to potentially great efficiency; yet it had not been classified which applications can be farmed efficiently and which implementation strategies are the most effective. This research has investigated the types of application, design and implementation that farm efficiently on computer systems constructed from a network of communicating parallel processors. It shows that all applications can be farmed and identifies the concerns that dictate efficiency. For the first generation of transputer hardware, extensive experiments have been performed using Occam, independent of any specific application. This study identified the boundary conditions that dictate which design parameters farm efficiently. These boundary conditions are expressed in a general form that is directly amenable to other architectures. The specific quantitative results are of direct use to others who wish to implement farms on this architecture. Because of farming's simplicity and potential for high efficiency, this work concludes that architects of parallel hardware should consider binding this paradigm into future systems, so that the dynamic allocation of processes to processors takes place automatically. As well as resulting in high levels of machine utilisation for all programs, this would also permanently remove the burden of allocation from the programmer.
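
    A minimal Python sketch of the farming paradigm the abstract describes: a farmer hands out work items one at a time to whichever worker is idle, so the load balances itself. The thesis studies this on transputer networks in Occam; the sketch below only illustrates the idea of dynamic allocation, under the assumption of a simple process pool.

    # Farming sketch: work items are dispatched on demand to a pool of workers,
    # so a worker that finishes early immediately picks up the next item.
    from multiprocessing import Pool

    def work(item: int) -> int:
        """Stand-in for one independent unit of work; its cost varies with the item."""
        total = 0
        for i in range(item * 10_000):
            total += i * i
        return total

    if __name__ == "__main__":
        items = list(range(1, 101))
        # chunksize=1 hands out a single item at a time, which is what makes the
        # allocation dynamic and the load balancing automatic.
        with Pool(processes=4) as farm:
            results = farm.map(work, items, chunksize=1)
        print(len(results), "results collected by the farmer")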

    Parallel and Flow-Based High Quality Hypergraph Partitioning

    Balanced hypergraph partitioning is a classic NP-hard optimization problem that is a fundamental tool in such diverse disciplines as VLSI circuit design, route planning, sharding distributed databases, optimizing communication volume in parallel computing, and accelerating the simulation of quantum circuits. Given a hypergraph and an integer k, the task is to divide the vertices into k disjoint blocks of bounded size, while minimizing an objective function on the hyperedges that span multiple blocks. In this dissertation we consider the most commonly used objective, the connectivity metric, where we aim to minimize the number of different blocks connected by each hyperedge. The most successful heuristic for balanced partitioning is the multilevel approach, which consists of three phases. In the coarsening phase, vertex clusters are contracted to obtain a sequence of structurally similar but successively smaller hypergraphs. Once the hypergraph is sufficiently small, an initial partition is computed. Lastly, the contractions are successively undone in reverse order, and an iterative improvement algorithm is employed to refine the projected partition on each level. An important aspect in designing practical heuristics for optimization problems is the trade-off between solution quality and running time. The appropriate trade-off depends on the specific application, the size of the data sets, and the computational resources available to solve the problem. Existing algorithms are either slow, sequential, and offer high solution quality, or are simple, fast, easy to parallelize, and offer low quality. While this trade-off cannot be avoided entirely, our goal is to close the gaps as much as possible. We achieve this by improving the state of the art in all non-trivial areas of the trade-off landscape with only a few techniques, employed in two different ways. Furthermore, most research on parallelization has focused on distributed memory, which neglects the greater flexibility of shared-memory algorithms and the wide availability of commodity multi-core machines. In this thesis, we therefore design and revisit fundamental techniques for each phase of the multilevel approach, and develop highly efficient shared-memory parallel implementations thereof. We consider two iterative improvement algorithms, one based on the Fiduccia-Mattheyses (FM) heuristic and one based on label propagation. For these, we propose a variety of techniques to improve the accuracy of gains when moving vertices in parallel, as well as low-level algorithmic improvements. For coarsening, we present a parallel variant of greedy agglomerative clustering with a novel method to resolve cluster join conflicts on the fly. Combined with a preprocessing phase for coarsening based on community detection, a portfolio of from-scratch partitioning algorithms, and recursive partitioning with work-stealing, we obtain our first parallel multilevel framework. It is the fastest partitioner known and achieves medium-high quality, beating all parallel partitioners and coming close to the highest-quality sequential partitioner. Our second contribution is a parallelization of an n-level approach, where only one vertex is contracted and uncontracted on each level. This extreme approach aims at high solution quality via very fine-grained, localized refinement, but seems inherently sequential. We devise an asynchronous n-level coarsening scheme based on a hierarchical decomposition of the contractions, as well as a batch-synchronous and later fully asynchronous uncoarsening. In addition, we adapt our refinement algorithms and also use the preprocessing and the portfolio. This scheme is highly scalable and achieves the same quality as the highest-quality sequential partitioner (which is based on the same components), but is of course slower than our first framework due to the fine-grained uncoarsening. The last ingredient for high quality is an iterative improvement algorithm based on maximum flows. In the sequential setting, we first improve an existing idea by solving incremental maximum flow problems, which leads to smaller cuts and is faster due to engineering efforts. Subsequently, we parallelize the maximum flow algorithm and schedule refinements in parallel. Beyond the pursuit of the highest quality, we present a deterministically parallel partitioning framework. We develop deterministic versions of the preprocessing, coarsening, and label propagation refinement. Experimentally, we demonstrate that the penalties for determinism in terms of partition quality and running time are very small. All of our claims are validated through extensive experiments, comparing our algorithms with state-of-the-art solvers on large and diverse benchmark sets. To foster further research, we make our contributions available in our open-source framework Mt-KaHyPar. While it seems inevitable that, with ever-increasing problem sizes, we must transition to distributed-memory algorithms, the study of shared-memory techniques is not in vain. With the multilevel approach, even inherently slow techniques have a role to play in fast systems, as they can be employed to boost quality on coarse levels at little expense. Similarly, techniques for shared-memory parallelism are important, both as soon as a coarse graph fits into memory and as local building blocks in the distributed algorithm.
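
    For concreteness, a small Python sketch of the connectivity objective in its commonly used (lambda - 1) form: for each hyperedge, count the number of distinct blocks it spans and sum these counts minus one over all hyperedges. This is an illustration under that assumption; it ignores hyperedge weights and the balance constraint on block sizes.

    # Connectivity objective sketch: sum over hyperedges of (number of distinct
    # blocks the hyperedge connects) - 1. An uncut hyperedge contributes 0.
    from typing import Dict, List

    def connectivity_metric(hyperedges: List[List[int]], block_of: Dict[int, int]) -> int:
        total = 0
        for edge in hyperedges:
            blocks = {block_of[v] for v in edge}   # distinct blocks spanned by this edge
            total += len(blocks) - 1
        return total

    # Toy example: 6 vertices, k = 2 blocks, three hyperedges.
    edges = [[0, 1, 2], [2, 3], [3, 4, 5]]
    partition = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
    print(connectivity_metric(edges, partition))    # -> 1, only [2, 3] is cut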

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    Modern data analytics in the cloud era

    Cloud computing has been the groundbreaking technology of the last decade. The ease of use of the managed environment, in combination with a nearly infinite amount of resources and a pay-per-use price model, enables fast and cost-efficient project realization for a broad range of users. Cloud computing also changes the way software is designed, deployed and used. This thesis focuses on database systems deployed in the cloud environment. We identify three major interaction points of the database engine with the environment that show changed requirements compared to traditional on-premise data warehouse solutions. First, software is deployed on elastic resources. Consequently, systems should support elasticity in order to match workload requirements and be cost-effective. We present an elastic scaling mechanism for distributed database engines, combined with a partition manager that provides load balancing while minimizing partition reassignments in the case of elastic scaling. Furthermore, we introduce a buffer pre-heating strategy that mitigates the cold start after scaling and yields an immediate performance benefit from the newly added resources. Second, cloud-based systems are accessible and available from nearly everywhere. Consequently, data is frequently ingested from numerous endpoints, which differs from bulk loads or ETL pipelines in a traditional data warehouse solution. Many users do not define database constraints in order to avoid transaction aborts due to conflicts or to speed up data ingestion. To mitigate this issue, we introduce the concept of PatchIndexes, which allow the definition of approximate constraints. PatchIndexes maintain exceptions to constraints, make them usable in query optimization and execution, and offer efficient update support. The concept can be applied to arbitrary constraints, and we provide examples of approximate uniqueness and approximate sorting constraints. Moreover, we show how PatchIndexes can be exploited to define advanced constraints like an approximate multi-key partitioning, which offers robust query performance over workloads with different partition key requirements. Third, data-centric workloads have changed over the last decade. Besides traditional SQL workloads for business intelligence, data science workloads are of significant importance nowadays. In these cases the database system often acts only as a data provider, while the computational effort takes place in data science or machine learning (ML) environments. As this workflow has several drawbacks, we follow the goal of pushing advanced analytics towards the database engine and introduce the Grizzly framework as a DataFrame-to-SQL transpiler. Based on this, we identify user-defined functions (UDFs) and ML inference as important tasks that would benefit from deeper engine integration, and we investigate and evaluate approaches for in-database execution of Python UDFs and in-database ML inference.
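
    A hypothetical Python sketch of the PatchIndex idea for an approximate uniqueness constraint: the column is treated as unique except for a small, explicitly maintained set of exception values, and a DISTINCT-style operator can skip deduplication for the clean part. The class and function names are invented for this illustration and do not reflect the engine's actual interface.

    # Hypothetical PatchIndex-style structure for approximate uniqueness:
    # exceptions to the constraint are stored explicitly and handled separately.
    from typing import Iterable, List, Set

    class ApproxUniqueIndex:
        def __init__(self, values: Iterable[str]):
            seen: Set[str] = set()
            self.exceptions: Set[str] = set()     # values that violate uniqueness
            for v in values:
                if v in seen:
                    self.exceptions.add(v)
                seen.add(v)

        def is_clean(self, value: str) -> bool:
            return value not in self.exceptions

    def distinct(values: List[str], idx: ApproxUniqueIndex) -> List[str]:
        """DISTINCT using the approximate constraint: the clean part needs no
        deduplication; only the exception values are deduplicated."""
        clean = [v for v in values if idx.is_clean(v)]
        dirty = list({v for v in values if not idx.is_clean(v)})
        return clean + dirty

    rows = ["a", "b", "c", "b", "d"]
    idx = ApproxUniqueIndex(rows)
    print(distinct(rows, idx))    # ['a', 'c', 'd', 'b'] (result order is not significant)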