35 research outputs found
Automated tailoring of system software stacks
In many industrial sectors, device manufacturers are moving away from expensive special-purpose hardware units and consolidate their systems on commodity hardware. As part of this change, developers are enabled to run their applications on general-purpose operating systems like Linux, which already supports thousands of different devices out of the box and can be used in a wide range of target scenarios. Furthermore, the Linux ecosystem allows them to integrate existing implementations of standard functionality in the form of shared libraries.
However, as the libraries and the Linux kernel are designed as generic building blocks in order to support as many applications as possible, they cannot make assumptions about specific use cases for a single-purpose device. This generality leads to unnecessary overheads in narrowly defined target scenarios, as unneeded components do not only take up space on the target system but have to be maintained over the lifetime of the device as well. While the Linux kernel provides a configuration system to disable unneeded functionality like device drivers, determining the required features from over 16000 options is an infeasible task. Even worse, most shared libraries cannot be customized even though only around 10 percent of their functions are ever used by applications.
In this thesis, I present my approaches for the automated identification and removal of unnecessary components in all layers of the software stack. As the configuration system is an integral part of the Linux kernel, we embrace its presence and automatically generate custom-fitted configurations for observed target scenarios with the help of an extracted variability model. For the much more diverse realm of shared libraries, with different programming languages, build systems, and a lack of configurability, I demonstrate a different approach. By identifying individual functions as logically distinct units, we construct a symbol-level dependency graph across the applications and all their required libraries. We then remove unneeded code at the binary level and rearrange the remaining parts to take up minimal space in the binary file by formulating their placement as an optimization problem. To lower the number of unnecessary updates to unused components in a deployed system, I lastly present an automated method to determine the impact of software changes on a target scenario and provide guidance for developers on whether they need to update their systems.
Applying these techniques to different target systems, I demonstrate that we can disable up to 87 percent of configuration options in a Debian Linux kernel, shrink the size of an embedded OpenWrt kernel by 59 percent, and speed up the boot process of the embedded system by 21 percent. As part of the shared library tailoring process, we can remove 13060 functions from all libraries in OpenWrt and reduce their total size by 31 percent. In the memcached Docker container, we identify 381 entirely unneeded shared libraries and shrink the container image size by 82 percent. An analysis of the development history of two large library projects over the course of more than two years further shows that between 68 and 82 percent of all changes are not required for an OpenWrt appliance, reducing the number of patch days by up to 69 percent.
These results demonstrate the broad applicability of our automated methods for both the Linux kernel and shared libraries to a wide range of scenarios. From embedded systems to server applications, custom-tailored system software stacks contribute to the reduction of overheads in space and time
Exploring Automated Code Evaluation Systems and Resources for Code Analysis: A Comprehensive Survey
The automated code evaluation system (AES) is mainly designed to reliably
assess user-submitted code. Due to their extensive range of applications and
the accumulation of valuable resources, AESs are becoming increasingly popular.
Research on the application of AES and their real-world resource exploration
for diverse coding tasks is still lacking. In this study, we conducted a
comprehensive survey on AESs and their resources. This survey explores the
application areas of AESs, available resources, and resource utilization for
coding tasks. AESs are categorized into programming contests, programming
learning and education, recruitment, online compilers, and additional modules,
depending on their application. We explore the available datasets and other
resources of these systems for research, analysis, and coding tasks. Moreover,
we provide an overview of machine learning-driven coding tasks, such as bug
detection, code review, comprehension, refactoring, search, representation, and
repair. These tasks are performed using real-life datasets. In addition, we
briefly discuss the Aizu Online Judge platform as a real example of an AES from
the perspectives of system design (hardware and software), operation
(competition and education), and research. This is due to the scalability of
the AOJ platform (programming education, competitions, and practice), open
internal features (hardware and software), attention from the research
community, open source data (e.g., solution codes and submission documents),
and transparency. We also analyze the overall performance of this system and
the perceived challenges over the years
Model-driven engineering techniques and tools for machine learning-enabled IoT applications: A scoping review
This paper reviews the literature on model-driven engineering (MDE) tools and languages for the internet of things (IoT). Due to the abundance of big data in the IoT, data analytics and machine learning (DAML) techniques play a key role in providing smart IoT applications. In particular, since a significant portion of the IoT data is sequential time series data, such as sensor data, time series analysis techniques are required. Therefore, IoT modeling languages and tools are expected to support DAML methods, including time series analysis techniques, out of the box. In this paper, we study and classify prior work in the literature through the mentioned lens and following the scoping review approach. Hence, the key underlying research questions are what MDE approaches, tools, and languages have been proposed and which ones have supported DAML techniques at the modeling level and in the scope of smart IoT services.info:eu-repo/semantics/publishedVersio
Unmanned Aerial Vehicle (UAV)-Enabled Wireless Communications and Networking
The emerging massive density of human-held and machine-type nodes implies larger traffic deviatiolns in the future than we are facing today. In the future, the network will be characterized by a high degree of flexibility, allowing it to adapt smoothly, autonomously, and efficiently to the quickly changing traffic demands both in time and space. This flexibility cannot be achieved when the network’s infrastructure remains static. To this end, the topic of UAVs (unmanned aerial vehicles) have enabled wireless communications, and networking has received increased attention. As mentioned above, the network must serve a massive density of nodes that can be either human-held (user devices) or machine-type nodes (sensors). If we wish to properly serve these nodes and optimize their data, a proper wireless connection is fundamental. This can be achieved by using UAV-enabled communication and networks. This Special Issue addresses the many existing issues that still exist to allow UAV-enabled wireless communications and networking to be properly rolled out
Streamlining code smells: Using collective intelligence and visualization
Context. Code smells are seen as major source of technical debt and, as such, should be detected and removed. Code smells have long been catalogued with corresponding mitigating solutions called refactoring operations. However, while the latter are supported in current IDEs (e.g., Eclipse), code smells detection scaffolding has still many limitations. Researchers argue that the subjectiveness of the code smells detection process is a major hindrance to mitigate the problem of smells-infected code.
Objective. This thesis presents a new approach to code smells detection that we have called CrowdSmelling and the results of a validation experiment for this approach. The latter is based on supervised machine learning techniques, where the wisdom of the crowd (of software developers) is used to collectively calibrate code smells detection algorithms, thereby lessening the subjectivity issue.
Method. In the context of three consecutive years of a Software Engineering course, a total “crowd” of around a hundred teams, with an average of three members each, classified the presence of 3 code smells (Long Method, God Class, and Feature Envy) in Java source code. These classifications were the basis of the oracles used for training six machine learning algorithms.
Over one hundred models were generated and evaluated to determine which machine learning algorithms had the best performance in detecting each of the aforementioned code smells.
Results. Good performances were obtained for God Class detection (ROC=0.896 for Naive Bayes) and Long Method detection (ROC=0.870 for AdaBoostM1), but much lower for Feature Envy (ROC=0.570 for Random Forrest).
Conclusions. Obtained results suggest that Crowdsmelling is a feasible approach for the detection of code smells, but further validation experiments are required to cover more code smells and to increase external validityContexto. Os cheiros de código são a principal causa de dívida técnica (technical debt), como tal, devem ser detectados e removidos. Os cheiros de código já foram há muito tempo catalogados juntamente com as correspondentes soluções mitigadoras chamadas operações de refabricação (refactoring). No entanto, embora estas últimas sejam suportadas nas IDEs
actuais (por exemplo, Eclipse), a deteção de cheiros de código têm ainda muitas limitações. Os investigadores argumentam que a subjectividade do processo de deteção de cheiros de código é um dos principais obstáculo à mitigação do problema da qualidade do código.
Objectivo. Esta tese apresenta uma nova abordagem à detecção de cheiros de código, a que chamámos CrowdSmelling, e os resultados de uma experiência de validação para esta abordagem. A nossa abordagem de CrowdSmelling baseia-se em técnicas de aprendizagem automática supervisionada, onde a sabedoria da multidão (dos programadores de software) é
utilizada para calibrar colectivamente algoritmos de detecção de cheiros de código, diminuindo assim a questão da subjectividade.
Método. Em três anos consecutivos, no âmbito da Unidade Curricular de Engenharia de Software, uma "multidão", num total de cerca de uma centena de equipas, com uma média de três membros cada, classificou a presença de 3 cheiros de código (Long Method, God Class, and Feature Envy) em código fonte Java. Estas classificações foram a base dos oráculos utilizados para o treino de seis algoritmos de aprendizagem automática. Mais de cem modelos foram gerados e avaliados para determinar quais os algoritmos de aprendizagem de máquinas com melhor desempenho na detecção de cada um dos cheiros de código acima mencionados.
Resultados. Foram obtidos bons desempenhos na detecção do God Class (ROC=0,896 para Naive Bayes) e na detecção do Long Method (ROC=0,870 para AdaBoostM1), mas muito mais baixos para Feature Envy (ROC=0,570 para Random Forrest).
Conclusões. Os resultados obtidos sugerem que o Crowdsmelling é uma abordagem viável para a detecção de cheiros de código, mas são necessárias mais experiências de validação para cobrir mais cheiros de código e para aumentar a validade externa
A Decade of Code Comment Quality Assessment: A Systematic Literature Review
Code comments are important artifacts in software systems and play a
paramount role in many software engineering (SE) tasks related to maintenance
and program comprehension. However, while it is widely accepted that high
quality matters in code comments just as it matters in source code, assessing
comment quality in practice is still an open problem. First and foremost, there
is no unique definition of quality when it comes to evaluating code comments.
The few existing studies on this topic rather focus on specific attributes of
quality that can be easily quantified and measured. Existing techniques and
corresponding tools may also focus on comments bound to a specific programming
language, and may only deal with comments with specific scopes and clear goals
(e.g., Javadoc comments at the method level, or in-body comments describing
TODOs to be addressed). In this paper, we present a Systematic Literature
Review (SLR) of the last decade of research in SE to answer the following
research questions: (i) What types of comments do researchers focus on when
assessing comment quality? (ii) What quality attributes (QAs) do they consider?
(iii) Which tools and techniques do they use to assess comment quality?, and
(iv) How do they evaluate their studies on comment quality assessment in
general? Our evaluation, based on the analysis of 2353 papers and the actual
review of 47 relevant ones, shows that (i) most studies and techniques focus on
comments in Java code, thus may not be generalizable to other languages, and
(ii) the analyzed studies focus on four main QAs of a total of 21 QAs
identified in the literature, with a clear predominance of checking consistency
between comments and the code. We observe that researchers rely on manual
assessment and specific heuristics rather than the automated assessment of the
comment quality attributes
Integrating passive ubiquitous surfaces into human-computer interaction
Mobile technologies enable people to interact with computers ubiquitously. This dissertation investigates how ordinary, ubiquitous surfaces can be integrated into human-computer interaction to extend the interaction space beyond the edge of the display. It turns out that acoustic and tactile features generated during an interaction can be combined to identify input events, the user, and the surface. In addition, it is shown that a heterogeneous distribution of different surfaces is particularly suitable for realizing versatile interaction modalities. However, privacy concerns must be considered when selecting sensors, and context can be crucial in determining whether and what interaction to perform.Mobile Technologien ermöglichen den Menschen eine allgegenwärtige Interaktion mit Computern. Diese Dissertation untersucht, wie gewöhnliche, allgegenwärtige Oberflächen in die Mensch-Computer-Interaktion integriert werden können, um den Interaktionsraum über den Rand des Displays hinaus zu erweitern. Es stellt sich heraus, dass akustische und taktile Merkmale, die während einer Interaktion erzeugt werden, kombiniert werden können, um Eingabeereignisse, den Benutzer und die Oberfläche zu identifizieren. Darüber hinaus wird gezeigt, dass eine heterogene Verteilung verschiedener Oberflächen besonders geeignet ist, um vielfältige Interaktionsmodalitäten zu realisieren. Bei der Auswahl der Sensoren müssen jedoch Datenschutzaspekte berücksichtigt werden, und der Kontext kann entscheidend dafür sein, ob und welche Interaktion durchgeführt werden soll
Designing behavior change support systems in the context of knowledge documentation: development of theory and practical implementation
Although innovation and operating efficiently require creating, transferring, and applying knowledge, successful knowledge documentation remains a challenge for organizations. While knowledge management systems support knowledge management activities, the missing link to applying knowledge management relies on human actions and their behaviors.
This dissertation extends prior design knowledge about designing Behavior Change Support Systems in the context of knowledge documentation by developing theory and showing practical implementation. Combining technical and psychological models within information systems frameworks based on the principles of abstraction, originality, justification, and benefit, this dissertation draws on design science to propose prescriptive knowledge, for example, in the form of design principles and a specific artifact
Assessing Comment Quality in Object-Oriented Languages
Previous studies have shown that high-quality code comments support developers in software maintenance and program comprehension tasks. However, the semi-structured nature of comments, several conventions to write comments, and the lack of quality assessment tools for all aspects of comments make comment evaluation and maintenance a non-trivial problem. To understand the specification of high-quality comments to build effective assessment tools, our thesis emphasizes acquiring a multi-perspective view of the comments, which can be approached by analyzing (1) the academic support for comment quality assessment, (2) developer commenting practices across languages, and (3) developer concerns about comments.
Our findings regarding the academic support for assessing comment quality showed that researchers primarily focus on Java in the last decade even though the trend of using polyglot environments in software projects is increasing. Similarly, the trend of analyzing specific types of code comments (method comments, or inline comments) is increasing, but the studies rarely analyze class comments. We found 21 quality attributes that researchers consider to assess comment quality, and manual assessment is still the most commonly used technique to assess various quality attributes. Our analysis of developer commenting practices showed that developers embed a mixed level of details in class comments, ranging from high-level class overviews to low-level implementation details across programming languages. They follow style guidelines regarding what information to write in class comments but violate the structure and syntax guidelines. They primarily face problems locating relevant guidelines to write consistent and informative comments, verifying the adherence of their comments to the guidelines, and evaluating the overall state of comment quality.
To help researchers and developers in building comment quality assessment tools, we contribute: (i) a systematic literature review (SLR) of ten years (2010–2020) of research on assessing comment quality, (ii) a taxonomy of quality attributes used to assess comment quality, (iii) an empirically validated taxonomy of class comment information types from three programming languages, (iv) a multi-programming-language approach to automatically identify the comment information types, (v) an empirically validated taxonomy of comment convention-related questions and recommendation from various Q&A forums, and (vi) a tool to gather discussions from multiple developer sources, such as Stack Overflow, and mailing lists.
Our contributions provide various kinds of empirical evidence of the developer’s interest in reducing efforts in the software documentation process, of the limited support developers get in automatically assessing comment quality, and of the challenges they face in writing high-quality comments. This work lays the foundation for future effective comment quality assessment tools and techniques