
    Software Design Change Artifacts Generation through Software Architectural Change Detection and Categorisation

    Unlike other engineering projects, which are mostly implemented by non-expert workers after engineers complete the design, software is designed, implemented, tested, and inspected solely by experts. Researchers and practitioners have linked software bugs, security holes, problematic integration of changes, hard-to-understand codebases, unwarranted mental pressure, and similar problems in software development and maintenance to inconsistent and complex design and to a lack of ways to easily understand what is going on and what to plan in a software system. The unavailability of the information and insights that development teams need to make good decisions makes these challenges worse. Therefore, extracting software design documents and other insightful information is essential to reduce the above-mentioned anomalies. Moreover, architectural design artifact extraction is required to create a developer’s profile that can be made available to the market for many crucial scenarios. To that end, architectural change detection, categorization, and change description generation are crucial because they are the primary artifacts for tracing other software artifacts. However, it is not feasible for humans to analyze all the changes in a single release to detect change and impact because doing so is time-consuming, laborious, costly, and inconsistent. In this thesis, we conduct six studies that address these challenges by automating architectural change information extraction and document generation, which could potentially assist development and maintenance teams. In particular, (1) we detect architectural changes using lightweight techniques leveraging textual and codebase properties, (2) categorize them considering intelligent perspectives, and (3) generate design change documents by exploiting precise contexts of components’ relations and change purposes which were previously unexplored.
Our experiment using 4000+ architectural change samples and 200+ design change documents suggests that our proposed approaches are promising in accuracy and scalable enough to deploy frequently. Our proposed change detection approach can detect up to 100% of the architectural change instances (and is very scalable). On the other hand, our proposed change classifier’s F1 score is 70%, which is promising given the challenges. Finally, our proposed system can produce descriptive design change artifacts with 75% significance. Since most of our studies are foundational, our approaches and prepared datasets can be used as baselines for advancing research in design change information extraction and documentation.
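    For reference, the F1 score reported for the change classifier is the harmonic mean of precision and recall. The following is a minimal sketch of that formula only; the function name and the precision and recall values are hypothetical, since the abstract reports just the resulting F1 score.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical inputs for illustration only: the abstract does not report
# precision or recall, just an F1 score of 70%.
example_precision = 0.70
example_recall = 0.70
example_f1 = f1_score(example_precision, example_recall)
```

    Because the harmonic mean is pulled toward the smaller of the two values, a classifier cannot reach a high F1 score by maximising precision or recall alone.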

    Detecting emotions using a combination of bidirectional encoder representations from transformers embedding and bidirectional long short-term memory

    One of the most difficult topics in natural language understanding (NLU) is emotion detection in text, because human emotions are difficult to understand without seeing facial expressions. Because the structure of Indonesian differs from that of other languages, this study focuses on emotion detection in Indonesian text. The nine experimental scenarios of this study combine word embeddings (bidirectional encoder representations from transformers (BERT), Word2Vec, and GloVe) with emotion detection models (bidirectional long short-term memory (BiLSTM), LSTM, and convolutional neural network (CNN)). With values of 88.28%, 88.42%, and 89.20% for Commuter Line, Transjakarta, and Commuter Line+Transjakarta, respectively, BERT-BiLSTM achieves the highest accuracy on the data. In general, BiLSTM produces the highest accuracy, followed by LSTM, and finally CNN. Among the word embeddings, BERT outperformed Word2Vec and GloVe. In addition, the BERT-BiLSTM model achieves the highest precision, recall, and F1-measure values in each data scenario when compared to the other models. According to the results of this study, BERT-BiLSTM can enhance the performance of the classification model when compared to previous studies that used only BERT or only BiLSTM for emotion detection in Indonesian texts.
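    The nine scenarios described above are the full cross product of three word embeddings and three models. The following is a minimal sketch of how such an experiment grid could be enumerated; the variable and function names are hypothetical, and no training or evaluation code from the study is implied.

```python
from itertools import product

# Embeddings and models named in the abstract.
EMBEDDINGS = ["BERT", "Word2Vec", "GloVe"]
MODELS = ["BiLSTM", "LSTM", "CNN"]


def build_scenarios(embeddings, models):
    """Pair every embedding with every model, e.g. 'BERT-BiLSTM'."""
    return [f"{embedding}-{model}" for embedding, model in product(embeddings, models)]


# Hypothetical grid of experiment labels; each label would correspond to one
# embedding/model combination to train and evaluate.
scenarios = build_scenarios(EMBEDDINGS, MODELS)
```

    Enumerating the grid explicitly makes it easy to confirm that every embedding is evaluated with every model before comparing accuracy across scenarios.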

    Method versatility in analysing human attitudes towards technology

    Various research domains are facing new challenges brought about by growing volumes of data. To make optimal use of them, and to increase the reproducibility of research findings, method versatility is required. Method versatility is the ability to flexibly apply widely varying data analytic methods depending on the study goal and the dataset characteristics. Method versatility is an essential characteristic of data science, but in other areas of research, such as educational science or psychology, its importance is yet to be fully accepted. Versatile methods can enrich the repertoire of specialists who validate psychometric instruments, conduct data analysis of large-scale educational surveys, and communicate their findings to the academic community, which corresponds to three stages of the research cycle: measurement, research per se, and communication. In this thesis, studies related to these stages have a common theme of human attitudes towards technology, as this topic becomes vitally important in our age of ever-increasing digitization. The thesis is based on four studies, in which method versatility is introduced in four different ways: the consecutive use of methods, the toolbox choice, the simultaneous use, and the range extension. In the first study, different methods of psychometric analysis are used consecutively to reassess psychometric properties of a recently developed scale measuring affinity for technology interaction. In the second, the random forest algorithm and hierarchical linear modeling, as tools from machine learning and statistical toolboxes, are applied to data analysis of a large-scale educational survey related to students’ attitudes to information and communication technology. 
In the third, the challenge of selecting the number of clusters in model-based clustering is addressed by the simultaneous use of model fit, cluster separation, and the stability of partition criteria, so that generalizable separable clusters can be selected in the data related to teachers’ attitudes towards technology. The fourth reports the development and evaluation of a scholarly knowledge graph-powered dashboard aimed at extending the range of scholarly communication means. The findings of the thesis can be helpful for increasing method versatility in various research areas. They can also facilitate methodological advancement of academic training in data analysis and aid further development of scholarly communication in accordance with open science principles.

    Workshop Proceedings of the 12th edition of the KONVENS conference

    The 2014 issue of KONVENS is even more a forum for exchange: its main topic is the interaction between Computational Linguistics and Information Science, and the synergies that such interaction, cooperation, and integrated views can produce. This topic, at the crossroads of different research traditions that deal with natural language as a container of knowledge and with methods to extract and manage linguistically represented knowledge, is close to the heart of many researchers at the Institut für Informationswissenschaft und Sprachtechnologie of Universität Hildesheim: it has long been one of the institute’s research topics, and it has received even more attention over the last few years.

    Performance Analysis Of Data-Driven Algorithms In Detecting Intrusions On Smart Grid

    The traditional power grid is no longer a practical solution for power delivery due to several shortcomings, including chronic blackouts, energy storage issues, high cost of assets, and high carbon emissions. Therefore, there is a serious need for better, cheaper, and cleaner power grid technology that addresses the limitations of traditional power grids. A smart grid is a holistic solution to these issues that consists of a variety of operations and energy measures. This technology can deliver energy to end-users through a two-way flow of communication. It is expected to generate reliable, efficient, and clean power by integrating multiple technologies. It promises reliability, improved functionality, and economical means of power transmission and distribution. This technology also decreases greenhouse emissions by transferring clean, affordable, and efficient energy to users. The smart grid provides several benefits, such as increased grid resilience, self-healing, and improved system performance. Despite these benefits, this network has been the target of a number of cyber-attacks that violate the availability, integrity, confidentiality, and accountability of the network. For instance, in 2021, a cyber-attack on a U.S. power system shut down the power grid, leaving approximately 100,000 people without power. Another attack on U.S. smart grids, in March 2018, targeted multiple nuclear power plants and water equipment. These instances show why strong security approaches are needed in smart grids to detect and mitigate sophisticated cyber-attacks. For this purpose, the US National Electric Sector Cybersecurity Organization and the Department of Energy have joined their efforts with other federal agencies, including the Cybersecurity for Energy Delivery Systems and the Federal Energy Regulatory Commission, to investigate the security risks of smart grid networks.
Their investigation shows that the smart grid requires reliable solutions to defend against and prevent cyber-attacks and vulnerability issues. This investigation also shows that with emerging technologies, including 5G and 6G, the smart grid may become more vulnerable to multistage cyber-attacks. A number of studies have been done to identify, detect, and investigate the vulnerabilities of smart grid networks. However, the existing techniques have fundamental limitations, such as low detection rates, high rates of false positives, high rates of misdetection, data poisoning, data quality and processing, lack of scalability, and issues regarding handling huge volumes of data. Therefore, these techniques cannot ensure safe, efficient, and dependable communication for smart grid networks. Accordingly, the goal of this dissertation is to investigate the efficiency of machine learning in detecting cyber-attacks on smart grids. The proposed methods are based on supervised and unsupervised machine learning, deep learning, reinforcement learning, and online learning models. These models have to be trained, tested, and validated using a reliable dataset. In this dissertation, CICDDoS 2019 was used to train, test, and validate the efficiency of the proposed models. The results show that, for supervised machine learning models, the ensemble models outperform other traditional models. Among the deep learning models, the dense neural network family provides satisfactory results for detecting and classifying intrusions on the smart grid. Among unsupervised models, the variational auto-encoder provides the highest performance compared to the other unsupervised models. In reinforcement learning, the proposed Capsule Q-learning provides higher detection and lower misdetection rates, compared to the other model in the literature.
In online learning, the Online Sequential Euclidean Distance Routing Capsule Network model provides significantly better results in detecting intrusion attacks on the smart grid, compared to the other deep online models.

    Analysis of student evaluation of teaching surveys: assessing evidence of implicit bias using numerical and text data

    In the higher education sector, student evaluations of teaching heavily influence the hiring and promotion decisions of lecturers. However, many studies have found a discrepancy between the ratings that male and female lecturers receive, along with those from minority cultural backgrounds. With the discrepancy unexplained by common factors included in the models, these differences have been attributed to an implicit bias which students may hold towards their lecturers. Using a large set of evaluations completed over several years at a leading Australian university, we use Bayesian statistical methods combined with natural language processing techniques to rigorously investigate the following questions: Is there evidence of implicit bias (gender or cultural) within student evaluation of teaching surveys? What can the students' comments tell us about how the discrepancy in ratings may arise? Can we mitigate these biases to reduce any unintended effects? The studies in this dissertation show that the gender and cultural characteristics of a lecturer do influence how students rate and comment on their lecturers, and that these characteristics also influence how students respond to an intervention message, giving us an insight into how these implicit biases may arise. A clear implication of this research is the need to ensure that these surveys rate lecturers effectively and fairly, and that administrators account for these factors when using evaluations to assess a lecturer’s performance.

    Insights on Learning Tractable Probabilistic Graphical Models


    Aspect-Based Sentiment Analysis using Machine Learning and Deep Learning Approaches

    Sentiment analysis (SA), also known as opinion mining, is the process of gathering and analyzing people's opinions about a particular service, good, or company on websites like Twitter, Facebook, Instagram, LinkedIn, and blogs, among other places. This article presents a thorough analysis of SA and its levels. The manuscript's main focus is on aspect-based SA, which helps manufacturing organizations make better decisions by examining consumers' viewpoints and opinions of their products. The many approaches and methods used in aspect-based sentiment analysis (ABSA) are covered in this review study. In traditional methods, the features associated with the aspects were extracted manually, which made the task time-consuming and error-prone. These restrictions may be overcome as artificial intelligence develops. Therefore, to increase the effectiveness of ABSA, researchers are increasingly using AI-based machine learning (ML) and deep learning (DL) techniques. Additionally, certain recently released ABSA approaches based on ML and DL are examined and contrasted, and, based on this comparison, gaps in both methodologies are identified. At the conclusion of this study, the difficulties that current ABSA models encounter are also emphasized, along with suggestions for improving the efficacy and precision of ABSA systems.