5,967 research outputs found
Recalibrating machine learning for social biases: demonstrating a new methodology through a case study classifying gender biases in archival documentation
This thesis proposes a recalibration of Machine Learning for social biases to minimize harms from existing approaches and practices in the field. Prioritizing quality over quantity, accuracy over efficiency, representativeness over convenience, and situated thinking over universal thinking, the thesis demonstrates an alternative approach to creating Machine Learning models. Drawing on GLAM, the Humanities, the Social Sciences, and Design, the thesis focuses on understanding and communicating biases in a specific use case. 11,888 metadata descriptions from the University of Edinburgh Heritage Collections' Archives catalog were manually annotated for gender biases and text classification models were then trained on the resulting dataset of 55,260 annotations. Evaluations of the models' performance demonstrates that annotating gender biases can be automated; however, the subjectivity of bias as a concept complicates the generalizability of any one approach.
The contributions are: (1) an interdisciplinary and participatory Bias-Aware Methodology, (2) a Taxonomy of Gendered and Gender Biased Language, (3) data annotated for gender biased language, (4) gender biased text classification models, and (5) a human-centered approach to model evaluation. The contributions have implications for Machine Learning, demonstrating how bias is inherent to all data and models; more specifically for Natural Language Processing, providing an annotation taxonomy, annotated datasets and classification models for analyzing gender biased language at scale; for the Gallery, Library, Archives, and Museum sector, offering guidance to institutions seeking to reconcile with histories of marginalizing communities through their documentation practices; and for historians, who utilize cultural heritage documentation to study and interpret the past. Through a real-world application of the Bias-Aware Methodology in a case study, the thesis illustrates the need to shift away from removing social biases and towards acknowledging them, creating data and models that surface the uncertainty and multiplicity characteristic of human societies
On the real world practice of Behaviour Driven Development
Surveys of industry practice over the last decade suggest that Behaviour Driven Development is a popular Agile practice. For example, 19% of respondents to the 14th State of Agile annual survey reported using BDD, placing it in the top 13 practices reported. As well as potential benefits, the adoption of BDD necessarily involves an additional cost of writing and maintaining Gherkin features and scenarios, and (if used for acceptance testing,) the associated step functions. Yet there is a lack of published literature exploring how BDD is used in practice and the challenges experienced by real world software development efforts. This gap is significant because without understanding current real world practice, it is hard to identify opportunities to address and mitigate challenges. In order to address this research gap concerning the challenges of using BDD, this thesis reports on a research project which explored: (a) the challenges of applying agile and undertaking requirements engineering in a real world context; (b) the challenges of applying BDD specifically and (c) the application of BDD in open-source projects to understand challenges in this different context.
For this purpose, we progressively conducted two case studies, two series of interviews, four iterations of action research, and an empirical study. The first case study was conducted in an avionics company to discover the challenges of using an agile process in a large scale safety critical project environment. Since requirements management was found to be one of the biggest challenges during the case study, we decided to investigate BDD because of its reputation for requirements management. The second case study was conducted in the company with an aim to discover the challenges of using BDD in real life. The case study was complemented with an empirical study of the practice of BDD in open source projects, taking a study sample from the GitHub open source collaboration site.
As a result of this Ph.D research, we were able to discover: (i) challenges of using an agile process in a large scale safety-critical organisation, (ii) current state of BDD in practice, (iii) technical limitations of Gherkin (i.e., the language for writing requirements in BDD), (iv) challenges of using BDD in a real project, (v) bad smells in the Gherkin specifications of open source projects on GitHub. We also presented a brief comparison between the theoretical description of BDD and BDD in practice. This research, therefore, presents the results of lessons learned from BDD in practice, and serves as a guide for software practitioners planning on using BDD in their projects
Configuration Management of Distributed Systems over Unreliable and Hostile Networks
Economic incentives of large criminal profits and the threat of legal consequences have pushed criminals to continuously improve their malware, especially command and control channels. This thesis applied concepts from successful malware command and control to explore the survivability and resilience of benign configuration management systems.
This work expands on existing stage models of malware life cycle to contribute a new model for identifying malware concepts applicable to benign configuration management. The Hidden Master architecture is a contribution to master-agent network communication. In the Hidden Master architecture, communication between master and agent is asynchronous and can operate trough intermediate nodes. This protects the master secret key, which gives full control of all computers participating in configuration management. Multiple improvements to idempotent configuration were proposed, including the definition of the minimal base resource dependency model, simplified resource revalidation and the use of imperative general purpose language for defining idempotent configuration.
Following the constructive research approach, the improvements to configuration management were designed into two prototypes. This allowed validation in laboratory testing, in two case studies and in expert interviews. In laboratory testing, the Hidden Master prototype was more resilient than leading configuration management tools in high load and low memory conditions, and against packet loss and corruption. Only the research prototype was adaptable to a network without stable topology due to the asynchronous nature of the Hidden Master architecture.
The main case study used the research prototype in a complex environment to deploy a multi-room, authenticated audiovisual system for a client of an organization deploying the configuration. The case studies indicated that imperative general purpose language can be used for idempotent configuration in real life, for defining new configurations in unexpected situations using the base resources, and abstracting those using standard language features; and that such a system seems easy to learn.
Potential business benefits were identified and evaluated using individual semistructured expert interviews. Respondents agreed that the models and the Hidden Master architecture could reduce costs and risks, improve developer productivity and allow faster time-to-market. Protection of master secret keys and the reduced need for incident response were seen as key drivers for improved security. Low-cost geographic scaling and leveraging file serving capabilities of commodity servers were seen to improve scaling and resiliency. Respondents identified jurisdictional legal limitations to encryption and requirements for cloud operator auditing as factors potentially limiting the full use of some concepts
Pristup specifikaciji i generisanju proizvodnih procesa zasnovan na inženjerstvu vođenom modelima
In this thesis, we present an approach to the production process specification and generation based on the model-driven paradigm, with the goal to increase the flexibility of factories and respond to the challenges that emerged in the era of Industry 4.0 more efficiently. To formally specify production processes and their variations in the Industry 4.0 environment, we created a novel domain-specific modeling language, whose models are machine-readable. The created language can be used to model production processes that can be independent of any production system, enabling process models to be used in different production systems, and process models used for the specific production system. To automatically transform production process models dependent on the specific production system into instructions that are to be executed by production system resources, we created an instruction generator. Also, we created generators for different manufacturing documentation, which automatically transform production process models into manufacturing documents of different types. The proposed approach, domain-specific modeling language, and software solution contribute to introducing factories into the digital transformation process. As factories must rapidly adapt to new products and their variations in the era of Industry 4.0, production must be dynamically led and instructions must be automatically sent to factory resources, depending on products that are to be created on the shop floor. The proposed approach contributes to the creation of such a dynamic environment in contemporary factories, as it allows to automatically generate instructions from process models and send them to resources for execution. Additionally, as there are numerous different products and their variations, keeping the required manufacturing documentation up to date becomes challenging, which can be done automatically by using the proposed approach and thus significantly lower process designers' time.У овој дисертацији представљен је приступ спецификацији и генерисању производних процеса заснован на инжењерству вођеном моделима, у циљу повећања флексибилности постројења у фабрикама и ефикаснијег разрешавања изазова који се појављују у ери Индустрије 4.0. За потребе формалне спецификације производних процеса и њихових варијација у амбијенту Индустрије 4.0, креиран је нови наменски језик, чије моделе рачунар може да обради на аутоматизован начин. Креирани језик има могућност моделовања производних процеса који могу бити независни од производних система и тиме употребљени у различитим постројењима или фабрикама, али и производних процеса који су специфични за одређени систем. Како би моделе производних процеса зависних од конкретног производног система било могуће на аутоматизован начин трансформисати у инструкције које ресурси производног система извршавају, креиран је генератор инструкција. Такође су креирани и генератори техничке документације, који на аутоматизован начин трансформишу моделе производних процеса у документе различитих типова. Употребом предложеног приступа, наменског језика и софтверског решења доприноси се увођењу фабрика у процес дигиталне трансформације. Како фабрике у ери Индустрије 4.0 морају брзо да се прилагоде новим производима и њиховим варијацијама, неопходно је динамички водити производњу и на аутоматизован начин слати инструкције ресурсима у фабрици, у зависности од производа који се креирају у конкретном постројењу. Тиме што је у предложеном приступу могуће из модела процеса аутоматизовано генерисати инструкције и послати их ресурсима, доприноси се креирању једног динамичког окружења у савременим фабрикама. Додатно, услед великог броја различитих производа и њихових варијација, постаје изазовно одржавати неопходну техничку документацију, што је у предложеном приступу могуће урадити на аутоматизован начин и тиме значајно уштедети време пројектаната процеса.U ovoj disertaciji predstavljen je pristup specifikaciji i generisanju proizvodnih procesa zasnovan na inženjerstvu vođenom modelima, u cilju povećanja fleksibilnosti postrojenja u fabrikama i efikasnijeg razrešavanja izazova koji se pojavljuju u eri Industrije 4.0. Za potrebe formalne specifikacije proizvodnih procesa i njihovih varijacija u ambijentu Industrije 4.0, kreiran je novi namenski jezik, čije modele računar može da obradi na automatizovan način. Kreirani jezik ima mogućnost modelovanja proizvodnih procesa koji mogu biti nezavisni od proizvodnih sistema i time upotrebljeni u različitim postrojenjima ili fabrikama, ali i proizvodnih procesa koji su specifični za određeni sistem. Kako bi modele proizvodnih procesa zavisnih od konkretnog proizvodnog sistema bilo moguće na automatizovan način transformisati u instrukcije koje resursi proizvodnog sistema izvršavaju, kreiran je generator instrukcija. Takođe su kreirani i generatori tehničke dokumentacije, koji na automatizovan način transformišu modele proizvodnih procesa u dokumente različitih tipova. Upotrebom predloženog pristupa, namenskog jezika i softverskog rešenja doprinosi se uvođenju fabrika u proces digitalne transformacije. Kako fabrike u eri Industrije 4.0 moraju brzo da se prilagode novim proizvodima i njihovim varijacijama, neophodno je dinamički voditi proizvodnju i na automatizovan način slati instrukcije resursima u fabrici, u zavisnosti od proizvoda koji se kreiraju u konkretnom postrojenju. Time što je u predloženom pristupu moguće iz modela procesa automatizovano generisati instrukcije i poslati ih resursima, doprinosi se kreiranju jednog dinamičkog okruženja u savremenim fabrikama. Dodatno, usled velikog broja različitih proizvoda i njihovih varijacija, postaje izazovno održavati neophodnu tehničku dokumentaciju, što je u predloženom pristupu moguće uraditi na automatizovan način i time značajno uštedeti vreme projektanata procesa
A clinical decision support system for detecting and mitigating potentially inappropriate medications
Background: Medication errors are a leading cause of preventable harm to patients. In older adults, the impact of ageing on the therapeutic effectiveness and safety of drugs is a significant concern, especially for those over 65. Consequently, certain medications called Potentially Inappropriate Medications (PIMs) can be dangerous in the elderly and should be avoided. Tackling PIMs by health professionals and patients can be time-consuming and error-prone, as the criteria underlying the definition of PIMs are complex and subject to frequent updates. Moreover, the criteria are not available in a representation that health systems can interpret and reason with directly.
Objectives: This thesis aims to demonstrate the feasibility of using an ontology/rule-based approach in a clinical knowledge base to identify potentially inappropriate medication(PIM). In addition, how constraint solvers can be used effectively to suggest alternative medications and administration schedules to solve or minimise PIM undesirable side effects.
Methodology: To address these objectives, we propose a novel integrated approach using formal rules to represent the PIMs criteria and inference engines to perform the reasoning presented in the context of a Clinical Decision Support System (CDSS). The approach aims to detect, solve, or minimise undesirable side-effects of PIMs through an ontology (knowledge base) and inference engines incorporating multiple reasoning approaches.
Contributions: The main contribution lies in the framework to formalise PIMs, including the steps required to define guideline requisites to create inference rules to detect and propose alternative drugs to inappropriate medications. No formalisation of the selected guideline (Beers Criteria) can be found in the literature, and hence, this thesis provides a novel ontology for it. Moreover, our process of minimising undesirable side effects offers a novel approach that enhances and optimises the drug rescheduling process, providing a more accurate way to minimise the effect of drug interactions in clinical practice
Mapping the Energy ADE to CityGML 3.0
In order to limit the global warming to well below 2 degrees Celsius, all sectors have to reduce their greenhouse gas emissions and become more sustainable. This also includes the building sector, which is in Europe responsible for 40% of the total energy consumption (European Commission, 2020). A way to work towards this goal is by retrofitting the existing building stock to become more energy efficient. Urban Building Energy Modelling (UBEM) can help in this endeavour by identifying energy-saving potentials and thus to effectively allocate the required resources (Horak et al., 2022). Yet, UBEM involves many stakeholders which is why standards are crucial to facilitate data exchange and interoperability among them. In this context, the Energy ADE v1.0 was developed as an extension for the semantic 3D city model standard CityGML 2.0. It serves two purposes, first by storing energy related information on the individual building level, and second by providing the necessary input data for UBEM simulations (Agugiaro et al., 2018). However, in September 2021 CityGML 3.0 was released. The introduced changes directly affect the structure of the Energy ADE, which is why it cannot fully function on it anymore. This thesis therefore answers the question, how and to what extent the Energy ADE for CityGML 2.0 needs to be adapted to be conformant with the new CityGML 3.0 standard. It is accomplished by following a model-driven approach, where the UML class diagrams for the mapped Energy ADE are created first, before automatically deriving the corresponding XSD schema file. Through the lossless mapping itself, the Energy ADE is integrated as much as possible into CityGML 3.0, while also maintaining a logical symmetry. As such it accounts for the introduced changes of CityGML 3.0, by making use of the space and geometry concept, the versioning possibilities as well as the provided structures to model time-dependent data. The result is eventually tested and verified by converting a sample dataset to the Energy ADE for CityGML 3.0. This work provides an example on how other ADEs can be adapted to fit the new CityGML 3.0 standard and thus hopefully to the further establishment of it.Geomatic
Software Design Change Artifacts Generation through Software Architectural Change Detection and Categorisation
Software is solely designed, implemented, tested, and inspected by expert people, unlike other engineering projects where they are mostly implemented by workers (non-experts) after designing by engineers. Researchers and practitioners have linked software bugs, security holes, problematic integration of changes, complex-to-understand codebase, unwarranted mental pressure, and so on in software development and maintenance to inconsistent and complex design and a lack of ways to easily understand what is going on and what to plan in a software system. The unavailability of proper information and insights needed by the development teams to make good decisions makes these challenges worse. Therefore, software design documents and other insightful information extraction are essential to reduce the above mentioned anomalies. Moreover, architectural design artifacts extraction is required to create the developer’s profile to be available to the market for many crucial scenarios. To that end, architectural change detection, categorization, and change description generation are crucial because they are the primary artifacts to trace other software artifacts.
However, it is not feasible for humans to analyze all the changes for a single release for detecting change and impact because it is time-consuming, laborious, costly, and inconsistent. In this thesis, we conduct six studies considering the mentioned challenges to automate the architectural change information extraction and document generation that could potentially assist the development and maintenance teams. In particular, (1) we detect architectural changes using lightweight techniques leveraging textual and codebase properties, (2) categorize them considering intelligent perspectives, and (3) generate design change documents by exploiting precise contexts of components’ relations and change purposes which were previously unexplored. Our experiment using 4000+ architectural change samples and 200+ design change documents suggests that our proposed approaches are promising in accuracy and scalability to deploy frequently. Our proposed change detection approach can detect up to 100% of the architectural change instances (and is very scalable). On the other hand, our proposed change classifier’s F1 score is 70%, which is promising given the challenges. Finally, our proposed system can produce descriptive design change artifacts with 75% significance. Since most of our studies are foundational, our approaches and prepared datasets can be used as baselines for advancing research in design change information extraction and documentation
Distributed Text Services (DTS): A Community-Built API to Publish and Consume Text Collections as Linked Data
This paper presents the Distributed Text Service (DTS) API Specification, a community-built effort to facilitate the publication and consumption of texts and their structures as Linked Data. DTS was designed to be as generic as possible, providing simple operations for navigating collections, navigating within a text, and retrieving textual content. While the DTS API uses JSON-LD as the serialization format for non-textual data (e.g., descriptive metadata), TEI XML was chosen as the minimum required format for textual data served by the API in order to guarantee the interoperability of data published by DTS-compliant repositories. This paper describes the DTS API specifications by means of real-world examples, discusses the key design choices that were made, and concludes by providing a list of existing repositories and libraries that support DTS
SmartChoices: Augmenting Software with Learned Implementations
We are living in a golden age of machine learning. Powerful models are being
trained to perform many tasks far better than is possible using traditional
software engineering approaches alone. However, developing and deploying those
models in existing software systems remains difficult. In this paper we present
SmartChoices, a novel approach to incorporating machine learning into mature
software stacks easily, safely, and effectively. We explain the overall design
philosophy and present case studies using SmartChoices within large scale
industrial systems
- …