7,267 research outputs found

    Undergraduate Catalog of Studies, 2023-2024

    Get PDF

    Contract-Based Design of Dataflow Programs

    Get PDF
    Quality and correctness are becoming increasingly important aspects of software development, as our reliance on software systems in everyday life continues to increase. Highly complex software systems are today found in critical appliances such as medical equipment, cars, and telecommunication infrastructure. Failures in these kinds of systems may have disastrous consequences. At the same time, modern computer platforms are increasingly concurrent, as the computational capacity of modern CPUs is improved mainly by increasing the number of processor cores. Computer platforms are also becoming increasingly parallel, distributed and heterogeneous, often involving special processing units, such as graphics processing units (GPU) or digital signal processors (DSP) for performing specific tasks more efficiently than possible on general-purpose CPUs. These modern platforms allow implementing increasingly complex functionality in software. Cost efficient development of software that efficiently exploits the power of this type of platforms and at the same time ensures correctness is, however, a challenging task. Dataflow programming has become popular in development of safetycritical software in many domains in the embedded community. For instance, in the automotive domain, the dataflow language Simulink has become widely used in model-based design of control software. However, for more complex functionality, this model of computation may not be expressive enough. In the signal processing domain, more expressive, dynamic models of computation have attracted much attention. These models of computation have, however, not gained as significant uptake in safety-critical domains due to a great extent to that it is challenging to provide guarantees regarding e.g. timing or determinism under these more expressive models of computation. Contract-based design has become widespread to specify and verify correctness properties of software components. A contract consists of assumptions (preconditions) regarding the input data and guarantees (postconditions) regarding the output data. By verifying a component with respect to its contract, it is ensured that the output fulfils the guarantees, assuming that the input fulfils the assumptions. While contract-based verification of traditional object-oriented programs has been researched extensively, verification of asynchronous dataflow programs has not been researched to the same extent. In this thesis, a contract-based design framework tailored specifically to dataflow programs is proposed. The proposed framework supports both an extensive subset of the discrete-time Simulink synchronous language, as well as a more general, asynchronous and dynamic, dataflow language. The proposed contract-based verification techniques are automatic, only guided by user-provided invariants, and based on encoding dataflow programs in existing, mature verification tools for sequential programs, such as the Boogie guarded command language and its associated verifier. It is shown how dataflow programs, with components implemented in an expressive programming language with support for matrix computations, can be efficiently encoded in such a verifier. Furthermore, it is also shown that contract-based design can be used to improve runtime performance of dataflow programs by allowing more scheduling decisions to be made at compile-time. All the proposed techniques have been implemented in prototype tools and evaluated on a large number of different programs. Based on the evaluation, the methods were proven to work in practice and to scale to real-world programs.Kvalitet och korrekthet blir idag allt viktigare aspekter inom mjukvaruutveckling, dÄ vi i allt högre grad förlitar oss pÄ mjukvarusystem i vÄra vardagliga sysslor. Mycket komplicerade mjukvarusystem finns idag i kritiska tillÀmpningar sÄ som medicinsk utrustning, bilar och infrastruktur för telekommunikation. Fel som uppstÄr i de hÀr typerna av system kan ha katastrofala följder. Samtidigt utvecklas kapaciteten hos moderna datorplattformar idag frÀmst genom att öka antalet processorkÀrnor. DÀrtill blir datorplattformar allt mer parallella, distribuerade och heterogena, och innefattar ofta specialla processorer sÄ som grafikprocessorer (GPU) eller signalprocessorer (DSP) för att utföra specifika berÀkningar snabbare Àn vad som Àr möjligt pÄ vanliga processorer. Den hÀr typen av plattformar möjligör implementering av allt mer komplicerade berÀkningar i mjukvara. Kostnadseffektiv utveckling av mjukvara som effektivt utnyttjar kapaciteten i den hÀr typen av plattformar och samtidigt sÀkerstÀller korrekthet Àr emellertid en mycket utmanande uppgift. Dataflödesprogrammering har blivit ett populÀrt sÀtt att utveckla mjukvara inom flera omrÄden som innefattar sÀkerhetskritiska inbyggda datorsystem. Till exempel inom fordonsindustrin har dataflödessprÄket Simulink kommit att anvÀndas i bred utstrÀckning för modellbaserad design av kontrollsystem. För mer komplicerad funktionalitet kan dock den hÀr modellen för berÀkning vara för begrÀnsad betrÀffande vad som kan beksrivas. Inom signalbehandling har mera expressiva och dynamiska modeller för berÀkning attraherat stort intresse. De hÀr modellerna för berÀkning har ÀndÄ inte tagits i bruk i samma utstrÀckning inom sÀkerhetskritiska tillÀmpningar. Det hÀr beror till en stor del pÄ att det Àr betydligt svÄrare att garantera egenskaper gÀllande till exempel timing och determinism under sÄdana hÀr modeller för berÀkning. Kontraktbaserad design har blivit ett vanligt sÀtt att specifiera och verifiera korrekthetsegenskaper hos mjukvarukomponeneter. Ett kontrakt bestÄr av antaganden (förvillkor) gÀllande indata och garantier (eftervillkor) gÀllande utdata. Genom att verifiera en komponent gentemot sitt konktrakt kan man bevisa att utdatan uppfyller garantierna, givet att indatan uppfyller antagandena. Trots att kontraktbaserad verifiering i sig Àr ett mycket beforskat omrÄde, sÄ har inte verifiering av asynkrona dataflödesprogram beforskats i samma utstrÀckning. I den hÀr avhandlingen presenteras ett ramverk för kontraktbaserad design skrÀddarsytt för dataflödesprogram. Det föreslagna ramverket stödjer sÄ vÀl en stor del av det synkrona sprÄket. Simulink med diskret tid som ett mera generellt asynkront och dynamiskt dataflödessprÄk. De föreslagna kontraktbaserade verifieringsteknikerna Àr automatiska. Utöver kontraktets för- och eftervillkor ger anvÀndaren endast de invarianter som krÀvs för att möjliggöra verifieringen. Verifieringsteknikerna grundar sig pÄ att omkoda dataflödesprogram till input för existerande och beprövade verifieringsverktyg för sekventiella program sÄ som Boogie. Avhandlingen visar hur dataflödesprogram implementerade i ett expressivt programmeringssprÄk med inbyggt stöd för matrisoperationer effektivt kan omkodas till input för ett verifieringsverktyg som Boogie. Utöver detta visar avhandlingen ocksÄ att kontraktbaserad design ocksÄ kan förbÀttra prestandan hos dataflödesprogram i körningsskedet genom att möjliggöra flera schemalÀggningsbeslut redan i kompileringsskedet. Alla tekniker som presenteras i avhandlingen har implementerats i prototypverktyg och utvÀrderats pÄ en stor mÀngd olika program. UtvÀrderingen bevisar att teknikerna fungerar i praktiken och Àr tillrÀckligt skalbara för att ocksÄ fungera pÄ program av realistisk storlek

    Graduate Catalog of Studies, 2023-2024

    Get PDF

    TANDEM: taming failures in next-generation datacenters with emerging memory

    Get PDF
    The explosive growth of online services, leading to unforeseen scales, has made modern datacenters highly prone to failures. Taming these failures hinges on fast and correct recovery, minimizing service interruptions. Applications, owing to recovery, entail additional measures to maintain a recoverable state of data and computation logic during their failure-free execution. However, these precautionary measures have severe implications on performance, correctness, and programmability, making recovery incredibly challenging to realize in practice. Emerging memory, particularly non-volatile memory (NVM) and disaggregated memory (DM), offers a promising opportunity to achieve fast recovery with maximum performance. However, incorporating these technologies into datacenter architecture presents significant challenges; Their distinct architectural attributes, differing significantly from traditional memory devices, introduce new semantic challenges for implementing recovery, complicating correctness and programmability. Can emerging memory enable fast, performant, and correct recovery in the datacenter? This thesis aims to answer this question while addressing the associated challenges. When architecting datacenters with emerging memory, system architects face four key challenges: (1) how to guarantee correct semantics; (2) how to efficiently enforce correctness with optimal performance; (3) how to validate end-to-end correctness including recovery; and (4) how to preserve programmer productivity (Programmability). This thesis aims to address these challenges through the following approaches: (a) defining precise consistency models that formally specify correct end-to-end semantics in the presence of failures (consistency models also play a crucial role in programmability); (b) developing new low-level mechanisms to efficiently enforce the prescribed models given the capabilities of emerging memory; and (c) creating robust testing frameworks to validate end-to-end correctness and recovery. We start our exploration with non-volatile memory (NVM), which offers fast persistence capabilities directly accessible through the processor’s load-store (memory) interface. Notably, these capabilities can be leveraged to enable fast recovery for Log-Free Data Structures (LFDs) while maximizing performance. However, due to the complexity of modern cache hierarchies, data hardly persist in any specific order, jeop- ardizing recovery and correctness. Therefore, recovery needs primitives that explicitly control the order of updates to NVM (known as persistency models). We outline the precise specification of a novel persistency model – Release Persistency (RP) – that provides a consistency guarantee for LFDs on what remains in non-volatile memory upon failure. To efficiently enforce RP, we propose a novel microarchitecture mechanism, lazy release persistence (LRP). Using standard LFDs benchmarks, we show that LRP achieves fast recovery while incurring minimal overhead on performance. We continue our discussion with memory disaggregation which decouples memory from traditional monolithic servers, offering a promising pathway for achieving very high availability in replicated in-memory data stores. Achieving such availability hinges on transaction protocols that can efficiently handle recovery in this setting, where compute and memory are independent. However, there is a challenge: disaggregated memory (DM) fails to work with RPC-style protocols, mandating one-sided transaction protocols. Exacerbating the problem, one-sided transactions expose critical low-level ordering to architects, posing a threat to correctness. We present a highly available transaction protocol, Pandora, that is specifically designed to achieve fast recovery in disaggregated key-value stores (DKVSes). Pandora is the first one-sided transactional protocol that ensures correct, non-blocking, and fast recovery in DKVS. Our experimental implementation artifacts demonstrate that Pandora achieves fast recovery and high availability while causing minimal disruption to services. Finally, we introduce a novel target litmus-testing framework – DART – to validate the end-to-end correctness of transactional protocols with recovery. Using DART’s target testing capabilities, we have found several critical bugs in Pandora, highlighting the need for robust end-to-end testing methods in the design loop to iteratively fix correctness bugs. Crucially, DART is lightweight and black-box, thereby eliminating any intervention from the programmers

    Configuration Management of Distributed Systems over Unreliable and Hostile Networks

    Get PDF
    Economic incentives of large criminal profits and the threat of legal consequences have pushed criminals to continuously improve their malware, especially command and control channels. This thesis applied concepts from successful malware command and control to explore the survivability and resilience of benign configuration management systems. This work expands on existing stage models of malware life cycle to contribute a new model for identifying malware concepts applicable to benign configuration management. The Hidden Master architecture is a contribution to master-agent network communication. In the Hidden Master architecture, communication between master and agent is asynchronous and can operate trough intermediate nodes. This protects the master secret key, which gives full control of all computers participating in configuration management. Multiple improvements to idempotent configuration were proposed, including the definition of the minimal base resource dependency model, simplified resource revalidation and the use of imperative general purpose language for defining idempotent configuration. Following the constructive research approach, the improvements to configuration management were designed into two prototypes. This allowed validation in laboratory testing, in two case studies and in expert interviews. In laboratory testing, the Hidden Master prototype was more resilient than leading configuration management tools in high load and low memory conditions, and against packet loss and corruption. Only the research prototype was adaptable to a network without stable topology due to the asynchronous nature of the Hidden Master architecture. The main case study used the research prototype in a complex environment to deploy a multi-room, authenticated audiovisual system for a client of an organization deploying the configuration. The case studies indicated that imperative general purpose language can be used for idempotent configuration in real life, for defining new configurations in unexpected situations using the base resources, and abstracting those using standard language features; and that such a system seems easy to learn. Potential business benefits were identified and evaluated using individual semistructured expert interviews. Respondents agreed that the models and the Hidden Master architecture could reduce costs and risks, improve developer productivity and allow faster time-to-market. Protection of master secret keys and the reduced need for incident response were seen as key drivers for improved security. Low-cost geographic scaling and leveraging file serving capabilities of commodity servers were seen to improve scaling and resiliency. Respondents identified jurisdictional legal limitations to encryption and requirements for cloud operator auditing as factors potentially limiting the full use of some concepts

    UMSL Bulletin 2023-2024

    Get PDF
    The 2023-2024 Bulletin and Course Catalog for the University of Missouri St. Louis.https://irl.umsl.edu/bulletin/1088/thumbnail.jp

    Graduate Catalog of Studies, 2023-2024

    Get PDF

    Predicting Paid Certification in Massive Open Online Courses

    Get PDF
    Massive open online courses (MOOCs) have been proliferating because of the free or low-cost offering of content for learners, attracting the attention of many stakeholders across the entire educational landscape. Since 2012, coined as “the Year of the MOOCs”, several platforms have gathered millions of learners in just a decade. Nevertheless, the certification rate of both free and paid courses has been low, and only about 4.5–13% and 1–3%, respectively, of the total number of enrolled learners obtain a certificate at the end of their courses. Still, most research concentrates on completion, ignoring the certification problem, and especially its financial aspects. Thus, the research described in the present thesis aimed to investigate paid certification in MOOCs, for the first time, in a comprehensive way, and as early as the first week of the course, by exploring its various levels. First, the latent correlation between learner activities and their paid certification decisions was examined by (1) statistically comparing the activities of non-paying learners with course purchasers and (2) predicting paid certification using different machine learning (ML) techniques. Our temporal (weekly) analysis showed statistical significance at various levels when comparing the activities of non-paying learners with those of the certificate purchasers across the five courses analysed. Furthermore, we used the learner’s activities (number of step accesses, attempts, correct and wrong answers, and time spent on learning steps) to build our paid certification predictor, which achieved promising balanced accuracies (BAs), ranging from 0.77 to 0.95. Having employed simple predictions based on a few clickstream variables, we then analysed more in-depth what other information can be extracted from MOOC interaction (namely discussion forums) for paid certification prediction. However, to better explore the learners’ discussion forums, we built, as an original contribution, MOOCSent, a cross- platform review-based sentiment classifier, using over 1.2 million MOOC sentiment-labelled reviews. MOOCSent addresses various limitations of the current sentiment classifiers including (1) using one single source of data (previous literature on sentiment classification in MOOCs was based on single platforms only, and hence less generalisable, with relatively low number of instances compared to our obtained dataset;) (2) lower model outputs, where most of the current models are based on 2-polar iii iv classifier (positive or negative only); (3) disregarding important sentiment indicators, such as emojis and emoticons, during text embedding; and (4) reporting average performance metrics only, preventing the evaluation of model performance at the level of class (sentiment). Finally, and with the help of MOOCSent, we used the learners’ discussion forums to predict paid certification after annotating learners’ comments and replies with the sentiment using MOOCSent. This multi-input model contains raw data (learner textual inputs), sentiment classification generated by MOOCSent, computed features (number of likes received for each textual input), and several features extracted from the texts (character counts, word counts, and part of speech (POS) tags for each textual instance). This experiment adopted various deep predictive approaches – specifically that allow multi-input architecture - to early (i.e., weekly) investigate if data obtained from MOOC learners’ interaction in discussion forums can predict learners’ purchase decisions (certification). Considering the staggeringly low rate of paid certification in MOOCs, this present thesis contributes to the knowledge and field of MOOC learner analytics with predicting paid certification, for the first time, at such a comprehensive (with data from over 200 thousand learners from 5 different discipline courses), actionable (analysing learners decision from the first week of the course) and longitudinal (with 23 runs from 2013 to 2017) scale. The present thesis contributes with (1) investigating various conventional and deep ML approaches for predicting paid certification in MOOCs using learner clickstreams (Chapter 5) and course discussion forums (Chapter 7), (2) building the largest MOOC sentiment classifier (MOOCSent) based on learners’ reviews of the courses from the leading MOOC platforms, namely Coursera, FutureLearn and Udemy, and handles emojis and emoticons using dedicated lexicons that contain over three thousand corresponding explanatory words/phrases, (3) proposing and developing, for the first time, multi-input model for predicting certification based on the data from discussion forums which synchronously processes the textual (comments and replies) and numerical (number of likes posted and received, sentiments) data from the forums, adapting the suitable classifier for each type of data as explained in detail in Chapter 7

    UMSL Bulletin 2022-2023

    Get PDF
    The 2022-2023 Bulletin and Course Catalog for the University of Missouri St. Louis.https://irl.umsl.edu/bulletin/1087/thumbnail.jp

    Mapping the Focal Points of WordPress: A Software and Critical Code Analysis

    Get PDF
    Programming languages or code can be examined through numerous analytical lenses. This project is a critical analysis of WordPress, a prevalent web content management system, applying four modes of inquiry. The project draws on theoretical perspectives and areas of study in media, software, platforms, code, language, and power structures. The applied research is based on Critical Code Studies, an interdisciplinary field of study that holds the potential as a theoretical lens and methodological toolkit to understand computational code beyond its function. The project begins with a critical code analysis of WordPress, examining its origins and source code and mapping selected vulnerabilities. An examination of the influence of digital and computational thinking follows this. The work also explores the intersection of code patching and vulnerability management and how code shapes our sense of control, trust, and empathy, ultimately arguing that a rhetorical-cultural lens can be used to better understand code\u27s controlling influence. Recurring themes throughout these analyses and observations are the connections to power and vulnerability in WordPress\u27 code and how cultural, processual, rhetorical, and ethical implications can be expressed through its code, creating a particular worldview. Code\u27s emergent properties help illustrate how human values and practices (e.g., empathy, aesthetics, language, and trust) become encoded in software design and how people perceive the software through its worldview. These connected analyses reveal cultural, processual, and vulnerability focal points and the influence these entanglements have concerning WordPress as code, software, and platform. WordPress is a complex sociotechnical platform worthy of further study, as is the interdisciplinary merging of theoretical perspectives and disciplines to critically examine code. Ultimately, this project helps further enrich the field by introducing focal points in code, examining sociocultural phenomena within the code, and offering techniques to apply critical code methods
    • 

    corecore