
    The Metaverse: Survey, Trends, Novel Pipeline Ecosystem & Future Directions

    The Metaverse offers a second world beyond reality, where boundaries are non-existent and possibilities are endless, through engagement and immersive experiences using virtual reality (VR) technology. Many disciplines, including technology, gaming, education, art, and culture, can benefit from the Metaverse when it is properly developed. Nevertheless, developing the Metaverse environment to its full potential is an ill-defined task that needs proper guidance and direction. Existing surveys on the Metaverse focus only on a specific aspect or discipline of the Metaverse and lack a holistic view of the entire process. A more holistic, multi-disciplinary, in-depth, and academia- and industry-oriented review is therefore required to provide a thorough study of the Metaverse development pipeline. To address these issues, we present in this survey a novel multi-layered pipeline ecosystem composed of (1) the Metaverse computing, networking, communications, and hardware infrastructure, (2) environment digitization, and (3) user interactions. For every layer, we discuss the components that detail the steps of its development. For each of these components, we also examine the impact of a set of enabling technologies and empowering domains (e.g., Artificial Intelligence, Security & Privacy, Blockchain, Business, Ethics, and Social) on its advancement. In addition, we explain the importance of these technologies in supporting decentralization, interoperability, user experiences, interactions, and monetization. Our study highlights the existing challenges for each component, followed by research directions and potential solutions. To the best of our knowledge, this survey is the most comprehensive to date, and it allows users, scholars, and entrepreneurs to gain an in-depth understanding of the Metaverse ecosystem and identify their opportunities and potential for contribution.
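    As a rough, purely illustrative sketch of the three-layer pipeline ecosystem described above, the following Python structure encodes the layers and the cross-cutting enabling domains; the component names inside each layer are paraphrased or assumed for illustration, not taken verbatim from the survey.

```python
# A minimal sketch of the survey's three-layer Metaverse pipeline ecosystem.
# Component names within each layer are illustrative assumptions; the set of
# enabling domains follows the examples listed in the abstract.

PIPELINE = {
    "infrastructure": ["computing", "networking", "communications", "hardware"],
    "environment_digitization": ["world capture", "3D modelling", "rendering"],
    "user_interactions": ["avatars", "immersive I/O", "social presence"],
}

ENABLING_DOMAINS = {"AI", "Security & Privacy", "Blockchain",
                    "Business", "Ethics", "Social"}

def describe(layer: str) -> str:
    """Summarize one pipeline layer and note the cross-cutting domains."""
    components = ", ".join(PIPELINE[layer])
    return f"{layer}: {components} (cross-cut by {len(ENABLING_DOMAINS)} enabling domains)"

if __name__ == "__main__":
    for layer in PIPELINE:
        print(describe(layer))
```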

    Learning disentangled speech representations

    A variety of informational factors are contained within the speech signal, and a single short recording of speech reveals much more than the spoken words. The best method to extract and represent informational factors from the speech signal ultimately depends on which informational factors are desired and how they will be used. In addition, some methods capture more than one informational factor at the same time, such as speaker identity, spoken content, and speaker prosody. The goal of this dissertation is to explore different ways to deconstruct the speech signal into abstract representations that can be learned and later reused in various speech technology tasks. This task of deconstructing, also known as disentanglement, is a form of distributed representation learning. As a general approach to disentanglement, there are some guiding principles that elaborate what a learned representation should contain as well as how it should function. In particular, learned representations should contain all of the requisite information in a more compact manner, be interpretable, remove nuisance factors of irrelevant information, be useful in downstream tasks, and be independent of the task at hand. The learned representations should also be able to answer counterfactual questions. In some cases, learned speech representations can be re-assembled in different ways according to the requirements of downstream applications. For example, in a voice conversion task, the speech content is retained while the speaker identity is changed; and in a content-privacy task, some targeted content may be concealed without affecting how surrounding words sound. While there is no single best method to disentangle all types of factors, some end-to-end approaches demonstrate a promising degree of generalization to diverse speech tasks. This thesis explores a variety of use cases for disentangled representations, including phone recognition, speaker diarization, linguistic code-switching, voice conversion, and content-based privacy masking. Speech representations can also be utilised for automatically assessing the quality and authenticity of speech, such as automatic MOS ratings or detecting deepfakes. The meaning of the term "disentanglement" is not well defined in previous work, and it has acquired several meanings depending on the domain (e.g., image vs. speech). Sometimes the term "disentanglement" is used interchangeably with the term "factorization". This thesis proposes that disentanglement of speech is distinct, and offers a viewpoint of disentanglement that can be considered both theoretically and practically.
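    To make the re-assembly idea concrete, here is a minimal, hypothetical sketch of how disentangled content and speaker representations can be recombined for voice conversion. The encoder/decoder modules below are untrained stand-ins (assuming PyTorch), not the models studied in the dissertation; in practice each would be a trained network.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in modules; real systems use trained encoders/decoders.
content_encoder = nn.Linear(80, 64)   # mel frames -> content embedding
speaker_encoder = nn.Linear(80, 16)   # mel frames -> speaker embedding
decoder = nn.Linear(64 + 16, 80)      # (content, speaker) -> mel frames

def voice_convert(src_mel: torch.Tensor, tgt_mel: torch.Tensor) -> torch.Tensor:
    """Keep the source content, swap in the target speaker's identity."""
    content = content_encoder(src_mel)               # what is said, per frame
    speaker = speaker_encoder(tgt_mel).mean(dim=0)   # who says it, utterance-level
    speaker = speaker.expand(content.shape[0], -1)   # broadcast over frames
    return decoder(torch.cat([content, speaker], dim=-1))

src = torch.randn(120, 80)   # 120 frames of 80-dim mel features (source utterance)
tgt = torch.randn(90, 80)    # an utterance from the target speaker
converted = voice_convert(src, tgt)
print(converted.shape)       # torch.Size([120, 80])
```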

    Underwater optical wireless communications in turbulent conditions: from simulation to experimentation

    Underwater optical wireless communication (UOWC) is a technology that aims to apply high-speed optical wireless communication (OWC) techniques to the underwater channel. UOWC has the potential to provide high-speed links over relatively short distances as part of a hybrid underwater network, alongside radio frequency (RF) and underwater acoustic communication (UAC) technologies. However, there are some difficulties involved in developing a reliable UOWC link, chief among them the complexity of the channel. The main focus throughout this thesis is to develop a greater understanding of the effects of the UOWC channel, especially underwater turbulence. This understanding is developed from basic theory through to simulation and experimental studies, in order to gain a holistic view of turbulence in the UOWC channel. This thesis first presents a method of modelling optical underwater turbulence through simulation that allows it to be examined in conjunction with absorption and scattering. In a stationary channel, this turbulence-induced scattering is shown to cause an increase in both spatial and temporal spreading at the receiver plane. It is also demonstrated, using the technique presented, that the relative impact of turbulence on a received signal is lower in a highly scattering channel, showing an in-built resilience of such channels. Received intensity distributions are presented, confirming that fluctuations in received power under this method follow the commonly used Log-Normal fading model. The impact of turbulence, as measured using this new modelling framework, on link performance, in terms of maximum achievable data rate and bit error rate, is also investigated. Following that, experimental studies are presented comparing the relative impact of turbulence-induced scattering on coherent and non-coherent light propagating through water, and the relative impact of turbulence in different water conditions. It is shown that the scintillation index increases with increasing temperature inhomogeneity in the underwater channel. These results indicate that a light beam from a non-coherent source has a greater resilience to temperature-inhomogeneity-induced turbulence effects in an underwater channel. These results will help researchers to simulate realistic channel conditions when modelling a light-emitting diode (LED) based intensity modulation with direct detection (IM/DD) UOWC link. Finally, a comparison of different modulation schemes in still and turbulent water conditions is presented. Using an underwater channel emulator, it is shown that pulse position modulation (PPM) and subcarrier intensity modulation (SIM) have an inherent resilience to turbulence-induced fading, with SIM achieving higher data rates under all conditions. The signal processing technique termed pair-wise coding (PWC) is applied to SIM in underwater optical wireless communications for the first time. The performance of PWC is compared with the state-of-the-art bit- and power-loading optimisation algorithm. Using PWC, a maximum data rate of 5.2 Gbps is achieved in still water conditions.
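    A small numerical sketch of the Log-Normal fading model mentioned above: received intensity is drawn as I = exp(2X) with Gaussian X, normalised so that E[I] = 1, and the empirical scintillation index is compared with its closed-form value exp(4*sigma_x^2) - 1. The parameter value is illustrative, not taken from the experiments in the thesis.

```python
import numpy as np

# Log-Normal fading: I = exp(2X), X ~ N(mu_x, sigma_x^2).
# Choosing mu_x = -sigma_x^2 normalises the mean intensity to E[I] = 1.
sigma_x = 0.2                      # illustrative log-amplitude standard deviation
mu_x = -sigma_x**2

rng = np.random.default_rng(42)
X = rng.normal(mu_x, sigma_x, size=1_000_000)
I = np.exp(2 * X)

# Scintillation index: sigma_I^2 = Var(I) / E[I]^2 = exp(4 sigma_x^2) - 1.
si_empirical = I.var() / I.mean()**2
si_theory = np.exp(4 * sigma_x**2) - 1
print(f"empirical SI = {si_empirical:.4f}, theory = {si_theory:.4f}")
```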

    Digital asset management via distributed ledgers

    Distributed ledgers rose to prominence with the advent of Bitcoin, the first provably secure protocol to solve consensus in an open-participation setting. Subsequently, active research and engineering efforts have proposed a multitude of applications and alternative designs, the most prominent being Proof-of-Stake (PoS). This thesis expands the scope of secure and efficient asset management over a distributed ledger around three axes: i) cryptography; ii) distributed systems; iii) game theory and economics. First, we analyze the security of various wallets. We start with a formal model of hardware wallets, followed by an analytical framework of PoS wallets, each outlining the unique properties of Proof-of-Work (PoW) and PoS respectively. The latter also provides a rigorous design for forming collaborative participating entities, called stake pools. We then propose Conclave, a stake pool design which enables a group of parties to participate in a PoS system in a collaborative manner, without a central operator. Second, we focus on efficiency. Decentralized systems are aimed at thousands of users across the globe, so a rigorous design for minimizing memory and storage consumption is a prerequisite for scalability. To that end, we frame ledger maintenance as an optimization problem and design a multi-tier framework for wallets which ensures that updates increase the ledger's global state only to a minimal extent, while preserving the security guarantees outlined in the security analysis. Third, we explore incentive compatibility and analyze blockchain systems from both a micro- and a macroeconomic perspective. We enrich our cryptographic and systems results by analyzing the incentives of collective pools and designing a state-efficient Bitcoin fee function. We then analyze the Nash dynamics of distributed ledgers, introducing a formal model that evaluates whether rational, utility-maximizing participants are disincentivized from exhibiting undesirable infractions, and highlighting the differences between PoW- and PoS-based ledgers, both in a standalone setting and under external parameters, like market price fluctuations. We conclude by introducing a macroeconomic principle, cryptocurrency egalitarianism, and describing two mechanisms for enabling taxation in blockchain-based currency systems.
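    As a toy illustration of the Nash-dynamics question raised above (is a rational participant disincentivized from an infraction?), the sketch below compares the expected utility of compliance against an infraction that carries a detection probability and penalty. All parameters are hypothetical, and this is far simpler than the thesis model, which also covers PoW vs. PoS differences and external parameters such as market price fluctuations.

```python
# Toy expected-utility comparison for a rational, utility-maximizing participant.
# All numbers are hypothetical and chosen only to illustrate the comparison.

def expected_utility(reward: float, infraction_bonus: float,
                     detection_prob: float, penalty: float,
                     comply: bool) -> float:
    """Expected payoff of complying vs. committing a detectable infraction."""
    if comply:
        return reward
    return reward + infraction_bonus - detection_prob * penalty

reward, bonus, p_detect, penalty = 10.0, 3.0, 0.4, 12.0
u_comply = expected_utility(reward, bonus, p_detect, penalty, comply=True)
u_deviate = expected_utility(reward, bonus, p_detect, penalty, comply=False)
print(f"comply: {u_comply:.1f}, deviate: {u_deviate:.1f}")
print("infraction disincentivized" if u_comply >= u_deviate else "protocol at risk")
```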

    Underfill reliability and lifetime estimation of microelectronic assemblies

    Abstract: In order to protect the interconnections in flip-chip packages, an underfill material layer is used to fill the volume and provide mechanical support between the silicon chip and the substrate. Due to the chip corner geometry and the mismatch in coefficient of thermal expansion (CTE), the underfill suffers from a stress concentration at the chip corners whenever the temperature is lower than the curing temperature. This stress concentration leads to subsequent mechanical failures in flip-chip packages, such as chip-underfill interfacial delamination and underfill cracking. Local stresses and strains are the most important parameters for understanding the mechanism of underfill failures. The industry therefore currently relies on the finite element method (FEM) to calculate the stress components, but FEM results may not be accurate enough compared to the actual stresses in the underfill, and FEM simulations require careful consideration of important geometrical details and material properties. This thesis proposes a modeling approach that can accurately estimate underfill delamination areas and crack trajectories, with the following three objectives. The first objective was to develop an experimental technique capable of measuring underfill deformations around the chip corner region. This technique, named the confocal-DIC technique, combined confocal microscopy and the digital image correlation (DIC) method to enable three-dimensional strain measurements at different temperatures. It was first validated by a theoretical analysis of thermal strains. In a test component similar to a flip-chip package, the strain distribution obtained by the FEM model was in good agreement with the results measured by the confocal-DIC technique, with relative errors of less than 20% at the chip corners. The second objective was to measure the strain near a crack in the underfill. Artificial cracks with lengths of 160 μm and 640 μm were fabricated from the chip corner along the 45° diagonal direction. The confocal-DIC-measured maximum hoop strains and first principal strains were located at the crack front area for both the 160 μm and 640 μm cracks. A crack model was developed using the extended finite element method (XFEM), and the strain distribution in the simulation showed the same trend as the experimental results. The distribution of hoop strains was in good agreement with the measured values when the model element size was smaller than 22 μm, small enough to capture the strong strain gradient near the crack tip. The third objective was to propose a modeling approach for underfill delamination and cracking that accounts for the effects of manufacturing variables. A deep thermal cycling test was performed on 13 test cells to obtain the reference chip-underfill delamination areas and crack profiles. An artificial neural network (ANN) was trained to relate the manufacturing variables to the number of cycles to first delamination for each cell. The predicted numbers of cycles for all 6 cells in the test dataset fell within the intervals of the experimental observations. Delamination growth was modelled in FEM by evaluating the strain energy amplitude at the interface elements between the chip and the underfill. For 5 out of 6 cells in the validation, the delamination growth model was consistent with the experimental observations. The cracks in the bulk underfill were modelled by XFEM without predefined paths.
The directions of edge cracks were in good agreement with the experimental observations, with an error of less than 2.5°. This approach met the thesis goal of estimating initial underfill delamination, delamination areas, and crack paths in actual industrial flip-chip assemblies.
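    As a back-of-the-envelope companion to the CTE-mismatch mechanism described above, here is a small worked sketch of the unconstrained thermal mismatch strain, (alpha_underfill - alpha_silicon) * delta_T. The CTE values and temperature excursion are generic literature figures chosen for illustration, not the materials or conditions of this thesis.

```python
# Thermal mismatch strain from the CTE difference between chip and underfill.
# CTE values are generic literature figures (illustrative, not from this work).
ALPHA_SILICON = 2.6e-6     # /K
ALPHA_UNDERFILL = 30e-6    # /K (typical value below the glass transition)

def mismatch_strain(delta_T: float) -> float:
    """Unconstrained thermal mismatch strain: (alpha_uf - alpha_si) * dT."""
    return (ALPHA_UNDERFILL - ALPHA_SILICON) * delta_T

# Cooling from a 150 degC cure down to -40 degC: delta_T = -190 K.
strain = mismatch_strain(-190.0)
print(f"mismatch strain: {strain:.4%}")   # about -0.52%; this mismatch drives
                                          # the stress concentration at chip corners
```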

    Neural Natural Language Generation: A Survey on Multilinguality, Multimodality, Controllability and Learning

    Developing artificial learning systems that can understand and generate natural language has been one of the long-standing goals of artificial intelligence. Recent decades have witnessed impressive progress on both of these problems, giving rise to a new family of approaches. In particular, advances in deep learning over the past few years have led to neural approaches to natural language generation (NLG). These methods combine generative language learning techniques with neural network-based frameworks. With a wide range of applications in natural language processing, neural NLG (NNLG) is a new and fast-growing field of research. In this state-of-the-art report, we investigate recent developments and applications of NNLG in their full extent from a multidimensional view, covering critical perspectives such as multimodality, multilinguality, controllability, and learning strategies. We summarize the fundamental building blocks of NNLG approaches from these aspects and provide detailed reviews of commonly used preprocessing steps and basic neural architectures. This report also focuses on the seminal applications of these NNLG models, such as machine translation, description generation, automatic speech recognition, abstractive summarization, text simplification, question answering and generation, and dialogue generation. Finally, we conclude with a thorough discussion of the described frameworks by pointing out some open research directions. This work has been partially supported by the European Commission ICT COST Action “Multi-task, Multilingual, Multi-modal Language Generation” (CA18231). AE was supported by the BAGEP 2021 Award of the Science Academy. EE was supported in part by the TUBA GEBIP 2018 Award. BP is in part funded by Independent Research Fund Denmark (DFF) grant 9063-00077B. IC has received funding from the European Union’s Horizon 2020 research and innovation programme under Marie Sklodowska-Curie grant agreement No 838188. EL is partly funded by Generalitat Valenciana and the Spanish Government through projects PROMETEU/2018/089 and RTI2018-094649-B-I00, respectively. SMI is partly funded by UNIRI project uniri-drustv-18-20. GB is partly supported by the Ministry of Innovation and the National Research, Development and Innovation Office within the framework of the Hungarian Artificial Intelligence National Laboratory Programme. COT is partially funded by the Romanian Ministry of European Investments and Projects through the Competitiveness Operational Program (POC) project “HOLOTRAIN” (grant no. 29/221 ap2/07.04.2020, SMIS code: 129077) and by the German Academic Exchange Service (DAAD) through the project “AWAKEN: content-Aware and netWork-Aware faKE News mitigation” (grant no. 91809005). ESA is partially funded by the German Academic Exchange Service (DAAD) through the project “Deep-Learning Anomaly Detection for Human and Automated Users Behavior” (grant no. 91809358).

    Machine learning for managing structured and semi-structured data

    As the digitalization of the private, commercial, and public sectors advances rapidly, an increasing amount of data is becoming available. In order to gain insights or knowledge from these enormous amounts of raw data, a deep analysis is essential. The immense volume requires highly automated processes with minimal manual interaction. In recent years, machine learning methods have taken on a central role in this task. In addition to the individual data points, their interrelationships often play a decisive role, e.g., whether two patients are related to each other or whether they are treated by the same physician. Hence, relational learning is an important branch of research, which studies how to harness this explicitly available structural information between different data points. Recently, graph neural networks have gained importance. These can be considered an extension of convolutional neural networks from regular grids to general (irregular) graphs. Knowledge graphs play an essential role in representing facts about entities in a machine-readable way. While great efforts are made to store as many facts as possible in these graphs, they often remain incomplete, i.e., true facts are missing. Manual verification and expansion of the graphs is becoming increasingly difficult due to the large volume of data and must therefore be assisted or substituted by automated procedures which predict missing facts. The field of knowledge graph completion can be roughly divided into two categories: Link Prediction and Entity Alignment. In Link Prediction, machine learning models are trained to predict unknown facts between entities based on the known facts. Entity Alignment aims at identifying shared entities between graphs, in order to link several such knowledge graphs based on some provided seed alignment pairs. In this thesis, we present important advances in the field of knowledge graph completion. For Entity Alignment, we show how to reduce the number of required seed alignments while maintaining performance, using novel active learning techniques. We also discuss the power of textual features and show that graph-neural-network-based methods have difficulties with noisy alignment data. For Link Prediction, we demonstrate how to improve the prediction for entities unseen at training time by exploiting additional metadata on individual statements, often available in modern graphs. Supported by results from a large-scale experimental study, we present an analysis of the effect of individual components of machine learning models, e.g., the interaction function or loss criterion, on the task of link prediction. We also introduce a software library that simplifies the implementation and study of such components and makes them accessible to a wide research community, ranging from relational learning researchers to applied fields such as the life sciences. Finally, we propose a novel metric for evaluating ranking results, as used for both completion tasks. It allows for easier interpretation and comparison, especially in cases with different numbers of ranking candidates, as encountered in the de facto standard evaluation protocols for both tasks.
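    To illustrate why adjusting for the number of ranking candidates helps, the snippet below contrasts the raw mean rank with a simple adjusted variant that normalises against the expected rank of a random scorer, (|C| + 1) / 2. This is a sketch in the spirit of the proposal, under that assumption, not the exact metric defined in the thesis.

```python
import numpy as np

def mean_rank(ranks):
    """Raw mean rank; hard to compare across different candidate-set sizes."""
    return float(np.mean(ranks))

def adjusted_mean_rank(ranks, num_candidates):
    """Mean rank divided by a random scorer's expected rank, (|C| + 1) / 2.

    Values near 1.0 mean "no better than random"; smaller is better.
    A sketch in the spirit of the thesis proposal, not its exact metric.
    """
    expected = np.mean([(c + 1) / 2 for c in num_candidates])
    return mean_rank(ranks) / expected

# The same raw mean rank tells very different stories at different sizes:
print(adjusted_mean_rank([5, 5, 5], [10, 10, 10]))        # ~0.91: barely beats random
print(adjusted_mean_rank([5, 5, 5], [1000, 1000, 1000]))  # ~0.01: very strong
```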

    Graphical scaffolding for the learning of data wrangling APIs

    In order for students across the sciences to avail themselves of modern data streams, they must first know how to wrangle data: how to reshape ill-organised tabular data into another format, and how to do this programmatically, in languages such as Python and R. Despite the cross-departmental demand and the ubiquity of data wrangling in analytical workflows, research on how to optimise its instruction has been minimal. Although data wrangling as a programming domain presents distinctive challenges - characterised by on-the-fly syntax lookup and code example integration - it also presents opportunities. One such opportunity is that tabular data structures are easily visualised. To leverage the inherent visualisability of data wrangling, this dissertation evaluates three types of graphics that could be employed as scaffolding for novices: subgoal graphics, thumbnail graphics, and parameter graphics. Using a specially built e-learning platform, this dissertation documents a multi-institutional, randomised, and controlled experiment that investigates their pedagogical effects. Our results indicate that the graphics are well received, that subgoal graphics boost the completion rate, and that thumbnail graphics improve navigability within a command menu. We also obtained several non-significant results, as well as indications that parameter graphics are counterproductive. We discuss these findings in the context of general scaffolding dilemmas, and how they fit into a wider research programme on data wrangling instruction.
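    For readers unfamiliar with the kind of task the graphics scaffold, here is a minimal example of the core wrangling move the dissertation targets: reshaping an ill-organised wide table into tidy long format in Python with pandas. The data are invented for illustration.

```python
import pandas as pd

# An ill-organised "wide" table: one column per year (invented data).
wide = pd.DataFrame({
    "station": ["Leith", "Stockbridge"],
    "2021": [812, 604],
    "2022": [797, 655],
})

# The canonical wrangling move: melt the year columns into tidy long format.
tidy = wide.melt(id_vars="station", var_name="year", value_name="rainfall_mm")
print(tidy)
#        station  year  rainfall_mm
# 0        Leith  2021          812
# 1  Stockbridge  2021          604
# 2        Leith  2022          797
# 3  Stockbridge  2022          655
```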
