1,753 research outputs found

    Undergraduate Catalog of Studies, 2023-2024

    Graduate Catalog of Studies, 2023-2024

    Predicting Paid Certification in Massive Open Online Courses

    Massive open online courses (MOOCs) have been proliferating because of the free or low-cost offering of content for learners, attracting the attention of many stakeholders across the entire educational landscape. Since 2012, dubbed “the Year of the MOOCs”, several platforms have gathered millions of learners in just a decade. Nevertheless, the certification rate of both free and paid courses has been low: only about 4.5–13% and 1–3%, respectively, of the total number of enrolled learners obtain a certificate at the end of their courses. Still, most research concentrates on completion, ignoring the certification problem, and especially its financial aspects. Thus, the research described in the present thesis aimed to investigate paid certification in MOOCs, for the first time, in a comprehensive way, and as early as the first week of the course, by exploring its various levels. First, the latent correlation between learner activities and their paid certification decisions was examined by (1) statistically comparing the activities of non-paying learners with those of course purchasers and (2) predicting paid certification using different machine learning (ML) techniques. Our temporal (weekly) analysis showed statistical significance at various levels when comparing the activities of non-paying learners with those of the certificate purchasers across the five courses analysed. Furthermore, we used the learners’ activities (number of step accesses, attempts, correct and wrong answers, and time spent on learning steps) to build our paid certification predictor, which achieved promising balanced accuracies (BAs), ranging from 0.77 to 0.95. Having employed simple predictions based on a few clickstream variables, we then analysed in more depth what other information can be extracted from MOOC interaction (namely discussion forums) for paid certification prediction.
To better explore the learners’ discussion forums, we built, as an original contribution, MOOCSent, a cross-platform review-based sentiment classifier, using over 1.2 million MOOC sentiment-labelled reviews. MOOCSent addresses various limitations of current sentiment classifiers, including (1) using a single source of data (previous literature on sentiment classification in MOOCs was based on single platforms only, and hence less generalisable, with a relatively low number of instances compared to our dataset); (2) fewer model outputs, where most current models are 2-polar classifiers (positive or negative only); (3) disregarding important sentiment indicators, such as emojis and emoticons, during text embedding; and (4) reporting average performance metrics only, preventing the evaluation of model performance at the level of class (sentiment). Finally, with the help of MOOCSent, we used the learners’ discussion forums to predict paid certification after annotating learners’ comments and replies with their sentiment. This multi-input model contains raw data (learner textual inputs), sentiment classification generated by MOOCSent, computed features (number of likes received for each textual input), and several features extracted from the texts (character counts, word counts, and part of speech (POS) tags for each textual instance). This experiment adopted various deep predictive approaches, specifically those that allow a multi-input architecture, to investigate early (i.e., weekly) whether data obtained from MOOC learners’ interaction in discussion forums can predict learners’ purchase decisions (certification).
Considering the staggeringly low rate of paid certification in MOOCs, the present thesis contributes to the knowledge and field of MOOC learner analytics by predicting paid certification, for the first time, at such a comprehensive (with data from over 200 thousand learners from 5 courses in different disciplines), actionable (analysing learners’ decisions from the first week of the course) and longitudinal (with 23 runs from 2013 to 2017) scale. The present thesis contributes by (1) investigating various conventional and deep ML approaches for predicting paid certification in MOOCs using learner clickstreams (Chapter 5) and course discussion forums (Chapter 7); (2) building the largest MOOC sentiment classifier (MOOCSent), which is based on learners’ reviews of courses from the leading MOOC platforms, namely Coursera, FutureLearn and Udemy, and which handles emojis and emoticons using dedicated lexicons that contain over three thousand corresponding explanatory words/phrases; and (3) proposing and developing, for the first time, a multi-input model for predicting certification based on data from discussion forums, which synchronously processes the textual (comments and replies) and numerical (number of likes posted and received, sentiments) data from the forums, adapting the suitable classifier for each type of data, as explained in detail in Chapter 7.
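
The abstract above reports balanced accuracy (BA) rather than plain accuracy, which matters when only a small share of learners buy a certificate. As a minimal sketch, assuming the usual definition of BA as the mean of per-class recall (the abstract does not spell out its formula), the metric can be computed as follows. The labels are hypothetical toy data, not results from the thesis.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        # Indices of the examples whose true label is c
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical toy labels: 1 = purchased a certificate, 0 = did not
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

ba = balanced_accuracy(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"balanced accuracy = {ba:.4f}, plain accuracy = {plain:.4f}")
```

On this toy data the plain accuracy is higher than the balanced accuracy, because the majority class (non-purchasers) dominates the plain average while the minority class is only half recalled.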

    Formalizing, Verifying and Applying ISA Security Guarantees as Universal Contracts

    Progress has recently been made on specifying instruction set architectures (ISAs) in executable formalisms rather than through prose. However, to date, those formal specifications are limited to the functional aspects of the ISA and do not cover its security guarantees. We present a novel, general method for formally specifying an ISA's security guarantees such that the specification (1) balances the needs of ISA implementations (hardware) and clients (software), (2) can be semi-automatically verified to hold for the ISA operational semantics, producing a high-assurance mechanically-verifiable proof, and (3) supports informal and formal reasoning about security-critical software in the presence of adversarial code. Our method leverages universal contracts: software contracts that express bounds on the authority of arbitrary untrusted code. Universal contracts can be kept agnostic of software abstractions, and strike the right balance between requiring sufficient detail for reasoning about software and preserving implementation freedom of ISA designers and CPU implementers. We semi-automatically verify universal contracts against Sail implementations of ISA semantics using our Katamaran tool, a semi-automatic separation logic verifier for Sail that produces machine-checked proofs for successfully verified contracts. We demonstrate the generality of our method by applying it to two ISAs that offer very different security primitives: (1) MinimalCaps, a custom-built capability machine ISA, and (2) a (somewhat simplified) version of RISC-V with PMP. We verify a femtokernel using the security guarantee we have formalized for RISC-V with PMP.

    La traduzione specializzata all’opera per una piccola impresa in espansione: la mia esperienza di internazionalizzazione in cinese di Bioretics© S.r.l.

    Global markets are currently immersed in two all-encompassing and unstoppable processes: internationalization and globalization. While the former pushes companies to look beyond the borders of their country of origin to forge relationships with foreign trading partners, the latter fosters standardization across countries by reducing spatiotemporal distances and breaking down geographical, political, economic and socio-cultural barriers. In recent decades, another domain has appeared to propel these unifying drives: Artificial Intelligence, together with its high technologies aiming to implement human cognitive abilities in machinery. The “Language Toolkit – Le lingue straniere al servizio dell’internazionalizzazione dell’impresa” project, promoted by the Department of Interpreting and Translation (Forlì Campus) in collaboration with the Romagna Chamber of Commerce (Forlì-Cesena and Rimini), seeks to help Italian SMEs make their way into the global market. It is precisely within this project that this dissertation was conceived. Its purpose is to present the translation and localization project from English into Chinese of a series of texts produced by Bioretics© S.r.l.: an investor deck, the company website and part of the installation and use manual of the Aliquis© framework software, its flagship product. This dissertation is structured as follows: Chapter 1 presents the project and the company in detail; Chapter 2 outlines the internationalization and globalization processes and the Artificial Intelligence market both in Italy and in China; Chapter 3 provides the theoretical foundations for every aspect related to Specialized Translation, including website localization; Chapter 4 describes the resources and tools used to perform the translations; Chapter 5 proposes an analysis of the source texts; Chapter 6 is a commentary on translation strategies and choices.

    Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve

    The widespread adoption of large language models (LLMs) makes it important to recognize their strengths and limitations. We argue that in order to develop a holistic understanding of these systems we need to consider the problem that they were trained to solve: next-word prediction over Internet text. By recognizing the pressures that this task exerts we can make predictions about the strategies that LLMs will adopt, allowing us to reason about when they will succeed or fail. This approach - which we call the teleological approach - leads us to identify three factors that we hypothesize will influence LLM accuracy: the probability of the task to be performed, the probability of the target output, and the probability of the provided input. We predict that LLMs will achieve higher accuracy when these probabilities are high than when they are low - even in deterministic settings where probability should not matter. To test our predictions, we evaluate two LLMs (GPT-3.5 and GPT-4) on eleven tasks, and we find robust evidence that LLMs are influenced by probability in the ways that we have hypothesized. In many cases, the experiments reveal surprising failure modes. For instance, GPT-4's accuracy at decoding a simple cipher is 51% when the output is a high-probability word sequence but only 13% when it is low-probability. These results show that AI practitioners should be careful about using LLMs in low-probability situations. More broadly, we conclude that we should not evaluate LLMs as if they are humans but should instead treat them as a distinct type of system - one that has been shaped by its own particular set of pressures. Comment: 50 pages plus 11 pages of references and 23 pages of appendices
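
To make the "deterministic settings" point concrete, here is a minimal sketch of decoding a shift cipher. The abstract only says "a simple cipher"; the choice of a shift of 13 (rot-13) and the function name `shift_decode` are assumptions made for illustration. The procedure is purely mechanical, so its correct output does not depend on how probable the decoded text is.

```python
import string


def shift_decode(text, shift=13):
    """Decode a shift cipher by moving each letter back by `shift` places."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") - shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") - shift) % 26 + ord("A")))
        else:
            # Leave punctuation, digits and whitespace unchanged
            out.append(ch)
    return "".join(out)


decoded = shift_decode("Uryyb, jbeyq!")
print(decoded)
```

The same function decodes a common phrase and an arbitrary letter sequence equally reliably, which is the baseline against which the reported accuracy gap is surprising.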

    Entity Linking for the Biomedical Domain

    Entity linking is the process of detecting mentions of different concepts in text documents and linking them to canonical entities in a target lexicon. However, one of the biggest issues in entity linking is the ambiguity in entity names. This ambiguity is an issue that many text mining tools have yet to address, since different names can represent the same thing and every mention could indicate a different thing. For instance, search engines that rely on heuristic string matches frequently return irrelevant results, because they are unable to satisfactorily resolve ambiguity. Thus, resolving named entity ambiguity is a crucial step in entity linking. To solve the problem of ambiguity, this work proposes a heuristic method for entity recognition and entity linking over the biomedical knowledge graph, based on the semantic similarity of entities in the knowledge graph. Named entity recognition (NER), relation extraction (RE), and relationship linking (RL) make up a conventional entity linking (EL) system pipeline. We have used the accuracy metric in this thesis. For each identified relation or entity, the solution comprises identifying the correct one and matching it to its corresponding unique CUI in the knowledge base. Because KBs contain a substantial number of relations and entities, each with only one natural language label, the second phase is directly dependent on the accuracy of the first. The framework developed in this thesis enables the extraction of relations and entities from the text and their mapping to the associated CUI in the UMLS knowledge base. This approach derives a new representation of the knowledge base that lends itself to easy comparison. Our idea for selecting the best candidates is to build a graph of relations and determine the shortest path distance using a ranking approach.
We test our suggested approach on two well-known benchmarks in the biomedical field and show that our method outperforms the search engine's top result, improving accuracy by around 4%. In general, when it comes to fine-tuning, we notice that entity linking contains subjective characteristics and modifications may be required depending on the task at hand. The performance of the framework is evaluated based on a Python implementation.
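
The candidate-selection idea described above (build a graph of relations, then rank by shortest path distance) can be sketched as follows. This is only an illustration under stated assumptions: the graph is unweighted, distance is measured in hops by breadth-first search, and the function names and node identifiers are hypothetical placeholders, not real UMLS CUIs or the thesis's actual implementation.

```python
from collections import deque


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def rank_candidates(graph, context_entity, candidates):
    """Order candidates by graph distance to an already-linked context entity."""
    scored = []
    for cand in candidates:
        d = shortest_path_length(graph, context_entity, cand)
        if d is not None:  # drop candidates with no path to the context
            scored.append((d, cand))
    return [cand for _, cand in sorted(scored)]


# Hypothetical toy graph with placeholder identifiers
graph = {
    "C_A": ["C_B"],
    "C_B": ["C_A", "C_C"],
    "C_C": ["C_B", "C_D"],
    "C_D": ["C_C"],
    "C_E": [],
}

ranked = rank_candidates(graph, "C_A", ["C_D", "C_B", "C_E"])
print(ranked)
```

Candidates closer to the context entity are ranked first, and a candidate with no connecting path is discarded rather than ranked last; either choice would be reasonable for a sketch like this.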

    Contributions and applications around low resource deep learning modeling

    Deep learning is the state of the art for several machine learning tasks. Many of these tasks require a large amount of computational resources, which limits their adoption in embedded devices. The main goal of this dissertation is to study methods and algorithms that make it possible to tackle problems using deep learning with restricted computational resources. This work also aims to present applications of deep learning in industry. The first contribution is a new activation function for deep learning networks: the modulus function. The experiments show that the proposed activation function achieves superior results in computer vision tasks when compared with the alternatives found in the literature. The second contribution is a new strategy to combine pre-trained models using knowledge distillation. The results of this chapter show that it is possible to significantly increase the accuracy of the smallest pre-trained models, allowing high performance at a lower computational cost. The next contribution of this thesis tackles the problem of sales forecasting in the field of logistics. Two end-to-end systems with two different deep learning techniques (sequence-to-sequence models and transformers) are proposed.
The results of this chapter conclude that it is possible to build end-to-end systems to predict the sales of multiple individual products, at multiple points of sale and at different times, with a single machine learning model. The proposed model outperforms the alternatives found in the literature. Finally, the last two contributions belong to the speech technology field. The former studies how to build a Keyword Spotting speech recognition system using an efficient version of a convolutional neural network. In this study, the proposed system is able to beat the performance of all the benchmarks found in the literature when tested against the most complex subtasks. The latter study proposes a standalone state-of-the-art text-to-speech model capable of synthesizing intelligible voice in thousands of voice profiles, while generating speech with meaningful and expressive prosody variations. The proposed approach removes the dependency of previous models on an additional voice system, which makes the proposed system more efficient at training and inference time, and enables offline and on-device operations.
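
For the first contribution, the abstract names the "modulus function" as an activation but does not define it. A minimal sketch, assuming it means the element-wise absolute value f(x) = |x|, is shown next to ReLU for comparison. Both function names and the input values are illustrative only.

```python
def modulus(xs):
    """Assumed modulus activation: element-wise absolute value."""
    return [abs(x) for x in xs]


def relu(xs):
    """Standard ReLU, shown for comparison: max(0, x) element-wise."""
    return [max(0.0, x) for x in xs]


# Illustrative pre-activation values
xs = [-2.0, -0.5, 0.0, 1.5]

mod_out = modulus(xs)
relu_out = relu(xs)
print(mod_out)
print(relu_out)
```

Under this assumption, the two functions agree on non-negative inputs and differ on negative ones, where ReLU outputs zero and the modulus keeps the magnitude.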