491 research outputs found

    A Closer Look into Recent Video-based Learning Research: A Comprehensive Review of Video Characteristics, Tools, Technologies, and Learning Effectiveness

    Full text link
    People increasingly use videos on the Web as a source for learning. To support this way of learning, researchers and developers are continuously developing tools, proposing guidelines, analyzing data, and conducting experiments. However, it is still not clear what characteristics a video should have to be an effective learning medium. In this paper, we present a comprehensive review of 257 articles on video-based learning for the period from 2016 to 2021. One of the aims of the review is to identify the video characteristics that have been explored by previous work. Based on our analysis, we suggest a taxonomy which organizes the video characteristics and contextual aspects into eight categories: (1) audio features, (2) visual features, (3) textual features, (4) instructor behavior, (5) learners activities, (6) interactive features (quizzes, etc.), (7) production style, and (8) instructional design. Also, we identify four representative research directions: (1) proposals of tools to support video-based learning, (2) studies with controlled experiments, (3) data analysis studies, and (4) proposals of design guidelines for learning videos. We find that the most explored characteristics are textual features followed by visual features, learner activities, and interactive features. Text of transcripts, video frames, and images (figures and illustrations) are most frequently used by tools that support learning through videos. The learner activity is heavily explored through log files in data analysis studies, and interactive features have been frequently scrutinized in controlled experiments. We complement our review by contrasting research findings that investigate the impact of video characteristics on the learning effectiveness, report on tasks and technologies used to develop tools that support learning, and summarize trends of design guidelines to produce learning video

    User-oriented recommender systems in retail

    Get PDF
    User satisfaction is considered a key objective for all service provider platforms, regardless of the nature of the service, encompassing domains such as media, entertainment, retail, and information. While the goal of satisfying users is the same across different domains and services, considering domain-specific characteristics is of paramount importance to ensure users have a positive experience with a given system. User interaction data with a system is one of the main sources of data that facilitates achieving this goal. In this thesis, we investigate how to learn from domain-specific user interactions. We focus on recommendation as our main task, and retail as our main domain. We further explore the finance domain and the demand forecasting task as additional directions to understand whether our methodology and findings generalize to other tasks and domains. The research in this thesis is organized around the following dimensions: 1) Characteristics of multi-channel retail: we consider a retail setting where interaction data comes from both digital (i.e., online) and in-store (i.e., offline) shopping; 2) From user behavior to recommendation: we conduct extensive descriptive studies on user interaction log datasets that inform the design of recommender systems in two domains, retail and finance. Our key contributions in characterizing multi-channel retail are two-fold. First, we propose a neural model that makes use of sales in multiple shopping channels in order to improve the performance of demand forecasting in a target channel. Second, we provide the first study of user behavior in a multi-channel retail setting, which results in insights about the channel-specific properties of user behavior, and their effects on the performance of recommender systems. We make three main contributions in designing user-oriented recommender systems. First, we provide a large-scale user behavior study in the finance domain, targeted at understanding financial information seeking behavior in user interactions with company filings. We then propose domain-specific user-oriented filing recommender systems that are informed by the findings of the user behavior analysis. Second, we analyze repurchasing behavior in retail, specifically in the grocery shopping domain. We then propose a repeat consumption-aware neural recommender for this domain. Third, we focus on scalable recommendation in retail and propose an efficient recommender system that explicitly models users' personal preferences that are reflected in their purchasing history

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 261, ICALP 2023, Complete Volum

    Modern data analytics in the cloud era

    Get PDF
    Cloud Computing ist die dominante Technologie des letzten Jahrzehnts. Die Benutzerfreundlichkeit der verwalteten Umgebung in Kombination mit einer nahezu unbegrenzten Menge an Ressourcen und einem nutzungsabhängigen Preismodell ermöglicht eine schnelle und kosteneffiziente Projektrealisierung für ein breites Nutzerspektrum. Cloud Computing verändert auch die Art und Weise wie Software entwickelt, bereitgestellt und genutzt wird. Diese Arbeit konzentriert sich auf Datenbanksysteme, die in der Cloud-Umgebung eingesetzt werden. Wir identifizieren drei Hauptinteraktionspunkte der Datenbank-Engine mit der Umgebung, die veränderte Anforderungen im Vergleich zu traditionellen On-Premise-Data-Warehouse-Lösungen aufweisen. Der erste Interaktionspunkt ist die Interaktion mit elastischen Ressourcen. Systeme in der Cloud sollten Elastizität unterstützen, um den Lastanforderungen zu entsprechen und dabei kosteneffizient zu sein. Wir stellen einen elastischen Skalierungsmechanismus für verteilte Datenbank-Engines vor, kombiniert mit einem Partitionsmanager, der einen Lastausgleich bietet und gleichzeitig die Neuzuweisung von Partitionen im Falle einer elastischen Skalierung minimiert. Darüber hinaus führen wir eine Strategie zum initialen Befüllen von Puffern ein, die es ermöglicht, skalierte Ressourcen unmittelbar nach der Skalierung auszunutzen. Cloudbasierte Systeme sind von fast überall aus zugänglich und verfügbar. Daten werden häufig von zahlreichen Endpunkten aus eingespeist, was sich von ETL-Pipelines in einer herkömmlichen Data-Warehouse-Lösung unterscheidet. Viele Benutzer verzichten auf die Definition von strikten Schemaanforderungen, um Transaktionsabbrüche aufgrund von Konflikten zu vermeiden oder um den Ladeprozess von Daten zu beschleunigen. Wir führen das Konzept der PatchIndexe ein, die die Definition von unscharfen Constraints ermöglichen. PatchIndexe verwalten Ausnahmen zu diesen Constraints, machen sie für die Optimierung und Ausführung von Anfragen nutzbar und bieten effiziente Unterstützung bei Datenaktualisierungen. Das Konzept kann auf beliebige Constraints angewendet werden und wir geben Beispiele für unscharfe Eindeutigkeits- und Sortierconstraints. Darüber hinaus zeigen wir, wie PatchIndexe genutzt werden können, um fortgeschrittene Constraints wie eine unscharfe Multi-Key-Partitionierung zu definieren, die eine robuste Anfrageperformance bei Workloads mit unterschiedlichen Partitionsanforderungen bietet. Der dritte Interaktionspunkt ist die Nutzerinteraktion. Datengetriebene Anwendungen haben sich in den letzten Jahren verändert. Neben den traditionellen SQL-Anfragen für Business Intelligence sind heute auch datenwissenschaftliche Anwendungen von großer Bedeutung. In diesen Fällen fungiert das Datenbanksystem oft nur als Datenlieferant, während der Rechenaufwand in dedizierten Data-Science- oder Machine-Learning-Umgebungen stattfindet. Wir verfolgen das Ziel, fortgeschrittene Analysen in Richtung der Datenbank-Engine zu verlagern und stellen das Grizzly-Framework als DataFrame-zu-SQL-Transpiler vor. Auf dieser Grundlage identifizieren wir benutzerdefinierte Funktionen (UDFs) und maschinelles Lernen (ML) als wichtige Aufgaben, die von einer tieferen Integration in die Datenbank-Engine profitieren würden. Daher untersuchen und bewerten wir Ansätze für die datenbankinterne Ausführung von Python-UDFs und datenbankinterne ML-Inferenz.Cloud computing has been the groundbreaking technology of the last decade. The ease-of-use of the managed environment in combination with nearly infinite amount of resources and a pay-per-use price model enables fast and cost-efficient project realization for a broad range of users. Cloud computing also changes the way software is designed, deployed and used. This thesis focuses on database systems deployed in the cloud environment. We identify three major interaction points of the database engine with the environment that show changed requirements compared to traditional on-premise data warehouse solutions. First, software is deployed on elastic resources. Consequently, systems should support elasticity in order to match workload requirements and be cost-effective. We present an elastic scaling mechanism for distributed database engines, combined with a partition manager that provides load balancing while minimizing partition reassignments in the case of elastic scaling. Furthermore we introduce a buffer pre-heating strategy that allows to mitigate a cold start after scaling and leads to an immediate performance benefit using scaling. Second, cloud based systems are accessible and available from nearly everywhere. Consequently, data is frequently ingested from numerous endpoints, which differs from bulk loads or ETL pipelines in a traditional data warehouse solution. Many users do not define database constraints in order to avoid transaction aborts due to conflicts or to speed up data ingestion. To mitigate this issue we introduce the concept of PatchIndexes, which allow the definition of approximate constraints. PatchIndexes maintain exceptions to constraints, make them usable in query optimization and execution and offer efficient update support. The concept can be applied to arbitrary constraints and we provide examples of approximate uniqueness and approximate sorting constraints. Moreover, we show how PatchIndexes can be exploited to define advanced constraints like an approximate multi-key partitioning, which offers robust query performance over workloads with different partition key requirements. Third, data-centric workloads changed over the last decade. Besides traditional SQL workloads for business intelligence, data science workloads are of significant importance nowadays. For these cases the database system might only act as data delivery, while the computational effort takes place in data science or machine learning (ML) environments. As this workflow has several drawbacks, we follow the goal of pushing advanced analytics towards the database engine and introduce the Grizzly framework as a DataFrame-to-SQL transpiler. Based on this we identify user-defined functions (UDFs) and machine learning inference as important tasks that would benefit from a deeper engine integration and investigate approaches to push these operations towards the database engine

    Détection des écarts de tendance et analyse prédictive pour le traitement des flux d’événements en temps réel

    Get PDF
    Les systèmes d’information produisent différents types de journaux d’événements. Les données historiques contenues dans les journaux d’événements peuvent révéler des informations importantes sur l’exécution d’un processus métier. Le volume croissant de ces données collectées, pour être utile, doit être traité afin d’extraire des informations pertinentes. Dans de nombreuses situations, il peut être souhaitable de rechercher des tendances dans ces journaux. En particulier, les tendances calculées par le traitement et l’analyse de la séquence d’événements générés par plusieurs instances du même processus servent de base pour produire des prévisions sur les exécutions actuelles du processus. L’objectif de cette thèse est de proposer un cadre générique pour l’analyse des tendances sur ces flux d’événement, en temps réel. En premier lieu, nous montrons comment des tendances de différents types peuvent être calculées sur des journaux d’événements en temps réel, à l’aide d’un cadre générique appelé workflow de distance de tendance. De multiples calculs courants sur les flux d’événements s’avèrent être des cas particuliers de ce flux de travail, selon la façon dont différents paramètres de flux de travail sont définis. La suite naturelle de l’analyse statique des tendances est l’usage des algorithmes d’apprentissage. Nous joignons alors les concepts de traitement de flux d’événements et d’apprentissage automatique pour créer un cadre qui permet le calcul de différents types de prédictions sur les journaux d’événements. Le cadre proposé est générique : en fournissant différentes définitions à une poignée de fonctions d’événement, plusieurs types de prédictions différents peuvent être calculés à l’aide du même flux de travail de base. Les deux approches ont été mises en oeuvre et évaluées expérimentalement en étendant un moteur de traitement de flux d’événements existant, appelé BeepBeep. Les résultats expérimentaux montrent que les écarts par rapport à une tendance de référence peuvent être détectés en temps réel pour des flux produisant jusqu’à des milliers d’événements par seconde

    Decision-Focused Learning: Foundations, State of the Art, Benchmark and Future Opportunities

    Full text link
    Decision-focused learning (DFL) is an emerging paradigm in machine learning which trains a model to optimize decisions, integrating prediction and optimization in an end-to-end system. This paradigm holds the promise to revolutionize decision-making in many real-world applications which operate under uncertainty, where the estimation of unknown parameters within these decision models often becomes a substantial roadblock. This paper presents a comprehensive review of DFL. It provides an in-depth analysis of the various techniques devised to integrate machine learning and optimization models, introduces a taxonomy of DFL methods distinguished by their unique characteristics, and conducts an extensive empirical evaluation of these methods proposing suitable benchmark dataset and tasks for DFL. Finally, the study provides valuable insights into current and potential future avenues in DFL research.Comment: Experimental Survey and Benchmarkin

    Personalized Search Via Neural Contextual Semantic Relevance Ranking

    Full text link
    Existing neural relevance models do not give enough consideration for query and item context information which diversifies the search results to adapt for personal preference. To bridge this gap, this paper presents a neural learning framework to personalize document ranking results by leveraging the signals to capture how the document fits into users' context. In particular, it models the relationships between document content and user query context using both lexical representations and semantic embeddings such that the user's intent can be better understood by data enrichment of personalized query context information. Extensive experiments performed on the search dataset, demonstrate the effectiveness of the proposed method.Comment: Contextual, Personalization, Search, Semantics, LLM, embeddin

    AI for IT Operations (AIOps) on Cloud Platforms: Reviews, Opportunities and Challenges

    Full text link
    Artificial Intelligence for IT operations (AIOps) aims to combine the power of AI with the big data generated by IT Operations processes, particularly in cloud infrastructures, to provide actionable insights with the primary goal of maximizing availability. There are a wide variety of problems to address, and multiple use-cases, where AI capabilities can be leveraged to enhance operational efficiency. Here we provide a review of the AIOps vision, trends challenges and opportunities, specifically focusing on the underlying AI techniques. We discuss in depth the key types of data emitted by IT Operations activities, the scale and challenges in analyzing them, and where they can be helpful. We categorize the key AIOps tasks as - incident detection, failure prediction, root cause analysis and automated actions. We discuss the problem formulation for each task, and then present a taxonomy of techniques to solve these problems. We also identify relatively under explored topics, especially those that could significantly benefit from advances in AI literature. We also provide insights into the trends in this field, and what are the key investment opportunities

    Representation Learning for Texts and Graphs: A Unified Perspective on Efficiency, Multimodality, and Adaptability

    Get PDF
    [...] This thesis is situated between natural language processing and graph representation learning and investigates selected connections. First, we introduce matrix embeddings as an efficient text representation sensitive to word order. [...] Experiments with ten linguistic probing tasks, 11 supervised, and five unsupervised downstream tasks reveal that vector and matrix embeddings have complementary strengths and that a jointly trained hybrid model outperforms both. Second, a popular pretrained language model, BERT, is distilled into matrix embeddings. [...] The results on the GLUE benchmark show that these models are competitive with other recent contextualized language models while being more efficient in time and space. Third, we compare three model types for text classification: bag-of-words, sequence-, and graph-based models. Experiments on five datasets show that, surprisingly, a wide multilayer perceptron on top of a bag-of-words representation is competitive with recent graph-based approaches, questioning the necessity of graphs synthesized from the text. [...] Fourth, we investigate the connection between text and graph data in document-based recommender systems for citations and subject labels. Experiments on six datasets show that the title as side information improves the performance of autoencoder models. [...] We find that the meaning of item co-occurrence is crucial for the choice of input modalities and an appropriate model. Fifth, we introduce a generic framework for lifelong learning on evolving graphs in which new nodes, edges, and classes appear over time. [...] The results show that by reusing previous parameters in incremental training, it is possible to employ smaller history sizes with only a slight decrease in accuracy compared to training with complete history. Moreover, weighting the binary cross-entropy loss function is crucial to mitigate the problem of class imbalance when detecting newly emerging classes. [...
    • …
    corecore