17 research outputs found

    Exploiting General-Purpose In-Game Behaviours to Predict Players Churn in Gameful Systems

    The value of a game is assessed by measuring the level of activity of its players. No matter how thorough the design is, the litmus test is whether players keep using it or not. To reduce the number of abandoning players, it is important to detect at-risk subjects in time. Many works in the literature target this issue. However, the main focus has been on entertainment games, from which articulated indicators of in-game behavior can be extracted. Those features tend to be context-specific and, even when they are not, they are specific to full-featured games and thus impossible to adapt to other systems such as games with a purpose and gamified apps. In this preliminary work, we fed general-purpose in-game behaviors, such as participation data, to an Artificial Neural Network to predict when a player will definitively leave the game. Moreover, we study the appropriate amount of information, in terms of players’ history, that should be considered when predicting players’ churn. Our case study is a long-running persuasive gameful system deployed in the field.
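
    The abstract does not specify the network, so the following is only a minimal sketch of the idea: a small feed-forward model fed a fixed-length window of general-purpose participation features, where the window length stands in for the "amount of history" the paper investigates. The feature names, window length, and layer sizes are hypothetical.

        # Minimal sketch (not the paper's exact model): predict churn from a
        # fixed window of per-period participation features.
        import torch
        import torch.nn as nn

        WINDOW = 8          # number of past periods (e.g. weeks) fed to the model
        N_FEATURES = 3      # e.g. sessions, actions, points per period (hypothetical)

        class ChurnMLP(nn.Module):
            def __init__(self):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(WINDOW * N_FEATURES, 64), nn.ReLU(),
                    nn.Linear(64, 32), nn.ReLU(),
                    nn.Linear(32, 1),                 # logit of "player will churn"
                )

            def forward(self, x):                     # x: (batch, WINDOW, N_FEATURES)
                return self.net(x.flatten(1))

        model = ChurnMLP()
        history = torch.rand(4, WINDOW, N_FEATURES)   # 4 players' participation history
        p_churn = torch.sigmoid(model(history))       # churn probability per player
        print(p_churn.shape)                          # torch.Size([4, 1])

    Varying WINDOW is the simplest way to probe the question raised above of how much player history the predictor should see.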

    Posthoc Interpretation via Quantization

    In this paper, we introduce a new approach, called Posthoc Interpretation via Quantization (PIQ), for interpreting decisions made by trained classifiers. Our method utilizes vector quantization to transform the representations of a classifier into a discrete, class-specific latent space. The class-specific codebooks act as a bottleneck that forces the interpreter to focus on the parts of the input data deemed relevant by the classifier for making a prediction. Our model formulation also enables learning concepts by incorporating the supervision of pretrained annotation models such as state-of-the-art image segmentation models. We evaluated our method through quantitative and qualitative studies involving black-and-white images, color images, and audio. As a result of these studies, we found that PIQ generates interpretations that are more easily understood by participants in our user studies than several other interpretation methods from the literature. Comment: Francesco Paissan and Cem Subakan contributed equally.
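
    As a rough illustration of the class-specific vector-quantization bottleneck described above (not the authors' implementation), the sketch below snaps a classifier feature vector to the nearest entry of the codebook belonging to the predicted class, with a straight-through gradient. Codebook sizes and feature dimensions are hypothetical.

        import torch
        import torch.nn as nn

        class ClassCodebookQuantizer(nn.Module):
            """Quantize features with a separate learnable codebook per class."""
            def __init__(self, n_classes=10, codes_per_class=32, dim=64):
                super().__init__()
                # one (codes_per_class, dim) codebook for every class
                self.codebooks = nn.Parameter(torch.randn(n_classes, codes_per_class, dim))

            def forward(self, z, class_id):            # z: (batch, dim)
                book = self.codebooks[class_id]        # (batch, codes_per_class, dim)
                dist = torch.cdist(z.unsqueeze(1), book).squeeze(1)   # (batch, codes)
                idx = dist.argmin(dim=1)
                z_q = book[torch.arange(z.size(0)), idx]              # nearest code
                # straight-through estimator: values from z_q, gradients flow to z
                return z + (z_q - z).detach(), idx

        quant = ClassCodebookQuantizer()
        z = torch.randn(8, 64)                         # features from a trained classifier
        labels = torch.randint(0, 10, (8,))            # classifier's predicted classes
        z_q, codes = quant(z, labels)
        print(z_q.shape, codes.shape)                  # torch.Size([8, 64]) torch.Size([8])

    Because each class has its own codebook, the discrete codes can only express what the classifier's representation considers relevant for that class, which is the bottleneck effect the abstract refers to.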

    Audio Editing with Non-Rigid Text Prompts

    In this paper, we explore audio editing with non-rigid text edits. We show that the proposed editing pipeline is able to create audio edits that remain faithful to the input audio. We explore text prompts that perform addition, style transfer, and inpainting. We show quantitatively and qualitatively that the edits outperform Audio-LDM, a recently released text-prompted audio generation model. Qualitative inspection of the results shows that the edits produced by our approach remain more faithful to the input audio in terms of preserving the original onsets and offsets of the audio events.
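
    The abstract does not detail the editing pipeline. One common way to perform text-prompted, non-rigid edits with a latent generative model is to partially re-noise the input audio's latent and then denoise it under the edit prompt (SDEdit-style). The toy sketch below illustrates only that re-noise/denoise idea with a stand-in denoiser; it is not the authors' method, and every function and dimension here is hypothetical.

        import torch

        def toy_denoise_step(z, t, prompt_emb):
            """Stand-in for a text-conditioned denoiser; a real pipeline would call
            a trained latent diffusion model here."""
            return z - 0.1 * t * (z - prompt_emb)      # drift the latent toward the prompt

        def edit_latent(z_audio, prompt_emb, strength=0.5, steps=50):
            # 1) partially re-noise the input latent (strength controls edit freedom)
            t0 = int(steps * strength)
            z = z_audio + torch.randn_like(z_audio) * strength
            # 2) denoise under the *edit* prompt, starting from the intermediate step
            for t in reversed(range(1, t0 + 1)):
                z = toy_denoise_step(z, t / steps, prompt_emb)
            return z

        z_audio = torch.randn(1, 64)        # latent of the input audio (hypothetical dim)
        prompt = torch.randn(1, 64)         # embedding of the edit prompt (hypothetical)
        edited = edit_latent(z_audio, prompt)
        print(edited.shape)

    Keeping the re-noising partial is what lets such an edit stay faithful to the original audio (e.g. preserving event onsets and offsets) while still following the new prompt.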

    Low-Complexity Acoustic Scene Classification in DCASE 2022 Challenge

    This paper presents an analysis of the Low-Complexity Acoustic Scene Classification task in the DCASE 2022 Challenge. The task was a continuation of the previous years' task, but the low-complexity requirements were changed to the following: the maximum number of allowed parameters, including zero-valued ones, was 128 K, with parameters represented in the INT8 numerical format, and the maximum number of multiply-accumulate operations at inference time was 30 million. The dataset is the same as in the previous year, but the audio samples were shortened from 10 seconds to 1 second for this year's challenge. The provided baseline system is a convolutional neural network that employs post-training quantization of its parameters, resulting in 46.5 K parameters and 29.23 million multiply-accumulate operations (MMAC). Its performance on the evaluation data is 44.2% accuracy and 1.532 log loss. In comparison, the top system in the challenge obtained an accuracy of 59.6% and a log loss of 1.091, with 121 K parameters and 28 MMAC. The task received 48 submissions from 19 different teams, most of which outperformed the baseline system.
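
    As a rough illustration of how a submission can be checked against the two complexity limits quoted above (128 K INT8 parameters, 30 MMAC), the sketch below counts parameters and convolution/linear multiply-accumulates for a small CNN. The model and input shape are hypothetical placeholders, not the challenge baseline.

        import torch
        import torch.nn as nn

        def count_params_and_macs(model, input_shape):
            macs = 0
            hooks = []

            def conv_hook(m, inp, out):
                nonlocal macs
                k = m.kernel_size[0] * m.kernel_size[1] * (m.in_channels // m.groups)
                macs += out.numel() * k            # one MAC per kernel element per output value

            def linear_hook(m, inp, out):
                nonlocal macs
                macs += out.numel() * m.in_features

            for m in model.modules():
                if isinstance(m, nn.Conv2d):
                    hooks.append(m.register_forward_hook(conv_hook))
                elif isinstance(m, nn.Linear):
                    hooks.append(m.register_forward_hook(linear_hook))

            with torch.no_grad():
                model(torch.zeros(1, *input_shape))
            for h in hooks:
                h.remove()

            params = sum(p.numel() for p in model.parameters())
            return params, macs

        # hypothetical tiny classifier, not the DCASE baseline
        net = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
        )
        params, macs = count_params_and_macs(net, (1, 64, 44))   # 1-second log-mel input (hypothetical)
        print(f"{params / 1e3:.1f} K params (limit 128 K), {macs / 1e6:.2f} MMAC (limit 30)")

    Applying INT8 post-training quantization, as the baseline does, changes the storage per parameter but not the parameter or MAC counts checked here.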

    PhiNet-GAN: Bringing real-time face swapping to embedded devices

    Recent years have seen an unprecedented development of deep learning-based techniques for processing live video from CCTV cameras, causing growing privacy concerns. A possible solution is to ensure that a subject's personal information never leaves the device on which it was collected, thus implementing a Privacy-by-Design (PbD) approach. In live video processing tasks, PbD can be guaranteed through anonymisation techniques, such as face swapping, performed directly on the end device. This paper therefore presents PhiNet-GAN, an extension of the PhiNet family of embedded neural networks to generative networks. PhiNet-GAN targets resource-constrained platforms based on low-power microcontrollers. An example is the Kendryte K210, a RISC-V dual-core processing unit running at 400 MHz, on which we tested our network. Overall, we achieved a power consumption of less than 300 mW while running at more than 15 fps with an FID score lower than 150.

    PhiNets: a scalable backbone for low-power AI at the edge

    In the Internet of Things era, where we see many interconnected and heterogeneous mobile and fixed smart devices, distributing intelligence from the cloud to the edge has become a necessity. Due to limited computational and communication capabilities, low memory, and tight energy budgets, bringing artificial intelligence algorithms to peripheral devices, such as the end nodes of a sensor network, is a challenging task and requires the design of innovative solutions. In this work, we present PhiNets, a new scalable backbone optimized for deep-learning-based image processing on resource-constrained platforms. PhiNets are based on inverted residual blocks specifically designed to decouple the computational cost, working memory, and parameter memory, thus exploiting all available resources for a given platform. With a YoloV2 detection head and Simple Online and Realtime Tracking, the proposed architecture achieves state-of-the-art results in (i) detection on the COCO and VOC2012 benchmarks and (ii) tracking on the MOT15 benchmark. PhiNets reduce the parameter count by around 90% with respect to previous state-of-the-art models (EfficientNetv1, MobileNetv2) while achieving better performance at lower computational cost. Moreover, we demonstrate our approach on a prototype node based on an STM32H743 microcontroller (MCU) with 2 MB of internal Flash and 1 MB of RAM, and achieve power requirements on the order of 10 mW. The code for PhiNets is publicly available on GitHub.
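
    The abstract states that PhiNets build on inverted residual blocks. The sketch below is only a generic MobileNetV2-style inverted residual block (1x1 expand, 3x3 depthwise, 1x1 project), not the exact PhiNet block; the channel width and expansion factor are chosen arbitrarily.

        import torch
        import torch.nn as nn

        class InvertedResidual(nn.Module):
            """Generic inverted residual block: 1x1 expand -> 3x3 depthwise -> 1x1 project."""
            def __init__(self, channels, expansion=4):
                super().__init__()
                hidden = channels * expansion
                self.block = nn.Sequential(
                    nn.Conv2d(channels, hidden, 1, bias=False),                           # expand
                    nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
                    nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden, bias=False),   # depthwise
                    nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
                    nn.Conv2d(hidden, channels, 1, bias=False),                           # project (linear)
                    nn.BatchNorm2d(channels),
                )

            def forward(self, x):
                return x + self.block(x)       # residual connection (same shape in and out)

        x = torch.randn(1, 32, 56, 56)
        print(InvertedResidual(32)(x).shape)   # torch.Size([1, 32, 56, 56])

    In this block type, the expansion factor, the depthwise convolution, and the number of channels each act on compute, working memory, and parameter memory somewhat independently, which is the decoupling the abstract exploits when scaling to a given platform.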

    On the Role of Smart Vision Sensors in Energy-Efficient Computer Vision at the Edge

    The increasing focus of the research community on lightweight, small-footprint neural network models is closing the gap in inference performance between cluster-scale models and tiny devices. In the recent past, researchers have shown that it is possible to achieve state-of-the-art performance in different domains (e.g. sound event detection, object detection, image classification) with small-footprint, low-computational-cost architectures. However, these studies lack a comprehensive analysis of the input space used (e.g. for images) and present results on standard RGB benchmarks. In this manuscript, we investigate the role of smart vision sensors (SVSs) in deep learning-based object detection pipelines. In particular, we combine motion bitmaps with standard color-space representations (namely RGB, YUV, and grayscale) and show how SVSs can be used optimally in an IoT end node. In conclusion, we report that, overall, the best-performing input space is grayscale augmented with the motion bitmap. These results are promising for real-world applications, since many SVSs provide both image formats at low power consumption.
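
    A minimal sketch of the best-performing input construction reported above: a binary motion bitmap is stacked with the grayscale frame as a two-channel detector input. Here the motion bitmap is approximated by thresholding a frame difference; a real smart vision sensor would provide it directly, and the threshold and frame sizes are hypothetical.

        import numpy as np

        def grayscale_plus_motion(prev_frame, frame, threshold=15):
            """Stack a grayscale frame with a binary motion bitmap (2-channel input)."""
            gray_prev = prev_frame.mean(axis=-1)           # naive RGB -> grayscale
            gray = frame.mean(axis=-1)
            motion = (np.abs(gray - gray_prev) > threshold).astype(np.float32)
            return np.stack([gray / 255.0, motion], axis=0)    # (2, H, W) detector input

        prev_frame = np.random.randint(0, 256, (240, 320, 3)).astype(np.float32)
        frame = np.random.randint(0, 256, (240, 320, 3)).astype(np.float32)
        print(grayscale_plus_motion(prev_frame, frame).shape)   # (2, 240, 320)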

    Optimizing PhiNet architectures for the detection of urban sounds on low-end devices

    Sound Event Detection (SED) pipelines identify and classify relevant events in audio streams. With typical applications in the smart city domain (e.g., crowd counting, alarm triggering), SED is an asset for municipalities and law enforcement agencies. Given the large size of the areas to be monitored and the amount of data generated by IoT sensors, large models running on centralised servers are not suitable for real-time applications. Conversely, performing SED directly on pervasive embedded devices is very attractive in terms of energy consumption, bandwidth requirements, and privacy preservation. In a previous manuscript, we proposed scalable backbones from the PhiNets architecture family for real-time sound event detection on microcontrollers. In this paper, we extend that analysis by investigating how PhiNets' scaling parameters affect model performance in the SED task while searching for the best configuration under the given computational constraints. Experimental analysis on UrbanSound8K shows that, when training the model from scratch, only the total number of parameters matters (i.e., performance is independent of the scaling parameter configuration), whereas knowledge distillation is more effective with specific scaling configurations.
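
    As a minimal sketch of the knowledge-distillation objective mentioned above (a generic formulation, not the paper's exact training recipe), the student is trained on a blend of cross-entropy on the labels and a KL term toward the teacher's softened outputs. The temperature and weighting are hypothetical.

        import torch
        import torch.nn.functional as F

        def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
            """Blend label cross-entropy with KL divergence to the teacher's soft targets."""
            soft_targets = F.log_softmax(teacher_logits / T, dim=1)
            soft_student = F.log_softmax(student_logits / T, dim=1)
            kd = F.kl_div(soft_student, soft_targets, log_target=True, reduction="batchmean") * T * T
            ce = F.cross_entropy(student_logits, labels)
            return alpha * kd + (1 - alpha) * ce

        student = torch.randn(8, 10, requires_grad=True)   # small student (e.g. a scaled-down backbone)
        teacher = torch.randn(8, 10)                       # large pretrained teacher outputs
        labels = torch.randint(0, 10, (8,))
        print(distillation_loss(student, teacher, labels))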