779 research outputs found
How to Price Shared Optimizations in the Cloud
Data-management-as-a-service systems are increasingly being used in
collaborative settings, where multiple users access common datasets. Cloud
providers have the choice to implement various optimizations, such as indexing
or materialized views, to accelerate queries over these datasets. Each
optimization carries a cost and may benefit multiple users. This creates a
major challenge: how to select which optimizations to perform and how to share
their cost among users. The problem is especially challenging when users are
selfish and will only report their true values for different optimizations if
doing so maximizes their utility. In this paper, we present a new approach for
selecting and pricing shared optimizations by using Mechanism Design. We first
show how to apply the Shapley Value Mechanism to the simple case of selecting
and pricing additive optimizations, assuming an offline game where all users
access the service for the same time-period. Second, we extend the approach to
online scenarios where users come and go. Finally, we consider the case of
substitutive optimizations. We show analytically that our mechanisms induce
truthfulness and recover the optimization costs. We also show experimentally
that our mechanisms yield higher utility than the state-of-the-art approach
based on regret accumulation.
Comment: VLDB201
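The Shapley value charges each user their marginal contribution to the cost, averaged over all arrival orders. As a hedged illustration of that pricing idea (a brute-force toy, not the paper's actual online mechanism), consider sharing the cost of a single hypothetical index among three users:

```python
import math
from itertools import permutations

def shapley_values(players, cost):
    """Exact Shapley value of a cost game by enumerating all arrival orders.
    Exponential in the number of players -- fine for a handful of users."""
    shares = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = set()
        for p in order:
            before = cost(frozenset(coalition))
            coalition.add(p)
            shares[p] += cost(frozenset(coalition)) - before  # marginal cost
    n_orders = math.factorial(len(players))
    return {p: s / n_orders for p, s in shares.items()}

# Hypothetical example: a shared index costs 90 to build no matter how many
# of the three users benefit from it; by symmetry each pays an equal share.
index_cost = lambda S: 90 if S else 0
print(shapley_values(["u1", "u2", "u3"], index_cost))  # each pays 30.0
```

Symmetric users split the cost evenly; with asymmetric cost functions the same routine yields unequal but fair shares.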
Creation and web integration of a machine learning tool for estimating the price of second-hand devices
Due to the rapid evolution of hardware, individuals and organizations renew their devices frequently. Nowadays, most replaced devices are prematurely recycled. However, those devices, despite being obsolete, still hold value and can have a second owner, which saves more resources than recycling. Refurbishers receive hundreds or thousands of devices from organizations and need to decide, for each device, whether to recycle it or store it so it can be used by another organization. To make a wise choice, the value of the device and the cost of storing it should be taken into account. Moreover, since devices can be configured, the decision could also involve taking components from one device to upgrade or repair another, which makes the problem more complex.
This project aims at providing a solution to this problem by building a machine-learning tool for predicting the price of second-hand devices, so refurbishers can estimate the price a buyer would pay for a device. The project also develops a web application that integrates the tool and eases its use.
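The abstract does not specify the features or model the tool uses, but the core price-estimation idea can be sketched with a minimal closed-form linear regression on a hypothetical age-versus-price dataset:

```python
# Hypothetical toy dataset: resale price (EUR) as a function of device age.
ages = [0, 1, 2, 3, 4, 5]
prices = [500, 400, 330, 260, 210, 160]

# Closed-form simple linear regression (ordinary least squares).
n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(prices) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, prices)) \
        / sum((x - mean_x) ** 2 for x in ages)
intercept = mean_y - slope * mean_x

def predict_price(age):
    """Estimate the market price of a device of the given age (years)."""
    return intercept + slope * age

print(round(predict_price(2), 1))  # 343.4
```

A production tool would use many more features (brand, component configuration, condition) and a richer model, but the prediction interface stays the same.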
Optimizing Data-Intensive Computing with Efficient Configuration Tuning
As the complexity of distributed analytics systems evolves over time, more configuration parameters get exposed for tuning. While these numerous parameters allow users more control over how their workloads are executed, this flexibility comes at a cost, since finding the right configurations for such systems in a cost-effective way becomes challenging. In practice, several factors contribute to the complexity of tuning the configuration of those systems: the large configuration space, the diversity of the served workloads (each workload possibly requiring a different resource allocation strategy to run optimally), and the dynamic
characteristics of these systems’ environment (e.g., increase in input data size, changes in the allocation of resources). Paradoxically, existing solutions for workload tuning either assume a static tuning environment or workloads that are inexpensive to run (i.e., requiring hundreds of execution samples). Recently, Bayesian Optimisation (BO) strategies have been applied as a solution to enable efficient autotuning. They incrementally build a probabilistic model that predicts the impact of the parameters on performance using a small number of execution samples. The incrementally constructed BO model is used to guide the tuning process and accelerate convergence to a near-optimal configuration. Unfortunately, for distributed analytics systems, the configuration space is too large to construct a good model using traditional BO, which fails to provide quick convergence in a high-dimensional configuration space.
I argue that cost-effective tuning strategies can only be developed when taking into account: the frequent changes that can happen in the analytics workload/environment, the amortization of tuning costs and how this influences tuning profitability, the high dimensionality of the configuration space, and the need to cater for diverse workloads. To tackle these challenges, I propose Tuneful, an efficient configuration tuning framework for such expensive-to-tune systems. It works efficiently both initially (when little data is available) and later (as more tuning knowledge is acquired). It starts by incrementally learning workload-specific influential parameters and tunes only those; then, when more tuning knowledge becomes available, it detects similarity across workloads and utilizes multitask BO to share tuning knowledge across similar workloads. I show how augmenting the BO approach with parameter significance and workload similarity enables efficient configuration tuning in a high-dimensional configuration space. Over diverse analytics workloads, this significantly accelerates both configuration tuning and cost amortization, saving search time by 2.7-3.7X at the median compared to state-of-the-art approaches.
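This is not Tuneful's actual algorithm, but the significance-based dimensionality reduction it builds on can be sketched: perturb one parameter at a time, score each by the runtime variance it induces, and restrict tuning to the top-ranked parameters. The parameter names and the runtime model below are hypothetical.

```python
import random

def influential_params(run_workload, space, samples=20, top_k=1):
    """Crude one-at-a-time sensitivity analysis: vary one parameter while
    holding the rest at a baseline, and score it by runtime variance."""
    random.seed(0)  # deterministic for illustration
    base = {p: vals[0] for p, vals in space.items()}
    scores = {}
    for p, vals in space.items():
        times = []
        for _ in range(samples):
            cfg = dict(base)
            cfg[p] = random.choice(vals)
            times.append(run_workload(cfg))
        mean = sum(times) / len(times)
        scores[p] = sum((t - mean) ** 2 for t in times) / len(times)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical runtime model where executor memory dominates performance.
def run_workload(cfg):
    return abs(cfg["executor_memory_gb"] - 8) * 10 + cfg["shuffle_partitions"] * 0.01

space = {"executor_memory_gb": [2, 4, 8, 16],
         "shuffle_partitions": [100, 200, 400, 800]}
print(influential_params(run_workload, space))  # ['executor_memory_gb']
```

Cutting the search space to a few influential dimensions is what lets a BO surrogate converge with few execution samples; Tuneful additionally does this incrementally and shares knowledge across similar workloads.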
D3.2 Cost Concept Model and Gateway Specification
This document introduces a Framework supporting the implementation of a cost concept model against which current and future cost models for curating digital assets can be benchmarked. The value built into this cost concept model leverages the comprehensive engagement by the 4C project with various user communities and builds upon our understanding of the requirements, drivers, obstacles and objectives that various stakeholder groups have relating to digital curation. Ultimately, this concept model should provide a critical input to the development and refinement of cost models as well as helping to ensure that the curation and preservation solutions and services that will inevitably arise from the commercial sector as ‘supply’ respond to a much better understood ‘demand’ for cost-effective and relevant tools. To meet acknowledged gaps in current provision, a nested model of curation which addresses both costs and benefits is provided. The goal of this task was not to create a single, functionally implementable cost modelling application; but rather to design a model based on common concepts and to develop a generic gateway specification that can be used by future model developers, service and solution providers, and by researchers in follow-up research and development projects.
The Framework includes:
• A Cost Concept Model—which defines the core concepts that should be included in curation costs models;
• An Implementation Guide—for the cost concept model that provides guidance and proposes questions that should be considered when developing new cost models and refining existing cost models;
• A Gateway Specification Template—which provides standard metadata for each of the core cost concepts and is intended for use by future model developers, model users, and service and solution providers to promote interoperability;
• A Nested Model for Digital Curation—that visualises the core concepts, demonstrates how they interact and places them into context visually by linking them to A Cost and Benefit Model for Curation.
This Framework provides guidance for data collection and associated calculations in an operational context but will also provide a critical foundation for more strategic thinking around curation such as the Economic Sustainability Reference Model (ESRM).
Where appropriate, definitions of terms are provided, recommendations are made, and examples from existing models are used to illustrate the principles of the framework.
The Predicted-Deletion Dynamic Model: Taking Advantage of ML Predictions, for Free
The main bottleneck in designing efficient dynamic algorithms is the unknown
nature of the update sequence. In particular, there are some problems, like
3-vertex connectivity, planar digraph all pairs shortest paths, and others,
where the separation in runtime between the best partially dynamic solutions
and the best fully dynamic solutions is polynomial, sometimes even exponential.
In this paper, we formulate the predicted-deletion dynamic model, motivated
by a recent line of empirical work about predicting edge updates in dynamic
graphs. In this model, edges are inserted and deleted online, and when an edge
is inserted, it is accompanied by a "prediction" of its deletion time. This
models real-world settings where services may have access to historical data or
other information about an input and can subsequently use such information to
make predictions about user behavior. The model is also of theoretical interest, as
it interpolates between the partially dynamic and fully dynamic settings, and
provides a natural extension of the algorithms with predictions paradigm to the
dynamic setting.
We give a novel framework for this model that "lifts" partially dynamic
algorithms into the fully dynamic setting with little overhead. We use our
framework to obtain improved efficiency bounds over the state-of-the-art
dynamic algorithms for a variety of problems. In particular, we design
algorithms that have amortized update time that scales with a partially dynamic
algorithm, with high probability, when the predictions are of high quality. On
the flip side, our algorithms do no worse than existing fully-dynamic
algorithms when the predictions are of low quality. Furthermore, our algorithms
exhibit a graceful trade-off between the two cases. Thus, we are able to take
advantage of ML predictions asymptotically "for free."
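One way to see the interpolation: if every prediction were exact, each edge's full lifetime would be known at insertion, which is essentially the classic offline setting. The sketch below is an illustrative toy under that perfect-prediction assumption, not the paper's framework: it applies the standard offline dynamic connectivity technique (a segment tree over time plus a union-find with rollback) to edges whose deletion times are given up front.

```python
class RollbackDSU:
    """Union-find with union by size and an undo stack (no path compression,
    so unions can be rolled back in LIFO order)."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.history = []

    def find(self, x):
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            self.history.append(None)
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.history.append(rb)

    def rollback(self):
        rb = self.history.pop()
        if rb is not None:
            ra = self.parent[rb]
            self.parent[rb] = rb
            self.size[ra] -= self.size[rb]

def offline_connectivity(n, T, edges, queries):
    """edges: (u, v, t_insert, t_delete), half-open lifetime [t_ins, t_del).
    queries: {time_step: (u, v)}. Returns {time_step: connected?}."""
    seg = [[] for _ in range(4 * T)]

    def attach(node, lo, hi, l, r, e):      # hang edge e on O(log T) nodes
        if r <= lo or hi <= l:
            return
        if l <= lo and hi <= r:
            seg[node].append(e)
            return
        mid = (lo + hi) // 2
        attach(2 * node, lo, mid, l, r, e)
        attach(2 * node + 1, mid, hi, l, r, e)

    for u, v, t0, t1 in edges:
        attach(1, 0, T, t0, t1, (u, v))

    dsu, answers = RollbackDSU(n), {}

    def dfs(node, lo, hi):                  # DFS over time, undo on exit
        for u, v in seg[node]:
            dsu.union(u, v)
        if hi - lo == 1:
            if lo in queries:
                u, v = queries[lo]
                answers[lo] = dsu.find(u) == dsu.find(v)
        else:
            mid = (lo + hi) // 2
            dfs(2 * node, lo, mid)
            dfs(2 * node + 1, mid, hi)
        for _ in seg[node]:
            dsu.rollback()

    dfs(1, 0, T)
    return answers

# Edge 0-1 alive on [0,3), 1-2 on [1,2), 2-3 on [0,4).
edges = [(0, 1, 0, 3), (1, 2, 1, 2), (2, 3, 0, 4)]
print(offline_connectivity(4, 4, edges, {1: (0, 2), 2: (0, 2)}))
# {1: True, 2: False}
```

The paper's contribution is precisely to retain this kind of advantage when predictions are only approximately correct, while degrading gracefully to fully dynamic performance when they are not.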
An Essay on How Data Science Can Strengthen Business
Data science combines several disciplines, including statistics, scientific methods, artificial intelligence (AI), and data analysis, to extract value from raw data. Analytical applications and data scientists can then verify and interpret the results to discover patterns and trends. In this way, they allow business leaders to gain informed knowledge about the market. Companies have accumulated a wealth of data. As modern technology allowed the creation and storage of ever-increasing amounts of information, data volumes exploded. The wealth of data collected and stored by these technologies can bring regenerative benefits to organizations and societies around the world, but only if they can interpret it. That is where data science comes in. Applied economics refers to the application of economic theory and analysis. In this article we present several software packages that are available for economic analysis. Analysis can be performed on any type of data and is a way of looking at raw data to find useful information. Several technologies are available for economic analysis, with more or fewer features; some are not intended for this single purpose and cover a wider spectrum of functionality. Some of the technologies we will use are, e.g., RStudio, SPSS, Statis, and SAS/Stata. These are very common technologies when talking about economic or business analysis. The intention is to demonstrate how each of these software packages analyses the data and, subsequently, the interpretations that we can draw from that scrutiny. Organizations are using data science teams to turn data into a competitive advantage by refining products and services and delivering cost-effective solutions.
We will use different algorithms to verify how they are processed by the different technologies; namely, we will use metrics such as maximum, minimum, covariance, standard deviation, mean, multicollinearity, and variance, as well as several types of regression models.
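To illustrate the kinds of descriptive statistics named above independently of any particular package, Python's standard library already covers several of them; the dataset below is made up for the example:

```python
import statistics

# hypothetical monthly sales figures and advertising spend
sales = [120, 135, 128, 150, 142, 160]
ads = [10, 12, 11, 15, 14, 17]

print("min:", min(sales), "max:", max(sales))      # min: 120 max: 160
print("mean:", round(statistics.mean(sales), 2))   # mean: 139.17
print("stdev:", round(statistics.stdev(sales), 2))

# sample covariance between the two series
mx, my = statistics.mean(sales), statistics.mean(ads)
cov = sum((x - mx) * (y - my) for x, y in zip(sales, ads)) / (len(sales) - 1)
print("cov:", round(cov, 2))   # positive: sales and ad spend move together
```

Each of the tools named in the article exposes the same quantities through its own interface; the underlying calculations are identical.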
Fine Tuning Transformer Models for Domain Specific Feature Extraction
The nature of Natural Language Processing has drastically changed in the past years. The implementation of Large Language Models pre-trained on thousands of unlabelled data points has opened the door to a new layer of comprehension in text processing.
This has shifted research in the area toward exploiting these large models to obtain better results on smaller tasks. In this way, fine-tuning for Natural Language Processing is becoming increasingly important. By fine-tuning the different large language models with context- and task-specific data, these models quickly learn to track patterns and generalize to new concepts. They understand natural language to a great extent and can generate relationships among words, phrases, and paragraphs. Fine-tuning has become an increasingly important task for simplifying the use of machine learning solutions with low resources. The increase in pre-trained transformer models for Natural Language Processing has complicated the selection of and experimentation with these models, increasing research and experimentation time. This study goes through the current state of the art of transformer models and attempts to study the scope and applicability of these models. From this initial work, the paper produces a comprehensive model fine-tuning pipeline that allows the user to easily obtain a ready-to-use model for a natural language task. To test this approach, the pipeline is evaluated on the automatic extraction of features (i.e., functionalities) from mobile applications using available natural language documents, such as descriptions.
Caching-based Multicast Message Authentication in Time-critical Industrial Control Systems
Attacks against industrial control systems (ICSs) often exploit the
insufficiency of authentication mechanisms. Verifying whether the received
messages are intact and issued by legitimate sources can prevent malicious
data/command injection by illegitimate or compromised devices. However, the key
challenge is to introduce message authentication for various ICS communication
models, including multicast or broadcast, with a messaging rate that can be as
high as thousands of messages per second, within very stringent latency
constraints. For example, certain commands for protection in smart grids must
be delivered within 2 milliseconds, ruling out public-key cryptography. This
paper proposes two lightweight message authentication schemes, named CMA and
its multicast variant CMMA, that perform precomputation and caching to
authenticate future messages. With minimal precomputation and communication
overhead, C(M)MA eliminates all cryptographic operations for the source after
the message is given, and all expensive cryptographic operations for the
destinations after the message is received. C(M)MA considers the urgency
profile (or likelihood) of a set of future messages for even faster
verification of the most time-critical (or likely) messages. We demonstrate the
feasibility of C(M)MA in an ICS setting based on a substation automation system
in smart grids.
Comment: For viewing INFOCOM proceedings in IEEE Xplore, see
https://ieeexplore.ieee.org/abstract/document/979676
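The precompute-and-cache idea can be sketched as follows. This is an illustrative toy, not the actual CMA/CMMA construction: for a known candidate set of urgent commands, both sides precompute HMAC tags offline, so the latency-critical online path reduces to table lookups and a constant-time comparison.

```python
import hmac, hashlib, secrets

def precompute_tags(key, candidates):
    # Offline phase: one HMAC-SHA256 tag per candidate message, cached.
    return {m: hmac.new(key, m, hashlib.sha256).digest() for m in candidates}

key = secrets.token_bytes(32)   # shared key; real key management is richer
candidates = [b"OPEN_BREAKER_3", b"CLOSE_BREAKER_3", b"TRIP_LINE_7"]

src_cache = precompute_tags(key, candidates)   # source's offline work
dst_cache = precompute_tags(key, candidates)   # destination's offline work

# Online phase (latency-critical): no hashing, only lookup and comparison.
msg = b"TRIP_LINE_7"
tag = src_cache[msg]                                    # "sign" = cache hit
accepted = hmac.compare_digest(dst_cache.get(msg, b"?"), tag)
print(accepted)  # True
```

The paper's urgency-profile idea corresponds to ordering this precomputation so the most time-critical or most likely messages are cached (and thus verifiable) first.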
Graph neural networks for seizure discrimination based on electroencephalogram analysis
This study presents a research investigation on the classification of Psychogenic Non-Epileptic Seizures (PNES) and Epileptic Seizures (ES) using EEG data and Graph Neural Networks (GNN).
The proposed model demonstrates outstanding performance, surpassing previous state-of-the-art results and achieving remarkable accuracy in ternary classification. By utilizing a GNN architecture, the model effectively distinguishes between PNES and ES with an accuracy of 92.9%. Moreover, when employing Leave One Group Out cross-validation, the model achieves an even higher accuracy of 97.58%, outperforming the highest reported state-of-the-art accuracy of 94.4%. Furthermore, by extending the classification to include healthy patients, the model achieves an accuracy of 91.12%, surpassing the best-known state-of-the-art accuracy of 85.7%. These findings highlight the potential of the model in accurately classifying and differentiating these medical conditions using EEG data. Future work includes the exploration of biomarkers for binary classification using the model's explainability capabilities, contributing to the development of objective diagnostic tools and personalized treatment strategies. Additionally, this study compares the performance, methodologies, and datasets of similar state-of-the-art studies, providing a comprehensive overview of seizure classification research. In conclusion, this study demonstrates the success of the proposed model in classifying PNES and ES, paving the way for further advancements in the field and benefiting patients and healthcare practitioners in diagnosis and treatment.
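The abstract does not detail the GNN architecture, but the core operation of any such model can be sketched: each EEG channel is a node that averages its neighbors' feature vectors and applies a learned linear map. This dependency-free toy (graph, features, and weights are all made up) shows one such message-passing layer:

```python
def gcn_layer(adj, feats, weight):
    """One mean-aggregation graph convolution: for each node, average the
    features of its neighbors (plus itself), then multiply by `weight`."""
    n, d = len(feats), len(feats[0])
    out = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j]] + [i]      # self-loop
        agg = [sum(feats[j][k] for j in nbrs) / len(nbrs) for k in range(d)]
        out.append([sum(agg[k] * weight[k][c] for k in range(d))
                    for c in range(len(weight[0]))])
    return out

# Toy 3-channel graph: a path 0 -- 1 -- 2, with 2-dimensional node features.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity map keeps aggregation visible
print(gcn_layer(adj, feats, weight)[0])  # [0.5, 0.5]
```

Stacking several such layers and pooling the node features into a graph-level classifier is the standard recipe this family of models follows.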