779 research outputs found
How to Price Shared Optimizations in the Cloud
Data-management-as-a-service systems are increasingly being used in
collaborative settings, where multiple users access common datasets. Cloud
providers have the choice to implement various optimizations, such as indexing
or materialized views, to accelerate queries over these datasets. Each
optimization carries a cost and may benefit multiple users. This creates a
major challenge: how to select which optimizations to perform and how to share
their cost among users. The problem is especially challenging when users are
selfish and will only report their true values for different optimizations if
doing so maximizes their utility. In this paper, we present a new approach for
selecting and pricing shared optimizations by using Mechanism Design. We first
show how to apply the Shapley Value Mechanism to the simple case of selecting
and pricing additive optimizations, assuming an offline game where all users
access the service for the same time-period. Second, we extend the approach to
online scenarios where users come and go. Finally, we consider the case of
substitutive optimizations. We show analytically that our mechanisms induce
truthfulness and recover the optimization costs. We also show experimentally
that our mechanisms yield higher utility than the state-of-the-art approach
based on regret accumulation.
Comment: VLDB201
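The Shapley value charges each user their marginal contribution to the cost, averaged over all arrival orders. As a hedged illustration of that pricing idea (a brute-force toy, not the paper's actual online mechanism), consider sharing the cost of a single hypothetical index among three users:

```python
import math
from itertools import permutations

def shapley_values(players, cost):
    """Exact Shapley value of a cost game by enumerating all arrival orders.
    Exponential in the number of players -- fine for a handful of users."""
    shares = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = set()
        for p in order:
            before = cost(frozenset(coalition))
            coalition.add(p)
            shares[p] += cost(frozenset(coalition)) - before  # marginal cost
    n_orders = math.factorial(len(players))
    return {p: s / n_orders for p, s in shares.items()}

# Hypothetical example: a shared index costs 90 to build no matter how many
# of the three users benefit from it; by symmetry each pays an equal share.
index_cost = lambda S: 90 if S else 0
print(shapley_values(["u1", "u2", "u3"], index_cost))  # each pays 30.0
```

Symmetric users split the cost evenly; with asymmetric cost functions the same routine yields unequal but fair shares.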
Creation and web integration of a machine learning tool for estimating the price of second-hand devices
Due to the rapid evolution of hardware, individuals and organizations renew their devices frequently. Nowadays, most replaced devices are prematurely recycled. However, those devices, despite being obsolete, still hold value and can have a second owner, which saves more resources than recycling. Refurbishers receive hundreds or thousands of devices from organizations and need to decide, for each device, whether to recycle it or store it so it can be used by another organization. To make a wise choice, the value of the device and the cost of storing it should be taken into account. Moreover, since devices can be configured, the decision could also involve taking components from one device to upgrade or repair another, which makes the problem more complex.
This project aims at providing a solution to this problem by building a machine-learning tool for predicting the price of second-hand devices, so refurbishers can estimate the price a buyer would pay for a device. The project also develops a web application that integrates the tool and eases its use.
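The abstract does not specify the features or model the tool uses, but the core price-estimation idea can be sketched with a minimal closed-form linear regression on a hypothetical age-versus-price dataset:

```python
# Hypothetical toy dataset: resale price (EUR) as a function of device age.
ages = [0, 1, 2, 3, 4, 5]
prices = [500, 400, 330, 260, 210, 160]

# Closed-form simple linear regression (ordinary least squares).
n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(prices) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, prices)) \
        / sum((x - mean_x) ** 2 for x in ages)
intercept = mean_y - slope * mean_x

def predict_price(age):
    """Estimate the market price of a device of the given age (years)."""
    return intercept + slope * age

print(round(predict_price(2), 1))  # 343.4
```

A production tool would use many more features (brand, component configuration, condition) and a richer model, but the prediction interface stays the same.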
Optimizing Data-Intensive Computing with Efficient Configuration Tuning
As the complexity of distributed analytics systems evolves over time, more configuration parameters get exposed for tuning. While these numerous parameters allow users more control over how their workloads are executed, this flexibility comes at a cost, since finding the right configurations for such systems in a cost-effective way becomes challenging. In practice, several factors contribute to the complexity of tuning the configuration of those systems: the large configuration space, the diversity of the served workloads (each workload possibly requiring a different resource allocation strategy to run optimally), and the dynamic
characteristics of these systems’ environment (e.g., increase in input data size, changes in the allocation of resources). Paradoxically, existing solutions for workload tuning either assume a static tuning environment or workloads that are inexpensive to run (i.e., requiring hundreds of execution samples). Recently, Bayesian Optimisation (BO) strategies have been applied as a solution to enable efficient autotuning. They incrementally build a probabilistic model that predicts the impact of the parameters on performance using a small number of execution samples. The incrementally constructed BO model is used to guide the tuning process and accelerate convergence to a near-optimal configuration. Unfortunately, for distributed analytics systems, the configuration space is too large to construct a good model using traditional BO, which fails to provide quick convergence in a high-dimensional configuration space.
I argue that cost-effective tuning strategies can only be developed when taking into account: the frequent changes that can happen in the analytics workload/environment, the amortization of tuning costs and how this influences tuning profitability, the high dimensionality of the configuration space, and the need to cater for diverse workloads. To tackle these challenges, I propose Tuneful, an efficient configuration tuning framework for such expensive-to-tune systems. It works efficiently both initially (when little data is available) and later (as more tuning knowledge is acquired). It starts by incrementally learning workload-specific influential parameters and tunes only those; then, when more tuning knowledge becomes available, it detects similarity across workloads and utilizes multitask BO to share tuning knowledge across similar workloads. I show how augmenting the BO approach with parameter significance and workload similarity enables efficient configuration tuning in a high-dimensional configuration space. Over diverse analytics workloads, this significantly accelerates both configuration tuning and cost amortization, saving search time by 2.7-3.7X at the median compared to state-of-the-art approaches.
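This is not Tuneful's actual algorithm, but the significance-based dimensionality reduction it builds on can be sketched: perturb one parameter at a time, score each by the runtime variance it induces, and restrict tuning to the top-ranked parameters. The parameter names and the runtime model below are hypothetical.

```python
import random

def influential_params(run_workload, space, samples=20, top_k=1):
    """Crude one-at-a-time sensitivity analysis: vary one parameter while
    holding the rest at a baseline, and score it by runtime variance."""
    random.seed(0)  # deterministic for illustration
    base = {p: vals[0] for p, vals in space.items()}
    scores = {}
    for p, vals in space.items():
        times = []
        for _ in range(samples):
            cfg = dict(base)
            cfg[p] = random.choice(vals)
            times.append(run_workload(cfg))
        mean = sum(times) / len(times)
        scores[p] = sum((t - mean) ** 2 for t in times) / len(times)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical runtime model where executor memory dominates performance.
def run_workload(cfg):
    return abs(cfg["executor_memory_gb"] - 8) * 10 + cfg["shuffle_partitions"] * 0.01

space = {"executor_memory_gb": [2, 4, 8, 16],
         "shuffle_partitions": [100, 200, 400, 800]}
print(influential_params(run_workload, space))  # ['executor_memory_gb']
```

Cutting the search space to a few influential dimensions is what lets a BO surrogate converge with few execution samples; Tuneful additionally does this incrementally and shares knowledge across similar workloads.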
D3.2 Cost Concept Model and Gateway Specification
This document introduces a Framework supporting the implementation of a cost concept model against which current and future cost models for curating digital assets can be benchmarked. The value built into this cost concept model leverages the comprehensive engagement by the 4C project with various user communities and builds upon our understanding of the requirements, drivers, obstacles and objectives that various stakeholder groups have relating to digital curation. Ultimately, this concept model should provide a critical input to the development and refinement of cost models as well as helping to ensure that the curation and preservation solutions and services that will inevitably arise from the commercial sector as ‘supply’ respond to a much better understood ‘demand’ for cost-effective and relevant tools. To meet acknowledged gaps in current provision, a nested model of curation which addresses both costs and benefits is provided. The goal of this task was not to create a single, functionally implementable cost modelling application; but rather to design a model based on common concepts and to develop a generic gateway specification that can be used by future model developers, service and solution providers, and by researchers in follow-up research and development projects.
The Framework includes:
• A Cost Concept Model—which defines the core concepts that should be included in curation costs models;
• An Implementation Guide—for the cost concept model that provides guidance and proposes questions that should be considered when developing new cost models and refining existing cost models;
• A Gateway Specification Template—which provides standard metadata for each of the core cost concepts and is intended for use by future model developers, model users, and service and solution providers to promote interoperability;
• A Nested Model for Digital Curation—that visualises the core concepts, demonstrates how they interact and places them into context visually by linking them to A Cost and Benefit Model for Curation.
This Framework provides guidance for data collection and associated calculations in an operational context but will also provide a critical foundation for more strategic thinking around curation such as the Economic Sustainability Reference Model (ESRM).
Where appropriate, definitions of terms are provided, recommendations are made, and examples from existing models are used to illustrate the principles of the framework.
The Predicted-Deletion Dynamic Model: Taking Advantage of ML Predictions, for Free
The main bottleneck in designing efficient dynamic algorithms is the unknown
nature of the update sequence. In particular, there are some problems, like
3-vertex connectivity, planar digraph all pairs shortest paths, and others,
where the separation in runtime between the best partially dynamic solutions
and the best fully dynamic solutions is polynomial, sometimes even exponential.
In this paper, we formulate the predicted-deletion dynamic model, motivated
by a recent line of empirical work about predicting edge updates in dynamic
graphs. In this model, edges are inserted and deleted online, and when an edge
is inserted, it is accompanied by a "prediction" of its deletion time. This
models real-world settings where services may have access to historical data or
other information about an input and can subsequently use such information to
make predictions about user behavior. The model is also of theoretical interest, as
it interpolates between the partially dynamic and fully dynamic settings, and
provides a natural extension of the algorithms with predictions paradigm to the
dynamic setting.
We give a novel framework for this model that "lifts" partially dynamic
algorithms into the fully dynamic setting with little overhead. We use our
framework to obtain improved efficiency bounds over the state-of-the-art
dynamic algorithms for a variety of problems. In particular, we design
algorithms that have amortized update time that scales with a partially dynamic
algorithm, with high probability, when the predictions are of high quality. On
the flip side, our algorithms do no worse than existing fully-dynamic
algorithms when the predictions are of low quality. Furthermore, our algorithms
exhibit a graceful trade-off between the two cases. Thus, we are able to take
advantage of ML predictions asymptotically "for free."
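One way to see the interpolation: if every prediction were exact, each edge's full lifetime would be known at insertion, which is essentially the classic offline setting. The sketch below is an illustrative toy under that perfect-prediction assumption, not the paper's framework: it applies the standard offline dynamic connectivity technique (a segment tree over time plus a union-find with rollback) to edges whose deletion times are given up front.

```python
class RollbackDSU:
    """Union-find with union by size and an undo stack (no path compression,
    so unions can be rolled back in LIFO order)."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.history = []

    def find(self, x):
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            self.history.append(None)
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.history.append(rb)

    def rollback(self):
        rb = self.history.pop()
        if rb is not None:
            ra = self.parent[rb]
            self.parent[rb] = rb
            self.size[ra] -= self.size[rb]

def offline_connectivity(n, T, edges, queries):
    """edges: (u, v, t_insert, t_delete), half-open lifetime [t_ins, t_del).
    queries: {time_step: (u, v)}. Returns {time_step: connected?}."""
    seg = [[] for _ in range(4 * T)]

    def attach(node, lo, hi, l, r, e):      # hang edge e on O(log T) nodes
        if r <= lo or hi <= l:
            return
        if l <= lo and hi <= r:
            seg[node].append(e)
            return
        mid = (lo + hi) // 2
        attach(2 * node, lo, mid, l, r, e)
        attach(2 * node + 1, mid, hi, l, r, e)

    for u, v, t0, t1 in edges:
        attach(1, 0, T, t0, t1, (u, v))

    dsu, answers = RollbackDSU(n), {}

    def dfs(node, lo, hi):                  # DFS over time, undo on exit
        for u, v in seg[node]:
            dsu.union(u, v)
        if hi - lo == 1:
            if lo in queries:
                u, v = queries[lo]
                answers[lo] = dsu.find(u) == dsu.find(v)
        else:
            mid = (lo + hi) // 2
            dfs(2 * node, lo, mid)
            dfs(2 * node + 1, mid, hi)
        for _ in seg[node]:
            dsu.rollback()

    dfs(1, 0, T)
    return answers

# Edge 0-1 alive on [0,3), 1-2 on [1,2), 2-3 on [0,4).
edges = [(0, 1, 0, 3), (1, 2, 1, 2), (2, 3, 0, 4)]
print(offline_connectivity(4, 4, edges, {1: (0, 2), 2: (0, 2)}))
# {1: True, 2: False}
```

The paper's contribution is precisely to retain this kind of advantage when predictions are only approximately correct, while degrading gracefully to fully dynamic performance when they are not.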
An Essay on How Data Science Can Strengthen Business
Data science combines several disciplines, including statistics, scientific methods, artificial intelligence (AI), and data analysis, to extract value from raw data. Analytical applications and data scientists can then verify and interpret the results to discover patterns and trends. In this way, they allow business leaders to gain informed knowledge about the market. Companies have accumulated a wealth of data. As modern technology allowed the creation and storage of ever-increasing amounts of information, data volumes exploded. The wealth of data collected and stored by these technologies can bring regenerative benefits to organizations and societies around the world, but only if they can interpret it. That is where data science comes in. Applied economics refers to the application of economic theory and analysis. In this article we present several software packages that are available for economic analysis. Analysis can be performed on any type of data and is a way of looking at raw data to find useful information. Several technologies are available for economic analysis, with more or fewer features; some are not intended for this single purpose and cover a wider spectrum of functionality. Some of the technologies we will use are, e.g., RStudio, SPSS, Statis, and SAS/Stata. These are very common technologies when talking about economic or business analysis. The intention is to demonstrate how each of these software packages analyses the data and, subsequently, the interpretations that we can draw from that scrutiny. Organizations are using data science teams to turn data into a competitive advantage by refining products and services and delivering cost-effective solutions.
We will use different algorithms to verify how they are processed by the different technologies; namely, we will use metrics such as maximum, minimum, covariance, standard deviation, mean, multicollinearity, and variance, as well as several types of regression models.
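To illustrate the kinds of descriptive statistics named above independently of any particular package, Python's standard library already covers several of them; the dataset below is made up for the example:

```python
import statistics

# hypothetical monthly sales figures and advertising spend
sales = [120, 135, 128, 150, 142, 160]
ads = [10, 12, 11, 15, 14, 17]

print("min:", min(sales), "max:", max(sales))      # min: 120 max: 160
print("mean:", round(statistics.mean(sales), 2))   # mean: 139.17
print("stdev:", round(statistics.stdev(sales), 2))

# sample covariance between the two series
mx, my = statistics.mean(sales), statistics.mean(ads)
cov = sum((x - mx) * (y - my) for x, y in zip(sales, ads)) / (len(sales) - 1)
print("cov:", round(cov, 2))   # positive: sales and ad spend move together
```

Each of the tools named in the article exposes the same quantities through its own interface; the underlying calculations are identical.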
Fine Tuning Transformer Models for Domain Specific Feature Extraction
The nature of Natural Language Processing has drastically changed in the past years. The implementation of Large Language Models pre-trained on thousands of unlabelled data points has opened the door to a new layer of comprehension in text processing.
This has shifted research in the area toward exploiting these large models to obtain better results on smaller tasks. In this way, fine-tuning for Natural Language Processing is becoming increasingly important. By fine-tuning the different large language models with context- and task-specific data, these models quickly learn to track patterns and generalize to new concepts. They understand natural language to a great extent and can generate relationships among words, phrases, and paragraphs. Fine-tuning has become an increasingly important task for simplifying the use of machine learning solutions with low resources. The increase in pre-trained transformer models for Natural Language Processing has complicated the selection of and experimentation with these models, increasing research and experimentation time. This study goes through the current state of the art of transformer models and attempts to study the scope and applicability of these models. From this initial work, the paper produces a comprehensive model fine-tuning pipeline that allows the user to easily obtain a ready-to-use model for a natural language task. To test this approach, the pipeline is evaluated on the automatic extraction of features (i.e., functionalities) from mobile applications using available natural language documents, such as descriptions.
Caching-based Multicast Message Authentication in Time-critical Industrial Control Systems
Attacks against industrial control systems (ICSs) often exploit the
insufficiency of authentication mechanisms. Verifying whether the received
messages are intact and issued by legitimate sources can prevent malicious
data/command injection by illegitimate or compromised devices. However, the key
challenge is to introduce message authentication for various ICS communication
models, including multicast or broadcast, with a messaging rate that can be as
high as thousands of messages per second, within very stringent latency
constraints. For example, certain commands for protection in smart grids must
be delivered within 2 milliseconds, ruling out public-key cryptography. This
paper proposes two lightweight message authentication schemes, named CMA and
its multicast variant CMMA, that perform precomputation and caching to
authenticate future messages. With minimal precomputation and communication
overhead, C(M)MA eliminates all cryptographic operations for the source after
the message is given, and all expensive cryptographic operations for the
destinations after the message is received. C(M)MA considers the urgency
profile (or likelihood) of a set of future messages for even faster
verification of the most time-critical (or likely) messages. We demonstrate the
feasibility of C(M)MA in an ICS setting based on a substation automation system
in smart grids.
Comment: For viewing INFOCOM proceedings in IEEE Xplore, see
https://ieeexplore.ieee.org/abstract/document/979676
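The precompute-and-cache idea can be sketched as follows. This is an illustrative toy, not the actual CMA/CMMA construction: for a known candidate set of urgent commands, both sides precompute HMAC tags offline, so the latency-critical online path reduces to table lookups and a constant-time comparison.

```python
import hmac, hashlib, secrets

def precompute_tags(key, candidates):
    # Offline phase: one HMAC-SHA256 tag per candidate message, cached.
    return {m: hmac.new(key, m, hashlib.sha256).digest() for m in candidates}

key = secrets.token_bytes(32)   # shared key; real key management is richer
candidates = [b"OPEN_BREAKER_3", b"CLOSE_BREAKER_3", b"TRIP_LINE_7"]

src_cache = precompute_tags(key, candidates)   # source's offline work
dst_cache = precompute_tags(key, candidates)   # destination's offline work

# Online phase (latency-critical): no hashing, only lookup and comparison.
msg = b"TRIP_LINE_7"
tag = src_cache[msg]                                    # "sign" = cache hit
accepted = hmac.compare_digest(dst_cache.get(msg, b"?"), tag)
print(accepted)  # True
```

The paper's urgency-profile idea corresponds to ordering this precomputation so the most time-critical or most likely messages are cached (and thus verifiable) first.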
Graph neural networks for seizure discrimination based on electroencephalogram analysis
This study presents a research investigation on the classification of Psychogenic Non-Epileptic Seizures (PNES) and Epileptic Seizures (ES) using EEG data and Graph Neural Networks (GNN).
The proposed model demonstrates outstanding performance, surpassing previous state-of-the-art results and achieving remarkable accuracy in ternary classification. By utilizing a GNN architecture, the model effectively distinguishes between PNES and ES with an accuracy of 92.9%. Moreover, when employing Leave One Group Out cross-validation, the model achieves an even higher accuracy of 97.58%, outperforming the highest reported state-of-the-art accuracy of 94.4%. Furthermore, by extending the classification to include healthy patients, the model achieves an accuracy of 91.12%, surpassing the best-known state-of-the-art accuracy of 85.7%. These findings highlight the potential of the model in accurately classifying and differentiating these medical conditions using EEG data. Future work includes the exploration of biomarkers for binary classification using the model's explainability capabilities, contributing to the development of objective diagnostic tools and personalized treatment strategies. Additionally, this study compares the performance, methodologies, and datasets of similar state-of-the-art studies, providing a comprehensive overview of seizure classification research. In conclusion, this study demonstrates the success of the proposed model in classifying PNES and ES, paving the way for further advancements in the field and benefiting patients and healthcare practitioners in diagnosis and treatment.
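The abstract does not detail the GNN architecture, but the core operation of any such model can be sketched: each EEG channel is a node that averages its neighbors' feature vectors and applies a learned linear map. This dependency-free toy (graph, features, and weights are all made up) shows one such message-passing layer:

```python
def gcn_layer(adj, feats, weight):
    """One mean-aggregation graph convolution: for each node, average the
    features of its neighbors (plus itself), then multiply by `weight`."""
    n, d = len(feats), len(feats[0])
    out = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j]] + [i]      # self-loop
        agg = [sum(feats[j][k] for j in nbrs) / len(nbrs) for k in range(d)]
        out.append([sum(agg[k] * weight[k][c] for k in range(d))
                    for c in range(len(weight[0]))])
    return out

# Toy 3-channel graph: a path 0 -- 1 -- 2, with 2-dimensional node features.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity map keeps aggregation visible
print(gcn_layer(adj, feats, weight)[0])  # [0.5, 0.5]
```

Stacking several such layers and pooling the node features into a graph-level classifier is the standard recipe this family of models follows.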