    AutoML: state of the art with a focus on anomaly detection, challenges, and research directions

    The last decade has witnessed an explosion of machine learning research, with many algorithms proposed and successfully adopted in different application domains. However, the performance of many machine learning algorithms is very sensitive to several ingredients (e.g., hyper-parameter tuning and data cleaning), for which significant human effort is required to achieve good results. Thus, building well-performing machine learning algorithms requires domain knowledge and highly specialized data scientists. Automated machine learning (autoML) aims to make machine learning algorithms easier to use and more accessible for researchers with varying levels of expertise. Research effort to date has mainly been devoted to autoML for supervised learning, and only a few proposals address unsupervised learning. In this paper, we present an overview of the autoML field with a particular emphasis on the automated methods and strategies that have been proposed for unsupervised anomaly detection.

    UMAP: Urban Mobility Analysis Platform to Harvest Car Sharing Data

    Car sharing is now a popular means of transport in smart cities. In particular, the free-floating paradigm lets users look for available cars, book one, and then start and stop the rental at will within the city area. This is done through a smartphone app, which in turn contacts a web-based backend to exchange information. In this paper we present UMAP, a platform that harvests data freely available on the web to extract driving habits in cities. We design UMAP to fetch data from car sharing platforms in real time and process it to extract more advanced information about driving patterns and users' habits, while augmenting the data with mapping and direction information fetched from other web platforms. This information is stored in a data lake where historical series are built, and is later analyzed using analytics modules that are easy to design and customize. We demonstrate the flexibility of UMAP by presenting a case study for the city of Turin. We collect car sharing usage data over 50 days and characterize both the temporal and spatial properties of rentals, as well as users' habits in using the service, which we contrast with public transportation alternatives. The results provide insights into driving styles and needs that are useful for smart city planners, and demonstrate the feasibility of our approach.

    AutoAD: an Automated Framework for Unsupervised Anomaly Detection

    Over the last decade, we have witnessed the proliferation of machine learning algorithms capable of solving different tasks for the most diverse applications. Often, for an algorithm to be effective, significant human effort is required, in particular for hyper-parameter tuning and data cleaning. Recently, there have been increasing efforts to alleviate this burden and make machine learning algorithms easier to use for researchers with varying levels of expertise. Nevertheless, the question of whether an efficient and fully generalizable automated machine learning (autoML) framework is possible remains unanswered. In this paper, we present autoAD, the first autoML framework for unsupervised anomaly detection. By leveraging a pool of different anomaly detection algorithms, each with its own hyper-parameter search space, our framework automatically selects the best-performing approach while determining an optimal configuration of its hyper-parameters on a given dataset. Our extensive experimental evaluation, conducted on a rich collection of datasets, shows the substantial gains that can be achieved with autoAD compared to state-of-the-art methods for unsupervised anomaly detection.

    Mesure d'Internet à large échelle, longitudinale et sans biais (Large-scale, longitudinal, and unbiased Internet measurement)

    Today, a world without the Internet is unimaginable. By interconnecting billions of people worldwide and offering an uncountable number of services, it is now fully embedded in modern society. Yet, despite the evolution and development of technology, its pervasiveness and heterogeneity still raise new challenges, such as security concerns, monitoring of users' Quality of Experience (QoE), and concerns for transparency and fairness. Accordingly, the goal of this thesis is to shed new light on some of the challenges that have emerged in recent years. In particular, we provide an in-depth analysis of some of the most prominent aspects of the modern Internet. Particular emphasis is given to the World Wide Web, which is undoubtedly one of the most popular Internet applications, and specifically to its interaction with machine learning. The first part of this work studies the Quality of Experience of users browsing the Web, with measurements carried out both in the wild and in controlled environments. Our contributions continue with an original analysis of both subjective user feedback and objective QoE metrics, showing how hard it is to build accurate supervised data-driven models capable of predicting user satisfaction, along with an in-depth discussion of the multi-modal nature of subjective user opinions. In the second part of this work, we analyze and discuss the fairness of state-of-the-art transformer-based language models, which are pre-trained on Web-based corpora and are typically used to solve a wide variety of Natural Language Processing (NLP) tasks. Here, we question whether the sheer size and heterogeneity of the Web guarantee diversity in the models. The core of our contributions lies in measuring the bias embedded in the models, which we discuss from different angles. Finally, the last part of this dissertation addresses the classification of machine-generated objects using some of the simplest state-of-the-art supervised machine learning algorithms. Through a minimally intrusive, robust, and lightweight framework, we show that the different behaviors of one field of the IP packet, the IP identification (IP-ID), can be easily classified with a few features that have high discriminative power. We finally apply our technique to an Internet-wide census and provide an updated view of the adoption of the different IP-ID implementations in the Internet.

    A closer look at IP-ID behavior in the Wild

    Originally used to assist network-layer fragmentation and reassembly, the IP identification field (IP-ID) has been used and abused for a range of tasks, from counting hosts behind NAT, to detecting router aliases and, lately, to assisting the detection of censorship in the Internet at large. These inferences have been possible because, in the past, the IP-ID was mostly implemented as a simple packet counter; however, this behavior has been discouraged for security reasons, and other policies, such as random values, have been suggested. In this study, we propose a framework to classify the different IP-ID behaviors using active probing from a single host. Despite being only minimally intrusive, our technique is highly accurate (99% true positive classification), robust against packet losses (up to 20%), and lightweight (a few packets suffice to discriminate all IP-ID behaviors). We then apply our technique to an Internet-wide census, where we actively probe one alive target for each routable /24 subnet: we find that the majority of hosts adopt a constant IP-ID (39%) or a local counter (34%), that the fraction of global counters (18%) has significantly diminished, that a non-marginal number of hosts exhibit odd behavior (7%), and that random IP-IDs are still an exception (2%).

    Implications of the Multi-Modality of User Perceived Page Load Time

    Web browsing is one of the most popular applications for both desktop and mobile users. A lot of effort has been devoted to speeding up the Web, as well as to designing metrics that can accurately tell whether a webpage loaded fast or not. An often implicit assumption made by industrial and academic research communities is that a single metric is sufficient to assess whether a webpage loaded fast. In this paper we collect and make publicly available a unique dataset that contains webpage features (e.g., number and type of embedded objects) along with both objective and subjective Web quality metrics. This dataset was collected by crawling over 100 websites, representative of the top 1 M websites on the Web, while crowdsourcing 6,000 user opinions on user perceived page load time (uPLT). We show that the uPLT distribution is often multi-modal and that, in practice, no more than three modes are present. The main conclusion drawn from our analysis is that, for complex webpages, each of the different objective QoE metrics proposed in the literature (such as AFT, TTI, PLT, etc.) is suited to approximate one of the different uPLT modes.

    Eltrombopag versus placebo for low-risk myelodysplastic syndromes with thrombocytopenia (EQoL-MDS): phase 1 results of a single-blind, randomised, controlled, phase 2 superiority trial

    Background In myelodysplastic syndromes, thrombocytopenia is associated with mortality, but treatments in this setting are scarce. We tested whether eltrombopag, a thrombopoietin receptor agonist, might be effective in improving thrombocytopenia in lower-risk myelodysplastic syndromes and severe thrombocytopenia. Methods EQoL-MDS was a single-blind, randomised, controlled, phase 2 superiority trial of adult patients with low-risk or International Prognostic Scoring System intermediate-1-risk myelodysplastic syndromes and severe thrombocytopenia. Patients with a stable platelet count lower than 30 × 10⁹ platelets per L, aged at least 18 years, with refractoriness, ineligibility to receive treatment with alternative medications, or relapse while receiving treatment with alternative medications were included in this trial. Patients were randomly assigned (2:1) to receive eltrombopag (50 mg to 300 mg) or placebo for at least 24 weeks and until disease progression and were masked to treatment allocation. Here, we report the results in the intention-to-treat population of the first phase of the trial, for which the primary endpoints were the proportion of patients achieving a platelet response within 24 weeks and safety. The interim analysis presented here was protocol-specified and used a two-sided significance level of 0·001 and a p value at or below this limit for both primary endpoints to indicate the need for early trial termination. Duration of platelet transfusion independence, duration of response, overall survival, leukaemia-free survival, and pharmacokinetics will be reported at the end of the phase 2 portion of the trial. This trial is registered with EudraCT, number 2010-022890-33. Findings Between June 13, 2011, and June 17, 2016, we enrolled 90 participants for the first phase of the trial. The median follow-up time to assess platelet responses was 11 weeks (IQR 4–24). Platelet responses occurred in 28 (47%) of 59 patients in the eltrombopag group versus one (3%) of 31 patients in the placebo group (odds ratio 27·1 [95% CI 3·5–211·9], p=0·0017). During the follow-up, 21 patients had at least one severe bleeding event (WHO bleeding score ≥2). There were more bleeders in the placebo arm (13 [42%] of 31 patients) than in the eltrombopag arm (eight [14%] of 59 patients; p=0·0025). 52 grade 3–4 adverse events occurred in 27 (46%) of 59 patients in the eltrombopag group versus nine events in five (16%) of 31 patients in the placebo group (χ²=7·8, p=0·0053, stopping rule not reached). The outcome acute myeloid leukaemia evolution or disease progression occurred in seven (12%) of 59 patients in the eltrombopag group versus five (16%) of 31 patients in the placebo group (χ²=0·06, p=0·81). Interpretation Eltrombopag is well-tolerated in patients with lower-risk myelodysplastic syndromes and severe thrombocytopenia and is clinically effective in raising platelet counts and reducing bleeding events. The assessment of long-term safety and efficacy of eltrombopag and its effect on survival (phase 2 part of study) is still ongoing. Funding Associazione QOL-ONE.
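
The odds ratio quoted in the abstract above can be checked from the response counts it reports (28 of 59 versus 1 of 31). The following is a minimal sketch, not the authors' analysis: the function name `odds_ratio_wald_ci` is hypothetical, and the Wald (Woolf) interval on the log odds ratio is an assumption, since the abstract does not say which interval method was used. Under that assumption the sketch reproduces the reported figures.

```python
import math
from statistics import NormalDist


def odds_ratio_wald_ci(a, b, c, d, level=0.95):
    """Odds ratio and Wald CI for a 2x2 table (assumed method, see lead-in).

    a: responders, treatment      b: non-responders, treatment
    c: responders, control        d: non-responders, control
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # Two-sided normal quantile, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts from the abstract: 28 of 59 responders (eltrombopag), 1 of 31 (placebo).
or_, lo, hi = odds_ratio_wald_ci(28, 59 - 28, 1, 31 - 1)
print(f"OR = {or_:.1f}, 95% CI {lo:.1f} to {hi:.1f}")
# prints: OR = 27.1, 95% CI 3.5 to 211.9
```

The very wide interval follows directly from the single responder in the placebo group: the term 1/c dominates the standard error of the log odds ratio.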

    Eltrombopag for Low-Risk Myelodysplastic Syndromes With Thrombocytopenia: Interim Results of a Phase-II, Randomized, Placebo-Controlled Clinical Trial (EQOL-MDS)

    Purpose: In myelodysplastic syndromes (MDS), severe thrombocytopenia is associated with poor prognosis. This multicenter trial presents the second-part long-term efficacy and safety results of eltrombopag in patients with low-risk MDS and severe thrombocytopenia. Methods: In this single-blind, randomized, placebo-controlled, phase-II trial of adult patients with International Prognostic Scoring System low- or intermediate-1-risk MDS, patients with a stable platelet (PLT) count (<30 × 10³/mm³) received eltrombopag or placebo until disease progression. Primary end points were duration of PLT response (PLT-R; calculated from the time of PLT-R to date of loss of PLT-R, defined as bleeding/PLT count <30 × 10³/mm³ or last date in observation) and long-term safety and tolerability. Secondary end points included incidence and severity of bleeding, PLT transfusions, quality of life, leukemia-free survival, progression-free survival, overall survival, and pharmacokinetics. Results: From 2011 to 2021, of 325 patients screened, 169 patients were randomly assigned oral eltrombopag (N = 112) or placebo (N = 57) at a starting dose of 50 mg once daily to a maximum of 300 mg. PLT-R, with 25-week follow-up (IQR, 14-68), occurred in 47/111 (42.3%) eltrombopag patients versus 6/54 (11.1%) placebo patients (odds ratio, 5.9; 95% CI, 2.3 to 14.9; P < .001). In eltrombopag patients, 12/47 (25.5%) lost the PLT-R, with cumulative thrombocytopenia relapse-free survival at 60 months of 63.6% (95% CI, 46.0 to 81.2). Clinically significant bleeding (WHO bleeding score ≥ 2) occurred less frequently in the eltrombopag arm than in the placebo group (incidence rate ratio, 0.54; 95% CI, 0.38 to 0.75; P = .0002). Although no difference in the frequency of grade 1-2 adverse events (AEs) was observed, a higher proportion of eltrombopag patients experienced grade 3-4 AEs (χ² = 9.5, P = .002). AML evolution and/or disease progression occurred in 17% (for both) of eltrombopag and placebo patients with no difference in survival times. Conclusion: Eltrombopag was effective and relatively safe in low-risk MDS with severe thrombocytopenia. This trial is registered with ClinicalTrials.gov identifier: NCT02912208 and EU Clinical Trials Register: EudraCT No. 2010-022890-33.

    AVALON: The Italian cohort study on real‐life efficacy of hypomethylating agents plus venetoclax in newly diagnosed or relapsed/refractory patients with acute myeloid leukemia

    Background: Venetoclax in combination with hypomethylating agents (HMA) is revolutionizing the therapy of acute myeloid leukemia (AML). However, evidence from large sets of patients is lacking, especially in relapsed or refractory leukemia. Methods: AVALON is a multicentric cohort study that was conducted in Italy on patients with AML who received venetoclax-based therapies from 2015 to 2020. The study was approved by the ethics committee of the participating institution and was conducted in accordance with the Declaration of Helsinki. The effectiveness and toxicity of venetoclax + HMA in 190 (43 newly diagnosed, 68 refractory, and 79 relapsed) patients with AML are reported here. Results: In newly diagnosed AML, the overall response rate and survival confirmed the results demonstrated in VIALE-A. In relapsed or refractory AML, the combination demonstrated a surprising complete remission rate (44.1% in refractory and 39.7% in relapsed evaluable patients) and gave treated patients a good expectation of survival. Toxicities were overall manageable, and most incidents occurred in the first 60 days of therapy. Infections were confirmed as the most common nonhematologic adverse event. Conclusions: Real-life data show that the combination of venetoclax and HMA offers an expectation of remission and long-term survival to elderly, newly diagnosed patients and to patients with relapsed or chemoresistant AML, increasing the chance of cure through a different mechanism of action. The venetoclax + HMA combination is expected to constitute the base for triplet combinations and integration of target therapies. Our data help improve the understanding of venetoclax + HMA effectiveness and toxicities in real life.