261 research outputs found

    Parameterizing Random Test Data According to Equivalence Classes

    We are concerned with the problem of detecting bugs in machine learning applications. In the absence of sufficient real-world data, creating suitably large data sets for testing can be a difficult task. Random testing is one solution, but may have limited effectiveness in cases in which a reliable test oracle does not exist, as is the case of the machine learning applications of interest. To address this problem, we have developed an approach to creating data sets called "parameterized random data generation". Our data generation framework allows us to isolate or combine different equivalence classes as desired, and then randomly generate large data sets using the properties of those equivalence classes as parameters. This allows us to take advantage of randomness but still have control over test case selection at the system testing level. We present our findings from using the approach to test two different machine learning ranking applications.
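The idea of using equivalence-class properties as generation parameters can be illustrated with a minimal Python sketch. It is not the authors' framework: the `DataSpec` fields (`allow_repeats`, `missing_rate`) and the `generate` function are hypothetical names chosen for illustration, and the equivalence classes shown are assumptions rather than the ones used in the paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSpec:
    """Hypothetical parameters; each one selects an equivalence class."""
    n_rows: int
    n_attrs: int
    allow_repeats: bool   # may values repeat within an attribute column?
    missing_rate: float   # probability that a cell is left empty (None)


def generate(spec, seed=0):
    """Randomly generate a data set that stays inside the chosen classes."""
    rng = random.Random(seed)  # seeded so a failing test can be replayed
    columns = []
    for _ in range(spec.n_attrs):
        if spec.allow_repeats:
            # Small value range: repeats are likely (certain if n_rows > 10).
            col = [rng.randint(0, 9) for _ in range(spec.n_rows)]
        else:
            # Sampling without replacement: every value is distinct.
            col = rng.sample(range(spec.n_rows * 10), spec.n_rows)
        columns.append(col)
    rows = [list(r) for r in zip(*columns)]
    for row in rows:
        for j in range(spec.n_attrs):
            if rng.random() < spec.missing_rate:
                row[j] = None
    return rows
```

Isolating one class (for example, `allow_repeats=False` with `missing_rate=0.0`) or combining several is then a matter of changing the spec, while the individual values remain random.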

    An Approach to Software Testing of Machine Learning Applications

    Some machine learning applications are intended to learn properties of data sets where the correct answers are not already known to human users. It is challenging to test such ML software, because there is no reliable test oracle. We describe a software testing approach aimed at addressing this problem. We present our findings from testing implementations of two different ML ranking algorithms: Support Vector Machines and MartiRank

    Open Research Data and Innovative Scholarly Writing: OPERAS highlights

    Pre-print of the article to be published in OA on http://www.ressi.ch/. We present here highlights from an enquiry on the innovations in scholarly writing in the Humanities and Social Sciences in the H2020 project OPERAS-P. This article explores the theme of Open Research Data and its role in the emergence of new models of scholarly writing. We examine more closely the obstacles and fostering conditions to the publication of research data, both from a social and a technical perspective.

    tsGT: Stochastic Time Series Modeling With Transformer

    Time series methods are of fundamental importance in virtually any field of science that deals with temporally structured data. Recently, there has been a surge of deterministic transformer models with time series-specific architectural biases. In this paper, we go in a different direction by introducing tsGT, a stochastic time series model built on a general-purpose transformer architecture. We focus on using a well-known and theoretically justified rolling window backtesting and evaluation protocol. We show that tsGT outperforms the state-of-the-art models on MAD and RMSE, and surpasses its stochastic peers on QL and CRPS, on four commonly used datasets. We complement these results with a detailed analysis of tsGT's ability to model the data distribution and predict marginal quantile values
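The abstract does not spell out its evaluation protocol, so the following is only a generic Python sketch of rolling-window backtesting scored with RMSE, plus the pinball loss that quantile-based scores are typically built on. The function names, the `forecaster` callback signature, and the naive `last_value` baseline are assumptions for illustration, not part of tsGT.

```python
import math


def pinball_loss(y_true, y_pred, q):
    """Pinball (quantile) loss of one prediction at quantile level q."""
    diff = y_true - y_pred
    return max(q * diff, (q - 1) * diff)


def rolling_backtest(series, window, horizon, forecaster):
    """Slide a fixed-size window over the series and return the RMSE.

    forecaster(history, horizon) is a hypothetical callback returning a
    point forecast for the value `horizon` steps after the history ends.
    """
    n_windows = len(series) - window - horizon + 1
    if n_windows <= 0:
        raise ValueError("series too short for this window and horizon")
    sq_errors = []
    for start in range(n_windows):
        history = series[start:start + window]
        target = series[start + window + horizon - 1]
        pred = forecaster(history, horizon)
        sq_errors.append((target - pred) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def last_value(history, horizon):
    """Naive baseline: repeat the most recent observation."""
    return history[-1]
```

Because each forecast sees only the observations inside its window, the score reflects out-of-sample performance at every step rather than a single train/test split.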

    A Comparison Between the Populations of Late Presenters and Non-late Presenters

    Funding Information: We want to thank Frank Tang, Bin Lin, Mike Cohen and the rest of the Google speech team for their insightful discussions and inputs. Publisher Copyright: Copyright © 2022 Miranda, Pingarilho, Pimentel, Martins, Kaiser, Seguin-Devaux, Paredes, Zazzi, Incardona and Abecasis. Background: The increased use of antiretroviral therapy (ART) has decreased mortality and morbidity of HIV-1 infected people but increasing levels of HIV drug resistance threaten the success of ART regimens. Conversely, late presentation can impact treatment outcomes, health costs, and potential transmission of HIV. Objective: To describe the patterns of transmitted drug resistance (TDR) and acquired drug resistance (ADR) in HIV-1 infected patients followed in Europe, to compare its patterns in late presenters (LP) vs non-late presenters (NLP), and to analyze the most prevalent drug resistance mutations among HIV-1 subtypes. Methods: Our study included clinical, socio-demographic, and genotypic information from 26,973 HIV-1 infected patients from the EuResist Integrated Database (EIDB) between 1981 and 2019. Results: Among the 26,973 HIV-1 infected patients in the analysis, 11,581 (42.9%) were ART-naïve patients and 15,392 (57.1%) were ART-experienced. The median age was 37 (IQR: 27.0–45.0) years old and 72.6% were males. The main transmission route was through heterosexual contact (34.9%) and 81.7% of patients originated from Western Europe. 71.9% of patients were infected by subtype B and 54.8% of patients were classified as LP. The overall prevalence of TDR was 12.8% and presented an overall decreasing trend (p for trend < 0.001); the ADR prevalence was 68.5%, also with a decreasing trend (p for trend < 0.001). For LP and NLP, the TDR prevalence was 12.3 and 12.6%, respectively, while for ADR, 69.9 and 68.2%, respectively. The most prevalent TDR drug resistance mutations, in both LP and NLP, were K103N/S, T215rev, T215FY, M184I/V, M41I/L, M46I/L, and L90M.
Conclusion: Our study showed that the overall TDR (12.8%) and ADR (68.5%) presented decreasing trends during the study time period. For LP, the overall TDR was slightly lower than for NLP (12.3 vs 12.6%, respectively); while this pattern was opposite for ADR (LP slightly higher than NLP). We suggest that these differences, in the case of TDR, can be related to the dynamics of fixation of drug resistance mutations; and in the case of ADR with the more frequent therapeutic failure in LPs.

    CCS Acceptability: Social Site Characterization and Advancing Awareness at Prospective Storage Sites in Poland and Scotland

    This paper summarizes the work on the social dimension conducted within the EU FP7 SiteChar project. The most important aim of the research was to advance public awareness and draw lessons for successful public engagement activities when developing a CO2 storage permit application. To this end, social site characterization (e.g. representative surveys) and public participation activities (focus conference) were conducted at two prospective Carbon Capture and Storage (CCS) sites: an onshore site in Poland and an offshore site in Scotland. The research consisted of four steps over a time period of 1.5 years, from early 2011 to mid-2012. The first step consisted of four related qualitative and quantitative research activities to provide a social characterization of the areas: desk research, stakeholder interviews, media analyses, and a survey among representative samples of the local community. The aim was to identify (i) stakeholders or interested parties and (ii) factors that may drive their perceptions of and attitudes towards CCS. Results were used as input for the second step, in which a new format for public engagement named ‘focus conferences’ was tested at both sites involving a small sample of the local community. The third step consisted of making available generic as well as site-specific information to the general and local public, by (i) setting up a bilingual set of information pages on the project website suitable for a lay audience and (ii) organizing information meetings at both sites that were open to all who took interest. The fourth step consisted of a second survey among a new representative sample of the local community. The survey was largely identical to the survey in step 1 to enable the monitoring of changes in awareness, knowledge and opinions over time.
Results provide insight into the way local CCS plans may be perceived by the local stakeholders, how this can be reliably assessed at an early stage without raising unnecessary concerns, and how results of this inventory can be used to develop effective local communication and participation strategies. In future project development, if any, these results can be used to start up and inform the process of information provision and public engagement.

    SiteChar Characterisation of European CO2 Storage Deliverable N° D8.3 Public Outreach Activities

    This deliverable describes the task of making available generic and site-specific information about the SiteChar activities regarding the site explorations to the general public as well as to the local public at the Scottish site and at the Polish site. Full texts can be found in Appendices I (English) and II (Polish). Generic as well as site-specific information has been made available to the general and local public through specific sections on the SiteChar website. These activities are reported in chapter 2. Locally, information meetings have been held at the Polish site (chapter 3) and at the Scottish site (chapter 4). A conclusion is provided in chapter 5.

    Integron gene cassettes harboring novel variants of D-alanine-D-alanine ligase confer high-level resistance to D-cycloserine

    Antibiotic resistance poses an increasing threat to global health. To tackle this problem, the identification of principal reservoirs of antibiotic resistance genes (ARGs) plus an understanding of drivers for their evolutionary selection are important. During a PCR-based screen of ARGs associated with integrons in saliva-derived metagenomic DNA of healthy human volunteers, two novel variants of genes encoding a D-alanine-D-alanine ligase (ddl6 and ddl7) located within gene cassettes in the first position of a reverse integron were identified. Treponema denticola was identified as the likely host of the ddl cassettes. Both ddl6 and ddl7 conferred high-level resistance to D-cycloserine when expressed in Escherichia coli, with ddl7 conferring four-fold higher resistance to D-cycloserine compared to ddl6. A SNP was found to be responsible for this difference in resistance phenotype between the two ddl variants. Molecular dynamics simulations were used to explain the mechanism of this phenotypic change at the atomic scale. A hypothesis for the evolutionary selection of ddl-containing integron gene cassettes is proposed, based on molecular docking of plant metabolites within the ATP and D-cycloserine binding pockets of Ddl.

    Evaluating Face2Gene as a Tool to Identify Cornelia de Lange Syndrome by Facial Phenotypes

    The characteristic or classic phenotype of Cornelia de Lange syndrome (CdLS) is associated with a recognisable facial pattern. However, the heterogeneity in causal genes and the presence of overlapping syndromes have made it increasingly difficult to diagnose by clinical features alone. DeepGestalt technology, and its app Face2Gene, is having a growing impact on the diagnosis and management of genetic diseases by analysing the features of affected individuals. Here, we performed a phenotypic study on a cohort of 49 individuals harbouring causative variants in known CdLS genes in order to evaluate Face2Gene utility and sensitivity in the clinical diagnosis of CdLS. Based on the profile images of patients, a diagnosis of CdLS was within the top five predicted syndromes for 97.9% of our cases and even listed as first prediction for 83.7%. The age of patients did not seem to affect the prediction accuracy, whereas our results indicate a correlation between the clinical score and affected genes. Furthermore, each gene presents a different pattern recognition that may be used to develop new neural networks with the goal of separating different genetic subtypes in CdLS. Overall, we conclude that computer-assisted image analysis based on deep learning could support the clinical diagnosis of CdLS.