
    Evaluating and Explaining Large Language Models for Code Using Syntactic Structures

    Large Language Models (LLMs) for code are a family of high-parameter, transformer-based neural networks pre-trained on massive datasets of both natural and programming languages. These models are rapidly being employed in commercial AI-based developer tools, such as GitHub Copilot. However, measuring and explaining their effectiveness on programming tasks is a challenging proposition, given their size and complexity. The methods for evaluating and explaining LLMs for code are inextricably linked. That is, in order to explain a model's predictions, they must be reliably mapped to fine-grained, understandable concepts. Once this mapping is achieved, new methods for detailed model evaluations are possible. However, most current explainability techniques and evaluation benchmarks focus on model robustness or individual task performance, as opposed to interpreting model predictions. To this end, this paper introduces ASTxplainer, an explainability method specific to LLMs for code that enables both new methods for LLM evaluation and visualizations of LLM predictions that aid end-users in understanding model predictions. At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes, by extracting and aggregating normalized model logits within AST structures. To demonstrate the practical benefit of ASTxplainer, we illustrate the insights that our framework can provide by performing an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects. Additionally, we perform a user study examining the usefulness of an ASTxplainer-derived visualization of model predictions aimed at enabling model users to explain predictions. The results of these studies illustrate the potential for ASTxplainer to provide insights into LLM effectiveness, and aid end-users in understanding predictions.
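The alignment idea in this abstract (token predictions grouped under AST nodes, then aggregated) can be pictured with a minimal sketch. This is not the authors' implementation: the function name `aggregate_by_node`, the character-offset span format, and the choice of the mean as the aggregation are all assumptions made for illustration, and the token probabilities are taken to be already normalized.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_node(node_spans, token_probs):
    """Average token probabilities inside each AST node type (illustrative only).

    node_spans:  list of (node_type, start, end) character offsets, end exclusive.
    token_probs: list of (start, end, prob), where prob is the normalized
                 probability the model assigned to that token (hypothetical input).
    """
    buckets = defaultdict(list)
    for node_type, n_start, n_end in node_spans:
        for t_start, t_end, prob in token_probs:
            # A token belongs to a node when its span lies fully inside the node's span.
            if n_start <= t_start and t_end <= n_end:
                buckets[node_type].append(prob)
    # Assumption: the mean is used as the aggregation; other choices are possible.
    return {node_type: mean(probs) for node_type, probs in buckets.items()}


# Made-up spans and probabilities for a ten-character snippet.
example_nodes = [("if_statement", 0, 10), ("identifier", 3, 4)]
example_tokens = [(0, 2, 0.9), (3, 4, 0.5), (5, 10, 0.7)]
scores = aggregate_by_node(example_nodes, example_tokens)
```

A low aggregated value for a node type would then suggest that the model is less certain about tokens inside that syntactic structure.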

    Amplification curve analysis: Data-driven multiplexing using real-time digital PCR

    Information about the kinetics of PCR reactions is encoded in the amplification curve. However, in digital PCR (dPCR), this information is typically neglected by collapsing each amplification curve into a binary output (positive/negative). Here, we demonstrate that the large volume of raw data obtained from real-time dPCR instruments can be exploited to perform data-driven multiplexing in a single fluorescent channel using machine learning methods, by virtue of the information in the amplification curve. This new approach, referred to as amplification curve analysis (ACA), was shown using an intercalating dye (EvaGreen), reducing the cost and complexity of the assay and enabling the use of melting curve analysis for validation. As a case study, we multiplexed three carbapenem-resistance genes to show the impact of this approach on global challenges such as antimicrobial resistance. In the presence of single targets, we report a classification accuracy of 99.1% (N = 16188), which represents a 19.7% increase compared to multiplexing based on the final fluorescent intensity. Considering all combinations of amplification events (including co-amplifications), the accuracy was shown to be 92.9% (N = 10383). To support the analysis, we derived a formula to estimate the occurrence of co-amplification in dPCR based on multivariate Poisson statistics, and suggest reducing the digital occupancy in the case of multiple targets in the same digital panel. The ACA approach takes a step towards maximizing the capabilities of existing real-time dPCR instruments and chemistries, by extracting more information from data to enable data-driven multiplexing with high accuracy. Furthermore, we expect that combining this method with existing probe-based assays will increase multiplexing capabilities significantly. We envision that once emerging point-of-care technologies can reliably capture real-time data from isothermal chemistries, the ACA method will facilitate the implementation of dPCR outside of the lab.
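The abstract does not reproduce the co-amplification formula, so the following is only a sketch of how such an estimate could look under one simplifying assumption: each target loads into partitions independently, following a Poisson distribution with its own mean occupancy. The function name `coamplification_probability` and the example occupancies are hypothetical and may differ from the paper's derivation.

```python
import math


def coamplification_probability(lambdas):
    """Probability that one partition holds two or more distinct targets.

    lambdas: mean copies per partition for each target (hypothetical values).
    Assumes independent Poisson loading per target, which may not match the
    paper's formula.
    """
    # P(target i present in a partition) = 1 - exp(-lambda_i)
    p_present = [1.0 - math.exp(-lam) for lam in lambdas]
    # No target present at all.
    p_none = math.prod(1.0 - p for p in p_present)
    # Exactly one target present, all others absent.
    p_exactly_one = sum(
        p * math.prod(1.0 - q for j, q in enumerate(p_present) if j != i)
        for i, p in enumerate(p_present)
    )
    return 1.0 - p_none - p_exactly_one


# Lower occupancy gives fewer co-amplification events.
high = coamplification_probability([0.5, 0.5, 0.5])
low = coamplification_probability([0.1, 0.1, 0.1])
```

Under these assumptions, `low` is smaller than `high`, which is consistent with the abstract's suggestion to reduce digital occupancy when several targets share a panel.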

    Quantitative and rapid Plasmodium falciparum malaria diagnosis and artemisinin-resistance detection using a CMOS Lab-on-Chip platform

    Early and accurate diagnosis of malaria and drug resistance is essential to effective disease management. Available rapid malaria diagnostic tests present limitations in analytical sensitivity, drug-resistance testing and/or quantification. Conversely, diagnostic methods based on nucleic acid amplification have advanced owing to their high sensitivity, specificity and robustness. Nevertheless, these methods commonly rely on optical measurements and complex instrumentation, which limit their applicability in resource-poor, point-of-care settings. This paper reports the specific, quantitative and fully-electronic detection of Plasmodium falciparum, the predominant malaria-causing parasite worldwide, using a Lab-on-Chip platform developed in-house. Furthermore, we demonstrate on-chip detection of C580Y, the most prevalent single-nucleotide polymorphism associated with artemisinin-resistant malaria. Real-time non-optical DNA sensing is facilitated using Ion-Sensitive Field-Effect Transistors, fabricated in unmodified complementary metal-oxide-semiconductor (CMOS) technology, coupled with loop-mediated isothermal amplification. This work holds significant potential for the development of a fully portable and quantitative malaria diagnostic that can be used as a rapid point-of-care test.

    Rapid detection of mobilized colistin resistance using a nucleic acid based lab-on-a-chip diagnostic system

    The increasing prevalence of antimicrobial resistance is a serious threat to global public health. One of the most concerning trends is the rapid spread of Carbapenemase-Producing Organisms (CPO), for which colistin has become the last-resort antibiotic treatment. The emergence of colistin resistance, including the spread of mobilized colistin resistance (mcr) genes, raises the possibility of untreatable bacterial infections and motivates the development of improved diagnostics for the detection of colistin-resistant organisms. This work demonstrates a rapid response for detecting the most recently reported mcr gene, mcr-9, using a portable and affordable lab-on-a-chip (LoC) platform, offering a promising alternative to conventional laboratory-based instruments such as real-time PCR (qPCR). The platform combines semiconductor technology, for non-optical real-time DNA sensing, with a smartphone application for data acquisition, visualization and cloud connectivity. This technology is enabled by using loop-mediated isothermal amplification (LAMP) as the chemistry for targeted DNA detection, by virtue of its high sensitivity, specificity, yield, and manageable temperature requirements. Here, we have developed the first LAMP assay for mcr-9, showing high sensitivity (down to 100 genomic copies/reaction) and high specificity (no cross-reactivity with other mcr variants). This assay is demonstrated by supporting a hospital investigation in which we analyzed nucleic acids extracted from 128 carbapenemase-producing bacteria isolated from clinical and screening samples and found that 41 carried mcr-9 (validated using whole genome sequencing). Average positive detection times were 6.58 ± 0.42 min when performing the experiments on a conventional qPCR instrument (n = 41). For validating the translation of the LAMP assay onto a LoC platform, a subset of the samples was tested (n = 20), showing average detection times of 6.83 ± 0.92 min for positive isolates (n = 14). All experiments detected mcr-9 in under 10 min, and both platforms showed no statistically significant difference (p-value > 0.05). When sample preparation and throughput capabilities are integrated within this LoC platform, the adoption of this technology for the rapid detection and surveillance of antimicrobial resistance genes will decrease the turnaround time for DNA detection and resistotyping, improving diagnostic capabilities, patient outcomes, and the management of infectious diseases.

    A novel hotspot specific isothermal amplification method for detection of the common PIK3CA p.H1047R breast cancer mutation

    Breast cancer (BC) is a common cancer in women worldwide. Despite advances in treatment, up to 30% of women eventually relapse and die of metastatic breast cancer. Liquid biopsy analysis of circulating cell-free DNA fragments in the patients’ blood can monitor clonality and evolving mutations as a surrogate for tumour biopsy. Next generation sequencing platforms and digital droplet PCR can be used to profile circulating tumour DNA from liquid biopsies; however, they are expensive and time-consuming for clinical use. Here, we report a novel strategy, with proof-of-concept data, that supports the use of loop-mediated isothermal amplification (LAMP) to detect PIK3CA c.3140A>G (H1047R), a prevalent BC missense mutation that is associated with BC tumour growth. Allele-specific primers were designed and optimized to detect the p.H1047R variant following the USS-sbLAMP method. The assay was developed with synthetic DNA templates and validated with DNA from two breast cancer cell lines and two patient tumour tissue samples on a qPCR instrument, and finally piloted on an ISFET-enabled microchip. This work sets a foundation for BC mutational profiling on a Lab-on-Chip device, to help the early detection of patient relapse and to monitor efficacy of systemic therapies for personalised cancer patient management.

    A missense mutation in TRAPPC6A leads to build-up of the protein, in patients with a neurodevelopmental syndrome and dysmorphic features.

    Childhood-onset clinical syndromes involving intellectual disability and dysmorphic features, such as polydactyly, suggest that common developmental pathways link seemingly unrelated phenotypes. We identified a consanguineous family of Saudi origin with varying complex features including intellectual disability, speech delay, facial dysmorphism and polydactyly. Combining microarray-based comparative genomic hybridisation (CGH), to identify regions of homozygosity, with exome sequencing led to the identification of homozygous mutations in five candidate genes (RSPH6A, ANKK1, AMOTL1, ALKBH8, TRAPPC6A), all of which appear to be pathogenic as predicted by PROVEAN, SIFT and PolyPhen2 and segregate perfectly with the disease phenotype. We therefore looked for differences in expression levels of each protein in HEK293 cells, expressing either the wild-type or mutant full-length cDNA construct. Unexpectedly, wild-type TRAPPC6A appeared to be unstable, but addition of the proteasome inhibitor MG132 stabilised its expression. Mutations have previously been reported in several members of the TRAPP complex of proteins, including TRAPPC2, TRAPPC9 and TRAPPC11, resulting in disorders involving skeletal abnormalities, intellectual disability, speech impairment and developmental delay. TRAPPC6A joins a growing list of proteins belonging to the TRAPP complex that are implicated in clinical syndromes with neurodevelopmental abnormalities.

    Deficiency in origin licensing proteins impairs cilia formation: implications for the aetiology of Meier-Gorlin syndrome

    Mutations in ORC1, ORC4, ORC6, CDT1, and CDC6, which encode proteins required for DNA replication origin licensing, cause Meier-Gorlin syndrome (MGS), a disorder conferring microcephaly, primordial dwarfism, underdeveloped ears, and skeletal abnormalities. Mutations in ATR, which also functions during replication, can cause Seckel syndrome, a clinically related disorder. These findings suggest that impaired DNA replication could underlie the developmental defects characteristic of these disorders. Here, we show that although origin licensing capacity is impaired in all patient cells with mutations in origin licensing component proteins, this does not correlate with the rate of progression through S phase. Thus, the replicative capacity in MGS patient cells does not correlate with clinical manifestation. However, ORC1-deficient cells from MGS patients and cells subjected to siRNA-mediated depletion of origin licensing proteins also have impaired centrosome and centriole copy number. As a novel and unexpected finding, we show that they also display a striking defect in the rate of formation of primary cilia. We demonstrate that this impacts sonic hedgehog signalling in ORC1-deficient primary fibroblasts. Additionally, reduced growth factor-dependent signalling via primary cilia affects the kinetics of cell cycle progression following cell cycle exit and re-entry, highlighting an unexpected mechanism whereby origin licensing components can influence cell cycle progression. Finally, using a cell-based model, we show that defects in cilia function impair chondroinduction. Our findings raise the possibility that a reduced efficiency in forming cilia could contribute to the clinical features of MGS, particularly the bone development abnormalities, and could provide a new dimension for considering developmental impacts of licensing deficiency.

    Handheld point-of-care system for rapid detection of SARS-CoV-2 extracted RNA in under 20 min

    The COVID-19 pandemic is a global health emergency characterized by a high rate of transmission and an ongoing increase of cases globally. Rapid point-of-care (PoC) diagnostics to detect the causative virus, SARS-CoV-2, are urgently needed to identify and isolate patients, contain its spread and guide clinical management. In this work, we report the development of a rapid PoC diagnostic test (<20 min) based on reverse transcriptase loop-mediated isothermal amplification (RT-LAMP) and semiconductor technology for the detection of SARS-CoV-2 from extracted RNA samples. The developed LAMP assay was tested on a real-time benchtop instrument (RT-qLAMP), showing a lower limit of detection of 10 RNA copies per reaction. It was validated against extracted RNA from 183 clinical samples including 127 positive samples (screened by the CDC RT-qPCR assay). Results showed 91% sensitivity and 100% specificity when compared to RT-qPCR and average positive detection times of 15.45 ± 4.43 min. For validating the incorporation of the RT-LAMP assay onto our PoC platform (RT-eLAMP), a subset of samples was tested (n = 52), showing average detection times of 12.68 ± 2.56 min for positive samples (n = 34), demonstrating a comparable performance to a benchtop commercial instrument. Paired with a smartphone for results visualization and geolocalization, this portable diagnostic platform with secure cloud connectivity will enable real-time case identification and epidemiological surveillance.

    Highlights from the Pierre Auger Observatory

    The Pierre Auger Observatory is the world's largest cosmic ray observatory. Our current exposure reaches nearly 40,000 km$^2$ sr and provides us with an unprecedented quality data set. The performance and stability of the detectors and their enhancements are described. Data analyses have led to a number of major breakthroughs. Among these we discuss the energy spectrum and the searches for large-scale anisotropies. We present analyses of our $X_{max}$ data and show how it can be interpreted in terms of mass composition. We also describe some new analyses that extract mass sensitive parameters from the 100% duty cycle SD data. A coherent interpretation of all these recent results opens new directions. The consequences regarding the cosmic ray composition and the properties of UHECR sources are briefly discussed. Comment: 9 pages, 12 figures, talk given at the 33rd International Cosmic Ray Conference, Rio de Janeiro 201

    Global, regional, and national comparative risk assessment of 79 behavioural, environmental and occupational, and metabolic risks or clusters of risks, 1990-2015: a systematic analysis for the Global Burden of Disease Study 2015

    Summary
    Background: The Global Burden of Diseases, Injuries, and Risk Factors Study 2015 provides an up-to-date synthesis of the evidence for risk factor exposure and the attributable burden of disease. By providing national and subnational assessments spanning the past 25 years, this study can inform debates on the importance of addressing risks in context.
    Methods: We used the comparative risk assessment framework developed for previous iterations of the Global Burden of Disease Study to estimate attributable deaths, disability-adjusted life-years (DALYs), and trends in exposure by age group, sex, year, and geography for 79 behavioural, environmental and occupational, and metabolic risks or clusters of risks from 1990 to 2015. This study included 388 risk-outcome pairs that met World Cancer Research Fund-defined criteria for convincing or probable evidence. We extracted relative risk and exposure estimates from randomised controlled trials, cohorts, pooled cohorts, household surveys, census data, satellite data, and other sources. We used statistical models to pool data, adjust for bias, and incorporate covariates. We developed a metric that allows comparisons of exposure across risk factors: the summary exposure value. Using the counterfactual scenario of theoretical minimum risk level, we estimated the portion of deaths and DALYs that could be attributed to a given risk. We decomposed trends in attributable burden into contributions from population growth, population age structure, risk exposure, and risk-deleted cause-specific DALY rates. We characterised risk exposure in relation to a Socio-demographic Index (SDI).
    Findings: Between 1990 and 2015, global exposure to unsafe sanitation, household air pollution, childhood underweight, childhood stunting, and smoking each decreased by more than 25%. Global exposure for several occupational risks, high body-mass index (BMI), and drug use increased by more than 25% over the same period. All risks jointly evaluated in 2015 accounted for 57·8% (95% CI 56·6–58·8) of global deaths and 41·2% (39·8–42·8) of DALYs. In 2015, the ten largest contributors to global DALYs among Level 3 risks were high systolic blood pressure (211·8 million [192·7 million to 231·1 million] global DALYs), smoking (148·6 million [134·2 million to 163·1 million]), high fasting plasma glucose (143·1 million [125·1 million to 163·5 million]), high BMI (120·1 million [83·8 million to 158·4 million]), childhood undernutrition (113·3 million [103·9 million to 123·4 million]), ambient particulate matter (103·1 million [90·8 million to 115·1 million]), high total cholesterol (88·7 million [74·6 million to 105·7 million]), household air pollution (85·6 million [66·7 million to 106·1 million]), alcohol use (85·0 million [77·2 million to 93·0 million]), and diets high in sodium (83·0 million [49·3 million to 127·5 million]). From 1990 to 2015, attributable DALYs declined for micronutrient deficiencies, childhood undernutrition, unsafe sanitation and water, and household air pollution; reductions in risk-deleted DALY rates rather than reductions in exposure drove these declines. Rising exposure contributed to notable increases in attributable DALYs from high BMI, high fasting plasma glucose, occupational carcinogens, and drug use. Environmental risks and childhood undernutrition declined steadily with SDI; low physical activity, high BMI, and high fasting plasma glucose increased with SDI. In 119 countries, metabolic risks, such as high BMI and fasting plasma glucose, contributed the most attributable DALYs in 2015. Regionally, smoking still ranked among the leading five risk factors for attributable DALYs in 109 countries; childhood underweight and unsafe sex remained primary drivers of early death and disability in much of sub-Saharan Africa.
    Interpretation: Declines in some key environmental risks have contributed to declines in critical infectious diseases. Some risks appear to be invariant to SDI. Increasing risks, including high BMI, high fasting plasma glucose, drug use, and some occupational exposures, contribute to rising burden from some conditions, but also provide opportunities for intervention. Some highly preventable risks, such as smoking, remain major causes of attributable DALYs, even as exposure is declining. Public policy makers need to pay attention to the risks that are increasingly major contributors to global burden.
    Funding: Bill & Melinda Gates Foundation.
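The attributable-burden step described in the Methods of this abstract (comparing observed exposure with a theoretical minimum risk scenario) is commonly expressed through a population attributable fraction. The sketch below uses the standard categorical form of that fraction; it is an illustration only, not the study's actual modelling pipeline, and the function name, exposure categories, and relative risks are all made up.

```python
def population_attributable_fraction(exposure, counterfactual, relative_risk):
    """Categorical population attributable fraction (illustrative only).

    exposure:       observed share of the population in each exposure category.
    counterfactual: share in each category under the theoretical minimum risk scenario.
    relative_risk:  relative risk of the outcome for each category.
    All inputs are hypothetical; each distribution is expected to sum to 1.
    """
    observed = sum(p * rr for p, rr in zip(exposure, relative_risk))
    ideal = sum(p * rr for p, rr in zip(counterfactual, relative_risk))
    # Share of the burden that would vanish if exposure moved to the counterfactual.
    return (observed - ideal) / observed


# Two categories: unexposed (RR 1.0) and exposed (RR 2.0); half the population exposed.
paf = population_attributable_fraction([0.5, 0.5], [1.0, 0.0], [1.0, 2.0])
attributable_dalys = paf * 3_000_000  # made-up total DALYs for the outcome
```

Multiplying the fraction by the total deaths or DALYs for an outcome gives the attributable burden for that risk-outcome pair, as in the last line of the sketch.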