Which Syntactic Capabilities Are Statistically Learned by Masked Language Models for Code?
This paper discusses the limitations of evaluating Masked Language Models
(MLMs) in code completion tasks. We highlight that relying on accuracy-based
measurements may lead to an overestimation of models' capabilities by
neglecting the syntax rules of programming languages. To address these issues,
we introduce a technique called SyntaxEval in which Syntactic Capabilities are
used to enhance the evaluation of MLMs. SyntaxEval automates the process of
masking elements in the model input based on their Abstract Syntax Trees
(ASTs). We conducted a case study on two popular MLMs using data from GitHub
repositories. Our results showed negative causal effects between the node types
and MLMs' accuracy. We conclude that MLMs under study fail to predict some
syntactic capabilities
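
The AST-driven masking described in this abstract can be illustrated with a minimal sketch. It assumes Python source and the standard-library `ast` module; the abstract does not say which parser or mask token SyntaxEval uses, so `MASK` and `mask_nodes` are hypothetical names and this is not the authors' implementation.

```python
import ast

MASK = "<mask>"  # hypothetical mask token; real MLMs define their own


def mask_nodes(source: str, node_type: type) -> tuple[str, list[str]]:
    """Replace the span of each outermost `node_type` node with MASK.

    Returns the masked source and the list of hidden code fragments.
    """
    # ast column offsets are UTF-8 byte offsets, so work on bytes.
    data = source.encode("utf-8")
    starts = [0]  # byte offset at which each line begins
    for line in data.splitlines(keepends=True):
        starts.append(starts[-1] + len(line))

    spans = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, node_type) and getattr(node, "end_col_offset", None) is not None:
            begin = starts[node.lineno - 1] + node.col_offset
            end = starts[node.end_lineno - 1] + node.end_col_offset
            spans.append((begin, end))

    # Keep only outermost spans so nested nodes are not masked twice.
    spans.sort(key=lambda s: (s[0], -s[1]))
    kept = []
    for begin, end in spans:
        if kept and begin < kept[-1][1]:
            continue
        kept.append((begin, end))

    hidden = [data[b:e].decode("utf-8") for b, e in kept]
    # Replace from the end so earlier offsets stay valid.
    for b, e in reversed(kept):
        data = data[:b] + MASK.encode("utf-8") + data[e:]
    return data.decode("utf-8"), hidden


if __name__ == "__main__":
    masked, hidden = mask_nodes("def add(a, b):\n    return a + b\n", ast.Return)
    print(masked)
    print(hidden)
```

Masking by node type rather than by random token position is what lets accuracy be reported per syntactic category; a model's prediction for the masked span would then be compared against the hidden fragment.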
Evaluating and Explaining Large Language Models for Code Using Syntactic Structures
Large Language Models (LLMs) for code are a family of high-parameter,
transformer-based neural networks pre-trained on massive datasets of both
natural and programming languages. These models are rapidly being employed in
commercial AI-based developer tools, such as GitHub Copilot. However, measuring
and explaining their effectiveness on programming tasks is a challenging
proposition, given their size and complexity. The methods for evaluating and
explaining LLMs for code are inextricably linked. That is, in order to explain
a model's predictions, they must be reliably mapped to fine-grained,
understandable concepts. Once this mapping is achieved, new methods for
detailed model evaluations are possible. However, most current explainability
techniques and evaluation benchmarks focus on model robustness or individual
task performance, as opposed to interpreting model predictions.
To this end, this paper introduces ASTxplainer, an explainability method
specific to LLMs for code that enables both new methods for LLM evaluation and
visualizations of LLM predictions that aid end-users in understanding model
predictions. At its core, ASTxplainer provides an automated method for aligning
token predictions with AST nodes, by extracting and aggregating normalized
model logits within AST structures. To demonstrate the practical benefit of
ASTxplainer, we illustrate the insights that our framework can provide by
performing an empirical evaluation on 12 popular LLMs for code using a curated
dataset of the most popular GitHub projects. Additionally, we perform a user
study examining the usefulness of an ASTxplainer-derived visualization of model
predictions aimed at enabling model users to explain predictions. The results
of these studies illustrate the potential for ASTxplainer to provide insights
into LLM effectiveness, and aid end-users in understanding predictions
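
The alignment of token predictions with AST nodes can be sketched as follows. This is an illustration under stated assumptions, not ASTxplainer itself: token and node spans are given as character offsets, softmax is used as the normalization, and the mean is used as the aggregation function. The names `softmax` and `aggregate_by_node`, the node labels, and the probabilities in the usage example are all hypothetical.

```python
import math
from statistics import mean


def softmax(logits: list[float]) -> list[float]:
    """Normalize raw logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_by_node(
    tokens: list[tuple[int, int, float]],
    nodes: dict[str, tuple[int, int]],
) -> dict[str, float]:
    """Average token probabilities over each AST node's span.

    tokens: (start, end, probability of the ground-truth token)
    nodes:  node label -> (start, end) character span
    A token contributes to a node when its span lies fully inside the node's.
    """
    scores = {}
    for label, (node_start, node_end) in nodes.items():
        inside = [
            prob
            for tok_start, tok_end, prob in tokens
            if node_start <= tok_start and tok_end <= node_end
        ]
        if inside:
            scores[label] = mean(inside)
    return scores


if __name__ == "__main__":
    # Toy values for the snippet "return a + b"; the probabilities are made up.
    tokens = [(0, 6, 0.9), (7, 8, 0.5), (9, 10, 0.7), (11, 12, 0.3)]
    nodes = {"return_statement": (0, 12), "binary_operator": (7, 12)}
    print(aggregate_by_node(tokens, nodes))
```

Because parent nodes contain their children's tokens, a score computed this way propagates up the tree, which gives a per-construct view of where a model is confident.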
Amplification curve analysis: Data-driven multiplexing using real-time digital PCR
Information about the kinetics of PCR reactions is encoded in the amplification curve. However, in digital PCR (dPCR), this information is typically neglected by collapsing each amplification curve into a binary output (positive/negative). Here, we demonstrate that the large volume of raw data obtained from real-time dPCR instruments can be exploited to perform data-driven multiplexing in a single fluorescent channel using machine learning methods, by virtue of the information in the amplification curve. This new approach, referred to as amplification curve analysis (ACA), was shown using an intercalating dye (EvaGreen), reducing the cost and complexity of the assay and enabling the use of melting curve analysis for validation. As a case study, we multiplexed three carbapenem-resistant genes to show the impact of this approach on global challenges such as antimicrobial resistance. In the presence of single targets, we report a classification accuracy of 99.1% (N = 16188), which represents a 19.7% increase compared to multiplexing based on the final fluorescent intensity. Considering all combinations of amplification events (including co-amplifications), the accuracy was shown to be 92.9% (N = 10383). To support the analysis, we derived a formula to estimate the occurrence of co-amplification in dPCR based on multivariate Poisson statistics, and suggest reducing the digital occupancy in the case of multiple targets in the same digital panel. The ACA approach takes a step towards maximizing the capabilities of existing real-time dPCR instruments and chemistries, by extracting more information from data to enable data-driven multiplexing with high accuracy. Furthermore, we expect that combining this method with existing probe-based assays will increase multiplexing capabilities significantly. 
We envision that once emerging point-of-care technologies can reliably capture real-time data from isothermal chemistries, the ACA method will facilitate the implementation of dPCR outside of the lab
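
The effect of occupancy on co-amplification can be illustrated with a short calculation. This is a sketch under a simplifying assumption (each target loads into partitions independently with a Poisson-distributed copy number); it is not the formula derived in the paper, and `coamplification_stats` and its argument are hypothetical names.

```python
import math


def coamplification_stats(mean_copies: list[float]) -> dict[str, float]:
    """Partition-level probabilities for independently Poisson-loaded targets.

    mean_copies: average copies per partition (lambda) for each target.
    """
    # Probability that a partition holds zero copies of target i.
    absent = [math.exp(-lam) for lam in mean_copies]

    p_negative = math.prod(absent)  # no target at all
    p_positive = 1.0 - p_negative   # at least one target

    # Exactly one target type present: target i present, all others absent.
    p_single = 0.0
    for i, absent_i in enumerate(absent):
        others = math.prod(a for j, a in enumerate(absent) if j != i)
        p_single += (1.0 - absent_i) * others

    p_coamp = p_positive - p_single  # two or more target types together
    fraction = p_coamp / p_positive if p_positive > 0 else 0.0
    return {
        "positive": p_positive,
        "single": p_single,
        "coamplified": p_coamp,
        "coamplified_fraction_of_positives": fraction,
    }


if __name__ == "__main__":
    for lam in (1.0, 0.5, 0.1):
        stats = coamplification_stats([lam, lam, lam])
        print(lam, round(stats["coamplified_fraction_of_positives"], 4))
```

Under this assumption, the share of positive partitions that contain more than one target type shrinks as the per-target occupancy drops, which is consistent with the abstract's suggestion to reduce digital occupancy when several targets share a panel.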
Quantitative and rapid Plasmodium falciparum malaria diagnosis and artemisinin-resistance detection using a CMOS Lab-on-Chip platform
Early and accurate diagnosis of malaria and drug-resistance is essential to effective disease management. Available rapid malaria diagnostic tests present limitations in analytical sensitivity, drug-resistance testing and/or quantification. Conversely, diagnostic methods based on nucleic acid amplification have advanced owing to their high sensitivity, specificity and robustness. Nevertheless, these methods commonly rely on optical measurements and complex instrumentation, which limit their applicability in resource-poor, point-of-care settings. This paper reports the specific, quantitative and fully electronic detection of Plasmodium falciparum, the predominant malaria-causing parasite worldwide, using a Lab-on-Chip platform developed in-house. Furthermore, we demonstrate on-chip detection of C580Y, the most prevalent single-nucleotide polymorphism associated with artemisinin-resistant malaria. Real-time non-optical DNA sensing is facilitated using Ion-Sensitive Field-Effect Transistors, fabricated in unmodified complementary metal-oxide-semiconductor (CMOS) technology, coupled with loop-mediated isothermal amplification. This work holds significant potential for the development of a fully portable and quantitative malaria diagnostic that can be used as a rapid point-of-care test
Rapid detection of mobilized colistin resistance using a nucleic acid based lab-on-a-chip diagnostic system
The increasing prevalence of antimicrobial resistance is a serious threat to global public health. One of the most concerning trends is the rapid spread of Carbapenemase-Producing Organisms (CPO), for which colistin has become the last-resort antibiotic treatment. The emergence of colistin resistance, including the spread of mobilized colistin resistance (mcr) genes, raises the possibility of untreatable bacterial infections and motivates the development of improved diagnostics for the detection of colistin-resistant organisms. This work demonstrates a rapid response for detecting the most recently reported mcr gene, mcr−9, using a portable and affordable lab-on-a-chip (LoC) platform, offering a promising alternative to conventional laboratory-based instruments such as real-time PCR (qPCR). The platform combines semiconductor technology, for non-optical real-time DNA sensing, with a smartphone application for data acquisition, visualization and cloud connectivity. This technology is enabled by using loop-mediated isothermal amplification (LAMP) as the chemistry for targeted DNA detection, by virtue of its high sensitivity, specificity, yield, and manageable temperature requirements. Here, we have developed the first LAMP assay for mcr−9, showing high sensitivity (down to 100 genomic copies/reaction) and high specificity (no cross-reactivity with other mcr variants). This assay is demonstrated by supporting a hospital investigation in which we analyzed nucleic acids extracted from 128 carbapenemase-producing bacteria isolated from clinical and screening samples and found that 41 carried mcr−9 (validated using whole genome sequencing). Average positive detection times were 6.58 ± 0.42 min when performing the experiments on a conventional qPCR instrument (n = 41). For validating the translation of the LAMP assay onto a LoC platform, a subset of the samples was tested (n = 20), showing average detection times of 6.83 ± 0.92 min for positive isolates (n = 14). 
All experiments detected mcr−9 in under 10 min, and both platforms showed no statistically significant difference (p-value > 0.05). When sample preparation and throughput capabilities are integrated within this LoC platform, the adoption of this technology for the rapid detection and surveillance of antimicrobial resistance genes will decrease the turnaround time for DNA detection and resistotyping, improving diagnostic capabilities, patient outcomes, and the management of infectious diseases
A novel hotspot specific isothermal amplification method for detection of the common PIK3CA p.H1047R breast cancer mutation
Breast cancer (BC) is a common cancer in women worldwide. Despite advances in treatment, up to 30% of women eventually relapse and die of metastatic breast cancer. Liquid biopsy analysis of circulating cell-free DNA fragments in the patients’ blood can monitor clonality and evolving mutations as a surrogate for tumour biopsy. Next generation sequencing platforms and digital droplet PCR can be used to profile circulating tumour DNA from liquid biopsies; however, they are expensive and time-consuming for clinical use. Here, we report a novel strategy with proof-of-concept data that supports the use of loop-mediated isothermal amplification (LAMP) to detect PIK3CA c.3140 A > G (H1047R), a prevalent BC missense mutation that is attributed to BC tumour growth. Allele-specific primers were designed and optimized to detect the p.H1047R variant following the USS-sbLAMP method. The assay was developed with synthetic DNA templates, validated on a qPCR instrument with DNA from two breast cancer cell-lines and two patient tumour tissue samples, and finally piloted on an ISFET-enabled microchip. This work sets a foundation for BC mutational profiling on a Lab-on-Chip device, to help the early detection of patient relapse and to monitor efficacy of systemic therapies for personalised cancer patient management
A missense mutation in TRAPPC6A leads to build-up of the protein, in patients with a neurodevelopmental syndrome and dysmorphic features.
Childhood-onset clinical syndromes involving intellectual disability and dysmorphic features, such as polydactyly, suggest that common developmental pathways link seemingly unrelated phenotypes. We identified a consanguineous family of Saudi origin with varying complex features including intellectual disability, speech delay, facial dysmorphism and polydactyly. Combining microarray-based comparative genomic hybridisation (CGH), to identify regions of homozygosity, with exome sequencing led to the identification of homozygous mutations in five candidate genes (RSPH6A, ANKK1, AMOTL1, ALKBH8, TRAPPC6A), all of which appear to be pathogenic as predicted by PROVEAN, SIFT and PolyPhen2 and segregate perfectly with the disease phenotype. We therefore looked for differences in expression levels of each protein in HEK293 cells, expressing either the wild-type or mutant full-length cDNA construct. Unexpectedly, wild-type TRAPPC6A appeared to be unstable, but addition of the proteasome inhibitor MG132 stabilised its expression. Mutations have previously been reported in several members of the TRAPP complex of proteins, including TRAPPC2, TRAPPC9 and TRAPPC11, resulting in disorders involving skeletal abnormalities, intellectual disability, speech impairment and developmental delay. TRAPPC6A joins a growing list of proteins belonging to the TRAPP complex, implicated in clinical syndromes with neurodevelopmental abnormalities
Deficiency in origin licensing proteins impairs cilia formation: implications for the aetiology of Meier-Gorlin syndrome
Mutations in ORC1, ORC4, ORC6, CDT1, and CDC6, which encode proteins required for DNA replication origin licensing, cause Meier-Gorlin syndrome (MGS), a disorder conferring microcephaly, primordial dwarfism, underdeveloped ears, and skeletal abnormalities. Mutations in ATR, which also functions during replication, can cause Seckel syndrome, a clinically related disorder. These findings suggest that impaired DNA replication could underlie the developmental defects characteristic of these disorders. Here, we show that although origin licensing capacity is impaired in all patient cells with mutations in origin licensing component proteins, this does not correlate with the rate of progression through S phase. Thus, the replicative capacity in MGS patient cells does not correlate with clinical manifestation. However, ORC1-deficient cells from MGS patients and siRNA-mediated depletion of origin licensing proteins also have impaired centrosome and centriole copy number. As a novel and unexpected finding, we show that they also display a striking defect in the rate of formation of primary cilia. We demonstrate that this impacts sonic hedgehog signalling in ORC1-deficient primary fibroblasts. Additionally, reduced growth factor-dependent signaling via primary cilia affects the kinetics of cell cycle progression following cell cycle exit and re-entry, highlighting an unexpected mechanism whereby origin licensing components can influence cell cycle progression. Finally, using a cell-based model, we show that defects in cilia function impair chondroinduction. Our findings raise the possibility that a reduced efficiency in forming cilia could contribute to the clinical features of MGS, particularly the bone development abnormalities, and could provide a new dimension for considering developmental impacts of licensing deficiency
Handheld point-of-care system for rapid detection of SARS-CoV-2 extracted RNA in under 20 min
The COVID-19 pandemic is a global health emergency characterized by the high rate of transmission and ongoing increase of cases globally. Rapid point-of-care (PoC) diagnostics to detect the causative virus, SARS-CoV-2, are urgently needed to identify and isolate patients, contain its spread and guide clinical management. In this work, we report the development of a rapid PoC diagnostic test (<20 min) based on reverse transcriptase loop-mediated isothermal amplification (RT-LAMP) and semiconductor technology for the detection of SARS-CoV-2 from extracted RNA samples. The developed LAMP assay was tested on a real-time benchtop instrument (RT-qLAMP) showing a lower limit of detection of 10 RNA copies per reaction. It was validated against extracted RNA from 183 clinical samples including 127 positive samples (screened by the CDC RT-qPCR assay). Results showed 91% sensitivity and 100% specificity when compared to RT-qPCR and average positive detection times of 15.45 ± 4.43 min. For validating the incorporation of the RT-LAMP assay onto our PoC platform (RT-eLAMP), a subset of samples was tested (n = 52), showing average detection times of 12.68 ± 2.56 min for positive samples (n = 34), demonstrating a comparable performance to a benchtop commercial instrument. Paired with a smartphone for results visualization and geolocalization, this portable diagnostic platform with secure cloud connectivity will enable real-time case identification and epidemiological surveillance
Highlights from the Pierre Auger Observatory
The Pierre Auger Observatory is the world's largest cosmic ray observatory.
Our current exposure reaches nearly 40,000 km² sr and provides us with an
unprecedented quality data set. The performance and stability of the detectors
and their enhancements are described. Data analyses have led to a number of
major breakthroughs. Among these we discuss the energy spectrum and the
searches for large-scale anisotropies. We present analyses of our X_max
data and show how it can be interpreted in terms of mass composition. We also
describe some new analyses that extract mass sensitive parameters from the 100%
duty cycle SD data. A coherent interpretation of all these recent results opens
new directions. The consequences regarding the cosmic ray composition and the
properties of UHECR sources are briefly discussed.
Comment: 9 pages, 12 figures, talk given at the 33rd International Cosmic Ray
Conference, Rio de Janeiro 201