15 research outputs found
SQLformer: Deep Auto-Regressive Query Graph Generation for Text-to-SQL Translation
In recent years, there has been growing interest in text-to-SQL translation,
which is the task of converting natural language questions into executable SQL
queries. This technology is important for its potential to democratize data
extraction from databases. However, some of its key hurdles include domain
generalisation, which is the ability to adapt to previously unseen databases,
and alignment of natural language questions with the corresponding SQL queries.
To overcome these challenges, we introduce SQLformer, a novel Transformer
architecture specifically crafted to perform text-to-SQL translation tasks. Our
model predicts SQL queries as abstract syntax trees (ASTs) in an autoregressive
way, incorporating structural inductive bias in the encoder and decoder layers.
This bias, guided by database table and column selection, aids the decoder in
generating SQL query ASTs represented as graphs in a Breadth-First Search
canonical order. Comprehensive experiments illustrate the state-of-the-art
performance of SQLformer in the challenging text-to-SQL Spider benchmark. Our
implementation is available at https://github.com/AdrianBZG/SQLformer
Comment: 11 pages, 4 figures
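To illustrate what a Breadth-First Search ordering of a query AST looks like, the following minimal sketch linearises a toy tree. The node labels, the tuple-based tree encoding, and the `bfs_order` helper are assumptions made for illustration only; they are not taken from the SQLformer implementation.

```python
from collections import deque

# Hypothetical toy AST for: SELECT name FROM singer WHERE age > 30
# Each node is (label, [children]); labels are illustrative, not SQLformer's grammar.
toy_ast = ("select_stmt", [
    ("select", [("column:name", [])]),
    ("from", [("table:singer", [])]),
    ("where", [(">", [("column:age", []), ("value:30", [])])]),
])


def bfs_order(root):
    """Return node labels in breadth-first order, children visited left to right."""
    order = []
    queue = deque([root])
    while queue:
        label, children = queue.popleft()
        order.append(label)
        queue.extend(children)
    return order


bfs_sequence = bfs_order(toy_ast)
print(bfs_sequence)
```

In this ordering every node at depth d appears before any node at depth d + 1, so an autoregressive decoder emitting nodes in this sequence would produce the clause-level structure before the columns, tables, and values beneath it.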
Unsupervised Fact Verification by Language Model Distillation
Unsupervised fact verification aims to verify a claim using evidence from a
trustworthy knowledge base without any kind of data annotation. To address this
challenge, algorithms must produce features for every claim that are both
semantically meaningful, and compact enough to find a semantic alignment with
the source information. In contrast to previous work, which tackled the
alignment problem by learning over annotated corpora of claims and their
corresponding labels, we propose SFAVEL (Self-supervised Fact Verification via
Language Model Distillation), a novel unsupervised framework that leverages
pre-trained language models to distil self-supervised features into
high-quality claim-fact alignments without the need for annotations. This is
enabled by a novel contrastive loss function that encourages features to attain
high-quality claim and evidence alignments whilst preserving the semantic
relationships across the corpora. Notably, we present results that achieve a
new state-of-the-art on the standard FEVER fact verification benchmark (+8%
accuracy) with linear evaluation.
A Convolutional Neural Network for the Automatic Diagnosis of Collagen VI related Muscular Dystrophies
The development of machine learning systems for the diagnosis of rare
diseases is challenging mainly due to the lack of data to study them. Despite this
challenge, this paper proposes a system for the Computer Aided Diagnosis (CAD)
of low-prevalence, congenital muscular dystrophies from confocal microscopy
images. The proposed CAD system relies on a Convolutional Neural Network (CNN)
which performs an independent classification for non-overlapping patches tiling
the input image, and generates an overall decision summarizing the individual
decisions for the patches on the query image. This decision scheme points to
the possibly problematic areas in the input images and provides a global
quantitative evaluation of the state of the patients, which is fundamental for
diagnosis and to monitor the efficiency of therapies.
Comment: Submitted for review to Expert Systems With Applications
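The patch-wise decision scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions: `classify_patch` is a placeholder threshold standing in for the CNN, and the aggregation rule (fraction of flagged patches) is hypothetical, since the abstract does not specify how the individual decisions are summarised.

```python
def tile(image, patch):
    """Split a 2D list into non-overlapping patch x patch tiles (edge remainders dropped)."""
    rows, cols = len(image), len(image[0])
    tiles = {}
    for r in range(0, rows - patch + 1, patch):
        for c in range(0, cols - patch + 1, patch):
            tiles[(r, c)] = [row[c:c + patch] for row in image[r:r + patch]]
    return tiles


def classify_patch(tile_pixels):
    """Placeholder for the CNN: flags a patch when its mean intensity exceeds 0.5."""
    flat = [v for row in tile_pixels for v in row]
    return sum(flat) / len(flat) > 0.5


def diagnose(image, patch=2):
    """Classify each tile independently, then summarise the decisions.

    Returns the fraction of flagged patches (assumed global score) and the
    top-left coordinates of the flagged patches (the possibly problematic areas).
    """
    flags = {pos: classify_patch(t) for pos, t in tile(image, patch).items()}
    score = sum(flags.values()) / len(flags)
    flagged = [pos for pos, is_flagged in flags.items() if is_flagged]
    return score, flagged


# Toy 4x4 "image" with one bright 2x2 region in the top-right corner.
image = [
    [0.1, 0.2, 0.9, 0.8],
    [0.0, 0.1, 0.7, 0.9],
    [0.2, 0.1, 0.1, 0.0],
    [0.3, 0.0, 0.2, 0.1],
]
score, flagged = diagnose(image)
print(score, flagged)
```

The per-patch flags give the localisation, while the aggregated score gives a single quantitative value per image that could be tracked over time.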
Genome-wide investigation of gene-cancer associations for the prediction of novel therapeutic targets in oncology.
A major cause of failed drug discovery programs is suboptimal target selection, resulting in the development of drug candidates that are potent inhibitors, but ineffective at treating the disease. In the genomics era, the availability of large biomedical datasets with genome-wide readouts has the potential to transform target selection and validation. In this study we investigate how computational intelligence methods can be applied to predict novel therapeutic targets in oncology. We compared different machine learning classifiers applied to the task of drug target classification for nine different human cancer types. For each cancer type, a set of "known" target genes was obtained and equally-sized sets of "non-targets" were sampled multiple times from the human protein-coding genes. Models were trained on mutation, gene expression (TCGA), and gene essentiality (DepMap) data. In addition, we generated a numerical embedding of the interaction network of protein-coding genes using deep network representation learning and included the results in the modeling. We assessed feature importance using a random forest classifier and performed feature selection based on measuring permutation importance against a null distribution. Our best models achieved good generalization performance based on the AUROC metric. With the best model for each cancer type, we ran predictions on more than 15,000 protein-coding genes to identify potential novel targets. Our results indicate that this approach may be useful to inform early stages of the drug discovery pipeline.
Innovate UK Knowledge Transfer Partnership Programme grant KTP01126
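The repeated sampling of equally-sized "non-target" sets can be sketched as below. This is an illustrative sketch only: the gene names are made up, the function `sample_non_target_sets` is hypothetical, and excluding known targets from the sampling pool is an assumption that the abstract does not state explicitly.

```python
import random


def sample_non_target_sets(all_genes, known_targets, n_repeats, seed=0):
    """Draw n_repeats random non-target sets, each the same size as the target set.

    Assumption: known targets are removed from the pool before sampling.
    """
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    targets = set(known_targets)
    candidates = sorted(g for g in all_genes if g not in targets)
    return [rng.sample(candidates, len(targets)) for _ in range(n_repeats)]


# Made-up gene identifiers standing in for the human protein-coding genes.
all_genes = [f"GENE{i}" for i in range(100)]
known_targets = ["GENE3", "GENE17", "GENE42"]

non_target_sets = sample_non_target_sets(all_genes, known_targets, n_repeats=5)
for negatives in non_target_sets:
    print(negatives)
```

Each draw yields a balanced training set (targets versus non-targets), and repeating the draw lets a classifier's performance be averaged over different negative sets instead of depending on a single arbitrary one.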
Translating synthetic natural language to database queries with a polyglot deep learning framework
Abstract: The number of databases as well as their size and complexity is increasing. This creates a barrier to use, especially for non-experts, who have to come to grips with the nature of the data, the way it has been represented in the database, and the specific query languages or user interfaces by which data are accessed. These difficulties worsen in research settings, where it is common to work with many different databases. One approach to improving this situation is to allow users to pose their queries in natural language. In this work we describe a machine learning framework, Polyglotter, that in a general way supports the mapping of natural language searches to database queries. Importantly, it does not require the creation of manually annotated data for training and therefore can be applied easily to multiple domains. The framework is polyglot in the sense that it supports multiple different database engines that are accessed with a variety of query languages, including SQL and Cypher. Furthermore, Polyglotter supports multi-class queries. Good performance is achieved on both toy and real databases, as well as a human-annotated WikiSQL query set. Thus Polyglotter may help database maintainers make their resources more accessible.
Synthesis and characterization of M(II) phosphonates (M = Fe, Co, Zn, Mn) as precursors for PEMFCs electrocatalysts
Metal phosphonates are promising precursors for applications such as proton conductivity [1] and catalysis [2]. Specifically, upon calcination metal polyphosphates are generated that can be used as non-noble metal alternatives [3] to the highly expensive commercial catalysts (Pt) for proton exchange membrane fuel cells (PEMFCs).
In this work, we present the synthesis and characterization of metal polyphosphates obtained from divalent transition metal phosphonates (M = Fe, Mn, Co and Zn), both as monometallic and bimetallic systems (solid solutions). For the preparation of the metal phosphonate precursors, two types of organic linkers were selected, i.e. 2-R,S-hydroxyphosphonoacetic acid [HO3PCH(OH)COOH, HPAA] and nitrilotrismethylenephosphonic acid [N(CH2PO3H2)3, ATMP]. The as-synthesized compounds were calcined between 700 and 1000 °C under N2. Depending on the metal/phosphorus molar ratio in the precursor phases, different compositions were found, the corresponding metal pyrophosphate being the major component according to the crystallographic data. Interestingly, in most cases the solid solutions were preserved in the final product, for instance Fe-Mn, Fe-Co and Fe-Zn. All calcined materials were also characterized by XPS, SEM/EDS and FTIR-Raman.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
Transition metal hydroxyphosphonoacetates as precursors of electrocatalysts
Conference contribution
Coordination polymers (CPs) are widely studied due to their applicability in many fields. Among them, metal phosphonates are attractive materials due to their great structural and functional diversity, as proton conductors and/or precursors of electrocatalysts, alternatives to the high-cost commercial catalysts based on noble metals, for both PEMFCs and electrolytic systems.
In this work, we report the synthesis, characterization and electrochemical properties of several coordination polymers derived from (R,S)-2-hydroxyphosphonoacetic acid (HPAA) with transition metals (M(II) = Fe, Co, Mn, Ni) as well as their solid solutions. The precursor CPs decompose, upon heating in different conditions, to the corresponding metal oxalate solid solutions, which are then used as intermediate materials for obtaining new Non-Precious Metal Electrocatalysts (NPMCs) by pyrolytic treatment at different temperatures under N2/H2 atmospheres. The electrochemical behavior of these compounds with regard to the Oxygen Evolution and Reduction Reactions (OER and ORR, respectively) shows that the structural features are of considerable importance to their electrocatalytic activities.