5,244 research outputs found

Parallelized pairwise sequence alignment using CUDA on multiple GPUs

Author: Sungbo Jung
TF Smith
Publication venue: BioMed Central
Publication date: 01/01/2009

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Do consumers gamble to convexify?

Author: Crossley TF
Low H
Smith S
Publication venue: Journal of Economic Behavior and Organization
Publication date: 01/01/2016

The combination of credit constraints and indivisible consumption goods may induce some risk-averse individuals to gamble to have a chance of crossing a purchasing threshold. This idea has been demonstrated theoretically, but not explored empirically. We test this idea by focusing on a key implication: income effects for individuals who choose to gamble are likely to be larger than for the general population. Using UK data on gambling wins, other windfalls and durable goods purchases, we show that winners display higher income effects than non-winners but only amongst those likely to be credit-constrained. This is consistent with credit-constrained, risk-averse agents gambling to convexify their budget set. This work was supported in part by the ESRC-funded Centre for Microeconomic Analysis of Public Policy at the Institute for Fiscal Studies (grant number RES-544-28-5001). This is the final version of the article. It first appeared from Elsevier via http://dx.doi.org/10.1016/j.jebo.2016.07.02

University of Essex Research Repository

Elsevier - Publisher Connector

Crossref

Oxford University Research Archive

Apollo (Cambridge)

Explore Bristol Research

Parallel approach to sliding window sums

Author: A Basak
DE Wood
H Li
I Sović
M Roberts
SF Altschul
TF Smith
V Kotu
Publication date: 03/09/2019

Sliding window sums are widely used in bioinformatics applications, including sequence assembly, k-mer generation, hashing and compression. New vector algorithms that use the advanced vector extension (AVX) instructions available on modern processors, or the parallel compute units on GPUs and FPGAs, would provide a significant performance boost for these applications. We develop a generic vectorized sliding sum algorithm whose speedup, for window size w and number of processors P, is O(P/w) for a generic sliding sum. For a sum with a commutative operator the speedup improves to O(P/log(w)). When applied to the genomic application of minimizer-based k-mer table generation using AVX instructions, we obtain a speedup of over 5X. Comment: 10 pages, 5 figure
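The logarithmic dependence on the window size w can be illustrated with a doubling scheme. What follows is a minimal sketch, not the algorithm from the paper: the function names are hypothetical, plain Python lists stand in for vector registers, and each list comprehension plays the role of one element-wise (vectorizable) step. It assumes only that the operator is associative.

```python
from operator import add


def sliding_sums_naive(xs, w, op=add):
    """Reference version: about w operations per output element."""
    out = []
    for i in range(len(xs) - w + 1):
        acc = xs[i]
        for j in range(i + 1, i + w):
            acc = op(acc, xs[j])
        out.append(acc)
    return out


def sliding_sums_doubling(xs, w, op=add):
    """Window sums built from power-of-two blocks.

    Uses O(log w) element-wise passes over the data. Operand order is
    preserved, so `op` only needs to be associative.
    """
    n = len(xs)
    if w < 1 or w > n:
        return []
    result = None       # result[i] combines xs[i : i + covered]
    covered = 0
    power = list(xs)    # power[i] combines xs[i : i + length]
    length = 1
    remaining = w
    while remaining:
        if remaining & 1:
            if result is None:
                result = power[: n - length + 1]
            else:
                m = n - (covered + length) + 1
                # Append the block that starts right after the covered part.
                result = [op(result[i], power[i + covered]) for i in range(m)]
            covered += length
        remaining >>= 1
        if remaining:
            m = n - 2 * length + 1
            # Double the block length: combine two adjacent blocks.
            power = [op(power[i], power[i + length]) for i in range(m)]
            length *= 2
    return result
```

For example, `sliding_sums_doubling([1, 2, 3, 4, 5], 3)` combines a length-1 block with a length-2 block per position, instead of adding three elements one at a time.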

arXiv.org e-Print Archive

Crossref

Space-efficient Feature Maps for String Alignment Kernels

Author: CC Chang
G Cormode
H Lodhi
H Saigo
M Kanehisa
MC Ferris
RE Fan
S Kim
T Gärtner
T Hofmann
TF Smith
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2019

String kernels are attractive data analysis tools for analyzing string data. Among them, alignment kernels are known for their high prediction accuracies in string classifications when tested in combination with SVM in various applications. However, alignment kernels have a crucial drawback in that they scale poorly due to their quadratic computation complexity in the number of input strings, which limits large-scale applications in practice. We address this need by presenting the first approximation for string alignment kernels, which we call space-efficient feature maps for edit distance with moves (SFMEDM), by leveraging a metric embedding named edit sensitive parsing (ESP) and feature maps (FMs) of random Fourier features (RFFs) for large-scale string analyses. The original FMs for RFFs consume a huge amount of memory proportional to the dimension d of input vectors and the dimension D of output vectors, which prohibits its large-scale applications. We present novel space-efficient feature maps (SFMs) of RFFs for a space reduction from O(dD) of the original FMs to O(d) of SFMs with a theoretical guarantee with respect to concentration bounds. We experimentally test SFMEDM on its ability to learn SVM for large-scale string classifications with various massive string data, and we demonstrate the superior performance of SFMEDM with respect to prediction accuracy, scalability and computation efficiency. Comment: Full version for ICDM'19 pape
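To make the O(dD) memory cost concrete, here is a minimal sketch of a standard random Fourier feature map for a Gaussian kernel. It is not the paper's space-efficient variant (SFM) and does not include the ESP embedding. The function names, the bandwidth parameter `sigma`, and the choice of a Gaussian kernel are assumptions made for illustration.

```python
import math
import random


def make_rff_map(d, D, sigma, seed=0):
    """Return a feature map z with z(x) . z(y) ~ exp(-|x - y|^2 / (2 sigma^2)).

    The projection matrix `weights` holds D * d numbers. This is the
    O(dD) storage of the original feature maps.
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(d)] for _ in range(D)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(D)]
    scale = math.sqrt(2.0 / D)

    def feature_map(x):
        return [
            scale * math.cos(sum(wj * xj for wj, xj in zip(w, x)) + b)
            for w, b in zip(weights, offsets)
        ]

    return feature_map


def gaussian_kernel(x, y, sigma):
    """Exact kernel value, for comparison with the approximation."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))
```

The inner product of two mapped vectors approximates the kernel value, with an error that shrinks as D grows. A linear SVM can then be trained on the mapped vectors instead of a kernel SVM on all pairs of inputs.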

arXiv.org e-Print Archive

Crossref

The IT University of Copenhagen's Repository

Link Mining for Kernel-based Compound-Protein Interaction Predictions Using a Chemogenomics Approach

Author: A Lavecchia
ACA Nascimento
C-C Chang
D Rogers
H Ding
L Jacob
M Bouchard
M Gonen
M Hattori
MN Drwal
S Daminelli
T Laarhoven van
TF Smith
Y Liu
Y Yamanishi
Publication date: 29/06/2017

Virtual screening (VS) is widely used during computational drug discovery to reduce costs. Chemogenomics-based virtual screening (CGBVS) can be used to predict new compound-protein interactions (CPIs) from known CPI network data using several methods, including machine learning and data mining. Although CGBVS facilitates highly efficient and accurate CPI prediction, it has poor performance for prediction of new compounds for which CPIs are unknown. The pairwise kernel method (PKM) is a state-of-the-art CGBVS method and shows high accuracy for prediction of new compounds. In this study, on the basis of link mining, we improved the PKM by combining link indicator kernel (LIK) and chemical similarity and evaluated the accuracy of these methods. The proposed method obtained an average area under the precision-recall curve (AUPR) value of 0.562, which was higher than that achieved by the conventional Gaussian interaction profile (GIP) method (0.425), and the calculation time was only increased by a few percent

arXiv.org e-Print Archive

Crossref

Bioisosteric similarity of drugs in virtual screening

Author: GA Patani
M Krier
Markus Krier
Michael C Hutter
RP Sheridan
S Henikoff
TF Smith
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

galign: A Tool for Rapid Genome Polymorphism Discovery

Author: B Langmead
EA Perens
H Li
Ilya Ruvinsky
R Li
S Sarin
Shai Shaham
TF Smith
Z Ning
Publication venue: Public Library of Science
Publication date: 25/09/2009

BACKGROUND: Highly parallel sequencing technologies have become important tools in the analysis of sequence polymorphisms on a genomic scale. However, the development of customized software to analyze data produced by these methods has lagged behind. METHODS/PRINCIPAL FINDINGS: Here I describe a tool, 'galign', designed to identify polymorphisms between sequence reads obtained using Illumina/Solexa technology and a reference genome. The 'galign' alignment tool does not use Smith-Waterman matrices for sequence comparisons. Instead, a simple algorithm comparing parsed sequence reads to parsed reference genome sequences is used. 'galign' output is geared towards immediate user application, displaying polymorphism locations, nucleotide changes, and relevant predicted amino-acid changes for ease of information processing. To do so, 'galign' requires several accessory files easily derived from an annotated reference genome. Direct sequencing as well as in silico studies demonstrate that 'galign' provides lesion predictions comparable in accuracy to available prediction programs, accompanied by greater processing speed and more user-friendly output. We demonstrate the use of 'galign' to identify mutations leading to phenotypic consequences in C. elegans. CONCLUSION/SIGNIFICANCE: Our studies suggest that 'galign' is a useful tool for polymorphism discovery, and is of immediate utility for sequence mining in C. elegans

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Clustering exact matches of pairwise sequence alignments by weighted linear regression

Author: Alvaro J González
F Sanger
Li Liao
PA Pevzner
S Kurtz
SF Altschul
TF Smith
WJ Kent
WR Pearson
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: At intermediate stages of genome assembly projects, when a number of contigs have been generated and their validity needs to be verified, it is desirable to align these contigs to a reference genome when it is available. The interest is not to analyze a detailed alignment between a contig and the reference genome at the base level, but rather to have a rough estimate of where the contig aligns to the reference genome, specifically, by identifying the starting and ending positions of such a region. This information is very useful in ordering the contigs, facilitating post-assembly analysis such as gap closure and resolving repeats. There exist programs, such as BLAST and MUMmer, that can quickly align and identify high similarity segments between two sequences, which, when seen in a dot plot, tend to agglomerate along a diagonal but can also be disrupted by gaps or shifted away from the main diagonal due to mismatches between the contig and the reference. It is a tedious and practically impossible task to visually inspect the dot plot to identify the regions covered by a large number of contigs from sequence assembly projects. A forced global alignment between a contig and the reference is not only time consuming but often meaningless. Results: We have developed an algorithm that uses the coordinates of all the exact matches or high similarity local alignments, clusters them with respect to the main diagonal in the dot plot using a weighted linear regression technique, and identifies the starting and ending coordinates of the region of interest. Conclusion: This algorithm complements existing pairwise sequence alignment packages by replacing the time-consuming seed extension phase with a weighted linear regression for the alignment seeds. It was experimentally shown that the gain in execution time can be outstanding without compromising the accuracy. This method should be of great utility to sequence assembly and genome comparison projects.
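The regression step can be sketched as a weighted least-squares line fit through the match coordinates in the dot plot. This is an illustration under stated assumptions, not the authors' implementation: each match is taken to be a (contig position, reference position, weight) triple, match length is assumed as the weight, the clustering of matches around the diagonal is omitted, and both function names are hypothetical.

```python
def weighted_line_fit(matches):
    """Fit ref_pos = slope * contig_pos + intercept by weighted least squares.

    `matches` is a list of (contig_pos, ref_pos, weight) triples. Using
    the match length as the weight is an assumption of this sketch.
    """
    total = sum(w for _, _, w in matches)
    mean_x = sum(w * x for x, _, w in matches) / total
    mean_y = sum(w * y for _, y, w in matches) / total
    sxx = sum(w * (x - mean_x) ** 2 for x, _, w in matches)
    sxy = sum(w * (x - mean_x) * (y - mean_y) for x, y, w in matches)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_region(matches, contig_length):
    """Project the contig's two ends onto the reference using the fitted line."""
    slope, intercept = weighted_line_fit(matches)
    start = intercept
    end = slope * contig_length + intercept
    return start, end
```

With three matches lying on the diagonal ref = contig + 100, the fit returns a slope of about 1 and an intercept of about 100, so a 200-base contig is placed at roughly positions 100 to 300 of the reference.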

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Multipurpose High Frequency Electron Spin Resonance Spectrometer for Condensed Matter Research

Author: A Rockenbauer
AL Barra
András Jánossy
B Náfrádi
CP Slichter
D Goldfarb
Dario Quintavalle
DF Smith
E Reijerse
EJ Reijerse
G Klupp
GM Smith
H Blok
H Ohta
J Tol van
KA Earle
KL Nagy
Kálmán L. Nagy
M Bennati
MM Hertel
ND Kushch
TF Prisner
The LNCMP-team
Titusz Fehér
V Brouet
WB Lynch
Y Kubozono
YA Grishin
Á Antal
Publication venue: Springer Science and Business Media LLC
Publication date: 29/10/2009

We describe a quasi-optical multifrequency ESR spectrometer operating in the 75-225 GHz range and optimized at 210 GHz for general use in condensed matter physics, chemistry and biology. The quasi-optical bridge detects the change of mm wave polarization at the ESR. A controllable reference arm maintains a mm wave bias at the detector. The attained sensitivity of 2x10^10 spin/G/(Hz)^(1/2), measured on a dilute Mn:MgO sample in a non-resonant probe head at 222.4 GHz and 300 K, is comparable to that of commercial high-sensitivity X-band spectrometers. The spectrometer has a Fabry-Perot resonator based probe head to measure aqueous solutions, and a probe head to measure magnetic field angular dependence of single crystals. The spectrometer is robust and easy to use and may be operated by undergraduate students. Its performance is demonstrated by examples from various fields of condensed matter physics. Comment: submitted to Journal of Magnetic Resonanc

arXiv.org e-Print Archive

Crossref

Neutron studies of Na-ion battery materials

Author: Brett DJL
Cullen PL
Hack J
Headen TF
Howard CA
Miller TS
Neville TP
Shah AR
Shutt RRC
Smith K
Publication venue: IOP Publishing
Publication date: 01/10/2021

The relatively vast abundance and more equitable global distribution of terrestrial sodium make sodium-ion batteries (NIBs) potentially cheaper and more sustainable alternatives to commercial lithium-ion batteries (LIBs). However, the practical capacities and cycle lives of NIBs at present do not match those of LIBs, which has hindered their progress to commercialisation. The present drawback of NIB technology stems largely from the electrode materials and their associated Na+ ion storage mechanisms. Increased understanding of the electrochemical storage mechanisms and kinetics is therefore vital for the development of current and novel materials to realise the commercial NIB. In contrast to x-ray techniques, the non-dependency of neutron scattering on the atomic number of elements (Z) can substantially increase the scattering contrast of small elements such as sodium and carbon, making neutron techniques powerful for the investigation of NIB electrode materials. Moreover, neutrons are far more penetrating, which enables more complex sample environments including in situ and operando studies. Here, we introduce the theory of, and review the use of, neutron diffraction and quasi-elastic neutron scattering, to investigate the structural and dynamic properties of electrode and electrolyte materials for NIBs. To improve our understanding of the actual sodium storage mechanisms and identify intermediate stages during charge/discharge, ex situ, in situ, and operando neutron experiments are required. However, to date there are few studies where operando experiments are conducted during electrochemical cycling. This highlights an opportunity for research to elucidate the operating mechanisms within NIB materials that are under much debate at present

UCL Discovery

Queen Mary Research Online