Local Binary Patterns as a Feature Descriptor in Alignment-free Visualisation of Metagenomic Data
Shotgun sequencing has facilitated the analysis of complex microbial communities. However, clustering and visualising these communities without prior taxonomic information is a major challenge. Feature descriptor methods can be utilised to extract these taxonomic relations from the data. Here, we present a novel approach consisting of local binary patterns (LBP) coupled with randomised singular value decomposition (RSVD) and Barnes-Hut t-distributed stochastic neighbour embedding (BH-tSNE) to highlight the underlying taxonomic structure of the metagenomic data. The effectiveness of our approach is demonstrated using several simulated datasets and a real metagenomic dataset.
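The embedding stage of such a pipeline can be illustrated with off-the-shelf components. Below is a minimal sketch, assuming the LBP code histograms for each sequence fragment have already been computed; scikit-learn's randomised TruncatedSVD and Barnes-Hut TSNE are used as stand-ins for the RSVD and BH-tSNE steps, and the data are random placeholders rather than the paper's.

```python
# Minimal sketch of the embedding stage, assuming each sequence fragment has
# already been converted into a histogram of LBP codes (one row per fragment).
# TruncatedSVD with a randomised solver stands in for RSVD, and scikit-learn's
# TSNE uses the Barnes-Hut approximation.
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
lbp_histograms = rng.random((500, 256))   # placeholder LBP code histograms

# Randomised SVD: compress the sparse, high-dimensional LBP space.
svd = TruncatedSVD(n_components=50, algorithm="randomized", random_state=0)
reduced = svd.fit_transform(lbp_histograms)

# Barnes-Hut t-SNE: project to 2-D to visualise taxonomic structure.
embedding = TSNE(n_components=2, method="barnes_hut", random_state=0).fit_transform(reduced)
print(embedding.shape)  # (500, 2)
```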
Interpreting Differentiable Latent States for Healthcare Time-series Data
Machine learning enables extracting clinical insights from large temporal
datasets. The applications of such machine learning models include identifying
disease patterns and predicting patient outcomes. However, limited
interpretability poses challenges for deploying advanced machine learning in
digital healthcare. Understanding the meaning of latent states is crucial for
interpreting machine learning models, assuming they capture underlying
patterns. In this paper, we present a concise algorithm that allows for i)
interpreting latent states using highly related input features; ii)
interpreting predictions using subsets of input features via latent states; and
iii) interpreting changes in latent states over time. The proposed algorithm is
applicable to any differentiable model. We demonstrate that this
approach enables the identification of a daytime behavioral pattern for
predicting nocturnal behavior in a real-world healthcare dataset.
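As an illustration of the general idea (not the paper's exact algorithm), a latent state of any differentiable model can be attributed to input features by back-propagating from that state to the inputs. The sketch below assumes a small PyTorch GRU over hourly feature vectors; the model, data, and chosen state index are placeholders.

```python
# Illustrative sketch: attribute a latent state to input features by taking
# gradients of the state with respect to the inputs (any differentiable model
# works). The model, data, and state index are placeholders.
import torch
import torch.nn as nn

model = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(1, 24, 8, requires_grad=True)   # 24 hourly feature vectors

outputs, _ = model(x)                 # latent states for every time step
state = outputs[0, -1, 3]             # one latent dimension at the final step
state.backward()                      # d(state) / d(inputs)

# Features with the largest absolute gradient are the most related inputs.
relevance = x.grad.abs().sum(dim=1).squeeze(0)   # aggregate over time
top_features = torch.topk(relevance, k=3).indices
print(top_features)
```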
On the Effectiveness of Compact Biomedical Transformers
Language models pre-trained on biomedical corpora, such as BioBERT, have
recently shown promising results on downstream biomedical tasks. Many existing
pre-trained models, on the other hand, are resource-intensive and
computationally heavy owing to factors such as embedding size, hidden
dimension, and number of layers. The natural language processing (NLP)
community has developed numerous strategies to compress these models utilising
techniques such as pruning, quantisation, and knowledge distillation, resulting
in models that are considerably faster, smaller, and consequently easier to use
in practice. By the same token, in this paper we introduce six lightweight
models, namely, BioDistilBERT, BioTinyBERT, BioMobileBERT, DistilBioBERT,
TinyBioBERT, and CompactBioBERT which are obtained either by knowledge
distillation from a biomedical teacher or continual learning on the PubMed
dataset via the Masked Language Modelling (MLM) objective. We evaluate all of
our models on three biomedical tasks and compare them with BioBERT-v1.1, with
the aim of creating efficient lightweight models that perform on par with their
larger counterparts. All the models will be publicly available on our Huggingface
profile at https://huggingface.co/nlpie and the code used to run the
experiments will be available at
https://github.com/nlpie-research/Compact-Biomedical-Transformers.
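A sketch of how one of the released checkpoints could be loaded with the Hugging Face transformers library is shown below; the repository identifier used here is an assumption, so the exact model names should be checked on the https://huggingface.co/nlpie profile.

```python
# Sketch of loading one of the released compact models with Hugging Face
# transformers. The repository name below is an assumption; check the
# https://huggingface.co/nlpie profile for the exact model identifiers.
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "nlpie/distil-biobert"   # hypothetical repo id on the nlpie profile
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

inputs = tokenizer("Aspirin inhibits [MASK] aggregation.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)   # (batch, sequence length, vocabulary size)
```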
Marginalised stacked denoising autoencoders for metagenomic data binning
Shotgun sequencing has facilitated the analysis of complex microbial communities. Recently we have shown how local binary patterns (LBP) from image processing can be used to analyse the sequenced samples. LBP codes represent the data in a sparse, high-dimensional space. To improve the performance of our pipeline, marginalised stacked autoencoders are used here to learn frequent LBP codes and map the high-dimensional space to a lower-dimensional dense space. We demonstrate its performance using both low- and high-complexity simulated metagenomic data and compare the performance of our method with several existing techniques, including principal component analysis (PCA) in the dimensionality reduction step and k-mer frequencies in the feature extraction step.
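The abstract does not spell out the exact formulation, so the sketch below assumes the closed-form marginalised denoising autoencoder of Chen et al. (2012), applied to placeholder LBP code histograms and stacked by feeding each layer's output into the next.

```python
import numpy as np

def mda_layer(X, p=0.5):
    """One marginalised denoising layer in closed form (after Chen et al., 2012).
    X: (d, n) data matrix, columns are samples; p: corruption probability."""
    d, n = X.shape
    Xb = np.vstack([X, np.ones((1, n))])          # add a bias row
    q = np.full((d + 1, 1), 1.0 - p)
    q[-1] = 1.0                                    # the bias is never corrupted
    S = Xb @ Xb.T                                  # scatter matrix
    Q = S * (q @ q.T)                              # E[x~ x~^T], off-diagonal
    np.fill_diagonal(Q, q.ravel() * np.diag(S))    # E[x~ x~^T], diagonal
    P = S[:d, :] * q.T                             # E[x x~^T]
    W = np.linalg.solve(Q.T + 1e-5 * np.eye(d + 1), P.T).T   # W = P Q^{-1}
    return np.tanh(W @ Xb)

# Stack layers: each layer maps the previous hidden representation.
rng = np.random.default_rng(0)
H = rng.random((256, 1000))        # placeholder LBP code histograms (d x n)
for _ in range(3):
    H = mda_layer(H, p=0.5)
print(H.shape)
```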
Lightweight transformers for clinical natural language processing
Specialised pre-trained language models are becoming more frequent in natural language processing (NLP) since they can potentially outperform models trained on generic texts. BioBERT (Lee et al., BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36(4), 1234–1240, 2020) and BioClinicalBERT (Alsentzer et al., Publicly available clinical BERT embeddings. In Proceedings of the 2nd Clinical Natural Language Processing Workshop, pp. 72–78, 2019) are two examples of such models that have shown promise in medical NLP tasks. Many of these models are overparametrised and resource-intensive, but thanks to techniques like knowledge distillation, it is possible to create smaller versions that perform almost as well as their larger counterparts. In this work, we specifically focus on the development of compact language models for processing clinical texts (e.g. progress notes, discharge summaries). We developed a number of efficient lightweight clinical transformers using knowledge distillation and continual learning, with far fewer parameters than the full-sized models. These models performed comparably to larger models such as BioBERT and BioClinicalBERT and significantly outperformed other compact models trained on general or biomedical data. Our extensive evaluation covered several standard datasets and a wide range of clinical text-mining tasks, including natural language inference, relation extraction, named entity recognition and sequence classification. To our knowledge, this is the first comprehensive study specifically focused on creating efficient and compact transformers for clinical NLP tasks. The models and code used in this study can be found on our Huggingface profile at https://huggingface.co/nlpie and GitHub page at https://github.com/nlpie-research/Lightweight-Clinical-Transformers, respectively, promoting reproducibility of our results.
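The distillation component can be illustrated with the generic soft-target objective commonly used when compressing a large teacher into a small student; the temperature and mixing weight below are illustrative assumptions rather than the paper's settings.

```python
# Generic knowledge-distillation loss: a temperature-softened KL term that
# matches the student to the teacher, plus the usual hard-label loss. The
# temperature and mixing weight are illustrative, not the paper's recipe.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Example with random logits over a 30k-token vocabulary.
student = torch.randn(4, 30000)
teacher = torch.randn(4, 30000)
labels = torch.randint(0, 30000, (4,))
print(distillation_loss(student, teacher, labels).item())
```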
The Utility of Data Transformation for Alignment, De Novo Assembly and Classification of Short Read Virus Sequences.
Advances in DNA sequencing technology are facilitating genomic analyses of unprecedented scope and scale, widening the gap between our abilities to generate and fully exploit biological sequence data. Comparable analytical challenges are encountered in other data-intensive fields involving sequential data, such as signal processing, in which dimensionality reduction (i.e., compression) methods are routinely used to lessen the computational burden of analyses. In this work, we explored the application of dimensionality reduction methods to numerically represent high-throughput sequence data for three important biological applications of virus sequence data: reference-based mapping, short sequence classification and de novo assembly. Leveraging highly compressed sequence transformations to accelerate sequence comparison, our approach yielded accuracy comparable to that of existing approaches, further demonstrating its suitability for sequences originating from diverse virus populations. We assessed the application of our methodology using both synthetic and real viral pathogen sequences. Our results show that the use of highly compressed sequence approximations can provide accurate results, with analytical performance retained and even enhanced through appropriate dimensionality reduction of sequence data.
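The abstract does not name the specific transform, so the sketch below is only one plausible instantiation of the idea: bases are mapped to numbers, a discrete cosine transform is applied, and only the leading coefficients are kept so that reads can be compared in a much smaller space.

```python
# Illustrative sketch only: the paper does not specify this exact transform.
# Each read is mapped to numbers, transformed with a DCT, and truncated so
# that reads can be compared in a compressed coefficient space.
import numpy as np
from scipy.fft import dct

BASE = {"A": 0.0, "C": 1.0, "G": 2.0, "T": 3.0}

def compress_read(read, n_coeffs=16):
    signal = np.array([BASE.get(b, 1.5) for b in read.upper()])
    return dct(signal, norm="ortho")[:n_coeffs]     # keep leading coefficients

r1 = compress_read("ACGTACGTACGGTTAACCGGTTAA")
r2 = compress_read("ACGTACGTACGGTTAACCGGTAAA")
# Compare reads in the compressed space, e.g. by Euclidean distance.
print(np.linalg.norm(r1 - r2))
```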
A crowd of BashTheBug volunteers reproducibly and accurately measure the minimum inhibitory concentrations of 13 antitubercular drugs from photographs of 96-well broth microdilution plates
Tuberculosis is a respiratory disease that is treatable with antibiotics. An increasing prevalence of resistance means that, to ensure a good treatment outcome, it is desirable to test the susceptibility of each infection to different antibiotics. Conventionally, this is done by culturing a clinical sample and then exposing aliquots to a panel of antibiotics, each present at a pre-determined concentration, thereby determining whether the sample is resistant or susceptible to each drug. The minimum inhibitory concentration (MIC) of a drug is the lowest concentration that inhibits growth and is a more useful quantity, but it requires each sample to be tested at a range of concentrations for each drug. Using 96-well broth microdilution plates, with each well containing a lyophilised pre-determined amount of an antibiotic, is a convenient and cost-effective way to measure the MICs of several drugs at once for a clinical sample. Although accurate, this is still an expensive and slow process that requires highly skilled and experienced laboratory scientists. Here we show that, through the BashTheBug project hosted on the Zooniverse citizen science platform, a crowd of volunteers can reproducibly and accurately determine the MICs for 13 drugs and that simply taking the median or mode of 11–17 independent classifications is sufficient. There is therefore a potential role for crowds to support (but not supplant) the role of experts in antibiotic susceptibility testing.
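The consensus rule itself is simple to illustrate. The sketch below takes the median and mode of a set of made-up volunteer readings for one drug on one sample; the values are placeholders, not BashTheBug data.

```python
# Sketch of the consensus rule described above: take the median (or mode) of
# the independent volunteer classifications for a drug. Values are made up.
import numpy as np
from scipy import stats

# MIC readings (mg/L) from 13 volunteers for one drug on one sample.
classifications = np.array([0.25, 0.25, 0.5, 0.25, 0.25, 0.5, 0.25,
                            0.25, 0.12, 0.25, 0.25, 0.5, 0.25])

consensus_median = np.median(classifications)
consensus_mode = stats.mode(classifications, keepdims=False).mode
print(consensus_median, consensus_mode)
```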
Quantitative measurement of antibiotic resistance in Mycobacterium tuberculosis reveals genetic determinants of resistance and susceptibility in a target gene approach
The World Health Organization has a goal of universal drug susceptibility testing for
patients with tuberculosis; however, molecular diagnostics to date have focused largely
on first-line drugs and predicting binary susceptibilities. We used a multivariable linear
mixed model alongside whole genome sequencing and a quantitative microtiter plate
assay to relate genomic mutations to minimum inhibitory concentration (MIC) in 15,211
Mycobacterium tuberculosis patient isolates from 23 countries across five continents.
This identified 492 unique MIC-elevating variants across thirteen drugs, as well as 91
mutations resulting in hypersensitivity. Our results advance genetics-based diagnostics
for tuberculosis and serve as a curated training/testing dataset for development of drug
resistance prediction algorithms.
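A toy sketch of this kind of model is shown below, regressing log2(MIC) on a mutation indicator with a random intercept per group using statsmodels. The data frame, the grouping factor, and the single-mutation formula are invented for illustration and are far simpler than the paper's multivariable model.

```python
# Toy sketch of the kind of mixed model used: log2(MIC) regressed on mutation
# indicators with a random intercept per group (e.g. sampling site or lineage).
# The data, grouping variable, and effect size are invented for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "log2_mic": rng.normal(0, 1, n),
    "rpoB_S450L": rng.integers(0, 2, n),    # example resistance-associated mutation
    "site": rng.choice(["lab_A", "lab_B", "lab_C"], n),
})
df["log2_mic"] += 3.0 * df["rpoB_S450L"]    # simulate an MIC-elevating effect

model = smf.mixedlm("log2_mic ~ rpoB_S450L", df, groups=df["site"]).fit()
print(model.summary())
```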
Multiview classification and dimensionality reduction of scalp and intracranial EEG data through tensor factorisation
Electroencephalography (EEG) signals arise as a mixture of various neural processes that occur in different spatial, frequency and temporal locations. In classification paradigms, algorithms are developed that can distinguish between these processes. In this work, we apply tensor factorisation to a set of EEG data from a group of epileptic patients and factorise the data into three modes: space, time and frequency, with each mode containing a number of components or signatures. We train separate classifiers on various feature sets corresponding to complementary combinations of those modes and components and test the classification accuracy of each set. The relative influence on the classification accuracy of the respective spatial, temporal or frequency signatures can then be analysed and useful interpretations can be made. Additionally, we show that through tensor factorisation we can perform dimensionality reduction by evaluating the classification performance with regard to the number of mode components and by rejecting components with an insignificant contribution to the classification accuracy.
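A minimal sketch of the factorisation step is shown below, using a CP (PARAFAC) decomposition from TensorLy on a random stand-in for a space x frequency x time tensor; the rank and tensor construction are assumptions for illustration only.

```python
# Sketch of the factorisation step: decompose a space x frequency x time EEG
# tensor into a small number of rank-1 components whose factor vectors act as
# spatial, spectral and temporal signatures. The tensor here is random; in
# practice it would hold, e.g., time-frequency power for each channel.
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

rng = np.random.default_rng(0)
eeg_tensor = tl.tensor(rng.random((32, 40, 200)))   # channels x frequencies x samples

rank = 5                                            # number of components
weights, (spatial, spectral, temporal) = parafac(eeg_tensor, rank=rank)

print(spatial.shape, spectral.shape, temporal.shape)  # (32, 5) (40, 5) (200, 5)
# Feature sets for the classifiers can then be built from complementary
# combinations of these mode signatures, and components contributing little to
# the classification accuracy can be dropped (dimensionality reduction).
```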