748 research outputs found

    Genogroup IV and VI canine noroviruses interact with histo-blood group antigens.

    Human noroviruses (HuNV) are a significant cause of viral gastroenteritis in humans worldwide. HuNV attaches to cell surface carbohydrate structures known as histo-blood group antigens (HBGAs) prior to internalization, and HBGA polymorphism among human populations is closely linked to susceptibility to HuNV. Noroviruses are divided into 6 genogroups, with human strains grouped into genogroups I (GI), II, and IV. Canine norovirus (CNV) is a recently discovered pathogen in dogs, with strains classified into genogroups IV and VI. Whereas it is known that GI to GIII noroviruses bind to HBGAs and GV noroviruses recognize terminal sialic acid residues, the attachment factors for GIV and GVI noroviruses have not been reported. This study sought to determine the carbohydrate binding specificity of CNV and to compare it to the binding specificities of noroviruses from other genogroups. A panel of synthetic oligosaccharides was used to assess the binding specificity of CNV virus-like particles (VLPs), and it identified α1,2-fucose as a key attachment factor. CNV VLP binding to canine saliva and tissue samples using enzyme-linked immunosorbent assays (ELISAs) and immunohistochemistry confirmed that α1,2-fucose-containing H and A antigens of the HBGA family were recognized by CNV. Phenotyping studies demonstrated expression of these antigens in a population of dogs. The virus-ligand interaction was further characterized using blockade studies, cell lines expressing HBGAs, and enzymatic removal of candidate carbohydrates from tissue sections. Recognition of HBGAs by CNV provides new insights into the evolution of noroviruses and raises concerns regarding the potential for zoonotic transmission of CNV to humans. IMPORTANCE: Infections with human norovirus cause acute gastroenteritis in millions of people each year worldwide. Noroviruses can also affect nonhuman species and are divided into 6 different groups based on their capsid sequences. 
Human noroviruses in genogroups I and II interact with histo-blood group antigen carbohydrates, bovine noroviruses (genogroup III) interact with alpha-galactosidase (α-Gal) carbohydrates, and murine norovirus (genogroup V) recognizes sialic acids. The canine-specific strains of norovirus are grouped into genogroups IV and VI, and this study is the first to characterize which carbohydrate structures they can recognize. Using canine norovirus virus-like particles, this work shows that representative genogroup IV and VI viruses can interact with histo-blood group antigens. The binding specificity of canine noroviruses is therefore very similar to that of the human norovirus strains classified into genogroups I and II. This raises interesting questions about the evolution of noroviruses and suggests it may be possible for canine norovirus to infect humans. The authors would like to thank Wood Green Animal Shelter for allowing SC to collect canine saliva samples, and Dr. Nathalie Ruvoën-Clouet and Béatrice Vaidye for the preparation of the anti-CNV antibodies. The authors also thank Dr. Takane Katayama (Ishikawa Prefectural University, Nonoichi, Ishikawa, Japan) for his generous gift of 1,2fucosidase and the Cellular and Tissular Imaging core facility of the Nantes University (MicroPiCell). This collaborative project was greatly facilitated by the Society of Microbiology’s President’s Fund awarded to SC and by the Region des Pays de la Loire ARMINA project. This work was supported by a PhD studentship from the Medical Research Council to SC and a Wellcome Trust Senior Fellowship to IG (Ref: WT097997MA). IG is a Wellcome Senior Fellow. This is the final published version. It's also available from ASM at http://jvi.asm.org/content/88/18/10377.long

    Carcinoma-associated fucosylated antigens are markers of the epithelial state and can contribute to cell adhesion through CLEC17A (Prolectin)

    Terminal fucosylated motifs of glycoproteins and glycolipid chains are often altered in cancer cells. We investigated the link between fucosylation changes and critical steps in cancer progression: epithelial-to-mesenchymal transition (EMT) and lymph node metastasis. Using mammary cell lines, we demonstrate that during EMT, expression of some fucosylated antigens (e.g., Lewis Y) is decreased as a result of repression of the fucosyltransferase genes FUT1 and FUT3. Moreover, we identify the fucose-binding bacterial lectin BC2L-C-Nt as a specific probe for the epithelial state. Prolectin (CLEC17A), a human lectin found on lymph node B cells, shares ligand specificities with BC2L-C-Nt. It binds preferentially to epithelial rather than to mesenchymal cells, and microfluidic experiments showed that prolectin behaves as a cell adhesion molecule for epithelial cells. Comparison of paired primary tumors/lymph node metastases revealed an increase of prolectin staining in metastases, and high FUT1 and FUT3 mRNA expression was associated with poor prognosis. Our data suggest that tumor cells invading the lymph nodes and expressing fucosylated motifs associated with the epithelial state could use prolectin as a colonization factor.

    What is the correct cost functional for variational data assimilation?

    Variational approaches to data assimilation, and weakly constrained four-dimensional variation (WC-4DVar) in particular, are important in the geosciences but also in other communities (often under different names). The cost functions and the resulting optimal trajectories may have a probabilistic interpretation, for instance by linking data assimilation with maximum a posteriori (MAP) estimation. This is possible in particular if the unknown trajectory is modelled as the solution of a stochastic differential equation (SDE), as is increasingly the case in weather forecasting and climate modelling. In this situation, the MAP estimator (or “most probable path” of the SDE) is obtained by minimising the Onsager–Machlup functional. Although this fact is well known, there seems to be some confusion in the literature, with the energy (or “least squares”) functional sometimes being claimed to yield the most probable path. The first aim of this paper is to address this confusion and show that the energy functional does not, in general, provide the most probable path. The second aim is to discuss the implications in practice. Although the mentioned results pertain to stochastic models in continuous time, they do have consequences in practice, where SDEs are approximated by discrete-time schemes. It turns out that using an approximation to the SDE and calculating its most probable path does not necessarily yield a good approximation to the most probable path of the SDE proper. This suggests that even in discrete time, a version of the Onsager–Machlup functional should be used, rather than the energy functional, at least if the solution is to be interpreted as a MAP estimator.
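To make the distinction between the two functionals concrete, the following is a sketch in notation that is assumed here and not taken from the abstract: an SDE with additive noise, $dX_t = f(X_t)\,dt + \sigma\,dW_t$ on $[0, T]$, with constant $\sigma > 0$ and the observation term omitted. In that setting the standard forms of the two functionals differ by a divergence term:

```latex
\[
  E[x] \;=\; \frac{1}{2\sigma^{2}} \int_{0}^{T}
      \bigl\lvert \dot{x}(t) - f\bigl(x(t)\bigr) \bigr\rvert^{2} \, dt
  \qquad \text{(energy, or ``least squares'', functional)}
\]
\[
  I_{\mathrm{OM}}[x] \;=\; E[x] \;+\; \frac{1}{2} \int_{0}^{T}
      \nabla \cdot f\bigl(x(t)\bigr) \, dt
  \qquad \text{(Onsager--Machlup functional)}
\]
```

Under these assumptions the two minimisers coincide only when $\nabla \cdot f$ is constant along paths, for example when the drift $f$ is linear, which is consistent with the abstract's statement that the energy functional does not in general give the most probable path.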

    On the combination of omics data for prediction of binary outcomes

    Enrichment of predictive models with new biomolecular markers is an important task in high-dimensional omic applications. Increasingly, clinical studies include several sets of such omics markers available for each patient, measuring different levels of biological variation. As a result, one of the main challenges in predictive research is the integration of different sources of omic biomarkers for the prediction of health traits. We review several approaches for the combination of omic markers in the context of binary outcome prediction, all based on double cross-validation and regularized regression models. We evaluate their performance in terms of calibration and discrimination and we compare their performance with respect to single-omic source predictions. We illustrate the methods through the analysis of two real datasets. On the one hand, we consider the combination of two fractions of proteomic mass spectrometry for the calibration of a diagnostic rule for the detection of early-stage breast cancer. On the other hand, we consider transcriptomics and metabolomics as predictors of obesity using data from the Dietary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome (DILGOM) study, a population-based cohort, from Finland

    An Introspective Comparison of Random Forest-Based Classifiers for the Analysis of Cluster-Correlated Data by Way of RF++

    Many mass spectrometry-based studies, as well as other biological experiments, produce cluster-correlated data. Failure to account for correlation among observations may result in a classification algorithm overfitting the training data and producing overoptimistic estimated error rates, and may make subsequent classifications unreliable. Current common practice for dealing with replicated data is to average each subject's replicate sample set, reducing the dataset size and incurring loss of information. In this manuscript we compare three approaches to dealing with cluster-correlated data: unmodified Breiman's Random Forest (URF), forest grown using subject-level averages (SLA), and RF++ with subject-level bootstrapping (SLB). RF++, a novel Random Forest-based algorithm implemented in C++, handles cluster-correlated data through a modification of the original resampling algorithm and accommodates subject-level classification. Subject-level bootstrapping is an alternative sampling method that obviates the need to average or otherwise reduce each set of replicates to a single independent sample. Our experiments show nearly identical median classification and variable selection accuracy for SLB forests and URF forests when applied to both simulated and real datasets. However, the run-time estimated error rate was severely underestimated for URF forests. Predictably, SLA forests were found to be more severely affected by the reduction in sample size, which led to poorer classification and variable selection accuracy. Perhaps most importantly, our results suggest that it is reasonable to utilize URF for the analysis of cluster-correlated data. Two caveats should be noted: first, correct classification error rates must be obtained using a separate test dataset, and second, an additional post-processing step is required to obtain subject-level classifications. RF++ is shown to be an effective alternative for classifying both clustered and non-clustered data. 
Source code and stand-alone compiled versions of command-line and easy-to-use graphical user interface (GUI) versions of RF++ for Windows and Linux as well as a user manual (Supplementary File S2) are available for download at: http://sourceforge.org/projects/rfpp/ under the GNU public license
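The idea of subject-level bootstrapping described in this abstract can be sketched as follows. This is an illustrative Python sketch, not the RF++ implementation (which is in C++): the function name `subject_level_bootstrap` and its return format are assumptions made here. The point it shows is that subjects, not individual replicate rows, are resampled, so all replicates of a subject land together either in the bag or out of it.

```python
import random
from collections import defaultdict


def subject_level_bootstrap(subject_ids, rng):
    """Draw one bootstrap sample by resampling subjects, not rows.

    subject_ids: one subject label per row (replicates share a label).
    rng: a random.Random instance, passed in for reproducibility.
    Returns (in_bag_rows, out_of_bag_rows) as lists of row indices.
    """
    # Group row indices by the subject they belong to.
    rows_by_subject = defaultdict(list)
    for row, sid in enumerate(subject_ids):
        rows_by_subject[sid].append(row)

    subjects = sorted(rows_by_subject)
    # Sample as many subjects as exist, with replacement.
    drawn = [rng.choice(subjects) for _ in subjects]

    # Every replicate of a drawn subject enters the bag (repeated if the
    # subject was drawn more than once).
    in_bag = [row for sid in drawn for row in rows_by_subject[sid]]

    # Subjects never drawn are out-of-bag with all of their replicates,
    # so no subject contributes to both sets.
    drawn_set = set(drawn)
    out_of_bag = [
        row
        for sid in subjects
        if sid not in drawn_set
        for row in rows_by_subject[sid]
    ]
    return in_bag, out_of_bag
```

Because replicates of one subject are never split between the in-bag and out-of-bag sets, an out-of-bag error estimate built on such samples is not inflated by within-subject correlation, which is the failure mode the abstract reports for the unmodified forest.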

    Factors influencing Manx Shearwater grounding on the west coast of Scotland

    Grounding of thousands of newly fledged petrels and shearwaters (family Procellariidae) in built‐up areas due to artificial light is a global problem. Due to their anatomy these grounded birds find it difficult to take off from built‐up areas and many fall victim to predation, cars, dehydration or starvation. This research investigated a combination of several factors that may influence the number of Manx Shearwater Puffinus puffinus groundings in a coastal village of Scotland located close to a nesting site for this species. A model was developed that used meteorological variables and moon cycle to predict the daily quantity of birds that were recovered on the ground. The model, explaining 46.32% of the variance of the data, revealed how new moon and strong onshore winds influence grounding. To a lesser extent, visibility conditions can also have an effect on grounding probabilities. The analysis presented in this study can improve rescue campaigns of not only Manx Shearwaters but also other species attracted to the light pollution by predicting conditions leading to an increase in the number of groundings. It could also inform local authorities when artificial light intensity needs to be reduced

    A Comparison of Machine Learning Methods for Cross-Domain Few-Shot Learning

    We present an empirical evaluation of machine learning algorithms in cross-domain few-shot learning based on a fixed pre-trained feature extractor. Experiments were performed in five target domains (CropDisease, EuroSAT, Food101, ISIC and ChestX) and using two feature extractors: a ResNet10 model trained on a subset of ImageNet known as miniImageNet and a ResNet152 model trained on the ILSVRC 2012 subset of ImageNet. Commonly used machine learning algorithms including logistic regression, support vector machines, random forests, nearest neighbour classification, naïve Bayes, and linear and quadratic discriminant analysis were evaluated on the extracted feature vectors. We also evaluated classification accuracy when subjecting the feature vectors to normalisation using p-norms. Algorithms originally developed for the classification of gene expression data—the nearest shrunken centroid algorithm and LDA ensembles obtained with random projections—were also included in the experiments, in addition to a cosine similarity classifier that has recently proved popular in few-shot learning. The results enable us to identify algorithms, normalisation methods and pre-trained feature extractors that perform well in cross-domain few-shot learning. We show that the cosine similarity classifier and ℓ² -regularised 1-vs-rest logistic regression are generally the best-performing algorithms. We also show that algorithms such as LDA yield consistently higher accuracy when applied to ℓ² -normalised feature vectors. In addition, all classifiers generally perform better when extracting feature vectors using the ResNet152 model instead of the ResNet10 model
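As an illustration of classifying ℓ²-normalised feature vectors by cosine similarity, here is a minimal Python sketch. It is a simplified nearest-prototype variant assumed for illustration, not necessarily the classifier evaluated in the paper, and the function names (`l2_normalise`, `fit_prototypes`, `predict`) are hypothetical. Feature vectors are taken to be plain lists of floats already produced by a fixed feature extractor.

```python
import math


def l2_normalise(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fit_prototypes(features, labels):
    """Build one unit-length prototype per class from the few labelled examples."""
    sums = {}
    for v, y in zip(features, labels):
        u = l2_normalise(v)
        if y not in sums:
            sums[y] = [0.0] * len(u)
        sums[y] = [a + b for a, b in zip(sums[y], u)]
    # Normalising the summed vectors gives the direction of the class mean.
    return {y: l2_normalise(s) for y, s in sums.items()}


def predict(prototypes, v):
    """Return the class whose prototype has the highest cosine similarity to v."""
    u = l2_normalise(v)
    # Both vectors have unit length, so the dot product equals the cosine.
    return max(
        prototypes,
        key=lambda y: sum(a * b for a, b in zip(prototypes[y], u)),
    )
```

For example, with two classes whose support vectors point roughly along different axes, `predict` assigns a query to the class whose prototype direction it is closest to, regardless of the query's magnitude.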

    A Machine Learning Trainable Model to Assess the Accuracy of Probabilistic Record Linkage

    Record linkage (RL) is the process of identifying and linking data that relate to the same physical entity across multiple heterogeneous data sources. Deterministic linkage methods rely on the presence of common uniquely identifying attributes across all sources, while probabilistic approaches use non-unique attributes and calculate similarity indexes for pairwise comparisons. A key component of record linkage is accuracy assessment: the process of manually verifying and validating matched pairs to further refine linkage parameters and increase its overall effectiveness. This process, however, is time-consuming and impractical when applied to large administrative data sources where millions of records must be linked. Additionally, it is potentially biased, as the gold standard used is often the reviewer’s intuition. In this paper, we present an approach for assessing and refining the accuracy of probabilistic linkage based on different supervised machine learning methods (decision trees, naïve Bayes, logistic regression, random forest, linear support vector machines and gradient boosted trees). We used data sets extracted from huge Brazilian socioeconomic and public health care data sources. These models were evaluated using receiver operating characteristic plots, sensitivity, specificity and positive predictive values collected from a 10-fold cross-validation method. Results show that logistic regression outperforms other classifiers and enables the creation of a generalized, very accurate model to validate linkage results.

    Goodness-of-fit testing in high dimensional generalized linear models

    We propose a family of tests to assess the goodness-of-fit of a high-dimensional generalized linear model. Our framework is flexible and may be used to construct an omnibus test, tests directed against specific non-linearities and interaction effects, or tests of the significance of groups of variables. The methodology is based on extracting left-over signal in the residuals from an initial fit of a generalized linear model. This can be achieved by predicting this signal from the residuals using modern flexible regression or machine learning methods such as random forests or boosted trees. Under the null hypothesis that the generalized linear model is correct, no signal is left in the residuals and our test statistic has a Gaussian limiting distribution, translating to asymptotic control of type I error. Under a local alternative, we establish a guarantee on the power of the test. We illustrate the effectiveness of the methodology on simulated and real data examples by testing goodness-of-fit in logistic regression models. Software implementing the methodology is available in the R package 'GRPtests'.

    A data mining approach in home healthcare: outcomes and service use

    BACKGROUND: The purpose of this research is to understand the performance of home healthcare practice in the US. The relationships between home healthcare patient factors and agency characteristics are not well understood. In particular, discharge destination and length of stay have not been studied using a data mining approach which may provide insights not obtained through traditional statistical analyses. METHODS: The data were obtained from the 2000 National Home and Hospice Care Survey data for three specific conditions (chronic obstructive pulmonary disease, heart failure and hip replacement), representing nearly 580 patients from across the US. The data mining approach used was CART (Classification and Regression Trees). Our aim was twofold: 1) determining the drivers of home healthcare service outcomes (discharge destination and length of stay) and 2) examining the applicability of induction through data mining to home healthcare data. RESULTS: Patient age (85 and older) was a driving force in discharge destination and length of stay for all three conditions. There were also impacts from the type of agency, type of payment, and ethnicity. CONCLUSION: Patients over 85 years of age experience differential outcomes depending on the condition. There are also differential effects related to agency type by condition although length of stay was generally lower for hospital-based agencies. The CART procedure was sufficiently accurate in correctly classifying patients in all three conditions which suggests continuing utility in home health care