21 research outputs found

    Algorithms for integrated analysis of glycomics and glycoproteomics by LC-MS/MS

    The glycoproteome is an intricate and diverse component of a cell, and it plays a key role in the definition of the interface between that cell and the rest of its world. Methods for studying the glycoproteome have been developed for released glycan glycomics and site-localized bottom-up glycoproteomics using liquid chromatography-coupled mass spectrometry and tandem mass spectrometry (LC-MS/MS), which is itself a complex problem. Algorithms for interpreting these data are necessary to be able to extract biologically meaningful information in a high throughput, automated context. Several existing solutions have been proposed but may be found lacking for larger glycopeptides, for complex samples, different experimental conditions, different instrument vendors, or even because they simply ignore fundamentals of glycobiology. I present a series of open algorithms that approach the problem from an instrument vendor neutral, cross-platform fashion to address these challenges, and integrate key concepts from the underlying biochemical context into the interpretation process. In this work, I created a suite of deisotoping and charge state deconvolution algorithms for processing raw mass spectra at an LC scale from a variety of instrument types. These tools performed better than previously published algorithms by enforcing the underlying chemical model more strictly, while maintaining a higher degree of signal fidelity. From this summarized, vendor-normalized data, I composed a set of algorithms for interpreting glycan profiling experiments that can be used to quantify glycan expression. From this I constructed a graphical method to model the active biosynthetic pathways of the sample glycome and dig deeper into those signals than would be possible from the raw data alone. 
Lastly, I created a glycopeptide database search engine from these components that is capable of identifying the widest array of glycosylation types available, and I demonstrate a learning algorithm that tunes the model to the process of glycopeptide fragmentation under specific experimental conditions, outperforming a simpler model by between 10% and 15%. This approach can be further augmented with sample-wide or site-specific glycome models to increase depth of coverage for glycoforms consistent with prior beliefs.
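As a rough illustration of what charge state deconvolution involves, the sketch below infers a charge from the spacing of two adjacent isotopic peaks and converts an m/z value to a neutral mass. It is not the algorithm from the thesis; the function names, the tolerance, and the example peaks are assumptions made for illustration, and it assumes protonated positive-mode ions.

```python
# Minimal sketch, NOT the thesis's deconvolution algorithm.
# All names, the tolerance, and the example peaks are hypothetical.

PROTON_MASS = 1.007276466    # Da, mass of a proton
ISOTOPE_SPACING = 1.0033548  # Da, mass difference between 13C and 12C


def infer_charge(mz_peaks, max_charge=8, tol=0.01):
    """Guess the charge from the spacing of the first two isotopic peaks.

    Adjacent isotopic peaks of an ion with charge z are about
    ISOTOPE_SPACING / z apart on the m/z axis.
    """
    spacing = mz_peaks[1] - mz_peaks[0]
    if spacing <= 0:
        return None
    z = round(ISOTOPE_SPACING / spacing)
    if 1 <= z <= max_charge and abs(spacing - ISOTOPE_SPACING / z) <= tol:
        return z
    return None


def neutral_mass(mz, z):
    """Neutral monoisotopic mass of a protonated ion observed at m/z with charge z."""
    return z * (mz - PROTON_MASS)


# Hypothetical isotopic envelope of a doubly charged ion.
peaks = [1000.5, 1001.0017, 1001.5033]
charge = infer_charge(peaks)
mass = neutral_mass(peaks[0], charge)
```

A real deisotoping tool would also compare observed peak intensities against a theoretical isotopic pattern, which this sketch leaves out.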

    Integrative methods for analyzing big data in precision medicine

    We provide an overview of recent developments in big data analyses in the context of precision medicine and health informatics. With advances in technologies for capturing molecular and medical data, we have entered the era of “Big Data” in biology and medicine. These data offer many opportunities to advance precision medicine. We outline key challenges in precision medicine and present recent advances in data integration-based methods to uncover personalized information from big data produced by various omics studies. We survey recent integrative methods for disease subtyping, biomarker discovery, and drug repurposing, and list the tools that are available to domain scientists. Given the ever-growing nature of these big data, we highlight key issues that big data integration methods will face.

    An Imaging Mass Spectrometry Investigation Into the N-linked Glycosylation Landscape of Pancreatic Ductal Adenocarcinoma and the Development of Associated Tools for Enhanced Glycan Separation and Characterization

    The severity of pancreatic ductal adenocarcinoma (PDAC) is largely attributed to a failure to detect the disease before metastatic spread has occurred. CA19-9, a carbohydrate biomarker, is used clinically to monitor disease progression, but due to specificity challenges it is not suitable for early detection. As CA19-9 and other prospective markers are glycan epitopes, there is great clinical interest in understanding the glycobiology of pancreatic cancer. Unfortunately, few studies have been able to link glycosylation changes directly to pancreatic tumors and instead have focused on peripheral glycan alterations in the serum of PDAC patients. To address this gap in our understanding, we applied an imaging mass spectrometry (IMS) approach with complementary enzymatic and chemical isomer separation techniques to spatially assess the PDAC N-glycome in a cohort of pancreatic cancer patients. Orthogonally, we characterized the expression of CA19-9 and a new biomarker, sTRA, by multi-round immunofluorescence (IF) in the same cohort. These analyses revealed increased sialylation, fucosylation and branching, amongst other structural themes, in areas of PDAC tumor tissue. CA19-9-expressing tumors were defined by multiply branched, fucosylated bisecting N-glycans, while sTRA-expressing tumors favored tetraantennary N-glycans with polylactosamine extensions. IMS- and IF-derived glycan and biomarker features were used to build classification models that detected PDAC tissue with an AUC of 0.939, outperforming models using either dataset individually. While studying sialylation isomers in our PDAC cohort, we saw an opportunity to enhance the chemical derivatization protocol we were using to address its shortcomings and expand its functionality. Subsequently, we developed a set of novel amidation-amidation strategies to stabilize and differentially label 2,3- and 2,6-linked sialic acids.
In our alkyne-based approach, the differential mass shifts induced by the reactions allow for isomeric discrimination in imaging mass spectrometry experiments. This scheme, termed AAXL, was further characterized in clinical tissue specimens, biofluids and cultured cells. Our azide-based approach, termed AAN3, was more suitable for bioorthogonal applications, where the azide tag installed on 2,3- and 2,8-linked sialic acids could be reacted by click chemistry with a biotin-alkyne for subsequent streptavidin-peroxidase staining. Furthering the use of AAN3, we developed two additional techniques to fluorescently label (SAFER) and preferentially enrich (SABER) 2,3- and 2,8-linked sialic acids for more advanced glycomic applications. Initial experiments with these novel approaches have shown successful fluorescent staining and the identification of over 100 sialylated glycoproteins by LC-MS/MS. These four bioorthogonal strategies provide a new glycomic tool set for the characterization of sialic acid isomers in pancreatic and other cancers. Overall, this work furthers our collective understanding of the glycobiology underpinning pancreatic cancer and potentiates the discovery of novel carbohydrate biomarkers for the early detection of PDAC.

    Inflammatory response elements, glycan, and proteome profiles as salivary biomarkers for the early diagnosis of OSCC

    Oral cancer, above all oral squamous cell carcinoma (OSCC), is the most common malignant neoplasm of the oral cavity and is characterized by poor prognosis and a low survival rate when diagnosed at advanced stages. It can be treated effectively if detected at early stages, which improves prognosis. Therefore, after prevention, early diagnosis has become the main goal in the management of oral cancer, given its positive impact on the survival and quality of life of these patients. The few initial symptoms often go unnoticed and the disease is discovered at advanced stages, so diagnosis needs to improve. Currently, there are no biomarkers for OSCC with sufficient sensitivity and specificity for routine clinical use, which emphasizes the need for a practical and simple tool for both diagnostic purposes and early-detection or screening programs. OSCC tumors arise through a series of molecular mutations that lead to uncontrolled cell growth, passing through distinct phases: areas of hyperplasia, dysplastic lesions, carcinoma in situ and, finally, invasive carcinoma. In addition, the development of many OSCC cases has been correlated with the cancerous transformation of potentially malignant oral lesions such as oral leukoplakia. The latter presents heterogeneous subtypes with different potentials for transformation, among which proliferative verrucous leukoplakia (PVL) stands out with a high risk of malignant conversion. Saliva-derived biomarkers offer easy sampling with minimal risk for patients, cost and diagnosis time, and cover a variety of detectable and measurable parameters that can differentiate health from disease.
Objectives: The general objective is to analyze a panel of salivary inflammatory cytokines, as well as salivary protein and glycan profiles, in order to identify potential biomarkers for the early diagnosis of OSCC. Methodology: Combined omics strategies were used to profile different types of molecules with potential diagnostic utility in the saliva of patients with PVL, patients with early and advanced stages of OSCC, and healthy individuals. A multiplexed immunoassay was used to examine the levels of eight cytokines. Saliva-derived protein and N-glycan profiles were analyzed by liquid chromatography (LC) coupled to mass spectrometry (MS). Results and Conclusions: The significantly altered levels of six cytokines associated with the pathology groups suggest their possible involvement in the process of oral carcinogenesis. Comparative N-glycan profiling revealed several fucosylated bi- and tri-antennary glycans differentially expressed among the studied groups, providing a reasonable platform to further investigate the utility of salivary glycosylation for the diagnosis of OSCC. More than 600 proteins were quantified in the saliva of healthy individuals and individuals with pre/cancerous lesions using the LC-MS-generated protein profiles. The comparative analysis resulted in a list of differentially expressed proteins, characterized in OSCC, that indicates significantly altered mechanisms such as the immune system, inhibition of enzymatic activities, and cell adhesion. Among the candidate markers identified, some had been previously described in saliva. In addition, several new proteins associated with OSCC were annotated.
Oral squamous cell carcinoma (OSCC) is the most common neoplasia in the oral cavity, characterized by poor prognosis and a low survival rate when diagnosed at advanced phases.
Many OSCC cases have been correlated with the cancerous transformation of oral potentially malignant disorders (OPMDs) such as oral leukoplakia. The latter exhibits heterogeneous subtypes, among which proliferative verrucous leukoplakia (PVL) stands out with a high risk of malignant conversion. Reliable biomarkers for OSCC are not yet available in the routine clinical setting, emphasizing the need for a practical and simple tool to be used for definitive diagnosis and screening programs. Saliva-derived biomarkers offer easy sampling with reduced risk for the patients, cost and diagnosis time, and a variety of measurable parameters that can discriminate health from disease. To identify novel salivary biomarkers of OSCC, we herein combined multi-omics analytical strategies to profile different types of molecules with potential diagnostic utility. As inflammation has previously been linked to oral pathologies, research so far indicates the possibility of using salivary cytokines for the screening of oral disorders. Altered cytokine responsiveness has been tightly associated with the development of OSCC, and has also been detected in patients with OPMDs. To reveal changes related to oral carcinogenesis, a multiplex immune bead-based assay was used to survey the levels of 8 cytokines in the saliva of patients with PVL, patients at early and advanced OSCC stages, and their healthy counterparts. Results indicated altered cytokine activity associated with the pathology groups, and the corresponding predictive models showed high ROC AUC values. It is now known that defective glycosylation accompanies many chronic and infectious conditions and is a common feature of tumor cells that may affect N-glycans on glycoproteins. To compile a list of candidate biomarkers, another type of molecule has been studied.
N-glycans released from saliva proteins of healthy volunteers, PVL patients, and OSCC patients were analysed by means of liquid chromatography (LC) coupled to mass spectrometry (MS). Comparative N-glycome profiling disclosed differentially expressed fucosylated bi- and tri-antennary glycans among the studied groups, providing a reasonable platform to further investigate the utility of salivary glycosylation for diagnosis of OSCC. Lastly, to gain insight into the complex molecular alterations associated with cancer, LC-MS-generated proteomic profiles revealed more than 600 quantified proteins in the saliva of healthy individuals and patients with pre/cancerous lesions. The comparative analysis outlined a list of differentially expressed proteins, characterized in OSCC, indicating altered mechanisms implicating the immune system, inhibition of enzymatic activities, and cell adhesion. In this discovery phase of biomarker identification, we provided a molecular panel of potential indicators of malignant transformation and/or early OSCC diagnosis.

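    The influence of genetic and environmental factors on N-glycosylation of immunoglobulin G and total plasma proteins determined by a twin study (Utjecaj genskih i okolišnih čimbenika na N-glikozilaciju imunoglobulina G i ukupnih plazmatskih proteina određen studijom na blizancima)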

    Glycans constitute the most abundant and diverse form of post-translational modification. While genes unequivocally determine the structure of each polypeptide, there is no genetic template for the glycan part. Instead, hundreds of genes and their products interact in the very complex pathway of glycan biosynthesis, which is further complicated by environmental influences. Therefore, the aim of this thesis was to determine the extent to which individual differences in immunoglobulin G and total plasma protein glycosylation patterns reflect genetic versus environmental influences. A twin study design was used, and study subjects were twins enrolled in the TwinsUK registry, a national register of adult twins. More than 4500 samples were analyzed by HILIC-UPLC (Hydrophilic Interaction Ultra Performance Liquid Chromatography). A high contribution of the genetic component to N-glycome composition was found. Variation in levels of 51 of the 76 IgG glycan traits studied was at least 50% heritable, and only a small proportion of N-glycan traits had a low genetic contribution. Heritability of the plasma N-glycome was also high, with half of the plasma glycan traits being at least 50% heritable. Further, epigenome-wide association (EWA) analysis showed that methylation levels at some genes are also implicated in glycome composition, both for glycans with high heritability and for those with a lower genetic contribution. A study investigating the potential role of glycosylation in kidney function was also conducted. Fourteen IgG glycan traits were associated with renal function in the discovery population and remained significant after validation in an independent subset of monozygotic twins discordant for renal disease, reflecting differences in galactosylation, sialylation, and level of bisecting N-acetylglucosamine. Using weighted correlation network analysis (WGCNA) for IgG glycan traits, a correlation between low back pain (LBP) and glycan modules was established.
There was a weak positive correlation between pain phenotypes and the "pro-antibody-dependent cell-mediated cytotoxicity (ADCC)" WGCNA glycan module (high bisecting N-acetylglucosamine and low core fucose) and a weak negative correlation between pain phenotypes and the "anti-ADCC" module (high core fucose, no bisecting N-acetylglucosamine). This suggests that glycans are promising candidates for biomarkers in many different diseases.
Glycans represent the most abundant and most diverse form of post-translational modification. While genes unambiguously determine the structure of every polypeptide, there is no genetic template for glycan synthesis. Instead, hundreds of genes and their products take part in the very complex biosynthesis of glycans, which environmental influences make even more complex. The aim of this doctoral thesis was therefore to determine the extent to which genetic and environmental factors influence the N-glycans of immunoglobulin G (IgG) and of total plasma glycoproteins. To achieve this aim, a twin study was used. Subjects were recruited through the TwinsUK registry, the largest twin registry in the United Kingdom. More than 4500 samples were analyzed by the HILIC-UPLC method (Hydrophilic Interaction Ultra Performance Liquid Chromatography). A large genetic component (heritability ≥ 50%) was shown for 51 of the 76 IgG glycan traits. In contrast, only 12 IgG glycan traits showed a small genetic component. The heritability of the plasma N-glycome also proved to be high. Half of the plasma glycan traits were at least 50% heritable. Epigenome-wide association analysis showed that methylation levels at some genes also affect glycome composition, for both high-heritability and low-heritability glycans. The first investigation of the potential role of IgG glycosylation in kidney function was also carried out.
In the first analysis, a significant association with renal function was found for 14 glycan traits, and it remained significant after validation in an independent subset of monozygotic twins discordant for renal function. These glycan traits belong to three main glycosylation features of IgG: galactosylation, sialylation, and the level of bisecting N-acetylglucosamine. Using the WGCNA (weighted correlation network analysis) methodology, a network analysis of IgG glycan levels in twins was carried out to establish clusters of correlated glycans. Associations were found between these clusters and pain phenotypes in twins with LBP. A positive correlation was observed between pain phenotypes and the "pro-antibody-dependent cell-mediated cytotoxicity (ADCC)" WGCNA glycan module (high level of bisecting N-acetylglucosamine and low level of core fucose), and a negative correlation between pain phenotypes and the "anti-ADCC" module (high level of core fucose, no bisecting N-acetylglucosamine). The results show that glycans are promising biomarkers for many diseases.
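To make the idea of a trait being "at least 50% heritable" concrete, the sketch below applies Falconer's formula, a classical simplified estimator that compares trait correlations in monozygotic (MZ) and dizygotic (DZ) twin pairs. The abstract does not say which estimator the thesis used, so this is a stand-in for illustration only; the function names and the example correlations are assumptions.

```python
# Minimal sketch of Falconer's formula for twin data.
# This is a stand-in: the abstract does not state the estimator actually used.
# Function names and example values are hypothetical.
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between a trait in twin 1 (xs) and twin 2 (ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def falconer(r_mz, r_dz):
    """Split trait variance into heritable, shared and unique parts.

    h2: additive genetic share, 2 * (r_mz - r_dz)
    c2: shared-environment share, 2 * r_dz - r_mz
    e2: unique-environment share (plus measurement error), 1 - r_mz
    """
    h2 = 2 * (r_mz - r_dz)
    c2 = 2 * r_dz - r_mz
    e2 = 1 - r_mz
    return h2, c2, e2


# Hypothetical twin-pair correlations for one glycan trait.
h2, c2, e2 = falconer(r_mz=0.8, r_dz=0.5)
```

With these made-up correlations the heritable share comes out at about 0.6, so such a trait would count as at least 50% heritable. In practice `r_mz` and `r_dz` would come from `pearson` applied to the measured trait values of each twin pair.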

    Immunoglobulin G glycosylation in patients with colorectal cancer

    INTRODUCTION: Colorectal cancer (CRC) is a malignant neoplasm of the colon and the rectum. CRC is still associated with poor prognosis, a low survival rate and usually relatively late diagnosis. MATERIALS AND METHODS: We analysed IgG glycome composition in 1229 patients with CRC and 538 matching controls. Effects of surgery were evaluated in 28 patients sampled before and three times after surgery. Furthermore, IgG glycome composition was analysed in 39 plasma samples collected before initial diagnosis of CRC. RESULTS: Clinical algorithms showed good prediction of all-cause and CRC mortality. The inclusion of IgG glycan data in regression models did not lead to any statistically significant improvements in overall prediction of survival (Harrell’s C: 0.73, 0.77; AUC: 0.75, 0.79; IDI: 0.02, 0.04, respectively). However, the inclusion of IgG glycan data substantially improved the prediction of rapid progressors over clinical models in AJCC stage 4 patients (AUC 0.53 vs. 0.75, IDI 0.21). When analysing clinical characteristics among 760 patients and 538 matching controls, it was found that CRC associates with a decrease in IgG galactosylation and IgG sialylation and an increase in core fucosylation of neutral glycans, with a concurrent decrease in core fucosylation of sialylated glycans. CONCLUSION: The glycan differences among CRC patients are consistent with significantly increased IgG pro-inflammatory activity being associated with poorer CRC prognosis, especially in late-stage CRC. Considering the functional relevance of IgG glycosylation for both tumor immunosurveillance and clinical efficacy of therapy with monoclonal antibodies, individual variation in IgG glycosylation may turn out to be important for prediction of disease course or the choice of therapy.
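For readers unfamiliar with the AUC values quoted above, the sketch below computes an AUC directly from its rank interpretation: the probability that a randomly chosen case receives a higher model score than a randomly chosen control, with ties counted as half. It says nothing about how the study fitted its models; the function name and the scores are made up for illustration.

```python
# Minimal sketch of AUC as a pairwise ranking statistic.
# The function name and the example scores are hypothetical, not study data.

def auc(case_scores, control_scores):
    """Fraction of (case, control) pairs where the case scores higher.

    Ties contribute one half, so a model with no discrimination gives 0.5.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk scores for rapid progressors (cases) and other patients.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]
example_auc = auc(cases, controls)  # 8 of the 9 pairs are ranked correctly
```

On this reading, an AUC of 0.53 is barely better than chance, while 0.75 means three out of four case-control pairs are ranked correctly.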

    Identifying the molecular components that matter: a statistical modelling approach to linking functional genomics data to cell physiology

    Functional genomics technologies, in which thousands of mRNAs, proteins, or metabolites can be measured in single experiments, have helped to reshape biological investigations. One of the most important issues in the analysis of the large datasets generated is the selection of relatively small subsets of variables that are predictive of the physiological state of a cell or tissue. In this thesis, a truly multivariate variable selection framework using diverse functional genomics data has been developed, characterized, and tested. This framework has also been used to show that it is possible to predict the physiological state of the tumour from the molecular state of adjacent normal cells. This allows us to identify novel genes involved in cell-to-cell communication. Then, using a network inference technique, networks representing cell-cell communication in prostate cancer have been inferred. The analysis of these networks has revealed interesting properties that suggest a crucial role of directional signals in controlling the interplay between normal and tumour cell-to-cell communication. Experimental verification performed in our laboratory has provided evidence that one of the identified genes could be a novel tumour suppressor gene. In conclusion, the findings and methods reported in this thesis have contributed to further understanding of cell-to-cell interaction and multivariate variable selection, not only by applying and extending previous work, but also by proposing novel approaches that can be applied to any functional genomics data.

    Discovering Biomarkers of Alzheimer's Disease by Statistical Learning Approaches

    In this work, statistical learning approaches are exploited to discover biomarkers for Alzheimer's disease (AD). Contributions have been made in both biomarker-driven and software-driven studies. Surprising discoveries were made in the search for blood-based biomarkers. With the inclusion of existing biological knowledge and a proposed novel feature selection method, several blood-based protein models were found to have promising ability to separate AD patients from healthy individuals. A new statistical pattern was discovered that could serve as a new guideline for diagnostic methodology. For brain-based biomarkers, the positive contribution of covariates such as age, gender and APOE genotype to an AD classifier was verified, and a panel of highly informative biomarkers comprising 26 RNA transcripts was discovered. The classifier trained on this panel of genes shows excellent capacity in discriminating patients from controls. Apart from biomarker-driven studies, statistical packages and applications were also developed. The R package metaUnion was designed and developed to provide an advanced meta-analytic approach applicable to microarray data. This package overcomes the defects of previous meta-analytic packages: 1) the neglect of missing data, 2) the inflexibility of feature dimension, and 3) the lack of functions to support post-analysis summary. metaUnion has been applied in a published study as part of the integrated genomic approaches and resulted in significant findings. To provide dementia researchers with benchmark references on the significance of features, a web-based platform, AlzExpress, was built to provide granular differential expression test and meta-analysis results.
A combination of current big data technologies and robust data mining algorithms makes AlzExpress a flexible, scalable and comprehensive bioinformatics platform for dementia research. Plymouth Universit