
    Strong compound-risk factors: Efficient discovery through emerging patterns and contrast sets

    Odds ratio (OR), relative risk (RR) (risk ratio), and absolute risk reduction (ARR) (risk difference) are biostatistical measures widely used to identify significant risk factors in dichotomous groups of subjects. In the past, they have often been used to assess simple risk factors. In this paper, we introduce the concept of compound-risk factors to broaden the applicability of these statistical tests to the assessment of factor interplays. We observe that compound-risk factors with a high risk ratio or a large risk difference have a one-to-one correspondence to strong emerging patterns or strong contrast sets, two types of patterns that have been extensively studied in the data mining field. This relationship was previously unknown, and efficient algorithms for discovering strong compound-risk factors have been lacking. In this paper, we propose a theoretical framework and a new algorithm that unify the discovery of compound-risk factors with a strong OR, risk ratio, or risk difference. Our method guarantees that all patterns meeting a given test threshold can be discovered efficiently. Our contribution is thus the first to link risk ratios and ORs to pattern mining algorithms, making it possible to find compound-risk factors in large-scale data sets. In addition, we show that using compound-risk factors can improve the classification accuracy of probabilistic learning algorithms on several disease data sets, because these compound-risk factors capture the interdependency between important data attributes. © 2007 IEEE
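
As an illustration of the three measures named above, the following minimal sketch computes OR, RR, and risk difference from a 2x2 table in which the "exposure" is whether a subject matches a pattern, i.e. a conjunction of attribute values acting as a candidate compound-risk factor. It assumes records are plain Python dicts with the outcome stored under a hypothetical key `disease`; it shows the statistics only and is not the paper's discovery algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts for a pattern (candidate compound-risk factor) vs. outcome.

    a: pattern present, outcome positive    b: pattern present, outcome negative
    c: pattern absent,  outcome positive    d: pattern absent,  outcome negative
    """
    a: int
    b: int
    c: int
    d: int

    def odds_ratio(self) -> float:
        # OR = (a/b) / (c/d) = ad / bc
        return (self.a * self.d) / (self.b * self.c)

    def relative_risk(self) -> float:
        # RR = P(outcome | present) / P(outcome | absent)
        return (self.a / (self.a + self.b)) / (self.c / (self.c + self.d))

    def risk_difference(self) -> float:
        # ARR (risk difference) = P(outcome | present) - P(outcome | absent)
        return self.a / (self.a + self.b) - self.c / (self.c + self.d)


def table_for_pattern(records, pattern, outcome_key="disease"):
    """Build the 2x2 table for `pattern` (dict of attribute -> value).

    A record matches only if every (attribute, value) pair matches, which is
    how a compound factor combines several simple factors.
    """
    a = b = c = d = 0
    for r in records:
        present = all(r.get(k) == v for k, v in pattern.items())
        positive = bool(r[outcome_key])
        if present and positive:
            a += 1
        elif present:
            b += 1
        elif positive:
            c += 1
        else:
            d += 1
    return TwoByTwo(a, b, c, d)
```

Cells with a zero count make OR or RR undefined here (division by zero); a common remedy, not applied in this sketch, is adding 0.5 to every cell.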

    LncRNA profiling in early-stage chronic lymphocytic leukemia identifies transcriptional fingerprints with relevance in clinical outcome

    Long non-coding RNAs (lncRNAs) represent a novel class of functional RNA molecules with an important emerging role in cancer. To elucidate their potential pathogenetic role in chronic lymphocytic leukemia (CLL), a biologically and clinically heterogeneous neoplasia, we investigated lncRNA expression in a prospective series of 217 early-stage Binet A CLL patients and 26 different subpopulations of normal B cells, using a custom annotation pipeline for microarray data. Our study identified a 24-lncRNA signature specifically deregulated in CLL compared with its normal B-cell counterpart. Importantly, this classifier was validated on an independent data set of CLL samples. Among the lncRNAs characterizing distinct molecular CLL subgroups, we identified lncRNAs recurrently associated with adverse prognostic markers, such as unmutated IGHV status, CD38 expression, 11q and 17p deletions, and NOTCH1 mutations. In addition, correlation analyses predicted a putative interplay between lncRNAs and the expression of genes and miRNAs. Finally, we generated an independent 2-lncRNA risk model, based on lnc-IRF2-3 and lnc-KIAA1755-4 expression, that distinguishes three prognostic groups in our series of early-stage patients. Overall, our study provides an important resource for future studies on the functions of lncRNAs in CLL and contributes to the discovery of novel, clinically relevant molecular markers of the disease.
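
The abstract does not specify how the 2-lncRNA model defines its three groups. Purely as a hypothetical sketch, the code below assumes each lncRNA (for example lnc-IRF2-3 and lnc-KIAA1755-4) is dichotomized at its cohort median and that patients are grouped by how many of the two markers are in the adverse state (0, 1, or 2). The median cut-off and the direction of each adverse state are assumptions exposed as parameters, not values from the study.

```python
from statistics import median


def risk_groups(expr_a, expr_b, adverse_high_a=True, adverse_high_b=True):
    """Assign each patient to group 0, 1 or 2 = number of adverse markers.

    expr_a, expr_b: expression values of two lncRNAs, one value per patient,
    in the same patient order. Each marker is dichotomized at its cohort
    median (an assumption); adverse_high_* says whether expression above the
    median counts as adverse (also an assumption).
    """
    if len(expr_a) != len(expr_b):
        raise ValueError("expression lists must have the same length")
    cut_a, cut_b = median(expr_a), median(expr_b)
    groups = []
    for x, y in zip(expr_a, expr_b):
        adverse_a = (x > cut_a) if adverse_high_a else (x <= cut_a)
        adverse_b = (y > cut_b) if adverse_high_b else (y <= cut_b)
        groups.append(int(adverse_a) + int(adverse_b))
    return groups
```

Counting adverse markers is only one way to obtain three ordered groups from two binary markers; a weighted risk score with two cut-offs would be another.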

    Doctor of Philosophy

    Gene expression data repositories provide large and ever-increasing data for secondary use by translational informatics methods. For example, Gene Expression Omnibus (GEO) houses over 37,000 experiments with the goal of supporting further research. To use these published results in a larger meta-analysis, the data must be consolidated; however, the data are largely unstructured, which hinders data integration efforts. Here, I propose a novel pipeline, Ontology Based Data Integration (OBDI), which uses an ontological approach to combine samples across multiple GEO experiments. The OBDI pipeline uses machine learning algorithms that permit researchers to consolidate and analyze data across GEO experiments. I demonstrate how an ontological approach to integrating samples across experiments can be used to explore the immune response at the molecular level. As part of this process, an ontology in the Web Ontology Language (OWL) was developed for each data platform used. These OWL ontologies serve as a core component in successfully processing different sample types. Immunological experiments from GEO were consolidated to evaluate this methodology. The experiments included samples analyzed on expression arrays, BeadChips, and sequencing technologies. The integration of a complex biological system and the incorporation of different biological data types serve to validate the potential of OBDI. Biological data are highly dimensional, and OBDI incorporates tools and techniques that can handle the analysis of various biological data. The machine learning analysis performed within the OBDI pipeline successfully evaluated the newly annotated experiments and provides insights that can be further explored. The OBDI pipeline can help researchers annotate experiments using ontologies and analyze the annotated experiments.
To build the pipeline, ontologies served as the backbone for integrating samples from GEO Series records into machine learning experiments using ML-Flex. With the OBDI pipeline, researchers can access uncurated experiments from GEO (GEO Series records) and annotate the data using terms from the ontologies. This mechanism allows data sets to be organized in relation to new experiments, independent of GEO's GDS curation process. The OBDI system allows ontologies to grow organically around a cluster of experiments. These experiments are then further analyzed in ML-Flex using machine learning algorithms. The curated experiments are analyzed in silico, and the computational analyses are supported by the OBDI ontological system.
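
A toy sketch of ontology-based sample integration follows, with everything in it hypothetical: a three-term mini-ontology with synonyms and is-a parents, naive substring matching of free-text sample descriptions, and made-up series/sample accessions. Each sample is annotated with the matched terms plus their ancestors, so samples from different series can be pooled under a shared term. It does not use OWL, GEO, or ML-Flex APIs.

```python
from collections import defaultdict

# Hypothetical mini-ontology: term -> (parent term or None, lowercase synonyms).
ONTOLOGY = {
    "immune cell": (None, ["immune cell", "leukocyte"]),
    "T cell": ("immune cell", ["t cell", "t-cell", "t lymphocyte"]),
    "B cell": ("immune cell", ["b cell", "b-cell", "b lymphocyte"]),
}


def ancestors(term):
    """Return the term followed by all of its is-a ancestors."""
    chain = []
    while term is not None:
        chain.append(term)
        term = ONTOLOGY[term][0]
    return chain


def annotate(description):
    """Map a free-text sample description to ontology terms (with ancestors).

    Naive substring matching: good enough for a sketch, but it can produce
    false positives that a real annotation step would need to handle.
    """
    text = description.lower()
    terms = set()
    for term, (_, synonyms) in ONTOLOGY.items():
        if any(s in text for s in synonyms):
            terms.update(ancestors(term))
    return terms


def group_samples(samples):
    """samples: iterable of (series_id, sample_id, description).

    Returns term -> list of (series_id, sample_id), pooling across series.
    """
    groups = defaultdict(list)
    for series_id, sample_id, desc in samples:
        for term in annotate(desc):
            groups[term].append((series_id, sample_id))
    return dict(groups)
```

Propagating annotations to ancestors is what lets a query for a broad term ("immune cell") collect samples that were described only with narrower terms.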

    Multivariate classification of gene expression microarray data

    Gene expression data obtained from microarray analysis are often used to classify cells. In this thesis, a probabilistic version of the Discriminant Partial Least Squares method (p-DPLS) is used to classify samples from their gene expression profiles. p-DPLS is based on Bayes' rule for the posterior probability. Such classifiers are forced to always assign a class. To overcome this limitation, a reject option has been implemented. This option allows samples with a high risk of misclassification (i.e., ambiguous samples and outliers) to be rejected. The reject option combines criteria based on the x-residuals, the leverage, and the predicted values. In addition, a variable selection method is developed to choose the most relevant genes, since most of the genes analyzed on a microarray are irrelevant to the particular classification purpose and can confuse the classifier. Finally, DPLS is extended to multi-class classification by combining PLS with linear discriminant analysis.
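
The Bayes-rule step and the ambiguity part of the reject option described above can be sketched as follows. This is a minimal illustration under assumptions not taken from the thesis: two classes, PLS-predicted values `y_hat` whose class-conditional distributions are modeled as Gaussians with means and standard deviations estimated from training samples, and a hypothetical posterior threshold of 0.9. The PLS fit itself and the outlier criteria based on x-residuals and leverage are omitted.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Normal density; sd must be positive."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def posterior_class1(y_hat, params, prior1=0.5):
    """Bayes' rule on a PLS-predicted value y_hat.

    params: ((mean0, sd0), (mean1, sd1)) of the predicted values of training
    samples from class 0 and class 1 (Gaussian form is an assumption).
    """
    (m0, s0), (m1, s1) = params
    p1 = prior1 * gaussian_pdf(y_hat, m1, s1)
    p0 = (1 - prior1) * gaussian_pdf(y_hat, m0, s0)
    return p1 / (p0 + p1)


def classify_with_reject(y_hat, params, threshold=0.9, prior1=0.5):
    """Return 1 or 0, or None (rejected) if the larger posterior < threshold."""
    p1 = posterior_class1(y_hat, params, prior1)
    if max(p1, 1 - p1) < threshold:
        return None  # ambiguous sample: refuse to classify
    return 1 if p1 >= 0.5 else 0
```

Raising `threshold` rejects more samples but lowers the error rate on those that are classified, which is the trade-off a reject option is meant to expose.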

    Overcoming TCF4-Driven BCR Signaling in Diffuse Large B-Cell Lymphoma

    Diffuse Large B-cell Lymphoma (DLBCL) is the most common subtype of lymphoma. Despite a cure rate of 40% with standard R-CHOP therapy, patients with refractory or relapsed disease face a dismal prognosis. Cases of DLBCL can be classified by their molecular expression phenotype, with the GCB-like subtype aligning with the profile of a germinal center B cell and the ABC-like subtype aligning with that of an activated B cell. Aggressive disease is often characterized by high levels of B-cell receptor (BCR) signaling. This pathway engages downstream kinases that stimulate proliferation and survival and that play a key role in normal B-cell development. A comprehensive study aimed at delineating sources of inhibitor insensitivity within the BCR signaling pathway was conducted in order to identify novel drivers of disease and improve clinical outcome. A cohort of 39 aggressive lymphoma cell lines was assayed for sensitivity to Ibrutinib, a BTK inhibitor, and Umbralisib, a PI3Kδ inhibitor. Combined with intracellular phosphoflow measurements, these results revealed that higher levels of proximal BCR kinase (SYK, LYN, BTK, BLNK) and AKT (downstream of PI3K) signaling were highly linked and predictive of inhibitor insensitivity. Accordingly, simultaneous inhibition of these pathways with Ibrutinib and Umbralisib revealed a synergistic relationship. Following these results, a DNA copy number analysis of 673 DLBCL patient profiles was performed alongside 249 matching gene expression profiles to uncover the genomic drivers responsible for higher signaling. These results identified an enrichment of genes with transcription factor activity within regions of significant DNA copy number gain and matching transcript gain. The TCF4 transcription factor was identified within the most significant gain peak at chromosome 18q21, and its gain led to increased transcript and protein levels.
TCF4 gain was associated with the aggressive ABC-like phenotype, poor survival, and increased transcription of target genes encoding key BCR signaling components, such as BLNK, BTK, PIK3CA (PI3Kα), and the IgM heavy chain constant region. Collectively, these results identified sources of inhibitor insensitivity within DLBCL and characterized TCF4 as a driving force behind aggressive BCR signaling.
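
The abstract does not say which reference model was used to call the Ibrutinib and Umbralisib combination synergistic. As one common choice, here swapped in for illustration, Bliss independence predicts the combined fractional inhibition of two independently acting drugs as E_A + E_B - E_A*E_B; an observed effect above that value suggests synergy. The sketch below assumes effects are fractional inhibitions in [0, 1], and the example numbers are hypothetical.

```python
def bliss_excess(effect_a, effect_b, effect_ab):
    """Observed combined effect minus the Bliss-expected combined effect.

    Effects are fractional inhibitions in [0, 1]. Under Bliss independence
    the expected combined inhibition is Ea + Eb - Ea*Eb; a positive excess
    suggests synergy, a negative one antagonism.
    """
    for e in (effect_a, effect_b, effect_ab):
        if not 0.0 <= e <= 1.0:
            raise ValueError("effects must be fractions in [0, 1]")
    expected = effect_a + effect_b - effect_a * effect_b
    return effect_ab - expected


def mean_bliss_excess(triples):
    """Average excess over several dose pairs, each (Ea, Eb, Eab)."""
    return sum(bliss_excess(*t) for t in triples) / len(triples)
```

For example, single-agent inhibitions of 0.3 and 0.4 give a Bliss expectation of 0.58, so an observed combined inhibition of 0.75 would be an excess of about 0.17. Other reference models, such as Loewe additivity, can classify the same data differently.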

    RNA polymerase I inhibition: mechanism and exploitation in cancer treatment

    Cancer is an umbrella term for diseases characterized by uncontrollably proliferating abnormal cells that have often also gained the ability to spread to and invade other tissues. It is one of the leading causes of death worldwide and the second-leading cause of death in Sweden. Chemotherapy is a commonly used treatment approach in which drugs preferentially target cellular processes needed for cancer cell proliferation, leading to cancer cell growth arrest or death. Although chemotherapy is a potent tool in managing cancer, its overall success rate remains low for certain cancer types, highlighting the need to identify new chemotherapeutic targets and strategies. Ribosome biogenesis (RiBi), a fundamental process that supplies cells with ribosomes, represents an emerging target, as several cancer types rely on high RiBi rates to maintain high proliferation rates. Small-molecule-mediated RiBi inhibition induces nucleolar stress, a cellular response resulting in cell cycle arrest and apoptosis, often dependent on p53. Preclinical studies have shown promising results in a variety of cancer types; however, the available compounds are limited, and their mechanistic details are yet to be explored. Thus, characterizing the cancer-specific biological effects of RiBi inhibition, together with identifying new RiBi targets and inhibitors, may expand the therapeutic promise of this strategy, accelerate the clinical development of drug candidates, and potentially facilitate the future selection of patients who might benefit from RiBi inhibitors. The primary aims of the Thesis were to study (1) the pharmacological inhibition of RiBi, focusing on RNA polymerase I (Pol I), and the repurposing of clinically approved drugs with underappreciated RiBi-inhibitory effects for cancer treatment; (2) the effects of Pol I inhibition in high-grade gliomas (HGG), and synergistic treatment strategies to prevent potential resistance development; and (3) alternative druggable RiBi-associated protein targets.
In Paper I, we identified an FDA-approved antimalarial drug, amodiaquine, with previously unknown Pol I inhibitory effects. To limit potential toxicity, we designed and synthesized a chemical analog with comparable efficacy and demonstrated the effectiveness of the analog series in a panel of colorectal cancer cell lines. In Paper II, we reported the relevance and effectiveness of RiBi as a target in HGG, uncovered a novel cellular response to nucleolar stress mediated by the Fibroblast Growth Factor 2 (FGF2)-Fibroblast Growth Factor Receptor 1 (FGFR1) signaling axis, and proposed a highly synergistic combination with FGFR inhibitors to limit glioma cell growth. In Paper III, we further characterized the functional role of the DEAD-box helicase and exon junction complex protein eIF4A3, showing its involvement in RiBi and its association with tumor aggressiveness, and suggested its relevance as a target for drug discovery.

    Temporal and Spatial Analysis of Cancer Rates in the United States

    Introduction: Spatial, temporal, and racial patterns of cancer remain largely unexplained in the United States. Time trends in cancer incidence and mortality can be used to estimate the current cancer burden, anticipate clinical care needs, and suggest hypotheses about possible etiologic explanations for the underlying trends. Methods: Using U.S. cancer incidence data for 1979-2003 and cancer mortality data for 1969-2003, age-period-cohort and Joinpoint regression models were fit to summarize gender- and race-specific temporal trends for three broad cancer categories: tobacco-related cancer, screen-detectable cancer, and cancer unrelated to tobacco and screening. Demographic patterns and time trends of non-Hodgkin's lymphoma (NHL) incidence from 1985 to 2004 were compared between Pennsylvania and the U.S. Using Idaho cancer incidence and ground-water arsenic levels for 1990-2005, spatial analysis was conducted to identify geographic patterns of cancer incidence and to evaluate the relationship between arsenic exposure in ground water and cancer incidence in Idaho. Results: Over the last three decades, tobacco-related cancer incidence declined among men and increased among women. Screen-detectable cancer incidence increased, more rapidly among men than among women. For cancer unrelated to tobacco and screening, incidence increased in every gender-and-race group. Though not identical, NHL incidence patterns, with substantial increases, were similar in the U.S. and Pennsylvania. NHL incidence was higher in Pennsylvania counties with a greater percentage of urban residents. Although spatial clustering was demonstrated in Idaho cancer incidence, no relationship was found between arsenic exposure in ground water and Idaho cancer incidence. Conclusion: NHL and other cancers unrelated to smoking or screening have increased in the U.S. over the past two decades in white and black men and women.
Etiologic research should attempt to identify modifiable risk factors, including environmental exposures, responsible for the increasing incidence of NHL and of cancer unrelated to tobacco and screening. The ecologic association observed in Pennsylvania between NHL incidence and urban residence may be relevant to NHL risk in the entire United States. Additional environmental and demographic information should be evaluated to clarify the arsenic-related cancer risk in Idaho counties where ground water has been found to contain higher levels of arsenic. Public Health Significance: Age-period-cohort modeling of cancer incidence and mortality provides important indications of current and future health care needs and also suggests hypotheses for future research. The results of this analysis provide health professionals, researchers, and policy-makers with detailed information and an understandable overview of cancer patterns in the United States. Hypotheses should be generated about these unexplained cancer patterns so that avoidable cancer risks can be identified, decreasing the cancer burden and the associated requirements for health care.
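
The Joinpoint models mentioned in the Methods summarize trends as log-linear segments, each reported as an annual percent change, APC = 100*(exp(b) - 1), where b is the slope of log(rate) against calendar year. The sketch below fits a single joinpoint by grid search over the observed years using ordinary least squares on log rates. It is a simplification, assuming exactly one joinpoint and unweighted fits; the NCI Joinpoint software additionally uses permutation tests to select the number of joinpoints and can weight by standard errors, and age-period-cohort models are not shown.

```python
import math


def ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y.

    Solved by Gaussian elimination with partial pivoting; fine for the
    three-column design used below.
    """
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    v = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            v[r] -= f * v[col]
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (v[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


def sse(X, y, b):
    """Sum of squared residuals of the fit b."""
    return sum((yi - sum(bi * xi for bi, xi in zip(b, row))) ** 2
               for row, yi in zip(X, y))


def apc(slope):
    """Annual percent change for a slope on the log-rate scale."""
    return 100.0 * (math.exp(slope) - 1.0)


def fit_one_joinpoint(years, rates, min_seg=3):
    """Fit log(rate) = b0 + b1*t + b2*max(0, t - tau) for each candidate tau.

    Candidates are observed years leaving >= min_seg points per segment; the
    tau with the smallest SSE wins. Returns (tau, APC before, APC after).
    """
    t0 = years[0]  # center time to keep the normal equations well conditioned
    logs = [math.log(r) for r in rates]
    best = None
    for tau in years[min_seg - 1: len(years) - min_seg + 1]:
        X = [[1.0, t - t0, max(0.0, t - tau)] for t in years]
        b = ols(X, logs)
        err = sse(X, logs, b)
        if best is None or err < best[0]:
            best = (err, tau, b)
    _, tau, b = best
    return tau, apc(b[1]), apc(b[1] + b[2])
```

The hinge term `max(0, t - tau)` keeps the fitted trend continuous at the joinpoint, so the slope after tau is b1 + b2 rather than an independently fitted line.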

    Hematological image analysis for acute lymphoblastic leukemia detection and classification

    Microscopic analysis of peripheral blood smears is a critical step in the detection of leukemia. However, this type of light microscopic assessment is time-consuming, inherently subjective, and governed by the hematopathologist's clinical acumen and experience. To circumvent such problems, an efficient computer-aided methodology for quantitative analysis of peripheral blood samples needs to be developed. In this thesis, methodologies are therefore devised for automated detection and subclassification of Acute Lymphoblastic Leukemia (ALL) using image processing and machine learning methods. The choice of an appropriate segmentation scheme plays a vital role in the automated disease recognition process. Accordingly, novel schemes have been proposed to segment normal mature lymphocyte and malignant lymphoblast images into their constituent morphological regions. To make the proposed schemes viable from a practical and real-time standpoint, the segmentation problem is addressed in both supervised and unsupervised frameworks. The proposed methods are based on neural networks, feature space clustering, and Markov random field modeling, where the segmentation problem is formulated as a pixel classification, pixel clustering, and pixel labeling problem, respectively. A comprehensive validation analysis is presented to evaluate the performance of four proposed lymphocyte image segmentation schemes against manual segmentation results provided by a panel of hematopathologists. It is observed that the morphological components of normal and malignant lymphocytes differ significantly. To automatically recognize lymphoblasts and detect ALL in peripheral blood samples, an efficient methodology is proposed. Morphological, textural, and color features are extracted from the segmented nucleus and cytoplasm regions of the lymphocyte images.
An ensemble of three classifiers, denoted EOC3, shows the highest classification accuracy (94.73%) compared with its individual members. Subclassification of ALL based on French–American–British (FAB) and World Health Organization (WHO) criteria is essential for prognosis and treatment planning. Accordingly, two independent methodologies are proposed for automated classification of malignant lymphocyte (lymphoblast) images based on morphology and phenotype. These methods include lymphoblast image segmentation, nucleus and cytoplasm feature extraction, and efficient classification.
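
The abstract does not state how EOC3 combines its three members. Assuming simple majority voting, which is one common way to combine an ensemble, the sketch below shows the combination rule and the accuracy measure used to compare an ensemble against its individual members. The member classifiers and the two features (for example a nucleus-to-cytoplasm area ratio) are hypothetical stand-ins for classifiers trained on the extracted morphological, textural, and color features.

```python
from collections import Counter
from typing import Callable, Sequence

# A classifier maps a feature vector to a class label.
Classifier = Callable[[Sequence[float]], str]


def majority_vote(classifiers: Sequence[Classifier], features: Sequence[float]) -> str:
    """Return the label predicted by most members.

    With three members and two classes a strict majority always exists;
    otherwise ties go to the label voted first (in member order).
    """
    votes = [clf(features) for clf in classifiers]
    counts = Counter(votes)
    top = max(counts.values())
    return next(v for v in votes if counts[v] == top)


def accuracy(predict: Classifier, samples, labels) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict(x) == y for x, y in zip(samples, labels))
    return correct / len(labels)


# Hypothetical members thresholding two made-up features.
members = [
    lambda f: "lymphoblast" if f[0] > 0.5 else "normal",
    lambda f: "lymphoblast" if f[1] > 0.5 else "normal",
    lambda f: "lymphoblast" if f[0] + f[1] > 1.2 else "normal",
]


def ensemble(features: Sequence[float]) -> str:
    """Majority vote of the three hypothetical members."""
    return majority_vote(members, features)
```

Comparing `accuracy(ensemble, ...)` with `accuracy(members[i], ...)` on the same held-out samples is the kind of comparison the abstract reports; voting helps most when the members make different errors.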