
    Model based approaches to characterize heterogeneity in gene regulation across cells and disease types

    Access to large genome-wide biological datasets has enabled computational researchers to tackle long-standing questions in biomedicine through the lens of Machine Learning (ML) and Artificial Intelligence (AI). The potential benefits of such computational approaches to biological research are immense. For example, efficient yet interpretable machine learning models of disease, drug response, or phenotype can impact our lives at both personal and social levels. However, heterogeneity is found at multiple scales in biology, manifested as the context-specificity of biological processes. This context-specific heterogeneity poses a major challenge to ML models. Even though context-specific models are often trained, this is mostly done without the benefit of mechanistic insights about the biological processes being modeled, and as such these models do not help improve our biological understanding. This dissertation addresses these challenges and their limitations by: a) designing appropriate features and ML models motivated by the current biological hypothesis at hand, b) building pipelines to analyze multiple context-specific models together, and c) developing data integration and imputation methods to address the problems of insufficient and missing data. The first project studies loss of methylation, or hypomethylation, in large blocks causing aberrant gene activity, a well-known phenomenon in cancer. To find the associated markers, I designed a classification model of hypomethylated block boundaries and non-boundaries in colon cancer. The second project models the binding of transcription factors (TFs) to specific DNA elements in the genome, one of the principal components of gene regulation. Since the condition specificity of TF binding is not yet well understood, this dissertation examines a design of cell type-specific models for TF binding using ChIP-seq data. 
A meta-analysis pipeline, called TRISECT, is applied to multiple TF binding models to understand the heterogeneity of cell specificity across those models. Next, models for breast cancer metastasis using gene expression data are discussed. In breast cancer metastasis, the affinity towards distant tissues, called secondary tissues, is not yet understood. Therefore, going beyond mere discriminatory models, I propose another meta-analysis pipeline, MONTAGE, intended to understand the organotropism of breast cancer metastasis across secondary tissues. Building ML models can be hindered by limited data size, especially for rare diseases. Therefore, by necessity, molecular data are merged across multiple studies and multiple technical platforms, which makes them vulnerable to so-called batch effects that dilute the actual biological signal. Existing methods cannot remove multivariate confounding artifacts, which leads to inaccurate models. To circumvent this issue, this dissertation examines a deep learning based technique (deepSavior) which ‘translates’ the gene expression profile of samples from one technical platform to another. To summarize, this dissertation makes three distinct contributions: a) designing an effective ML model to explore the determinants of cancer-associated hypomethylation, b) designing meta-analysis pipelines to compare multiple related but context-specific ML models to understand heterogeneous relations among biological processes, and c) developing a new method to overcome data integration and imputation challenges.

    A simple reversed phase High Performance Liquid Chromatography method development and validation for determination of Carvedilol in pharmaceutical dosage forms

    A simple, sensitive and precise reversed phase high performance liquid chromatographic method has been developed for the estimation of Carvedilol in pharmaceutical preparations. Chromatographic determination was performed on a reversed phase C18 column (4.5 mm x 250 mm; 5 µm particle size) using a mixture of phosphate buffer and acetonitrile (65:35) as the mobile phase at a flow rate of 1 ml/min with UV detection at 240 nm. The method was validated for linearity, accuracy, repeatability, precision, reproducibility, and specificity as per ICH guidelines. The method was also used to determine Carvedilol content in five commercial brands available in the Bangladeshi market. The method was linear in the range of 5 to 35 µg/ml, with a good correlation coefficient (R² = 0.998) and good accuracy (98.08% to 99.91%). The method was found to be specific for Carvedilol in the presence of common excipients. Statistical analysis of the proposed method showed it to be precise, accurate and reproducible. Hence it can be employed for routine analysis of Carvedilol in both bulk and commercial formulations.

    Distinct genomic and epigenomic features demarcate hypomethylated blocks in colon cancer

    Large mega base-pair genomic regions show robust alterations in DNA methylation levels in multiple cancers. A vast majority of these regions are hypomethylated in cancers. These regions are generally enriched for CpG islands, Lamin Associated Domains and Large organized chromatin lysine modification domains, and are associated with stochastic variability in gene expression. Given the size and consistency of hypomethylated blocks (HMBs) across cancer types, we hypothesized that the immediate causes of methylation instability are likely to be encoded in the genomic region near HMB boundaries, in terms of specific genomic or epigenomic signatures. However, a detailed characterization of the HMB boundaries has not been reported. Here, we focused on ~13 k HMBs, encompassing approximately half of the genome, identified in colon cancer. We modeled the genomic features of HMB boundaries by Random Forest to identify their salient features, in terms of transcription factor (TF) binding motifs. Additionally, we analyzed various epigenomic marks and chromatin structural features of HMB boundaries relative to the non-HMB genomic regions. We found that the classical promoter epigenomic mark, H3K4me3, is highly enriched at HMB boundaries, as are CTCF bound sites. HMB boundaries harbor distinct combinations of TF motifs. Our Random Forest model based on TF motifs can accurately distinguish boundaries not only from regions inside and outside HMBs, but surprisingly, from active promoters as well. Interestingly, the distinguishing TFs and their interacting proteins are involved in chromatin modification. Finally, HMB boundaries significantly coincide with the boundaries of Topologically Associating Domains of the chromatin. Our analyses suggest that the overall architecture of HMBs is guided by pre-existing chromatin architecture and is associated with aberrant activity of promoter-like sequences at the boundary. https://doi.org/10.1186/s12885-016-2128-

    Design and Analysis of Parabolic Trough Solar Water Heating System

    Renewable energy technology is one of the prospective sources that can meet energy demand and contribute to achieving sustainable development goals. Concentrating collectors are widely used in solar thermal power generation and also in water heating systems. They are popular due to their high thermal efficiency, simple construction requirements and low manufacturing cost. This paper is concerned with an experimental study of a parabolic trough collector for water heating. It focuses on the performance of the concentrating solar collector with different reflector materials (aluminum sheet, aluminum foil and mirror film). In Bangladesh, it is possible to use low-cost solar concentrating technologies for domestic as well as industrial process heat applications. Line-focusing parabolic trough collectors have been designed and developed, and their performance has been evaluated by measuring solar radiation, inlet and outlet water temperature, flow rate, efficiency, etc.

    Cross-sectional Ct distributions from qPCR tests can provide an early warning signal for the spread of COVID-19 in communities

    Background: SARS-CoV-2 PCR testing data has been widely used for COVID-19 surveillance. Existing COVID-19 forecasting models mainly rely on case counts obtained from qPCR results, even though the binary PCR results provide a limited picture of the pandemic trajectory. Most forecasting models have failed to accurately predict the COVID-19 waves before they occur. Recently, a model utilizing cross-sectional population cycle threshold (Ct, the number of cycles required for the fluorescent signal to cross the background threshold) values obtained from PCR tests (Ct-based model) was developed to overcome the limitations of using only binary PCR results. In this study, we aimed to improve on COVID-19 forecasting models using features derived from the Ct-based model, to detect epidemic waves earlier than case-based trajectories. Methods: PCR data was collected weekly at Northeastern University (NU) between August 2020 and January 2022. Campus and county epidemic trajectories were generated from case counts. A novel forecasting approach was developed by enhancing a recent deep learning model with Ct-based features and applied in Suffolk County and on the NU campus. For this, cross-sectional Ct values from PCR data were used to generate Ct-based epidemic trajectories, including effective reproductive rate (Rt) and incidence. The improvement in forecasting performance was compared using absolute errors and residual squared errors with respect to actual observed cases at the 7-day and 14-day forecasting horizons. The model was also tested prospectively over the period January 2022 to April 2022. Results: Rt curves estimated from the Ct-based model indicated epidemic waves 12 to 14 days earlier than Rt curves from NU campus and Suffolk County cases, with a correlation of 0.57. 
Enhancing the forecasting models with Ct-based information significantly decreased absolute error (decrease of 49.4 and 221.5 for the 7-day and 14-day forecasting horizons) and residual squared error (40.6 and 217.1 for the 7-day and 14-day forecasting horizons) compared to the original model without Ct features. Conclusion: Ct-based epidemic trajectories can herald an earlier signal for impending epidemic waves in the community and forecast transmission peaks. Moreover, COVID-19 forecasting models can be enhanced using these Ct features to improve their forecasting accuracy. In this study, we make the case that public health agencies should publish Ct values along with the binary positive/negative PCR results. Early and accurate forecasting of epidemic waves can inform public health policies and countermeasures that can mitigate spread.