10 research outputs found

    A meta-data based method for DNA microarray imputation

    BACKGROUND: DNA microarray experiments are conducted in logical sets, such as time-course profiling after a treatment is applied to the samples, or comparisons of samples under two or more conditions. Because of the cost and design constraints of spotted cDNA microarray experiments, each logical set commonly includes only a small number of replicates per condition. Despite vast improvements in microarray technology in recent years, missing values remain prevalent. Intuitively, missing values are best imputed using many replicates within the same logical set. In practice, replicates are few, so reliable imputation within a logical set is difficult. Yet it is precisely when replicates are few that the presence of missing values, and how they are imputed, can most strongly affect the outcome of downstream analyses (e.g. significance analysis and clustering). This study explores the feasibility of imputation across logical sets, using the large amount of publicly available microarray data to improve imputation reliability when sample sizes are small. RESULTS: We downloaded all cDNA microarray data for Saccharomyces cerevisiae, Arabidopsis thaliana and Caenorhabditis elegans from the Stanford Microarray Database. Through cross-validation and simulation, we find that, for all three species, the proposed imputation using data from public databases is far superior to imputation within a logical set, sometimes dramatically so. Furthermore, the imputation root mean square error is generally much lower for significant genes than for non-significant ones. CONCLUSION: Because downstream analyses of significant genes, such as clustering and network analysis, can be very sensitive to small perturbations of the estimated gene effects, we strongly recommend that researchers apply reliable data imputation before further analysis.
Our method can also be applied to cDNA microarray experiments from other species, provided good reference data are available.
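The comparison above rests on cross-validated imputation error. A minimal sketch of that kind of evaluation, assuming a genes × arrays matrix with `None` for missing values: a random fraction of observed entries is hidden, re-imputed, and scored by root mean square error. The masking scheme, the 10% default and the row-mean baseline imputer are illustrative assumptions; this is not the authors' protocol or their reference-based method, which would be passed in as another `imputer`.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Matrix = List[List[Optional[float]]]  # genes x arrays; None marks a missing value


def row_mean_impute(data: Matrix) -> List[List[float]]:
    """Baseline imputer: fill each gap with the mean of that gene's observed values."""
    filled = []
    for row in data:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        filled.append([mean if v is None else v for v in row])
    return filled


def masked_rmse(
    data: Matrix,
    imputer: Callable[[Matrix], List[List[float]]],
    mask_fraction: float = 0.1,
    seed: int = 0,
) -> float:
    """Hide a random subset of observed entries, impute them, and return the RMSE."""
    rng = random.Random(seed)
    observed: List[Tuple[int, int]] = [
        (i, j) for i, row in enumerate(data) for j, v in enumerate(row) if v is not None
    ]
    n_mask = max(1, int(mask_fraction * len(observed)))
    hidden = rng.sample(observed, n_mask)

    # Work on a copy so the true values stay available for scoring.
    masked = [list(row) for row in data]
    for i, j in hidden:
        masked[i][j] = None

    imputed = imputer(masked)
    sq_err = sum((imputed[i][j] - data[i][j]) ** 2 for i, j in hidden)
    return math.sqrt(sq_err / n_mask)
```

Any imputer with the same signature (within-set or reference-based) can be scored on identical masks by fixing `seed`, which keeps comparisons paired.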

    How to Improve Postgenomic Knowledge Discovery Using Imputation

    While microarrays make it feasible to investigate many complex biological problems rapidly, their multistep fabrication is prone to error at every stage. The standard tactic has been either to ignore erroneous gene readings or to treat them as missing values, yet this choice can strongly influence postgenomic knowledge discovery methods such as gene selection and gene regulatory network (GRN) reconstruction. This has prompted a raft of new, flexible imputation algorithms, including local least squares imputation and the recent heuristic collateral missing value imputation, which exploit the correlated behaviour of functionally related genes to estimate missing values accurately. This paper examines the influence of missing value imputation techniques on postgenomic knowledge inference methods. Results for various algorithms consistently show that, rather than ignoring missing values, recycling microarray data through flexible and robust imputation can substantially improve subsequent downstream procedures.
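Local least squares imputation estimates a missing entry from the genes whose profiles best match the target gene. A minimal sketch of that idea for a single entry, assuming Euclidean distance on the target's observed arrays, only complete genes as neighbour candidates, and a tiny ridge term added for numerical stability. It is a simplified illustration, not the published LLSimpute implementation, and collateral missing value imputation is not shown.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for p in range(n):
        pivot = max(range(p, n), key=lambda r: abs(m[r][p]))
        m[p], m[pivot] = m[pivot], m[p]
        for r in range(p + 1, n):
            f = m[r][p] / m[p][p]
            for c in range(p, n + 1):
                m[r][c] -= f * m[p][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def lls_impute_entry(
    data: List[Row], gene: int, col: int, k: int = 10, ridge: float = 1e-8
) -> float:
    """Estimate data[gene][col] by regressing the gene on its k nearest complete genes."""
    target = data[gene]
    obs = [c for c, v in enumerate(target) if v is not None and c != col]
    # Candidate neighbours: other genes with no missing values (a simplification).
    candidates = [r for i, r in enumerate(data) if i != gene and None not in r]

    def dist(r: Row) -> float:
        return math.sqrt(sum((r[c] - target[c]) ** 2 for c in obs))

    neighbours = sorted(candidates, key=dist)[:k]
    # Normal equations (A^T A + ridge*I) x = A^T w, with A[c][j] = neighbour j at array c.
    # Estimates are only meaningful when len(obs) >= k; the ridge just avoids a singular solve.
    ata = [
        [sum(ni[c] * nj[c] for c in obs) + (ridge if i == j else 0.0)
         for j, nj in enumerate(neighbours)]
        for i, ni in enumerate(neighbours)
    ]
    atw = [sum(n[c] * target[c] for c in obs) for n in neighbours]
    coef = _solve(ata, atw)
    # Predict the missing entry as the same linear combination of the neighbours.
    return sum(x * n[col] for x, n in zip(coef, neighbours))
```

Regression on neighbours, rather than averaging them as kNN does, lets a gene that is a scaled or mixed copy of its neighbours be reconstructed exactly.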

    Comparative analysis of missing value imputation methods to improve clustering and interpretation of microarray experiments

    Background: Microarray technologies produce large amounts of data. In a previous study, we showed the value of the k-Nearest Neighbour approach for restoring missing gene expression values, and its positive impact on gene clustering by hierarchical algorithms. Since then, numerous replacement methods have been proposed to impute missing values (MVs) in microarray data. In this study, we evaluated twelve usable methods and their influence on the quality of gene clustering. We used several datasets, from both kinetic and non-kinetic experiments, in yeast and human. Results: We underline the excellent efficiency of the approaches proposed and implemented by Bo and co-workers, especially the one based on expectation maximization (EM_array). These improvements were also observed for the imputation of extreme values, which are the most difficult to predict. We showed that imputed MVs still have important effects on the stability of gene clusters. The improvement obtained with hierarchical clustering remains limited and is not sufficient to fully restore the correct gene associations. However, the quality of the imputation method and the stability of gene clusters tend to vary together. Even though comparing clustering algorithms is a complex task, we observed that the k-means approach better conserves gene associations. Conclusions: More than 6,000,000 independent simulations assessed the quality of 12 imputation methods on five very different biological datasets. Substantial improvements have thus been made since our previous study. The EM_array approach is an efficient method for restoring missing gene expression values, with a lower estimation error. Nonetheless, the presence of MVs, even at a low rate, is a major factor in gene cluster instability.
Our study highlights the need for systematic assessment of imputation methods, and hence for dedicated benchmarks. A notable point is the specific influence of certain biological datasets.
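The k-Nearest Neighbour approach this study builds on fills each gap from the genes most similar to the incomplete one. A minimal sketch, assuming root-mean-square distance over the arrays both genes have observed and inverse-distance weights (one common choice); it is not the EM_array method the authors recommend.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def knn_impute(data: List[Row], k: int = 10) -> List[List[float]]:
    """Fill each missing value with a distance-weighted mean over the k nearest genes."""
    filled = [list(row) for row in data]
    for g, row in enumerate(data):
        for col, value in enumerate(row):
            if value is not None:
                continue
            scored = []
            for h, other in enumerate(data):
                if h == g or other[col] is None:
                    continue  # a neighbour must have a value in the missing column
                shared = [c for c in range(len(row))
                          if c != col and row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                # RMS distance, so genes with different overlaps stay comparable.
                d = math.sqrt(sum((row[c] - other[c]) ** 2 for c in shared) / len(shared))
                scored.append((d, other[col]))
            scored.sort(key=lambda t: t[0])
            nearest = scored[:k]
            if not nearest:
                # No usable neighbour: fall back to the gene's own mean.
                observed = [v for v in row if v is not None]
                filled[g][col] = sum(observed) / len(observed) if observed else 0.0
                continue
            weights = [1.0 / (d + 1e-12) for d, _ in nearest]
            filled[g][col] = sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
    return filled
```

Closer genes dominate the estimate; with `k=1` the method simply copies the nearest gene's value.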

    Comparative analysis of biological versus chemical synthesis of palladium nanoparticles for catalysis of chromium (VI) reduction

    The discharge of hexavalent chromium [Cr(VI)] from several anthropogenic activities leads to environmental pollution. In this study, we explore a simple yet cost-effective method for synthesizing palladium (Pd) nanoparticles for the treatment of Cr(VI). The presence of elemental Pd [Pd(0)] was confirmed by scanning electron microscopy (SEM), energy-dispersive X-ray spectroscopy (EDS) and X-ray diffraction (XRD). We show that the biologically synthesized nanoparticles (Bio-PdNPs) catalyse Cr(VI) reduction better than chemically synthesized nanoparticles (Chem-PdNPs) because they are smaller and more highly dispersed. The Langmuir–Hinshelwood mechanism was used successfully to model the kinetics. Under this model, the Bio-PdNPs outperformed the Chem-PdNPs because both their rate constant (k_bio = 6.37 mmol s−1 m−2 versus k_chem = 3.83 mmol s−1 m−2) and their Cr(VI) adsorption constant (K_Cr(VI),bio = 3.11 × 10−2 L mmol−1 versus K_Cr(VI),chem = 1.14 × 10−2 L mmol−1) were higher. In addition, product inhibition by trivalent chromium [Cr(III)] was much stronger for Chem-PdNPs, as indicated by their high Cr(III) adsorption constant (K_Cr(III),chem = 52.9 L mmol−1, compared with K_Cr(III),bio = 2.76 L mmol−1 for Bio-PdNPs).
    Published in Scientific Reports (https://www.nature.com/srep). Chemical Engineering.
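To see how the reported constants interact, a minimal sketch that plugs them into one common simplified Langmuir–Hinshelwood form with product inhibition, r = k·K_Cr(VI)·C_Cr(VI) / (1 + K_Cr(VI)·C_Cr(VI) + K_Cr(III)·C_Cr(III)). This rate expression is an assumption for illustration; the abstract does not state the exact form the authors fitted. Concentrations are in mmol L−1 and the rate is surface-normalized (mmol s−1 m−2).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LHParams:
    k: float      # surface rate constant, mmol s^-1 m^-2
    K_cr6: float  # Cr(VI) adsorption constant, L mmol^-1
    K_cr3: float  # Cr(III) adsorption constant, L mmol^-1


# Constants as reported in the abstract above.
BIO = LHParams(k=6.37, K_cr6=3.11e-2, K_cr3=2.76)
CHEM = LHParams(k=3.83, K_cr6=1.14e-2, K_cr3=52.9)


def lh_rate(p: LHParams, c_cr6: float, c_cr3: float = 0.0) -> float:
    """Surface-normalized Cr(VI) reduction rate (mmol s^-1 m^-2), ASSUMED LH form.

    r = k * K_cr6 * C_cr6 / (1 + K_cr6 * C_cr6 + K_cr3 * C_cr3)
    The Cr(III) term in the denominator models product inhibition.
    """
    return p.k * p.K_cr6 * c_cr6 / (1.0 + p.K_cr6 * c_cr6 + p.K_cr3 * c_cr3)


if __name__ == "__main__":
    # Fresh solution (1 mmol/L Cr(VI)) versus half conversion (0.5 / 0.5 mmol/L).
    for name, p in (("Bio-PdNPs", BIO), ("Chem-PdNPs", CHEM)):
        print(name, round(lh_rate(p, 1.0), 4), round(lh_rate(p, 0.5, 0.5), 5))
```

Under this assumed form, going from the fresh to the half-converted point lowers the Bio-PdNP rate about 4.6-fold but the Chem-PdNP rate about 54-fold, which is the product-inhibition effect the abstract attributes to the large K_Cr(III),chem.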

    Machine Learning Methods To Identify Hidden Phenotypes In The Electronic Health Record

    The widespread adoption of Electronic Health Records (EHRs) means that an unprecedented amount of patient treatment and outcome data is available to researchers. In the EHR, however, research is a tertiary priority behind patient care and billing. As a result, the data are not standardized or formatted in a way that is easily adapted to machine learning approaches. Data may be missing for a wide variety of reasons, ranging from individual input styles to differences in clinical decision making, for example which lab tests to order. Few patients are annotated at research quality, which limits sample size and presents a moving gold standard. Patient progression over time is key to understanding many diseases, but many machine learning algorithms require a snapshot at a single time point to create a usable vector representation. In this dissertation, we develop new machine learning methods and computational workflows to extract hidden phenotypes from the EHR. In Part 1, we use a semi-supervised deep learning approach to compensate for the low number of research-quality labels in the EHR. In Part 2, we examine, and provide recommendations for, characterizing and managing the large amount of missing data inherent to EHR data. In Part 3, we present an adversarial approach to generating synthetic data that closely resembles the original data while protecting subject privacy, and we introduce a workflow that enables reproducible research even when data cannot be shared. In Part 4, we introduce a novel strategy to first extract sequential data from the EHR and then demonstrate the ability to model these sequences with deep learning.
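Part 2 concerns characterizing and managing missing EHR data. A minimal sketch of two generic steps often used for this: measuring the missing rate per feature, then dropping very sparse features and median-filling the rest while keeping a 0/1 missingness flag, since in the EHR the absence of a lab test can itself be informative. The 0.5 threshold, median fill and feature names such as `hba1c` are hypothetical choices, not the dissertation's recommendations.

```python
import statistics
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]  # one patient snapshot; absent keys count as missing


def missing_rates(records: List[Record], features: List[str]) -> Dict[str, float]:
    """Fraction of records in which each feature is absent or None."""
    n = len(records)
    return {f: sum(1 for r in records if r.get(f) is None) / n for f in features}


def impute_with_indicators(
    records: List[Record], features: List[str], max_missing: float = 0.5
) -> Tuple[List[Dict[str, float]], List[str]]:
    """Drop very sparse features, median-fill the rest, and add 0/1 missingness flags."""
    rates = missing_rates(records, features)
    kept = [f for f in features if rates[f] <= max_missing]
    medians = {}
    for f in kept:
        observed = [r[f] for r in records if r.get(f) is not None]
        medians[f] = statistics.median(observed) if observed else 0.0
    out = []
    for r in records:
        row: Dict[str, float] = {}
        for f in kept:
            value = r.get(f)
            row[f] = medians[f] if value is None else value
            # The flag keeps "was this measurement taken?" visible to the model.
            row[f + "_missing"] = 1.0 if value is None else 0.0
        out.append(row)
    return out, kept
```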

    Enhanced energy recovery in a microbial fuel cell and improved catalytic chromium (VI) reduction using biogenic palladium nanoparticles

    Dissertation (MEng (Water Utilisation Engineering)), University of Pretoria, 2021. Chemical Engineering; Unrestricted.
    Palladium (Pd) is a cheap and effective electrocatalyst capable of replacing platinum (Pt) in various applications. However, chemically synthesized Pd nanoparticles (Chem-PdNPs) are mostly fabricated using toxic chemicals under severe conditions. In this study, we present a more environmentally friendly process for fabricating biogenic Pd nanoparticles (Bio-PdNPs) using Citrobacter sp. isolated from wastewater sludge. Bio-PdNPs were successfully fabricated under anaerobic conditions at pH 6 and 30 °C, using sodium formate (HCOONa) as the electron donor. In the absence of HCOONa, both live and dead Citrobacter sp. cells took up palladium(II) [Pd(II)] by biosorption, with no enzymatic contribution. Live cells also showed a high enzymatic contribution to Pd(II) removal by biological reduction. This was confirmed by scanning electron microscopy (SEM), energy-dispersive X-ray spectroscopy (EDS) and X-ray diffraction (XRD), which revealed Bio-PdNPs deposited on the bacterial cells. The Bio-PdNPs enhanced the anode performance of the microbial fuel cell (MFC): the MFC with the highest Bio-PdNP loading (4 mg Bio-PdNPs cm-2) achieved a maximum power density of 539.3 mW m-3 (4.01 mW m-2) and a peak voltage of 328.4 mV.
    The discharge of hexavalent chromium [Cr(VI)] from several anthropogenic activities leads to environmental pollution and to concerns over plant growth inhibition and carcinogenesis in animals. In this study, we explored a simple yet cost-effective method for the catalytic reduction of Cr(VI) using chemically and biologically synthesized Pd nanoparticles. The Bio-PdNPs were fabricated over a wide range of Pd(II) concentrations within 24 h at pH 6 and a relatively low temperature of 30 °C, whereas the Chem-PdNPs were fabricated at the same pH but at 70 °C. The presence of elemental Pd was also confirmed by SEM, EDS and XRD. The Bio-PdNPs improved the catalytic reduction of Cr(VI) because they were smaller and more highly dispersed than the Chem-PdNPs. Although both synthesis methods used relatively few chemical agents, none of them severely harmful, and were cost-effective, the Bio-PdNPs at a catalyst concentration of 1.1 g Bio-PdNPs L-1 gave a faster removal rate, removing 0.962 mmol L-1 of Cr(VI) within 2 h. The Langmuir–Hinshelwood mechanism was used successfully to model the kinetics of the catalytic Cr(VI) reduction. This mechanism describes bimolecular reactions on catalyst surfaces, in which two reactants adsorb onto the catalyst surface and then react with each other.
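The bimolecular Langmuir–Hinshelwood case described in the last sentence has a standard textbook form, shown below with generic symbols: reactants A and B adsorb competitively on the same sites with adsorption constants K_A and K_B, θ denotes fractional surface coverage, and k is the surface rate constant. The dissertation's fitted expression may differ (for example, by adding a Cr(III) inhibition term to the denominator). The last line is only the average removal rate implied by the reported 0.962 mmol L−1 in 2 h.

```latex
\theta_A = \frac{K_A C_A}{1 + K_A C_A + K_B C_B}, \qquad
\theta_B = \frac{K_B C_B}{1 + K_A C_A + K_B C_B}

r = k\,\theta_A\,\theta_B
  = \frac{k\,K_A C_A\,K_B C_B}{\left(1 + K_A C_A + K_B C_B\right)^{2}}

\bar{r} = \frac{0.962~\text{mmol L}^{-1}}{2~\text{h}} = 0.481~\text{mmol L}^{-1}\,\text{h}^{-1}
```

Because both reactants compete for the same sites, the rate peaks at intermediate coverages and falls when either reactant saturates the surface.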

    Chromium (VI) reduction in a microbial fuel cell using biogenic palladium nanoparticles and model development

    Microbial fuel cell (MFC) architectural modification is becoming an increasingly important area of research because of the need to improve energy recovery. In this study, we present a simple, low-cost anode modification method that improves performance without a pre-treatment step involving hazardous chemicals. The modification involves depositing granular activated carbon (GAC), which is highly conductive and has a high specific surface area, inside a carbon cloth that serves both as the anode and as the supporting material. A GAC particle size of 0.6–1.1 mm increased air-cathode MFC performance, owing both to a larger surface area available for cell attachment (879.5 m2 g-1, from Brunauer–Emmett–Teller (BET) analysis) and to a rough surface suited to cell attachment, as observed by scanning electron microscopy (SEM). The study also showed that modifying carbon cloth with GAC has an economic benefit. The second part of the study explored an environmentally friendly process for treating Cr(VI) by codepositing biologically synthesized zero-valent palladium nanoparticles on the anode of a dual-chamber MFC. The MFC featured a GAC anode modified with biogenic palladium nanoparticles (Bio-PdNPs). Temperature, pH and initial Cr(VI) concentration were first optimized to 38 °C, pH 4 and 100 mg L-1 Cr(VI), respectively. The average GAC particle size was then optimized to 0.6–1.1 mm. The results showed that GAC can be modified with Bio-PdNPs to improve the performance of a Cr(VI)-reducing MFC: a loading of 6 mg Bio-PdNPs g-1 GAC gave a peak output potential difference of 393.1 mV, a maximum power density of 1965.4 mW m-3 and complete removal of 100 mg L-1 Cr(VI) within 25 h.
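MFC power densities such as the 1965.4 mW m-3 above are normally derived from the cell voltage U measured across an external resistance R_ext, using P = U²/R_ext, then normalized by anode volume (W m−3) or area (W m−2). A minimal sketch, assuming a polarization data set of (R_ext, U) pairs; the function names and any resistances or volumes passed in are hypothetical, not the thesis data.

```python
from typing import Iterable, Tuple


def power_w(voltage_v: float, r_ext_ohm: float) -> float:
    """Electrical power delivered to the external load, P = U^2 / R_ext (W)."""
    return voltage_v ** 2 / r_ext_ohm


def max_power_density(
    polarization: Iterable[Tuple[float, float]], normalizer: float
) -> Tuple[float, float]:
    """Return (max power density, R_ext at which it occurs) from (R_ext, U) pairs.

    `normalizer` is the anode volume in m^3 (gives W m^-3) or area in m^2 (W m^-2).
    """
    best = max(polarization, key=lambda p: power_w(p[1], p[0]))
    return power_w(best[1], best[0]) / normalizer, best[0]
```

Scanning R_ext locates the maximum power point; in this sketch, multiplying a W m−3 result by 1000 gives mW m−3.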
The third part of the study developed a dynamic computational model of Cr(VI) reduction in the MFC, combining Monod kinetics with the Butler–Volmer equation. The accuracy of the parameter estimates and the model's predictive capacity were validated using two independent data sets. The normalized root mean squared errors for both Cr(VI) reduction and output voltage were below 0.2, indicating an acceptable fit to the experimental data. The model was then used to examine how the primary microbial cell and substrate concentrations affect the performance of the Cr(VI)-reducing MFC: increasing these concentrations improved the Cr(VI) reduction rate in the cathode chamber. Lastly, the model was used to optimize both concentrations. The time needed to reach maximum power output was minimized with a primary microbial cell concentration of 25 mg L-1 rather than 45 mg L-1, and the substrate concentration was optimized to 60 mmol L-1 rather than 120 mmol L-1. Overall, the model is an initial step toward determining optimal MFC operating conditions with less laboratory work.
Thesis (PhD (Chemical Engineering)), University of Pretoria, 2022. Chemical Engineering; Unrestricted.
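The 0.2 acceptance threshold above applies to a normalized RMSE. Normalization conventions vary (by the range, mean or standard deviation of the observations) and the abstract does not say which was used; the sketch below assumes range normalization.

```python
import math
from typing import Sequence


def nrmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error divided by the range of the observations (assumed convention)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))
    span = max(observed) - min(observed)
    if span == 0:
        raise ValueError("observations have zero range")
    return rmse / span


# Made-up Cr(VI) concentrations (mg/L), for illustration only.
print(nrmse([100.0, 70.0, 40.0, 15.0, 0.0], [98.0, 74.0, 37.0, 18.0, 2.0]))  # ≈ 0.029
```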
