
    Template-based protein structure prediction in CASP11 and retrospect of I-TASSER in the last decade

    We report the structure prediction results of a new composite pipeline for template-based modeling (TBM) in the 11th CASP experiment. Starting from multiple structure templates identified by LOMETS based meta-threading programs, the QUARK ab initio folding program is extended to generate initial full-length models under strong constraints from template alignments. The final atomic models are then constructed by I-TASSER based fragment reassembly simulations, followed by the fragment-guided molecular dynamic simulation and the MQAP-based model selection. It was found that the inclusion of QUARK-TBM simulations as an intermediate modeling step could help improve the quality of the I-TASSER models for both Easy and Hard TBM targets. Overall, the average TM-score of the first I-TASSER model is 12% higher than that of the best LOMETS templates, with the RMSD in the same threading-aligned regions reduced from 5.8 to 4.7 Å. Nevertheless, there are nearly 18% of TBM domains with the templates deteriorated by the structure assembly pipeline, which may be attributed to the errors of secondary structure and domain orientation predictions that propagate through and degrade the procedures of template identification and final model selections. To examine the record of progress, we made a retrospective report of the I-TASSER pipeline in the last five CASP experiments (CASP7-11). The data show no clear progress of the LOMETS threading programs over PSI-BLAST; but obvious progress on structural improvement relative to threading templates was witnessed in recent CASP experiments, which is probably attributed to the integration of the extended ab initio folding simulation with the threading assembly pipeline and the introduction of atomic-level structure refinements following the reduced modeling simulations. Proteins 2016; 84(Suppl 1):233-246. © 2015 Wiley Periodicals, Inc. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/134137/1/prot24918.pd

    Features weight estimation using a genetic algorithm for customer churn prediction in the telecom sector

    © Springer Nature Switzerland AG 2019. A high-dimensional dataset results in more noise, requires more computation, and has huge sparsity linked with its high-dimensional features, which has introduced great challenges in data analysis. To efficiently address the challenges posed by high-dimensional datasets, researchers have used several feature reduction methods. Feature reduction is a formidable step when improving the accuracy and reducing the processing time on high-dimensional data; the feature set is reduced before applying data mining or statistical methods. However, with attribute reduction, there is a high chance of losing important information. One way to avoid information loss is to have a domain expert assign weights to the attributes, which is a subjective exercise. It is not only costly but also requires a human expert in the field. Therefore, there is a need for a technique that automatically assigns more appropriate weights without involving a domain expert. This paper presents a novel feature weighting technique. The technique employs a genetic algorithm (GA) to automatically assign weights to the attributes based on Naïve Bayes (NB) classification. Experiments have been conducted on a publicly available dataset to compare the performance of the proposed approach and the NB approach without weighted features for predicting customer churn in the telecommunication sector. The experimental results demonstrate that the proposed technique outperformed the NB approach without weighted features, achieving an overall 89.1% accuracy and 95.65% precision, which shows the effectiveness of the proposed technique.

    A community challenge to evaluate RNA-seq, fusion detection, and isoform quantification methods for cancer discovery

    The accurate identification and quantitation of RNA isoforms present in the cancer transcriptome is key for analyses ranging from the inference of the impacts of somatic variants to pathway analysis to biomarker development and subtype discovery. The ICGC-TCGA DREAM Somatic Mutation Calling in RNA (SMC-RNA) challenge was a crowd-sourced effort to benchmark methods for RNA isoform quantification and fusion detection from bulk cancer RNA sequencing (RNA-seq) data. It concluded in 2018 with a comparison of 77 fusion detection entries and 65 isoform quantification entries on 51 synthetic tumors and 32 cell lines with spiked-in fusion constructs. We report the entries used to build this benchmark, the leaderboard results, and the experimental features associated with the accurate prediction of RNA species. This challenge required submissions to be in the form of containerized workflows, meaning each of the entries described is easily reusable through CWL and Docker containers at https://github.com/SMC-RNA-challenge. A record of this paper's transparent peer review process is included in the supplemental information.

    Computational Methods for Resolving Heterogeneity in Biological Data

    The complexity in biological data reflects the heterogeneous nature of biological processes. Computational methods need to preserve as much information regarding the biological process of interest as possible. In this work, we explore three specific tasks about resolving biological heterogeneity. The first task is to infer heterogeneous phylogenetic relationships using molecular data. The common likelihood models for phylogenetic inference often make strong assumptions about the evolution process across different lineages and different mutation sites. We use a convolutional neural network to infer phylogenies instead, allowing the model to describe more heterogeneous evolution processes. The model outperforms commonly used algorithms on diverse simulation datasets. The second task is to infer the clonal composition and phylogeny from bulk DNA sequencing data of tumour samples. Estimating clonal information from bulk data often involves resolving mixture models. Unfortunately, simpler models are often unable to capture complex genetic alteration events in tumour cells, while more sophisticated models incur heavy computational burdens and are hard to converge. We solve the challenge through density-hinted optimization with post hoc adjustment. The model makes conservative predictions but yields better accuracy in assessing co-clustering relationships among the somatic mutations. The third task is to estimate the abundance of splicing transcripts from full-length single-cell RNA sequencing data. Transcript inference from RNA sequencing data needs a plethora of reads for accurate abundance estimation. Yet single-cell sequencing yields far fewer reads than bulk sequencing. To recover transcripts from full-length single-cell RNA sequencing data, we pool reads from similar cells to help assign transcripts without disrupting the cluster structures. These methods describe complex biological processes with minimal runtime overhead.
    Taking these methods as examples, we will briefly discuss the rationale and some general principles in designing these methods. PhD. Bioinformatics. University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/151385/1/zhanghj_1.pd

    Stabilization for two-dimensional delta operator systems with time-varying delays and actuator saturation

    In this paper, stabilization is studied for a two-dimensional delta operator system with time-varying delays and actuator saturation. Both lower and upper bounds of the time-varying delays are considered. An estimate of the domain of attraction for the two-dimensional delta operator system is introduced to analyze stability of the closed-loop system. A state feedback controller is designed via a Lyapunov–Krasovskii functional approach for the two-dimensional delta operator system with time-varying delays and actuator saturation. Two numerical examples are given to illustrate the effectiveness and advantages of the developed techniques.

    Non-fragile robust stabilization and H-infinity control for uncertain stochastic nonlinear time-delay systems

    This paper deals with the problem of non-fragile robust stabilization and H∞ control for a class of uncertain stochastic nonlinear time-delay systems. The parametric uncertainties are real time-varying as well as norm bounded. The time-delay factors are unknown and time-varying with known bounds. The aim is to design a memoryless non-fragile state feedback control law such that the closed-loop system is stochastically asymptotically stable in the mean square and the effect of the disturbance input on the controlled output is less than a prescribed level for all admissible parameter uncertainties. New sufficient conditions for the existence of such controllers are presented based on the linear matrix inequalities (LMIs) approach. A numerical example is given to illustrate the effectiveness of the developed techniques.

    Joint learning improves protein abundance prediction in cancers

    Background: The classic central dogma in biology is the information flow from DNA to mRNA to protein, yet complicated regulatory mechanisms underlying protein translation often lead to weak correlations between mRNA and protein abundances. This is particularly the case in cancer samples and when evaluating the same gene across multiple samples. Results: Here, we report a method for predicting proteome from transcriptome, using a training dataset provided by NCI-CPTAC and TCGA, consisting of transcriptome and proteome data from 77 breast and 105 ovarian cancer samples. First, we establish a generic model capturing the correlation between mRNA and protein abundance of a single gene. Second, we build a gene-specific model capturing the interdependencies among multiple genes in a regulatory network. Third, we create a cross-tissue model by joint learning the information of shared regulatory networks and pathways across cancer tissues. Our method ranked first in the NCI-CPTAC DREAM Proteogenomics Challenge, and the predictive performance is close to the accuracy of experimental replicates. Key functional pathways and network modules controlling the proteomic abundance in cancers were revealed, in particular metabolism-related genes. Conclusions: We present a method to predict proteome from transcriptome, leveraging data from different cancer tissues to build a trans-tissue model, and suggest how to integrate information from multiple cancers to provide a foundation for further research. http://deepblue.lib.umich.edu/bitstream/2027.42/173531/1/12915_2019_Article_730.pd

    Effects of hydrostatic stress and concentration-dependent elastic modulus on diffusion-induced stresses in cylindrical Li-ion batteries

    The effects of hydrostatic stress and concentration-dependent elastic modulus on diffusion-induced stress (DIS) in a cylindrical Li-ion battery are studied. It is found that the hydrostatic stress has little effect on the distribution of stresses, whereas the change of elastic modulus has a significant effect. The hydrostatic stress also has little effect on the location of the maximum hoop stress in the active layer. The change of elastic modulus can slow the shift of the location of the maximum hoop stress in the active layer toward the inner surface when the current collector is thicker or has a larger modulus, and can speed up its shift toward the outer surface when the ratio of electrode radius to thickness is smaller. The current collector should be as thin and soft as possible, provided its strength requirement is satisfied. The ratio of electrode radius to thickness should preferably be larger than 15.