8 research outputs found

    Active site prediction using evolutionary and structural information

    Motivation: The identification of catalytic residues is a key step in understanding the function of enzymes. While a variety of computational methods have been developed for this task, accuracies have remained fairly low. The best existing method exploits information from sequence and structure to achieve a precision (the fraction of predicted catalytic residues that are catalytic) of 18.5% at a corresponding recall (the fraction of catalytic residues identified) of 57% on a standard benchmark. Here we present a new method, Discern, which provides a significant improvement over the state-of-the-art through the use of statistical techniques to derive a model with a small set of features that are jointly predictive of enzyme active sites.
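
The precision and recall definitions quoted in this abstract can be sketched in a few lines. The residue positions below are hypothetical values chosen for illustration; they are not taken from the Discern benchmark.

```python
def precision_recall(predicted, catalytic):
    """Return (precision, recall) for two collections of residue positions."""
    predicted, catalytic = set(predicted), set(catalytic)
    true_positives = len(predicted & catalytic)
    # Precision: fraction of predicted catalytic residues that are catalytic.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of catalytic residues that were identified.
    recall = true_positives / len(catalytic) if catalytic else 0.0
    return precision, recall


# Hypothetical residue positions, for illustration only.
predicted = {12, 45, 57, 102, 195}
catalytic = {57, 102, 195, 214}
precision, recall = precision_recall(predicted, catalytic)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Because catalytic residues are a small minority of all residues, a method can reach a moderate recall while its precision stays low, which is the regime the abstract describes.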

    Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity

    We study the problem of aggregation under the squared loss in the model of regression with deterministic design. We obtain sharp PAC-Bayesian risk bounds for aggregates defined via exponential weights, under general assumptions on the distribution of errors and on the functions to aggregate. We then apply these results to derive sparsity oracle inequalities.
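
As a rough illustration of an aggregate defined via exponential weights, the sketch below weights each candidate function by its prior mass times the exponential of its negative empirical squared loss, scaled by a temperature. The function names, the toy data, the uniform prior, and the temperature value `beta = 1.0` are assumptions made for this example; the abstract does not specify them.

```python
import math


def exponential_weights(y, candidates, prior, beta):
    """Weights proportional to prior_j * exp(-n * loss_j / beta).

    `candidates[j][i]` is candidate j evaluated at design point i.
    """
    n = len(y)
    # Empirical squared loss (1/n) * sum_i (y_i - f_j(x_i))^2 per candidate.
    losses = [sum((yi - fi) ** 2 for yi, fi in zip(y, f)) / n for f in candidates]
    # Subtract the smallest loss before exponentiating; this leaves the
    # normalized weights unchanged and avoids underflow.
    best = min(losses)
    raw = [p * math.exp(-n * (loss - best) / beta) for p, loss in zip(prior, losses)]
    total = sum(raw)
    return [r / total for r in raw]


def aggregate(candidates, weights):
    """Convex combination of the candidates at each design point."""
    n = len(candidates[0])
    return [sum(w * f[i] for w, f in zip(weights, candidates)) for i in range(n)]


# Toy data, for illustration only.
y = [1.0, 2.0, 3.0, 4.0]
candidates = [
    [1.1, 1.9, 3.2, 3.9],  # close to y
    [0.0, 0.0, 0.0, 0.0],  # constant zero
    [2.5, 2.5, 2.5, 2.5],  # constant mean
]
prior = [1 / 3, 1 / 3, 1 / 3]
weights = exponential_weights(y, candidates, prior, beta=1.0)
fitted = aggregate(candidates, weights)
```

A smaller temperature concentrates the weights on the best-fitting candidate, while a larger one keeps the aggregate closer to the prior average.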

    L1pred: A Sequence-Based Prediction Tool for Catalytic Residues in Enzymes with the L1-logreg Classifier

    Identifying catalytic residues is usually the first step toward understanding enzyme function. Knowledge of catalytic residues is also useful for protein engineering and drug design. However, identifying catalytic residues experimentally remains costly and time-consuming, so computational prediction methods have been explored. Here, we developed a new algorithm, L1pred, for catalytic residue prediction, which uses the L1-logreg classifier to integrate eight sequence-based scoring functions. We tested L1pred and compared it against several existing sequence-based methods on two carefully designed datasets, Data604 and Data63. With ten-fold cross-validation on the training dataset, Data604, L1pred achieved an area under the precision-recall curve (AUPR) of 0.2198 and an area under the ROC curve (AUC) of 0.9494. On the independent test dataset, Data63, it achieved an AUPR of 0.2636 and an AUC of 0.9375. Compared with other sequence-based methods, L1pred showed the best performance on both datasets. We also analyzed the importance of each attribute in the algorithm and found that all the scores contributed roughly equally to L1pred's performance.
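
The two metrics reported in this abstract can behave very differently on imbalanced data. The sketch below computes the AUC as a pairwise ranking probability and approximates the AUPR by average precision, which is one common estimator and not necessarily the one used for L1pred. The labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    Assumes at least one positive label.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / hits


# Hypothetical scores: 2 catalytic residues (label 1) among 10 residues.
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.15, 0.1, 0.05]
auc = roc_auc(labels, scores)
aupr = average_precision(labels, scores)
print(f"AUC={auc:.3f} AUPR={aupr:.3f}")
```

With few positives, a handful of high-scoring negatives barely moves the AUC but lowers the precision noticeably, which is consistent with the abstract reporting a high AUC alongside a much lower AUPR.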

    Textual Analysis in Real Estate

    This paper incorporates text data from MLS listings in Atlanta, GA, into a hedonic pricing model. Including text decreases pricing error by more than 25%. Information from text is incorporated into a linear model using a tokenization approach, which allows the implicit prices of various words and phrases to be estimated. The estimation focuses on simultaneous variable selection and estimation for linear models with a large number of variables. The LASSO procedure and its variants are shown to outperform least squares in out-of-sample testing.
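
A minimal sketch of the tokenization-plus-LASSO idea follows, assuming word-indicator features and a plain coordinate-descent solver for the objective (1/2n)·||y − Xb||² + penalty·||b||₁. The listings, the vocabulary, the price premiums, the penalty values, and the choice of solver are all hypothetical; the abstract does not describe the paper's actual implementation, and the intercept is omitted here for brevity.

```python
def indicator_features(texts, vocabulary):
    """One row per listing; 1.0 where the listing text contains the word."""
    rows = []
    for text in texts:
        tokens = set(text.lower().split())
        rows.append([1.0 if word in tokens else 0.0 for word in vocabulary])
    return rows


def soft_threshold(value, penalty):
    """Shrink `value` toward zero by `penalty`, clipping at zero."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, sweeps=200):
    """Minimize (1/2n)||y - Xb||^2 + penalty * ||b||_1 by cyclic updates."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    for _ in range(sweeps):
        for j in range(p):
            column = [row[j] for row in X]
            scale = sum(x * x for x in column) / n
            if scale == 0.0:
                continue  # word never appears; leave its coefficient at zero
            # Correlation of column j with the residual that excludes
            # column j's own current contribution.
            rho = sum(x * (r + x * beta[j]) for x, r in zip(column, residual)) / n
            updated = soft_threshold(rho, penalty) / scale
            residual = [r - x * (updated - beta[j]) for x, r in zip(column, residual)]
            beta[j] = updated
    return beta


# Hypothetical listing remarks with made-up log-price premiums.
listings = [
    ("renovated kitchen with pool", 0.30),
    ("fixer upper needs work", -0.25),
    ("pool and large yard", 0.10),
    ("renovated bath", 0.20),
    ("fixer with pool", -0.15),
    ("quiet street", 0.00),
]
vocabulary = ["pool", "renovated", "fixer"]
X = indicator_features([text for text, _ in listings], vocabulary)
y = [premium for _, premium in listings]

beta_ols = lasso_coordinate_descent(X, y, penalty=0.0)     # no shrinkage
beta_small = lasso_coordinate_descent(X, y, penalty=0.01)  # mild shrinkage
beta_sparse = lasso_coordinate_descent(X, y, penalty=0.1)  # all words dropped
```

Each coefficient plays the role of an implicit price for its word. Raising the penalty shrinks the coefficients and eventually sets them exactly to zero, which is how the LASSO performs variable selection and estimation at the same time.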

    Rivaroxaban with or without aspirin in stable cardiovascular disease

    BACKGROUND: We evaluated whether rivaroxaban alone or in combination with aspirin would be more effective than aspirin alone for secondary cardiovascular prevention. METHODS: In this double-blind trial, we randomly assigned 27,395 participants with stable atherosclerotic vascular disease to receive rivaroxaban (2.5 mg twice daily) plus aspirin (100 mg once daily), rivaroxaban (5 mg twice daily), or aspirin (100 mg once daily). The primary outcome was a composite of cardiovascular death, stroke, or myocardial infarction. The study was stopped for superiority of the rivaroxaban-plus-aspirin group after a mean follow-up of 23 months. RESULTS: The primary outcome occurred in fewer patients in the rivaroxaban-plus-aspirin group than in the aspirin-alone group (379 patients [4.1%] vs. 496 patients [5.4%]; hazard ratio, 0.76; 95% confidence interval [CI], 0.66 to 0.86; P<0.001; z=−4.126), but major bleeding events occurred in more patients in the rivaroxaban-plus-aspirin group (288 patients [3.1%] vs. 170 patients [1.9%]; hazard ratio, 1.70; 95% CI, 1.40 to 2.05; P<0.001). There was no significant difference in intracranial or fatal bleeding between these two groups. There were 313 deaths (3.4%) in the rivaroxaban-plus-aspirin group as compared with 378 (4.1%) in the aspirin-alone group (hazard ratio, 0.82; 95% CI, 0.71 to 0.96; P=0.01; threshold P value for significance, 0.0025). The primary outcome did not occur in significantly fewer patients in the rivaroxaban-alone group than in the aspirin-alone group, but major bleeding events occurred in more patients in the rivaroxaban-alone group. CONCLUSIONS: Among patients with stable atherosclerotic vascular disease, those assigned to rivaroxaban (2.5 mg twice daily) plus aspirin had better cardiovascular outcomes and more major bleeding events than those assigned to aspirin alone. Rivaroxaban (5 mg twice daily) alone did not result in better cardiovascular outcomes than aspirin alone and resulted in more major bleeding events.
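
As a quick arithmetic check on the statistics quoted in this abstract, the sketch below converts the reported z value into a P value and compares the reported P value for deaths with the stated threshold. It assumes a standard normal reference distribution and a two-sided test; the abstract does not state how the reported P values were computed.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided P value for a standard normal test statistic (assumption)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Primary outcome, rivaroxaban plus aspirin vs. aspirin alone (from the abstract).
z_primary = -4.126
p_primary = two_sided_p_from_z(z_primary)

# Deaths: reported P value and reported threshold for significance.
p_death, threshold = 0.01, 0.0025
death_meets_threshold = p_death < threshold

print(f"primary outcome: p = {p_primary:.1e}")
print(f"deaths meet the 0.0025 threshold: {death_meets_threshold}")
```

Under these assumptions the primary-outcome P value falls well below 0.001, in line with the reported "P<0.001", while the death comparison (P=0.01) does not meet the 0.0025 threshold.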