Probability calibration trees
Obtaining accurate and well-calibrated probability estimates from classifiers is useful in many applications, for example, when minimising the expected cost of classifications. Existing methods of calibrating probability estimates are applied globally, ignoring the potential for improvement from a more fine-grained model. We propose probability calibration trees, a modification of logistic model trees that identifies regions of the input space in which different probability calibration models are learned to improve performance. We compare probability calibration trees to two widely used calibration methods, isotonic regression and Platt scaling, and show that our method results in lower root mean squared error on average than both methods, for estimates produced by a variety of base learners.
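Platt scaling maps a classifier's score s to a probability with a fitted logistic function sigmoid(a·s + b). The sketch below contrasts one global Platt model with a hypothetical one-split "tree" that fits a separate Platt model on each side of a threshold on an input feature. It is only a much simplified illustration of learning different calibration models in different regions of the input space, not the logistic-model-tree procedure described above; the function names, the gradient-descent settings and the toy data are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_platt(scores, labels, lr=0.1, epochs=1000):
    """Platt scaling: fit p = sigmoid(a*s + b) by batch gradient descent on mean log-loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def rmse(probs, labels):
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels))

def fit_one_split(regions, scores, labels, threshold):
    """Hypothetical one-split calibration 'tree': one Platt model per side of the threshold."""
    models = {}
    for side in (False, True):
        idx = [i for i, r in enumerate(regions) if (r > threshold) == side]
        models[side] = fit_platt([scores[i] for i in idx], [labels[i] for i in idx])
    return models

def predict_one_split(models, threshold, region, score):
    a, b = models[region > threshold]
    return sigmoid(a * score + b)

# Invented toy data: 10 examples per score value and region. In region 0 the positive
# rate rises with the score; in region 1 it falls, so no single global map fits both.
regions, scores, labels = [], [], []
for region, slope in ((0, 2.0), (1, -2.0)):
    for k in range(-20, 21):
        s = k / 10
        n_pos = round(10 * sigmoid(slope * s))
        for j in range(10):
            regions.append(region)
            scores.append(s)
            labels.append(1 if j < n_pos else 0)

a, b = fit_platt(scores, labels)
global_rmse = rmse([sigmoid(a * s + b) for s in scores], labels)

models = fit_one_split(regions, scores, labels, threshold=0.5)
local_probs = [predict_one_split(models, 0.5, r, s) for r, s in zip(regions, scores)]
local_rmse = rmse(local_probs, labels)
print(f"global Platt RMSE: {global_rmse:.3f}, one-split RMSE: {local_rmse:.3f}")
```

On this deliberately extreme toy data the global model can only settle near p = 0.5, whereas the per-region models can follow each region's own score-to-probability relationship.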
Tree-structured multiclass probability estimators
Nested dichotomies are used as a method of transforming a multiclass classification problem into a series of binary problems. A binary tree structure is constructed over the label space that recursively splits the set of classes into subsets, and a binary classification model learns to discriminate between the two subsets of classes at each node. Several distinct nested dichotomy structures can be combined in an ensemble to improve performance. In this thesis, we introduce two new methods for constructing more accurate nested dichotomies. Random-pair selection is a subset selection method that aims to group similar classes together in a non-deterministic fashion, making it easy to construct accurate ensembles. Multiple subset evaluation takes this, and other subset selection methods, further by evaluating several different splits and choosing the best-performing one. Finally, we also discuss the calibration of the probability estimates produced by nested dichotomies. We observe that nested dichotomies systematically produce under-confident predictions, even if the binary classifiers are well calibrated, and especially when the number of classes is high. Furthermore, substantial performance gains can be made when probability calibration methods are also applied to the internal models.
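In a nested dichotomy, the probability of a class is usually obtained by multiplying the binary probabilities along the path from the root to that class's leaf. The sketch below assumes each internal node holds a callable returning P(class lies in the right subset | x); the Node type, the constant "models" and the toy four-class tree are hypothetical stand-ins for trained binary classifiers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Optional

@dataclass
class Node:
    classes: FrozenSet[str]
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # Hypothetical binary model for an internal node: returns P(class in right subset | x).
    p_right: Optional[Callable[[dict], float]] = None

def class_probabilities(node: Node, x: dict) -> Dict[str, float]:
    """Multiply binary probabilities along each root-to-leaf path."""
    out: Dict[str, float] = {}
    stack = [(node, 1.0)]
    while stack:
        current, prob = stack.pop()
        if current.left is None or current.right is None:
            (label,) = current.classes  # a leaf holds exactly one class
            out[label] = prob
            continue
        pr = current.p_right(x)
        stack.append((current.left, prob * (1.0 - pr)))
        stack.append((current.right, prob * pr))
    return out

def leaf(c: str) -> Node:
    return Node(frozenset([c]))

# Toy tree: {a,b,c,d} splits into {a,b} | {c,d}; constants stand in for trained classifiers.
ab = Node(frozenset("ab"), leaf("a"), leaf("b"), p_right=lambda x: 0.8)
cd = Node(frozenset("cd"), leaf("c"), leaf("d"), p_right=lambda x: 0.5)
root = Node(frozenset("abcd"), ab, cd, p_right=lambda x: 0.3)

probs = class_probabilities(root, {})
print(probs)
```

With these constants, P(b) = (1 − 0.3) × 0.8 = 0.56 up to floating-point rounding, and the four class probabilities sum to one because each node splits its incoming probability mass between its two children.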
The SOD2 C47T polymorphism influences NAFLD fibrosis severity: evidence from case-control and intra-familial allele association studies.
AIMS:
Non-alcoholic fatty liver disease (NAFLD) is a complex disease trait where genetic variations and environment interact to determine disease progression. The association of PNPLA3 with advanced disease has been consistently demonstrated, but many other modifier genes remain unidentified. In NAFLD, increased fatty acid oxidation produces high levels of reactive oxygen species. Manganese-dependent superoxide dismutase (MnSOD), encoded by the SOD2 gene, plays an important role in protecting cells from oxidative stress. A common non-synonymous polymorphism in SOD2 (C47T; rs4880) is associated with decreased MnSOD mitochondrial targeting and activity, making it a good candidate modifier of NAFLD severity.
METHODS:
The relevance of the SOD2 C47T polymorphism to fibrotic NAFLD was assessed by two complementary approaches: we sought preferential transmission of alleles from parents to affected children in 71 family trios and adopted a case-control approach to compare genotype frequencies in a cohort of 502 European NAFLD patients.
RESULTS:
In the family study, 55 families were informative. The T allele was transmitted on 47/76 (62%) possible occasions, whereas the C allele was transmitted on only 29/76 (38%) occasions (p=0.038). In the case-control study, the prevalence of advanced fibrosis (stage >1) increased with the number of T alleles (p=0.008 for trend). Multivariate analysis showed that susceptibility to advanced fibrotic disease was determined by SOD2 genotype (OR 1.56, 95% CI 1.09–2.25, p=0.014), PNPLA3 genotype (p=0.041), type 2 diabetes mellitus (p=0.009) and histological severity of NASH (p=2.0×10⁻¹⁶).
CONCLUSIONS:
Carriage of the SOD2 C47T polymorphism is associated with more advanced fibrosis in NASH.
Ensembles of nested dichotomies with multiple subset evaluation
A system of nested dichotomies (NDs) is a method of decomposing a multiclass problem into a collection of binary problems. Such a system recursively applies binary splits to divide the set of classes into two subsets, and trains a binary classifier for each split. Many methods have been proposed to perform this split, each with various advantages and disadvantages. In this paper, we present a simple, general method for improving the predictive performance of NDs produced by any subset selection technique that employs randomness to construct the subsets. We provide a theoretical expectation for performance improvements, as well as empirical results showing that our method improves the root mean squared error of NDs, regardless of whether they are employed as an individual model or in an ensemble setting.
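The core loop of multiple subset evaluation, as summarised above, is: draw several candidate splits from a randomised subset selection method, evaluate each, and keep the best. The sketch below uses a plain random partition of the classes as the candidate generator (not random-pair selection) and a hypothetical toy_score in place of evaluating the binary model trained on each split; the names, the class grouping and the value of lam are assumptions.

```python
import random
from typing import Callable, FrozenSet, Sequence, Tuple

Split = Tuple[FrozenSet[str], FrozenSet[str]]

def random_split(classes: Sequence[str], rng: random.Random) -> Split:
    """Plain random partition of the classes into two non-empty subsets."""
    shuffled = list(classes)
    rng.shuffle(shuffled)
    cut = rng.randint(1, len(shuffled) - 1)
    return frozenset(shuffled[:cut]), frozenset(shuffled[cut:])

def best_of_lambda(classes: Sequence[str], score: Callable[[Split], float],
                   lam: int, rng: random.Random) -> Split:
    """Draw `lam` candidate splits and keep the one with the highest score."""
    candidates = [random_split(classes, rng) for _ in range(lam)]
    return max(candidates, key=score)

# Hypothetical class grouping used only by the toy score below.
GROUP = {"cat": "animal", "dog": "animal", "car": "vehicle", "truck": "vehicle"}

def toy_score(split: Split) -> float:
    """Hypothetical stand-in for split evaluation: share of same-group pairs kept together."""
    left, _ = split
    pairs = [(a, b) for a in GROUP for b in GROUP if a < b and GROUP[a] == GROUP[b]]
    together = sum(1 for a, b in pairs if (a in left) == (b in left))
    return together / len(pairs)

rng = random.Random(0)
best = best_of_lambda(list(GROUP), toy_score, lam=10, rng=rng)
print(sorted(best[0]), sorted(best[1]), toy_score(best))
```

Because the first candidate is always among those compared, the chosen split can never score worse than a single random draw; larger values of lam trade extra training cost for a better expected split.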
On calibration of nested dichotomies
Nested dichotomies (NDs) are used as a method of transforming a multiclass classification problem into a series of binary problems. A tree structure is induced that recursively splits the set of classes into subsets, and a binary classification model learns to discriminate between the two subsets of classes at each node. In this paper, we demonstrate that these NDs typically exhibit poor probability calibration, even when the binary base models are well calibrated. We also show that this problem is exacerbated when the binary models are poorly calibrated. We discuss the effectiveness of different calibration strategies and show that accuracy and log-loss can be significantly improved by calibrating both the internal base models and the full ND structure, especially when the number of classes is high.
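One simple way to calibrate the output of the full ND structure is to rescale its class probabilities on held-out data. The sketch below uses temperature-style rescaling (p_c raised to 1/t, then renormalised), which is a swapped-in technique chosen for illustration and not necessarily a strategy evaluated in the paper; it covers only the full structure, not the internal base models. The grid of t values and the held-out predictions are invented; t < 1 sharpens under-confident predictions.

```python
import math
from typing import Dict, Sequence

Probs = Dict[str, float]

def sharpen(probs: Probs, t: float) -> Probs:
    """Temperature-style rescaling: p_c ** (1/t), renormalised. t < 1 sharpens, t > 1 flattens."""
    powered = {c: p ** (1.0 / t) for c, p in probs.items()}
    total = sum(powered.values())
    return {c: v / total for c, v in powered.items()}

def log_loss(pred: Sequence[Probs], labels: Sequence[str]) -> float:
    return -sum(math.log(p[y]) for p, y in zip(pred, labels)) / len(labels)

def fit_temperature(pred: Sequence[Probs], labels: Sequence[str], grid=None) -> float:
    """Pick t from a grid by minimising log-loss on held-out data."""
    grid = grid or [k / 10 for k in range(1, 31)]  # t = 0.1, 0.2, ..., 3.0
    return min(grid, key=lambda t: log_loss([sharpen(p, t) for p in pred], labels))

# Invented held-out ND outputs that are under-confident: the true class usually
# has the largest probability, but only 0.5.
held_out = [
    {"a": 0.5, "b": 0.25, "c": 0.25},
    {"a": 0.25, "b": 0.5, "c": 0.25},
    {"a": 0.25, "b": 0.25, "c": 0.5},
    {"a": 0.5, "b": 0.25, "c": 0.25},
]
labels = ["a", "b", "c", "c"]

t = fit_temperature(held_out, labels)
before = log_loss(held_out, labels)
after = log_loss([sharpen(p, t) for p in held_out], labels)
print(t, before, after)
```

The single wrong-looking example (the last one) keeps the fitted t away from the smallest grid value, so the rescaling sharpens predictions without becoming arbitrarily over-confident.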
Protein Translation and Cell Death: The Role of Rare tRNAs in Biofilm Formation and in Activating Dormant Phage Killer Genes
We discovered previously that the small Escherichia coli proteins Hha (hemolysin expression modulating protein) and the adjacent, poorly characterized YbaJ are important for biofilm formation; however, their roles have been unclear. Biofilms are intricate communities in which cell signaling often converts single cells into primitive tissues. Here we show that Hha decreases biofilm formation dramatically by repressing transcription of rare codon tRNAs, which inhibits fimbriae production, and, to some extent, by repressing transcription of the fimbrial genes fimA and ihfA. In vivo binding studies show that Hha binds to the rare codon tRNAs argU, ileX, ileY, and proL and to two prophage clusters, D1P12 and CP4-57. Real-time PCR corroborated that Hha represses argU and proL, and Hha-mediated repression of type I fimbriae is abolished by adding extra copies of argU, ileY, and proL. Repression of rare codon tRNA transcription by Hha also leads to cell lysis and biofilm dispersal, through activation of the prophage lytic genes rzpD, yfjZ, appY, and alpA and through induction of the ClpP/ClpX proteases, which activate toxins by degrading antitoxins. YbaJ serves to mediate the toxicity of Hha. Hence, we have shown that a single protein (Hha) can control biofilm formation both by limiting fimbriae production and by controlling cell death. The mechanism used by Hha is control of translation via the availability of rare codon tRNAs, which reduces fimbriae production and activates prophage lytic genes. Therefore, Hha acts as a toxin in conjunction with the co-transcribed YbaJ (TomB), which attenuates Hha toxicity.
TM6SF2 rs58542926 influences hepatic fibrosis progression in patients with non-alcoholic fatty liver disease.
Non-alcoholic fatty liver disease (NAFLD) is an increasingly common condition, strongly associated with the metabolic syndrome, that can lead to progressive hepatic fibrosis, cirrhosis and hepatic failure. Subtle inter-patient genetic variation and environmental factors combine to determine variation in disease progression. A common non-synonymous polymorphism in TM6SF2 (rs58542926 c.449 C>T, p.Glu167Lys) was recently associated with increased hepatic triglyceride content, but whether this variant promotes clinically relevant hepatic fibrosis is unknown. Here we confirm that TM6SF2 minor allele carriage is associated with NAFLD and is causally related to a previously reported chromosome 19 GWAS signal that was ascribed to the gene NCAN. Furthermore, using two histologically characterized cohorts encompassing steatosis, steatohepatitis, fibrosis and cirrhosis (combined n=1,074), we demonstrate a new association, independent of potential confounding factors (age, BMI, type 2 diabetes mellitus and PNPLA3 rs738409 genotype), with advanced hepatic fibrosis/cirrhosis. These findings establish new and important clinical relevance for TM6SF2 in NAFLD.