134 research outputs found

    Changing realities: perspectives on Balinese rice cultivation

    This paper discusses issues of agrarian change in south-central Bali. The proximity to urban areas, especially the tourist centers along the southern coast, provides many off-farm employment opportunities for small-scale farming households. Although rice farming continues, for many households it has become a side business. The flexible nature of rice farming in terms of labour input, together with the availability of casual off-farm work, allows farming households to allocate their available labour to a variety of on-farm and off-farm income-generating activities. The subak, which unites farmers in the irrigation and cultivation of the rice crop, plays an important role in supporting this flexibility. Still, the future of rice farming and of the organisation behind it looks rather dim, with a younger generation unwilling to work in the “mud” and little appreciation of the many benefits the subak provides, not only to farming but also to the wider community.

    On PAC-Bayesian Bounds for Random Forests

    Existing guarantees in terms of rigorous upper bounds on the generalization error for the original random forest algorithm, one of the most frequently used machine learning methods, are unsatisfying. We discuss and evaluate various PAC-Bayesian approaches to derive such bounds. The bounds do not require additional hold-out data, because the out-of-bag samples from the bagging in the training process can be exploited. A random forest predicts by taking a majority vote of an ensemble of decision trees. The first approach is to bound the error of the vote by twice the error of the corresponding Gibbs classifier (classifying with a single member of the ensemble selected at random). However, this approach does not take into account the effect of averaging out of errors of individual classifiers when taking the majority vote. This effect provides a significant boost in performance when the errors are independent or negatively correlated, but when the correlations are strong the advantage from taking the majority vote is small. The second approach, based on PAC-Bayesian C-bounds, takes dependencies between ensemble members into account, but it requires estimating correlations between the errors of the individual classifiers. When the correlations are high or the estimation is poor, the bounds degrade. In our experiments, we compute generalization bounds for random forests on various benchmark data sets. Because the individual decision trees already perform well, their predictions are highly correlated and the C-bounds do not lead to satisfactory results. For the same reason, the bounds based on the analysis of Gibbs classifiers are typically superior and often reasonably tight. Bounds based on a validation set, which come at the cost of a smaller training set, gave better performance guarantees, but worse performance in most experiments.
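The "twice the Gibbs error" relation mentioned in this abstract can be illustrated on a toy example. The sketch below is not the authors' code: the function names and the tiny prediction table are hypothetical, and it only compares empirical errors on a fixed sample (a real PAC-Bayesian bound adds a complexity term on top of the Gibbs error).

```python
from collections import Counter

def gibbs_error(preds, labels):
    """Average error of one tree drawn uniformly at random from the ensemble."""
    n_trees, n = len(preds), len(labels)
    wrong = sum(p != y for tree in preds for p, y in zip(tree, labels))
    return wrong / (n_trees * n)

def majority_vote_error(preds, labels):
    """Error of the unweighted majority vote over all trees."""
    wrong = 0
    for i, y in enumerate(labels):
        votes = Counter(tree[i] for tree in preds)
        winner = votes.most_common(1)[0][0]
        wrong += winner != y
    return wrong / len(labels)

# Hypothetical toy data: 3 trees, 6 labelled points, binary labels.
labels = [0, 1, 1, 0, 1, 0]
preds = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 1],
]

mv = majority_vote_error(preds, labels)
gibbs = gibbs_error(preds, labels)
# Whenever the vote errs, at least half of the voters err,
# so the vote error cannot exceed twice the Gibbs error.
print(mv, gibbs, mv <= 2 * gibbs)
```

In this toy table the vote is better than the average tree, which is the averaging effect that the factor-of-two relation does not capture.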

    Revisiting Wedge Sampling for Budgeted Maximum Inner Product Search

    Top-k maximum inner product search (MIPS) is a central task in many machine learning applications. This paper extends top-k MIPS with a budgeted setting that asks for the best approximate top-k MIPS given a limit of B computational operations. We investigate recent advanced sampling algorithms, including wedge and diamond sampling, to solve it. Though the design of these sampling schemes naturally supports budgeted top-k MIPS, they suffer from the linear cost of scanning all data points to retrieve the top-k results and from performance degradation when handling negative inputs. This paper makes two main contributions. First, we show that diamond sampling is essentially a combination of wedge sampling and basic sampling for top-k MIPS. Our theoretical analysis and empirical evaluation show that wedge is competitive with (and often superior to) diamond on approximating top-k MIPS regarding both efficiency and accuracy. Second, we propose a series of algorithmic engineering techniques to deploy wedge sampling on budgeted top-k MIPS. Our novel deterministic wedge-based algorithm runs significantly faster than the state-of-the-art methods for budgeted and exact top-k MIPS while maintaining a top-5 precision of at least 80% on standard recommender system data sets. Comment: ECML-PKDD 202
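For orientation, basic randomized wedge sampling can be sketched as follows. This is a minimal illustration assuming non-negative data and query; it is not the deterministic algorithm proposed in the paper, and the names `wedge_sample_counts`, `n_samples` and the toy matrix are hypothetical. Each sample first picks a dimension j with probability proportional to q_j times the column sum of j, then picks a point i with probability proportional to its entry in column j, so point i is hit with probability proportional to its inner product with q.

```python
import random

def wedge_sample_counts(X, q, n_samples, seed=0):
    """Count wedge-sampling hits per point; counts grow with the inner product q . x_i.

    Assumes all entries of X and q are non-negative.
    """
    rng = random.Random(seed)
    n, d = len(X), len(q)
    col_sums = [sum(X[i][j] for i in range(n)) for j in range(d)]
    # Dimension j is chosen with probability proportional to q_j * col_sums[j].
    dim_weights = [q[j] * col_sums[j] for j in range(d)]
    counts = [0] * n
    for _ in range(n_samples):
        j = rng.choices(range(d), weights=dim_weights)[0]
        # Within dimension j, point i is chosen with probability x_ij / col_sums[j].
        i = rng.choices(range(n), weights=[X[r][j] for r in range(n)])[0]
        counts[i] += 1
    return counts

# Hypothetical toy data: the third point has by far the largest inner product.
X = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [5.0, 5.0, 5.0]]
q = [1.0, 1.0, 1.0]
counts = wedge_sample_counts(X, q, n_samples=2000)
best = max(range(len(X)), key=lambda i: counts[i])
print(counts, best)
```

Here `n_samples` plays the role of a sampling budget. In practice the highest-count candidates would typically be re-ranked by their exact inner products before returning the top-k.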

    Information Bottleneck: Exact Analysis of (Quantized) Neural Networks

    The information bottleneck (IB) principle has been suggested as a way to analyze deep neural networks. The learning dynamics are studied by inspecting the mutual information (MI) between the hidden layers and the input and output. Notably, separate fitting and compression phases during training have been reported. This led to some controversy, including claims that the observations are not reproducible and strongly dependent on the type of activation function used as well as on the way the MI is estimated. Our study confirms that different ways of binning when computing the MI lead to qualitatively different results, either supporting or refuting IB conjectures. To resolve the controversy, we study the IB principle in settings where MI is non-trivial and can be computed exactly. We monitor the dynamics of quantized neural networks, that is, we discretize the whole deep learning system so that no approximation is required when computing the MI. This allows us to quantify the information flow without measurement errors. In this setting, we observed a fitting phase for all layers and a compression phase for the output layer in all experiments; the compression in the hidden layers was dependent on the type of activation function. Our study shows that the initial IB results were not artifacts of binning when computing the MI. However, the critical claim that the compression phase may not be observed for some networks also holds true.
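When every variable is discrete, as in a fully quantized network, the mutual information of the empirical joint distribution follows directly from counts, with no binning step. The sketch below is an illustration under that assumption, not the authors' implementation; the function name and the toy inputs are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ts):
    """Mutual information (in bits) of the empirical joint distribution of two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ts))
    count_x = Counter(xs)
    count_t = Counter(ts)
    mi = 0.0
    for (x, t), c in joint.items():
        p_xt = c / n
        # p(x,t) / (p(x) p(t)) expressed with raw counts.
        mi += p_xt * log2(c * n / (count_x[x] * count_t[t]))
    return mi

# Hypothetical toy inputs: the layer output is a tuple of quantized activations.
inputs = [0, 0, 1, 1]
layer_copy = [(0,), (0,), (1,), (1,)]    # layer fully determined by the input
layer_indep = [(0,), (1,), (0,), (1,)]   # layer independent of the input
print(mutual_information(inputs, layer_copy))
print(mutual_information(inputs, layer_indep))
```

Using hashable tuples for the layer state lets the same function handle whole hidden layers rather than single units.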

    Inverse sequence similarity of proteins does not imply structural similarity

    There is a debate on the folding of proteins with inverted sequences. Theoretical approaches and experiments give contradictory results. Many proteins in the Protein Data Bank (PDB) show conspicuous inverse sequence similarity (ISS) to each other. Here we analyze whether this ISS is related to structural similarity. For the first time, we performed a large-scale three-dimensional (3-D) superposition of corresponding Cα atoms of forwardly and inversely aligned proteins and tested the degree of secondary structure identity between them. Comparing proteins of less than 50% pairwise sequence identity, only 0.5% of the inversely aligned pairs had similar folds (99 out of 19 073), whereas about 9% of forwardly aligned proteins in the same score and length range show similar 3-D structures (1731 out of 19 248). This observation strongly supports the view that the inversion of sequences in almost all cases leads to a different folding property of the protein. Inverted sequences are thus suitable as protein-like sequences for control purposes without relations to existing proteins.
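The two percentages quoted in this abstract can be checked from the raw counts it reports. The snippet below is only an arithmetic check; the variable names are hypothetical.

```python
# Counts as reported in the abstract.
inverse_similar, inverse_total = 99, 19073
forward_similar, forward_total = 1731, 19248

inverse_rate = inverse_similar / inverse_total
forward_rate = forward_similar / forward_total

# Percentages rounded to one decimal place.
inverse_pct = round(100 * inverse_rate, 1)
forward_pct = round(100 * forward_rate, 1)
print(inverse_pct, forward_pct)
```

The rounded values agree with the "0.5%" and "about 9%" stated in the text.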

    Regulation of mammalian cell cycle progression in the regenerating liver.

    The process of cell division in mammalian cells is orchestrated by cell-cycle-dependent oscillations of cyclin protein levels. Cyclin levels are controlled by redundant transcriptional, post-translational and degradation feedback loops. How each of these separate loops contributes to the regulation of the key cell cycle events and to the connection between the G1-S transition and the subsequent mitotic events is under investigation. Here, we present an integrated computational model of the mammalian cell cycle based on the sequential activation of cyclins. We validate the model against experimental data on liver cells (hepatocytes), which undergo one or two rounds of synchronous, circadian-clock-gated cell divisions during liver regeneration after partial hepatectomy (PH). The model exhibits bandpass filter properties that allow the system to ignore strong but transient, or sustained but weak, damage after PH. Bifurcation analysis of the model suggests two different threshold mechanisms for the progression of the cell through mitosis. These results are consistent with the notion that the mitotic exit in mammalian cells is bistable, and suggest that Cdc20 homologue 1 (Cdh1) is an important regulator of mitosis. Regulation by Cdh1 also explains the observed G2/M phase prolongation after hepatocyte growth factor (HGF) stimulation during S phase.

    Second Order PAC-Bayesian Bounds for the Weighted Majority Vote

    We present a novel analysis of the expected risk of weighted majority vote in multiclass classification. The analysis takes correlation of predictions by ensemble members into account and provides a bound that is amenable to efficient minimization, which yields improved weighting for the majority vote. We also provide a specialized version of our bound for binary classification, which allows additional unlabeled data to be exploited for tighter risk estimation. In experiments, we apply the bound to improve weighting of trees in random forests and show that, in contrast to the commonly used first order bound, minimization of the new bound typically does not lead to degradation of the test error of the ensemble.
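A second-order analysis works with pairs of ensemble members instead of single members. The abstract does not spell out the bound, so the sketch below assumes the standard second-order Markov argument: the vote error is at most four times the expected "tandem loss", the probability that two independently drawn members err on the same point. The function names and toy data are hypothetical, uniform weights are assumed, and only empirical quantities are compared (no PAC-Bayesian complexity term).

```python
def tandem_loss(preds, labels):
    """Expected tandem loss under uniform weights.

    For each point, the chance that two independently drawn members both err
    equals the squared fraction of erring members.
    """
    m = len(preds)
    total = 0.0
    for i, y in enumerate(labels):
        frac_wrong = sum(tree[i] != y for tree in preds) / m
        total += frac_wrong ** 2
    return total / len(labels)

def vote_error(preds, labels):
    """Binary majority vote error; ties are conservatively counted as errors."""
    m = len(preds)
    wrong = 0
    for i, y in enumerate(labels):
        frac_wrong = sum(tree[i] != y for tree in preds) / m
        wrong += frac_wrong >= 0.5
    return wrong / len(labels)

# Hypothetical toy data: 3 trees, 6 labelled points, binary labels.
labels = [0, 1, 1, 0, 1, 0]
preds = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 1],
]

tl = tandem_loss(preds, labels)
ve = vote_error(preds, labels)
print(ve, tl, ve <= 4 * tl)
```

Because the tandem loss depends on how often members err together, it is smaller when errors are spread across different points, which is how correlation between members enters the analysis.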

    Altered Cytokine Response of Human Brain Endothelial Cells after Stimulation with Malaria Patient Plasma

    Infections with the deadliest malaria parasite, Plasmodium falciparum, are accompanied by a strong immunological response of the human host. To date, more than 30 cytokines have been detected at elevated levels in the plasma of malaria patients compared with healthy controls. Endothelial cells (ECs) are a potential source of these cytokines, but so far it is not known whether their cytokine secretion depends on direct contact of the P. falciparum-infected erythrocytes (IEs) with ECs in terms of cytoadhesion. Culturing ECs with plasma from malaria patients (27 returning travellers) resulted in significantly increased secretion of IL-11, CXCL5, CXCL8, CXCL10, vascular endothelial growth factor (VEGF) and angiopoietin-like protein 4 (ANGPTL4) compared with matching controls (22 healthy individuals). The accompanying transcriptome study of the ECs identified 43 genes that were significantly increased in expression (≥1.7-fold) after co-incubation with malaria patient plasma, including cxcl5 and angptl4. Further bioinformatic analyses revealed that biological processes such as cell migration, cell proliferation and tube development were particularly affected in these ECs. It can thus be postulated that not only the cytoadhesion of IEs but also molecules in the plasma of malaria patients exert an influence on ECs, and that not only the immunological response but also other processes, such as angiogenesis, are altered.