Changing realities: perspectives on Balinese rice cultivation
This paper discusses issues of agrarian change in south-central Bali. The proximity to urban areas, especially the tourist centers along the southern coast, provides many off-farm employment opportunities for small-scale farming households. Although rice farming continues, for many households it has become a side business. The flexible labour requirements of rice farming, together with the availability of casual off-farm work, allow farming households to allocate their labour to a variety of on-farm and off-farm income-generating activities. The subak, which unites farmers in the irrigation and cultivation of the rice crop, plays an important role in supporting this flexibility. Still, the future of rice farming and of the organisation behind it looks rather dim, with a younger generation unwilling to work in the “mud” and little appreciation of the many benefits the subak provides not only to farmers but also to the wider community.
On PAC-Bayesian Bounds for Random Forests
Existing guarantees in terms of rigorous upper bounds on the generalization
error for the original random forest algorithm, one of the most frequently used
machine learning methods, are unsatisfying. We discuss and evaluate various
PAC-Bayesian approaches to derive such bounds. The bounds do not require
additional hold-out data, because the out-of-bag samples from the bagging in
the training process can be exploited. A random forest predicts by taking a
majority vote of an ensemble of decision trees. The first approach is to bound
the error of the vote by twice the error of the corresponding Gibbs classifier
(classifying with a single member of the ensemble selected at random). However,
this approach does not take into account how the errors of individual
classifiers average out when taking the majority vote. This effect provides a
significant boost in performance when the errors are independent or negatively
correlated, but when the correlations are strong the advantage from taking the
majority vote is small. The second approach, based on PAC-Bayesian C-bounds,
takes dependencies between ensemble members into account, but it requires
estimating correlations between the errors of the individual classifiers. When
the correlations are high or the estimation is poor, the bounds degrade. In our
experiments, we compute generalization bounds for random forests on various
benchmark data sets. Because the individual decision trees already perform
well, their predictions are highly correlated and the C-bounds do not lead to
satisfactory results. For the same reason, the bounds based on the analysis of
Gibbs classifiers are typically superior and often reasonably tight. Bounds
based on a validation set, which come at the cost of a smaller training set,
gave better performance guarantees but worse performance in most experiments.
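To make the two approaches concrete, here is a minimal sketch that computes, on a labelled sample (for example out-of-bag points), the empirical Gibbs risk L, the majority-vote risk, the expected disagreement d between two independently drawn trees, and the two resulting upper bounds on the vote's risk: 2L, and the C-bound 1 - (1 - 2L)^2 / (1 - 2d). It assumes binary labels in {-1, +1} and uniform tree weights; the function names are hypothetical. Both inequalities hold for any distribution, including the empirical one. The PAC-Bayesian versions discussed above additionally account for estimation error (typically through a KL-divergence term), which this sketch omits.

```python
import numpy as np

def vote_statistics(preds: np.ndarray, y: np.ndarray):
    """preds: (n_trees, n_samples) votes in {-1, +1}; y: (n_samples,) labels in {-1, +1}."""
    errors = preds != y                      # per-tree, per-sample 0-1 losses
    gibbs_risk = errors.mean()               # risk of one uniformly drawn tree
    mean_vote = preds.mean(axis=0)           # E_h[h(x)] under uniform weights
    mv_risk = (mean_vote * y <= 0).mean()    # majority vote; ties counted as errors
    # For +/-1 votes, P(h(x) != h'(x)) = (1 - E_h[h(x)]^2) / 2 for independent h, h'.
    disagreement = ((1.0 - mean_vote ** 2) / 2.0).mean()
    return gibbs_risk, mv_risk, disagreement

def first_order_bound(gibbs_risk):
    # The vote errs only if at least half of the trees err, hence L(MV) <= 2 L(Gibbs).
    return min(1.0, 2.0 * gibbs_risk)

def c_bound(gibbs_risk, disagreement):
    # Cantelli's inequality on the margin; needs first moment mu1 > 0 (Gibbs risk < 1/2).
    mu1 = 1.0 - 2.0 * gibbs_risk             # first moment of the margin
    mu2 = 1.0 - 2.0 * disagreement           # second moment of the margin
    if mu1 <= 0:
        return 1.0
    return 1.0 - mu1 ** 2 / mu2

# Toy ensemble: 3 trees, 4 samples, all labels +1.
preds = np.array([[1, 1, -1, -1], [1, -1, 1, -1], [1, 1, 1, -1]])
y = np.array([1, 1, 1, 1])
g, mv, d = vote_statistics(preds, y)
print(mv, first_order_bound(g), c_bound(g, d))  # 0.25, approx. 0.833, approx. 0.95
```

In this toy case the trees agree a lot, so the disagreement term is small and the C-bound ends up looser than the first-order bound, which is the same qualitative effect the abstract reports for highly correlated trees.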
Revisiting Wedge Sampling for Budgeted Maximum Inner Product Search
Top-k maximum inner product search (MIPS) is a central task in many machine
learning applications. This paper extends top-k MIPS to a budgeted setting,
which asks for the best approximate top-k MIPS given a limit of B computational
operations. We investigate recent advanced sampling algorithms, including wedge
and diamond sampling, to solve it. Although the design of these sampling schemes
naturally supports budgeted top-k MIPS, they suffer from the linear cost of
scanning all data points to retrieve top-k results and from performance
degradation when handling negative inputs.
This paper makes two main contributions. First, we show that diamond sampling
is essentially a combination of wedge sampling and basic sampling for
top-k MIPS. Our theoretical analysis and empirical evaluation show that wedge
sampling is competitive with (often superior to) diamond sampling on
approximating top-k MIPS in terms of both efficiency and accuracy. Second, we
propose a series of
algorithmic engineering techniques to deploy wedge sampling on budgeted top-k
MIPS. Our novel deterministic wedge-based algorithm runs significantly faster
than the state-of-the-art methods for budgeted and exact top-k MIPS while
maintaining a top-5 precision of at least 80% on standard recommender system
data sets.
Comment: ECML-PKDD 202
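The abstract does not spell out the sampling procedure, so the following is a minimal sketch of classical randomized wedge sampling followed by exact re-ranking, not the paper's deterministic algorithm. Dimension j is drawn with probability proportional to |q_j| · Σ_i |X_ij|, then point i with probability proportional to |X_ij|, and point i's counter is incremented by sign(q_j X_ij). Each sample's expected increment is proportional to the inner product q · x_i, which is how the sign trick handles negative inputs. The function name and the interpretation of `budget` as "number of candidates re-ranked exactly" are assumptions for illustration.

```python
import numpy as np

def wedge_sampling_topk(X, q, k, num_samples, budget, rng=None):
    """Approximate top-k MIPS: wedge-sampling screening, then exact re-ranking.

    X: (n, d) data matrix; q: (d,) query. `budget` (hypothetical parameter) is the
    number of highest-count candidates whose exact inner products are computed.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    abs_x = np.abs(X)
    col_mass = abs_x.sum(axis=0)              # c_j = sum_i |X_ij|
    dim_weights = np.abs(q) * col_mass        # P(j) proportional to |q_j| c_j
    total = dim_weights.sum()
    if total == 0:                            # zero query: every score is 0
        return np.arange(min(k, n))
    dims = rng.choice(d, size=num_samples, p=dim_weights / total)
    counters = np.zeros(n)
    for j, s in zip(*np.unique(dims, return_counts=True)):
        # Draw s points for dimension j with P(i | j) proportional to |X_ij|.
        points = rng.choice(n, size=s, p=abs_x[:, j] / col_mass[j])
        # Signed increments keep the expected counter proportional to q . x_i.
        np.add.at(counters, points, np.sign(q[j] * X[points, j]))
    candidates = np.argsort(-counters)[:max(budget, k)]
    scores = X[candidates] @ q                # exact scores for the shortlist only
    return candidates[np.argsort(-scores)[:k]]

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 8))
q = rng.normal(size=8)
print(wedge_sampling_topk(X, q, k=5, num_samples=500, budget=10, rng=0))
```

The cost is governed by `num_samples` plus `budget` exact dot products rather than by a full scan of all n points; with `budget` equal to n the result degenerates to exact search.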
Information Bottleneck: Exact Analysis of (Quantized) Neural Networks
The information bottleneck (IB) principle has been suggested as a way to
analyze deep neural networks. The learning dynamics are studied by inspecting
the mutual information (MI) between the hidden layers and the input and output.
Notably, separate fitting and compression phases during training have been
reported. This led to some controversy including claims that the observations
are not reproducible and strongly dependent on the type of activation function
used as well as on the way the MI is estimated. Our study confirms that
different ways of binning when computing the MI lead to qualitatively different
results, either supporting or refuting IB conjectures. To resolve the
controversy, we study the IB principle in settings where MI is non-trivial and
can be computed exactly. We monitor the dynamics of quantized neural networks,
that is, we discretize the whole deep learning system so that no approximation
is required when computing the MI. This allows us to quantify the information
flow without measurement errors. In this setting, we observed a fitting phase
for all layers and a compression phase for the output layer in all experiments;
the compression in the hidden layers was dependent on the type of activation
function. Our study shows that the initial IB results were not artifacts of
binning when computing the MI. However, the critical claim that the compression
phase may not be observed for some networks also holds true.
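Once inputs and activations are discrete, the MI is a finite sum over the empirical joint distribution and needs no binning or density estimation. A minimal sketch in bits, assuming the data set defines the input distribution (each example equally likely) and each quantized layer output is treated as a single discrete symbol; the helper names are hypothetical. For a deterministic layer T = f(X), I(X;T) reduces to H(T), which the toy example reproduces.

```python
import math
from collections import Counter

def entropy(symbols):
    """Shannon entropy in bits of the empirical distribution of hashable symbols."""
    counts = Counter(symbols)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def mutual_information(a, b):
    """Exact I(A;B) in bits from paired discrete samples: H(A) + H(B) - H(A, B)."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

def layer_symbols(codes):
    """Treat each quantized layer output (a sequence of ints) as one symbol."""
    return [tuple(row) for row in codes]

# Toy setting: 8 equally likely inputs, a deterministic 2-bit "layer" encoding
# x // 2, and a label y = x % 2 that the layer discards entirely.
xs = list(range(8))
t = layer_symbols([[(x // 2) >> 1, (x // 2) & 1] for x in xs])
y = [x % 2 for x in xs]
print(mutual_information(xs, t), mutual_information(t, y))  # 2.0 0.0
```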
Inverse sequence similarity of proteins does not imply structural similarity
There is a debate on the folding of proteins with inverted sequences. Theoretical approaches and experiments give contradictory results. Many proteins in the Protein Data Bank (PDB) show conspicuous inverse sequence similarity (ISS) to each other. Here we analyze whether this ISS is related to structural similarity. For the first time, we performed a large-scale three-dimensional (3-D) superposition of corresponding Cα atoms of forwardly and inversely aligned proteins and tested the degree of secondary structure identity between them. Comparing proteins of less than 50% pairwise sequence identity, only 0.5% of the inversely aligned pairs had similar folds (99 out of 19 073), whereas about 9% of forwardly aligned proteins in the same score and length range showed similar 3-D structures (1731 out of 19 248). This observation strongly supports the view that the inversion of sequences in almost all cases leads to a different folding property of the protein. Inverted sequences are thus suitable as protein-like sequences for control purposes without relations to existing proteins.
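The abstract does not name its superposition method, so here is one standard choice, the Kabsch algorithm, sketched as an assumption rather than the authors' procedure. It finds the rotation that minimizes the RMSD between corresponding Cα coordinates after centring both sets. The function name is hypothetical.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD (input units, e.g. angstrom) after optimal rigid superposition of P onto Q.

    P, Q: (n, 3) arrays of corresponding C-alpha coordinates.
    """
    P = P - P.mean(axis=0)                   # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                              # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # forbid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # optimal proper rotation
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

# An inverse alignment reverses the residue correspondence; for an ungapped
# segment of equal length this is simply Q[::-1] (real alignments contain gaps).
rng = np.random.default_rng(0)
P = rng.normal(size=(20, 3))
print(kabsch_rmsd(P, P + 5.0), kabsch_rmsd(P, P[::-1]))
```

The first value is zero up to rounding because a pure translation is removed by centring; the second compares the segment with its own reversed correspondence, which generally cannot be superposed.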
Regulation of mammalian cell cycle progression in the regenerating liver
The process of cell division in mammalian cells is orchestrated by cell-cycle-dependent oscillations of cyclin protein levels. Cyclin levels are controlled by redundant transcriptional, post-translational and degradation feedback loops. How each of these separate loops contributes to the regulation of the key cell cycle events and to the connection between the G1-S transition and the subsequent mitotic events is under investigation. Here, we present an integrated computational model of the mammalian cell cycle based on the sequential activation of cyclins. We validate the model against experimental data on liver cells (hepatocytes), which undergo one or two rounds of synchronous, circadian-clock-gated cell divisions during liver regeneration after partial hepatectomy (PH). The model exhibits bandpass filter properties that allow the system to ignore strong but transient, or sustained but weak, damage after PH. Bifurcation analysis of the model suggests two different threshold mechanisms for the progression of the cell through mitosis. These results are coherent with the notion that mitotic exit in mammalian cells is bistable, and suggest that Cdc20 homologue 1 (Cdh1) is an important regulator of mitosis. Regulation by Cdh1 also explains the observed G2/M phase prolongation after hepatocyte growth factor (HGF) stimulation during S phase.
Second Order PAC-Bayesian Bounds for the Weighted Majority Vote
We present a novel analysis of the expected risk of the weighted majority vote in
multiclass classification. The analysis takes correlations between the predictions of
ensemble members into account and provides a bound that is amenable to
efficient minimization, which yields improved weighting for the majority vote.
We also provide a specialized version of our bound for binary classification,
which makes it possible to exploit additional unlabeled data for tighter risk estimation.
In experiments, we apply the bound to improve weighting of trees in random
forests and show that, in contrast to the commonly used first order bound,
minimization of the new bound typically does not lead to degradation of the
test error of the ensemble.
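The quantity that captures correlated errors in this kind of second-order analysis is the tandem loss L(h, h'), the probability that two ensemble members err on the same example. The weighted vote errs only if members carrying at least half of the weight ρ err, which gives L(MV_ρ) ≤ 4 E_{ρ²}[L(h, h')]. Below is a minimal sketch of that empirical relation for multiclass labels. The function names are hypothetical, and the PAC-Bayesian bound and its minimization over ρ, which add a complexity term, are not reproduced.

```python
import numpy as np

def tandem_losses(preds, y):
    """preds: (m, n) class predictions of m members; y: (n,) labels.

    Returns the (m, m) matrix of empirical tandem losses L(h_i, h_j).
    """
    errors = (preds != y).astype(float)
    return errors @ errors.T / y.shape[0]

def weighted_mv_risk(preds, y, rho, n_classes):
    """Empirical 0-1 risk of the rho-weighted majority vote."""
    n = y.shape[0]
    votes = np.zeros((n_classes, n))
    for member_preds, weight in zip(preds, rho):
        votes[member_preds, np.arange(n)] += weight  # add weight to each predicted class
    return float((votes.argmax(axis=0) != y).mean())

def second_order_bound(preds, y, rho):
    # Empirical counterpart of L(MV_rho) <= 4 E_{rho^2}[L(h, h')].
    return min(1.0, 4.0 * float(rho @ tandem_losses(preds, y) @ rho))

rng = np.random.default_rng(2)
y = rng.integers(0, 3, size=300)
noisy = rng.random((7, 300)) < 0.35
preds = np.where(noisy, rng.integers(0, 3, size=(7, 300)), y)
rho = np.full(7, 1 / 7)
print(weighted_mv_risk(preds, y, rho, 3), second_order_bound(preds, y, rho))
```

For binary labels, E_{ρ²}[L(h, h')] = E_ρ[L(h)] - ½ E_{ρ²}[D(h, h')], where D is the disagreement between two members; D does not depend on labels, which is what allows unlabeled data to be used in the binary case.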
Altered Cytokine Response of Human Brain Endothelial Cells after Stimulation with Malaria Patient Plasma
Infections with the deadliest malaria parasite, Plasmodium falciparum, are accompanied by a strong immunological response of the human host. To date, more than 30 cytokines have been detected at elevated levels in plasma of malaria patients compared with healthy controls. Endothelial cells (ECs) are a potential source of these cytokines, but so far it is not known whether their cytokine secretion depends on direct contact of P. falciparum-infected erythrocytes (IEs) with ECs through cytoadhesion. Culturing ECs with plasma from malaria patients (27 returning travellers) resulted in significantly increased secretion of IL-11, CXCL5, CXCL8, CXCL10, vascular endothelial growth factor (VEGF) and angiopoietin-like protein 4 (ANGPTL4) compared with matching controls (22 healthy individuals). The accompanying transcriptome study of the ECs identified 43 genes whose expression was significantly increased (≥1.7-fold) after co-incubation with malaria patient plasma, including cxcl5 and angptl4. Further bioinformatic analyses revealed that biological processes such as cell migration, cell proliferation and tube development were particularly affected in these ECs. It can thus be postulated that not only the cytoadhesion of IEs but also molecules in the plasma of malaria patients exert an influence on ECs, and that not only the immunological response but also other processes, such as angiogenesis, are altered.
A global alliance declaring war on cassava viruses in Africa
[Without Abstract]