A POWER INDEX BASED FRAMEWORK FOR FEATURE SELECTION PROBLEMS
One of the most challenging tasks in Machine Learning is feature selection: choosing the best set of features to use in the training and prediction processes. Pruning the set of features actually in use brings several benefits: reduced computation time, often a better prediction quality, and the possibility of building a good predictor from less data.
In its most common form, the problem is called single-view feature selection problem, to distinguish it from the feature selection task in Multi-view learning. In the latter, each view corresponds to a set of features and one would like to enact feature selection on each view, subject to some global constraints.
A related problem in the context of Multi-View Learning is Feature Partitioning: it consists in splitting the set of features of a single large view into two or more views so that a good predictor can be built on each view. In this case, the best features must be distributed between the views: each view should contain synergistic features, while features that interfere disruptively must be placed in different views.
In the semi-supervised multi-view task known as Co-training, one requires also that each predictor trained on an individual view is able to teach something to the other views: in classification tasks for instance, one view should learn to classify unlabelled examples based on the guess provided by the other views.
There are several ways to address these problems. One set of techniques is inspired by Coalitional Game Theory, which defines several useful concepts, two of which are of high practical importance: the power index and the interaction index. In the context of feature selection they take the following meaning: the power index is a (context-dependent) summary measure of the predictive capability of a feature, while the interaction index is a (context-dependent) summary measure of the interaction (constructive or disruptive interference) between two features; it can be used to quantify how the collaboration between two features enhances their prediction capabilities. An important point is that the power index of a feature is different from the predictive power of the feature in isolation: through a suitable averaging, it takes into account the context, i.e. the fact that the feature acts together with other features to train a model.
Similarly, the interaction index between two features takes into account the context, by suitably averaging the interaction with all the other features.
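The difference between a feature's score in isolation and its power index can be made concrete with a minimal sketch. It computes the exact Shapley value, one of the power indices named later in this abstract, by brute force over all coalitions. The feature names and the toy performance function `toy_value` are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley value of every feature for the set function `value`.

    The cost is exponential in the number of features, so this is only
    usable for very small feature sets.
    """
    n = len(features)
    result = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Probability that f joins right after a coalition of this size
            # when features arrive in a uniformly random order.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        result[f] = total
    return result


def toy_value(coalition):
    """Hypothetical performance metric: `a` and `b` are synergistic."""
    score = 0.0
    if "a" in coalition:
        score += 0.2
    if "b" in coalition:
        score += 0.1
    if "a" in coalition and "b" in coalition:
        score += 0.3
    return score


features = ["a", "b", "c"]
power = shapley_values(features, toy_value)
isolated = {f: toy_value(frozenset({f})) for f in features}
```

In this toy setting feature `a` scores 0.2 on its own, while its Shapley value is 0.35 (up to floating-point rounding), because the averaging over contexts credits it with half of the synergy it shares with `b`.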
In this work we address both the single-view and the multi-view problems as follows.
The single-view feature selection problem is formalized as the maximization of a pseudo-boolean function, i.e. a real-valued set function (that maps sets of features into a performance metric). Since one has to search over (a considerable portion of) the Boolean lattice (without any special guarantees, except, perhaps, positivity), the problem is in general NP-hard. We address it by producing candidate maximum coalitions: we select the subset of features with the highest power indices and use this coalition to approximate the actual maximum.
Although the exact computation of the power indices is an exponential task, the estimates of the power indices for the purposes of the present problem can be achieved in polynomial time.
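The abstract does not say which estimator is used. One common way to obtain power-index estimates at polynomial cost is Monte Carlo sampling of random feature orderings, and the sketch below assumes that technique. The function names, the sample count, and the toy metric `toy_value` are all hypothetical.

```python
import random


def estimate_shapley(features, value, num_permutations=200, seed=0):
    """Estimate Shapley values by averaging marginal contributions
    over randomly sampled feature orderings.

    Uses num_permutations * (len(features) + 1) evaluations of `value`.
    """
    rng = random.Random(seed)
    totals = {f: 0.0 for f in features}
    order = list(features)
    for _ in range(num_permutations):
        rng.shuffle(order)
        coalition = set()
        previous = value(frozenset(coalition))
        for f in order:
            coalition.add(f)
            current = value(frozenset(coalition))
            totals[f] += current - previous
            previous = current
    return {f: total / num_permutations for f, total in totals.items()}


def top_k_coalition(power, k):
    """Candidate maximum coalition: the k features with the highest index."""
    return set(sorted(power, key=power.get, reverse=True)[:k])


def toy_value(coalition):
    """Hypothetical performance metric: `a` and `b` are synergistic."""
    score = 0.0
    if "a" in coalition:
        score += 0.2
    if "b" in coalition:
        score += 0.1
    if "a" in coalition and "b" in coalition:
        score += 0.3
    return score


features = ["a", "b", "c"]
estimates = estimate_shapley(features, toy_value)
candidate = top_k_coalition(estimates, k=2)
```

The candidate coalition is then evaluated with the real performance metric; the estimates only guide which coalition to try.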
The multi-view feature selection problem is formalized as the generalization of the above set-up to the case of multi-variable pseudo-boolean functions.
The multi-view splitting problem is formalized instead as the maximization of a real function defined over the partition lattice. This problem is also typically NP-hard. However, candidate solutions can be found by suitably partitioning the top power-index features and placing in different views the pairs of features that interact weakly or negatively.
The sum of the power indices of the participating features can be used to approximate the prediction capability of the view (i.e. they can be used as a proxy for the predicting power). The sum of the feature pair interactivity across views can be used as proxy for the orthogonality of the views.
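The cross-view proxy described above can be sketched with the Banzhaf interaction index, one member of the family of interaction indices mentioned in this abstract. This is an illustrative brute-force computation under assumed names; the toy metric `toy_value` and the two candidate partitions are hypothetical.

```python
from itertools import combinations


def banzhaf_interaction(features, value, i, j):
    """Banzhaf interaction index of the pair (i, j): the average, over all
    coalitions of the other features, of the joint gain of adding both
    features minus the gains of adding each one alone."""
    others = [f for f in features if f not in (i, j)]
    total = 0.0
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            s = frozenset(subset)
            total += (value(s | {i, j}) - value(s | {i})
                      - value(s | {j}) + value(s))
    return total / 2 ** len(others)


def cross_view_interaction(features, value, view_a, view_b):
    """Sum of pairwise interactions between two views."""
    return sum(banzhaf_interaction(features, value, i, j)
               for i in view_a for j in view_b)


def toy_value(coalition):
    """Hypothetical metric: a and b are synergistic, c and d are redundant."""
    score = 0.0
    if "a" in coalition and "b" in coalition:
        score += 0.3
    if "c" in coalition or "d" in coalition:
        score += 0.2
    return score


features = ["a", "b", "c", "d"]
synergy_ab = banzhaf_interaction(features, toy_value, "a", "b")
redundancy_cd = banzhaf_interaction(features, toy_value, "c", "d")
# Keeps the synergistic pair together and splits the redundant pair.
split_good = cross_view_interaction(features, toy_value, ["a", "b", "c"], ["d"])
# Splits the synergistic pair across views.
split_bad = cross_view_interaction(features, toy_value, ["a", "c"], ["b", "d"])
```

Under this proxy, a lower cross-view sum means that the features kept apart are the weakly or negatively interacting ones, so `split_good` is preferred over `split_bad` here.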
The capability of a view to pass information (to teach) to other views within a co-training procedure can also benefit from the use of power indices based on a suitable definition of information transfer (a set of features, i.e. a coalition, classifies examples that are subsequently used in the training of a second set of features).
As to the feature selection task, we not only demonstrate the use of state-of-the-art power index concepts (e.g. the Shapley Value and the Banzhaf Value) along the lines described above, but also define new power indices within the more general class of probabilistic power indices, which contains the Shapley and the Banzhaf Values as special cases. Since the number of features to select is often a predefined parameter of the problem, we also introduce some novel power indices, namely the k-Power Index (and its specializations, the k-Shapley Value and the k-Banzhaf Value), which help select the features more efficiently.
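For reference, the standard textbook form of a probabilistic power index, with the Shapley and Banzhaf weights as special cases, is written out below. The notation (feature set $N$ with $n = |N|$, set function $v$, weights $p^{i}_{S}$) is assumed here and is not taken from the abstract, which also does not give the definition of the new k-Power Index.

```latex
\begin{align*}
\phi_i(v) &= \sum_{S \subseteq N \setminus \{i\}} p^{i}_{S}\,
             \bigl[\, v(S \cup \{i\}) - v(S) \,\bigr],
  & p^{i}_{S} &\ge 0, \quad \sum_{S \subseteq N \setminus \{i\}} p^{i}_{S} = 1, \\[4pt]
\text{Shapley:}\quad p^{i}_{S} &= \frac{|S|!\,(n - |S| - 1)!}{n!},
  & \text{Banzhaf:}\quad p^{i}_{S} &= \frac{1}{2^{\,n-1}} .
\end{align*}
```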
For the feature partitioning, we use the more general class of probabilistic interaction indices that contains the Shapley and Banzhaf Interaction Indices as members.
We also address the problem of evaluating the teaching ability of a view, introducing a suitable teaching capability index.
The last contribution of the present work consists in comparing the Game Theory approach to the classical Greedy Forward Selection approach for feature selection. In the latter, the candidate is obtained by adding one feature at a time to the current maximal coalition, always choosing the feature with the maximal marginal contribution.
We show that in typical cases the two methods are complementary and that, when used in conjunction, each reduces the other's error in the estimate of the maximum value.
Moreover, the approach based on game theory has two advantages: it samples the space of all possible feature subsets, whereas the greedy algorithm scans a selected subspace and excludes the rest entirely; and it assigns each feature a score that gives a context-aware measure of its importance in the prediction process.
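The Greedy Forward Selection baseline described above can be sketched in a few lines. The toy metric `toy_value` is hypothetical and deliberately built so that the greedy search misses a synergistic pair, which is the kind of blind spot the comparison in this abstract is about.

```python
def greedy_forward_selection(features, value, k):
    """Grow a coalition one feature at a time, always adding the feature
    with the largest marginal contribution to the current coalition."""
    selected = set()
    remaining = set(features)
    while remaining and len(selected) < k:
        base = value(frozenset(selected))
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining),
                   key=lambda f: value(frozenset(selected | {f})) - base)
        selected.add(best)
        remaining.remove(best)
    return selected


def toy_value(coalition):
    """Hypothetical metric: c is strong alone, a and b only shine together."""
    score = 0.0
    if "c" in coalition:
        score += 0.3
    if "a" in coalition:
        score += 0.05
    if "a" in coalition and "b" in coalition:
        score += 0.5
    return score


greedy_choice = greedy_forward_selection(["a", "b", "c"], toy_value, k=2)
greedy_score = toy_value(frozenset(greedy_choice))
best_pair_score = toy_value(frozenset({"a", "b"}))
```

Greedy picks `c` first and then `a`, so it never evaluates the pair `{a, b}`, whose score is higher. A method that samples the whole subset space can surface such a pair.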
Signal reconstruction by means of Embedding, Clustering and AutoEncoder Ensembles
We study the denoising and reconstruction of corrupted signals by means of AutoEncoder ensembles. In order to guarantee the diversity of the experts in the ensemble, we apply, prior to learning, a dimensional reduction pass (to map the examples into a suitable Euclidean space) and a partitional clustering pass: each cluster is then used to train a distinct AutoEncoder. We study the approach with an audio file benchmark: the original signals are artificially corrupted by Doppler effect and reverb. The results support the comparative effectiveness of the approach with respect to the approach based on a single AutoEncoder. The processing pipeline using Local Linear Embedding, k-means, then k Convolutional Denoising AutoEncoders reduces the reconstruction error by 35% with respect to the baseline approach.
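The routing logic of such an ensemble (one expert per cluster, each input handled by the expert of its nearest cluster) can be sketched as follows. This is only a structural illustration under strong assumptions: the embedding and the centroids are taken as given, and the per-cluster AutoEncoder is replaced by a trivial stand-in (`fit_mean_expert`) that returns the mean training signal of its cluster. All names are hypothetical.

```python
from math import dist


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: dist(point, centroids[c]))


def train_experts(embedded, signals, centroids, fit):
    """Group training signals by cluster and fit one expert per cluster.

    Clusters that receive no training signal get no expert.
    """
    groups = {c: [] for c in range(len(centroids))}
    for point, signal in zip(embedded, signals):
        groups[nearest_centroid(point, centroids)].append(signal)
    return {c: fit(group) for c, group in groups.items() if group}


def reconstruct(point, signal, centroids, experts):
    """Send a corrupted signal to the expert of its nearest cluster."""
    return experts[nearest_centroid(point, centroids)](signal)


def fit_mean_expert(group):
    """Stand-in for an AutoEncoder: always outputs the cluster's mean signal."""
    mean = [sum(column) / len(group) for column in zip(*group)]
    return lambda corrupted: mean


# Toy data: 2-D embedded coordinates and 2-sample "signals".
embedded = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
signals = [[1.0, 1.0], [1.0, 3.0], [5.0, 5.0], [7.0, 5.0]]
centroids = [(0.0, 0.5), (10.0, 10.5)]

experts = train_experts(embedded, signals, centroids, fit_mean_expert)
restored = reconstruct((0.2, 0.4), [9.0, 9.0], centroids, experts)
```

In a real pipeline `fit` would train a denoising AutoEncoder on the cluster's examples, and the centroids would come from the clustering pass on the embedded data.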
NK2 homeobox gene cluster: Functions and roles in human diseases
NK2 genes (NKX2 gene cluster in humans) encode homeodomain-containing transcription factors that are conserved along the phylogeny. According to the most detailed classifications, vertebrate NKX2 genes are classified into two distinct families, NK2.1 and NK2.2. The former is constituted by NKX2-1 and NKX2-4 genes, which are homologous to the Drosophila scro gene; the latter includes NKX2-2 and NKX2-8 genes, which are homologous to the Drosophila vnd gene. Conservation of these genes is not only related to molecular structure and expression, but also to biological functions. In Drosophila and vertebrates, NK2 genes share roles in the development of ventral regions of the central nervous system. In vertebrates, NKX2 genes have a relevant role in the development of several other organs such as the thyroid, lung, and pancreas. Loss-of-function mutations in NKX2-1 and NKX2-2 are the monogenic cause of the brain-lung-thyroid syndrome and neonatal diabetes, respectively. Alterations in NKX2-4 and NKX2-8 genes may play a role in multifactorial diseases, namely autism spectrum disorder and neural tube defects, respectively. NKX2-1, NKX2-2, and NKX2-8 are expressed in various cancer types as either oncogenes or tumor suppressor genes. Several data indicate that evaluation of their expression in tumors has diagnostic and/or prognostic value.
A morphometric analysis of vegetation patterns in dryland ecosystems
Vegetation in dryland ecosystems often forms remarkable spatial patterns. These range from regular bands of vegetation alternating with bare ground, to vegetated spots and labyrinths, to regular gaps of bare ground within an otherwise continuous expanse of vegetation. It has been suggested that spotted vegetation patterns could indicate that collapse into a bare ground state is imminent, and the morphology of spatial vegetation patterns, therefore, represents a potentially valuable source of information on the proximity of regime shifts in dryland ecosystems. In this paper, we have developed quantitative methods to characterize the morphology of spatial patterns in dryland vegetation. Our approach is based on algorithmic techniques that have been used to classify pollen grains on the basis of textural patterning, and involves constructing feature vectors to quantify the shapes formed by vegetation patterns. We have analysed images of patterned vegetation produced by a computational model and a small set of satellite images from South Kordofan (South Sudan), which illustrates that our methods are applicable to both simulated and real-world data. Our approach provides a means of quantifying patterns that are frequently described using qualitative terminology, and could be used to classify vegetation patterns in large-scale satellite surveys of dryland ecosystems
Management control for the socially responsible company (Il controllo di gestione per l'azienda socialmente responsabile)
The paper proposes an evolution of management control tools that takes into account a company's corporate social responsibility profile.
INDIGENOUS SUSTAINABLE FINANCE AND SUSTAINABLE DEVELOPMENT GOALS: CONTRIBUTIONS FROM REDD+ IN BRAZIL
Indigenous sustainable finance has emerged as a promising research field to understand how indigenous communities can address sustainable governance and economic development issues based on their relationship with the land and cultural aspects. Furthermore, the SDGs have offered a development guide for economies worldwide while pushing forward the applied efforts in pursuing a sustainable future based on their 17 goals. Indigenous territories, in this case, can be understood as an essential asset that can contribute to maintaining biodiversity and remunerating communities for preserving forests, with REDD+ projects constituting a vital initiative to encourage compensation processes for economic activities. This study describes a case of Indigenous Sustainable Finance in Brazil using REDD+ and provides linkages to the Sustainable Development Goals Agenda. Results reveal that new parameters that can contribute to REDD+ processes developed by indigenous communities in Brazil should be set, facilitating the organizational strategy, credit access and territory governance status. Implications for sustainable finance are centred on developing successful constellations of stakeholder action towards social good through green, transitional and heritage bonds.
The geometric measure of entanglement for a symmetric pure state with positive amplitudes
In this paper for a class of symmetric multiparty pure states we consider a
conjecture related to the geometric measure of entanglement: 'for a symmetric
pure state, the closest product state in terms of the fidelity can be chosen as
a symmetric product state'. We show that this conjecture is true for symmetric
pure states whose amplitudes are all non-negative in a computational basis. The
more general conjecture is still open.
Comment: Similar results have been obtained independently and with different
methods by T-C. Wei and S. Severini, see arXiv:0905.0012v
Effect of nonnegativity on estimation errors in one-qubit state tomography with finite data
We analyze the behavior of estimation errors evaluated by two loss functions,
the Hilbert-Schmidt distance and infidelity, in one-qubit state tomography with
finite data. We show numerically that there can be a large gap between the
estimation errors and those predicted by an asymptotic analysis. The origin of
this discrepancy is the existence of the boundary in the state space imposed by
the requirement that density matrices be nonnegative (positive semidefinite).
We derive an explicit form of a function reproducing the behavior of the
estimation errors with high accuracy by introducing two approximations: a
Gaussian approximation of the multinomial distributions of outcomes, and
linearizing the boundary. This function gives us an intuition for the behavior
of the expected losses for finite data sets. We show that this function can be
used to determine the amount of data necessary for the estimation to be treated
reliably with the asymptotic theory. We give an explicit expression for this
amount, which exhibits strong sensitivity to the true quantum state as well as
the choice of measurement.
Comment: 9 pages, 4 figures, One figure (FIG. 1) is added to the previous
version, and some typos are correcte
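The two loss functions and the nonnegativity constraint named in this abstract have simple closed forms for one qubit when states are written as Bloch vectors. The sketch below uses the standard one-qubit formulas; the function names are hypothetical, the squared form of the Hilbert-Schmidt distance is an assumption about the convention, and none of this reproduces the estimators or the approximation derived in the paper.

```python
from math import sqrt


def is_physical(r):
    """A Bloch vector describes a nonnegative density matrix iff |r| <= 1."""
    return sum(x * x for x in r) <= 1.0


def hs_distance_squared(r, s):
    """Squared Hilbert-Schmidt distance Tr[(rho - sigma)^2] = |r - s|^2 / 2."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(r, s))


def infidelity(r, s):
    """1 - F for two physical one-qubit states, with
    F = (1 + r.s + sqrt((1 - |r|^2)(1 - |s|^2))) / 2."""
    dot = sum(a * b for a, b in zip(r, s))
    # Clamp at zero to guard against rounding for states on the boundary.
    purity_gap_r = max(0.0, 1.0 - sum(a * a for a in r))
    purity_gap_s = max(0.0, 1.0 - sum(b * b for b in s))
    return 1.0 - 0.5 * (1.0 + dot + sqrt(purity_gap_r * purity_gap_s))


north = (0.0, 0.0, 1.0)    # pure state on the boundary of the Bloch ball
south = (0.0, 0.0, -1.0)   # orthogonal pure state
mixed = (0.0, 0.0, 0.0)    # maximally mixed state
outside = (0.0, 0.8, 0.8)  # an unconstrained estimate can land here
```

A point such as `outside` has length greater than 1, so it does not correspond to a valid density matrix; this is the boundary of the state space that the abstract identifies as the origin of the finite-data discrepancy.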
High-sensitivity optical measurement of mechanical Brownian motion
We describe an experiment in which a laser beam is sent into a high-finesse
optical cavity with a mirror coated on a mechanical resonator. We show that the
reflected light is very sensitive to small mirror displacements. We have
observed the Brownian motion of the resonator with a very high sensitivity.
Comment: 4 pages, 4 figures, RevTe