33 research outputs found
A bound for the accuracy of sensors acquiring compositional data
Among the many challenges that the Internet of Things poses, the accuracy of the sensor network and of the associated data flow is of foremost importance: sensors monitor the surrounding environment of an object and give information on its position, situation or context, and an error in the acquired data can lead to inappropriate decisions and uncontrolled consequences. Given a sensor network that gathers relative data (that is, data for which ratios of parts are more important than absolute values), the acquired data have a compositional nature and all values need to be scaled. A common practice to analyze these data is to bijectively map compositions into the ordinary Euclidean space through a suitable transformation, so that standard multivariate analysis techniques can be used. In this paper an error bound on the commonly used asymmetric log-ratio transformation is derived on the Simplex. The purpose is to highlight areas of the Simplex where the transformation is ill conditioned and to isolate values for which the additive log-ratio transform cannot be accurately computed. Results show that the conditioning of the transformation is strongly affected by the closeness of the transformed values and that non-negligible distortions can be generated due to the unbounded propagation of the errors. An explicit formula for the accuracy of the sensors given the maximum allowed tolerance has been derived, and the critical values in the Simplex where the transformation is component-wise ill conditioned have been isolated.
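As a rough illustration of the conditioning issue described above, the following Python sketch applies the additive log-ratio (alr) transform and a simple 1/|ln(x_i/x_D)| amplification estimate; the function names and the estimate itself are illustrative and are not the exact bound derived in the paper.

```python
import numpy as np

def alr(x):
    """Additive log-ratio transform of a composition x, last part as reference."""
    x = np.asarray(x, dtype=float)
    return np.log(x[:-1] / x[-1])

def alr_amplification(x):
    """Rough component-wise error amplification of the alr transform.

    A relative perturbation of x_i is magnified by about 1 / |ln(x_i / x_D)|,
    which grows without bound as x_i approaches the reference part x_D
    (illustrative estimate only, not the bound derived in the paper).
    """
    return 1.0 / np.abs(alr(x))

print(alr_amplification([0.70, 0.20, 0.10]))     # well-separated parts: mild amplification
print(alr_amplification([0.334, 0.336, 0.330]))  # parts close to the reference: ill conditioned
```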
Triadic Motifs in the Partitioned World Trade Web
One of the crucial aspects of the Internet of Things that influences the effectiveness of communication among devices is the communication model, for which no universal solution exists. The actual interaction pattern can in general be represented as a directed graph, whose nodes represent the "Things" and whose directed edges represent the sent messages. Frequent patterns can identify channels or infrastructures to be strengthened and can help in choosing the most suitable message routing schema or network protocol. In general, frequent patterns are called motifs, and overrepresented motifs have been recognized as the low-level building blocks of networks; they are useful to explain many of their properties and play a relevant role in determining their dynamics and evolution. In this paper, triadic motifs are found by first partitioning a network by strength of connections and then analyzing the partitions separately. The case study is the World Trade Web (WTW), that is, the directed graph connecting world countries through trade relationships, with the aim of finding its topological characterization in terms of motifs and isolating the key factors underlying its evolution. The WTW has been split based on the weights of the graph to highlight structural differences between the big players, in terms of volumes of trade, and the rest of the world. As a test case, the period 2003-2010 has been analyzed, to show the structural effect of the economic crisis of 2007.
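The partition-then-count idea can be sketched as follows; the threshold-based split, the toy trade figures and the use of networkx's triadic census are assumptions for illustration, not the paper's actual procedure or data.

```python
import networkx as nx

def partition_by_strength(g, threshold):
    """Split a weighted directed graph into 'strong' and 'weak' sub-networks
    according to an edge-weight threshold (illustrative criterion only)."""
    strong = nx.DiGraph((u, v) for u, v, w in g.edges(data="weight") if w >= threshold)
    weak = nx.DiGraph((u, v) for u, v, w in g.edges(data="weight") if w < threshold)
    return strong, weak

# Toy trade network: nodes are countries, weights are trade volumes (invented)
g = nx.DiGraph()
g.add_weighted_edges_from([
    ("USA", "CHN", 120.0), ("CHN", "USA", 310.0), ("USA", "DEU", 50.0),
    ("DEU", "USA", 90.0), ("DEU", "CHN", 70.0), ("ITA", "DEU", 12.0),
    ("DEU", "ITA", 15.0), ("ITA", "USA", 9.0),
])

strong, weak = partition_by_strength(g, threshold=50.0)
# The triadic census counts all 16 directed triad types (motif candidates)
print(nx.triadic_census(strong))
print(nx.triadic_census(weak))
```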
Record linkage of banks and municipalities through multiple criteria and neural networks
Record linkage aims to identify records from multiple data sources that refer to the same real-world entity. It is a well-known data quality process, studied since the second half of the last century, with an established pipeline and a rich literature of case studies mainly covering census, administrative or health domains. In this paper, a method to recognize matching records from real municipalities and banks through multiple similarity criteria and a Neural Network classifier is proposed: starting from a labeled subset of the available data, first several similarity measures are combined and weighted to build a feature vector, then a Multi-Layer Perceptron (MLP) network is trained and tested to find matching pairs. For validation, seven real datasets have been used (three from banks and four from municipalities), purposely chosen in the same geographical area to increase the probability of matches. The training only involved two municipalities, while testing involved all sources (municipalities vs. municipalities, banks vs. banks, and municipalities vs. banks). The proposed method scored remarkable results in terms of both precision and recall, clearly outperforming threshold-based competitors.
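A minimal sketch of this pipeline (similarity features on candidate pairs followed by an MLP classifier) is given below, assuming scikit-learn and using SequenceMatcher as a stand-in for the multiple similarity criteria; the records, field names and network size are invented for illustration.

```python
import numpy as np
from difflib import SequenceMatcher
from sklearn.neural_network import MLPClassifier

def similarity_features(rec_a, rec_b):
    """Per-field similarity scores for a candidate record pair.

    SequenceMatcher is a simple stand-in for the multiple similarity
    criteria (e.g. edit-distance-based measures) combined in the paper.
    """
    fields = ("name", "surname", "address")
    return [SequenceMatcher(None, rec_a[f], rec_b[f]).ratio() for f in fields]

# Toy labeled pairs: 1 = same real-world entity, 0 = different
pairs = [
    ({"name": "Mario", "surname": "Rossi", "address": "Via Roma 1"},
     {"name": "Mario", "surname": "Rossi", "address": "V. Roma 1"}, 1),
    ({"name": "Mario", "surname": "Rossi", "address": "Via Roma 1"},
     {"name": "Luca", "surname": "Bianchi", "address": "Corso Italia 5"}, 0),
    ({"name": "Anna", "surname": "Verdi", "address": "Piazza Dante 3"},
     {"name": "Anna", "surname": "Verde", "address": "P.za Dante 3"}, 1),
    ({"name": "Anna", "surname": "Verdi", "address": "Piazza Dante 3"},
     {"name": "Giulia", "surname": "Neri", "address": "Via Milano 9"}, 0),
]

X = np.array([similarity_features(a, b) for a, b, _ in pairs])
y = np.array([label for _, _, label in pairs])

# A small Multi-Layer Perceptron classifies candidate pairs as match / non-match
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0).fit(X, y)
query = similarity_features(
    {"name": "Maria", "surname": "Rossi", "address": "Via Roma 1"},
    {"name": "Mario", "surname": "Rossi", "address": "Via Roma 1"},
)
print(clf.predict([query]))
```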
Numerical Stability Analysis of the Centered Log-Ratio Transformation
Data have a compositional nature when the information content to be extracted and analyzed is conveyed by the ratios of parts rather than by the absolute amounts. When the data are compositional, they need to be scaled so that subsequent analyses are scale-invariant; geometrically, this means forcing them into the open Simplex. A common practice to analyze compositional data is to bijectively map compositions into the ordinary Euclidean space through a suitable transformation, so that standard multivariate analysis techniques can be used. In this paper, the stability analysis of the Centered Log-Ratio (clr) transformation is performed. The purpose is to isolate areas of the Simplex where the clr transformation is ill conditioned and to highlight values for which the clr transformation cannot be accurately computed. Results show that the mapping accuracy is strongly affected by the closeness of the values to their geometric mean, and that in the worst case the clr can amplify the errors by an unbounded factor.
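The dependence of the clr accuracy on the distance of each part from the geometric mean can be illustrated with the sketch below; clr_amplification is a hypothetical helper and its 1/|clr(x)_i| estimate is only indicative of the behaviour described above, not the paper's exact result.

```python
import numpy as np

def clr(x):
    """Centered log-ratio transform: log of each part over the geometric mean."""
    x = np.asarray(x, dtype=float)
    g = np.exp(np.mean(np.log(x)))          # geometric mean of the composition
    return np.log(x / g)

def clr_amplification(x):
    """Rough component-wise error amplification of the clr transform.

    A relative perturbation of x_i is magnified by roughly 1 / |ln(x_i / g(x))|,
    which is unbounded when x_i approaches the geometric mean (illustrative
    estimate, not the exact bound derived in the paper).
    """
    return 1.0 / np.abs(clr(x))

print(clr_amplification(np.array([0.70, 0.20, 0.10])))  # well-spread parts
print(clr_amplification(np.array([0.35, 0.34, 0.31])))  # parts close to the geometric mean
```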
Decoy clustering through graded possibilistic c-medoids
Modern methods for ab initio prediction of protein structures typically explore multiple simulated conformations, called decoys, to find the best native-like conformations. To limit the search space, clustering algorithms are routinely used to group similar decoys, based on the hypothesis that the largest group of similar decoys will be the closest to the native state. In this paper a novel clustering algorithm, called Graded Possibilistic c-medoids, is proposed and applied to a decoy selection problem. As will be shown, the added flexibility of the graded possibilistic framework allows a more effective selection of the best decoys than similar methods based on medoids, that is, on the most central points belonging to each cluster. The proposed algorithm has been compared with other c-medoids algorithms and also with SPICKER on real data, outperforming both in the large majority of cases.
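For reference, a plain c-medoids loop on a precomputed decoy distance matrix is sketched below; it uses hard assignments and synthetic distances, whereas the proposed Graded Possibilistic c-medoids replaces the hard assignments with graded possibilistic memberships.

```python
import numpy as np

def c_medoids(dist, c, n_iter=50, seed=0):
    """Plain c-medoids on a precomputed distance matrix (a simplified stand-in
    for the paper's Graded Possibilistic c-medoids, which uses graded
    possibilistic memberships instead of the hard assignments below)."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    medoids = rng.choice(n, size=c, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)      # assign to closest medoid
        new_medoids = medoids.copy()
        for k in range(c):
            members = np.flatnonzero(labels == k)
            if members.size:
                # new medoid = member minimizing total distance within the cluster
                within = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[k] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, labels

# Toy symmetric "RMSD-like" distance matrix among 6 decoys (invented values)
rng = np.random.default_rng(1)
coords = rng.normal(size=(6, 3))
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)

medoids, labels = c_medoids(dist, c=2)
largest = np.argmax(np.bincount(labels))
print("predicted native-like decoy:", medoids[largest])
```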
Integrating rough set principles in the graded possibilistic clustering
Applied to fuzzy clustering, the graded possibilistic model allows a soft transition from probabilistic to possibilistic memberships, constraining the memberships to a region that becomes narrower as the memberships get closer to the probabilistic case. The integration of rough set principles in graded possibilistic clustering aims to improve the flexibility and the performance of the graded possibilistic model, providing a further option for uncertainty modeling. Through the novel concept of the Rough Feasible Region, the proposed approach differentiates the projection of memberships in the core and in the boundary of each cluster, exploiting the indiscernibility relation typical of rough sets and allowing a more robust and efficient estimation of centroids. Tests on real data confirm its viability.
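One possible reading of the core/boundary differentiation is sketched below: centroids are estimated by weighting core members and boundary members differently. The thresholds, weights and the specific update rule are illustrative assumptions, not the formulation of the paper.

```python
import numpy as np

def rough_weighted_centroids(X, U, upper=0.7, w_core=0.9):
    """Centroid estimation that weights core and boundary members differently.

    U[i, k] is the membership of point i in cluster k.  Points whose membership
    exceeds `upper` form the cluster core; the remaining points with
    non-negligible membership form the boundary.  Thresholds and weights are
    illustrative, not the ones used in the paper.
    """
    _, d = X.shape
    _, c = U.shape
    centroids = np.zeros((c, d))
    for k in range(c):
        core = U[:, k] >= upper
        boundary = (U[:, k] > 0.05) & ~core
        if core.any() and boundary.any():
            centroids[k] = (w_core * X[core].mean(axis=0)
                            + (1.0 - w_core) * X[boundary].mean(axis=0))
        elif core.any():
            centroids[k] = X[core].mean(axis=0)
        else:
            # fall back to a membership-weighted mean when the core is empty
            centroids[k] = (U[:, k, None] * X).sum(axis=0) / U[:, k].sum()
    return centroids

# Two noisy blobs and hand-crafted memberships, just to exercise the update
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
U = np.vstack([np.tile([0.9, 0.1], (20, 1)), np.tile([0.1, 0.9], (20, 1))])
print(rough_weighted_centroids(X, U))
```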
Asymmetric Kernel Scaling for Imbalanced Data Classification
Many critical application domains present issues related to imbalanced learning, that is, classification from imbalanced data. Using conventional techniques produces biased results, as the over-represented class dominates the learning process and tends to naturally attract predictions. As a consequence, the false negative rate may become unacceptable and the chosen classifier unusable. We propose a classification procedure based on Support Vector Machines (SVM) that is able to effectively cope with data imbalance. Using a first-step approximate solution and then a suitable kernel transformation, we asymmetrically enlarge the space around the class boundary, compensating for data skewness. Results show that while in the case of moderate imbalance the performance is comparable to that of a standard SVM, in the case of heavily skewed data the proposed approach outperforms its competitors.
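The two-step idea (approximate solution first, then an asymmetric scaling of the kernel around the boundary) can be sketched as follows; the conformal factor D(z) = exp(-k f(z)^2), the per-class constants k_pos and k_neg, and the toy data are illustrative assumptions rather than the exact formulation of the paper.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.datasets import make_classification

# Imbalanced toy problem (about 5% positives)
X, y = make_classification(n_samples=400, n_features=5, weights=[0.95, 0.05],
                           random_state=0)

# Step 1: first-pass SVM provides an approximate decision function f(x)
base = SVC(kernel="rbf", gamma="scale").fit(X, y)

def conformal_factor(Z, k_neg=0.5, k_pos=3.0):
    """Scaling factor D(z) = exp(-k * f(z)^2), with a larger k on the minority
    side so that the metric is enlarged asymmetrically around the boundary
    (functional form and constants are illustrative choices)."""
    f = base.decision_function(Z)
    k = np.where(f >= 0, k_pos, k_neg)
    return np.exp(-k * f ** 2)

def scaled_kernel(A, B):
    """Second-step kernel: K'(a, b) = D(a) * K(a, b) * D(b)."""
    return conformal_factor(A)[:, None] * rbf_kernel(A, B) * conformal_factor(B)[None, :]

# Step 2: retrain on the asymmetrically scaled kernel
svm2 = SVC(kernel=scaled_kernel).fit(X, y)
print("first-pass positives:", int(base.predict(X).sum()),
      "| rescaled positives:", int(svm2.predict(X).sum()))
```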
A resampling strategy based on bootstrap to reduce the effect of large blunders in GPS absolute positioning
In the absence of obstacles, a GPS device is generally able to provide continuous and accurate estimates of position, while in urban scenarios buildings can generate multipath and echo-only phenomena that severely affect the continuity and the accuracy of the provided estimates. Receiver autonomous integrity monitoring (RAIM) techniques are able to reduce the negative consequences of large blunders in urban scenarios, but require both a good redundancy and a low contamination to be effective. In this paper a resampling strategy based on bootstrap is proposed as an alternative to RAIM, in order to accurately estimate the position in case of low redundancy and multiple blunders: starting from the pseudorange measurement model, at each epoch the available measurements are bootstrapped (that is, randomly sampled with replacement) and the generated a posteriori empirical distribution is exploited to derive the final position. Compared to the standard bootstrap, in this paper the sampling probabilities are not uniform, but vary according to an indicator of the measurement quality. The proposed method has been compared with two different RAIM techniques on a data set collected in critical conditions, resulting in a clear improvement on all considered figures of merit.
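A simplified sketch of the non-uniform bootstrap is given below, using a generic linearized measurement model in place of the full pseudorange model; the quality indicator and all numerical values are invented for illustration.

```python
import numpy as np

def bootstrap_position(A, b, quality, n_boot=500, seed=0):
    """Non-uniform bootstrap of a linearized measurement model A @ x = b.

    Each replica resamples the rows (measurements) with replacement, with
    probability proportional to a per-measurement quality indicator, then
    solves least squares; the replicas form an empirical distribution of the
    position from which a robust estimate (here, the median) is taken.  The
    full pseudorange model and the paper's quality indicator are replaced by
    this simplified stand-in.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    p = quality / quality.sum()
    estimates = []
    for _ in range(n_boot):
        idx = rng.choice(n, size=n, replace=True, p=p)
        x, *_ = np.linalg.lstsq(A[idx], b[idx], rcond=None)
        estimates.append(x)
    return np.median(estimates, axis=0)

# Toy 2-D positioning with 8 measurements, one of which is a large blunder
rng = np.random.default_rng(1)
true_x = np.array([10.0, -4.0])
A = rng.normal(size=(8, 2))
b = A @ true_x + rng.normal(scale=0.1, size=8)
b[0] += 50.0                            # blunder on the first measurement
quality = np.ones(8)
quality[0] = 0.1                        # lower quality for the suspect measurement

print("plain least squares:", np.linalg.lstsq(A, b, rcond=None)[0])
print("weighted bootstrap: ", bootstrap_position(A, b, quality))
```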
Decoy Meta-Clustering Through Rough Graded Possibilistic C-Medoids
Current ab initio methods for structure prediction of proteins explore multiple simulated conformations, called decoys, to generate families of folds, one of which is the closest to the native one. To limit the exploration of the conformational space, clustering algorithms are routinely applied to group similar decoys and then find the most plausible cluster centroid, based on the hypothesis that there are more low-energy conformations surrounding the native fold than any other; nevertheless, different clustering algorithms, or different parameters, are likely to output different partitions of the input data, and choosing only one of the possible solutions can be too restrictive and unreliable. Meta-clustering algorithms allow reconciling multiple clustering solutions by grouping them into meta-clusters (i.e. clusters of clusterings), so that similar partitions are grouped in the same meta-cluster. In this paper the use of meta-clustering is proposed for the selection of lowest-energy decoys, testing the Rough Graded Possibilistic c-medoids clustering algorithm for both baseline clustering and meta-clustering. Preliminary tests on real data suggest that meta-clustering is effective in reducing the sensitivity to the parameters of the clustering algorithm and in expanding the explored space.
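A minimal meta-clustering sketch is shown below: baseline partitions are compared through the adjusted Rand index and then grouped by hierarchical clustering. Here k-means and average linkage stand in for the Rough Graded Possibilistic c-medoids used at both levels in the paper, and the decoy data are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Several baseline clusterings of the same 30 (synthetic) decoys, obtained here
# by running k-means with different k and seeds
rng = np.random.default_rng(0)
decoys = rng.normal(size=(30, 4))
partitions = [KMeans(n_clusters=k, n_init=5, random_state=s).fit_predict(decoys)
              for k in (2, 3, 4) for s in (0, 1)]

# Pairwise dissimilarity between clusterings: 1 - adjusted Rand index
m = len(partitions)
D = np.zeros((m, m))
for i in range(m):
    for j in range(i + 1, m):
        D[i, j] = D[j, i] = 1.0 - adjusted_rand_score(partitions[i], partitions[j])

# Meta-clustering: group similar partitions into meta-clusters
Z = linkage(squareform(D, checks=False), method="average")
meta = fcluster(Z, t=2, criterion="maxclust")
print("meta-cluster of each baseline clustering:", meta)
```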