200 research outputs found
Robust rank correlation coefficients on the basis of fuzzy
The goal of this paper is to demonstrate that established rank correlation
measures are not ideally suited for measuring rank correlation for numerical
data that are perturbed by noise. We propose to use robust rank correlation
measures based on fuzzy orderings. We demonstrate that the new measures
overcome the robustness problems of existing rank correlation coefficients. As
a first step, this is accomplished by illustrative examples. The paper closes
with an outlook on future research and applications. Peer reviewed.
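To make the idea of a rank correlation built on fuzzy orderings concrete, here is a minimal sketch. It is not the paper's exact definition. The function names `strict_fuzzy_less` and `fuzzy_gamma`, the tolerance parameters `rx` and `ry`, the linear membership function, and the use of the minimum as conjunction are all assumptions made for illustration. Differences smaller than the tolerance count only partially as "less than", so tiny noise-induced rank swaps contribute little to the score.

```python
import numpy as np

def strict_fuzzy_less(a, b, r):
    """Degree to which a < b: 0 if b <= a, rising linearly to 1 once b - a >= r.

    The linear shape and the tolerance r > 0 are illustrative assumptions.
    """
    return np.clip((b - a) / r, 0.0, 1.0)

def fuzzy_gamma(x, y, rx, ry):
    """Gamma-style rank correlation computed from fuzzy concordance degrees."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # lx[i, j] is the degree to which x[i] < x[j]; likewise for ly.
    lx = strict_fuzzy_less(x[:, None], x[None, :], rx)
    ly = strict_fuzzy_less(y[:, None], y[None, :], ry)
    # Concordant: x increases and y increases (minimum used as conjunction).
    concordant = np.minimum(lx, ly).sum()
    # Discordant: x increases while y decreases (ly.T[i, j] = degree y[j] < y[i]).
    discordant = np.minimum(lx, ly.T).sum()
    total = concordant + discordant
    return 0.0 if total == 0 else (concordant - discordant) / total
```

With well-separated, strictly increasing data the sketch behaves like a classical gamma coefficient; the tolerances only matter for pairs whose values are closer than `rx` or `ry`.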
Dynamic data assigning assessment clustering of streaming data
Discovering interesting patterns or substructures in data streams
is an important challenge in data mining. Clustering algorithms are very
often applied to identify substructures, although they are designed to
partition a data set. Another problem of clustering algorithms is that most
of them are not designed for data streams. They assume that the data set to
be analysed is already complete and will not be extended by new data. This
paper discusses an extension of an algorithm that uses ideas from cluster
analysis, but was designed to identify single clusters in large data sets
without the necessity to partition the whole data set into clusters. The new
extended version of this algorithm can be applied to stream data and is able to
identify new clusters in an incoming data stream. As a case study, weather
data are used.
Analysis of contingency tables based on generalised median polish with power transformations and non-additive models
Contingency tables are a very common basis for the investigation of effects of different treatments or influences on a disease or the health state of patients. Many journals put a strong emphasis on p-values to support the validity of results. Therefore, even small contingency tables are analysed by techniques like the t-test or ANOVA. Both these concepts are based on normality assumptions for the underlying data. For larger data sets, this assumption is not so critical, since the underlying statistics are based on sums of (independent) random variables, which can be assumed to follow approximately a normal distribution, at least for a larger number of summands. But for smaller data sets, the normality assumption often cannot be justified.
Robust methods like the Wilcoxon-Mann-Whitney U test or the Kruskal-Wallis test do not lead to statistically significant p-values for small samples. Median polish is a robust alternative for analysing contingency tables that provides much more insight than just a p-value.
Median polish explains the contingency table in terms of an overall effect, row and column effects, and residuals. The underlying model is additive, which is sometimes too restrictive. In this paper, we propose two related approaches to generalise median polish. First, a power transformation can be applied to the values in the table so that median polish achieves better results; we propose a graphical method for finding a suitable power transformation. Second, if the original data should be preserved, one can apply other transformations, based on so-called additive generators, that have an inverse transformation. In this way, median polish can be applied to the original data, but based on a non-additive model. The non-linearity of such a model can also be visualised to better understand the joint effects of rows and columns in a contingency table.
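The decomposition into an overall effect, row effects, column effects and residuals can be sketched as follows. This is a plain version of standard median polish under the additive model, not the generalised method proposed in the paper. The function name `median_polish` and the parameters `max_iter` and `tol` are assumptions chosen for illustration.

```python
import numpy as np

def median_polish(table, max_iter=10, tol=1e-8):
    """Decompose table[i, j] into overall + row[i] + col[j] + resid[i, j]."""
    resid = np.asarray(table, dtype=float).copy()
    n_rows, n_cols = resid.shape
    overall = 0.0
    row = np.zeros(n_rows)
    col = np.zeros(n_cols)
    for _ in range(max_iter):
        # Row sweep: move each row's median from the residuals into the row effect.
        row_med = np.median(resid, axis=1)
        resid -= row_med[:, None]
        row += row_med
        # Re-centre the column effects; their median goes into the overall effect.
        shift = np.median(col)
        col -= shift
        overall += shift
        # Column sweep: move each column's median into the column effect.
        col_med = np.median(resid, axis=0)
        resid -= col_med[None, :]
        col += col_med
        # Re-centre the row effects in the same way.
        shift = np.median(row)
        row -= shift
        overall += shift
        # Stop once a full pass no longer changes the residuals noticeably.
        if np.abs(row_med).max() < tol and np.abs(col_med).max() < tol:
            break
    return overall, row, col, resid
```

At every step the identity `table = overall + row[:, None] + col[None, :] + resid` is preserved, so the residuals show directly which cells the additive model fails to explain.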
Generalised Median Polish Based On Additive Generators
Contingency tables often arise from collecting patient data and from lab experiments. A typical question to be answered based on a contingency table is whether the rows or the columns show a significant difference. Median Polish (MP) is fast becoming a preferred way to analyse contingency tables based on a simple additive model. Often, the data need to be transformed before applying the MP algorithm to get better results. A common transformation is the logarithm, which essentially changes the underlying model to a multiplicative model. In this work, we propose a novel way of applying the MP algorithm with generalised transformations that still gives reasonable results. Our approach to the underlying model leads us to transformations that are similar to additive generators of some fuzzy logic connectives. We illustrate how to choose the best transformation that gives meaningful results by proposing some modified additive generators of uninorms. In this way, MP is generalised from the simple additive model to more general nonlinear connectives. The recently proposed way of identifying a suitable power transformation based on IQRoQ plots [3] also plays a central role in this work.
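The remark that a logarithm turns the additive model into a multiplicative one can be written out as a short derivation. The symbols below (overall effect $\mu$, row effects $\alpha_i$, column effects $\beta_j$, residuals $\varepsilon_{ij}$, and an invertible transformation $g$) are notation assumed here for illustration and are not taken from the paper.

```latex
% Additive model fitted by median polish on the raw table entries x_{ij}:
x_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}

% Applying median polish to \log x_{ij} instead fits
\log x_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}
\quad\Longleftrightarrow\quad
x_{ij} = e^{\mu}\, e^{\alpha_i}\, e^{\beta_j}\, e^{\varepsilon_{ij}}

% More generally, for an invertible transformation g:
g(x_{ij}) = \mu + \alpha_i + \beta_j + \varepsilon_{ij}
\quad\Longleftrightarrow\quad
x_{ij} = g^{-1}\!\left(\mu + \alpha_i + \beta_j + \varepsilon_{ij}\right)
```

Read this way, choosing a transformation amounts to choosing how the row and column effects combine on the original scale.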
Clustering of nonstationary data streams: a survey of fuzzy partitional methods
Data streams have arisen as a relevant research topic during the past decade. They are real‐time, incremental in nature, temporally ordered, massive, contain outliers, and the objects in a data stream may evolve over time (concept drift). Clustering is often one of the earliest and most important steps in the streaming data analysis workflow. A comprehensive literature is available about stream data clustering; however, less attention is devoted to the fuzzy clustering approach, even though the nonstationary nature of many data streams makes it especially appealing. This survey discusses relevant data stream clustering algorithms focusing mainly on fuzzy methods, including their treatment of outliers and concept drift and shift. Ministero dell'Istruzione, dell'Università e della Ricerca
On the Nodal Count Statistics for Separable Systems in any Dimension
We consider the statistics of the number of nodal domains (also known as nodal counts)
for eigenfunctions of separable wave equations in arbitrary dimension. We give
an explicit expression for the limiting distribution of normalised nodal counts
and analyse some of its universal properties. Our results are illustrated by
a detailed discussion of simple examples and numerical nodal count distributions. Comment: 21 pages, 4 figures
MyChemise: A 2D drawing program that uses morphing for visualisation purposes
MyChemise (My Chemical Structure Editor) is a new 2D structure editor. It is designed as a Java applet that enables the direct creation of structures on the Internet using a web browser. MyChemise saves files in a digital format (.cse), and the import and export of .mol files using the appropriate connection tables is also possible.
Distinct gene loci control the host response to influenza H1N1 virus infection in a time-dependent manner
Background: There is strong but mostly circumstantial evidence that genetic factors modulate the severity of influenza infection in humans. Using genetically diverse but fully inbred strains of mice it has been shown that host sequence variants have a strong influence on the severity of influenza A disease progression. In particular, C57BL/6J, the most widely used mouse strain in biomedical research, is comparatively resistant. In contrast, DBA/2J is highly susceptible.
Results: To map regions of the genome responsible for differences in influenza susceptibility, we infected a family of 53 BXD-type lines derived from a cross between C57BL/6J and DBA/2J strains with influenza A virus (PR8, H1N1). We monitored body weight, survival, and mean time to death for 13 days after infection. Qivr5 (quantitative trait for influenza virus resistance on chromosome 5) was the largest and most significant QTL for weight loss. The effect of Qivr5 was detectable on day 2 post infection, but was most pronounced on days 5 and 6. Survival rate mapped to Qivr5, but additionally revealed a second significant locus on chromosome 19 (Qivr19). Analysis of mean time to death affirmed both Qivr5 and Qivr19. In addition, we observed several regions of the genome with suggestive linkage. There are potentially complex combinatorial interactions of the parental alleles among loci. Analysis of multiple gene expression data sets and sequence variants in these strains highlights about 30 strong candidate genes across all loci that may control influenza A susceptibility and resistance.
Conclusions: We have mapped influenza susceptibility loci to chromosomes 2, 5, 16, 17, and 19. Body weight and survival loci have a time-dependent profile that presumably reflects the temporal dynamic of the response to infection. We highlight candidate genes in the respective intervals and review their possible biological function during infection.
A new urban freight distribution scheme and an optimization methodology for reducing its overall cost
The paper presents an innovative urban freight distribution scheme aimed at reducing the externalities connected with the freight delivery process. Packages destined both for commercial activities and for end consumers (e-commerce) are taken into account. Each package is characterized by an address and dimensions. In the proposed transport system, freight is first delivered to the UDC on the border of the urban area by trucks or trains, which perform the long-distance transport. The freight is then reorganized and consolidated into load units, i.e. the FURBOT boxes, according to the package dimensions and the addresses of the receivers. Each box is assigned to a temporary unloading bay and is delivered there by a FURBOT vehicle. The receivers are in charge of collecting their packages from the unloading bays where they have been delivered. The paper describes a methodology for optimizing the performance of this freight transport system. The methodology takes as input the actual freight demand and the road network, and finds the transport system parameters (the number of required FURBOT boxes, their temporary unloading bays, the FURBOT fleet size and the FURBOT vehicle routing) that minimize the overall system cost. The overall cost is the sum of the users' cost, which depends on the distance they have to walk to collect their packages from the FURBOT box, and the operator's cost, which depends on the number of required boxes, the total distance travelled by the FURBOT vehicles and the required number of FURBOT vehicles. The overall procedure has been applied to the case study of Barreiro old town, a suburb of Lisbon, Portugal.
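The structure of the cost described above can be sketched in a few lines. The abstract only says which quantities each cost term depends on, so the linear form, the `CostWeights` fields, the function name `overall_cost` and all units here are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class CostWeights:
    """Hypothetical unit costs; the paper's actual cost model is not given here."""
    per_metre_walked: float
    per_box: float
    per_km_driven: float
    per_vehicle: float

def overall_cost(
    walk_distances_m: Sequence[float],
    n_boxes: int,
    total_route_km: float,
    n_vehicles: int,
    w: CostWeights,
) -> float:
    """Sum of the users' cost and the operator's cost (linear form is an assumption)."""
    # Users' cost: grows with the distance each receiver walks to a box.
    users = w.per_metre_walked * sum(walk_distances_m)
    # Operator's cost: boxes needed, distance driven, and fleet size.
    operator = (
        w.per_box * n_boxes
        + w.per_km_driven * total_route_km
        + w.per_vehicle * n_vehicles
    )
    return users + operator
```

An optimizer would evaluate a function of this shape for each candidate assignment of boxes to unloading bays and each candidate routing, and keep the cheapest configuration.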