31 research outputs found
Towards cluster based adaptive learning
Accepted manuscript
Geometry of deviation measures for triangular distributions
Triangular distributions are widely used in many applications with limited sample data, business simulations, and project management. As with other distributions, a standard way to measure deviations is to compute the standard deviation. However, the standard deviation is sensitive to outliers. In this paper, we consider and compare other deviation metrics, namely the mean absolute deviation from the mean, the mean absolute deviation from the median, and the quantile-based deviation. We show simple geometric interpretations for these deviation measures and how to construct them using a compass and a straightedge. The explicit formula for the mean absolute deviation from the median of a triangular distribution is derived in this paper for the first time. It has a simple geometric interpretation. It is the least volatile and is always better than the standard deviation or the mean absolute deviation from the mean. Although greater than the quantile deviation, it is easier to compute with limited sample data. We present a new procedure to estimate the parameters of this distribution in terms of this deviation. This procedure is computationally simple and may be superior to other methods when dealing with limited sample data, as is often the case with triangular distributions.
Published version
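The comparison described in the abstract above can be illustrated numerically. The sketch below does not reproduce the paper's explicit formula; it assumes the standard density, mean, variance, and median of a triangular distribution with minimum a, mode c, and maximum b, and approximates the two mean absolute deviations by midpoint-rule integration. The parameter values and function names are illustrative assumptions only.

```python
import math

def triangular_pdf(x, a, c, b):
    """Density of the triangular distribution with minimum a, mode c, maximum b."""
    if x < a or x > b:
        return 0.0
    if x < c:
        return 2.0 * (x - a) / ((b - a) * (c - a))
    if x > c:
        return 2.0 * (b - x) / ((b - a) * (b - c))
    return 2.0 / (b - a)  # peak height at the mode

def triangular_median(a, c, b):
    """Closed-form median, obtained by solving CDF(x) = 1/2."""
    if c >= (a + b) / 2.0:
        return a + math.sqrt((b - a) * (c - a) / 2.0)
    return b - math.sqrt((b - a) * (b - c) / 2.0)

def mean_abs_dev(center, a, c, b, steps=20_000):
    """Midpoint-rule approximation of E|X - center|."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        total += abs(x - center) * triangular_pdf(x, a, c, b)
    return total * h

# Illustrative parameters (an assumption, not taken from the paper).
a, c, b = 0.0, 1.0, 4.0
mean = (a + b + c) / 3.0
std = math.sqrt((a * a + b * b + c * c - a * b - a * c - b * c) / 18.0)
median = triangular_median(a, c, b)
mad_mean = mean_abs_dev(mean, a, c, b)
mad_median = mean_abs_dev(median, a, c, b)

print(f"std={std:.4f}  MAD(mean)={mad_mean:.4f}  MAD(median)={mad_median:.4f}")
```

For any distribution with finite variance, the deviation about the median is no larger than the deviation about the mean, which in turn is no larger than the standard deviation; the sketch only makes that ordering visible for one skewed triangular example.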
A Statistical Mechanics of Some Interconnection Networks
Despite intensive research on distributed processor interconnection architectures, relatively little work has been done on the performance analysis of such systems. The reason for this, besides the complexity of the behavior of such systems, is that Queueing Theory cannot easily handle systems consisting of many tightly interacting components. An alternate approach, based upon statistical mechanics, is used. We analyze interconnection structures such as crossbar, linear array, binary tree and ring.
A simple rotation strategy with sector ETFs
Accepted manuscript
Canonical approximation in the performance analysis of distributed systems
The problem of analyzing distributed systems arises in many areas of computer science, such as communication networks, distributed databases, packet radio networks, VLSI communications and switching mechanisms. Analysis of distributed systems is difficult since one must deal with many tightly-interacting components. The number of possible state configurations typically grows exponentially with the system size, making the exact analysis intractable even for relatively small systems. For the stochastic models of these systems, whose steady-state probability is of the product form, many global performance measures of interest can be computed once one knows the normalization constant of the steady-state probability distribution. This constant, called the system partition function, is typically difficult to derive in closed form. The key difficulty in performance analysis of such models can be viewed as trying to derive a good approximation to the partition function or calculate it numerically. In this Ph.D. work we introduce a new approximation technique to analyze a variety of such models of distributed systems. This technique, which we call the method of Canonical Approximation, is similar to that developed in statistical physics to compute the partition function. The new method gives a closed-form approximation of the partition function and of the global performance measures. It is computationally simple with complexity independent of the system size, gives an excellent degree of precision for large systems, and is applicable to a wide variety of problems. The method is applied to the analysis of multihop packet radio networks, locking schemes in database systems, closed queueing networks, and interconnection networks.
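To make the role of the partition function concrete, the sketch below computes the normalization constant of a small closed queueing network with single-server, fixed-rate stations. It uses Buzen's convolution algorithm, a standard exact method, and not the Canonical Approximation described in the abstract; the service demands and population size are illustrative assumptions. A brute-force sum over all states is included to show the exponential enumeration that the abstract refers to.

```python
from itertools import product
from math import prod

def normalization_constants(demands, n_customers):
    """Buzen's convolution: returns [G(0), G(1), ..., G(N)] for a closed
    product-form network whose stations have the given relative demands."""
    g = [1.0] + [0.0] * n_customers
    for d in demands:
        # Fold in one station: G_m(n) = G_{m-1}(n) + d * G_m(n - 1).
        for n in range(1, n_customers + 1):
            g[n] += d * g[n - 1]
    return g

def brute_force_constant(demands, n_customers):
    """Direct sum of prod(d_i ** n_i) over all states with sum(n_i) = N."""
    total = 0.0
    for state in product(range(n_customers + 1), repeat=len(demands)):
        if sum(state) == n_customers:
            total += prod(d ** k for d, k in zip(demands, state))
    return total

# Hypothetical three-station network with three circulating customers.
demands = [0.5, 1.0, 2.0]
n_customers = 3

g = normalization_constants(demands, n_customers)
throughput = g[n_customers - 1] / g[n_customers]  # global measure from G alone

print("G(N) by convolution:", g[n_customers])
print("G(N) by enumeration:", brute_force_constant(demands, n_customers))
print("relative throughput:", throughput)
```

Once G(N) and G(N-1) are known, the throughput follows from their ratio, which is the sense in which global performance measures depend only on the normalization constant.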
Teaching data science by history: Kepler's laws of planetary motion and generalized linear models
Teaching data science is challenging: it is a multidisciplinary subject that requires a solid mathematical background. There are many models and approaches to consider. It is important, in our view, to present a unified approach to teaching this subject. We believe that one of the most effective ways to do so is to present historical examples. An interesting historical example that explains Generalized Linear Models in prediction is the quest by the German astronomer Johann Kepler, at the beginning of the 17th century, to find a unifying law explaining the motion of the planets in our Solar system.
Accepted manuscript
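The historical example can be sketched as a small regression exercise. The code below is not taken from the paper: it assumes approximate modern values for the semi-major axis (in astronomical units) and orbital period (in years) of six planets, and fits an ordinary least-squares line to the logarithms of both quantities. A slope close to 3/2 corresponds to Kepler's third law, under which the squared period is proportional to the cubed semi-major axis.

```python
import math

# Approximate modern values, used here purely for illustration:
# planet -> (semi-major axis in AU, orbital period in years).
planets = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn": (9.537, 29.457),
}

# Work on the log scale so that a power law becomes a straight line.
xs = [math.log(axis) for axis, _ in planets.values()]
ys = [math.log(period) for _, period in planets.values()]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Ordinary least-squares estimates for log(T) = intercept + slope * log(a).
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar

print(f"slope={slope:.4f}  intercept={intercept:.4f}")
```

Fitting a line after a log transformation is only the simplest member of the model family named in the abstract; how the authors connect the example to Generalized Linear Models is not shown here.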
Mathematical foundation for ensemble machine learning and ensemble portfolio analysis
Accepted manuscript
DNA methylation meta-analysis confirms the division of the genome into two functional groups
Based on a meta-analysis of human genome methylation data, we tested a theoretical model in which aging is explained by the redistribution of limited resources in cells between two main tasks of the organism: its self-sustenance, based on the function of the Housekeeping Gene Group (HG), and functional differentiation, provided by the Integrative Gene Group (IntG). A meta-analysis of methylation of 100 genes, 50 in the HG group and 50 in IntG, showed significant differences (p < 0.0001) between the two groups in the level of absolute methylation values of gene bodies and their promoters. We showed a reliable decrease of absolute methylation values in IntG with rising age, in contrast to HG, where this level remained constant. The one-sided decrease in methylation in the IntG group is indirectly confirmed by the dispersion data analysis, which also decreased in the genes of this group. The imbalance between HG and IntG in methylation levels suggests that this IntG-shift is a side effect of the ontogenesis grownup program and the main cause of aging. The theoretical model of functional genome division also suggests the leading role of slowly dividing and post-mitotic cells in triggering and implementing the aging process.
Published version
MAD (about median) vs. quantile-based alternatives for classical standard deviation, skewness, and kurtosis
In classical probability and statistics, one computes many measures of interest from mean and standard deviation. However, mean, and especially standard deviation, are overly sensitive to outliers. One way to address this sensitivity is by considering alternative metrics for deviation, skewness, and kurtosis using mean absolute deviations from the median (MAD). We show that the proposed measures can be computed in terms of the sub-means of the appropriate left and right sub-ranges. They can be interpreted in terms of average distances of values of these sub-ranges from their respective medians. We emphasize that these measures utilize only the first-order moment within each sub-range and, in addition, are invariant to translation or scaling. The obtained formulas are similar to the quantile measures of deviation, skewness, and kurtosis but involve computing sub-means as opposed to quantiles. While the classical skewness can be unbounded, both the MAD-based and quantile skewness always lie in the range [−1, 1]. In addition, while both the classical kurtosis and quantile-based kurtosis can be unbounded, the proposed MAD-based alternative for kurtosis lies in the range [0, 1]. We present a detailed comparison of MAD-based, quantile-based, and classical metrics for the six well-known theoretical distributions considered. We illustrate the practical utility of MAD-based metrics by considering the theoretical properties of the Pareto distribution with high concentrations of density in the upper tail, as might apply to the analysis of wealth and income. In summary, the proposed MAD-based alternatives provide a universal scale to compare deviation, skewness, and kurtosis across different distributions.
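The statement that the deviation can be computed from sub-means of the left and right sub-ranges can be checked on a sample. The sketch below covers only the deviation measure, using the standard identity that the sum of absolute deviations from the median equals the sum of the upper half minus the sum of the lower half; the paper's skewness and kurtosis formulas are not reproduced. Function names and data are illustrative assumptions.

```python
import statistics

def mad_direct(values):
    """Mean absolute deviation from the median, computed from the definition."""
    med = statistics.median(values)
    return sum(abs(v - med) for v in values) / len(values)

def mad_from_halves(values):
    """Same quantity from the lower and upper halves of the sorted sample.

    For an even sample size this equals (upper sub-mean - lower sub-mean) / 2;
    for an odd size the middle value contributes zero and is left out.
    """
    xs = sorted(values)
    n = len(xs)
    k = n // 2
    lower, upper = xs[:k], xs[n - k:]
    return (sum(upper) - sum(lower)) / n

even_sample = [1, 2, 4, 9]
odd_sample = [1, 2, 3, 4, 100]  # one large outlier

for sample in (even_sample, odd_sample):
    print(sample, mad_direct(sample), mad_from_halves(sample))

# Sub-means of the even sample, shown explicitly.
lower_mean = statistics.mean(sorted(even_sample)[:2])
upper_mean = statistics.mean(sorted(even_sample)[2:])
half_gap = (upper_mean - lower_mean) / 2
```

Only first-order quantities (sums or means of each half) appear in the second function, which matches the abstract's remark that these measures use only the first moment within each sub-range.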
A clustering-based approach to automatic harmonic analysis: an exploratory study of harmony and form in Mozart’s piano sonatas
We implement a novel approach to automatic harmonic analysis using a clustering method on pitch-class vectors (chroma vectors). The advantage of this method is its lack of top-down assumptions, allowing us to objectively validate the basic music theory premise of a chord lexicon consisting of triads and seventh chords, which is presumed by most research in automatic harmonic analysis. We use the discrete Fourier transform and hierarchical clustering to analyse features of the clustering solutions and illustrate associations between the features and the distribution of clusters over sections of the sonata forms. We also analyse the transition matrix, recovering elements of harmonic function theory.
Published version
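As a rough illustration of the discrete Fourier transform step mentioned in the abstract, the sketch below transforms an idealized binary pitch-class vector (12 entries, one per pitch class) and compares the magnitudes for a C major triad and its transposition to D major. The binary vectors, function names, and the choice to show only magnitudes are assumptions made for illustration; the study's actual feature extraction and its hierarchical clustering are not reproduced here.

```python
import cmath

def pitch_class_vector(pitch_classes):
    """Binary 12-entry vector: 1.0 where the pitch class is present."""
    return [1.0 if p in pitch_classes else 0.0 for p in range(12)]

def transpose(pc_vector, semitones):
    """Circularly shift a pitch-class vector upward by the given interval."""
    n = len(pc_vector)
    return [pc_vector[(p - semitones) % n] for p in range(n)]

def dft_magnitudes(pc_vector):
    """Magnitudes of the discrete Fourier coefficients of the vector."""
    n = len(pc_vector)
    return [
        abs(sum(v * cmath.exp(-2j * cmath.pi * k * p / n)
                for p, v in enumerate(pc_vector)))
        for k in range(n)
    ]

c_major = pitch_class_vector({0, 4, 7})   # C, E, G
d_major = transpose(c_major, 2)           # D, F sharp, A

mags_c = dft_magnitudes(c_major)
mags_d = dft_magnitudes(d_major)

print([round(m, 3) for m in mags_c])
print([round(m, 3) for m in mags_d])
```

Because transposition is a circular shift, it changes only the phases of the Fourier coefficients, so both triads yield the same magnitudes; this is one reason such magnitudes are a convenient key-independent description of a chord.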