Data Cube Approximation and Mining using Probabilistic Modeling
On-line Analytical Processing (OLAP) techniques commonly used in data warehouses allow the exploration of data cubes according to different analysis axes (dimensions) and under different abstraction levels in a dimension hierarchy. However, such techniques are not aimed at mining multidimensional data.
Since data cubes are nothing but multi-way tables, we propose to analyze the potential of two probabilistic modeling techniques, namely non-negative multi-way array factorization and log-linear modeling, with the ultimate objective of compressing and mining aggregate and multidimensional values. With the first technique, we compute the set of components that best fit the initial data set and whose superposition coincides with the original data; with the second technique we identify a parsimonious model (i.e., one with a reduced set of parameters), highlight strong associations among dimensions and discover possible outliers in data cells. A real-life example will be used to (i) discuss the potential benefits of the modeling output on cube exploration and mining, (ii) show how OLAP queries can be answered in an approximate way, and (iii) illustrate the strengths and limitations of these modeling approaches.
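As a rough illustration of the log-linear idea, the sketch below fits the simplest such model, the independence model, to a two-way table. This is an assumption made for brevity: the abstract does not say which log-linear models the authors fit, and the function names and the 2 x 2 cube are hypothetical. The fitted model needs only the row and column totals, large Pearson residuals point at candidate outlier cells, and a SUM query can be answered approximately from the fitted values.

```python
import math

def fit_independence(table):
    """Fit the independence log-linear model to a two-way table of counts.

    Returns the expected counts and the Pearson residual of every cell.
    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Under independence, expected cell = row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    # Pearson residual: (observed - expected) / sqrt(expected).
    residuals = [
        [(obs - exp) / math.sqrt(exp) for obs, exp in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]
    return expected, residuals

def approximate_sum(expected, rows, cols):
    """Answer a SUM query over a sub-cube from the fitted model alone."""
    return sum(expected[i][j] for i in rows for j in cols)

# Hypothetical 2 x 2 cube (say, region x product); not data from the paper.
cube = [[10, 20], [30, 40]]
expected, residuals = fit_independence(cube)
```

Queries that aggregate over a whole dimension are reproduced exactly by this model, because it preserves the margins; finer queries are only approximated.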
Modeling order guidelines to improve truckload utilization
Thesis (M. Eng. in Logistics)--Massachusetts Institute of Technology, Engineering Systems Division, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 36-37).
Freight vehicle capacity, whether it be road, ocean or air transport, is highly underutilized. This under-utilization presents an opportunity for companies to reduce their vehicular traffic and reduce their carbon footprint through greater supply chain integration. This thesis describes the impact of ordering guidelines on the transport efficiency of a large firm and how those guidelines and associated practices can be changed in order to gain better efficiency. To that end, we present three recommendations on improving the guidelines based on the shipment data analysis. First, we discuss the redundancy of one of the company's fill metrics based on a scatter plot analysis and a chi-square independence test. Second, we explore the impact of using linear programming to allocate SKUs to different shipments, highlighting the reduction in the number of shipments through better truck mixing. Finally, we divide the SKUs into three groups: cube-constrained, neutral, and weight-constrained. Based on this segmentation, we present a basic model that mixes different SKUs and helps a shipment achieve a much higher utilization rate. The application of the last two findings can be further explored to address under-utilization in freight carriers across different industries.
by Jaya Banik and Kyle Rinehart.
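The three-way SKU segmentation can be sketched as follows. This is a minimal illustration under assumptions of our own, not the thesis's actual model: a SKU is compared with the truck's weight-to-volume ratio, the `tolerance` band that defines "neutral" is hypothetical, and so are the capacities and SKU figures in the example.

```python
def classify_sku(unit_weight, unit_volume, truck_weight_cap, truck_volume_cap,
                 tolerance=0.1):
    """Label a SKU by comparing its density with the truck's capacity ratio."""
    truck_density = truck_weight_cap / truck_volume_cap
    ratio = (unit_weight / unit_volume) / truck_density
    if ratio > 1 + tolerance:
        return "weight-constrained"  # the truck hits its weight limit first
    if ratio < 1 - tolerance:
        return "cube-constrained"    # the truck runs out of volume first
    return "neutral"

def fill_rates(units, skus, truck_weight_cap, truck_volume_cap):
    """Return (weight fill, cube fill) of one shipment as fractions of capacity."""
    weight = sum(n * skus[name][0] for name, n in units.items())
    volume = sum(n * skus[name][1] for name, n in units.items())
    return weight / truck_weight_cap, volume / truck_volume_cap

# Hypothetical truck (20,000 kg, 2,000 volume units) and SKUs as
# (unit weight, unit volume); none of these figures come from the thesis.
skus = {"heavy": (20.0, 1.0), "light": (5.0, 1.0)}
heavy_only = fill_rates({"heavy": 1000}, skus, 20000.0, 2000.0)
mixed = fill_rates({"heavy": 666, "light": 1334}, skus, 20000.0, 2000.0)
```

In this toy case a truck of only the heavy SKU reaches its weight limit while half empty by volume, whereas the mixed load fills both dimensions almost completely, which is the effect the segmentation is meant to exploit.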
Combinatorics and Geometry of Transportation Polytopes: An Update
A transportation polytope consists of all multidimensional arrays or tables
of non-negative real numbers that satisfy certain sum conditions on subsets of
the entries. They arise naturally in optimization and statistics, and are also of
interest to discrete mathematics because permutation matrices, Latin squares,
and magic squares appear naturally as lattice points of these polytopes.
In this paper we survey advances on the understanding of the combinatorics
and geometry of these polyhedra and include some recent unpublished results on
the diameter of graphs of these polytopes. In particular, this is a thirty-year
update on the status of a list of open questions last visited in the 1984 book
by Yemelichev, Kovalev and Kravtsov and the 1986 survey paper of Vlach.
Comment: 35 pages, 13 figures
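For the classical two-way case, membership in a transportation polytope comes down to non-negativity plus prescribed row and column sums. The sketch below checks exactly that; the function name and tolerance are assumptions for illustration, and the multi-way polytopes surveyed in the paper impose sum conditions on more general subsets of entries.

```python
def in_transportation_polytope(table, row_sums, col_sums, tol=1e-9):
    """Check whether a two-way table lies in the transportation polytope
    defined by the given row and column sums."""
    if len(table) != len(row_sums) or any(len(row) != len(col_sums) for row in table):
        return False
    # Every entry must be non-negative.
    if any(x < -tol for row in table for x in row):
        return False
    rows_ok = all(abs(sum(row) - r) <= tol for row, r in zip(table, row_sums))
    cols_ok = all(abs(sum(col) - c) <= tol for col, c in zip(zip(*table), col_sums))
    return rows_ok and cols_ok

# A permutation matrix is a lattice point of the polytope with all margins 1.
permutation = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
is_point = in_transportation_polytope(permutation, [1, 1, 1], [1, 1, 1])
```

With all margins equal to 1 this is the Birkhoff polytope, whose integer points are the permutation matrices mentioned in the abstract.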
BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction
A novel discrete mathematical approach is proposed as an additional tool for molecular systematics which does not require prior statistical assumptions concerning the evolutionary process. The method is based on algorithms generating mathematical representations directly from DNA/RNA or protein sequences, followed by the output of numerical (scalar or vector) and visual characteristics (graphs). The binary encoded sequence information is transformed into a compact analytical form, called the Iterative Canonical Form (or ICF) of Boolean functions, which can then be used as a generalized molecular descriptor. The method provides raw vector data for calculating different distance matrices, which in turn can be analyzed by neighbor-joining or UPGMA to derive a phylogenetic tree, or by principal coordinates analysis to get an ordination scattergram. The new method and the associated software for inferring phylogenetic trees are called the Boolean analysis or BOOL-AN.
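The last step of that pipeline, from vectors to a distance matrix to a UPGMA tree, can be sketched as below. The binary vectors are hypothetical stand-ins: the abstract does not define the ICF descriptor, so it is not reproduced here, and Hamming distance is only one of the possible distance choices.

```python
def hamming(u, v):
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(u, v))

def upgma(names, dist):
    """Build a nested-tuple tree by UPGMA from a symmetric distance matrix."""
    clusters = {i: (names[i], 1) for i in range(len(names))}  # id -> (subtree, size)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        a, b = min(d, key=d.get)  # closest pair of current clusters
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        # Distance to the merged cluster is the size-weighted average.
        for c in clusters:
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        del d[(a, b)]
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]

# Hypothetical binary descriptors for four taxa (not ICF output).
vectors = {
    "A": [0, 0, 0, 0, 0],
    "B": [0, 0, 0, 0, 1],
    "C": [1, 1, 1, 0, 0],
    "D": [1, 1, 1, 1, 1],
}
names = list(vectors)
dist = [[hamming(vectors[p], vectors[q]) for q in names] for p in names]
tree = upgma(names, dist)  # (("A", "B"), ("C", "D"))
```

The sketch returns only the tree topology; branch lengths, which a full UPGMA implementation would also report, are omitted to keep it short.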
Robust Principal Component Analysis for Compositional Tables
A data table which is arranged according to two factors can often be
considered as a compositional table. An example is the number of unemployed
people, split according to gender and age classes. Analyzed as compositions,
the relevant information would consist of ratios between different cells of
such a table. This is particularly useful when analyzing several compositional
tables jointly, where the absolute numbers are in very different ranges, e.g.
if unemployment data are considered from different countries. Within the
framework of the logratio methodology, compositional tables can be decomposed
into independent and interactive parts, and orthonormal coordinates can be
assigned to these parts. However, these coordinates usually require some prior
knowledge about the data, and they are not easy to handle for exploring the
relationships between the given factors.
Here we propose a special choice of coordinates with a direct relation to
centered logratio (clr) coefficients, which are particularly useful for an
interpretation in terms of the original cells of the tables. With these
coordinates, robust principal component analysis (PCA) is performed for
dimension reduction, making it possible to investigate the relationships between the
factors. The link between orthonormal coordinates and clr coefficients makes it
possible to apply robust PCA, which would otherwise suffer from the singularity of clr
coefficients.
Comment: 20 pages, 2 figures
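The singularity mentioned above follows directly from the definition of clr coefficients, which the short sketch below makes concrete. It uses the standard definition (log of each cell divided by the geometric mean of all cells); the function name and the example table are hypothetical, and the paper's special orthonormal coordinates are not reproduced here.

```python
import math

def clr_table(table):
    """Centered logratio coefficients of a table with strictly positive cells.

    Each coefficient is log(cell / geometric mean of all cells).
    """
    cells = [x for row in table for x in row]
    mean_log = sum(math.log(x) for x in cells) / len(cells)
    return [[math.log(x) - mean_log for x in row] for row in table]

# Hypothetical 2 x 3 table (e.g. gender x age class); not data from the paper.
table = [[120.0, 80.0, 40.0], [100.0, 90.0, 60.0]]
clr = clr_table(table)

# The coefficients always sum to zero, so they are linearly dependent.
total = sum(x for row in clr for x in row)

# Rescaling the whole table leaves the coefficients unchanged: only ratios matter.
clr_scaled = clr_table([[1000.0 * x for x in row] for row in table])
```

Because the coefficients sum to zero for every table, their covariance matrix is singular, which is why robust PCA methods that need a full-rank covariance cannot be applied to them directly.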
Worker policing by egg eating in the ponerine ant Pachycondyla inversa
We investigated worker policing by egg eating in the ponerine ant Pachycondyla inversa, a species with
morphologically distinct queens and workers. Colonies were split into one half with the queen and one
half without. Workers in queenless colony fragments started laying unfertilized male eggs after three weeks.
Worker-laid eggs and queen-laid eggs were introduced into five other queenright colonies with a single
queen and three colonies with multiple queens, and their fate was observed for 30 min. Significantly more
worker-laid eggs (range of 35–62%, mean of 46%) than queen-laid eggs (range of 5–31%, mean of 15%)
were eaten by workers in single-queen colonies, and the same trend was seen in multiple-queen colonies.
This seems to be the first well-documented study of ants with a distinct caste polymorphism to show that
workers kill worker-laid eggs in preference to queen-laid eggs. Chemical analyses showed that the surfaces
of queen-laid and worker-laid eggs have different chemical profiles as a result of different relative proportions
of several hydrocarbons. Such differences might provide the information necessary for differential
treatment of eggs. One particular alkane, 3,11-dimeC27, was significantly more abundant on the surfaces
of queen-laid eggs. This substance is also the most abundant compound on the cuticles of egg layers
- …