
    Data Cube Approximation and Mining using Probabilistic Modeling

    On-line Analytical Processing (OLAP) techniques commonly used in data warehouses allow the exploration of data cubes according to different analysis axes (dimensions) and under different abstraction levels in a dimension hierarchy. However, such techniques are not aimed at mining multidimensional data. Since data cubes are nothing but multi-way tables, we propose to analyze the potential of two probabilistic modeling techniques, namely non-negative multi-way array factorization and log-linear modeling, with the ultimate objective of compressing and mining aggregate and multidimensional values. With the first technique, we compute the set of components that best fit the initial data set and whose superposition coincides with the original data; with the second technique, we identify a parsimonious model (i.e., one with a reduced set of parameters), highlight strong associations among dimensions and discover possible outliers in data cells. A real-life example will be used to (i) discuss the potential benefits of the modeling output on cube exploration and mining, (ii) show how OLAP queries can be answered in an approximate way, and (iii) illustrate the strengths and limitations of these modeling approaches.
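
    Neither the data set nor the authors' implementation appears in the abstract, so the following is only a minimal sketch of the first technique: a low-rank non-negative CP (PARAFAC) decomposition of a small synthetic 3-way count cube, fitted with plain multiplicative updates. The cube contents, the rank, and the update scheme are assumptions made for illustration, not the paper's actual setup.

```python
import numpy as np

def nn_parafac(X, rank, n_iter=500, eps=1e-9, seed=0):
    """Non-negative rank-`rank` CP decomposition of a 3-way array X via
    multiplicative updates (illustrative sketch, not the paper's method)."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.random((I, rank))
    B = rng.random((J, rank))
    C = rng.random((K, rank))
    for _ in range(n_iter):
        # X_(n) @ khatri_rao(...) is written with einsum, and
        # khatri_rao(C, B).T @ khatri_rao(C, B) equals (B'B) * (C'C).
        A *= np.einsum('ijk,jr,kr->ir', X, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum('ijk,ir,kr->jr', X, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum('ijk,ir,jr->kr', X, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C

# Synthetic "data cube", e.g. counts over 4 products x 5 stores x 6 months.
rng = np.random.default_rng(1)
cube = rng.poisson(lam=5.0, size=(4, 5, 6)).astype(float)

A, B, C = nn_parafac(cube, rank=2)
approx = np.einsum('ir,jr,kr->ijk', A, B, C)

# An aggregate OLAP-style query (total per product), answered exactly from the
# cube and approximately from the superposition of the fitted components.
print(cube.sum(axis=(1, 2)))
print(approx.sum(axis=(1, 2)).round(1))
```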

    Modeling order guidelines to improve truckload utilization

    Thesis (M. Eng. in Logistics)--Massachusetts Institute of Technology, Engineering Systems Division, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 36-37). Freight vehicle capacity, whether it be road, ocean or air transport, is highly underutilized. This under-utilization presents an opportunity for companies to reduce their vehicular traffic and their carbon footprint through greater supply chain integration. This thesis describes the impact of ordering guidelines on the transport efficiency of a large firm and how those guidelines and associated practices can be changed to gain better efficiency. To that end, we present three recommendations for improving the guidelines based on an analysis of the shipment data. First, we discuss the redundancy of one of the company's fill metrics based on a scatter plot analysis and a chi-square independence test. Second, we explore the impact of using linear programming to allocate SKUs to different shipments, highlighting the reduction in the number of shipments through better truck mixing. Finally, we divide the SKUs into three groups: cube-constrained, neutral, and weight-constrained. Based on this segmentation, we present a basic model that mixes different SKUs and helps a shipment achieve a much higher utilization rate. The application of the last two findings can be further explored to address under-utilization in freight carriers across different industries. by Jaya Banik and Kyle Rinehart. M.Eng. in Logistics.
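
    The thesis and its shipment data are not reproduced here; as a rough illustration of the linear-programming idea, the sketch below mixes a cube-constrained, a neutral, and a weight-constrained SKU into a single truck so as to maximize the pallets loaded, subject to weight and cube limits. All SKU attributes and truck capacities are invented for the example, and scipy.optimize.linprog stands in for whatever solver the thesis used.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical SKUs: weight (kg) and cube (m^3) per pallet.
# SKU 0 is cube-constrained (light but bulky), SKU 2 is weight-constrained
# (dense), and SKU 1 is roughly neutral.
weight = np.array([150.0, 400.0, 900.0])   # kg per pallet
cube   = np.array([2.5, 1.6, 0.9])         # m^3 per pallet
demand = np.array([30.0, 20.0, 15.0])      # pallets waiting to ship

TRUCK_WEIGHT = 20000.0   # kg
TRUCK_CUBE   = 75.0      # m^3

# Maximize total pallets loaded (linprog minimizes, hence the negated objective)
# subject to the truck's weight and cube capacities and the available demand.
c = -np.ones(3)
A_ub = np.vstack([weight, cube])
b_ub = np.array([TRUCK_WEIGHT, TRUCK_CUBE])
bounds = [(0, d) for d in demand]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
pallets = res.x
print("pallets per SKU:", pallets.round(1))
print("weight used: %.0f / %.0f kg" % (weight @ pallets, TRUCK_WEIGHT))
print("cube used:   %.1f / %.1f m^3" % (cube @ pallets, TRUCK_CUBE))
```

    Allowing fractional pallets keeps the example a pure LP; the real allocation problem is integer-valued and would call for an integer program or a heuristic.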

    Combinatorics and Geometry of Transportation Polytopes: An Update

    A transportation polytope consists of all multidimensional arrays or tables of non-negative real numbers that satisfy certain sum conditions on subsets of the entries. They arise naturally in optimization and statistics, and are also of interest in discrete mathematics because permutation matrices, latin squares, and magic squares appear naturally as lattice points of these polytopes. In this paper we survey advances in the understanding of the combinatorics and geometry of these polyhedra and include some recent unpublished results on the diameter of graphs of these polytopes. In particular, this is a thirty-year update on the status of a list of open questions last visited in the 1984 book by Yemelichev, Kovalev and Kravtsov and the 1986 survey paper of Vlach. Comment: 35 pages, 13 figures.
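
    As a concrete companion to the definition, the sketch below builds one point of a small transportation polytope from prescribed row and column sums using the classical northwest corner rule. The margins are invented, and the rule is standard textbook material rather than anything specific to this survey.

```python
import numpy as np

def northwest_corner(row_sums, col_sums):
    """Return a non-negative table with the given margins (a basic feasible
    solution, i.e. a vertex of the transportation polytope), built by the
    classical northwest corner rule."""
    r = np.asarray(row_sums, dtype=float).copy()
    c = np.asarray(col_sums, dtype=float).copy()
    assert np.isclose(r.sum(), c.sum()), "margins must have equal totals"
    T = np.zeros((len(r), len(c)))
    i = j = 0
    while i < len(r) and j < len(c):
        x = min(r[i], c[j])      # put as much as possible into cell (i, j)
        T[i, j] = x
        r[i] -= x
        c[j] -= x
        if r[i] <= 1e-12:        # row exhausted: move down
            i += 1
        else:                    # column exhausted: move right
            j += 1
    return T

# Invented margins: 3 row sums and 4 column sums with equal totals.
T = northwest_corner([10, 25, 15], [5, 20, 10, 15])
print(T)
print(T.sum(axis=1), T.sum(axis=0))   # recovers the prescribed margins
```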

    BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction

    A novel discrete mathematical approach is proposed as an additional tool for molecular systematics that does not require prior statistical assumptions concerning the evolutionary process. The method is based on algorithms generating mathematical representations directly from DNA/RNA or protein sequences, followed by the output of numerical (scalar or vector) and visual characteristics (graphs). The binary encoded sequence information is transformed into a compact analytical form, called the Iterative Canonical Form (or ICF) of Boolean functions, which can then be used as a generalized molecular descriptor. The method provides raw vector data for calculating different distance matrices, which in turn can be analyzed by neighbor-joining or UPGMA to derive a phylogenetic tree, or by principal coordinates analysis to get an ordination scattergram. The new method and the associated software for inferring phylogenetic trees are called the Boolean analysis, or BOOL-AN.
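
    The ICF construction itself is specific to the paper and is not reproduced here; the sketch below only mimics the surrounding pipeline, with a naive one-hot binary encoding as a stand-in descriptor, followed by a distance matrix and UPGMA (average-linkage) clustering via SciPy. The toy sequences are invented.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage

# Toy aligned DNA sequences (invented). The paper's ICF descriptor is replaced
# by a simple one-hot encoding purely for illustration.
seqs = {
    "taxonA": "ACGTACGTAC",
    "taxonB": "ACGTACGAAC",
    "taxonC": "ACGAACGAAC",
    "taxonD": "TCGAACGATC",
}
ALPHABET = "ACGT"

def one_hot(seq):
    """Binary encoding: four bits per position, one per nucleotide."""
    return np.array([[int(ch == base) for base in ALPHABET] for ch in seq]).ravel()

names = list(seqs)
X = np.vstack([one_hot(seqs[n]) for n in names])

# Pairwise distances between the binary descriptors, then UPGMA on the
# condensed distance matrix (neighbor-joining, the abstract's other option,
# is not part of SciPy and is omitted here).
D = pdist(X, metric="hamming")
print(names)
print(squareform(D).round(3))
print(linkage(D, method="average"))
```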

    Robust Principal Component Analysis for Compositional Tables

    A data table which is arranged according to two factors can often be considered as a compositional table. An example is the number of unemployed people, split according to gender and age classes. When such a table is analyzed as a composition, the relevant information consists of ratios between different cells of the table. This is particularly useful when analyzing several compositional tables jointly, where the absolute numbers are in very different ranges, e.g. if unemployment data are considered from different countries. Within the framework of the logratio methodology, compositional tables can be decomposed into independent and interactive parts, and orthonormal coordinates can be assigned to these parts. However, these coordinates usually require some prior knowledge about the data, and they are not easy to handle for exploring the relationships between the given factors. Here we propose a special choice of coordinates with a direct relation to centered logratio (clr) coefficients, which are particularly useful for an interpretation in terms of the original cells of the tables. With these coordinates, robust principal component analysis (PCA) is performed for dimension reduction, allowing the relationships between the factors to be investigated. The link between orthonormal coordinates and clr coefficients makes it possible to apply robust PCA, which would otherwise suffer from the singularity of the clr coefficients. Comment: 20 pages, 2 figures.
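
    The abstract does not spell out the proposed orthonormal coordinates or the robust estimator, so the sketch below only illustrates the generic first steps: clr coefficients of a few invented compositional tables, followed by an ordinary SVD-based PCA on those coefficients. Swapping in a robust estimator, as the paper does, is left out.

```python
import numpy as np

def clr(table):
    """Centered logratio coefficients of a compositional table:
    log of each cell minus the mean log over all cells of that table."""
    logs = np.log(table)
    return logs - logs.mean()

# Invented example: unemployment counts, gender (rows) x age class (columns),
# for three "countries" whose absolute sizes differ; only the ratios matter.
tables = [
    np.array([[120.0,  340.0,  210.0],
              [150.0,  310.0,  180.0]]),
    np.array([[ 40.0,  100.0,   90.0],
              [ 35.0,  120.0,   70.0]]),
    np.array([[900.0, 2500.0, 1600.0],
              [950.0, 2300.0, 1500.0]]),
]

# One row of clr coefficients per table, then classical PCA via the SVD.
Z = np.vstack([clr(t).ravel() for t in tables])
Zc = Z - Z.mean(axis=0)
U, s, Vt = np.linalg.svd(Zc, full_matrices=False)
scores = U * s      # country scores in principal-component space
print("explained variance ratio:", (s**2 / (s**2).sum()).round(3))
print("PC scores:\n", scores.round(3))
```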

    Worker policing by egg eating in the ponerine ant Pachycondyla inversa

    We investigated worker policing by egg eating in the ponerine ant Pachycondyla inversa, a species with morphologically distinct queens and workers. Colonies were split into one half with the queen and one half without. Workers in queenless colony fragments started laying unfertilized male eggs after three weeks. Worker-laid eggs and queen-laid eggs were introduced into five other queenright colonies with a single queen and three colonies with multiple queens, and their fate was observed for 30 min. Significantly more worker-laid eggs (range of 35–62%, mean of 46%) than queen-laid eggs (range of 5–31%, mean of 15%) were eaten by workers in single-queen colonies, and the same trend was seen in multiple-queen colonies. This seems to be the first well-documented study of ants with a distinct caste polymorphism to show that workers kill worker-laid eggs in preference to queen-laid eggs. Chemical analyses showed that the surfaces of queen-laid and worker-laid eggs have different chemical profiles as a result of different relative proportions of several hydrocarbons. Such differences might provide the information necessary for differential treatment of eggs. One particular alkane, 3,11-dimeC27, was significantly more abundant on the surfaces of queen-laid eggs. This substance is also the most abundant compound on the cuticles of egg layers.