
    BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction

    A novel discrete mathematical approach is proposed as an additional tool for molecular systematics that does not require prior statistical assumptions about the evolutionary process. The method is based on algorithms that generate mathematical representations directly from DNA/RNA or protein sequences and then output numerical (scalar or vector) and visual (graph) characteristics. The binary-encoded sequence information is transformed into a compact analytical form, called the Iterative Canonical Form (ICF) of Boolean functions, which can then be used as a generalized molecular descriptor. The method provides raw vector data for calculating different distance matrices, which in turn can be analyzed by neighbor-joining or UPGMA to derive a phylogenetic tree, or by principal coordinates analysis to obtain an ordination scattergram. The new method and the associated software for inferring phylogenetic trees are called Boolean analysis, or BOOL-AN.
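    The generic pipeline around BOOL-AN (binary encoding, a distance matrix, then UPGMA) can be sketched as follows. This is only an illustration under stated assumptions: the 2-bit nucleotide code and the use of plain Hamming distance on the bit vectors are hypothetical stand-ins, and the ICF transformation itself is not reproduced. The sequences are assumed to be aligned to equal length.

```python
from itertools import combinations

# Hypothetical 2-bit nucleotide code; BOOL-AN's own encoding and its
# Iterative Canonical Form (ICF) step are NOT reproduced here.
CODE = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}


def encode(seq):
    """Concatenate the 2-bit code of each nucleotide into one flat bit list."""
    return [bit for base in seq.upper() for bit in CODE[base]]


def hamming(u, v):
    """Count differing bits between two equal-length bit lists."""
    if len(u) != len(v):
        raise ValueError("sequences must be aligned to equal length")
    return sum(a != b for a, b in zip(u, v))


def distance_matrix(seqs):
    """Pairwise Hamming distances keyed by (name_a, name_b) in input order."""
    vecs = {name: encode(s) for name, s in seqs.items()}
    return {(a, b): hamming(vecs[a], vecs[b]) for a, b in combinations(vecs, 2)}


def upgma(dist):
    """Average-linkage (UPGMA) clustering; returns a Newick topology without branch lengths."""
    sizes = {}
    d = {}
    for (a, b), value in dist.items():
        sizes[a] = sizes[b] = 1
        d[frozenset((a, b))] = value
    while len(sizes) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        na, nb = sizes.pop(a), sizes.pop(b)
        merged = f"({a},{b})"
        # Distance to the merged cluster is the size-weighted mean of the old distances.
        for c in sizes:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes)) + ";"
```

    For example, `upgma(distance_matrix({"s1": "ACGT", "s2": "ACGA", "s3": "TTGA"}))` groups s1 with s2 first, because their bit vectors differ in only two positions. Any of the "different distance matrices" mentioned above could replace `hamming` without changing the clustering step.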

    Modeling order guidelines to improve truckload utilization

    Thesis (M. Eng. in Logistics), Massachusetts Institute of Technology, Engineering Systems Division, 2011. By Jaya Banik and Kyle Rinehart. Cataloged from PDF version of thesis. Includes bibliographical references (p. 36-37).

    Freight vehicle capacity, whether in road, ocean, or air transport, is highly underutilized. This underutilization presents an opportunity for companies to reduce their vehicular traffic and their carbon footprint through greater supply chain integration. This thesis describes the impact of ordering guidelines on the transport efficiency of a large firm and how those guidelines and associated practices can be changed to gain better efficiency. To that end, we present three recommendations for improving the guidelines, based on an analysis of shipment data. First, we discuss the redundancy of one of the company's fill metrics, based on a scatter plot analysis and a chi-square independence test. Second, we explore the impact of using linear programming to allocate SKUs to different shipments, highlighting the reduction in the number of shipments achieved through better truck mixing. Finally, we divide the SKUs into three groups: cube-constrained, neutral, and weight-constrained. Based on this segmentation, we present a basic model that mixes different SKUs and helps a shipment achieve a much higher utilization rate. The application of the last two findings can be explored further to address underutilization in freight carriers across different industries.
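    The cube/weight segmentation and the idea of mixing SKUs to fill a truck can be illustrated with a small sketch. Everything here is an assumption for illustration, not the thesis's model: the trailer limits, the 15% "neutral" tolerance band, and the two-SKU mix solved exactly with Cramer's rule are all hypothetical. An SKU denser than the truck's weight-to-cube ratio hits the weight limit first (weight-constrained); a less dense one fills the volume first (cube-constrained).

```python
from dataclasses import dataclass

# Hypothetical trailer limits, for illustration only.
TRUCK_WEIGHT_KG = 20_000.0
TRUCK_CUBE_M3 = 80.0


@dataclass
class Sku:
    name: str
    weight_kg: float  # weight per case
    cube_m3: float    # volume per case


def segment(sku, tolerance=0.15):
    """Classify an SKU by its density relative to the truck's weight/cube ratio."""
    truck_density = TRUCK_WEIGHT_KG / TRUCK_CUBE_M3
    ratio = (sku.weight_kg / sku.cube_m3) / truck_density
    if ratio > 1 + tolerance:
        return "weight-constrained"
    if ratio < 1 - tolerance:
        return "cube-constrained"
    return "neutral"


def utilization(load):
    """load: list of (Sku, cases). Returns (weight fill, cube fill) as fractions."""
    weight = sum(s.weight_kg * n for s, n in load)
    cube = sum(s.cube_m3 * n for s, n in load)
    return weight / TRUCK_WEIGHT_KG, cube / TRUCK_CUBE_M3


def balanced_mix(heavy, light):
    """Case counts (fractional) that fill weight and cube at the same time.

    Solves  n_h*w_h + n_l*w_l = W  and  n_h*c_h + n_l*c_l = C  by Cramer's rule;
    only meaningful when one SKU is weight- and the other cube-constrained.
    """
    det = heavy.weight_kg * light.cube_m3 - light.weight_kg * heavy.cube_m3
    n_heavy = (TRUCK_WEIGHT_KG * light.cube_m3 - light.weight_kg * TRUCK_CUBE_M3) / det
    n_light = (heavy.weight_kg * TRUCK_CUBE_M3 - TRUCK_WEIGHT_KG * heavy.cube_m3) / det
    return n_heavy, n_light
```

    Loading only a dense SKU fills the weight limit while leaving most of the volume empty; pairing it with a light, bulky SKU (rounding the counts from `balanced_mix` down to whole cases) brings both fill rates close to 100%. The linear-programming allocation mentioned above generalizes this to many SKUs and shipments.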

    Post-processing of association rules

    In this paper, we situate and motivate the need for a post-processing phase following the association rule mining algorithm when it is plugged into the knowledge discovery in databases (KDD) process. Major research effort has already been devoted to optimising the originally proposed mining algorithms. When it comes to effectively extracting the most interesting knowledge nuggets from the standard output of these algorithms, however, one faces a considerable challenge, since running them commonly yields a vast number of association rules. The sheer multitude of generated rules often clouds the interpreters' perception. A proper assessment of the usefulness of the generated output requires dealing effectively with different forms of data redundancy and with data that are plainly uninteresting. To that end, we give a tentative overview of some of the main post-processing tasks, taking into account the work already reported in the literature.
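    One common post-processing task, removing redundant rules, can be sketched as below. This is an illustrative criterion chosen for the example, not necessarily one surveyed in the paper: a rule X → Y is treated as redundant when a rule X' → Y with a strictly smaller antecedent X' ⊂ X exists and the larger antecedent does not raise confidence by more than a hypothetical `min_improvement` threshold.

```python
from typing import FrozenSet, List, NamedTuple


class Rule(NamedTuple):
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    confidence: float


def prune_redundant(rules: List[Rule], min_improvement: float = 0.0) -> List[Rule]:
    """Drop rules whose extra antecedent items add no confidence over a more general rule."""
    kept = []
    for r in rules:
        redundant = any(
            g.consequent == r.consequent
            and g.antecedent < r.antecedent  # strict subset: g is more general
            and r.confidence - g.confidence <= min_improvement
            for g in rules
        )
        if not redundant:
            kept.append(r)
    return kept
```

    With rules {bread} → {butter} (0.80), {bread, milk} → {butter} (0.78), and {bread, jam} → {butter} (0.95), the second rule is pruned because adding milk does not improve on the general rule, while the third survives.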

    Marginal Release Under Local Differential Privacy

    Many analysis and machine learning tasks require marginal statistics on multidimensional datasets that are released with strong privacy guarantees for the data subjects. Applications for these statistics range from finding correlations in the data to fitting sophisticated prediction models. In this paper, we provide a set of algorithms for materializing marginal statistics under the strong model of local differential privacy. We prove the first tight theoretical bounds on the accuracy of marginals compiled under each approach, perform an empirical evaluation to confirm these bounds, and evaluate the approaches on tasks such as modeling and correlation testing. Our results show that releasing information based on (local) Fourier transformations of the input is preferable to alternatives based directly on (local) marginals.
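    To make the local model concrete, here is a minimal sketch of the basic building block it rests on: randomized response for a single binary attribute, where each user perturbs their own bit before reporting it and the aggregator debiases the observed frequency. This is a standard ε-LDP mechanism used for illustration only; the paper's Fourier-based and marginal-based algorithms are not reproduced. Keeping the true bit with probability p = e^ε / (1 + e^ε) makes the likelihood ratio of any report exactly e^ε, and since E[observed] = (1 − p) + π(2p − 1), the estimator inverts that expression.

```python
import math
import random


def keep_probability(epsilon):
    """Probability of reporting the true bit under epsilon-LDP randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def perturb(bit, epsilon, rng):
    """Run on the user's device: report the true bit or its flip."""
    return bit if rng.random() < keep_probability(epsilon) else 1 - bit


def estimate_frequency(reports, epsilon):
    """Unbiased estimate of the true fraction of 1s from perturbed reports."""
    p = keep_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)
```

    Smaller ε pushes p toward 1/2, so the denominator 2p − 1 shrinks and the estimate's variance grows; this accuracy-privacy trade-off is what the accuracy bounds above quantify for full marginals.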

    Ontology Based Statistical Automated Inference - New Approach to Artificial Intelligence

    Statistical analysis requires an understanding of the nature of the phenomenon under study as well as of the meaning of mathematical statistics. Bridging the gap between the semantic web, which is based on knowledge representation languages, and concepts described by mathematical formulas is a challenge for AI. To overcome this gap, the ontology language P-ONT (based on directed graphs) has been developed. To illustrate the capabilities of the P-ONT language, a semantic web (built on the P-ONT ontology), an OLAP cube, relational databases, and generalized hierarchical statistical regression models are presented.

    Robust Principal Component Analysis for Compositional Tables

    A data table arranged according to two factors can often be considered a compositional table. An example is the number of unemployed people, split by gender and age class. When such a table is analyzed as a composition, the relevant information consists of ratios between different cells of the table. This is particularly useful when several compositional tables are analyzed jointly and their absolute numbers lie in very different ranges, e.g. when unemployment data from different countries are considered. Within the framework of the logratio methodology, compositional tables can be decomposed into independent and interactive parts, and orthonormal coordinates can be assigned to these parts. However, these coordinates usually require some prior knowledge about the data, and they are not easy to handle when exploring the relationships between the given factors. Here we propose a special choice of coordinates with a direct relation to centered logratio (clr) coefficients, which are particularly useful for interpretation in terms of the original cells of the tables. With these coordinates, robust principal component analysis (PCA) is performed for dimension reduction, allowing the relationships between the factors to be investigated. The link between orthonormal coordinates and clr coefficients makes it possible to apply robust PCA, which would otherwise suffer from the singularity of clr coefficients. Comment: 20 pages, 2 figures
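    The singularity that motivates orthonormal coordinates can be seen directly in a short sketch. It computes clr coefficients for a flattened table, whose entries always sum to zero (so their covariance matrix is singular), and a standard set of pivot (ilr) coordinates with one fewer component. The pivot coordinates are a generic orthonormal choice used here for illustration, not the table-specific coordinates the paper proposes, and the 2×3 table of counts is hypothetical.

```python
import math


def clr(x):
    """Centered logratio: log of each part minus the mean log (log of the geometric mean)."""
    logs = [math.log(v) for v in x]
    mean = sum(logs) / len(logs)
    return [value - mean for value in logs]


def pivot_coordinates(x):
    """Standard pivot (ilr) coordinates: D parts map to D-1 orthonormal coordinates."""
    D = len(x)
    z = []
    for i in range(D - 1):
        rest = x[i + 1:]
        k = len(rest)  # number of parts after the pivot part x[i]
        log_gmean_rest = sum(math.log(v) for v in rest) / k
        z.append(math.sqrt(k / (k + 1)) * (math.log(x[i]) - log_gmean_rest))
    return z


# Hypothetical counts of unemployed people: rows = gender, columns = age class.
table = [[120.0, 340.0, 210.0],
         [95.0, 410.0, 180.0]]
cells = [v for row in table for v in row]
```

    Both representations are scale invariant, so multiplying every cell by a constant (for example, comparing countries of different size) leaves them unchanged. Because the pivot coordinates come from an orthonormal basis, their Euclidean norm equals that of the clr vector, while their covariance matrix is no longer forced to be singular; that is what lets a robust PCA be computed before results are mapped back to clr, i.e. cell-wise, terms.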

    Associations of Song Properties with Habitats for Territorial Oscine Birds of Eastern North America

    To investigate adaptations for long-range acoustic communication in birds, I analyzed associations between broad categories of habitats and properties of territorial songs for eastern North American oscines. From published recordings, I obtained three frequency properties (maximal, minimal, and dominant) and three temporal properties of songs (presence of sidebands, presence of buzzes, and minimal period of repeated elements). Sidebands and buzzes indicated rapid amplitude modulation of a carrier frequency. Habitats occupied by territorial males were classified into six categories (broad-leaved or mixed forest, coniferous forest, parkland or forest edge, shrubland, grassland, and marshes). Frequencies in songs correlated strongly with body size, which varied among habitats. Analysis of covariance and phylogenetic regression, after controlling for body size, revealed an association of maximal but not dominant or minimal frequencies with habitat. In contrast, the temporal properties of song were all strongly associated with habitat, even within phylogenetic groupings. These results suggest that the temporal properties of the songs of many oscines have evolved to reduce the effects of reverberation in forested habitats. Exceptional species might have retained song features subject to degradation so that listeners can judge their distance to singers. In addition, adaptations for acoustic communication in different habitats might include differences in the perception of songs.