3,548 research outputs found

BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction

Authors: Ari Eszter; Horváth Arnold; Ittzés Péter; Jakó Éena; Podani János
Publication venue: Elsevier BV
Publication date: 01/01/2009

A novel discrete mathematical approach is proposed as an additional tool for molecular systematics which does not require prior statistical assumptions concerning the evolutionary process. The method is based on algorithms generating mathematical representations directly from DNA/RNA or protein sequences, followed by the output of numerical (scalar or vector) and visual characteristics (graphs). The binary encoded sequence information is transformed into a compact analytical form, called the Iterative Canonical Form (or ICF) of Boolean functions, which can then be used as a generalized molecular descriptor. The method provides raw vector data for calculating different distance matrices, which in turn can be analyzed by neighbor-joining or UPGMA to derive a phylogenetic tree, or by principal coordinates analysis to get an ordination scattergram. The new method and the associated software for inferring phylogenetic trees are called the Boolean analysis or BOOL-AN.
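The step from binary-encoded sequences to a distance matrix can be illustrated with a minimal sketch. It does not implement the ICF transformation described in the abstract: the 2-bit nucleotide code, the normalized Hamming distance, and the toy sequences are all assumptions chosen only to show the kind of raw vector data that a tree-building method such as UPGMA or neighbor-joining would consume.

```python
from itertools import combinations

# Hypothetical 2-bit code per nucleotide; the encoding used by BOOL-AN may differ.
BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}

def encode(seq):
    """Flatten a DNA string into a tuple of bits."""
    return tuple(bit for base in seq.upper() for bit in BITS[base])

def hamming(u, v):
    """Fraction of bit positions at which two equal-length vectors differ."""
    if len(u) != len(v):
        raise ValueError("sequences must be aligned to equal length")
    return sum(a != b for a, b in zip(u, v)) / len(u)

def distance_matrix(seqs):
    """Symmetric matrix (dict of dicts) of pairwise normalized Hamming distances."""
    names = list(seqs)
    vecs = {name: encode(seqs[name]) for name in names}
    d = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d[a][b] = d[b][a] = hamming(vecs[a], vecs[b])
    return d

# Made-up aligned sequences, for illustration only.
toy = {"s1": "ACGT", "s2": "ACGA", "s3": "TTGA"}
D = distance_matrix(toy)
for name in toy:
    print(name, [round(D[name][other], 3) for other in toy])
```

A matrix in this form could then be passed to any standard clustering routine; the choice of distance is the main design decision and is left open here.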

Crossref

Repository of the Academy's Library

Modeling order guidelines to improve truckload utilization

Authors: Banik Jaya; Rinehart Kyle
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2011

Thesis (M. Eng. in Logistics)--Massachusetts Institute of Technology, Engineering Systems Division, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 36-37).

Freight vehicle capacity, whether it be road, ocean or air transport, is highly underutilized. This under-utilization presents an opportunity for companies to reduce their vehicular traffic and reduce their carbon footprint through greater supply chain integration. This thesis describes the impact of ordering guidelines on the transport efficiency of a large firm and how those guidelines and associated practices can be changed in order to gain better efficiency. To that end, we present three recommendations on improving the guidelines based on the shipment data analysis. First, we discuss the redundancy of one of the company's fill metrics based on a scatter plot analysis and a chi-square independence test. Second, we explore the impact of using linear programming to allocate SKUs to different shipments, highlighting the reduction in the number of shipments through better truck mixing. Finally, we divide the SKUs into three groups: cube-constrained, neutral, and weight-constrained. Based on this segmentation, we present a basic model that mixes different SKUs and helps a shipment to achieve a much higher utilization rate. The application of the last two findings can be further explored to address under-utilization in freight carriers across different industries.

by Jaya Banik and Kyle Rinehart. M.Eng. in Logistics
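The three-way SKU segmentation mentioned in this abstract can be sketched as follows. The thesis's actual rule is not given in the abstract, so everything here is an assumption: the truck limits, the 20% tolerance band, and the idea of comparing each SKU's density with the truck's weight-to-volume ratio are illustrative choices only.

```python
# Hypothetical trailer limits; real values depend on the carrier and equipment.
TRUCK_MAX_WEIGHT_KG = 20_000.0
TRUCK_MAX_VOLUME_M3 = 80.0

def classify_sku(weight_kg, volume_m3, tolerance=0.2):
    """Label a SKU by comparing its density with the truck's weight-to-cube ratio."""
    if weight_kg <= 0 or volume_m3 <= 0:
        raise ValueError("weight and volume must be positive")
    truck_density = TRUCK_MAX_WEIGHT_KG / TRUCK_MAX_VOLUME_M3  # 250 kg per m3
    density = weight_kg / volume_m3
    if density > truck_density * (1 + tolerance):
        return "weight-constrained"  # a truck of this SKU hits the weight limit first
    if density < truck_density * (1 - tolerance):
        return "cube-constrained"    # a truck of this SKU runs out of space first
    return "neutral"

def utilization(load):
    """Return (weight fill, cube fill) for a list of (weight_kg, volume_m3) pallets."""
    total_weight = sum(w for w, _ in load)
    total_volume = sum(v for _, v in load)
    return total_weight / TRUCK_MAX_WEIGHT_KG, total_volume / TRUCK_MAX_VOLUME_M3

# Made-up pallets: a dense SKU and a light, bulky SKU.
dense = (1000.0, 2.0)
light = (100.0, 2.0)
print(classify_sku(*dense), classify_sku(*light))
print("dense only:", utilization([dense] * 20))
print("light only:", utilization([light] * 40))
print("mixed:     ", utilization([dense] * 16 + [light] * 24))
```

In this toy setup a truck of only dense pallets fills half its cube and a truck of only light pallets fills a fifth of its weight, while the mixed load uses most of both limits, which is the intuition behind mixing SKUs from different groups.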

DSpace@MIT

Post-processing of association rules.

Authors: Baesens Bart; Vanthienen Jan; Viaene Stijn

In this paper, we situate and motivate the need for a post-processing phase to the association rule mining algorithm when plugged into the knowledge discovery in databases process. Major research effort has already been devoted to optimising the initially proposed mining algorithms. When it comes to effectively extrapolating the most interesting knowledge nuggets from the standard output of these algorithms, one is faced with an extreme challenge, since it is not uncommon to be confronted with a vast amount of association rules after running the algorithms. The sheer multitude of generated rules often clouds the perception of the interpreters. Rightful assessment of the usefulness of the generated output introduces the need to effectively deal with different forms of data redundancy and data being plainly uninteresting. In order to do so, we will give a tentative overview of some of the main post-processing tasks, taking into account the efforts that have already been reported in the literature.

Research Papers in Economics

Marginal Release Under Local Differential Privacy

Authors: Bassily R.; Chaudhuri A.; Ding B.; Hardt M.; Jurafsky D.; Kairouz P.; Leen T. K.; Narayan A.; Wang T.
Publication date: 08/11/2017

Many analysis and machine learning tasks require the availability of marginal statistics on multidimensional datasets while providing strong privacy guarantees for the data subjects. Applications for these statistics range from finding correlations in the data to fitting sophisticated prediction models. In this paper, we provide a set of algorithms for materializing marginal statistics under the strong model of local differential privacy. We prove the first tight theoretical bounds on the accuracy of marginals compiled under each approach, perform empirical evaluation to confirm these bounds, and evaluate them for tasks such as modeling and correlation testing. Our results show that releasing information based on (local) Fourier transformations of the input is preferable to alternatives based directly on (local) marginals.
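To make the local model concrete, the sketch below estimates a one-way marginal of a single binary attribute with classic randomized response. This is a textbook baseline, not one of the paper's algorithms and not the Fourier-based approach the abstract recommends; the function names, the population of 20,000 users, and the choice of epsilon = 1 are assumptions for illustration.

```python
import math
import random

def randomize_bit(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), otherwise flip it."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit

def estimate_marginal(reports, epsilon):
    """Debias the mean of the noisy reports to estimate the true fraction of 1s."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = (1 - p_truth) + f * (2 * p_truth - 1), solved here for f.
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)

rng = random.Random(0)                      # fixed seed so the run is repeatable
true_bits = [1] * 6000 + [0] * 14000        # made-up population, true marginal 0.30
reports = [randomize_bit(b, 1.0, rng) for b in true_bits]
estimate = estimate_marginal(reports, 1.0)
print(f"estimated marginal: {estimate:.3f}")
```

Each user perturbs their own bit before it leaves their device, so the collector never sees raw data; the price is extra variance in the estimate, which grows as epsilon shrinks.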

arXiv.org e-Print Archive

Crossref

Ontology Based Statistical Automated Inference - New Approach to Artificial Intelligence

Authors: Borkowski Wlodzimierz; Mielniczuk Hanna
Publication venue: Lifescience Global
Publication date: 20/12/2012

Statistical analysis requires understanding the nature of the phenomenon under study, as well as understanding the sense of mathematical statistics. Bridging the gap between the semantic web, based on knowledge representation languages, and concepts described by mathematical formulas is a challenge for AI. To overcome this gap, the ontology language P-ONT (based on directed graphs) has been invented. To illustrate the capabilities of the P-ONT language, a semantic web (built on the P-ONT ontology), an OLAP cube, relational databases, and generalized hierarchical statistical regression models are presented.

Publication Management System

Robust Principal Component Analysis for Compositional Tables

Authors: Rendlová Julie; Hron Karel; Fačevicová Kamila; Filzmoser Peter
Publication date: 01/01/1993

A data table which is arranged according to two factors can often be considered as a compositional table. An example is the number of unemployed people, split according to gender and age classes. Analyzed as compositions, the relevant information would consist of ratios between different cells of such a table. This is particularly useful when analyzing several compositional tables jointly, where the absolute numbers are in very different ranges, e.g. if unemployment data are considered from different countries. Within the framework of the logratio methodology, compositional tables can be decomposed into independent and interactive parts, and orthonormal coordinates can be assigned to these parts. However, these coordinates usually require some prior knowledge about the data, and they are not easy to handle for exploring the relationships between the given factors. Here we propose a special choice of coordinates with a direct relation to centered logratio (clr) coefficients, which are particularly useful for an interpretation in terms of the original cells of the tables. With these coordinates, robust principal component analysis (PCA) is performed for dimension reduction, making it possible to investigate the relationships between the factors. The link between orthonormal coordinates and clr coefficients makes it possible to apply robust PCA, which would otherwise suffer from the singularity of clr coefficients.

Comment: 20 pages, 2 figures
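The clr coefficients referred to in this abstract follow the standard definition: each cell's logarithm minus the mean logarithm over all cells. The sketch below applies that definition to a made-up 2 x 3 table (the counts and the factor labels are assumptions) and shows two properties the abstract relies on: the result depends only on ratios between cells, and the coefficients sum to zero, which is the source of the singularity mentioned above. It does not reproduce the paper's proposed coordinates or its robust PCA.

```python
import math

def clr(table):
    """Centered logratio coefficients of a table of strictly positive cells."""
    cells = [x for row in table for x in row]
    if any(x <= 0 for x in cells):
        raise ValueError("clr needs strictly positive cells")
    mean_log = sum(math.log(x) for x in cells) / len(cells)
    return [[math.log(x) - mean_log for x in row] for row in table]

# Made-up counts: rows could be gender, columns age classes.
small = [[120.0, 80.0, 40.0], [100.0, 90.0, 60.0]]
# Same proportions at a much larger absolute scale (e.g. a bigger country).
large = [[1000.0 * x for x in row] for row in small]

clr_small = clr(small)
clr_large = clr(large)
total = sum(x for row in clr_small for x in row)
print([[round(x, 3) for x in row] for row in clr_small])
print("sum of coefficients:", round(total, 12))
```

Because the two tables differ only by a constant factor, their clr coefficients coincide up to floating-point error, which is why tables with very different absolute ranges can be analyzed jointly.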

arXiv.org e-Print Archive

Yale University

Associations of Song Properties with Habitats for Territorial Oscine Birds of Eastern North America

Author: Wiley R. Haven
Publication date: 01/01/1991

To investigate adaptations for long-range acoustic communication in birds, I analyzed associations between broad categories of habitats and properties of territorial songs for eastern North American oscines. From published recordings, I obtained three frequency properties (maximal, minimal, and dominant) and three temporal properties of songs (presence of sidebands, presence of buzzes, minimal period of repeated elements). Sidebands and buzzes indicated rapid amplitude modulation of a carrier frequency. Habitats occupied by territorial males were classified into six categories (broad-leaved or mixed forest, coniferous forest, parkland or forest edge, shrubland, grassland, and marshes). Frequencies in songs correlated strongly with body size, which varied among habitats. Analysis of covariance and phylogenetic regression, after controlling for body size, revealed an association of maximal but not dominant or minimal frequencies with habitat. In contrast, the temporal properties of song were all strongly associated with habitat, even within phylogenetic groupings. These results suggest that the temporal properties of songs of many oscines have evolved to reduce the effects of reverberation in forested habitats. Exceptional species might have retained features of song subject to degradation to permit listeners to judge distances to singers. In addition, adaptations for acoustic communication in different habitats might include differences in the perception of songs.

Carolina Digital Repository