226 research outputs found

The Data Science Design Manual

Author: Steven S. Skiena
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 22/04/2020

Open Library

Recommended from our members

On the effectiveness of run-time checks

Author: B. Meyer
H. Wasserman
M.J.P. Meulen van der
N.G. Leveson
P.A. Lee
S. Skiena
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005

Run-time checks are often assumed to be a cost-effective way of improving the dependability of software components, by checking required properties of their outputs and flagging an output as incorrect if it fails the check. However, evaluating how effective they are going to be in a future application is difficult, since the effectiveness of a check depends on the unknown faults of the program to which it is applied. A programming contest, providing thousands of programs written to the same specifications, gives us the opportunity to systematically test run-time checks to observe statistics of their effects on actual programs. In these examples, run-time checks turn out to be most effective for unreliable programs. For more reliable programs, the benefit is relatively low as compared to the gain that can be achieved by other (more expensive) measures, most notably multiple-version diversity.
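
As an illustration of the kind of check the abstract describes (verifying required properties of an output and flagging it when they fail), here is a minimal Python sketch. It is not taken from the paper: the sorting task, the function names `checked_sort` and `buggy_sort`, and the two properties checked are all assumptions chosen for the example.

```python
from collections import Counter


def checked_sort(sort_fn, data):
    """Run sort_fn on data and report whether its output passes the run-time check."""
    output = sort_fn(list(data))
    # Property 1: the output is in non-decreasing order.
    is_ordered = all(a <= b for a, b in zip(output, output[1:]))
    # Property 2: the output is a permutation of the input.
    is_permutation = Counter(output) == Counter(data)
    return output, (is_ordered and is_permutation)


def buggy_sort(items):
    # Hypothetical faulty program: silently drops duplicate values.
    return sorted(set(items))
```

Calling `checked_sort(buggy_sort, [3, 1, 2, 1])` returns an ordered list that fails the permutation property, so the output is flagged. A check of this form only catches faults that violate the properties it tests.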

City Research Online

Crossref

Numerical Investigation of Graph Spectra and Information Interpretability of Eigenvalues

Author: A.N. Kolmogorov
D.J. Watts
G.J. Chaitin
I. Farkasa
J.-P. Delahaye
L.A. Levin
P. Erdős
S. Skiena
Publication date: 01/01/2015

We undertake an extensive numerical investigation of the graph spectra of thousands of regular graphs, a set of random Erdős-Rényi graphs, the two most popular types of complex networks, and an evolving genetic network, using novel conceptual and experimental tools. Our objective is to contribute to an understanding of the meaning of the eigenvalues of a graph relative to its topological and information-theoretic properties. We introduce a technique for identifying the most informative eigenvalues of evolving networks by comparing graph spectra behavior to their algorithmic complexity. We suggest that these techniques can be extended to further investigate the behavior of evolving biological networks. In the extended version of this paper we apply these techniques to seven tissue-specific regulatory networks as static examples and to the network of a naïve pluripotent immune cell in the process of differentiating towards a Th17 cell as an evolving example, finding the most and least informative eigenvalues at every stage.

Comment: Forthcoming in 3rd International Work-Conference on Bioinformatics and Biomedical Engineering (IWBBIO), Lecture Notes in Bioinformatics, 201

arXiv.org e-Print Archive

Crossref

When Can You Fold a Map?

Author: Arkin Esther M.
Bender Michael A.
Demaine Erik D.
Demaine Martin L.
Mitchell Joseph S. B.
Sethia Saurabh
Skiena Steven S.
Publication date: 30/08/2003

We explore the following problem: given a collection of creases on a piece of paper, each assigned a folding direction of mountain or valley, is there a flat folding by a sequence of simple folds? There are several models of simple folds; the simplest, the one-layer simple fold, rotates a portion of paper about a crease in the paper by ±180 degrees. We first consider the analogous questions in one dimension lower -- bending a segment into a flat object -- which lead to interesting problems on strings. We develop efficient algorithms for the recognition of simply foldable 1D crease patterns, and reconstruction of a sequence of simple folds. Indeed, we prove that a 1D crease pattern is flat-foldable by any means precisely if it is by a sequence of one-layer simple folds. Next we explore simple foldability in two dimensions, and find a surprising contrast: "map" folding and variants are polynomial, but slight generalizations are NP-complete. Specifically, we develop a linear-time algorithm for deciding foldability of an orthogonal crease pattern on a rectangular piece of paper, and prove that it is (weakly) NP-complete to decide foldability of (1) an orthogonal crease pattern on an orthogonal piece of paper, (2) a crease pattern of axis-parallel and diagonal (45-degree) creases on a square piece of paper, and (3) crease patterns without a mountain/valley assignment.

Comment: 24 pages, 19 figures. Version 3 includes several improvements thanks to referees, including formal definitions of simple folds, more figures, table summarizing results, new open problems, and additional references

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Analysis of airplane boarding via space-time geometry and random matrix theory

Author: D Berend
Deuschel J D
E Bachmat
L Sapir
Mallows C L
Marelli S
Mehta M L
N Stolyarov
S Skiena
Vershik A
Publication venue: 'IOP Publishing'
Publication date: 05/12/2005

We show that airplane boarding can be asymptotically modeled by 2-dimensional Lorentzian geometry. Boarding time is given by the maximal proper time among curves in the model. Discrepancies between the model and simulation results are closely related to random matrix theory. We then show how such models can be used to explain why some commonly practiced airline boarding policies are ineffective and even detrimental.

Comment: 4 pages

arXiv.org e-Print Archive

Crossref

CERN Document Server

Generating Abstractive Summaries from Meeting Transcripts

Author: Filippova K.
Garg N.
Hsueh P.-Y.
Lin C.-Y.
Mehdad Y.
Murray G.
Rose T.
Roth D.
Skiena S.
Wang L.
Xie S.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 22/09/2016

Summaries of meetings are very important as they convey the essential content of discussions in a concise form. Generally, it is time-consuming to read and understand whole documents. Therefore, summaries play an important role, as readers are interested only in the important content of discussions. In this work, we address the task of meeting document summarization. Automatic summarization systems for meeting conversations developed so far have been primarily extractive, resulting in unacceptable summaries that are hard to read. The extracted utterances contain disfluencies that affect the quality of the extractive summaries. To make summaries much more readable, we propose an approach to generating abstractive summaries by fusing important content from several utterances. We first separate meeting transcripts into various topic segments, and then identify the important utterances in each segment using a supervised learning approach. The important utterances are then combined to generate a one-sentence summary. In the text generation step, the dependency parses of the utterances in each segment are combined to create a directed graph. The most informative and well-formed sub-graph obtained by integer linear programming (ILP) is selected to generate a one-sentence summary for each topic segment. The ILP formulation reduces disfluencies by leveraging grammatical relations that are more prominent in non-conversational styles of text, and therefore generates summaries that are comparable to human-written abstractive summaries. Experimental results show that our method can generate more informative summaries than the baselines. In addition, readability assessments by human judges as well as log-likelihood estimates obtained from the dependency parser show that our generated summaries are significantly readable and well-formed.

Comment: 10 pages, Proceedings of the 2015 ACM Symposium on Document Engineering, DocEng' 201

arXiv.org e-Print Archive

Crossref

The Lazy Bureaucrat Scheduling Problem

Author: Arkin
Barvinok
Esther M. Arkin
Garey
Goemans
Hassin
Hepner
Joseph S.B. Mitchell
Karger
Keneally
Lawler
Michael A. Bender
Pinedo
Steven S. Skiena
Publication venue: 'Elsevier BV'
Publication date: 01/01/2002

We introduce a new class of scheduling problems in which the optimization is performed by the worker (single "machine") who performs the tasks. A typical worker's objective is to minimize the amount of work he does (he is "lazy"), or more generally, to schedule as inefficiently (in some sense) as possible. The worker is subject to the constraint that he must be busy when there is work that he can do; we make this notion precise both in the preemptive and nonpreemptive settings. The resulting class of "perverse" scheduling problems, which we denote "Lazy Bureaucrat Problems," gives rise to a rich set of new questions that explore the distinction between maximization and minimization in computing optimal schedules.

Comment: 19 pages, 2 figures, LaTeX. To appear, Information and Computation

arXiv.org e-Print Archive

CiteSeerX

Crossref

Elsevier - Publisher Connector

Optimal Paths in Complex Networks with Correlated Weights: The World-wide Airport Network

Author: H. Eugene Stanley
L. Dall'Asta
Lidia A. Braunstein
M. Molloy
R. K. Ahuja
R. Pastor-Satorras
Reuven Cohen
S. N. Dorogovtsev
S. Skiena
Shlomo Havlin
Vittoria Colizza
Zhenhua Wu
Publication venue: 'American Physical Society (APS)'
Publication date: 28/09/2006

We study complex networks with weights, $w_{ij}$, associated with each link connecting node $i$ and $j$. The weights are chosen to be correlated with the network topology in the form found in two real world examples, (a) the world-wide airport network, and (b) the E. Coli metabolic network. Here $w_{ij} \sim x_{ij} (k_i k_j)^\alpha$, where $k_i$ and $k_j$ are the degrees of nodes $i$ and $j$, $x_{ij}$ is a random number and $\alpha$ represents the strength of the correlations. The case $\alpha > 0$ represents correlation between weights and degree, while $\alpha < 0$ represents anti-correlation and the case $\alpha = 0$ reduces to the case of no correlations. We study the scaling of the lengths of the optimal paths, $\ell_{\rm opt}$, with the system size $N$ in strong disorder for scale-free networks for different $\alpha$. We calculate the robustness of correlated scale-free networks with different $\alpha$, and find the networks with $\alpha < 0$ to be the most robust networks when compared to the other values of $\alpha$. We propose an analytical method to study percolation phenomena on networks with this kind of correlation. We compare our simulation results with the real world-wide airport network, and we find good agreement.
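
The weight rule in this abstract, with each link weight proportional to a random number times the product of the endpoint degrees raised to the power alpha, can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function name `correlated_weights` is hypothetical, the proportionality is treated as an equality, and the random number x_ij is assumed to be uniform on (0, 1], which the abstract does not specify.

```python
import random


def correlated_weights(edges, alpha, rng=None):
    """Return {(i, j): w_ij} with w_ij = x_ij * (k_i * k_j) ** alpha."""
    if rng is None:
        rng = random.Random(0)
    # Degree k_i: the number of links touching node i.
    degree = {}
    for i, j in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    weights = {}
    for i, j in edges:
        # Assumption: x_ij is uniform on (0, 1]; the abstract only says "random number".
        x_ij = 1.0 - rng.random()
        weights[(i, j)] = x_ij * (degree[i] * degree[j]) ** alpha
    return weights
```

With alpha set to 0 the degree factor equals 1 and every weight reduces to x_ij alone, which matches the uncorrelated case described in the abstract.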

arXiv.org e-Print Archive

Crossref

CERN Document Server

Transport in weighted networks: Partition into superhighways and roads

Author: H. Eugene Stanley
J. B. Kruskal
Lidia A. Braunstein
M. Molloy
P. Erdős
R. Cohen
R. Pastor-Satorras
R. K. Ahuja
S. Skiena
S. N. Dorogovtsev
Shlomo Havlin
T. H. Cormen
Zhenhua Wu
Publication venue: 'American Physical Society (APS)'
Publication date: 03/05/2006

Transport in weighted networks is dominated by the minimum spanning tree (MST), the tree connecting all nodes with the minimum total weight. We find that the MST can be partitioned into two distinct components, having significantly different transport properties, characterized by centrality -- the number of times a node (or link) is used by transport paths. One component, the superhighways, is the infinite incipient percolation cluster, for which we find that nodes (or links) with high centrality dominate. For the other component, roads, which includes the remaining nodes, low centrality nodes dominate. We find also that the distribution of the centrality for the infinite incipient percolation cluster satisfies a power law, with an exponent smaller than that for the entire MST. The significance of this finding is that one can improve significantly the global transport by improving a tiny fraction of the network, the superhighways.

Comment: 12 pages, 5 figures
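
The minimum spanning tree that this abstract builds on can be computed in several standard ways. The sketch below uses Kruskal's algorithm with a union-find structure; this is a choice made for illustration, since the abstract does not say how the authors constructed the MST. The function name `minimum_spanning_tree` and the `(weight, u, v)` edge format are assumptions.

```python
def minimum_spanning_tree(num_nodes, weighted_edges):
    """Kruskal's algorithm: return the MST edges of a connected graph.

    weighted_edges is a list of (weight, u, v) tuples with nodes numbered
    0 .. num_nodes - 1.
    """
    parent = list(range(num_nodes))

    def find(node):
        # Walk to the root of the node's component, halving the path on the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    for weight, u, v in sorted(weighted_edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            # The edge joins two separate components, so it cannot close a cycle.
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree
```

For a connected graph the result has one edge fewer than the number of nodes. Measuring centrality as the abstract defines it would additionally require counting how often each node or link lies on paths within this tree, which is not shown here.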

arXiv.org e-Print Archive

Crossref