Identifying almost sorted permutations from TCP buffer dynamics
Associate to each sequence $A$ of integers (intending to represent packet
IDs) a sequence $M(A)$ of positive integers of the same length. The
$i$'th entry of $M(A)$ is the size (at time $i$) of the smallest
buffer needed to hold out-of-order packets, where space is accounted for
unreceived packets as well. Call two sequences $A$, $B$ {\em equivalent}
(written $A \equiv_{FB} B$) if $M(A) = M(B)$.
We prove the following result: any two permutations $A$, $B$ of the same length
with $SUS(A), SUS(B) \leq 3$ (where SUS is the {\em shuffled-up-sequences}
reordering measure), and such that $A \equiv_{FB} B$, are identical.
The result (which is no longer valid if we replace the upper bound 3 by 4)
was motivated by RESTORED, a receiver-oriented model of network traffic we have
previously introduced.
List Heaps
This paper presents a simple extension of the binary heap, the List Heap. We
use List Heaps to demonstrate the idea of adaptive heaps: heaps whose
performance is a function of both the size of the problem instance and the
disorder of the problem instance. We focus on the presortedness of the input
sequence as a measure of disorder for the problem instance. A number of
practical applications that rely on heaps deal with input that is not random.
Even random input contains presorted subsequences. Devising heaps that exploit
this structure may provide a means for improving practical performance. We
present some basic empirical tests to support this claim. Additionally,
adaptive heaps may provide an interesting direction for theoretical
investigation.
Stationarily ordered types and the number of countable models
We introduce notions of stationarily ordered types and theories; the latter
generalizes weak o-minimality and the former is a relaxed version of weak
o-minimality localized at the locus of a single type. We show that forking, as
a binary relation on elements realizing stationarily ordered types, is an
equivalence relation and that each stationarily ordered type in a model
determines some order-type as an invariant of the model. We study weak and
forking non-orthogonality of stationarily ordered types, show that they are
equivalence relations and prove that invariants of non-orthogonal types are
closely related. The developed techniques are applied to prove that in the case
of a binary, stationarily ordered theory with fewer than $2^{\aleph_0}$
countable models, the isomorphism type of a countable model is determined by a
certain sequence of invariants of the model. In particular, we confirm Vaught's
conjecture for binary, stationarily ordered theories.
Comment: Revised version accepted for publication in Annals of Pure and
Applied Logic
Restricted Patience Sorting and Barred Pattern Avoidance
Patience Sorting is a combinatorial algorithm that can be viewed as an
iterated, non-recursive form of the Schensted Insertion Algorithm. In recent
work the authors have shown that Patience Sorting provides an algorithmic
description for permutations avoiding the barred (generalized) permutation
pattern . Motivated by this and a recently formulated geometric
form for Patience Sorting in terms of certain intersecting lattice paths, we
study the related themes of restricted input and avoidance of similar barred
permutation patterns. One such result is to characterize those permutations for
which Patience Sorting is an invertible algorithm as the set of permutations
simultaneously avoiding the barred patterns and .
We then enumerate this avoidance set, which involves convolved Fibonacci
numbers.
Comment: 12 pages, LaTeX, uses pstricks, needs fpsac.cls v2: final version of
extended abstract for FPSAC'0
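The abstract above names Patience Sorting without spelling out the dealing rule. The following is a minimal sketch of the usual pile-forming step, assuming the standard convention that each value goes on the leftmost pile whose top is larger; the function name `patience_piles` is hypothetical and this is not the authors' code.

```python
from bisect import bisect_right


def patience_piles(perm):
    """Deal the values of perm into piles, left to right.

    Hypothetical helper (illustrative name): each value goes on the
    leftmost pile whose top is larger than it; if there is no such
    pile, it starts a new pile on the right.
    """
    piles = []  # piles[k] lists pile k from bottom to top
    tops = []   # tops[k] == piles[k][-1]; stays sorted in ascending order
    for x in perm:
        k = bisect_right(tops, x)  # index of the first pile whose top exceeds x
        if k == len(piles):
            piles.append([x])
            tops.append(x)
        else:
            piles[k].append(x)
            tops[k] = x
    return piles


print(patience_piles([3, 1, 4, 2]))  # [[3, 1], [4, 2]]
print(patience_piles([2, 3, 1]))     # [[2, 1], [3]]
```

Under this convention the number of piles equals the length of a longest increasing subsequence of the input.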
CFOF: A Concentration Free Measure for Anomaly Detection
We present a novel notion of outlier, called the Concentration Free Outlier
Factor, or CFOF. As a main contribution, we formalize the notion of
concentration of outlier scores and theoretically prove that CFOF does not
concentrate in the Euclidean space for any arbitrarily large dimensionality. To
the best of our knowledge, there are no other proposals of data analysis
measures related to the Euclidean distance for which theoretical evidence has
been provided that they are immune to the concentration effect. We
determine the closed form of the distribution of CFOF scores in arbitrarily
large dimensionalities and show that the CFOF score of a point depends on its
squared norm standard score and on the kurtosis of the data distribution, thus
providing a clear and statistically founded characterization of this notion.
Moreover, we leverage this closed form to provide evidence that the definition
does not suffer from the hubness problem affecting other measures. We prove that
the number of CFOF outliers coming from each cluster is proportional to cluster
size and kurtosis, a property that we call semi-locality. We determine that
semi-locality characterizes existing reverse nearest neighbor-based outlier
definitions, thus clarifying the exact nature of their observed local behavior.
We also formally prove that classical distance-based and density-based outliers
concentrate both for bounded and unbounded sample sizes and for fixed and
variable values of the neighborhood parameter. We introduce the fast-CFOF
algorithm for detecting outliers in large high-dimensional datasets. The
algorithm has linear cost, supports multi-resolution analysis, and is
embarrassingly parallel. Experiments highlight that the technique is able to
efficiently process huge datasets and to deal even with large values of the
neighborhood parameter, to avoid concentration, and to obtain excellent
accuracy.
Random Shuffling to Reduce Disorder in Adaptive Sorting Scheme
In this paper we present a random shuffling scheme to apply with adaptive
sorting algorithms. Adaptive sorting algorithms utilize the presortedness
present in a given sequence. We have probabilistically increased the amount of
presortedness present in a sequence by using a random shuffling technique that
requires little computation. Theoretical analysis suggests that the proposed
scheme can improve the performance of adaptive sorting. Experimental results
show that it significantly reduces the amount of disorder present in a given
sequence and improves the execution time of adaptive sorting algorithms as well.
Comment: 7 pages, 2 tables
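The abstract does not say which measure of disorder it uses. As an illustration only, the sketch below computes two standard presortedness measures, the number of runs and the number of inversions; the function names are hypothetical, and picking these two measures is an assumption rather than something stated in the paper.

```python
def count_runs(seq):
    """Number of maximal non-decreasing runs (1 for a sorted, non-empty input)."""
    if not seq:
        return 0
    # A new run starts at every position where the sequence steps down.
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a > b)


def count_inversions(seq):
    """Number of pairs i < j with seq[i] > seq[j] (quadratic, for clarity)."""
    n = len(seq)
    return sum(1 for i in range(n) for j in range(i + 1, n) if seq[i] > seq[j])


data = [3, 1, 2, 5, 4]
print(count_runs(data), count_inversions(data))
```

Both measures are smallest on sorted input, so a lower value means more presortedness for an adaptive sorting algorithm to exploit.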
Seq2Slate: Re-ranking and Slate Optimization with RNNs
Ranking is a central task in machine learning and information retrieval. In
this task, it is especially important to present the user with a slate of items
that is appealing as a whole. This in turn requires taking into account
interactions between items, since intuitively, placing an item on the slate
affects the decision of which other items should be placed alongside it. In
this work, we propose a sequence-to-sequence model for ranking called
seq2slate. At each step, the model predicts the next `best' item to place on
the slate given the items already selected. The sequential nature of the model
allows complex dependencies between the items to be captured directly in a
flexible and scalable way. We show how to learn the model end-to-end from weak
supervision in the form of easily obtained click-through data. We further
demonstrate the usefulness of our approach in experiments on standard ranking
benchmarks as well as in a real-world recommendation system.
Estimation of Monge Matrices
Monge matrices and their permuted versions known as pre-Monge matrices
naturally appear in many domains across science and engineering. While the rich
structural properties of such matrices have long been leveraged for algorithmic
purposes, little is known about their impact on statistical estimation. In this
work, we propose to view this structure as a shape constraint and study the
problem of estimating a Monge matrix subject to additive random noise. More
specifically, we establish the minimax rates of estimation of Monge and
pre-Monge matrices. In the case of pre-Monge matrices, the minimax-optimal
least-squares estimator is not efficiently computable, and we propose two
efficient estimators and establish their rates of convergence. Our theoretical
findings are supported by numerical experiments.
Comment: 42 pages, 3 figures
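For readers unfamiliar with the term, here is a minimal sketch of a Monge check. It assumes the common convention that a matrix C is Monge when C[i][j] + C[i+1][j+1] <= C[i][j+1] + C[i+1][j] for every pair of adjacent rows and columns; the paper may use a different sign convention, and the function name `is_monge` is hypothetical.

```python
def is_monge(c):
    """Return True if the matrix c (a list of equal-length rows) is Monge.

    Assumed convention: c[i][j] + c[i+1][j+1] <= c[i][j+1] + c[i+1][j].
    Checking adjacent rows and columns is enough, because the general
    inequality follows by summing the adjacent ones.
    """
    rows = len(c)
    cols = len(c[0]) if rows else 0
    return all(
        c[i][j] + c[i + 1][j + 1] <= c[i][j + 1] + c[i + 1][j]
        for i in range(rows - 1)
        for j in range(cols - 1)
    )


monge = [[0, 1, 4], [1, 0, 1], [4, 1, 0]]      # entries (i - j) ** 2
permuted = [[1, 0, 1], [0, 1, 4], [4, 1, 0]]   # first two rows swapped

print(is_monge(monge))     # True
print(is_monge(permuted))  # False
```

The second matrix is a row permutation of a Monge matrix, which is the kind of input the abstract calls pre-Monge.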
A Variable Neighborhood MOEA/D for Multiobjective Test Task Scheduling Problem
The test task scheduling problem (TTSP) is a typical combinatorial optimization scheduling problem. This paper proposes a variable neighborhood MOEA/D (VNM) to solve the multiobjective TTSP. Two minimization objectives, the maximal completion time (makespan) and the mean workload, are considered together. To bring the obtained solutions closer to the real Pareto front, a variable neighborhood strategy is adopted, which keeps the crossover span reasonable. Additionally, because the search space of the TTSP is so large that many duplicate solutions and local optima exist, the Starting Mutation is applied to prevent solutions from becoming trapped in local optima. Using a Markov chain and its transition matrix, it is proved that the solutions obtained by VNM can converge to the global optimum. Experimental comparisons of VNM, MOEA/D, and CNSGA (chaotic nondominated sorting genetic algorithm) indicate that VNM performs better than MOEA/D and CNSGA in solving the TTSP. The results demonstrate that the proposed VNM algorithm is an efficient approach to solving the multiobjective TTSP.
Identifying Almost Sorted Permutations from TCP Buffer Dynamics
Associate to each sequence A of integers (intended to model packet IDs in a TCP/IP stream) a sequence M(A) of positive integers of the same length. The i'th entry of M(A) is the size (at time i) of the smallest buffer needed to hold out-of-order packets, where space is accounted for unreceived packets as well. Call two sequences A, B equivalent (written A ≡FB B) if M(A) = M(B). For a sequence of integers A, define SUS(A) to be the shuffled-up-sequences reordering measure, that is, the smallest possible number of classes in a partition of the original sequence into increasing subsequences. We prove the following result: any two permutations A, B of the same length with SUS(A), SUS(B) ≤ 3 such that A ≡FB B are identical. The result is no longer valid if we replace the upper bound 3 by 4. We also consider a similar problem for permutations with repeats. In this case the preimage is no longer unique, but we obtain a characterization of all the preimages of a given sequence, which in particular allows us to count them in polynomial time. The results were motivated by explaining the behavior of, and engineering, RESTORED, a receiver-oriented model of traffic we introduced and experimentally validated in earlier work.
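The SUS measure defined above can be computed greedily. The following is a minimal sketch, not the authors' implementation; the function name `sus` is an assumption made for illustration. It relies on the standard fact that placing each element on the increasing subsequence with the largest tail still smaller than it yields a partition with the fewest classes.

```python
from bisect import bisect_left


def sus(seq):
    """Smallest number of strictly increasing subsequences that partition seq.

    Hypothetical helper: the name and signature are illustrative only.
    """
    tails = []  # last elements of the subsequences, kept in ascending order
    for x in seq:
        # Index of the largest tail that is strictly smaller than x, or -1.
        i = bisect_left(tails, x) - 1
        if i >= 0:
            tails[i] = x        # extend that subsequence; order is preserved
        else:
            tails.insert(0, x)  # no tail is smaller: start a new subsequence
    return len(tails)


print(sus([1, 2, 3, 4]))  # 1 (already sorted)
print(sus([3, 1, 4, 2]))  # 2
print(sus([4, 3, 2, 1]))  # 4 (fully reversed)
```

For a permutation the value equals the length of its longest decreasing subsequence, so the condition SUS(A) ≤ 3 in the abstract restricts attention to permutations with no decreasing subsequence of length 4.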