Fast Deterministic Selection
The Median of Medians (also known as BFPRT) algorithm, although a landmark
theoretical achievement, is seldom used in practice because it and its variants
are slower than simple approaches based on sampling. The main contribution of
this paper is a fast linear-time deterministic selection algorithm
QuickselectAdaptive based on a refined definition of MedianOfMedians. The
algorithm's performance brings deterministic selection---along with its
desirable properties of reproducible runs, predictable run times, and immunity
to pathological inputs---into the range of practicality. We demonstrate results
on independent and identically distributed random inputs and on
normally-distributed inputs. Measurements show that QuickselectAdaptive is
faster than state-of-the-art baselines.
Comment: Pre-publication draft
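For orientation, the classic Median of Medians (BFPRT) scheme that the abstract refers to can be sketched as below. This is a minimal illustration of the textbook algorithm with groups of five, not the paper's QuickselectAdaptive; the function name `median_of_medians_select` and the list-based partitioning are assumptions made for readability rather than speed.

```python
def median_of_medians_select(items, k):
    """Return the k-th smallest element (0-based) of items, deterministically.

    Textbook BFPRT sketch (hypothetical helper, not QuickselectAdaptive).
    """
    values = list(items)
    if not 0 <= k < len(values):
        raise IndexError("k is out of range")

    while True:
        # Small inputs are solved directly by sorting.
        if len(values) <= 5:
            return sorted(values)[k]

        # Take the median of each group of at most five elements.
        medians = []
        for start in range(0, len(values), 5):
            group = sorted(values[start:start + 5])
            medians.append(group[len(group) // 2])

        # The pivot is the median of those medians, found recursively.
        pivot = median_of_medians_select(medians, len(medians) // 2)

        # Three-way partition around the pivot.
        lower = [v for v in values if v < pivot]
        higher = [v for v in values if v > pivot]
        equal_count = len(values) - len(lower) - len(higher)

        if k < len(lower):
            values = lower
        elif k < len(lower) + equal_count:
            return pivot
        else:
            k -= len(lower) + equal_count
            values = higher
```

Calling `median_of_medians_select(data, len(data) // 2)` returns a median of `data`. The pivot choice guarantees a constant fraction of elements is discarded at every step, which is what gives the worst-case linear bound; the extra passes over the data are also why this plain version tends to be slow in practice.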
Statistical analysis of simple repeats in the human genome
The human genome contains repetitive DNA at different levels of sequence
length, number and dispersion. Highly repetitive DNA is particularly rich in
homo- and di-nucleotide repeats, while middle repetitive DNA is rich in
families of interspersed, mobile elements hundreds of base pairs (bp) long,
among them the Alu families. A link between homo- and di-polymeric tracts and
mobile elements has been recently highlighted. In particular, the mobility of
Alu repeats, which form 10% of the human genome, has been correlated with the
length of poly(A) tracts located at one end of the Alu. These tracts have a
rigid and non-bendable structure and have an inhibitory effect on nucleosomes,
which normally compact the DNA. We performed a statistical analysis of the
genome-wide distribution of lengths and inter-tract separations of poly(X) and
poly(XY) tracts in the human genome. Our study shows that in humans the length
distributions of these sequences reflect the dynamics of their expansion and
DNA replication. By means of general tools from linguistics, we show that the
latter play the role of highly-significant content-bearing terms in the DNA
text. Furthermore, we find that such tracts are positioned in a non-random
fashion, with an apparent periodicity of 150 bases. This allows us to extend
the link between repetitive, highly mobile elements such as Alus and
low-complexity words in human DNA. More precisely, we show that Alus are
sources of poly(X) tracts, which in turn subtly affect the combination
and diversification of gene expression and the fixation of multigene families.
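The kind of measurement described above, tract lengths and inter-tract separations for poly(X) runs, can be illustrated with a small sketch. It is only an assumption about how such a scan might look, not the authors' pipeline: the function names and the `min_length` threshold are hypothetical, and poly(XY) tracts are not handled.

```python
from itertools import groupby


def homopolymer_tracts(sequence, min_length=3):
    """Return (start, base, length) for each run of one repeated base.

    min_length is a hypothetical cutoff chosen for illustration only.
    """
    tracts = []
    position = 0
    for base, run in groupby(sequence.upper()):
        length = sum(1 for _ in run)
        if length >= min_length:
            tracts.append((position, base, length))
        position += length
    return tracts


def inter_tract_separations(tracts):
    """Return the gap, in bases, between consecutive tracts."""
    return [
        following[0] - (current[0] + current[2])
        for current, following in zip(tracts, tracts[1:])
    ]


tracts = homopolymer_tracts("AAAACGTTTTTGCAAA")
lengths = [length for _, _, length in tracts]
separations = inter_tract_separations(tracts)
```

Histograms of `lengths` and `separations` collected over a whole genome would be the raw material for the length and spacing distributions the abstract discusses.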
Algorithmic construction of Chevalley bases
We present a new algorithm for constructing a Chevalley basis for any
Chevalley Lie algebra over a finite field. This is a necessary component for
some constructive recognition algorithms of exceptional quasisimple groups of
Lie type. When applied to a simple Chevalley Lie algebra in characteristic at
least 5, our algorithm has complexity involving the 7th power of the Lie rank,
which is likely to be close to best possible.