Modified Erdős-Ginzburg-Ziv constants for
Let $G$ be a finite abelian group written additively, and let $m$ be a
multiple of its exponent. The modified Erdős-Ginzburg-Ziv constant
is the smallest integer $s$ such that every zero-sum
sequence of length $s$ over $G$ has a zero-sum subsequence of length $m$. We
find exact values of this constant for .
Comment: 3 pages
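The definition can be checked by brute force in tiny cases. The sketch below is an illustration only, not the paper's method. It assumes a cyclic group Z/nZ, treats sequences as multisets, and reads "length s" as exactly s. The helper names are hypothetical.

```python
from itertools import combinations, combinations_with_replacement


def has_zero_sum_subsequence(seq, m, n):
    """True if some m-term subsequence of seq sums to 0 in Z/nZ."""
    return any(sum(sub) % n == 0 for sub in combinations(seq, m))


def every_zero_sum_sequence_has_one(n, m, s):
    """Brute force over all length-s sequences (as multisets) in Z/nZ:
    does every zero-sum sequence contain a zero-sum subsequence of length m?"""
    for seq in combinations_with_replacement(range(n), s):
        if sum(seq) % n == 0 and not has_zero_sum_subsequence(seq, m, n):
            return False  # found a counterexample
    return True


# In Z/3Z with m = 3 (the exponent), (1, 1, 2, 2) is a zero-sum sequence
# of length 4 with no zero-sum subsequence of length 3.
print(has_zero_sum_subsequence((1, 1, 2, 2), 3, 3))  # False
print(every_zero_sum_sequence_has_one(3, 3, 4))       # False
```

The search is exponential in s, so it only serves to sanity-check small cases by hand.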
Data Discovery and Anomaly Detection using Atypicality.
Ph.D. Thesis. University of Hawaiʻi at Mānoa 2017
Random curves, scaling limits and Loewner evolutions
61 pages, 26 figures
In this paper, we provide a framework of estimates for describing 2D scaling limits by Schramm's SLE curves. In particular, we show that a weak estimate on the probability of an annulus crossing implies that a random curve arising from a statistical mechanics model will have scaling limits and that those will be well described by Loewner evolutions with random driving forces. Interestingly, our proofs indicate that the existence of a nondegenerate observable with a conformally invariant scaling limit seems sufficient to deduce the required condition. Our paper serves as an important step in establishing the convergence of Ising and FK Ising interfaces to SLE curves; moreover, the setup is adapted to branching interface trees, conjecturally describing the full interface picture by a collection of branching SLEs.
Peer reviewed
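The abstract describes curves by Loewner evolutions with random driving forces. The standard chordal Loewner equation in the upper half-plane $\mathbb{H}$ is given here as background only; it is not the paper's framework of estimates. A continuous real driving function $W_t$ produces conformal maps $g_t$. Choosing $W_t = \sqrt{\kappa}\, B_t$, with $B_t$ a standard one-dimensional Brownian motion, gives chordal SLE$_\kappa$:

```latex
\partial_t g_t(z) = \frac{2}{g_t(z) - W_t}, \qquad g_0(z) = z, \qquad z \in \mathbb{H},
\qquad W_t = \sqrt{\kappa}\, B_t \quad (\text{chordal } \mathrm{SLE}_\kappa).
```

For each $z$, the solution exists until $g_t(z)$ hits $W_t$. The hull $K_t$ is the set of points swallowed by time $t$, and $g_t$ maps $\mathbb{H} \setminus K_t$ conformally onto $\mathbb{H}$.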
Large margin methods for partner specific prediction of interfaces in protein complexes
2014 Spring.
The study of protein interfaces and binding sites is a very important domain of research in bioinformatics. Information about the interfaces between proteins can be used not only in understanding protein function but also directly in drug design and protein engineering. However, the experimental determination of protein interfaces is cumbersome, expensive, and in some cases not possible with today's technology. As a consequence, the computational prediction of protein interfaces from sequence and structure has emerged as a very active research area. A number of machine learning based techniques have been proposed to solve this problem. However, the prediction accuracy of most such schemes is very low. In this dissertation we present large-margin classification approaches designed to directly model different aspects of protein complex formation as well as the characteristics of available data. Most existing machine learning techniques for this task are partner-independent in nature, i.e., they ignore the fact that the propensity of a protein to bind to another protein depends upon characteristics of residues in both proteins. We have developed a pairwise support vector machine classifier called PAIRpred to predict protein interfaces in a partner-specific fashion. Due to its more detailed model of the problem, PAIRpred offers state-of-the-art accuracy in predicting both binding sites at the protein level and inter-protein residue contacts at the complex level. PAIRpred uses sequence and structure conservation, local structural similarity and surface geometry, residue solvent exposure, and template-based features derived from the unbound structures of the proteins forming a protein complex.
Using transductive and semi-supervised learning models, we have investigated the impact of explicitly modeling the interdependencies between residues that are imposed by the overall structure of a protein during the formation of a protein complex. We also present a novel multiple instance learning scheme called MI-1 that explicitly models imprecision in sequence-level annotations of binding sites in proteins that bind calmodulin, achieving state-of-the-art prediction accuracy for this task.
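A pairwise SVM needs a kernel that compares one residue pair with another. The abstract does not give PAIRpred's kernel or features. The sketch below therefore assumes a standard symmetric pairwise-kernel construction over a Gaussian (RBF) base kernel, with plain feature vectors standing in for residue features. All names and the `gamma` parameter are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[Vector, Vector]


def rbf(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    """Gaussian (RBF) base kernel between two residue feature vectors."""
    return math.exp(-gamma * sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def pairwise_kernel(p: Pair, q: Pair, gamma: float = 1.0) -> float:
    """Symmetric pairwise kernel between residue pairs p = (a, b) and q = (c, d):
    K(p, q) = k(a, c) k(b, d) + k(a, d) k(b, c)."""
    a, b = p
    c, d = q
    return rbf(a, c, gamma) * rbf(b, d, gamma) + rbf(a, d, gamma) * rbf(b, c, gamma)


def gram_matrix(pairs: Sequence[Pair], gamma: float = 1.0):
    """Kernel matrix over a list of residue pairs."""
    return [[pairwise_kernel(p, q, gamma) for q in pairs] for p in pairs]
```

A matrix like this can be passed to an SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`. The symmetric form makes the score independent of the order within each pair. That is a modeling choice; a partner-specific method may prefer an ordered variant.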
The lower tail of $q$-pushTASEP
We study $q$-pushTASEP, a discrete time interacting particle system whose
distribution is related to the $q$-Whittaker measure. We prove a uniform in $q$
lower tail bound on the fluctuation scale for the location of the
right-most particle at time when started from step initial condition. Our
argument relies on a map from the $q$-Whittaker measure to a model of periodic
last passage percolation (LPP) with geometric weights in an infinite strip that
was recently established in [arXiv:2106.11922]. By a path routing argument we
bound the passage time in the periodic environment in terms of an infinite sum
of independent passage times for standard LPP on squares with
geometric weights whose parameters decay geometrically. To prove our tail bound
result we combine this reduction with a concentration inequality, and a crucial
new technical result -- lower tail bounds on last passage times
uniformly over all and all the geometric parameters in
. This technical result uses Widom's trick [arXiv:math/0108008] and an
adaptation of an idea of Ledoux introduced for the GUE [Led05a] to reduce the
uniform lower tail bound to uniform asymptotics for very high moments, up to
order , of the Meixner ensemble. This we accomplish by first obtaining sharp
uniform estimates for factorial moments of the Meixner ensemble from an
explicit combinatorial formula of Ledoux [Led05b], and translating them to
polynomial bounds via a further careful analysis and delicate cancellation.
Comment: 47 pages, 3 figures. Reorganization and minor corrections
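Standard last passage percolation (LPP) with geometric weights is the basic object the argument reduces to. Its passage time is the maximum total weight over up-right lattice paths, computed by a dynamic program. This is a minimal sketch of standard LPP on a single square, not the periodic model or the paper's estimates. The parametrization P(W = k) = (1 - q) q^k is an assumed convention, and the helper names are hypothetical.

```python
import random


def sample_geometric(q: float, rng: random.Random) -> int:
    """Sample W with P(W = k) = (1 - q) * q**k for k = 0, 1, 2, ... (assumed convention)."""
    k = 0
    while rng.random() < q:
        k += 1
    return k


def last_passage_time(weights) -> int:
    """Maximum total weight over lattice paths from cell (0, 0) to the opposite
    corner using only +i and +j steps (dynamic programming).
    Weights are assumed non-negative, so 0 is a safe default at the boundary."""
    rows, cols = len(weights), len(weights[0])
    G = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            prev = max(G[i - 1][j] if i > 0 else 0, G[i][j - 1] if j > 0 else 0)
            G[i][j] = weights[i][j] + prev
    return G[-1][-1]


rng = random.Random(0)
n, q = 5, 0.5
W = [[sample_geometric(q, rng) for _ in range(n)] for _ in range(n)]
print(last_passage_time(W))
```

Repeating the sampling many times gives an empirical look at the lower tail of the passage time for a fixed parameter.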
Asymptotic optimality of speed-aware JSQ for heterogeneous service systems
The Join-the-Shortest-Queue (JSQ) load-balancing scheme is known to minimise the average delay of jobs in homogeneous systems consisting of identical servers. However, it performs poorly in heterogeneous systems where servers have different processing rates. Finding a delay-optimal scheme remains an open problem for heterogeneous systems. In this paper, we consider a speed-aware version of the JSQ scheme for heterogeneous systems and show that it achieves delay optimality in the fluid limit. One of the key issues in establishing this optimality result for heterogeneous systems is to show that the sequence of steady-state distributions indexed by the system size is tight in an appropriately defined space. The usual technique for showing tightness, coupling with a suitably defined dominant system, does not work for heterogeneous systems. To prove tightness, we devise a new technique that uses the drift of exponential Lyapunov functions. Using the non-negativity of the drift, we show that the stationary queue length distribution has an exponentially decaying tail, a fact we use to prove tightness. Another technical difficulty arises from the complexity of the underlying state space and the separation of two time scales in the fluid limit. Due to these factors, the fluid limit turns out to be a function of the invariant distribution of a multi-dimensional Markov chain which is hard to characterise. By using some properties of this invariant distribution and the monotonicity of the system, we show that the fluid limit has a unique and globally attractive fixed point.
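The difference between plain and speed-aware dispatching comes down to how a job picks a server. The abstract does not spell out the speed-aware rule. The sketch below assumes one simple reading: join the shortest queue, and break ties in favor of the fastest server. Treat it as a hypothetical illustration, not necessarily the paper's exact policy.

```python
from typing import Sequence


def jsq(queue_lengths: Sequence[int]) -> int:
    """Plain JSQ: shortest queue, ties broken by lowest server index."""
    return min(range(len(queue_lengths)), key=lambda i: queue_lengths[i])


def speed_aware_jsq(queue_lengths: Sequence[int], speeds: Sequence[float]) -> int:
    """Hypothetical speed-aware rule: shortest queue, ties broken toward the fastest server."""
    return min(range(len(queue_lengths)), key=lambda i: (queue_lengths[i], -speeds[i]))


queues = [2, 1, 1]
speeds = [1.0, 1.0, 3.0]
print(jsq(queues), speed_aware_jsq(queues, speeds))  # 1 2
```

With a speed-blind tie-break, plain JSQ can send work to a slow server while an equally loaded fast server is available. The speed-aware key removes that case.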
On conjugate loci and cut loci of compact symmetric spaces I
Let $(M, g)$ be a compact connected Riemannian manifold. Fix a point $o$ of $M$ and denote by $T_o(M)$ the tangent space of $M$ at $o$. Let $\mathrm{Exp}\colon T_o(M) \to M$ be the exponential map of $(M, g)$ at $o$. A tangent vector $X \in T_o(M)$ is called a tangential conjugate point of $(M, g)$, ..
Message passing algorithms - methods and applications
Algorithms on graphs are used extensively in many applications and research areas, including machine learning, artificial intelligence, communications, image processing, state tracking, sensor networks, sensor fusion, distributed cooperative estimation, and distributed computation. Among the algorithms that pass messages over the connections in a graph, this dissertation considers belief propagation and gossip consensus algorithms.
We begin by considering the marginalization problem on factor graphs, which is often solved or approximated with Sum-Product belief propagation (BP) over the edges of the factor graph. For the case of sensor networks, where the conservation of energy is of critical importance and communication overhead can quickly drain this valuable resource, we present techniques for specifically addressing the needs of this low power scenario. We create a number of alternatives to Sum-Product BP. The first of these is a generalization of Stochastic BP with reduced setup time. We then present Projected BP, where a subset of elements from each message is transmitted between nodes, and computational savings are realized in proportion to the reduction in size of the transmitted messages. Zoom BP is a derivative of Projected BP that focuses particularly on utilizing low bandwidth discrete channels. We give the results of experiments that show the practical advantages of our alternatives to Sum-Product BP.
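Sum-Product BP computes marginals by passing messages along the edges. The sketch below shows plain Sum-Product on the simplest tree, a chain of discrete variables with one shared pairwise factor, and checks it against brute-force enumeration. It does not implement the dissertation's Stochastic, Projected, or Zoom BP variants. The chain structure and all names are assumptions for illustration.

```python
from itertools import product


def chain_marginals(unary, pairwise):
    """Sum-product BP on a chain x_0 - x_1 - ... - x_{n-1}.

    unary[i][x] is the local factor of variable i; pairwise[a][b] is the factor
    shared by every edge (x_i = a, x_{i+1} = b). Exact, because a chain is a tree.
    """
    n, K = len(unary), len(unary[0])
    # fwd[i][b]: message arriving at variable i from the left
    fwd = [[1.0] * K for _ in range(n)]
    for i in range(1, n):
        fwd[i] = [sum(fwd[i - 1][a] * unary[i - 1][a] * pairwise[a][b] for a in range(K))
                  for b in range(K)]
    # bwd[i][a]: message arriving at variable i from the right
    bwd = [[1.0] * K for _ in range(n)]
    for i in range(n - 2, -1, -1):
        bwd[i] = [sum(pairwise[a][b] * unary[i + 1][b] * bwd[i + 1][b] for b in range(K))
                  for a in range(K)]
    beliefs = []
    for i in range(n):
        belief = [fwd[i][x] * unary[i][x] * bwd[i][x] for x in range(K)]
        z = sum(belief)
        beliefs.append([v / z for v in belief])
    return beliefs


def brute_force_marginals(unary, pairwise):
    """Reference answer by enumerating all K**n joint assignments."""
    n, K = len(unary), len(unary[0])
    marg = [[0.0] * K for _ in range(n)]
    for xs in product(range(K), repeat=n):
        w = 1.0
        for i, x in enumerate(xs):
            w *= unary[i][x]
        for i in range(n - 1):
            w *= pairwise[xs[i]][xs[i + 1]]
        for i, x in enumerate(xs):
            marg[i][x] += w
    return [[v / sum(row) for v in row] for row in marg]


unary = [[0.6, 0.4], [0.3, 0.7], [0.5, 0.5]]
pairwise = [[0.9, 0.1], [0.1, 0.9]]
print(chain_marginals(unary, pairwise))
```

Each message here is a full length-K vector. The dissertation's low-power variants reduce what is transmitted, for example by sending only a subset of message elements in Projected BP.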
We then proceed with an application of Sum-Product BP in sequential investment. We combine various insights from universal portfolios research in order to construct more sophisticated algorithms that take into account transaction costs. In particular, we use the insights of Blum and Kalai's transaction costs algorithm to take these costs into account in Cover and Ordentlich's side information portfolio and Kozat and Singer's switching portfolio. This involves carefully designing a set of causal portfolio strategies and computing a convex combination of these according to a suitably chosen distribution. Universal (sublinear regret) performance bounds for each of these portfolios show that the algorithms asymptotically achieve the wealth of the best strategy from the corresponding portfolio strategy set, to first order in the exponent. The Sum-Product algorithm on factor graph representations of the universal investment algorithms provides computationally tractable approximations to the investment strategies. Finally, we present results of simulations of our algorithms and compare them to other portfolios.
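A convex combination of causal strategies is easiest to see in Cover's basic universal portfolio. The sketch below assumes two assets and a uniform prior over a finite grid of constant-rebalanced portfolios (CRPs). It ignores the transaction costs and side information that the dissertation adds, and it uses direct enumeration rather than Sum-Product. The toy market and all names are hypothetical.

```python
def crp_wealth(b, price_relatives):
    """Wealth of the constant-rebalanced portfolio keeping fraction b in asset 0."""
    wealth = 1.0
    for x0, x1 in price_relatives:
        wealth *= b * x0 + (1 - b) * x1
    return wealth


def universal_weight(history, grid=101):
    """Universal portfolio for two assets with a uniform prior on a grid of CRPs:
    the next fraction in asset 0 is the wealth-weighted average of b."""
    bs = [i / (grid - 1) for i in range(grid)]
    ws = [crp_wealth(b, history) for b in bs]
    return sum(b * w for b, w in zip(bs, ws)) / sum(ws)


def universal_wealth(price_relatives, grid=101):
    """Run the universal portfolio causally (no transaction costs)."""
    wealth = 1.0
    for t, (x0, x1) in enumerate(price_relatives):
        b = universal_weight(price_relatives[:t], grid)
        wealth *= b * x0 + (1 - b) * x1
    return wealth


# Assumed toy market: asset 0 alternately doubles and halves; asset 1 is cash.
market = [(2.0, 1.0), (0.5, 1.0)] * 5
print(universal_wealth(market))
```

With a finite uniform prior, the universal portfolio's wealth equals the average wealth of the grid CRPs. This identity is what lets it track the best CRP to first order in the exponent.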
We then turn our attention to gossip consensus and distributed estimation algorithms. Specifically, we consider the problem of estimating the parameters in a model of an agent's observations when the population as a whole is known to be partitioned into a number of subpopulations, each with model parameters common to its member agents. We develop a method for determining the beneficial communication links in the network: each agent maintains a non-cooperative parameter estimate, and the distances between this estimate and those of its neighbors determine a time-varying connectivity. We also study the expected squared estimation error of our algorithm, showing that the estimates are asymptotically as good as centralized estimation, and we study the short-term error convergence behavior.
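The link-selection idea can be pictured as a distance test followed by local averaging. The sketch below is a simplified, hypothetical version. It assumes scalar parameters and keeps a link when two agents' non-cooperative estimates differ by less than a threshold `tau`; the dissertation's actual criterion, its time-varying schedule, and its error analysis are not reproduced.

```python
from statistics import fmean


def beneficial_links(local_estimates, neighbors, tau):
    """Hypothetical rule: keep the link i-j only if the two agents'
    non-cooperative estimates differ by less than tau."""
    return {i: [j for j in nbrs if abs(local_estimates[i] - local_estimates[j]) < tau]
            for i, nbrs in neighbors.items()}


def consensus_step(estimates, links):
    """One averaging step: each agent averages its own estimate with those
    of the neighbors it kept."""
    return {i: fmean([estimates[i]] + [estimates[j] for j in kept])
            for i, kept in links.items()}


# Assumed toy network: a path 0-1-2-3 whose two halves have different true parameters.
local = {0: 0.1, 1: -0.1, 2: 10.2, 3: 9.8}
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
links = beneficial_links(local, neighbors, tau=1.0)
print(links)  # {0: [1], 1: [0], 2: [3], 3: [2]}
print(consensus_step(local, links))
```

Dropping the 1-2 link keeps the two subpopulations from averaging toward a single meaningless value near 5.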
Finally, we examine the metrics used to guide the design of data converters in the setting of digital communications. The usual analog-to-digital converter (ADC) performance metrics, namely effective number of bits (ENOB), total harmonic distortion (THD), signal-to-noise-and-distortion ratio (SNDR), and spurious-free dynamic range (SFDR), all focus on the faithful reproduction of observed waveforms, which is not of fundamental concern if the data converter is to be used in a digital communications system. Therefore, we propose information-centric, rather than waveform-centric, metrics that are better aligned with the goal of communications. We provide computational methods for calculating the values of these metrics, some of which are derived from Sum-Product BP or related algorithms. We also propose Statistics Gathering Converters (SGCs), which represent a change in perspective on data conversion for communications applications, away from signal representation and towards the collection of relevant statistics for decision making and detection. We show how to develop algorithms for the detection of transmitted data when the transmitted signal is received by an SGC. Lastly, we provide evidence for the benefits of using system-level metrics and statistics gathering converters in communications applications.
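ENOB shows what "waveform-centric" means in practice: it converts a measured SNDR (also called SINAD) into an equivalent bit count. The sketch below uses the conventional relation ENOB = (SNDR - 1.76) / 6.02. That relation assumes a full-scale sine-wave input and uniform quantization noise. The information-centric metrics proposed in the dissertation are not shown, and the function names are hypothetical.

```python
def enob_from_sinad(sinad_db: float) -> float:
    """Effective number of bits from a measured SNDR/SINAD in dB
    (full-scale sine-wave convention)."""
    return (sinad_db - 1.76) / 6.02


def ideal_sqnr_db(bits: int) -> float:
    """Quantization-limited SNR of an ideal N-bit converter driven by a full-scale sine."""
    return 6.02 * bits + 1.76


print(enob_from_sinad(ideal_sqnr_db(12)))  # recovers 12 bits, up to rounding
print(round(enob_from_sinad(62.0), 2))     # hypothetical 62 dB SINAD: 10.01
```

The number summarizes how closely the output waveform tracks the input. It does not say how much information about the transmitted symbols survives conversion, which is the gap the proposed metrics target.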
Universality for cokernels of random matrix products
For random integer matrices with independent entries, we study the
distribution of the cokernel of their
product. We show that this distribution converges to a universal one as for a general class of matrix entry distributions, and more generally
show universal limits for the joint distribution of
. Furthermore, we characterize the universal distributions arising
as marginals of a natural generalization of the Cohen-Lenstra measure to
sequences of abelian groups with maps between them, which weights sequences
inversely proportionally to their number of automorphisms. The proofs develop
an extension of the moment method of Wood to joint moments of multiple groups,
and rely also on the connection to Hall-Littlewood polynomials and symmetric
function identities. As a corollary we obtain an explicit universal
distribution for coranks of random matrix products over as the
matrix size tends to infinity.
Comment: 50 pages. v2: some references added, minor errors and typos fixed in
response to referee comments, this version to appear in Advances in
Mathematics
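For a single small square integer matrix, the cokernel can be computed directly. It is determined by the invariant factors, obtained from the determinantal divisors (gcds of k x k minors). The sketch below is this elementary computation on fixed example matrices. It is not the moment method used in the paper, and it does not sample random entries. All helper names are hypothetical.

```python
from itertools import combinations
from math import gcd


def det(M):
    """Integer determinant by cofactor expansion (adequate for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def matmul(A, B):
    """Product of two integer matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def invariant_factors(A):
    """Invariant factors s_1 | s_2 | ... of a square integer matrix, from the
    determinantal divisors d_k = gcd of all k x k minors (s_k = d_k / d_{k-1}).
    The cokernel Z^n / A Z^n is the direct sum of Z/s_k Z (s_k = 0 gives a copy of Z)."""
    n = len(A)
    factors, prev = [], 1
    for k in range(1, n + 1):
        d = 0
        for rows in combinations(range(n), k):
            for cols in combinations(range(n), k):
                d = gcd(d, det([[A[r][c] for c in cols] for r in rows]))
        if d == 0:  # rank < k: the remaining invariant factors are all 0
            factors.extend([0] * (n - k + 1))
            break
        factors.append(d // prev)
        prev = d
    return factors


A1 = [[2, 0], [0, 1]]
A2 = [[1, 0], [0, 3]]
print(invariant_factors(matmul(A2, A1)))  # [1, 6]: the cokernel is Z/6Z
```

Minor enumeration grows combinatorially, so larger experiments would use a Smith normal form routine instead.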