Teaching Data Science
We describe an introductory data science course, entitled Introduction to
Data Science, offered at the University of Illinois at Urbana-Champaign. The
course introduced general programming concepts by using the Python programming
language with an emphasis on data preparation, processing, and presentation.
The course had no prerequisites, and students were not expected to have any
programming experience. This introductory course was designed to cover a wide
range of topics, from the nature of data, to storage, to visualization, to
probability and statistical analysis, to cloud and high performance computing,
without becoming overly focused on any one subject. We conclude this article
with a discussion of lessons learned and our plans to develop new data science
courses. Comment: 10 pages, 4 figures, International Conference on Computational
Science (ICCS 2016)
Counting minimal generator matrices
Given a particular convolutional code C, we wish to find all minimal generator matrices G(D) which represent that code. A standard form S(D) for a minimal matrix is defined, and then all standard forms for the code C are counted (this is equivalent to counting special pre-multiplication matrices P(D)). It is shown that all the minimal generator matrices G(D) are contained within the 'ordered row permutations' of these standard forms, and that all these permutations are distinct. Finally, the result is used to place a simple upper bound on the possible number of convolutional codes.
When is multidimensional screening a convex program?
A principal wishes to transact business with a multidimensional distribution
of agents whose preferences are known only in the aggregate. Assuming a twist
(= generalized Spence-Mirrlees single-crossing) hypothesis and that agents can
choose only pure strategies, we identify a structural condition on the
preference b(x,y) of agent type x for product type y -- and on the principal's
costs c(y) -- which is necessary and sufficient for reducing the profit
maximization problem faced by the principal to a convex program. This is a key
step toward making the principal's problem theoretically and computationally
tractable; in particular, it allows us to derive uniqueness and stability of
the principal's optimum strategy -- and similarly of the strategy maximizing
the expected welfare of the agents when the principal's profitability is
constrained. We call this condition non-negative cross-curvature: it is also
(i) necessary and sufficient to guarantee convexity of the set of b-convex
functions, (ii) invariant under reparametrization of agent and/or product types
by diffeomorphisms, and (iii) a strengthening of Ma, Trudinger and Wang's
necessary and sufficient condition (A3w) for continuity of the correspondence
between an exogenously prescribed distribution of agents and of products. We
derive the persistence of economic effects such as the desirability for a
monopoly to establish prices so high they effectively exclude a positive
fraction of its potential customers, in nearly the full range of non-negatively
cross-curved models. Comment: 23 pages
Optimal transportation, topology and uniqueness
The Monge-Kantorovich transportation problem involves optimizing with respect
to a given cost function. Uniqueness is a fundamental open question about
which little is known when the cost function is smooth and the landscapes
containing the goods to be transported possess (non-trivial) topology. This
question turns out to be closely linked to a delicate problem (# 111) of
Birkhoff [14]: give a necessary and sufficient condition on the support of a
joint probability to guarantee extremality among all measures which share its
marginals. Fifty years of progress on Birkhoff's question culminate in Hestir
and Williams' necessary condition, which is nearly sufficient for extremality;
we slightly relax their subtle measurability hypotheses separating necessity
from sufficiency, yet demonstrate by example that sufficiency certainly
requires some measurability. Their condition amounts to the vanishing
of the measure \gamma outside a countable alternating sequence of graphs and
antigraphs in which no two graphs (or two antigraphs) have domains that
overlap, and where the domain of each graph / antigraph in the sequence
contains the range of the succeeding antigraph (respectively, graph). Such
sequences are called numbered limb systems. We then explain how this
characterization can be used to resolve the uniqueness of Kantorovich solutions
for optimal transportation on a manifold with the topology of the sphere. Comment: 36 pages, 6 figures
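As a toy illustration of the uniqueness question (not the paper's construction), the sketch below solves a tiny discrete transportation problem by brute force. It assumes equally many unit-mass sources and targets, so that an optimal plan can be taken to be a permutation, and it counts how many permutations attain the minimum. The function name `optimal_assignments` and both example costs are hypothetical choices made for this illustration.

```python
from itertools import permutations

def optimal_assignments(sources, targets, cost, tol=1e-12):
    """Return the minimum total cost and every permutation attaining it.

    Brute force over all assignments; only sensible for a handful of points.
    """
    best, best_perms = None, []
    for perm in permutations(range(len(targets))):
        total = sum(cost(sources[i], targets[j]) for i, j in enumerate(perm))
        if best is None or total < best - tol:
            best, best_perms = total, [perm]
        elif abs(total - best) <= tol:
            best_perms.append(perm)
    return best, best_perms

# Squared distance on the line: the monotone matching is the only optimum.
line_cost = lambda x, y: (x - y) ** 2
line_best, line_perms = optimal_assignments(
    [0.0, 1.0, 2.0], [0.5, 1.5, 2.5], line_cost
)

# Squared arc length on a circle of circumference 4 (assumed toy geometry):
# sources at 0 and 2, targets at 1 and 3, so every source-target pair is
# at arc distance 1 and both matchings are optimal.
circle_cost = lambda x, y: min(abs(x - y), 4 - abs(x - y)) ** 2
circle_best, circle_perms = optimal_assignments([0, 2], [1, 3], circle_cost)

print(line_best, line_perms)
print(circle_best, circle_perms)
```

On the line the minimizer is unique, while on the circle the symmetric configuration admits two optimal matchings, which hints at why topology makes uniqueness delicate.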
Regularity of optimal transport maps on multiple products of spheres
This article addresses regularity of optimal transport maps for cost="squared
distance" on Riemannian manifolds that are products of arbitrarily many round
spheres with arbitrary sizes and dimensions. Such manifolds are known to be
non-negatively cross-curved [KM2]. Under boundedness and non-vanishing
assumptions on the transferred source and target densities, we show that optimal
maps stay away from the cut-locus (where the cost exhibits singularity), and
obtain injectivity and continuity of optimal maps. This together with the
result of Liu, Trudinger and Wang [LTW] also implies higher regularity
(C^{1,\alpha}/C^\infty) of optimal maps for smoother (C^\alpha/C^\infty)
densities. These are the first global regularity results which we are aware of
concerning optimal maps on non-flat Riemannian manifolds which possess some
vanishing sectional curvatures. Moreover, such product manifolds have potential
relevance in statistics (see [S]) and in statistical mechanics (where the state
of a system consisting of many spins is classically modeled by a point in the
phase space obtained by taking many products of spheres). For the proof we
apply and extend the method developed in [FKM1], where we showed injectivity
and continuity of optimal maps on domains in R^n for smooth non-negatively
cross-curved cost. The major obstacle in the present paper is to deal with the
non-trivial cut-locus and the presence of flat directions. Comment: 35 pages, 4 figures
Subspace Methods for Data Attack on State Estimation: A Data Driven Approach
Data attacks on state estimation modify part of system measurements such that
the tampered measurements cause incorrect system state estimates. Attack
techniques proposed in the literature often require detailed knowledge of
system parameters. Such information is difficult to acquire in practice. The
subspace methods presented in this paper, on the other hand, learn the system
operating subspace from measurements and launch attacks accordingly. Conditions
for the existence of an unobservable subspace attack are obtained under the
full and partial measurement models. Using the estimated system subspace, two
attack strategies are presented. The first strategy aims to affect the system
state directly by hiding the attack vector in the system subspace. The second
strategy misleads the bad data detection mechanism so that data not under
attack are removed. The performance of these attacks is evaluated using the IEEE
14-bus network and the IEEE 118-bus network. Comment: 12 pages
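The first strategy described above can be sketched roughly as follows. This is a minimal illustration under assumed conditions, not the authors' algorithm: it assumes a linear measurement model z = Hx + noise, estimates the operating subspace with a plain SVD of recorded measurements, and checks the attack against a simple least-squares residual detector. The matrix `H`, the dimensions, the noise level, and the helper names are all hypothetical.

```python
import numpy as np

def estimate_subspace(Z, dim):
    """Orthonormal basis for the dominant `dim`-dimensional subspace of Z.

    Z has one row per sensor and one column per time sample.
    """
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :dim]

def residual_norm(H, z):
    """Norm of the least-squares residual, as used by a basic bad data test."""
    x_hat, *_ = np.linalg.lstsq(H, z, rcond=None)
    return np.linalg.norm(z - H @ x_hat)

rng = np.random.default_rng(0)
m, n, T = 8, 3, 200                      # sensors, states, samples (assumed)
H = rng.standard_normal((m, n))          # hypothetical model, hidden from the attacker
X = rng.standard_normal((n, T))          # hypothetical state trajectory
Z = H @ X + 1e-3 * rng.standard_normal((m, T))

# The attacker sees only Z and learns the subspace from it.
U = estimate_subspace(Z, n)
attack = U @ rng.standard_normal(n)      # attack vector hidden in the subspace

z = Z[:, 0]
r_clean = residual_norm(H, z)
r_attacked = residual_norm(H, z + attack)
r_random = residual_norm(H, z + rng.standard_normal(m))  # naive attack, for contrast

print(r_clean, r_attacked, r_random)
```

In this toy setting the subspace attack leaves the residual close to its clean value, whereas a random perturbation of similar size inflates it and would likely be flagged.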