New probabilistic interest measures for association rules
Mining association rules is an important technique for discovering meaningful
patterns in transaction databases. Many different measures of interestingness
have been proposed for association rules. However, these measures fail to take
the probabilistic properties of the mined data into account. In this paper, we
start with presenting a simple probabilistic framework for transaction data
which can be used to simulate transaction data when no associations are
present. We use such data and a real-world database from a grocery outlet to
explore the behavior of confidence and lift, two popular interest measures used
for rule mining. The results show that confidence is systematically influenced
by the frequency of the items in the left-hand side of rules and that lift
performs poorly at filtering random noise in transaction data. Based on the
probabilistic framework we develop two new interest measures, hyper-lift and
hyper-confidence, which can be used to filter or order mined association rules.
The new measures show significantly better performance than lift for
applications where spurious rules are problematic.
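As a concrete reference for the two baseline measures discussed above, confidence and lift can be computed directly from transaction data. The following Python sketch is not the paper's implementation (which targets R), and the item names and tiny database are made up for illustration:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, lhs, rhs):
    """Estimated P(rhs | lhs): support of the union over support of lhs."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)

def lift(transactions, lhs, rhs):
    """Observed co-occurrence relative to what independence of lhs and rhs predicts."""
    return (support(transactions, set(lhs) | set(rhs))
            / (support(transactions, lhs) * support(transactions, rhs)))

# Tiny illustrative database (item names are made up):
transactions = [frozenset(t) for t in [
    {"milk", "bread"}, {"milk", "butter"}, {"bread", "butter"},
    {"milk", "bread", "butter"}, {"bread"},
]]
# confidence(transactions, {"milk"}, {"bread"}) -> 2/3
# lift(transactions, {"milk"}, {"bread"}) -> 0.4 / (0.6 * 0.8), slightly below 1
```

A lift below 1 here means "milk" and "bread" co-occur slightly less often than independence would predict; the paper's point is that on sparse data such values fluctuate heavily even when no real association exists.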
Computing Real Roots of Real Polynomials
Computing the roots of a univariate polynomial is a fundamental and
long-studied problem of computational algebra with applications in mathematics,
engineering, computer science, and the natural sciences. For isolating as well
as for approximating all complex roots, the best algorithm known is based on an
almost optimal method for approximate polynomial factorization, introduced by
Pan in 2002. Pan's factorization algorithm goes back to the splitting circle
method from Schoenhage in 1982. The main drawbacks of Pan's method are that it
is quite involved and that all roots have to be computed at the same time. For
the important special case, where only the real roots have to be computed, much
simpler methods are used in practice; however, they considerably lag behind
Pan's method with respect to complexity.
In this paper, we resolve this discrepancy by introducing a hybrid of the
Descartes method and Newton iteration, denoted ANEWDSC, which is simpler than
Pan's method, but achieves a run-time comparable to it. Our algorithm computes
isolating intervals for the real roots of any real square-free polynomial,
given by an oracle that provides arbitrarily good approximations of the
polynomial's coefficients. ANEWDSC can also be used to isolate only the roots
in a given interval and to refine the isolating intervals to an arbitrarily
small size; it achieves near-optimal complexity for the latter task.
Comment: to appear in the Journal of Symbolic Computation
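For orientation, here is a minimal sketch of the classical Descartes bisection baseline that the hybrid builds on (without the paper's Newton acceleration or coefficient oracle): the sign variations of the Möbius-transformed polynomial bound the number of roots in an interval, and the interval is bisected until the bound is 0 or 1. Plain floating-point arithmetic is used, so this is illustrative only:

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial (coeffs[i] multiplies x**i) by Horner's rule."""
    r = 0.0
    for c in reversed(coeffs):
        r = r * x + c
    return r

def poly_mul(p, q):
    """Multiply two coefficient lists."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def sign_variations(coeffs):
    signs = [c for c in coeffs if c != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s * t < 0)

def descartes_bound(coeffs, a, b):
    """Sign variations of (1+t)^n * p((a + b*t)/(1 + t)): an upper bound
    (exact when it is 0 or 1) on the number of roots of p in (a, b)."""
    n = len(coeffs) - 1
    total = [0.0] * (n + 1)
    for i, c in enumerate(coeffs):
        term = [c]
        for _ in range(i):
            term = poly_mul(term, [a, b])       # factor (a + b*t)
        for _ in range(n - i):
            term = poly_mul(term, [1.0, 1.0])   # factor (1 + t)
        for k, v in enumerate(term):
            total[k] += v
    return sign_variations(total)

def isolate_roots(coeffs, a, b):
    """Isolating intervals for the real roots of a square-free polynomial
    in (a, b), by Descartes-bound bisection."""
    v = descartes_bound(coeffs, a, b)
    if v == 0:
        return []
    if v == 1:
        return [(a, b)]
    m = (a + b) / 2
    if poly_eval(coeffs, m) == 0:               # midpoint hit a root exactly
        eps = (b - a) * 1e-12
        return (isolate_roots(coeffs, a, m - eps)
                + [(m - eps, m + eps)]
                + isolate_roots(coeffs, m + eps, b))
    return isolate_roots(coeffs, a, m) + isolate_roots(coeffs, m, b)
```

For example, `isolate_roots([-2.0, 0.0, 1.0], -2.0, 2.0)` (the polynomial x² − 2) returns the two intervals (−2, 0) and (0, 2), each containing one of ±√2. The hybrid in the paper replaces the slow bisection steps with Newton iteration wherever the local root cluster permits it.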
Yang-Mills streamlines and semi-classical confinement
Semi-classical configurations in Yang-Mills theory have been derived from
lattice Monte Carlo configurations using a recently proposed constrained
cooling technique which is designed to preserve every Polyakov line (at any
point in space-time in any direction). Consequently, confinement was found to
be sustained by the ensemble of semi-classical configurations. The existence of
gluonic and fermionic near-to-zero modes was demonstrated as a precondition for
a possible semi-classical expansion around the cooled configurations as well as
providing the gapless spectrum of the Dirac operator necessary for chiral
symmetry breaking. The cluster structure of topological charge of the
semi-classical streamline configurations was analysed and shown to support the
axial anomaly of the right size, although the structure differs from the
instanton gas or liquid. Here, we present further details on the space-time
structure and the time evolution of the streamline configurations.
Comment: Invited talk presented at the conference "Quark confinement and the
hadron spectrum IX", Madrid, Aug 30 - Sept 3, 201
A computational algorithm for crack determination: The multiple crack case
An algorithm for recovering a collection of linear cracks in a homogeneous electrical conductor from boundary measurements of voltages induced by specified current fluxes is developed. The technique is a variation of Newton's method and is based on taking weighted averages of the boundary data. The method also adaptively changes the applied current flux at each iteration to maintain maximum sensitivity to the estimated locations of the cracks.
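The abstract does not give the weighted-average scheme itself, so the following Python sketch only illustrates the generic ingredient it builds on: a Newton iteration that drives measurement residuals to zero, with the Jacobian approximated by finite differences. The toy residual function (recovering two unknowns from a measured sum and product) is invented for illustration and stands in for the boundary-data functionals:

```python
import numpy as np

def newton_solve(residual, x0, tol=1e-10, max_iter=50, h=1e-6):
    """Generic Newton iteration x <- x - J^{-1} r(x), with the Jacobian J
    approximated by forward finite differences."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r = np.asarray(residual(x), dtype=float)
        if np.linalg.norm(r) < tol:
            break
        J = np.empty((r.size, x.size))
        for j in range(x.size):
            xp = x.copy()
            xp[j] += h
            J[:, j] = (np.asarray(residual(xp), dtype=float) - r) / h
        x = x - np.linalg.solve(J, r)
    return x

# Toy stand-in for "boundary measurements": recover (a, b) from the
# measured sum (3.0) and product (2.0) of the two unknowns.
def residual(p):
    a, b = p
    return [a + b - 3.0, a * b - 2.0]

x = newton_solve(residual, [2.5, 0.2])   # converges to (a, b) = (2, 1)
```

The adaptive element of the actual algorithm, re-choosing the applied current flux at each iteration to keep sensitivity high, has no analogue in this generic sketch.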
TSP--Infrastructure for the Traveling Salesperson Problem
The traveling salesperson (or, salesman) problem (TSP) is a well known and important combinatorial optimization problem. The goal is to find the shortest tour that visits each city in a given list exactly once and then returns to the starting city. Despite this simple problem statement, solving the TSP is difficult since it belongs to the class of NP-complete problems. The importance of the TSP arises, besides its theoretical appeal, from the variety of its applications. Typical applications in operations research include vehicle routing, computer wiring, cutting wallpaper and job sequencing. The main application in statistics is combinatorial data analysis, e.g., reordering rows and columns of data matrices or identifying clusters. In this paper, we introduce the R package TSP which provides a basic infrastructure for handling and solving the traveling salesperson problem. The package features S3 classes for specifying a TSP and its (possibly optimal) solution as well as several heuristics to find good solutions. In addition, it provides an interface to Concorde, one of the best exact TSP solvers currently available.
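The package itself is written in R; as a language-neutral illustration of the kind of construction heuristic such infrastructure typically ships, here is a nearest-neighbor tour sketch in Python (the city coordinates are made up, and this is not the package's code):

```python
import math

def tour_length(cities, tour):
    """Length of the closed tour (returns to the starting city)."""
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def nearest_neighbor_tour(cities, start=0):
    """Greedy construction heuristic: repeatedly visit the closest
    unvisited city (ties broken by lowest index)."""
    unvisited = [c for c in range(len(cities)) if c != start]
    tour = [start]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda c: math.dist(cities[last], cities[c]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

# Four cities on a unit square (coordinates are made up):
cities = [(0, 0), (0, 1), (1, 1), (1, 0)]
tour = nearest_neighbor_tour(cities)       # visits 0, 1, 2, 3; length 4.0
```

Such greedy tours are cheap but can be arbitrarily far from optimal; exact solvers like Concorde, to which the package interfaces, are used when optimality matters.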
Implications of probabilistic data modeling for rule mining
Mining association rules is an important technique for discovering meaningful patterns in transaction databases. In the current literature, the properties of algorithms to mine associations are discussed in great detail. In this paper we investigate properties of transaction data sets from a probabilistic point of view. We present a simple probabilistic framework for transaction data and its implementation using the R statistical computing environment. The framework can be used to simulate transaction data when no associations are present. We use such data to explore the noise-filtering ability of confidence and lift, two popular interest measures used for rule mining. Based on the framework we develop the measure hyperlift and we compare this new measure to lift using simulated data and a real-world grocery database.
Series: Research Report Series / Department of Statistics and Mathematics
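The independence null model described here is straightforward to sketch: every item enters each transaction independently with its own marginal probability, so any "association" found in such data is noise by construction. A minimal Python version (the item probabilities are illustrative; the paper's implementation uses R):

```python
import random

def simulate_transactions(item_probs, n_transactions, seed=0):
    """Independence null model: every item enters each transaction
    independently with its own marginal probability."""
    rng = random.Random(seed)
    return [frozenset(item for item, p in item_probs.items()
                      if rng.random() < p)
            for _ in range(n_transactions)]

# Illustrative marginals (the paper's framework is implemented in R):
probs = {"milk": 0.3, "bread": 0.5, "butter": 0.2}
data = simulate_transactions(probs, 10000)
# Under independence, the co-occurrence rate of milk and bread
# should be close to 0.3 * 0.5 = 0.15.
```

Running a rule miner on such data and counting the "rules" that survive a given measure's threshold is exactly the kind of noise-filtering experiment the abstract describes.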
More Than an Academic Question: Defining Student Ownership of Intellectual Property Rights
Intellectual property is increasingly important due to technology’s rapid development. The importance of intellectual property is also reflected within universities as traditional centers of research and expression, where students and faculty are encouraged to develop inventions and creative works throughout the educational experience. The commercialization potential of the intellectual property that emerges from these efforts has led many universities to adopt policies to determine ownership of intellectual property rights. Many of these policies take different approaches to ownership, and most students are unaware of their rights and are unlikely to consider whether the university has a claim to ownership. The purpose of this Article is to outline how intellectual property rights arise in the academic environment and to analyze how university policies determine ownership rights for students and the university. This Article concludes by urging universities and students to acknowledge the existence of these issues, adopt policies to address ownership rights, and make these policies known to members of the university community.
Direct-to-Consumer Advertising in Pharmaceutical Markets
We study effects of direct-to-consumer advertising (DTCA) in a market with two pharmaceutical firms providing horizontally differentiated (branded) drugs. Patients varying in their susceptibility to medication are a priori uninformed of available medication. Physicians making the prescription choice perfectly identify a patient’s most suitable drug. Firms promote drugs to physicians (detailing) to influence prescription decisions and, if allowed, to consumers (DTCA) to increase the awareness of the drug. The main findings are: Firstly, firms benefit from DTCA only if prices are regulated. On the one hand, DTCA reduces the physicians’ market power and thus detailing expenses, while, on the other, it triggers price competition as a larger share of patients are aware of the alternatives. Secondly, under price regulation DTCA is welfare improving as long as the regulated price is not too high. Under price competition, DTCA is harmful to welfare unless detailing is wasteful and the drugs are poor substitutes.
Keywords: Advertising; Pharmaceuticals; Oligopoly