7,607 research outputs found

    New probabilistic interest measures for association rules

    Mining association rules is an important technique for discovering meaningful patterns in transaction databases. Many different measures of interestingness have been proposed for association rules. However, these measures fail to take the probabilistic properties of the mined data into account. In this paper, we start by presenting a simple probabilistic framework for transaction data which can be used to simulate transaction data when no associations are present. We use such data and a real-world database from a grocery outlet to explore the behavior of confidence and lift, two popular interest measures used for rule mining. The results show that confidence is systematically influenced by the frequency of the items in the left-hand side of rules and that lift performs poorly at filtering random noise in transaction data. Based on the probabilistic framework, we develop two new interest measures, hyper-lift and hyper-confidence, which can be used to filter or order mined association rules. The new measures show significantly better performance than lift for applications where spurious rules are problematic.
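
    For reference, the two measures discussed above can be computed directly from transaction data. The following is a minimal R sketch (item names, probabilities, and data are illustrative only) that evaluates support, confidence, and lift for a single rule {butter} => {bread}; since the two items are generated independently, any lift away from 1 here is exactly the kind of random noise the new measures are meant to filter.

```r
## Illustrative only: support, confidence and lift for a rule X -> Y,
## computed from a simulated 0/1 transaction-by-item incidence matrix.
set.seed(1)
m <- 1000                                   # number of transactions
trans <- cbind(
  butter = rbinom(m, 1, 0.20),              # items drawn independently, so
  bread  = rbinom(m, 1, 0.35)               # there is no true association
)

supp_X  <- mean(trans[, "butter"] == 1)
supp_Y  <- mean(trans[, "bread"]  == 1)
supp_XY <- mean(trans[, "butter"] == 1 & trans[, "bread"] == 1)

confidence <- supp_XY / supp_X              # P(Y | X)
lift       <- supp_XY / (supp_X * supp_Y)   # P(X,Y) / (P(X) P(Y))

c(support = supp_XY, confidence = confidence, lift = lift)
```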

    Computing Real Roots of Real Polynomials

    Computing the roots of a univariate polynomial is a fundamental and long-studied problem of computational algebra with applications in mathematics, engineering, computer science, and the natural sciences. For isolating as well as for approximating all complex roots, the best algorithm known is based on an almost optimal method for approximate polynomial factorization, introduced by Pan in 2002. Pan's factorization algorithm goes back to the splitting circle method introduced by Schoenhage in 1982. The main drawbacks of Pan's method are that it is quite involved and that all roots have to be computed at the same time. For the important special case where only the real roots have to be computed, much simpler methods are used in practice; however, they considerably lag behind Pan's method with respect to complexity. In this paper, we resolve this discrepancy by introducing a hybrid of the Descartes method and Newton iteration, denoted ANEWDSC, which is simpler than Pan's method but achieves a run-time comparable to it. Our algorithm computes isolating intervals for the real roots of any real square-free polynomial, given by an oracle that provides arbitrarily good approximations of the polynomial's coefficients. ANEWDSC can also be used to isolate only the roots in a given interval and to refine the isolating intervals to an arbitrarily small size; it achieves near-optimal complexity for the latter task.
    Comment: to appear in the Journal of Symbolic Computation.
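
    The Descartes part of such a hybrid can be illustrated compactly. The R sketch below is emphatically not ANEWDSC and not Pan's method: it is a toy bisection isolator driven by the classical sign-variation (Descartes) test, with none of the paper's ingredients (coefficient oracle, approximate arithmetic, Newton acceleration), and all function names are ours.

```r
## Toy real-root isolation by bisection plus Descartes' rule of signs.
compose_linear <- function(p, a, h) {
  # coefficients of p(a + h*x), with p given as c(p0, p1, ..., pn) (Horner)
  if (length(p) < 2) return(p)
  res <- p[length(p)]
  for (i in (length(p) - 1):1) {
    res <- a * c(res, 0) + h * c(0, res)    # res <- res * (a + h*x)
    res[1] <- res[1] + p[i]
  }
  res
}

descartes_count <- function(p, a, b) {
  # sign variations of (1+x)^n * q(1/(1+x)) with q(x) = p(a + (b-a)*x):
  # an upper bound on the number of roots of p in (a, b), exact if 0 or 1
  q <- compose_linear(p, a, b - a)          # maps (0, 1) onto (a, b)
  r <- compose_linear(rev(q), 1, 1)
  s <- sign(r[r != 0])
  sum(s[-1] != s[-length(s)])
}

isolate_roots <- function(p, lo, hi) {
  # plain bisection; p is assumed square-free with no root at a midpoint
  v <- descartes_count(p, lo, hi)
  if (v == 0) return(list())
  if (v == 1) return(list(c(lo, hi)))
  mid <- (lo + hi) / 2
  c(isolate_roots(p, lo, mid), isolate_roots(p, mid, hi))
}

isolate_roots(c(2, -3, 1), 0, 10)           # x^2 - 3x + 2: isolates roots 1 and 2
```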

    Yang-Mills streamlines and semi-classical confinement

    Semi-classical configurations in Yang-Mills theory have been derived from lattice Monte Carlo configurations using a recently proposed constrained cooling technique which is designed to preserve every Polyakov line (at any point in space-time, in any direction). Consequently, confinement was found to be sustained by the ensemble of semi-classical configurations. The existence of gluonic and fermionic near-to-zero modes was demonstrated as a precondition for a possible semi-classical expansion around the cooled configurations, as well as providing the gapless spectrum of the Dirac operator necessary for chiral symmetry breaking. The cluster structure of the topological charge of the semi-classical streamline configurations was analysed and shown to support the axial anomaly of the right size, although the structure differs from the instanton gas or liquid. Here, we present further details on the space-time structure and the time evolution of the streamline configurations.
    Comment: Invited talk presented at the conference "Quark confinement and the hadron spectrum IX", Madrid, Aug 30 - Sept 3, 2010.

    A computational algorithm for crack determination: The multiple crack case

    An algorithm for recovering a collection of linear cracks in a homogeneous electrical conductor from boundary measurements of voltages induced by specified current fluxes is developed. The technique is a variation of Newton's method and is based on taking weighted averages of the boundary data. The method also adaptively changes the applied current flux at each iteration to maintain maximum sensitivity to the estimated locations of the cracks.
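
    The abstract does not give the functionals themselves, but the iteration it describes has the familiar least-squares Newton form. The R fragment below is a hedged, generic sketch only: forward_model() is a hypothetical placeholder mapping crack parameters (e.g. segment endpoints) to predicted boundary voltages for a fixed current flux, and the paper's weighted averaging of boundary data and adaptive choice of current flux are not reproduced here.

```r
## Generic Gauss-Newton style update (illustration only, not the paper's method)
gauss_newton_step <- function(params, forward_model, measured, h = 1e-6) {
  r <- forward_model(params) - measured            # residual at current guess
  J <- sapply(seq_along(params), function(j) {     # forward-difference Jacobian
    dp <- params
    dp[j] <- dp[j] + h
    (forward_model(dp) - forward_model(params)) / h
  })
  params - drop(solve(crossprod(J), crossprod(J, r)))  # least-squares step
}
```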

    TSP--Infrastructure for the Traveling Salesperson Problem

    The traveling salesperson (or salesman) problem (TSP) is a well-known and important combinatorial optimization problem. The goal is to find the shortest tour that visits each city in a given list exactly once and then returns to the starting city. Despite this simple problem statement, solving the TSP is difficult since it belongs to the class of NP-complete problems. The importance of the TSP arises, besides its theoretical appeal, from the variety of its applications. Typical applications in operations research include vehicle routing, computer wiring, cutting wallpaper, and job sequencing. The main application in statistics is combinatorial data analysis, e.g., reordering rows and columns of data matrices or identifying clusters. In this paper, we introduce the R package TSP which provides a basic infrastructure for handling and solving the traveling salesperson problem. The package features S3 classes for specifying a TSP and its (possibly optimal) solution as well as several heuristics to find good solutions. In addition, it provides an interface to Concorde, one of the best exact TSP solvers currently available.
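
    A minimal usage sketch of the interface described above (as documented for the TSP package; available methods and defaults may differ between package versions, and the data here are random):

```r
library(TSP)

set.seed(42)
pts <- matrix(runif(20), ncol = 2)          # 10 random cities in the unit square
rownames(pts) <- paste0("city", 1:10)
tsp <- TSP(dist(pts))                       # S3 TSP object from a distance matrix

tour <- solve_TSP(tsp, method = "nearest_insertion")   # one of the heuristics
tour_length(tour)                           # length of the heuristic tour
labels(tour)                                # order in which the cities are visited

## with Concorde installed: solve_TSP(tsp, method = "concorde") for an exact tour
```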

    Implications of probabilistic data modeling for rule mining

    Mining association rules is an important technique for discovering meaningful patterns in transaction databases. In the current literature, the properties of algorithms to mine associations are discussed in great detail. In this paper we investigate properties of transaction data sets from a probabilistic point of view. We present a simple probabilistic framework for transaction data and its implementation using the R statistical computing environment. The framework can be used to simulate transaction data when no associations are present. We use such data to explore the ability of confidence and lift, two popular interest measures used for rule mining, to filter noise. Based on the framework, we develop the measure hyperlift and compare this new measure to lift using simulated data and a real-world grocery database.
    Series: Research Report Series / Department of Statistics and Mathematics
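
    A minimal sketch of such an independence model (not the paper's actual implementation; sizes and probabilities below are made up): items occur independently with item-specific probabilities, so every pairwise lift computed from the simulated data is pure noise, which is exactly the filtering problem the hyperlift measure is developed to address.

```r
## Simulate transactions with independent items and inspect pairwise lift.
set.seed(7)
m <- 5000                                   # transactions
n <- 50                                     # items
p <- runif(n, 0.01, 0.2)                    # item occurrence probabilities
trans <- sapply(p, function(prob) rbinom(m, 1, prob))   # m x n incidence matrix

supp <- colMeans(trans)                     # item supports
pair_lift <- (crossprod(trans) / m) / outer(supp, supp) # all pairwise lift values
summary(pair_lift[upper.tri(pair_lift)])    # spread of lift under independence
```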

    More Than an Academic Question: Defining Student Ownership of Intellectual Property Rights

    Intellectual property is increasingly important due to technology’s rapid development. The importance of intellectual property is also reflected within universities as traditional centers of research and expression, where students and faculty are encouraged to develop inventions and creative works throughout the educational experience. The commercialization potential of the intellectual property that emerges from these efforts has led many universities to adopt policies to determine ownership of intellectual property rights. Many of these policies take different approaches to ownership, and most students are unaware of their rights and are unlikely to consider whether the university has a claim to ownership. The purpose of this Article is to outline how intellectual property rights arise in the academic environment and to analyze how university policies determine ownership rights for students and the university. This Article concludes by urging universities and students to acknowledge the existence of these issues, adopt policies to address ownership rights, and make these policies known to members of the university community.

    Direct-to-Consumer Advertising in Pharmaceutical Markets

    We study effects of direct-to-consumer advertising (DTCA) in a market with two pharmaceutical firms providing horizontally differentiated (branded) drugs. Patients varying in their susceptibility to medication are a priori uninformed of available medication. Physicians making the prescription choice perfectly identify a patient’s most suitable drug. Firms promote drugs to physicians (detailing) to influence prescription decisions and, if allowed, to consumers (DTCA) to increase awareness of the drug. The main findings are: Firstly, firms benefit from DTCA only if prices are regulated. On the one hand, DTCA reduces the physicians’ market power and thus detailing expenses, while, on the other, it triggers price competition as a larger share of patients is aware of the alternatives. Secondly, under price regulation DTCA is welfare improving as long as the regulated price is not too high. Under price competition, DTCA is harmful to welfare unless detailing is wasteful and the drugs are poor substitutes.
    Keywords: Advertising; Pharmaceuticals; Oligopoly
