
    Lie and conditional symmetries of a class of nonlinear (1+2)-dimensional boundary value problems

    A new definition of conditional invariance for boundary value problems involving a wide range of boundary conditions (including initial value problems as a special case) is proposed. It is shown that other definitions, developed to find Lie symmetries of boundary value problems with standard boundary conditions, follow as particular cases of our definition. Simple examples demonstrating its direct applicability to nonlinear problems arising in applications are presented. Moreover, the definition is successfully applied to the Lie and conditional symmetry classification of a class of (1+2)-dimensional nonlinear boundary value problems governed by the nonlinear diffusion equation in a semi-infinite domain. In particular, it is proved that there is a special exponent, k = -2, of the power diffusivity u^k at which the problem in question with non-vanishing flux on the boundary admits additional Lie symmetry operators compared to the case k ≠ -2. In order to demonstrate the applicability of the symmetries derived, they are used to reduce the nonlinear problems with power diffusivity u^k and a constant non-zero flux on the boundary (such problems are common in applications and describe a wide range of phenomena) to (1+1)-dimensional problems. The structure and properties of the problems obtained are briefly analysed. Finally, some results demonstrating how Lie invariance of the boundary value problem in question depends on the geometry of the domain are presented.
    Comment: 25 pages; the main results were presented at the Conference Symmetry, Methods, Applications and Related Fields, Vancouver, Canada, May 13-16, 201
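
    As a concrete illustration of the class of problems treated, a representative boundary value problem with power diffusivity might be written as follows (a hedged reconstruction from the abstract alone: the half-space domain, the sign conventions, the condition at infinity and the constant flux q are assumptions, not the authors' exact formulation):

        \begin{align*}
        & u_t = \nabla\cdot\left(u^{k}\,\nabla u\right), && (x,y)\in\mathbb{R}^2,\ x>0,\ t>0,\\
        & u^{k}\,u_x = q \neq 0 \ \text{(constant flux)}, && x=0,\\
        & u \to 0, && x\to\infty,
        \end{align*}

    with k = -2 being the special exponent for which additional Lie symmetry operators appear.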

    Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization

    Protecting vast quantities of data poses a daunting challenge for the growing number of organizations that collect, stockpile, and monetize it. The ability to distinguish data that is actually needed from data collected "just in case" would help these organizations limit the latter's exposure to attack. A natural approach might be to monitor data use and retain only the working set of in-use data in accessible storage; unused data can be evicted to a highly protected store. However, many of today's big data applications rely on machine learning (ML) workloads that are periodically retrained by accessing, and thus exposing to attack, the entire data store. Training set minimization methods, such as count featurization, are often used to limit the data needed to train ML workloads in order to improve performance or scalability. We present Pyramid, a limited-exposure data management system that builds upon count featurization to enhance data protection. As such, Pyramid uniquely introduces both the idea and a proof of concept for leveraging training set minimization methods to instill rigor and selectivity into big data management. We integrated Pyramid into Spark Velox, a framework for ML-based targeting and personalization, evaluated it on three applications, and show that Pyramid approaches the performance of state-of-the-art models while training on less than 1% of the raw data.
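
    Count featurization itself is a simple idea: high-cardinality raw features are replaced by per-label count statistics, so a model can be trained from compact count tables while the raw records are evicted to protected storage. A minimal Python sketch of plain count featurization follows; it is not Pyramid's implementation (none of Pyramid's additional protection machinery is included), and the class name, smoothing constant and example labels are illustrative assumptions.

        from collections import defaultdict

        class CountFeaturizer:
            """Replace a categorical feature value with smoothed per-label
            counts accumulated from the training stream (plain count
            featurization, without Pyramid's protection machinery)."""

            def __init__(self, labels, smoothing=1.0):
                self.labels = list(labels)
                self.smoothing = smoothing  # assumed Laplace-style smoothing
                self.counts = defaultdict(lambda: defaultdict(float))

            def update(self, feature_value, label):
                # Only aggregate counts are kept; the raw record can be
                # evicted to cold storage after this step.
                self.counts[feature_value][label] += 1.0

            def transform(self, feature_value):
                # Emit smoothed conditional frequencies P(label | value).
                row = self.counts[feature_value]
                total = sum(row.values()) + self.smoothing * len(self.labels)
                return [(row[l] + self.smoothing) / total for l in self.labels]

        # Usage: stream raw events once to build the counts, then train any
        # downstream model on the low-dimensional count features.
        cf = CountFeaturizer(labels=["click", "no_click"])
        cf.update("ad_42", "click")
        cf.update("ad_42", "no_click")
        print(cf.transform("ad_42"))  # -> [0.5, 0.5]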

    The Data Big Bang and the Expanding Digital Universe: High-Dimensional, Complex and Massive Data Sets in an Inflationary Epoch

    Recent and forthcoming advances in instrumentation, and giant new surveys, are creating astronomical data sets that are not amenable to the methods of analysis familiar to astronomers. Traditional methods are often inadequate not merely because of the size in bytes of the data sets, but also because of the complexity of modern data sets. Mathematical limitations of familiar algorithms and techniques in dealing with such data sets create a critical need for new paradigms for the representation, analysis and scientific visualization (as opposed to illustrative visualization) of heterogeneous, multiresolution data across application domains. Some of the problems presented by the new data sets have been addressed by other disciplines such as applied mathematics, statistics and machine learning, and the resulting solutions have been utilized by other sciences such as the space-based geosciences. Unfortunately, valuable results pertaining to these problems are mostly to be found only in publications outside of astronomy. Here we offer brief overviews of a number of concepts, techniques and developments, some "old" and some new. These are generally unknown to most of the astronomical community, but are vital to the analysis and visualization of complex data sets and images. In order for astronomers to take advantage of the richness and complexity of the new era of data, and to be able to identify, adopt, and apply new solutions, the astronomical community needs a certain degree of awareness and understanding of the new concepts. One of the goals of this paper is to help bridge the gap between applied mathematics, artificial intelligence and computer science on the one side and astronomy on the other.
    Comment: 24 pages, 8 Figures, 1 Table. Accepted for publication in Advances in Astronomy, special issue "Robotic Astronomy".

    Evolving artificial datasets to improve interpretable classifiers

    Differential Evolution can be used to construct effective and compact artificial training datasets for machine learning algorithms. In this paper, a series of comparative experiments is performed in which two simple interpretable supervised classifiers (specifically, Naive Bayes and linear Support Vector Machines) are trained (i) directly on “real” data, as would be the normal case, and (ii) indirectly, using special artificial datasets derived from the real data via evolutionary optimization. The results across several challenging test problems show that supervised classifiers trained indirectly using our novel evolution-based approach produce models with superior predictive classification performance. Besides presenting the accuracy of the learned models, we also analyze the sensitivity of our artificial data optimization process to Differential Evolution's parameters, and we then examine the statistical characteristics of the artificial data that is evolved.
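
    A minimal Python sketch of this indirect training idea, using off-the-shelf components (scipy's differential_evolution and scikit-learn's GaussianNB stand in for the paper's actual experimental setup; the benchmark dataset, the number of artificial points, the fixed label assignment and the fitness definition are all assumptions):

        import numpy as np
        from scipy.optimize import differential_evolution
        from sklearn.datasets import load_iris
        from sklearn.model_selection import train_test_split
        from sklearn.naive_bayes import GaussianNB

        # Real data (a stand-in benchmark; the paper uses several test problems).
        X, y = load_iris(return_X_y=True)
        X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

        n_art, n_feat = 6, X.shape[1]
        art_labels = np.arange(n_art) % len(np.unique(y))  # fixed labels for artificial points

        def fitness(flat):
            # Decode the candidate vector into an artificial dataset, train the
            # interpretable classifier on it, and score it on real training data.
            X_art = flat.reshape(n_art, n_feat)
            clf = GaussianNB().fit(X_art, art_labels)
            return -clf.score(X_train, y_train)  # DE minimizes, so negate accuracy

        bounds = [(X.min(), X.max())] * (n_art * n_feat)
        result = differential_evolution(fitness, bounds, maxiter=50, seed=0)

        # Held-out check: train on the best evolved dataset, evaluate on real data.
        best = GaussianNB().fit(result.x.reshape(n_art, n_feat), art_labels)
        print("validation accuracy with evolved artificial data:", best.score(X_val, y_val))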