468,062 research outputs found

    The LSST Data Mining Research Agenda

    We describe features of the LSST science database that are amenable to scientific data mining, object classification, outlier identification, anomaly detection, image quality assurance, and survey science validation. The data mining research agenda includes: scalability (at petabyte scales) of existing machine learning and data mining algorithms; development of grid-enabled parallel data mining algorithms; designing a robust system for brokering classifications from the LSST event pipeline (which may produce 10,000 or more event alerts per night); multi-resolution methods for exploration of petascale databases; indexing of multi-attribute multi-dimensional astronomical databases (beyond spatial indexing) for rapid querying of petabyte databases; and more.
    Comment: 5 pages, Presented at the "Classification and Discovery in Large Astronomical Surveys" meeting, Ringberg Castle, 14-17 October, 200

    Travel to extraterrestrial bodies over time: some exploratory analyses of mission data

    This paper discusses data pertaining to space missions to astronomical bodies beyond Earth. The analyses provide summarizing facts and graphs obtained by mining data about (1) missions launched by all countries to the Moon and planets, and (2) Earth satellites, drawn from a Union of Concerned Scientists (UCS) dataset and lists of publicly available satellite data.

    Argumentation Mining in User-Generated Web Discourse

    The goal of argumentation mining, an evolving research field in computational linguistics, is to design methods capable of analyzing people's argumentation. In this article, we go beyond the state of the art in several ways. (i) We deal with actual Web data and take up the challenges given by the variety of registers, multiple domains, and unrestricted noisy user-generated Web discourse. (ii) We bridge the gap between normative argumentation theories and argumentation phenomena encountered in actual data by adapting an argumentation model tested in an extensive annotation study. (iii) We create a new gold standard corpus (90k tokens in 340 documents) and experiment with several machine learning methods to identify argument components. We offer the data, source code, and annotation guidelines to the community under free licenses. Our findings show that argumentation mining in user-generated Web discourse is a feasible but challenging task.
    Comment: Cite as: Habernal, I. & Gurevych, I. (2017). Argumentation Mining in User-Generated Web Discourse. Computational Linguistics 43(1), pp. 125-17

    An Emergent Space for Distributed Data with Hidden Internal Order through Manifold Learning

    Manifold-learning techniques are routinely used in mining complex spatiotemporal data to extract useful, parsimonious data representations/parametrizations; these are, in turn, useful in nonlinear model identification tasks. We focus here on the case of time series data that can ultimately be modelled as a spatially distributed system (e.g. a partial differential equation, PDE), but where we do not know the space in which this PDE should be formulated. Hence, even the spatial coordinates for the distributed system themselves need to be identified, that is, to emerge from the data mining process. We will first validate this emergent space reconstruction for time series sampled without space labels in known PDEs; this brings up the issue of observability of physical space from temporal observation data, and the transition from spatially resolved to lumped (order-parameter-based) representations by tuning the scale of the data mining kernels. We will then present actual emergent space discovery illustrations. Our illustrative examples include chimera states (states of coexisting coherent and incoherent dynamics), and chaotic as well as quasiperiodic spatiotemporal dynamics, arising in partial differential equations and/or in heterogeneous networks. We also discuss how data-driven spatial coordinates can be extracted in ways invariant to the nature of the measuring instrument. Such gauge-invariant data mining can go beyond the fusion of heterogeneous observations of the same system, to the possible matching of apparently different systems.

    Informatics, Data Mining, Econometrics and Financial Economics: A Connection

    This short communication reviews some of the literature in econometrics and financial economics that is related to informatics and data mining. We then discuss research in econometrics and financial economics that could be extended to informatics and data mining, beyond the areas where the fields already overlap.

    Intelligent Knowledge Beyond Data Mining: Influences of Habitual Domains

    Data mining is a useful analytic method and has been increasingly used by organizations to gain insights from large-scale data. Prior studies of data mining have focused on developing automatic data mining models that belong to first-order data mining. Recently, researchers have called for more study of the second-order data mining process. The second-order data mining process is an important step in converting data mining results into intelligent knowledge, i.e., actionable knowledge. Specifically, second-order data mining refers to the post-stage of data mining projects in which humans collectively make judgments on data mining models’ performance. Understanding the second-order data mining process is valuable in addressing how organizations can best use data mining to achieve competitive advantages. Drawing on the theory of habitual domains, this study developed a conceptual model for understanding the impact of human cognition characteristics on second-order data mining. Results from a field survey study showed significant correlations between habitual domain characteristics, such as educational level and prior experience with data mining, and human judgments on classifiers’ performance.

    Oceanic Games: Centralization Risks and Incentives in Blockchain Mining

    To participate in the distributed consensus of permissionless blockchains, prospective nodes -- or miners -- provide proof of designated, costly resources. However, in contrast to the intended decentralization, current data on blockchain mining unveils increased concentration of these resources in a few major entities, typically mining pools. To study strategic considerations in this setting, we employ the concept of Oceanic Games (Milnor and Shapley, 1978). Oceanic Games have been used to analyze decision making in corporate settings with small numbers of dominant players (shareholders) and large numbers of individually insignificant players, the ocean. Unlike standard equilibrium models, they focus on measuring the value (or power) per entity and per unit of resource in a given distribution of resources. These values are viewed as strategic components in coalition formations, mergers and resource acquisitions. Considering such issues relevant to blockchain governance and long-term sustainability, we adapt oceanic games to blockchain mining and illustrate the defined concepts via examples. The application of existing results reveals incentives for individual miners to merge in order to increase the value of their resources. This offers an alternative perspective to the observed centralization and concentration of mining power. Beyond numerical simulations, we use the model to identify issues relevant to the design of future cryptocurrencies and formulate prospective research questions.
    Comment: [Best Paper Award] at the International Conference on Mathematical Research for Blockchain Economy (MARBLE 2019
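
    The idea of "value per unit of resource" in the abstract above can be illustrated with a small finite stand-in. The sketch below is not the paper's model: it assumes a plain weighted-majority game and computes the Shapley-Shubik power index by brute-force enumeration, whereas an oceanic game treats the small players as a continuum. The function name, the weights (one holder with 40 units and six holders with 10 units each), and the quota of 51 are all hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import permutations


def shapley_shubik(weights, quota):
    """Shapley-Shubik power index of a weighted-majority game.

    A player is pivotal in an ordering when adding its weight first
    brings the running total to at least `quota`. The index is the
    fraction of orderings in which each player is pivotal.
    Assumes sum(weights) >= quota, so every ordering has a pivot.
    Brute force: only practical for a handful of players.
    """
    n = len(weights)
    pivots = [0] * n
    orderings = 0
    for order in permutations(range(n)):
        running = 0
        for player in order:
            running += weights[player]
            if running >= quota:
                pivots[player] += 1
                break
        orderings += 1
    return [Fraction(count, orderings) for count in pivots]


# Hypothetical resource distribution: one large miner, six small ones.
weights = [40, 10, 10, 10, 10, 10, 10]
quota = 51  # strict majority of the 100 total units

values = shapley_shubik(weights, quota)
per_unit = [value / weight for value, weight in zip(values, weights)]

for weight, value, unit_value in zip(weights, values, per_unit):
    print(f"weight={weight:3d}  value={value}  value per unit={unit_value}")
```

    In this toy distribution the large holder obtains a value of 4/7 with 40% of the resource, so each of its units is worth twice as much as a unit held by a small player (1/70 versus 1/140). That is the kind of merge incentive the abstract describes, shown here only for these assumed numbers.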