The LSST Data Mining Research Agenda
We describe features of the LSST science database that are amenable to
scientific data mining, object classification, outlier identification, anomaly
detection, image quality assurance, and survey science validation. The data
mining research agenda includes: scalability (at petabyte scales) of existing
machine learning and data mining algorithms; development of grid-enabled
parallel data mining algorithms; designing a robust system for brokering
classifications from the LSST event pipeline (which may produce 10,000 or more
event alerts per night); multi-resolution methods for exploration of petascale
databases; indexing of multi-attribute multi-dimensional astronomical databases
(beyond spatial indexing) for rapid querying of petabyte databases; and more.
Comment: 5 pages, Presented at the "Classification and Discovery in Large
Astronomical Surveys" meeting, Ringberg Castle, 14-17 October, 200
Travel to extraterrestrial bodies over time: some exploratory analyses of mission data
This paper discusses data pertaining to space missions to astronomical bodies beyond Earth. The analyses provide summarizing facts and graphs obtained by mining data about (1) missions launched by all countries that go to the Moon and planets, and (2) Earth satellites obtained from a Union of Concerned Scientists (UCS) dataset and lists of publicly available satellite data.
Argumentation Mining in User-Generated Web Discourse
The goal of argumentation mining, an evolving research field in computational
linguistics, is to design methods capable of analyzing people's argumentation.
In this article, we go beyond the state of the art in several ways. (i) We deal
with actual Web data and take up the challenges given by the variety of
registers, multiple domains, and unrestricted noisy user-generated Web
discourse. (ii) We bridge the gap between normative argumentation theories and
argumentation phenomena encountered in actual data by adapting an argumentation
model tested in an extensive annotation study. (iii) We create a new gold
standard corpus (90k tokens in 340 documents) and experiment with several
machine learning methods to identify argument components. We offer the data,
source codes, and annotation guidelines to the community under free licenses.
Our findings show that argumentation mining in user-generated Web discourse is
a feasible but challenging task.
Comment: Cite as: Habernal, I. & Gurevych, I. (2017). Argumentation Mining in
User-Generated Web Discourse. Computational Linguistics 43(1), pp. 125-17
An Emergent Space for Distributed Data with Hidden Internal Order through Manifold Learning
Manifold-learning techniques are routinely used in mining complex
spatiotemporal data to extract useful, parsimonious data
representations/parametrizations; these are, in turn, useful in nonlinear model
identification tasks. We focus here on the case of time series data that can
ultimately be modelled as a spatially distributed system (e.g. a partial
differential equation, PDE), but where we do not know the space in which this
PDE should be formulated. Hence, even the spatial coordinates for the
distributed system themselves need to be identified - to emerge from - the data
mining process. We will first validate this emergent space reconstruction for
time series sampled without space labels in known PDEs; this brings up the
issue of observability of physical space from temporal observation data, and
the transition from spatially resolved to lumped (order-parameter-based)
representations by tuning the scale of the data mining kernels. We will then
present actual emergent space discovery illustrations. Our illustrative
examples include chimera states (states of coexisting coherent and incoherent
dynamics), and chaotic as well as quasiperiodic spatiotemporal dynamics,
arising in partial differential equations and/or in heterogeneous networks. We
also discuss how data-driven spatial coordinates can be extracted in ways
invariant to the nature of the measuring instrument. Such gauge-invariant data
mining can go beyond the fusion of heterogeneous observations of the same
system, to the possible matching of apparently different systems.
Informatics, Data Mining, Econometrics and Financial Economics: A Connection
This short communication reviews some of the literature in econometrics and financial economics that is related to informatics and data mining. We then discuss some of the research on econometrics and financial economics that could be extended to informatics and data mining beyond the existing areas in econometrics and financial economics.
Intelligent Knowledge Beyond Data Mining: Influences of Habitual Domains
Data mining is a useful analytic method and has been increasingly used by organizations to gain insights from large-scale data. Prior studies of data mining have focused on developing automatic data mining models that belong to first-order data mining. Recently, researchers have called for more study of the second-order data mining process. The second-order data mining process is an important step to convert data mining results into intelligent knowledge, i.e., actionable knowledge. Specifically, second-order data mining refers to the post-stage of data mining projects in which humans collectively make judgments on data mining models’ performance. Understanding the second-order data mining process is valuable in addressing how data mining can be used best by organizations in order to achieve competitive advantages. Drawing on the theory of habitual domains, this study developed a conceptual model for understanding the impact of human cognition characteristics on second-order data mining. Results from a field survey study showed significant correlations between habitual domain characteristics, such as educational level and prior experience with data mining, and human judgments on classifiers’ performance.
Oceanic Games: Centralization Risks and Incentives in Blockchain Mining
To participate in the distributed consensus of permissionless blockchains,
prospective nodes -- or miners -- provide proof of designated, costly
resources. However, in contrast to the intended decentralization, current data
on blockchain mining unveils increased concentration of these resources in a
few major entities, typically mining pools. To study strategic considerations
in this setting, we employ the concept of Oceanic Games (Milnor and Shapley,
1978). Oceanic Games have been used to analyze decision making in corporate
settings with small numbers of dominant players (shareholders) and large
numbers of individually insignificant players, the ocean. Unlike standard
equilibrium models, they focus on measuring the value (or power) per entity and
per unit of resource in a given distribution of resources. These values are
viewed as strategic components in coalition formations, mergers and resource
acquisitions. Considering such issues relevant to blockchain governance and
long-term sustainability, we adapt oceanic games to blockchain mining and
illustrate the defined concepts via examples. The application of existing
results reveals incentives for individual miners to merge in order to increase
the value of their resources. This offers an alternative perspective to the
observed centralization and concentration of mining power. Beyond numerical
simulations, we use the model to identify issues relevant to the design of
future cryptocurrencies and formulate prospective research questions.
Comment: [Best Paper Award] at the International Conference on Mathematical
Research for Blockchain Economy (MARBLE 2019)