Modelling decision tables from data.
On most datasets induction algorithms can generate very accurate classifiers. Sometimes, however, these classifiers are very hard for humans to understand. This paper therefore investigates how the extracted knowledge can be presented to the user by means of decision tables (DTs). Decision tables are very easy to understand. Furthermore, they provide useful facilities for checking the extracted knowledge for consistency and completeness. The paper demonstrates how a consistent and complete DT can be modelled starting from raw data. The proposed method is empirically validated on several benchmark datasets. It is shown that the modelled decision tables are sufficiently small, which allows easy consultation of the represented knowledge.
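The consistency and completeness checks mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's method: the function name `check_decision_table`, the condition names, and the rules are hypothetical, and the sketch assumes each rule assigns a single value to every condition. A table is treated as complete when every combination of condition values is covered, and as consistent when no combination is mapped to two different actions.

```python
from itertools import product

def check_decision_table(conditions, rules):
    """Return (missing, conflicts) for a simple decision table.

    conditions: dict mapping condition name -> list of possible values
    rules: list of (dict condition name -> value, action) pairs
    """
    names = list(conditions)
    seen = {}        # condition combination -> first action recorded for it
    conflicts = []   # combinations mapped to more than one action
    for cond, action in rules:
        key = tuple(cond[n] for n in names)
        if key in seen and seen[key] != action:
            conflicts.append(key)
        seen.setdefault(key, action)
    # Combinations of condition values that no rule covers
    missing = [combo
               for combo in product(*(conditions[n] for n in names))
               if combo not in seen]
    return missing, conflicts

# Hypothetical credit-screening table, for illustration only
conditions = {"income": ["low", "high"], "owns_home": [True, False]}
rules = [
    ({"income": "low", "owns_home": True}, "review"),
    ({"income": "low", "owns_home": False}, "reject"),
    ({"income": "high", "owns_home": True}, "accept"),
    ({"income": "low", "owns_home": False}, "accept"),  # contradicts rule 2
]
missing, conflicts = check_decision_table(conditions, rules)
print("uncovered combinations:", missing)
print("conflicting combinations:", conflicts)
```

In this example the table is incomplete (no rule covers high income without home ownership) and inconsistent (low income without home ownership is both rejected and accepted).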
Data Tables for Lorentz and CPT Violation
This work tabulates measured and derived values of coefficients for Lorentz
and CPT violation in the Standard-Model Extension. Summary tables are extracted
listing maximal attained sensitivities in the matter, photon, neutrino, and
gravity sectors. Tables presenting definitions and properties are also
compiled. Comment: 122 pages, 2020 edition
Evaluation of missing data mechanisms in two and three dimensional incomplete tables
The analysis of incomplete contingency tables is a practical and an
interesting problem. In this paper, we provide characterizations for the
various missing mechanisms of a variable in terms of response and non-response
odds for two and three dimensional incomplete tables. Log-linear
parametrization and some distinctive properties of the missing data models for
the above tables are discussed. All possible cases in which data on one, two or
all variables may be missing are considered. We study the missingness of each
variable in a model, which is more insightful for analyzing cross-classified
data than the missingness of the outcome vector. For sensitivity analysis of
the incomplete tables, we propose easily verifiable procedures to evaluate the
missing at random (MAR), missing completely at random (MCAR) and not missing at
random (NMAR) assumptions of the missing data models. These methods depend only
on joint and marginal odds computed from fully and partially observed counts in
the tables, respectively. Finally, some real-life datasets are analyzed to
illustrate our results, which are confirmed based on simulation studies.
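The idea of non-response odds computed from fully and partially observed counts can be sketched for a two-dimensional table. This is not the paper's procedure: the function name `nonresponse_odds` and all counts are hypothetical, and the sketch assumes that Y1 is always observed while Y2 is sometimes missing. For each level of Y1 it computes the odds that Y2 is missing rather than observed. If these odds differ across levels of Y1, the missingness of Y2 depends on Y1, which is incompatible with MCAR. Equal odds are compatible with MCAR but cannot by themselves rule out NMAR.

```python
def nonresponse_odds(observed, supplementary):
    """Odds of Y2 being missing versus observed, per level of Y1.

    observed: list of rows, one per level of Y1, holding counts for
              each level of Y2 (both variables observed)
    supplementary: list of counts, one per level of Y1, for units
                   where Y2 is missing
    """
    return [miss / sum(row) for row, miss in zip(observed, supplementary)]

# Hypothetical counts, for illustration only
observed = [[40, 60],   # Y1 = level 1
            [30, 20]]   # Y1 = level 2
supplementary = [25, 25]

odds = nonresponse_odds(observed, supplementary)
odds_ratio = odds[1] / odds[0]
print("non-response odds per Y1 level:", odds)
print("ratio of odds (1.0 would be compatible with MCAR):", odds_ratio)
```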
Is a Dataframe Just a Table?
Querying data is core to databases and data science. However, the two communities have seemingly different concepts and use cases. As a result, both designers and users of the query languages disagree on whether the core abstractions - dataframes (data science) and tables (databases) - and their operations are the same. To investigate the difference from a PL-HCI perspective, we identify the basic affordances provided by tables and dataframes and how programming experiences over tables and dataframes differ. We show that the data structures nudge programmers to query and store their data in different ways. We hope the case study will clarify confusion, dispel misinformation, increase cross-pollination between the two communities, and identify open PL-HCI questions.
Simultaneous analysis of a sequence of paired ecological tables: A comparison of several methods
A pair of ecological tables is made of one table containing environmental
variables (in columns) and another table containing species data (in columns).
The rows of these two tables are identical and correspond to the sites where
environmental variables and species data have been measured. Such data are used
to analyze the relationships between species and their environment. If sampling
is repeated over time for both tables, one obtains a sequence of pairs of
ecological tables. Analyzing this type of data is a way to assess changes in
species-environment relationships, which can be important for conservation
ecology or for global change studies. We present a new data analysis method
adapted to the study of this type of data, and we compare it with two other
methods on the same data set. All three methods are implemented in the ade4
package for the R environment. Comment: Published at http://dx.doi.org/10.1214/10-AOAS372 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
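The data layout described in this abstract, a sequence of paired tables whose rows are the sampled sites, can be sketched as a small data structure. This is only an illustration of the layout and not the ade4 implementation or any of the compared methods: the class name `PairedTables`, the field names, and all values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PairedTables:
    """One sampling date: two tables that share the same rows (sites)."""
    date: str
    sites: list         # row labels shared by both tables
    environment: dict   # variable name -> list of values, one per site
    species: dict       # species name -> list of values, one per site

    def is_aligned(self):
        # Every column of both tables must have exactly one entry per site
        n = len(self.sites)
        columns = list(self.environment.values()) + list(self.species.values())
        return all(len(col) == n for col in columns)

# Hypothetical sequence of two sampling dates, for illustration only
sequence = [
    PairedTables("spring", ["s1", "s2", "s3"],
                 {"temperature": [8.1, 9.4, 7.7], "pH": [7.2, 6.9, 7.5]},
                 {"species_a": [12, 0, 5], "species_b": [3, 8, 1]}),
    PairedTables("autumn", ["s1", "s2", "s3"],
                 {"temperature": [11.0, 12.3, 10.2], "pH": [7.1, 7.0, 7.4]},
                 {"species_a": [9, 2, 4], "species_b": [5, 6, 0]}),
]
all_aligned = all(pair.is_aligned() for pair in sequence)
print("all pairs aligned:", all_aligned)
```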