    Modelling decision tables from data.

    On most datasets, induction algorithms can generate very accurate classifiers. Sometimes, however, these classifiers are very hard for humans to understand. This paper therefore investigates how the extracted knowledge can be presented to the user by means of decision tables. Decision tables are very easy to understand. Furthermore, they provide useful facilities for checking the extracted knowledge for consistency and completeness. The paper demonstrates how a consistent and complete decision table (DT) can be modelled starting from raw data. The proposed method is empirically validated on several benchmark datasets. It is shown that the modelled decision tables are sufficiently small, which allows easy consultation of the represented knowledge.
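    The consistency and completeness checks mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's method; it only assumes the usual definitions (a table is consistent if no condition combination maps to two different actions, and complete if every combination is covered). The function name `check_decision_table` and the toy conditions and rules are hypothetical.

```python
from itertools import product


def check_decision_table(conditions, rules):
    """Return (conflicts, missing) for a simple decision table.

    conditions: dict mapping each condition name to its possible values
    rules: list of (condition_values_tuple, action) pairs
    """
    seen = {}
    conflicts = []
    for combo, action in rules:
        # Inconsistent: same condition combination, different action.
        if combo in seen and seen[combo] != action:
            conflicts.append(combo)
        seen.setdefault(combo, action)
    # Incomplete: a possible combination that no rule covers.
    missing = [c for c in product(*conditions.values()) if c not in seen]
    return conflicts, missing


# Hypothetical toy table, for illustration only.
conditions = {"income": ["low", "high"], "owns_home": [False, True]}
rules = [
    (("low", False), "reject"),
    (("low", True), "review"),
    (("high", False), "accept"),
    (("low", True), "accept"),  # contradicts the second rule
]

conflicts, missing = check_decision_table(conditions, rules)
print("conflicting combinations:", conflicts)
print("uncovered combinations:", missing)
```

    In this toy table the combination ("low", True) is assigned two different actions and ("high", True) is not covered by any rule, so the table is neither consistent nor complete.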

    Data Tables for Lorentz and CPT Violation

    This work tabulates measured and derived values of coefficients for Lorentz and CPT violation in the Standard-Model Extension. Summary tables are extracted listing maximal attained sensitivities in the matter, photon, neutrino, and gravity sectors. Tables presenting definitions and properties are also compiled. Comment: 122 pages, 2020 edition

    Evaluation of missing data mechanisms in two and three dimensional incomplete tables

    The analysis of incomplete contingency tables is a practical and interesting problem. In this paper, we characterize the various missing-data mechanisms of a variable in terms of response and non-response odds for two- and three-dimensional incomplete tables. The log-linear parametrization and some distinctive properties of the missing data models for these tables are discussed. All possible cases in which data on one, two or all variables may be missing are considered. We study the missingness of each variable in a model, which is more insightful for analyzing cross-classified data than the missingness of the outcome vector. For sensitivity analysis of the incomplete tables, we propose easily verifiable procedures to evaluate the missing at random (MAR), missing completely at random (MCAR) and not missing at random (NMAR) assumptions of the missing data models. These methods depend only on joint and marginal odds computed from fully and partially observed counts in the tables, respectively. Finally, some real-life datasets are analyzed to illustrate our results, which are confirmed by simulation studies.

    Is a Dataframe Just a Table?

    Querying data is core to databases and data science. However, the two communities have seemingly different concepts and use cases. As a result, both designers and users of the query languages disagree on whether the core abstractions, dataframes (data science) and tables (databases), and their operations are the same. To investigate the difference from a PL-HCI perspective, we identify the basic affordances provided by tables and dataframes and how programming experiences over them differ. We show that the data structures nudge programmers to query and store their data in different ways. We hope the case study can clarify confusion, dispel misinformation, increase cross-pollination between the two communities, and identify open PL-HCI questions.

    Simultaneous analysis of a sequence of paired ecological tables: A comparison of several methods

    A pair of ecological tables consists of one table containing environmental variables (in columns) and another table containing species data (in columns). The rows of the two tables are identical and correspond to the sites where the environmental variables and species data were measured. Such data are used to analyze the relationships between species and their environment. If sampling is repeated over time for both tables, one obtains a sequence of pairs of ecological tables. Analyzing this type of data is a way to assess changes in species-environment relationships, which can be important for conservation ecology or for global change studies. We present a new data analysis method adapted to this type of data, and we compare it with two other methods on the same data set. All three methods are implemented in the ade4 package for the R environment. Comment: Published at http://dx.doi.org/10.1214/10-AOAS372 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org