
    Methods for Classifying Nonprofit Organizations According to their Field of Activity: A Report on Semi-automated Methods Based on Text

    There are various methods for classifying nonprofit organizations (NPOs) according to their field of activity. We report our experiences using two semi-automated methods based on textual data: rule-based classification and machine learning with curated keywords. We use these methods to classify Austrian nonprofit organizations based on the International Classification of Nonprofit Organizations. These methods can address a widespread research problem: quantitative data on the activities of NPOs are needed but not readily available from administrative data, long high-quality texts describing NPOs' activities are mostly unavailable, and human labor resources are limited. We find that in such a setting, rule-based classification performs about as well as manual human coding in terms of precision and sensitivity, while being much more labor-saving. Hence, we share our insights on how to efficiently implement such a rule-based approach. To address scholars with a background in data analytics as well as those without, we provide non-technical explanations and open-source sample code that is free to use and adapt.
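As a rough illustration of what keyword-driven, rule-based classification of short organization texts can look like, here is a minimal sketch. It is not the authors' published sample code: the category names, the keywords in `RULES`, and the helper names `classify` and `precision_and_sensitivity` are all hypothetical and chosen only for the example.

```python
import re

# Hypothetical rules: category -> keywords. Both the categories and the
# keywords are illustrative assumptions, not taken from the paper.
RULES = {
    "culture_and_recreation": ["choir", "theatre", "sports club", "museum"],
    "education_and_research": ["school", "university", "research"],
    "health": ["hospital", "nursing", "ambulance"],
}


def classify(text, rules=RULES):
    """Return the sorted list of categories with at least one keyword in text."""
    lowered = text.lower()
    matches = []
    for category, keywords in rules.items():
        for keyword in keywords:
            # Whole-word match so that "school" does not fire on "preschooler".
            if re.search(r"\b" + re.escape(keyword) + r"\b", lowered):
                matches.append(category)
                break
    return sorted(matches)


def precision_and_sensitivity(predicted, actual, category):
    """Compare predicted and manually coded label sets for one category.

    predicted and actual are equally long lists of sets of category names.
    Returns (precision, sensitivity); a ratio with a zero denominator is 0.0.
    """
    tp = fp = fn = 0
    for pred, gold in zip(predicted, actual):
        if category in pred and category in gold:
            tp += 1
        elif category in pred:
            fp += 1
        elif category in gold:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return precision, sensitivity
```

For example, `classify("Volunteer ambulance service and nursing school")` matches both the health and the education rules, which shows why an organization may need more than one label or a tie-breaking rule.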

    Iteratively Learning Embeddings and Rules for Knowledge Graph Reasoning

    Reasoning is essential for the development of large knowledge graphs, especially for completion, which aims to infer new triples based on existing ones. Both rules and embeddings can be used for knowledge graph reasoning, and each has its own advantages and difficulties. Rule-based reasoning is accurate and explainable, but rule learning by searching over the graph suffers from inefficiency because of the huge search space. Embedding-based reasoning is more scalable and efficient because the reasoning is conducted via computation between embeddings, but it has difficulty learning good representations for sparse entities because a good embedding relies heavily on data richness. Based on this observation, in this paper we explore how embedding and rule learning can be combined so that the advantages of each compensate for the difficulties of the other. We propose a novel framework, IterE, that iteratively learns embeddings and rules: rules are learned from embeddings with a proper pruning strategy, and embeddings are learned from existing triples and new triples inferred by rules. Evaluations of the embedding quality of IterE show that rules help improve the quality of sparse entity embeddings and their link prediction results. We also evaluate the efficiency of rule learning and the quality of rules from IterE compared with AMIE+, showing that IterE is capable of generating high-quality rules more efficiently. Experiments show that embeddings and rules benefit each other when learned iteratively, during both learning and prediction. Comment: This paper is accepted by WWW'1

    Influence of observations on the misclassification probability in quadratic discriminant analysis.

    This paper analyzes how observations in the training sample affect the misclassification probability of a quadratic discriminant rule. An approach based on partial influence functions is followed, which makes it possible to quantify the effect of observations in the training sample on the quality of the associated classification rule. The focus is on the effect on the future misclassification rate rather than on the influence on the parameters of the quadratic discriminant rule. The expression for the influence function is then used to construct a diagnostic tool for detecting influential observations. Applications to real data sets are provided.
    Keywords: Applications; Classification; Data; Diagnostics; Discriminant analysis; Functions; Influence function; Misclassification probability; Outliers; Partial influence functions; Probability; Quadratic discriminant analysis; Quality; Robust covariance estimation; Robust regression; Training

    Mining for Useful Association Rules Using the ATMS

    Association rule mining has achieved much in the area of knowledge discovery in databases. In recent years, the quality of the extracted association rules has drawn more and more attention from researchers in the data mining community. One big concern is the size of the extracted rule set. Very often, tens of thousands of association rules are extracted, many of which are redundant and thus useless. In this paper, we first analyze the redundancy problem in association rules and then propose a novel ATMS-based method for extracting non-redundant association rules.
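To make the redundancy problem concrete, the sketch below mines rules by brute force and then drops those that add nothing over a simpler rule. It does not implement the ATMS-based method named in the abstract. The redundancy criterion used here is a simple stand-in assumed for illustration (a rule is dropped when another rule with the same consequent, a strictly smaller antecedent, and at least the same confidence exists), and the sample data and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy transactions, used only for illustration.
TRANSACTIONS = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b", "c"}),
    frozenset({"a", "c"}),
    frozenset({"b"}),
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / base


def mine_rules(transactions, min_support, min_confidence):
    """Brute-force enumeration of rules with a single-item consequent."""
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            if support(itemset, transactions) < min_support:
                continue
            for consequent in itemset:
                antecedent = frozenset(itemset) - {consequent}
                conf = confidence(antecedent, {consequent}, transactions)
                if conf >= min_confidence:
                    rules.append((antecedent, consequent, conf))
    return rules


def drop_redundant(rules):
    """Keep a rule unless a rule with the same consequent, a strictly smaller
    antecedent, and at least the same confidence exists (assumed criterion)."""
    kept = []
    for ante, cons, conf in rules:
        dominated = any(
            o_cons == cons and o_ante < ante and o_conf >= conf
            for o_ante, o_cons, o_conf in rules
        )
        if not dominated:
            kept.append((ante, cons, conf))
    return kept
```

On the toy data, `mine_rules(TRANSACTIONS, 0.5, 0.9)` yields four rules, and `drop_redundant` keeps only the two with single-item antecedents. The brute-force enumeration is exponential in the number of items and is meant only to show the definitions, not to scale.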