
    Patterns for Programming in the Semantic Web

    Originally proposed in the mid-90s, design patterns for software development played a key role in object-oriented programming, not only by increasing software quality but also by giving a better understanding of the power and limitations of this paradigm. Since then, several authors have undertaken a similar task for other programming paradigms, in the hope of achieving similar benefits. In this paper we discuss design patterns for the Semantic Web, giving new insights into how existing programming frameworks can be used in a systematic way to design large-scale systems. The common denominator of these frameworks is the combination of different reasoning systems, namely description logics and logic programming. We therefore chose to work in a generalization of dl-programs that supports several (possibly different) description logics, expecting that our results will be easily adapted to other existing frameworks such as multi-context systems. This study also suggests new constructs to enforce the legibility and internal structure of logic-based Semantic Web programs.
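    Since the paper builds on dl-programs, the following minimal Python sketch (our own, purely illustrative; all names are hypothetical and not from the paper) shows the core idea: a logic-program rule whose body contains a dl-atom querying a small description-logic ontology.

        # Toy ontology: TBox (subclass axioms) and ABox (class assertions).
        subclass = {("Student", "Person"), ("Professor", "Person")}
        assertions = {("alice", "Student"), ("bob", "Professor")}

        def subsumed_by(c, d):
            """True if concept c is subsumed by concept d (reflexive-transitive closure)."""
            return c == d or any(
                subsumed_by(sup, d) for sub, sup in subclass if sub == c)

        def dl_query(concept):
            """dl-atom DL[; concept](X): individuals the ontology places in concept."""
            return {ind for ind, c in assertions if subsumed_by(c, concept)}

        # Rule combining both reasoners: eligible(X) :- DL[; Person](X), registered(X).
        registered = {"alice"}
        eligible = dl_query("Person") & registered
        print(eligible)  # {'alice'}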

    The ghosts of forgotten things: A study on size after forgetting

    Forgetting is the removal of variables from a logical formula while preserving the constraints on the other variables. In spite of being a form of reduction, it does not always decrease the size of the formula and may sometimes increase it. This article discusses the implications of such an increase and analyzes the computational properties of the phenomenon. Given a propositional Horn formula, a set of variables and a maximum allowed size, deciding whether the result of forgetting the variables from the formula can be expressed within that size is $D^p$-hard and in $\Sigma^p_2$. The same problem for unrestricted propositional formulae is $D^p_2$-hard and in $\Sigma^p_3$. The hardness results employ superredundancy: a superirredundant clause belongs to all minimal-size formulae equivalent to a given one. This concept may be useful beyond forgetting.
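    The possible size increase is easy to reproduce. The sketch below (ours, using sympy rather than anything from the article) forgets x via Shannon expansion, i.e. f[x=1] | f[x=0], and compares CNF clause counts before and after.

        from sympy import symbols, Or, And, Not, to_cnf

        a1, a2, a3, b1, b2, b3, x = symbols('a1 a2 a3 b1 b2 b3 x')

        # CNF with 6 binary clauses, all mentioning x.
        f = And(Or(Not(x), a1), Or(Not(x), a2), Or(Not(x), a3),
                Or(x, b1), Or(x, b2), Or(x, b3))

        # Forget x: disjunction of the two cofactors of f on x.
        forgotten = Or(f.subs(x, True), f.subs(x, False))
        cnf = to_cnf(forgotten)  # (a1 | b1) & (a1 | b2) & ... & (a3 | b3)
        print(len(f.args), "clauses before,", len(cnf.args), "clauses after")  # 6 -> 9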

    Feature Ranking for Text Classifiers

    Feature selection based on feature ranking has received much attention from researchers in the field of text classification, chiefly because of its scalability, ease of use, and fast computation. However, compared to search-based feature selection methods such as wrappers and filters, ranking methods suffer from poor performance. This is linked to their major deficiencies: (i) feature ranking is problem-dependent; (ii) they ignore term dependencies, including redundancy and correlation; and (iii) they usually fail on unbalanced data. When using feature ranking methods for dimensionality reduction, we should be aware of these drawbacks, which arise from how ranking measures are defined. In this thesis, a set of solutions is proposed to handle the drawbacks of feature ranking and boost its performance.

    First, an evaluation framework called feature meta-ranking is proposed to evaluate ranking measures. The framework is based on a newly proposed Differential Filter Level Performance (DFLP) measure. It is proved that, in ideal cases, the performance of a text classifier is a monotonic, non-decreasing function of the number of features. We then validate, theoretically and empirically, the effectiveness of DFLP as a meta-ranking measure for evaluating and comparing feature ranking methods. The meta-ranking framework is also examined on a stopword extraction problem, where it is used to select an appropriate feature ranking measure for building domain-specific stoplists. The proposed framework is evaluated with SVM and Rocchio text classifiers on six benchmark data sets. The meta-ranking method suggests that, in searching for a proper feature ranking measure, backward feature ranking is as important as forward ranking.

    Second, we show that the destructive effect of term redundancy worsens as the feature ranking threshold decreases, implying that aggressive feature selection requires effective redundancy reduction in addition to feature ranking. An algorithm based on extracting term dependency links using an information-theoretic inclusion index is proposed to detect and handle term dependencies. The dependency links are visualized by a tree structure called a term dependency tree. By grouping the nodes of the tree into two categories, hub and link nodes, a heuristic algorithm is proposed to handle the term dependencies by merging or removing link nodes. The proposed redundancy reduction method is evaluated with SVM and Rocchio classifiers on four benchmark data sets. According to the results, redundancy reduction is more effective on weak classifiers, since they are more sensitive to term redundancy. The results also suggest that aggressive feature selection is not recommended for feature ranking methods that compact the information into a small number of features.

    Finally, to deal with class imbalance at the feature level using ranking methods, a local feature ranking scheme called the reverse discrimination approach is proposed. The method is applied to a highly unbalanced social network discovery problem, in which learning a social network is translated into a text classification problem using newly proposed actor and relationship modeling. Since social networks are usually sparse structures, the corresponding text classifiers become highly unbalanced. Experimental assessment of the reverse discrimination approach validates the effectiveness of the local feature ranking method in improving classifier performance on unbalanced data. The application itself suggests a new approach to learning social structures from textual data.
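    For context, the sketch below shows a standard chi-square feature-ranking baseline in scikit-learn. It is our own illustration of ranking-based selection, not the thesis's DFLP or reverse discrimination methods, and the corpus and parameter choices (categories, k=500) are arbitrary.

        from sklearn.datasets import fetch_20newsgroups
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.feature_selection import SelectKBest, chi2
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import LinearSVC

        cats = ['sci.med', 'sci.space']
        train = fetch_20newsgroups(subset='train', categories=cats)
        test = fetch_20newsgroups(subset='test', categories=cats)

        # Rank terms by chi-square and keep only the top k; small k corresponds
        # to the aggressive selection regime where redundancy matters most.
        clf = make_pipeline(TfidfVectorizer(), SelectKBest(chi2, k=500), LinearSVC())
        clf.fit(train.data, train.target)
        print("test accuracy:", clf.score(test.data, test.target))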