
    Named Entity Disambiguation using Hierarchical Text Categorization

    Named entity extraction is an important step in natural language processing. It aims at finding the entities present in a text, such as organizations, places, or persons. Named entity extraction is of paramount importance for automatic translation, as different named entities are translated differently. Named entities are also very useful for advanced search engines that look for detailed information about a specific entity. Named entity extraction is a difficult problem because it usually requires a disambiguation step: the same word may belong to different named entity categories depending on the context. This work was conducted on the ANERCorp named entity database. This Arabic database contains four categories of named entities: person, organization, location, and miscellaneous. It contains 6099 sentences, of which 60% are used for training, 20% for validation, and 20% for testing. Our method for named entity extraction has two main steps: the first predicts the list of named entities present at the sentence level; the second predicts the named entity of each word in the sentence. To predict the list of named entities at the sentence level, the document is first separated into sentences using punctuation marks. Subsequently, a binary relation between the set of sentences (x) and the set of words (y) is created from the obtained list of sentences: a relation exists between a sentence (x) and a word (y) if, and only if, (x) contains (y). A binary relation is created for each category of named entities (person, organization, location, and miscellaneous). If a sentence contains several named entities, it is duplicated in the relation corresponding to each of them. Our method then extracts keywords from the obtained binary relations using the hyper concept method [1]. 
This method decomposes the original relation into non-overlapping rectangles and highlights, for each rectangle, the most representative keyword. The output is a list of keywords sorted in a hierarchical order of importance. The keyword lists associated with each category of named entities are fed into a random forest classifier of 10,000 trees in order to predict the list of named entities associated with each sentence. For each sentence, the classifier produces the probability that each category of named entities occurs within it: Random Forest[sentence(i)] = (P(Person), P(Organization), P(Location), P(Miscellaneous)). The sentence is then associated with the named entities whose probability exceeds a threshold set empirically on the validation set. In the second step, we build a lookup table associating each word in the database with the list of named entities to which it corresponds in the training set. For unseen sentences of the test set, the list of named entities predicted at the sentence level is produced and, for each word, the list of predicted named entities is also produced using the previously built lookup table. Ultimately, for each word, the intersection of the two predicted lists of named entities (at the sentence and the word level) gives the final predicted named entity. If more than one named entity remains at this stage, the one with the maximum probability is kept. We obtained an accuracy of 76.58% when considering only the lookup tables of named entities produced at the word level. When intersecting with the list produced at the sentence level, the accuracy reaches 77.96%. In conclusion, hierarchical named entity extraction leads to improved results over direct extraction. Future work includes the use of other linguistic features and larger lookup tables in order to improve the results. 
Validation on other state-of-the-art databases is also considered. Acknowledgements: This contribution was made possible by NPRP grant #06-1220-1-233 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.
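The two-step combination described above (sentence-level thresholding, word-level lookup, then intersection) can be sketched as follows. This is a minimal illustration with hypothetical words and probabilities, not the paper's actual implementation: in practice the sentence-level probabilities come from the random forest trained on the hyper-concept keywords.

```python
# Sketch of the two-step prediction logic (hypothetical data; in the paper
# the sentence-level probabilities come from the random forest classifier).
ENTITY_TYPES = ["person", "organization", "location", "miscellaneous"]

# Step 1 output: per-sentence probabilities for each entity category.
sentence_probs = {"person": 0.85, "organization": 0.10,
                  "location": 0.62, "miscellaneous": 0.05}
THRESHOLD = 0.5  # set empirically on the validation set in the paper
sentence_entities = {e for e, p in sentence_probs.items() if p >= THRESHOLD}

# Step 2: lookup table built from the training set, word -> entity types seen.
lookup = {"doha": {"location"}, "ahmed": {"person"},
          "qatar": {"location", "organization"}}

def predict_word(word):
    word_entities = lookup.get(word.lower(), set())
    # Intersect word-level and sentence-level predictions.
    both = word_entities & sentence_entities
    candidates = both or word_entities
    if not candidates:
        return "O"  # not a named entity
    # If several entities remain, keep the one with maximum probability.
    return max(candidates, key=lambda e: sentence_probs[e])

print([predict_word(w) for w in ["Ahmed", "visited", "Doha"]])
# ['person', 'O', 'location']
```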

    Inconsistency detection in Islamic advisory opinions using multilevel text categorization

    Inconsistency detection is a large research area with many applications. Within Islamic content mining, this topic is of particular interest because of the continuously increasing content and people's need to assess its authenticity. Inconsistency detection is usually performed using linguistic analysis together with the application of logic rules. We propose here a new method for inconsistency detection based on multilevel text categorization. For each categorization level, discriminative keywords are extracted using the hyper rectangular decomposition method, which outputs the keywords in a hierarchical rank of importance. These keywords are then fed into a random forest classifier which automatically detects the category of each advisory opinion. Inconsistency detection is performed using an algorithm that detects inconsistent paths of advisory opinions. This study has been validated on a set of Islamic advisory opinions related to vows. The results show that our method is very promising in this field. This contribution was made possible by NPRP grant 06-1220-1-233 from the Qatar National Research Fund (a member of Qatar Foundation).
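The abstract does not spell out the path-inconsistency algorithm, so the following is only an illustrative sketch under an assumed reading: each opinion is categorized into a path of categories (one per level) plus a ruling, and opinions whose category paths coincide but whose rulings conflict are flagged as inconsistent. All names and data are hypothetical.

```python
# Illustrative sketch only: assumes each advisory opinion carries a path of
# multilevel categories and a ruling; opinions with the same path but
# different rulings are flagged as an inconsistent pair.
opinions = [
    {"id": 1, "path": ("vows", "conditional"), "ruling": "binding"},
    {"id": 2, "path": ("vows", "conditional"), "ruling": "not binding"},
    {"id": 3, "path": ("vows", "unconditional"), "ruling": "binding"},
]

def inconsistent_pairs(opinions):
    by_path = {}
    for op in opinions:
        by_path.setdefault(op["path"], []).append(op)
    pairs = []
    for same_path in by_path.values():
        for i, a in enumerate(same_path):
            for b in same_path[i + 1:]:
                if a["ruling"] != b["ruling"]:
                    pairs.append((a["id"], b["id"]))
    return pairs

print(inconsistent_pairs(opinions))  # [(1, 2)]
```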

    Using conceptual reasoning for inconsistencies detection in islamic advisory opinion (Fatwas)

    Islamic websites play an important role in disseminating Islamic knowledge and information about Islamic rulings. Their number and the content they provide are continuously increasing, which requires in-depth investigation into automating content evaluation. In this paper, we propose the use of conceptual reasoning for detecting inconsistencies in the evaluation of fatwas. Inconsistencies are detected from a propositional logic point of view, based on the Truth Table Binary Relation.
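A Truth Table Binary Relation, as described here and in the ConProve abstract below, relates every interpretation of a formula's variables to the terms true under it. A minimal sketch (the variable names and example formula are illustrative):

```python
# Minimal sketch of a Truth Table Binary Relation (TTBR): a formal context
# whose objects are all interpretations (truth assignments) and whose
# properties are the terms; an interpretation relates to a term iff the
# term is true under it. Each row is also tagged with whether the formula
# itself holds under that interpretation.
from itertools import product

def ttbr(variables, formula):
    rows = []
    for values in product([False, True], repeat=len(variables)):
        interp = dict(zip(variables, values))
        true_terms = {v for v in variables if interp[v]}
        rows.append((true_terms, formula(interp)))
    return rows

# Example: the implication p -> q.
for true_terms, holds in ttbr(["p", "q"], lambda i: (not i["p"]) or i["q"]):
    print(sorted(true_terms), holds)
```

Restricting the rows to those where the formula holds gives the formula's models, which is the representation the join-based consistency checks below operate on.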

    Inconsistencies Detection In Islamic Texts Of Law Interpretations ["fatawas"]

    Islamic web content offers a very convenient way for people to learn more about the Islamic religion and its correct practices. For instance, via these websites they can ask for fatwas (Islamic advisory opinions) with more ease and serenity. Given the sensitivity of the subject, large communities of researchers are working on the evaluation of these websites according to several criteria. In particular, there is a considerable effort to check the consistency of the content with respect to the Islamic shariaa (Islamic law). In this work we propose a semiautomatic approach for evaluating Islamic web content, in terms of inconsistency detection, composed of the following steps. (i) Domain selection and definition: identifying the most relevant named entities related to the selected domain, as well as their corresponding values or keywords (NEV). At this stage, we started building the fatwa ontology by analyzing around 100 fatwas extracted from the online system. (ii) Formal representation of the Islamic content: representing the content as a formal context relating fatwas to NEV. Here, each named entity is split into different attributes in the database, where each attribute is associated with a possible instantiation of the named entity. (iii) Rules extraction: by applying the ConImp tool, we extract a set of implications (or rules) reflecting cause-effect relations between NEV. As an extended option aiming at a more precise analysis, we proposed the inclusion of negative attributes: for example, with the word "licit" we may associate "not licit" or "forbidden", and with the word "recommended" we may associate "not recommended". At this stage, using an extension of the Galois connection, we are able to find the different logical associations in a minimal way with the same tool, ConImp. (iv) Conceptual reasoning: the objective is to detect possible inconsistencies between the rules and evaluate their relevance. 
Each rule is mapped to a binary table in a relational database model. By joining the obtained tables, we are able to detect inconsistencies. We may also check that a new law does not contradict an existing set of laws by mapping the law into a logical expression: by creating a new table corresponding to its negation, we can prove its consistency automatically as soon as the join of the total set of tables is empty. This preliminary study showed that the logical representation of fatwas gives promising results in detecting inconsistencies within the fatwa ontology. Future work includes automatic named entity extraction and automatic transformation of laws into a formatted database, with which we should be able to build a global inconsistency detection system for the domain.
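The join-based check can be sketched as follows: each rule becomes a table of its satisfying truth assignments, a natural join keeps the rows that agree on shared variables, and an empty join of the rules with a new law's negation shows the law follows from the rules. The rule names are illustrative, not taken from the fatwa ontology.

```python
# Sketch: rules as tables of satisfying assignments, natural join to combine
# them, and consistency of a new law checked via the join with its negation.
from itertools import product

def table(variables, formula):
    """All satisfying assignments of `formula` over `variables`."""
    return [dict(zip(variables, vals))
            for vals in product([False, True], repeat=len(variables))
            if formula(dict(zip(variables, vals)))]

def join(t1, t2):
    """Natural join: merge rows that agree on every shared variable."""
    out = []
    for r1 in t1:
        for r2 in t2:
            if all(r1[v] == r2[v] for v in r1.keys() & r2.keys()):
                out.append({**r1, **r2})
    return out

# Illustrative rules: vow_made -> binding, binding -> must_fulfil.
r1 = table(["vow_made", "binding"], lambda a: not a["vow_made"] or a["binding"])
r2 = table(["binding", "must_fulfil"],
           lambda a: not a["binding"] or a["must_fulfil"])

# New law: vow_made -> must_fulfil.  Join its NEGATION with the rules:
neg = table(["vow_made", "must_fulfil"],
            lambda a: a["vow_made"] and not a["must_fulfil"])
print(join(join(r1, r2), neg))  # [] : empty join, so the law is entailed
```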

    ConProve: A conceptual prover system

    ConProve is an automated prover for propositional logic. It takes as input a set of propositional formulas and proves whether a goal holds or not. ConProve converts each formula to its corresponding Truth Table Binary Relation (TTBR), considered also as a formal context (FC). The objects in the FC correspond to all possible interpretations of the formulas (in terms of their truth value assignments), and the properties in the FC correspond to the terms. Once the 'BuildContext' function has run, ConProve starts proving the goal. First, it adds the goal's negation to the set of formulas and constructs the formal contexts (FCs) relating formulas to terms. Second, it groups the FCs and deduces, based on conceptual reasoning, whether the goal holds. The tool offers a user-friendly interface allowing the editing of the set of formulas as well as the visualization of the reasoning steps. Besides the tool, the paper illustrates the importance of conceptual reasoning in deriving new conclusions as well as in discovering new possible implications by applying the extended Galois connection. Qatar National Research Fund NPRP 04-1109-1-174.

    Inference engine based on closure and join operators over Truth Table Binary Relations

    We propose a conceptual reasoning method for an inference engine. Starting from a knowledge base made of decision rules, we first map each rule to its corresponding Truth Table Binary Relation (TTBR), considered as a formal context. Objects in the domain of the TTBR correspond to all possible rule interpretations (in terms of their truth value assignments), and elements in the range of the TTBR correspond to the attributes. By using the 'natural join' operator in the 'ContextCombine' algorithm, we combine all truth tables into a global relation, which has the advantage of containing the complete knowledge of all deducible rules. By conceptual reasoning with closure operators, from the initial rules we obtain all possible conclusions with respect to the global relation. We may then check whether expected goals are among these possible conclusions. We also provide an approximate solution to the exponential growth of the global relation by proposing modular and cooperative conceptual reasoning. We finally present experimental results for two case studies and discuss the effectiveness of our approach. Qatar National Research Fund NPRP 04-1109-1-174.
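The deduction step over the combined relation can be sketched as follows: after joining all rule tables into a global relation, fixing some premises and intersecting the surviving rows yields every attribute the rules force, i.e. the deducible conclusions. A toy sketch with illustrative rules (not the paper's case studies):

```python
# Sketch: build the global relation (interpretations satisfying every rule),
# then, given premises, read off the attributes true in all surviving rows.
from itertools import product

VARS = ["a", "b", "c"]
RULES = [lambda m: not m["a"] or m["b"],   # a -> b
         lambda m: not m["b"] or m["c"]]   # b -> c

# Global relation: interpretations satisfying every rule.
global_rel = [dict(zip(VARS, vals))
              for vals in product([False, True], repeat=len(VARS))
              if all(rule(dict(zip(VARS, vals))) for rule in RULES)]

def conclusions(premises):
    """Attributes true in every model of the rules that satisfies `premises`."""
    rows = [m for m in global_rel
            if all(m[v] == val for v, val in premises.items())]
    return {v for v in VARS if rows and all(m[v] for m in rows)}

print(sorted(conclusions({"a": True})))  # ['a', 'b', 'c']: from a we get b, c
```

Note that the global relation has one row per interpretation of all variables, which is the exponential growth the paper's modular, cooperative reasoning is designed to mitigate.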