    Extracting fine-grained economic events from business news

    Based on a recently developed fine-grained event extraction dataset for the economic domain, we present a pilot study on supervised economic event extraction. We investigate how a state-of-the-art model for event extraction performs on trigger and argument identification and classification. While F1-scores above 50% are obtained on the task of trigger identification, we observe a large gap in performance compared to results on the benchmark ACE05 dataset. We show that single-token triggers do not provide sufficient discriminative information for a fine-grained event detection setup in a closed domain such as economics, since many classes have a large degree of lexico-semantic and contextual overlap.
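
    As a rough illustration of the evaluation described above, the sketch below computes span-level precision, recall, and F1 for trigger identification under an exact-match criterion. The offsets, the matching rule, and the data layout are assumptions made for illustration, not the paper's actual scoring setup.

    # Minimal sketch of span-level F1 for event trigger identification,
    # assuming exact-match scoring of (start, end) token offsets; the paper's
    # actual matching criteria and data format may differ.

    def trigger_f1(gold_spans, pred_spans):
        """Compute precision, recall, and F1 over sets of trigger spans."""
        gold, pred = set(gold_spans), set(pred_spans)
        tp = len(gold & pred)                      # correctly identified triggers
        precision = tp / len(pred) if pred else 0.0
        recall = tp / len(gold) if gold else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1

    if __name__ == "__main__":
        gold = [(3, 4), (10, 11), (17, 18)]        # gold trigger token offsets
        pred = [(3, 4), (17, 18), (20, 21)]        # predicted trigger offsets
        print(trigger_f1(gold, pred))              # (0.666..., 0.666..., 0.666...)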

    Enhanced ontology-based text classification algorithm for structurally organized documents

    Text classification (TC) is an important foundation of information retrieval and text mining. The main task of TC is to predict a text's class according to the tags given in advance. Most TC algorithms represent a document by its terms without considering the relations among them: every word is treated as a dimension of the feature space. Such representations produce high dimensionality, which has a negative effect on classification performance. The objectives of this thesis are to formulate algorithms for classifying text by creating suitable feature vectors and reducing the dimensionality of the data, thereby enhancing classification accuracy. This research combines ontology and text representation for classification by developing five algorithms. The first and second algorithms, Concept Feature Vector (CFV) and Structure Feature Vector (SFV), create feature vectors to represent the document. The third algorithm, Ontology Based Text Classification (OBTC), is designed to reduce the dimensionality of training sets. The fourth and fifth algorithms, Concept Feature Vector_Text Classification (CFV_TC) and Structure Feature Vector_Text Classification (SFV_TC), classify the document into its related set of classes. These proposed algorithms were tested on five different scientific-paper datasets downloaded from different digital libraries and repositories. Experimental results obtained from the proposed algorithms CFV_TC and SFV_TC show better average results in terms of precision, recall, F-measure, and accuracy compared against the SVM and RSS approaches. The work in this study contributes to exploring related documents in information retrieval and text mining research by using ontology in TC.
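
    The abstract does not spell out how the CFV is built, so the following toy sketch only illustrates the general idea of concept-level features: term counts are folded into ontology concepts, shrinking the feature space from the vocabulary size to the number of concepts. The term-to-concept lexicon and the concept names are hypothetical, not taken from the thesis.

    # Illustrative sketch of a concept-feature-vector-style representation:
    # terms are mapped to ontology concepts and counts are aggregated per
    # concept, so the feature space shrinks from |vocabulary| to |concepts|.
    from collections import Counter

    TERM_TO_CONCEPT = {          # hypothetical fragment of an ontology lexicon
        "neuron": "NeuralNetwork", "perceptron": "NeuralNetwork",
        "recall": "EvaluationMetric", "precision": "EvaluationMetric",
    }

    def concept_feature_vector(tokens):
        """Aggregate term counts into concept-level counts."""
        counts = Counter(TERM_TO_CONCEPT[t] for t in tokens if t in TERM_TO_CONCEPT)
        return dict(counts)

    doc = "the perceptron improved precision and recall over a single neuron".split()
    print(concept_feature_vector(doc))
    # {'NeuralNetwork': 2, 'EvaluationMetric': 2}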

    Knowledge representation and text mining in biomedical, healthcare, and political domains

    Knowledge representation and text mining can be employed to discover new knowledge and develop services by using the massive amounts of text gathered by modern information systems. The applied methods should take into account the domain-specific nature of knowledge. This thesis explores knowledge representation and text mining in three application domains. Biomolecular events can be described very precisely and concisely with appropriate representation schemes. Protein–protein interactions are commonly modelled in biological databases as binary relationships, whereas the complex relationships used in text mining are rich in information. The experimental results of this thesis show that complex relationships can be reduced to binary relationships and that it is possible to reconstruct complex relationships from mixtures of linguistically similar relationships. This encourages the extraction of complex relationships from the scientific literature even if binary relationships are required by the application at hand. The experimental results on cross-validation schemes for pair-input data help to understand how existing knowledge regarding dependent instances (such as those concerning protein–protein pairs) can be leveraged to improve the generalisation performance estimates of learned models. Healthcare documents and news articles contain knowledge that is more difficult to model than biomolecular events and tend to have larger vocabularies than biomedical scientific articles. This thesis describes an ontology that models patient education documents and their content in order to improve the availability and quality of such documents. The experimental results of this thesis also show that the Recall-Oriented Understudy for Gisting Evaluation measures are a viable option for the automatic evaluation of textual patient record summarisation methods and that the area under the receiver operating characteristic curve can be used in large-scale sentiment analysis. The sentiment analysis of Reuters news corpora suggests that the Western mainstream media portrays China negatively in politics-related articles but not in general, which provides new evidence to consider in the debate over the image of China in the Western media.
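
    The pair-input cross-validation issue mentioned above can be illustrated with a toy split: pairs whose proteins all appear in the training data tend to be easier than pairs built from unseen proteins, so splitting on proteins rather than on pairs yields a more honest generalisation estimate. The protein identifiers and pairs below are invented for illustration and do not come from the thesis.

    # Minimal sketch of pair-input cross-validation: hold out a set of
    # proteins, test only on pairs built entirely from held-out proteins,
    # and train only on pairs that contain none of them.
    pairs = [("A", "B"), ("A", "C"), ("B", "D"),
             ("C", "E"), ("E", "F"), ("F", "G"), ("H", "I")]

    held_out_proteins = {"E", "F", "G"}            # proteins reserved for testing

    test = [p for p in pairs if set(p) <= held_out_proteins]
    train = [p for p in pairs if not (set(p) & held_out_proteins)]

    print("train:", train)   # pairs with no held-out protein
    print("test: ", test)    # pairs built only from held-out proteins
    # Pairs mixing seen and unseen proteins (e.g. ("C", "E")) fall into neither
    # set here; they form the intermediate "one protein seen" evaluation case.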

    Semantic Frame-based Statistical Approach for Topic Detection


    Ontological approach to development of computing with words based systems

    Computing with words, introduced by Zadeh, has become a very important concept in the processing of knowledge represented in the form of propositions. Two aspects of this concept – approximation and personalization – are essential to the process of building intelligent systems for human-centric computing. For the last several years, the Artificial Intelligence community has used ontology as a means of representing knowledge. Recently, the development of a new Internet paradigm – the Semantic Web – has led to the introduction of another form of ontology. It allows for defining concepts, identifying relationships among these concepts, and representing concrete information. In other words, an ontology has become a very powerful way of representing not only information but also its semantics. The paper proposes an application of ontology, in the sense of the Semantic Web, for the development of computing-with-words based systems capable of performing operations on propositions including their semantics. The ontology-based approach is very flexible and provides a rich environment for expressing different types of information, including perceptions. It also provides a simple way of personalizing propositions. An architecture for a computing-with-words based system is proposed, and a prototype of such a system is described.
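
    As a loose illustration of the ontology-based encoding described above, the toy sketch below writes a personalized proposition as subject-predicate-object triples in the spirit of RDF; a real system would use a Semantic Web toolkit (e.g., rdflib) and a proper OWL vocabulary. All names and membership degrees are assumptions, not taken from the paper.

    # Toy sketch: a computing-with-words proposition ("the temperature is warm"),
    # its linguistic term, a personalized fuzzy definition of that term, and the
    # user it is personalized for, all stored as plain triples.
    triples = [
        ("ex:Proposition1", "ex:hasSubject",      "ex:Temperature"),
        ("ex:Proposition1", "ex:hasValue",        "ex:Warm"),        # linguistic term
        ("ex:Warm",         "ex:definedBy",       "ex:WarmFuzzySet"),
        ("ex:WarmFuzzySet", "ex:membershipAt20C", "0.4"),            # personalized degree
        ("ex:WarmFuzzySet", "ex:membershipAt25C", "0.9"),
        ("ex:Proposition1", "ex:personalizedFor", "ex:UserAlice"),
    ]

    # Querying the tiny triple store: which fuzzy set defines the term used?
    term = next(o for s, p, o in triples
                if s == "ex:Proposition1" and p == "ex:hasValue")
    fuzzy_set = next(o for s, p, o in triples
                     if s == term and p == "ex:definedBy")
    print(term, "is defined by", fuzzy_set)   # ex:Warm is defined by ex:WarmFuzzySet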

    VICA, a visual counseling agent for emotional distress

    We present VICA, a Visual Counseling Agent designed to create an engaging multimedia face-to-face interaction. VICA is a human-friendly agent equipped with high-performance voice conversation, designed to help psychologically stressed users offload their emotional burden. Such users specifically include non-computer-savvy elderly persons or clients. Our agent builds replies by exploiting the interlocutor's utterances expressing wishes, obstacles, emotions, etc. Statements asking for confirmation, details, an emotional summary, or relations among such expressions are added to the utterances. We claim that VICA is suitable for positive counseling scenarios in which multimedia, specifically high-performance voice communication, is instrumental for even elderly or digitally divided users to continue a dialogue towards self-awareness. To support this claim, VICA's effect is evaluated with respect to a previous text-based counseling agent, CRECA, and to ELIZA and its successors. An experiment involving 14 subjects shows the following effects: (i) the dialogue continuation (CPS: Conversation-turns Per Session) of VICA for the older half of the subjects (age > 40) improved substantially, by 53% relative to CRECA and 71% relative to ELIZA; (ii) VICA's capability to foster peace of mind and other positive feelings was assessed by the older subjects at mostly 5 or 6 on a 7-point Likert scale. On average, this capability of VICA for the older subjects is 5.14, while CRECA (all subjects young students, age < 25) scores 4.50, ELIZA 3.50, and the best of ELIZA's successors for the older subjects (> 25) 4.41.
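
    The abstract only outlines how replies are built, so the toy sketch below shows the general pattern it describes: detect whether an utterance expresses a wish, an obstacle, or an emotion, then append a statement asking for confirmation or detail. The cue lists and reply templates are invented for illustration and are not VICA's actual rules.

    # Toy sketch of the reply-building idea described above.
    CUES = {
        "wish":     ["i want", "i wish", "i hope"],
        "obstacle": ["but", "cannot", "can't"],
        "emotion":  ["sad", "anxious", "happy", "stressed"],
    }

    TEMPLATES = {
        "wish":     "So you are hoping for that. Could you tell me more?",
        "obstacle": "Something seems to be getting in the way. What is it exactly?",
        "emotion":  "It sounds like that brings up strong feelings. Is that right?",
    }

    def build_reply(utterance: str) -> str:
        lowered = utterance.lower()
        for kind, cues in CUES.items():
            if any(cue in lowered for cue in cues):
                return TEMPLATES[kind]
        return "I see. Please go on."        # fallback keeps the dialogue moving

    print(build_reply("I want to retire early, but I feel anxious about money."))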

    A tree based keyphrase extraction technique for academic literature

    Automatic keyphrase extraction techniques aim to extract quality keyphrases that summarize a document at a higher level. Among the existing techniques, some are domain-specific and require application-domain knowledge, some are based on higher-order statistical methods and are computationally expensive, and some require large training data, which is rare for many applications. Overcoming these issues, this thesis proposes a new unsupervised automatic keyphrase extraction technique, named TeKET or Tree-based Keyphrase Extraction Technique, which is domain-independent, employs limited statistical knowledge, and requires no training data. The proposed technique also introduces a new variant of the binary tree, called the KeyPhrase Extraction (KePhEx) tree, to extract final keyphrases from candidate keyphrases. Depending on the candidate keyphrases, the KePhEx tree structure is expanded, shrunk, or maintained. In addition, a measure called the Cohesiveness Index (CI) is derived, which denotes the degree of cohesiveness of a given node with respect to the root; it is used to extract final keyphrases from the resultant tree in a flexible manner and to rank keyphrases alongside term frequency. The effectiveness of the proposed technique is evaluated experimentally on a benchmark corpus, SemEval-2010, with a total of 244 training and test articles, and compared with other relevant unsupervised techniques, taking representatives from both statistical (Term Frequency-Inverse Document Frequency and YAKE) and graph-based techniques (PositionRank, CollabRank (SingleRank), TopicRank, and MultipartiteRank) into account. Three evaluation metrics, namely precision, recall, and F1 score, are considered in the experiments. The obtained results demonstrate the improved performance of the proposed technique over other similar techniques in terms of precision, recall, and F1 score.
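
    The exact KePhEx tree operations and the Cohesiveness Index formula are defined in the thesis itself; the sketch below only illustrates a cohesiveness-style score, i.e., how often a candidate word co-occurs with a phrase's root word relative to the root's frequency, combined with term frequency for ranking. The corpus and candidate words are made up for illustration.

    # Illustrative stand-in for a cohesiveness-style keyphrase score.
    from collections import Counter
    from itertools import combinations

    sentences = [
        "keyphrase extraction from academic literature",
        "unsupervised keyphrase extraction technique",
        "tree based extraction of keyphrases",
    ]
    tokens = [s.split() for s in sentences]

    word_freq = Counter(w for toks in tokens for w in toks)
    pair_freq = Counter(frozenset(p) for toks in tokens
                        for p in combinations(set(toks), 2))

    def cohesiveness(word: str, root: str) -> float:
        """Co-occurrence of `word` with `root`, normalized by root frequency."""
        return pair_freq[frozenset((word, root))] / word_freq[root]

    # Rank candidate modifiers of the root "extraction" by cohesiveness * TF.
    root = "extraction"
    candidates = ["keyphrase", "unsupervised", "tree", "academic"]
    ranked = sorted(candidates,
                    key=lambda w: cohesiveness(w, root) * word_freq[w],
                    reverse=True)
    print(ranked)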