939 research outputs found

    Developing conceptual glossaries for the Latin vulgate bible.

    A conceptual glossary is a textual reference work that combines the features of a thesaurus and an index verborum. In it, the word occurrences within a given text are classified, disambiguated, and indexed according to their membership of a set of conceptual (i.e. semantic) fields. Since 1994, we have been working towards building a set of conceptual glossaries for the Latin Vulgate Bible. So far, we have published a conceptual glossary to the Gospel according to John and are at present completing the analysis of the Gospel according to Mark and the minor epistles. This paper describes the background to our project and outlines the steps by which the glossaries are developed within a relational database framework
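    The abstract above does not give the project's actual database design. As a minimal sketch of the idea only, the following indexes word occurrences by conceptual field in a relational store. The table names, column names, and field labels ("Light", "Communication") are assumptions made for illustration, not the authors' schema.

```python
import sqlite3

def build_glossary(occurrences):
    """Load (lemma, reference, field_name) triples into an in-memory database.

    The two-table layout is a hypothetical schema, not the one used by the project.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE field (
            id   INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL
        );
        CREATE TABLE occurrence (
            id        INTEGER PRIMARY KEY,
            lemma     TEXT NOT NULL,
            reference TEXT NOT NULL,
            field_id  INTEGER NOT NULL REFERENCES field(id)
        );
    """)
    for lemma, reference, field_name in occurrences:
        # Create the conceptual field on first use, then link the occurrence to it.
        conn.execute("INSERT OR IGNORE INTO field (name) VALUES (?)", (field_name,))
        (field_id,) = conn.execute(
            "SELECT id FROM field WHERE name = ?", (field_name,)
        ).fetchone()
        conn.execute(
            "INSERT INTO occurrence (lemma, reference, field_id) VALUES (?, ?, ?)",
            (lemma, reference, field_id),
        )
    return conn

def lookup_field(conn, field_name):
    """Return every (lemma, reference) pair classified under one conceptual field."""
    rows = conn.execute(
        "SELECT o.lemma, o.reference "
        "FROM occurrence o JOIN field f ON f.id = o.field_id "
        "WHERE f.name = ? ORDER BY o.id",
        (field_name,),
    )
    return rows.fetchall()
```

    Looking up a field then behaves like the thesaurus side of the glossary, while each returned reference plays the role of an index verborum entry.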

    A Survey on Important Aspects of Information Retrieval

    Information retrieval has become an important field of study and research within computer science due to the explosive growth of information available in the form of full text, hypertext, administrative text, directory, numeric or bibliographic text. Research is ongoing on various aspects of information retrieval systems to improve their efficiency and reliability. This paper presents a comprehensive survey that discusses not only the emergence and evolution of information retrieval but also different information retrieval models and some important aspects such as document representation, similarity measures and query expansion.

    Compound key word generation from document databases using a hierarchical clustering art model

    The growing availability of databases on the information highways motivates the development of new processing tools able to deal with a heterogeneous and changing information environment. A highly desirable feature of data processing systems handling this type of information is the ability to automatically extract their own key words. In this paper we address the specific problem of creating semantic term associations from a text database. The proposed method uses a hierarchical model made up of Fuzzy Adaptive Resonance Theory (ART) neural networks. First, the system uses several Fuzzy ART modules to cluster isolated words into semantic classes, starting from the raw text of the database. Next, this knowledge is used together with co-occurrence information to extract semantically meaningful term associations. These associations are asymmetric and one-to-many due to the phenomenon of polysemy. The strength of the associations between words can be measured numerically. In addition, they implicitly define a hierarchy between descriptors. The underlying algorithm is suitable for use on large databases. The operation of the system is illustrated on several real databases.

    An Improved Similarity Matching based Clustering Framework for Short and Sentence Level Text

    Text clustering plays a key role in navigation and browsing. For efficient text clustering, large amounts of information are grouped into meaningful clusters. Many text clustering techniques do not address issues such as high time and space complexity, inability to capture the relational and contextual attributes of words, low robustness, and risks related to privacy exposure. To address these issues, an efficient text-based clustering framework is proposed. The Reuters dataset is chosen as the input dataset. Once the input dataset is preprocessed, the similarity between words is computed using cosine similarity. The similarities between the components are compared and the vector data is created. From the vector data the clustering particle is computed. To optimize the clustering results, mutation is applied to the vector data. The performance of the proposed text-based clustering framework is analyzed using metrics such as Mean Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR) and processing time. The experimental results show that the proposed framework produced optimal MSE, PSNR and processing time when compared to the existing Fuzzy C-Means (FCM) and Pairwise Random Swap (PRS) methods.
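    The cosine similarity step named in the abstract above can be illustrated with a minimal sketch. It assumes whitespace tokenization and raw term counts, which the abstract does not specify, and the function name is hypothetical. It is not the paper's framework, only the similarity measure itself.

```python
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words term-count vectors of two texts."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty text has no direction, so treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)
```

    Two short texts that share two of their three words score 2/3, while texts with no words in common score 0.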

    CREATING A BIOMEDICAL ONTOLOGY INDEXED SEARCH ENGINE TO IMPROVE THE SEMANTIC RELEVANCE OF RETREIVED MEDICAL TEXT

    Medical Subject Headings (MeSH) is a controlled vocabulary used by the National Library of Medicine to index medical articles, abstracts, and journals contained within the MEDLINE database. Although MeSH imposes uniformity and consistency in the indexing process, it has been shown that using MeSH indices results in only a small increase in precision over free-text indexing. Moreover, studies have shown that the use of controlled vocabularies in the indexing process is not an effective method to increase semantic relevance in information retrieval. To address the need for semantic relevance, we present an ontology-based information retrieval system for the MEDLINE collection that results in a 37.5% increase in precision when compared to free-text indexing systems. The presented system uses the ontology to provide an alternative to text representation for medical articles, to find relationships among co-occurring terms in abstracts, and to index terms that appear in text as well as discovered relationships. The presented system is then compared to existing MeSH and free-text information retrieval systems. This dissertation provides a proof of concept for an online retrieval system capable of providing increased semantic relevance when searching through medical abstracts in MEDLINE.

    Topic and language specific internet search engine

    In this paper we present the results of our project, which aims to build a categorization-based, topic-oriented Internet search engine. In particular, we focus on economics-related electronic materials available on the Internet in Hungarian. We present our search service, which harvests, stores and makes searchable the publicly available contents of the subject domain. The paper describes the search facilities and the structure of the implemented system, with special emphasis on intelligent search algorithms and document processing methods.

    Bridging the semantic gap in content-based image retrieval.

    To manage large image databases, Content-Based Image Retrieval (CBIR) emerged as a new research subject. CBIR involves the development of automated methods to use visual features in searching and retrieving. Unfortunately, the performance of most CBIR systems is inherently constrained by the low-level visual features because they cannot adequately express the user's high-level concepts. This is known as the semantic gap problem. This dissertation introduces a new approach to CBIR that attempts to bridge the semantic gap. Our approach includes four components. The first one learns a multi-modal thesaurus that associates low-level visual profiles with high-level keywords. This is accomplished through image segmentation, feature extraction, and clustering of image regions. The second component uses the thesaurus to annotate images in an unsupervised way. This is accomplished through fuzzy membership functions to label new regions based on their proximity to the profiles in the thesaurus. The third component consists of an efficient and effective method for fusing the retrieval results from the multi-modal features. Our method is based on learning and adapting fuzzy membership functions to the distribution of the features' distances and assigning a degree of worthiness to each feature. The fourth component provides the user with the option to perform hybrid querying and query expansion. This allows the enrichment of a visual query with textual data extracted from the automatically labeled images in the database. The four components are integrated into a complete CBIR system that can run in three different and complementary modes. The first mode allows the user to query using an example image. The second mode allows the user to specify positive and/or negative sample regions that should or should not be included in the retrieved images. The third mode uses a Graphical Text Interface to allow the user to browse the database interactively using a combination of low-level features and high-level concepts. The proposed system and all of its components and modes are implemented and validated using a large data collection for accuracy, performance, and improvement over traditional CBIR techniques.

    Conceptual Representations for Computational Concept Creation

    Computational creativity seeks to understand computational mechanisms that can be characterized as creative. The creation of new concepts is a central challenge for any creative system. In this article, we outline different approaches to computational concept creation and then review conceptual representations relevant to concept creation, and therefore to computational creativity. The conceptual representations are organized in accordance with two important perspectives on the distinctions between them. One distinction is between symbolic, spatial and connectionist representations. The other is between descriptive and procedural representations. Additionally, conceptual representations used in particular creative domains, such as language, music, image and emotion, are reviewed separately. For every representation reviewed, we cover the inference it affords, the computational means of building it, and its application in concept creation.

    A Novel Approach for Text Classification

    Text Classification (TC) is the process of associating text documents with the classes considered most appropriate, thereby distinguishing topics such as particle physics from optical physics. Much research has been done in this field, but there is still a need to categorize a collection of text documents into mutually exclusive categories by extracting concepts or features using a supervised learning paradigm and different classification algorithms. In this paper, a new Fuzzy Similarity Based Concept Mining Model (FSCMM) is proposed to classify a set of text documents into pre-defined Category Groups (CG) by training and preparing them at the sentence, document and integrated corpora levels, with feature reduction and ambiguity removal at each level to achieve high system performance. A Fuzzy Feature Category Similarity Analyzer (FFCSA) is used to analyze each extracted feature of the Integrated Corpora Feature Vector (ICFV) against the corresponding categories or classes. The model uses a Support Vector Machine Classifier (SVMC) to classify the training data patterns into two groups, i.e., +1 and -1, thereby producing accurate and correct results. The proposed model works efficiently and effectively, with strong performance and high-accuracy results.

    Kategorisasi Dokumen Teks secara Hierarkis dengan Fuzzy Relational Thesaurus (FRT)

    A text document is intrinsically a kind of unstructured database, because it does not have fields as a conventional database does. A difference in topic between one document and another means that the documents contain different information. Text categorization is the task of assigning documents to a predefined set of categories. Many text categorization methods are now available for classification problems, such as the K-Nearest Neighbor (KNN) classifier, the Bayesian classifier, decision trees and the Rocchio method. Hierarchical categorization problems are common in real life, for example when storing digital files in hierarchically structured folders. One way to implement such categorization is to use a Fuzzy Relational Thesaurus (FRT) as the class hierarchy structure. This final project implements a text classification method that uses an FRT as its topic hierarchy and the Rocchio method as an auxiliary method for building its classifier. Test results show that the training process for selecting the best features in the FRT method can produce a better classifier than the Rocchio method alone. Keywords: hierarchical text categorization, Rocchio method, Fuzzy Relational Thesaurus
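    For orientation, the Rocchio method mentioned in the abstract above is commonly described as nearest-centroid classification. The following is a minimal sketch of that flat baseline only, assuming whitespace tokenization, raw term counts, and cosine similarity. It does not implement the Fuzzy Relational Thesaurus hierarchy, and the function names and training sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict

def _vectorize(text):
    """Bag-of-words term counts (assumed representation)."""
    return Counter(text.lower().split())

def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def train_rocchio(labeled_docs):
    """Compute one centroid (mean term vector) per class from (text, label) pairs."""
    sums = defaultdict(Counter)
    counts = Counter()
    for text, label in labeled_docs:
        sums[label].update(_vectorize(text))
        counts[label] += 1
    return {
        label: {term: weight / counts[label] for term, weight in vec.items()}
        for label, vec in sums.items()
    }

def classify(centroids, text):
    """Assign the class whose centroid is most similar to the document vector."""
    vec = _vectorize(text)
    return max(centroids, key=lambda label: _cosine(vec, centroids[label]))

# Hypothetical toy training set, for illustration only.
training = [
    ("stock market shares", "finance"),
    ("bank interest market", "finance"),
    ("football match goal", "sports"),
    ("tennis match player", "sports"),
]
centroids = train_rocchio(training)
print(classify(centroids, "market shares fall"))
```

    A hierarchical variant would apply a decision of this kind at each level of the topic tree rather than once over all classes.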