
    Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining

    Background. Previously, we developed a combined dictionary dubbed Chemlist for the identification of small molecules and drugs in text based on a number of publicly available databases and tested it on an annotated corpus. To achieve an acceptable recall and precision we used a number of automatic and semi-automatic processing steps together with disambiguation rules. However, it remained to be investigated what impact an extensive manual curation of a multi-source chemical dictionary would have on chemical term identification in text. ChemSpider is a chemical database that has undergone extensive manual curation aimed at establishing valid chemical name-to-structure relationships. Results. We acquired the component of ChemSpider containing only manually curated names and synonyms. Rule-based term filtering, semi-automatic manual curation, and disambiguation rules were applied. We tested the dictionary from ChemSpider on an annotated corpus and compared the results with those for the Chemlist dictionary. The ChemSpider dictionary of ca. 80 k names was only 1/3 to 1/4 the size of Chemlist at around 300 k. The ChemSpider dictionary had a precision of 0.43 and a recall of 0.19 before the application of filtering and disambiguation and a precision of 0.87 and a recall of 0.19 after filtering and disambiguation. The Chemlist dictionary had a precision of 0.20 and a recall of 0.47 before the application of filtering and disambiguation and a precision of 0.67 and a recall of 0.40 after filtering and disambiguation. Conclusions. We conclude the following: (1) The ChemSpider dictionary achieved the best precision but the Chemlist dictionary had a higher recall and the best F-score; (2) Rule-based filtering and disambiguation is necessary to achieve a high precision for both the automatically generated and the manually curated dictionary. ChemSpider is available as a web service at http://www.chemspider.com/ and the Chemlist dictionary is freely available as an XML file in Simple Knowledge Organization System format on the web at http://www.biosemantics.org/chemlist
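
The F-score comparison in conclusion (1) can be checked from the reported precision and recall values. The sketch below assumes the balanced F1 score (harmonic mean of precision and recall); the abstract does not state which F-measure was used, and the function name `f_score` is hypothetical.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is the balanced F1 measure.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# (precision, recall) after filtering and disambiguation, as reported in the abstract.
reported = {
    "ChemSpider": (0.87, 0.19),
    "Chemlist": (0.67, 0.40),
}

scores = {name: f_score(p, r) for name, (p, r) in reported.items()}
for name, f1 in scores.items():
    print(f"{name}: F1 = {f1:.2f}")
# prints:
# ChemSpider: F1 = 0.31
# Chemlist: F1 = 0.50
```

Under this assumption, the low recall of the ChemSpider dictionary dominates its F1 despite the higher precision, which is consistent with Chemlist having the best F-score.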

    On document filing based upon predicates

    This dissertation presents a formal approach to modeling documents in a personal office environment, proposes a heterogeneous algebraic query language for manipulating objects (folders) in the document model, and investigates a predicate-driven document filing system for automatically filing documents. The document model was initially proposed in [38], which adopts a very natural view for describing office documents using the relational and object-oriented paradigms. The model employs a dual approach to classifying and categorizing office documents by defining both a document type hierarchy and a folder organization. This dissertation extends and formally specifies the document model. Documents are partitioned into different classes, each document class being represented by a frame template which describes the properties of the documents of the class. A particular office document, summarized from the viewpoint of its frame template, yields a synopsis of the document called a frame instance. Frame instances are grouped into a folder on the basis of user-defined criteria, specified as predicates, which determine whether a frame instance belongs to a folder. Folders, each of which is a heterogeneous set of frame instances, can be naturally organized into a folder organization. The folder organization specifying the document filing view is then defined using predicates and a directed graph. However, some operators in the algebraic query language [38] do not support the heterogeneous property. This dissertation proposes an algebra-based query language that gives full support to this heterogeneous property. We investigate the construction problem of a folder organization: does it allow a user to add a new folder with an arbitrary local predicate? Given a folder organization, creating a new folder with an arbitrarily defined predicate may cause two abnormalities: inapplicable edges (filing paths) and redundant folders. 
To deal with such abnormalities in the process of constructing a folder organization, the concept of predicate consistency is discussed and an algorithm is proposed for determining whether the predicate of a new folder is consistent with the existing folder organization. The global predicate of a folder governs the content of the folder. However, the predicates of folders (that is, global predicates) do not uniquely specify a folder organization. We then investigate the reconstruction problem: under what circumstances can we uniquely recover the folder organization from its global predicates? The problem is solved in terms of graph-theoretic concepts such as associated digraphs, transitive closure, and redundant/non-redundant filing paths. A transitive closure inversion algorithm is then presented which efficiently recovers a folder organization digraph from its associated digraph. After defining a folder organization, we can file a frame instance into the folder organization. A document filing algorithm describes the procedure of filing a frame instance. However, the critical issue of the algorithm is how to evaluate whether a frame instance satisfies the predicate of a folder in a folder organization. To resolve this issue, a thesaurus, an association dictionary and a knowledge base are introduced. The thesaurus specifies the association relationship among the key terms that are actually residing in the system and terms that are used by users. An association dictionary gives the association relationship between an attribute of a predicate and a frame template defined in a folder organization. A knowledge base represents background knowledge in a certain application domain.
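
The idea of predicate-driven filing can be illustrated with a minimal sketch. This is not the dissertation's algorithm: it assumes that frame instances are plain attribute dictionaries, that each folder carries a local predicate, and that a folder's global predicate is the conjunction of the local predicates along a filing path from the root. The names `Folder` and `file_instance`, and the sample folders, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

FrameInstance = Dict[str, object]           # synopsis of one document
Predicate = Callable[[FrameInstance], bool]  # local predicate of a folder


@dataclass
class Folder:
    name: str
    local_predicate: Predicate
    children: List["Folder"] = field(default_factory=list)
    contents: List[FrameInstance] = field(default_factory=list)


def file_instance(folder: Folder, instance: FrameInstance,
                  _visited: Optional[Set[str]] = None) -> List[str]:
    """File `instance` into `folder` and, along filing paths, into every
    descendant whose local predicate it satisfies. Returns folder names."""
    visited = set() if _visited is None else _visited
    if folder.name in visited or not folder.local_predicate(instance):
        return []
    visited.add(folder.name)  # a folder reachable by two paths is filled once
    folder.contents.append(instance)
    filed = [folder.name]
    for child in folder.children:
        filed.extend(file_instance(child, instance, visited))
    return filed


# Hypothetical folder organization: All -> {Memos -> FromSmith, Letters}
from_smith = Folder("FromSmith", lambda fi: fi.get("sender") == "Smith")
memos = Folder("Memos", lambda fi: fi.get("type") == "memo", [from_smith])
letters = Folder("Letters", lambda fi: fi.get("type") == "letter")
root = Folder("All", lambda fi: True, [memos, letters])

memo_paths = file_instance(root, {"type": "memo", "sender": "Smith"})
letter_paths = file_instance(root, {"type": "letter", "sender": "Smith"})
print(memo_paths)    # ['All', 'Memos', 'FromSmith']
print(letter_paths)  # ['All', 'Letters']
```

In this sketch the predicates compare attribute values directly; the thesaurus, association dictionary, and knowledge base described in the abstract address the harder case where the terms in a predicate do not literally match the terms in a frame instance.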

    Structuring Knowledge Bases with AI and Machine Learning

    In today's world, it is no longer enough for a company to have a better product or service than its competitors to survive and grow. A customer-obsessed attitude is required for businesses to survive and grow. As we all know, competitors can typically duplicate any new market position quickly and even do it better than the organization that started the idea. The more business knowledge your team members have about the products and services your consumers use, the more successful your company will be. You, as an organization, should be swift in responding to your customers' demands to react to the competitive changes in the market. One of the key tools needed for such a quick response is the Knowledge Base System (KBS). This tool can be an internal tool for your employees or an external tool for your customers. It supports the decision-making process, information sharing, products, services, etc. Most organizations have this tool, but it is not well structured. There is no single correct way to build a knowledge base, but there are multiple methods, each with its own set of advantages and quirks. However, if you follow some basic guidelines, you can be sure that your customers or employees will not get lost in the process. The most basic content format in the knowledge base is an article with text. However, it can include screenshots, photos, videos, audio, and infographics. We can further implement a knowledge-based system with artificial intelligence (AI), which gives room for more productivity in an organization. One thing that makes or breaks your knowledge base is its structure. Just like a dictionary won't serve its purpose unless it’s organized alphabetically, a cluttered or disorganized knowledge base will confuse your customers and your employees rather than lead them to a solution. You can convert knowledge base articles into FAQs, product manuals, troubleshooting guides, etc. 
A knowledge-based system might be a game changer for your organization if you want to make your clients happy. In this article, I will walk through the objectives, scope, strategy, and all you should keep in mind when you're building a knowledge base system for your organization. Keywords: Artificial intelligence (AI), knowledge, information, decision-making, and organizational DOI: 10.7176/IKM/13-1-03 Publication date: January 31st 202

    Interlingual Lexical Organisation for Multilingual Lexical Databases in NADIA

    We propose a lexical organisation for multilingual lexical databases (MLDB). This organisation is based on acceptions (word-senses). We detail this lexical organisation and show a mock-up built to experiment with it. We also present our current work in defining and prototyping a specialised system for the management of acception-based MLDB. Keywords: multilingual lexical database, acception, linguistic structure. Comment: 5 pages, Macintosh Postscript, published in COLING-94, pp. 278-28

    Creativity out of chaos

    Creativity is said to be highly desired in post-modern and post-industrial organizations. Creativity and anarchy on the one hand, and managerialism on the other, can be seen as different forms of knowledge, two opposed ideals. In many organizational as well as societal reforms we currently observe, it is the managerialist ideal that wins over the anarchic. In this paper, we ask whether people fear anarchy. We reflect on the possible reasons for the fear, and we also try to explain why we believe that anarchic organizing should not be avoided or feared.