
    POS Tagging and its Applications for Mathematics

    Content analysis of scientific publications is a nontrivial task, but a useful and important one for scientific information services. In the Gutenberg era it was the domain of human experts; in the digital age, many machine-based methods, such as graph analysis tools and machine-learning techniques, have been developed for it. Natural Language Processing (NLP) is a powerful machine-learning approach to semiautomatic speech and language processing that is also applicable to mathematics. The well-established methods of NLP have to be adapted to the special needs of mathematics, in particular to the handling of mathematical formulae. We demonstrate a mathematics-aware part-of-speech tagger and give a short overview of our adaptation of NLP methods for mathematical publications. We show how the tools we developed are used for key phrase extraction and classification in the zbMATH database.
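To make the formula-handling problem concrete, here is a minimal sketch of one common preprocessing idea, assuming formulae are written inline as LaTeX `$...$` spans: each formula is replaced by a single placeholder token so that a conventional tokenizer and tagger see it as one noun-like word. This is not the authors' tagger. The placeholder string `MATHFORMULA`, the helper names `mask_formulae`, `tokenize`, and `pretag_formulae`, and the choice to tag formulae as nouns (Penn tag `NN`) are all hypothetical.

```python
import re

# Hypothetical placeholder; assumed never to occur in ordinary prose.
FORMULA_TOKEN = "MATHFORMULA"

# Assumes inline formulae are delimited by single dollar signs and not nested.
INLINE_MATH = re.compile(r"\$[^$]+\$")


def mask_formulae(text):
    """Replace each inline formula with FORMULA_TOKEN; return masked text and the originals."""
    formulae = INLINE_MATH.findall(text)
    return INLINE_MATH.sub(FORMULA_TOKEN, text), formulae


def tokenize(text):
    """Naive tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def pretag_formulae(tokens):
    """Pre-assign a noun tag to formula placeholders; other tokens stay untagged (None)."""
    return [(tok, "NN" if tok == FORMULA_TOKEN else None) for tok in tokens]


if __name__ == "__main__":
    masked, formulae = mask_formulae("Let $f(x) = x^2$ be convex on $[0, 1]$.")
    print(masked)
    print(formulae)
    print(pretag_formulae(tokenize(masked)))
```

Keeping the original formulae in a separate list lets a pipeline restore them after tagging, so the tagger never has to interpret LaTeX syntax.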

    Extracting corpus specific knowledge bases from Wikipedia

    Thesauri are useful knowledge structures for assisting information retrieval. Yet their production is labor-intensive, and few domains have comprehensive thesauri that cover domain-specific concepts and contemporary usage. One approach, which has been attempted without much success for decades, is to seek statistical natural language processing algorithms that work on free text. Instead, we propose to replace costly professional indexers with thousands of dedicated amateur volunteers, namely those who are producing Wikipedia. This vast, open encyclopedia represents a rich tapestry of topics and semantics and a huge investment of human effort and judgment. We show how this can be exploited directly to provide WikiSauri: manually defined yet inexpensive thesaurus structures that are specifically tailored to expose the topics, terminology, and semantics of individual document collections. We also offer concrete evidence of the effectiveness of WikiSauri for assisting information retrieval.

    Towards a Global Learning Commons: ccLearn

    Though open educational resources (OER) promise to transform the conditions for teaching and learning worldwide, there are many barriers to the full realization of this vision. Among other things, much of what is currently considered "free and open" is legally, technically, and/or culturally incompatible. Herein, the authors give a brief history of open education, outline some key problems, and offer some possible solutions.
    This article was originally published in Educational Technology 4(6), Nov-Dec 2007.

    Structure Selection from Streaming Relational Data

    Statistical relational learning techniques have been successfully applied in a wide range of relational domains. In most of these applications, human designers have capitalized on their background knowledge by following a trial-and-error trajectory: relational features are manually defined by a human engineer, parameters are learned for those features on the training data, the resulting model is validated, and the cycle repeats as the engineer adjusts the set of features. This paper seeks to streamline application development in large relational domains by introducing a lightweight approach that efficiently evaluates relational features on pieces of the relational graph that are streamed to it one at a time. We evaluate our approach on two social media tasks and demonstrate that it leads to more accurate models that are learned faster.
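As an illustration of scoring features on a stream instead of on the full graph, the sketch below assumes the graph arrives as per-node chunks (a target node, its label, and its neighbors' attributes) and keeps only a small count table per candidate feature, so each chunk can be discarded after a single pass. The `Chunk` schema, the two example features, and the accuracy-based ranking are hypothetical choices for illustration; the abstract does not describe the paper's actual scoring method.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One streamed piece of the relational graph (hypothetical schema)."""
    node: str
    label: bool
    neighbor_attrs: list = field(default_factory=list)


# Hypothetical candidate relational features: boolean predicates over a chunk.
FEATURES = {
    "any_neighbor_spam": lambda c: any(a.get("spam") for a in c.neighbor_attrs),
    "many_neighbors": lambda c: len(c.neighbor_attrs) >= 3,
}


class StreamingFeatureScorer:
    """Keeps a (feature value, label) count table per feature; chunks are not stored."""

    def __init__(self, features):
        self.features = features
        self.counts = {name: defaultdict(int) for name in features}

    def update(self, chunk):
        # Evaluate every candidate feature on this chunk only, then forget the chunk.
        for name, feature in self.features.items():
            self.counts[name][(bool(feature(chunk)), chunk.label)] += 1

    def accuracy(self, name):
        # Fraction of chunks seen so far where the feature value agrees with the label.
        table = self.counts[name]
        total = sum(table.values())
        agree = table[(True, True)] + table[(False, False)]
        return agree / total if total else 0.0

    def ranking(self):
        # Candidate features ordered from most to least accurate so far.
        return sorted(self.features, key=self.accuracy, reverse=True)


if __name__ == "__main__":
    stream = [
        Chunk("u1", True, [{"spam": True}]),
        Chunk("u2", False, [{"spam": False}] * 3),
        Chunk("u3", True, [{"spam": True}, {}, {}]),
        Chunk("u4", False, []),
    ]
    scorer = StreamingFeatureScorer(FEATURES)
    for chunk in stream:
        scorer.update(chunk)
    print(scorer.ranking())
```

Because memory grows with the number of candidate features rather than the size of the graph, an engineer can try many features against a large graph without materializing it; a richer scorer (for example, information gain) could replace `accuracy` with the same count tables.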

    Evolutionary Subject Tagging in the Humanities; Supporting Discovery and Examination in Digital Cultural Landscapes

    In this paper, the authors attempt to identify problematic issues for subject tagging in the humanities, particularly those associated with information objects in digital formats. In the third major section, the authors identify a number of assumptions behind the current practice of subject classification that we think should be challenged. We then propose features of classification systems that could increase their effectiveness; these emerged as recurrent themes in many of our conversations with scholars, consultants, and colleagues. Finally, we suggest next steps that we believe will help scholars and librarians develop better subject classification systems to support research in the humanities.
    NEH Office of Digital Humanities: Digital Humanities Start-Up Grant (HD-51166-10

    WikiSense: Supersense Tagging of Wikipedia Named Entities Based WordNet

    PACLIC 23 / City University of Hong Kong / 3-5 December 200

    ICONCLASS - A Classification System for Art and Iconography

    Documenting is a crucial activity for any museum or art institution. Today that importance is growing, because the metadata that museums provide is essential for retrieving information from the vast amount of data in the modern world. The goal of this study is to discuss the design of thesauri, how they work, and what their purpose is in documenting museum objects. It further discusses content indexing together with aboutness, isness, and ofness, in order to draw a parallel with Panofsky’s categories in iconography. The central focus of the work is an analysis of Iconclass, its features, and its usage. Additionally, it examines new developments in machine learning that use Iconclass to generate new data and connections automatically. Finally, it gives a brief overview of folksonomy and social tagging.

    Contemporary models of indexing and classifying knowledge: folksonomy and tagging as bottom-up mechanisms for indexing information

    The article addresses issues related to the social classification of knowledge. The author points out that the socio-cultural changes generated by new technologies imply a redefinition of terms such as knowledge, authority, and wisdom. In the world of wikinomics, academic degrees, affiliations, and group membership lose their monopoly and authority in determining what constitutes reliable, indisputable knowledge. The social tagging of content plays an important role in the process of knowledge creation: it not only facilitates classification but also significantly determines the social value of that content. Folksonomy, understood as an organic, bottom-up mechanism for classifying information, makes it easier to organize Internet resources and becomes a way of taming the online chaos caused by information overload.
    This article aims to show the transformations taking place in the formation, classification, and legitimization of knowledge. The author describes bottom-up mechanisms for indexing knowledge that can be found on the web. Internet users reject the representatives of objective knowledge (scientists, specialists, gatekeepers) and instead trust collective intelligence, which functions through a knowledge mechanism based on the skills and abilities of individuals cooperating with one another. Through the joint actions of individuals, the collective raises the level of knowledge and expertise of its members by means of extensive cooperation and debate.