    Knowledge organization systems in mathematics and in libraries

    Based on the project activities planned in the context of the Specialized Information Service for Mathematics (TIB Hannover, FAU Erlangen, L3S, SUB Göttingen), we give an overview of the history and interplay of subject cataloguing in libraries, the development of computerized methods for metadata processing, and the rise of the Semantic Web. We survey various knowledge organization systems, such as the Mathematics Subject Classification, the German Authority File, the Virtual International Authority File (VIAF), and lexical databases such as WordNet, and their potential use for mathematics in education and research. We briefly address the difference between thesauri and ontologies and the relations they typically contain from a linguistic perspective. We then discuss with the audience how current efforts to represent and handle mathematical theories as semantic objects can help counter the decline of semantic resource annotation in libraries that some have predicted due to the existence of highly performant retrieval algorithms (based on statistical, neural, or other big-data methods). We also explore the potential characteristics of a fruitful symbiosis between carefully cultivated kernels of semantic structure and automated methods that scale those structures up to the level necessary to cope with the amounts of digital data found in libraries and in (mathematical) research (e.g., in simulations) today.
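
    As a side illustration of how a lexical database such as WordNet can support mathematical vocabulary, the following sketch (our addition, not part of the abstract) queries WordNet through NLTK for the senses of a term that has both everyday and mathematical readings; it assumes nltk is installed and the WordNet corpus has been downloaded.

        # Minimal illustrative sketch: querying WordNet, one of the lexical
        # databases surveyed above, for the senses of an ambiguous term.
        # Assumes: pip install nltk, then nltk.download("wordnet") once.
        from nltk.corpus import wordnet as wn

        for synset in wn.synsets("group", pos=wn.NOUN):
            print(synset.name(), "-", synset.definition())
            # Hypernyms act like broader terms in a thesaurus hierarchy,
            # much like a library classification scheme.
            for hypernym in synset.hypernyms():
                print("    broader term:", hypernym.name())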

    Tradition and Technology: A Design-Based Prototype of an Online Ginan Semantization Tool

    The heritage of ginans of the Nizari Ismaili community comprises over 1,000 individual hymn-like poems of varying lengths and languages. The ginans were originally composed to spread the teachings of the Satpanth Ismaili faith and served as scriptural texts that guided the normative understanding of the community in South Asia. The emotive melodies of the ginans continue to enchant the members of the community in the diaspora who do not necessarily understand their language. The language of the ginans is mixed and borrows vocabulary from Indo-Aryan and Perso-Arabic dialects. With deliberate and purposeful use of information technology, the online tool blends Western best practices of language learning with the traditional transmission methods and materials of the Ismaili community. This study is based on the premise that for the teachings of the ginans to survive in the Euro-American diaspora, successive generations must learn and understand the vocabulary of the ginans. The process through which humans learn and master vocabulary is called semantization, which refers to learning and understanding the various senses and uses of words in a language. To this end, a sample ginan corpus was chosen and semantically analyzed to develop an online ginan lexicon. This lexicon was then used to enrich ginan texts with online glosses to facilitate semantization of ginan vocabulary. The design-based research methodology for prototyping the tool comprised two design iterations of analysis, design, and review. In the first iteration, the initial design of the prototype was based on a multidisciplinary literature review and an in-depth semantic analysis of ginan materials. The initial design was then reviewed by community ginan experts and teachers to inform the next design iteration. In the second design iteration, the initial design was enhanced into a functional prototype by adding features based on the expert suggestions as well as the needs of community learners, gathered by surveying a convenience sample of 515 community members across the globe. The analysis of the survey data revealed that over 90% of the participants preferred English materials for learning and understanding the language of the ginans. In addition, online access to ginan materials was expressed as a dire need for the community to engage with the ginans. The development and dissemination of curriculum-based educational programs and supporting resources for the ginans emerged as the most urgent and unmet expectations of the community. The study also confirmed that the wide availability of an online ginan learning tool, such as the one designed in this study, is highly desired by English-speaking community members who want to learn and understand the tradition and teachings of ginans. However, such a tool is only part of the solution for fostering sustainable community engagement with the preservation of ginans. To ensure that the tradition is carried forward by future generations with compassion and understanding, community institutions must make ginans an educational priority and ensure that educational resources for ginans are widely available to community members.
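
    The glossing step described above lends itself to a simple illustration. The following Python sketch is purely hypothetical: the lexicon entries and the sample line are invented placeholders, not data from the study, and the actual tool is an online system rather than a script.

        # Hypothetical sketch of lexicon-based glossing: annotate each word
        # found in a small lexicon with an inline English gloss.
        ginan_lexicon = {
            "satgur": "true guide",   # invented placeholder entries,
            "moman": "believer",      # not taken from the study's lexicon
        }

        def gloss(text, lexicon):
            """Append a bracketed gloss after each word found in the lexicon."""
            glossed = []
            for word in text.split():
                key = word.lower().strip(".,;")
                glossed.append(f"{word} [{lexicon[key]}]" if key in lexicon else word)
            return " ".join(glossed)

        # Invented sample line, for illustration only.
        print(gloss("Satgur kahe suno moman", ginan_lexicon))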

    Semantic approaches to domain template construction and opinion mining from natural language

    Most of the text mining algorithms in use today are based on a lexical representation of input texts, for example bag-of-words. A possible alternative is to first convert text into a semantic representation, one that captures the text content in a structured way using only a set of pre-agreed labels. This thesis explores the feasibility of such an approach for two tasks on collections of documents: identifying common structure in input documents (»domain template construction«) and helping users find differing opinions in input documents (»opinion mining«). We first discuss ways of converting natural text to a semantic representation. We propose and compare two new methods with varying degrees of target representation complexity. The first method, which shows more promise, is based on dependency parser output, which it converts to lightweight semantic frames with role fillers aligned to WordNet. The second method structures text using Semantic Role Labeling techniques and aligns the output to the Cyc ontology. Based on the first of the above representations, we next propose and evaluate two methods for constructing frame-based templates for documents from a given domain (e.g., bombing-attack news reports). A template is the set of all salient attributes (e.g., attacker, number of casualties, …). The idea of both methods is to construct abstract frames for which more specific instances (according to the WordNet hierarchy) can be found in the input documents. Fragments of these abstract frames represent the sought-for attributes. We achieve state-of-the-art performance and additionally provide detailed type constraints for the attributes, something not possible with competing methods. Finally, we propose a software system for exposing differing opinions in the news. For any given event, we present the user with all known articles on the topic and let them navigate the articles by three semantic properties simultaneously: sentiment, topical focus, and geography of origin. The result is a dynamically reranked set of relevant articles and a near-real-time focused summary of those articles. The summary, too, is computed from the semantic text representation discussed above. We conducted a user study of the whole system, with very positive results.
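
    To make the first conversion method more concrete, here is a minimal sketch, not the thesis's actual code, of deriving a lightweight frame from dependency parser output and aligning role fillers to WordNet. It assumes spaCy with the en_core_web_sm model and NLTK's WordNet corpus; the crude first-synset alignment stands in for proper word sense disambiguation.

        # Sketch: dependency parse -> lightweight semantic frame -> WordNet.
        # Assumes: pip install spacy nltk, the en_core_web_sm model, and
        # nltk.download("wordnet").
        import spacy
        from nltk.corpus import wordnet as wn

        nlp = spacy.load("en_core_web_sm")

        def extract_frames(sentence):
            doc = nlp(sentence)
            frames = []
            for token in doc:
                if token.pos_ == "VERB":
                    frame = {"predicate": token.lemma_}
                    for child in token.children:
                        if child.dep_ == "nsubj":
                            frame["agent"] = child.lemma_
                        elif child.dep_ in ("dobj", "obj"):
                            frame["patient"] = child.lemma_
                    # Align fillers to WordNet by taking the first noun
                    # synset (a crude stand-in for real disambiguation).
                    for role in ("agent", "patient"):
                        if role in frame:
                            synsets = wn.synsets(frame[role], pos=wn.NOUN)
                            if synsets:
                                frame[role + "_synset"] = synsets[0].name()
                    frames.append(frame)
            return frames

        print(extract_frames("The militants attacked the convoy."))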

    An Overview On Web Scraping Techniques And Tools

    Since the evolution of the WWW, the landscape of internet use and data exchange has changed rapidly. As more people joined the internet and began to use it, many new techniques were introduced to improve the network; at the same time, new technologies for computers and network infrastructure steadily drove down the cost of hardware and of running websites. As a result of all these changes, a large number of users now exchange, share, and store data on the internet, and daily internet use has produced a tremendous amount of data online. Businesses, academics, and researchers all share advertisements and information on the internet so that they can reach people quickly and easily. This creates a new problem: how to handle such data overload, and how a user can access the best information with the least effort. To address this issue, researchers developed a technique called web scraping. Web scraping is a very important technique for generating structured data from the unstructured data available on the web. The structured data generated by scraping can then be stored in a central database and analyzed in spreadsheets. Traditional copy-and-paste, text grepping and regular-expression matching, HTTP programming, HTML parsing, DOM parsing, web-scraping software, vertical aggregation platforms, semantic annotation recognition, and computer-vision web-page analyzers are some of the common techniques used for data scraping. Previously, most users relied on the common copy-and-paste technique for gathering and analyzing data on the internet, but it is tedious: large amounts of data must be copied by hand and stored in files on the computer. Compared to this, web-scraping software is the easiest scraping technique, and many such software packages are now available on the market. Our paper focuses on an overview of the information extraction technique of web scraping, the different techniques of web scraping, and some of the recent tools used for web scraping.
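
    As a minimal illustration of the HTML-parsing technique listed above, the following sketch fetches a page and extracts structured records from it. The choice of the requests and beautifulsoup4 libraries and the placeholder URL are assumptions of this example; the paper surveys techniques without prescribing tools.

        # Sketch: HTML parsing as a web-scraping technique.
        # Assumes: pip install requests beautifulsoup4
        import requests
        from bs4 import BeautifulSoup

        # example.com is a placeholder target; always check a site's terms
        # of service and robots.txt before scraping it.
        response = requests.get("https://example.com", timeout=10)
        response.raise_for_status()

        soup = BeautifulSoup(response.text, "html.parser")

        # Turn unstructured markup into structured records: here, every
        # link with its text, ready to be stored in a database or
        # exported to a spreadsheet.
        rows = [{"text": a.get_text(strip=True), "href": a["href"]}
                for a in soup.find_all("a", href=True)]
        for row in rows:
            print(row)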

    From point cloud to BIM: a survey of existing approaches

    In order to handle projects of restoration, documentation, and maintenance of historical buildings more efficiently, it is essential to rely on a 3D enriched model of the building. Today, the concept of Building Information Modelling (BIM) is widely adopted for the semantization of digital mock-ups, yet little research has focused on the value of this concept in the field of cultural heritage. In addition, historical buildings are already built, so it is necessary to develop a performant approach, based on a first step of building survey, to produce a semantically enriched digital model. For these reasons, this paper focuses on the chain starting with a point cloud and leading to a well-structured final BIM, and proposes an analysis and a survey of existing approaches to the topics of acquisition, segmentation, and BIM creation. It also presents a critical analysis of the application of this chain in the field of cultural heritage.
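
    As an illustration of the segmentation step in this chain, the following sketch applies RANSAC plane fitting with Open3D to peel planar elements (candidate walls or floors) out of a point cloud. The library choice and the file name are assumptions of this example, not the paper's method.

        # Sketch: one segmentation pass in a point-cloud-to-BIM pipeline.
        # Assumes: pip install open3d, and a scan file (name is hypothetical).
        import open3d as o3d

        pcd = o3d.io.read_point_cloud("building_scan.ply")  # hypothetical scan

        # Fit the dominant plane (e.g., a wall or floor) and split the
        # cloud into inliers (the planar element) and remaining points.
        plane_model, inliers = pcd.segment_plane(distance_threshold=0.02,
                                                 ransac_n=3,
                                                 num_iterations=1000)
        a, b, c, d = plane_model
        print(f"plane: {a:.2f}x + {b:.2f}y + {c:.2f}z + {d:.2f} = 0")

        wall = pcd.select_by_index(inliers)
        rest = pcd.select_by_index(inliers, invert=True)
        # Repeating this on `rest` peels off one candidate surface at a
        # time; a later BIM-creation step would label the segments as
        # walls, floors, or ceilings.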

    Review of the “as-built BIM” approaches

    Today, we need 3D models of heritage buildings in order to handle projects of restoration, documentation, and maintenance more efficiently. In this context, developing a performant approach, based on a first phase of building survey, is a necessary step towards building a semantically enriched digital model. For this purpose, Building Information Modeling is an efficient tool for storing and exchanging knowledge about buildings. In order to create such a model, there are three fundamental steps: acquisition, segmentation, and modeling. For these reasons, it is essential to understand and analyze the entire chain that leads to a well-structured and enriched 3D digital model. This paper proposes a survey and an analysis of the existing approaches to these topics and tries to define a new approach to semantic structuring that takes into account the complexity of this chain.

    Detecting New Word Meanings: A Comparison of Word Embedding Models in Spanish

    Semantic neologisms (SN) are defined as words that acquire a new meaning while maintaining their form. Given the nature of this kind of neologism, the task of identifying these new word meanings is currently performed manually by specialists at observatories of neology. To detect SN in a semi-automatic way, we developed a system that implements a combination of the following strategies: topic modeling, keyword extraction, and word sense disambiguation. The role of topic modeling is to detect the themes that are treated in the input text. Themes within a text give clues about the particular meaning of the words that are used; for example, viral has one meaning in the context of computer science (CS) and another when talking about health. To extract keywords, we used TextRank with POS-tag filtering. With this method, we can obtain relevant words that are already part of the Spanish lexicon. We use a deep learning model to determine whether a given keyword could have a new meaning. Embeddings that are different from all the known meanings (or topics) indicate that a word might be a valid SN candidate. In this study, we examine the following word embedding models: Word2Vec, Sense2Vec, and FastText. The models were trained with equivalent parameters, using the Spanish Wikipedia as corpus. We then used a list of words and their concordances (obtained from our database of neologisms) to show the different embeddings that each model yields. Finally, we present a comparison of these outcomes with the concordances of each word to show how we can determine whether a word could be a valid candidate for SN.
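
    As a rough sketch of the embedding comparison described above (our illustration, not the paper's code), the following trains Word2Vec and FastText with equivalent parameters and inspects the neighbourhood of a candidate word; the two placeholder sentences stand in for the Spanish Wikipedia corpus used in the study.

        # Sketch: comparing neighbourhoods of a candidate word across
        # embedding models. Assumes: pip install gensim (v4 API).
        from gensim.models import Word2Vec, FastText

        corpus = [
            ["el", "virus", "viral", "infecta", "las", "células"],
            ["el", "video", "viral", "circula", "en", "redes", "sociales"],
        ]  # toy placeholder sentences, not the study's corpus

        # Equivalent parameters for both models, as in the study's setup.
        params = dict(vector_size=50, window=3, min_count=1, epochs=50)
        w2v = Word2Vec(corpus, **params)
        ft = FastText(corpus, **params)

        # A candidate whose neighbours diverge from all known senses
        # would be flagged as a possible semantic neologism.
        for name, model in [("word2vec", w2v), ("fasttext", ft)]:
            print(name, model.wv.most_similar("viral", topn=3))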