
    A Shared Ontology Approach to Semantic Representation of BIM Data

    Architecture, engineering, construction and facility management (AEC-FM) projects involve a large number of participants that must exchange information and combine their knowledge for a project to be completed successfully. Currently, most AEC-FM domains store project information in text documents or in XML, relational, or object-oriented formats that make information integration difficult. The AEC-FM industry is therefore not taking advantage of the full potential of the Semantic Web for sharing, connecting, and combining information from different domains. The Semantic Web is designed to solve the information integration problem by creating a web of structured, connected data that can be processed by machines. It allows information from different sources with different underlying schemas to be combined across the Internet: because all data instances and data schemas are stored as triples in a graph data store, merging data from different sources is straightforward. This paper presents a shared ontology approach to the semantic representation of building information, which facilitates finding and integrating building information distributed across several knowledge bases. A case study demonstrates the development of a semantics-based building design knowledge base.
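    The merge-by-loading behaviour described above is easy to illustrate. The sketch below is a minimal example, not the paper's implementation: the file names, the bim# namespace, and the properties are invented, and rdflib stands in for whatever triple store the authors used.

```python
# Minimal sketch: merging building data from two AEC-FM participants.
# File names, the bim# namespace, and the properties are hypothetical.
from rdflib import Graph

merged = Graph()
# Because RDF keeps both instances and schema as triples in one graph
# model, merging sources is just parsing them into the same graph.
for source in ["architect_model.ttl", "contractor_data.ttl"]:
    merged.parse(source, format="turtle")

# Query the combined graph with SPARQL.
rows = merged.query("""
    SELECT ?wall ?material WHERE {
        ?wall a <http://example.org/bim#Wall> ;
              <http://example.org/bim#hasMaterial> ?material .
    }
""")
for row in rows:
    print(row.wall, row.material)
```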

    KnowNet: A proposal for building highly connected and dense knowledge bases from the web

    This paper presents a new, fully automatic method for building highly dense and accurate knowledge bases from existing semantic resources. Basically, the method uses a wide-coverage and accurate knowledge-based Word Sense Disambiguation algorithm to assign the most appropriate senses to large sets of topically related words acquired from the web. KnowNet, the resulting knowledge base, which connects large sets of semantically related concepts, is a major step towards the autonomous acquisition of knowledge from raw corpora. In fact, KnowNet is several times larger than any available knowledge resource encoding relations between synsets, and the knowledge KnowNet contains outperforms any other resource when it is empirically evaluated in a common multilingual framework.
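    To make the construction idea concrete, here is a toy sketch of the disambiguate-then-relate step, assuming NLTK's WordNet as the sense inventory. The most-frequent-sense choice below is a deliberate simplification standing in for the paper's knowledge-based WSD algorithm, and the topic words are invented.

```python
# Toy sketch of KnowNet-style construction: disambiguate each word of a
# topically related set and record synset-to-synset relations between the
# topic concept and the chosen senses. Requires nltk.download("wordnet").
from nltk.corpus import wordnet as wn

topic_synset = wn.synset("bank.n.01")       # source concept of the signature
topic_words = ["money", "loan", "deposit"]  # topically related words (toy set)

relations = set()
for word in topic_words:
    senses = wn.synsets(word, pos=wn.NOUN)
    if senses:
        chosen = senses[0]  # placeholder: the real system picks the best sense
        relations.add((topic_synset.name(), chosen.name()))

print(relations)
```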

    Empowering Knowledge Bases: a Machine Learning Perspective

    The construction of knowledge bases quite often requires the intervention of knowledge engineers and domain experts, making it a time-consuming task. Alternative approaches have been developed for building knowledge bases from existing sources of information such as web pages and crowdsourcing; seminal examples are NELL, DBpedia, YAGO and several others. With the goal of building very large sources of knowledge, as recently for the case of Knowledge Graphs, even more complex integration processes have been set up, involving multiple sources of information, human expert intervention, and crowdsourcing. Despite significant efforts to make Knowledge Graphs as comprehensive and reliable as possible, they tend to suffer from incompleteness and noise due to the complex building process. Even highly human-curated knowledge bases exhibit incompleteness; for instance, disjointness axioms are quite often missing. Machine learning methods have been proposed for refining, enriching, completing and possibly raising potential issues in existing knowledge bases, while showing the ability to cope with noise. The talk will concentrate on classes of mostly symbol-based machine learning methods, specifically focusing on concept learning, rule learning and disjointness-axiom learning problems, showing how the developed methods can be exploited for enriching existing knowledge bases. The talk will highlight that a key element of the illustrated solutions is the integration of background knowledge, deductive reasoning, and the evidence coming from the mass of the data. The last part of the talk will be devoted to the presentation of an approach for injecting background knowledge into numeric embedding models to be used for predictive tasks on Knowledge Graphs.
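    As a rough illustration of the kind of numeric embedding model mentioned at the end, here is a minimal translational (TransE-style) scoring function. This is a generic sketch with made-up entities and an untrained model, not the speaker's background-knowledge injection method.

```python
# Minimal TransE-style scorer of the kind used for predictive tasks on
# Knowledge Graphs. Entities, relation, and vectors are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
DIM = 50
entities = {"Rome": rng.normal(size=DIM), "Italy": rng.normal(size=DIM)}
relations = {"capitalOf": rng.normal(size=DIM)}

def score(head, rel, tail):
    """Lower is better: a true triple should satisfy head + rel ~= tail."""
    return np.linalg.norm(entities[head] + relations[rel] - entities[tail])

# Training would adjust the vectors so observed triples score lower than
# corrupted ones; here we just evaluate one (untrained) triple.
print(score("Rome", "capitalOf", "Italy"))
```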

    The Cognitive Atlas: Employing Interaction Design Processes to Facilitate Collaborative Ontology Creation

    The Cognitive Atlas is a collaborative knowledge-building project that aims to develop an ontology characterizing the current conceptual framework among researchers in cognitive science and neuroscience. The project objectives from the beginning focused on usability, simplicity, and utility for end users. Support for Semantic Web technologies was also a priority in order to support interoperability with other neuroscience projects and knowledge bases. Current off-the-shelf semantic web or semantic wiki technologies, however, do not often lend themselves to simple user interaction designs for non-technical researchers and practitioners; the abstract nature and complexity of these systems act as points of friction for user interaction, inhibiting usability and utility. Instead, we take an alternate interaction design approach driven by user-centered design processes rather than by a base set of semantic technologies. This paper reviews the initial two rounds of design and development of the Cognitive Atlas system, including interactive design decisions and their implementation as guided by current industry practices for the development of complex interactive systems.

    A comparison study on algorithms of detecting long forms for short forms in biomedical text

    Motivation: With more and more research dedicated to literature mining in the biomedical domain, more and more systems are available to choose from when building literature mining applications. In this study, we focus on one specific kind of literature mining task: detecting definitions of acronyms, abbreviations, and symbols in biomedical text. We denote acronyms, abbreviations, and symbols as short forms (SFs) and their corresponding definitions as long forms (LFs). The study was designed to answer the following questions: i) how well a system performs in detecting LFs from novel text, ii) what the coverage is for various terminological knowledge bases in including SFs as synonyms of their LFs, and iii) how to combine results from various SF knowledge bases.
    Method: We evaluated the following three publicly available detection systems in detecting LFs for SFs: i) ALICE, a handcrafted pattern/rule-based system by Ao and Takagi; ii) a machine learning system by Chang et al.; and iii) a simple alignment-based program by Schwartz and Hearst. In addition, we investigated the conceptual coverage of two terminological knowledge bases: i) the UMLS (the Unified Medical Language System), and ii) the BioThesaurus (a thesaurus of names for all UniProt protein records). We also implemented a web interface that provides a virtual integration of various SF knowledge bases.
    Results: We found that the detection systems agree with each other in most cases, and the existing terminological knowledge bases have good coverage of the synonymy relationship for frequently defined LFs. The web interface allows people to detect SF definitions from text and to search several SF knowledge bases.
    Availability: The web site is http://gauss.dbb.georgetown.edu/liblab/SFThesaurus.
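    Of the three systems compared, the Schwartz and Hearst program is simple enough to sketch from its published description: align the short form's characters right-to-left against the text preceding the parenthesis. The re-implementation below is illustrative, not the evaluated code.

```python
# Sketch of the Schwartz & Hearst character-alignment heuristic: match the
# short form's characters right-to-left against the candidate long form.
def find_long_form(short_form, preceding_text):
    s = len(short_form) - 1
    l = len(preceding_text) - 1
    while s >= 0:
        c = short_form[s].lower()
        if not c.isalnum():
            s -= 1
            continue
        # Walk left until the character matches; the first character of the
        # short form must additionally start a word in the long form.
        while l >= 0 and (preceding_text[l].lower() != c or
                          (s == 0 and l > 0 and preceding_text[l - 1].isalnum())):
            l -= 1
        if l < 0:
            return None  # no long form found
        s -= 1
        l -= 1
    return preceding_text[l + 1:].strip()

# e.g. "...the Unified Medical Language System (UMLS)..."
print(find_long_form("UMLS", "the Unified Medical Language System"))
```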

    Web knowledge bases

    Knowledge is key to natural language understanding. References to specific people, places and things in text are crucial to resolving ambiguity and extracting meaning. Knowledge Bases (KBs) codify this information for automated systems, enabling applications such as entity-based search and question answering. This thesis explores the idea that sites on the web may act as a KB, even if that is not their primary intent. Dedicated KBs like Wikipedia are a rich source of entity information, but are built and maintained at an ongoing cost in human effort. As a result, they are generally limited in the breadth and depth of knowledge they index about entities. Web knowledge bases offer a distributed solution to the problem of aggregating entity knowledge. Social networks aggregate content about people, news sites describe events with tags for organizations and locations, and a diverse assortment of web directories aggregate statistics and summaries for long-tail entities notable within niche movie, music and sporting domains. We aim to develop the potential of these resources for both web-centric entity Information Extraction (IE) and structured KB population. We first investigate the problem of Named Entity Linking (NEL), where systems must resolve ambiguous mentions of entities in text to their corresponding nodes in a structured KB. We demonstrate that entity disambiguation models derived from inbound web links to Wikipedia are able to complement, and in some cases completely replace, the role of resources typically derived from the KB. Building on this work, we observe that any page on the web which reliably disambiguates inbound web links may act as an aggregation point for entity knowledge. To uncover these resources, we formalize the task of Web Knowledge Base Discovery (KBD) and develop a system to automatically infer the existence of KB-like endpoints on the web. While extending our framework to multiple KBs increases the breadth of available entity knowledge, we must still consolidate references to the same entity across different web KBs. We investigate this task of Cross-KB Coreference Resolution (KB-Coref) and develop models for efficiently clustering coreferent endpooints across web-scale document collections. Finally, assessing the gap between unstructured web knowledge resources and those of a typical KB, we develop a neural machine translation approach which transforms entity knowledge between unstructured textual mentions and traditional KB structures. The web has great potential as a source of entity knowledge. In this thesis we aim to first discover, then distill, and finally transform this knowledge into forms which will be useful in downstream language understanding tasks.
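    A toy sketch of the link-derived disambiguation idea: rank candidate KB nodes for a mention by how often web anchors with that text point at each node. The counts and targets below are invented for illustration and are not data from the thesis.

```python
# Toy link-derived entity disambiguation: resolve a mention to the KB node
# most frequently targeted by web anchors with that text (counts invented).
from collections import Counter

# mention text -> Counter of observed link targets (hypothetical statistics)
anchor_stats = {
    "jaguar": Counter({"Jaguar_Cars": 120, "Jaguar_(animal)": 45,
                       "Jacksonville_Jaguars": 30}),
}

def link_entity(mention):
    """Resolve a mention to its most frequently linked KB node, if any."""
    candidates = anchor_stats.get(mention.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

print(link_entity("Jaguar"))  # -> "Jaguar_Cars"
```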

    Multilingual evaluation of KnowNet

    This paper presents a new, fully automatic method for building highly dense and accurate knowledge bases from existing semantic resources. Basically, the method uses a wide-coverage and accurate knowledge-based Word Sense Disambiguation algorithm to assign the most appropriate senses to large sets of topically related words acquired from the web. KnowNet, the resulting knowledge base, which connects large sets of semantically related concepts, is a major step towards the autonomous acquisition of knowledge from raw corpora. In fact, KnowNet is several times larger than any available knowledge resource encoding relations between synsets, and the knowledge KnowNet contains outperforms any other resource when it is empirically evaluated in a common multilingual framework.

    ARCA. Semantic exploration of a bookstore

    In this demo paper, we present ARCA, a visual-search-based system that allows the semantic exploration of a bookstore. Navigating a domain-specific knowledge graph, students and researchers alike can start from any specific concept and reach any other related concept, discovering associated books and information. To achieve this paradigm of interaction, we built a prototype system, flexible and adaptable to multiple contexts of use, that extracts semantic information from the contents of a corpus of books, building a dedicated knowledge graph that is linked to external knowledge bases. The web-based user interface of ARCA integrates text-based search, visual knowledge graph navigation, and a linear visualization of filtered books (ordered according to multiple criteria) in a comprehensive coordinated view, aimed at exploiting the underlying data while avoiding information overload and unnecessary clutter. A proof-of-concept of ARCA is available online at http://arca.diag.uniroma1.i
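    The concept-to-concept navigation paradigm can be sketched over a toy in-memory graph; ARCA's actual graph, schema, and book data are not given in this abstract, so everything below is invented for illustration.

```python
# Toy concept graph: each concept lists related concepts and associated
# books. Names and titles are illustrative placeholders.
toy_graph = {
    "semantics": {"related": ["ontology", "linguistics"],
                  "books": ["Speech and Language Processing"]},
    "ontology": {"related": ["knowledge graph"],
                 "books": ["Foundations of Semantic Web Technologies"]},
}

def explore(concept, depth=1):
    """Walk outward from a concept, listing reachable concepts and books."""
    node = toy_graph.get(concept)
    if node is None:
        return
    print(concept, "->", node["books"])
    if depth > 0:
        for neighbor in node["related"]:
            explore(neighbor, depth - 1)

explore("semantics")
```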

    Highlighting relevant concepts from Topic Signatures

    This paper presents deepKnowNet, a new fully automatic method for building highly dense and accurate knowledge bases from existing semantic resources. Basically, the method applies a knowledge-based Word Sense Disambiguation algorithm to assign the most appropriate WordNet sense to large sets of topically related words acquired from the web, named TSWEB. This Word Sense Disambiguation algorithm is the personalized PageRank algorithm implemented in UKB. The new method improves the current content of WordNet by automatic means, creating large volumes of new and accurate semantic relations between synsets. KnowNet was our first attempt towards the acquisition of large volumes of semantic relations, but it had some limitations that have been overcome with deepKnowNet. deepKnowNet disambiguates the first hundred words of all Topic Signatures from the web (TSWEB). In this case, the method highlights the most relevant word senses of each Topic Signature and filters out the ones that are not closely related to the topic. In fact, the knowledge it contains outperforms any other resource when it is empirically evaluated in a common framework based on a similarity task annotated with human judgements.
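    The personalized PageRank step is easy to illustrate on a toy sense graph. The real system runs UKB over the full WordNet graph; the sketch below substitutes networkx and a handful of invented nodes.

```python
# Toy personalized PageRank WSD: nodes are word senses, edges are semantic
# relations, and the topic words seed the random walk (all nodes invented).
import networkx as nx

g = nx.Graph([("bank#finance", "money#n"), ("bank#river", "water#n"),
              ("money#n", "loan#n")])

# Seeding the walk with topic words concentrates probability on senses
# related to the topic signature.
seeds = {"money#n": 1.0, "loan#n": 1.0}
ranks = nx.pagerank(g, personalization=seeds)

# Pick the highest-ranked sense of the ambiguous word "bank".
best = max(("bank#finance", "bank#river"), key=ranks.get)
print(best)  # expected: bank#finance
```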

    Integrating Distributed Sources of Information for Construction Cost Estimating using Semantic Web and Semantic Web Service technologies

    A construction project requires the collaboration of several organizations, such as the owner, designer, contractor, and material suppliers. These organizations need to exchange information to enhance their teamwork. Understanding the information received from other organizations requires specialized human resources. Construction cost estimating is one of the processes that requires information from several sources, including a building information model (BIM) created by designers, estimating assembly and work item information maintained by contractors, and construction material cost data provided by material suppliers. Currently, it is not easy to integrate the information necessary for cost estimating over the Internet. This paper discusses a new approach to construction cost estimating that uses Semantic Web technology. Semantic Web technology provides an infrastructure and a data modeling format that enables accessing, combining, and sharing information over the Internet in a machine-processable format. The estimating approach presented in this paper relies on BIM, estimating knowledge, and construction material cost data expressed in a web ontology language, and it makes the various sources of estimating data accessible as SPARQL (SPARQL Protocol and RDF Query Language) endpoints or Semantic Web Services. We present an estimating application that integrates distributed information provided by project designers, contractors, and material suppliers for preparing cost estimates. The purpose of this paper is not to fully automate the estimating process but to streamline it by reducing human involvement in repetitive cost estimating activities.
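    A minimal sketch of what querying one participant's endpoint might look like, using SPARQLWrapper; the endpoint URL, the est: ontology, and the properties are hypothetical placeholders rather than the paper's actual schema.

```python
# Sketch: fetch unit costs from a material supplier's SPARQL endpoint.
# The URL and the est: vocabulary are hypothetical placeholders.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("http://supplier.example.org/sparql")
endpoint.setQuery("""
    PREFIX est: <http://example.org/estimating#>
    SELECT ?material ?unitCost WHERE {
        ?material a est:ConstructionMaterial ;
                  est:unitCost ?unitCost .
    }
""")
endpoint.setReturnFormat(JSON)

# Each participating organization exposes its own endpoint; an estimating
# application would issue queries like this against each of them and
# combine the results into a cost estimate.
for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["material"]["value"], row["unitCost"]["value"])
```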