
    Document Clustering based on Topic Maps

    The importance of document clustering is now widely acknowledged by researchers for better management, smart navigation, efficient filtering, and concise summarization of large collections of documents such as the World Wide Web (WWW). The next challenge lies in performing clustering based on the semantic contents of documents. The problem of document clustering has two main components: (1) representing the document in a form that inherently captures the semantics of the text, which may also help to reduce the dimensionality of the document; and (2) defining a similarity measure, based on this semantic representation, that assigns higher numerical values to document pairs with a stronger semantic relationship. The feature space of documents can be very challenging for document clustering: a document may contain multiple topics, a large set of class-independent general words, and only a handful of class-specific core words. With these features in mind, traditional agglomerative clustering algorithms, which are based on either the Document Vector Model (DVM) or the Suffix Tree Clustering model (STC), are less effective at producing results of high cluster quality. This paper introduces a new approach for document clustering based on a Topic Map representation of the documents. Each document is transformed into a compact form, and a similarity measure is proposed based on the information inferred from the topic map's data and structure. The suggested method is implemented using agglomerative hierarchical clustering and tested on standard information retrieval (IR) datasets. The comparative experiments reveal that the proposed approach is effective in improving cluster quality.
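
    The abstract does not give the proposed similarity formula; as a rough, hypothetical illustration of the general pipeline it describes (topics extracted per document, pairwise similarity scored over shared topic-map content, then agglomerative hierarchical clustering), the following Python sketch substitutes a simple Jaccard overlap of topic sets for the paper's measure. All data and names are illustrative, not taken from the paper.

        # Illustrative sketch only: the paper's actual topic-map similarity measure is
        # not reproduced here. We approximate "shared topic-map content" with Jaccard
        # overlap of per-document topic sets and feed the resulting distances to
        # agglomerative (hierarchical) clustering via SciPy.
        from itertools import combinations
        from scipy.cluster.hierarchy import linkage, fcluster

        # Hypothetical topic sets inferred from each document's topic map.
        doc_topics = {
            "d1": {"clustering", "similarity", "topic map"},
            "d2": {"clustering", "suffix tree", "document model"},
            "d3": {"ontology", "topic map", "navigation"},
        }
        docs = sorted(doc_topics)

        def topic_similarity(a, b):
            """Jaccard overlap of topic sets (stand-in for the paper's measure)."""
            ta, tb = doc_topics[a], doc_topics[b]
            return len(ta & tb) / len(ta | tb)

        # Condensed distance vector (1 - similarity) in the pair order SciPy expects.
        distances = [1.0 - topic_similarity(a, b) for a, b in combinations(docs, 2)]

        # Agglomerative hierarchical clustering with average linkage.
        tree = linkage(distances, method="average")
        labels = fcluster(tree, t=2, criterion="maxclust")
        print(dict(zip(docs, labels)))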

    The Intuitive Appeal of Explainable Machines

    Algorithmic decision-making has become synonymous with inexplicable decision-making, but what makes algorithms so difficult to explain? This Article examines what sets machine learning apart from other ways of developing rules for decision-making and the problem these properties pose for explanation. We show that machine learning models can be both inscrutable and nonintuitive and that these are related, but distinct, properties. Calls for explanation have treated these problems as one and the same, but disentangling the two reveals that they demand very different responses. Dealing with inscrutability requires providing a sensible description of the rules; addressing nonintuitiveness requires providing a satisfying explanation for why the rules are what they are. Existing laws like the Fair Credit Reporting Act (FCRA), the Equal Credit Opportunity Act (ECOA), and the General Data Protection Regulation (GDPR), as well as techniques within machine learning, are focused almost entirely on the problem of inscrutability. While such techniques could allow a machine learning system to comply with existing law, doing so may not help if the goal is to assess whether the basis for decision-making is normatively defensible. In most cases, intuition serves as the unacknowledged bridge between a descriptive account and a normative evaluation. But because machine learning is often valued for its ability to uncover statistical relationships that defy intuition, relying on intuition is not a satisfying approach. This Article thus argues for other mechanisms for normative evaluation. To know why the rules are what they are, one must seek explanations of the process behind a model’s development, not just explanations of the model itself.

    PMET: Precise Model Editing in a Transformer

    Model editing techniques, which modify a minor proportion of the knowledge in Large Language Models (LLMs) at relatively low cost, have demonstrated notable success. Existing methods assume that Transformer Layer (TL) hidden states are the values of the key-value memories of the Feed-Forward Network (FFN). They usually optimize the TL hidden states to memorize target knowledge and use them to update the FFN weights in LLMs. However, the information flow into TL hidden states comes from three parts: Multi-Head Self-Attention (MHSA), the FFN, and residual connections. Existing methods neglect the fact that the TL hidden states contain information not specifically required by the FFN; consequently, the performance of model editing decreases. To achieve more precise model editing, we analyze the hidden states of the MHSA and FFN, finding that MHSA encodes certain general knowledge-extraction patterns. This implies that MHSA weights do not require updating when new knowledge is introduced. Based on the above findings, we introduce PMET, which simultaneously optimizes the Transformer Component (TC, namely MHSA and FFN) hidden states, while using only the optimized TC hidden states of the FFN to precisely update the FFN weights. Our experiments demonstrate that PMET exhibits state-of-the-art performance on both the COUNTERFACT and zsRE datasets. Our ablation experiments substantiate the effectiveness of our enhancements, further reinforcing the finding that MHSA encodes certain general knowledge-extraction patterns and indicating that it stores a small amount of factual knowledge. Our code is available at https://github.com/xpq-tech/PMET.git. (Preprint, under review.)
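
    PMET's actual optimization of TC hidden states is not spelled out in the abstract; the sketch below only illustrates the key-value-memory view of the FFN that such editing methods build on, using a rank-one least-squares update that writes a new key-value association into the FFN weight matrix. The function name, tensor shapes, and the identity covariance are assumptions for illustration, not the authors' implementation.

        # Illustrative sketch (not the authors' code): the key-value-memory view treats
        # the FFN's second linear layer W as a map from "keys" k (post-activation
        # hidden states) to "values" v. Given an optimized target value v_target for a
        # key k, a rank-one update inserts the new association while limiting damage to
        # W's outputs for unrelated keys. PMET additionally optimizes MHSA+FFN hidden
        # states jointly but writes only into the FFN; that optimization loop is omitted.
        import torch

        def rank_one_ffn_update(W: torch.Tensor, k: torch.Tensor,
                                v_target: torch.Tensor, C_inv: torch.Tensor) -> torch.Tensor:
            """Return an updated weight matrix so that (W + dW) @ k == v_target.

            W:        (d_out, d_in) FFN value matrix.
            k:        (d_in,) key vector for the edited prompt.
            v_target: (d_out,) optimized FFN hidden state encoding the new fact.
            C_inv:    (d_in, d_in) inverse covariance of typical keys (regularizer).
            """
            residual = v_target - W @ k          # what the current FFN gets wrong
            direction = C_inv @ k                # update direction in key space
            scale = k @ direction                # normalization term k^T C^{-1} k
            dW = torch.outer(residual, direction) / scale
            return W + dW

        # Tiny usage example with random tensors (real use: extract k and v_target
        # from the LLM being edited).
        d_in, d_out = 8, 4
        W = torch.randn(d_out, d_in)
        k = torch.randn(d_in)
        v_target = torch.randn(d_out)
        W_new = rank_one_ffn_update(W, k, v_target, torch.eye(d_in))
        print((W_new @ k - v_target).abs().max())   # ~0: the key now maps to the target value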

    Topic Maps and library and information science : an exploratory study of Topic Maps principles from a Knowledge and Information Organization perspective

    Purpose: This master thesis attempts to present a ‘state of the art’ of the place of Topic Maps (ISO 13250) in Library and Information Science, through an extensive literature review and a synthesis based on their principles. It is situated within a Knowledge and Information Organization perspective, represented by Elaine Svenonius’s work The Intellectual Foundation of Information Organization and some of the concepts of Knowledge Organization. The thesis also intends to present a conceptual and theoretical framework for future research. Design/methodology/approach: The study presents a qualitative approach based on Grounded Theory principles to analyse the literature and build the conceptual framework for its analysis. The literature reviewed consisted of more than sixty documents, including journal articles, conference presentations and papers, student reports and theses, and a book chapter. This was complemented with information obtained from mailing lists, blog postings, and websites, as well as some unstructured interviews. Findings: Topic Maps appears to be a development aligned with the tradition of Knowledge and Information Organization, but completely adapted to the context of the Web and digital environments. From a LIS perspective, it is a bibliographic meta-language able to represent, extend, and, above all, integrate all existing Knowledge Organization Systems in a standards-based generic model applicable to digital content and online presentation. Conceptually, Topic Maps sits at the borders of the LIS discipline with Knowledge Representation and Computer Science, where LIS conceptual models play the role of intermediaries by providing the ontologies for the ‘bibliographic universe’. Topic Maps questions traditional LIS views and principles: even though some of them remain the same, such as the meaning-based identification of entities, the notions of ‘document’ and ‘subject’ require further study. Some important applications demonstrate the capabilities and potential for further developments and research on Topic Maps in LIS. The main field of application is the Digital Humanities, particularly the presentation of TEI-encoded texts. Joint Master Degree in Digital Library Learning (DILL).

    Topic maps : da sintaxe à semântica (Topic maps: from syntax to semantics)

    Doctoral dissertation in Informatics (Dissertação de doutoramento em Informática). According to the Topic Maps Data Model (Garshol and Moore, 2005), Topic Maps are abstract structures that can encode knowledge and connect this encoded knowledge to relevant information resources. Topic Maps allow a domain knowledge representation as semantic networks composed of topics and associations. Nowadays, almost all topic maps are built by hand. This kind of editing is time consuming and carries significant financial costs: although there are several tools for topic map editing, they lose efficiency once a topic map reaches a considerable number of topics and associations, and they lack a topic map semantic validator, so it is difficult for the user to verify whether the semantics of the topic map matches their intent. In order to cope with a broad range of scenarios, a topic is a very wide concept. On the one hand, this makes Topic Maps a convenient model for knowledge representation; on the other hand, it can put the topic map's consistency at risk. A set of semantic constraints must be imposed on the topic map in order to guarantee its consistency. The Topic Maps standard does not provide language constructs to specify these semantics, so it is not possible to derive from the standard any mechanism to validate a topic map against contextual rules. It is therefore necessary to complement the ISO 13250 standard with support for constraint definitions, enabling the creation of a processor for automatic topic map validation. The main contribution of this thesis is a constraint language for topic maps, called XTche, and its processor. The XTche language is based on the requirements recently proposed in TMCL (Topic Map Constraint Language) (Nishikawa, Moore, and Bogachev, 2004). It allows complementing the description of the semantic network structure (composed of topics and associations) with schema, contextual, and existence constraints, thus defining the semantics of a topic map that should be preserved. The other contribution of this thesis is Metamorphosis, an environment that can extract data from information resources and build a topic map according to a specification, validate it, and generate a conceptual navigation over the topic map's knowledge. Metamorphosis, a Topic Maps oriented environment, generates conceptual navigators for heterogeneous information resources, providing the desired interoperability. Metamorphosis' architecture is composed of: (1) Oveia, a processor that builds topic maps; its core extracts topic instances from the information resources and builds a topic map, reading and processing the XSDS and XS4TM specifications, and the generated topic map is stored as an XTM (XML Topic Maps) file or, alternatively, in a relational database following the OntologyDB approach; (2) the XTche processor, which consumes the XTM file and validates the topic map against a set of constraints defined in the XTche language; (3) the Ulisses processor, which produces a whole semantic website based on a valid topic map; this website is a set of pages that displays all the information concerning topics and associations and provides a conceptual navigation over the semantic network (the topic map). These components, some of which have alternative implementations or more than one working version, can also be used separately, as demonstrated in the case studies carried out.
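
    The concrete XTche syntax is not shown in the abstract; purely to illustrate the kind of existence constraint it can express, here is a hypothetical Python check over an XTM file that verifies every topic carries at least one base name. Element names follow XTM 1.0; everything else (function names, file handling) is illustrative, not part of XTche or Metamorphosis.

        # Sketch of the kind of existence constraint XTche expresses, checked in plain
        # Python over an XTM (XML Topic Maps) file. This only illustrates the underlying
        # idea: "every topic must have at least one base name".
        import sys
        import xml.etree.ElementTree as ET

        def local_name(tag: str) -> str:
            """Strip the XML namespace prefix from an element tag."""
            return tag.rsplit("}", 1)[-1]

        def check_every_topic_has_basename(xtm_path: str) -> list:
            """Return ids of topics that violate the existence constraint."""
            root = ET.parse(xtm_path).getroot()
            violations = []
            for elem in root.iter():
                if local_name(elem.tag) != "topic":
                    continue
                has_name = any(local_name(child.tag) == "baseName" for child in elem)
                if not has_name:
                    violations.append(elem.get("id", "<no id>"))
            return violations

        if __name__ == "__main__":
            bad = check_every_topic_has_basename(sys.argv[1])
            print("valid" if not bad else f"unnamed topics: {bad}")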

    Design of a CMDB with integrated knowledge management based on Topic Maps

    Configuration management databases (CMDBs) have gained popularity in enterprises due to their role in providing efficient IT resource and service management. Enterprises are becoming more competitive by increasing resource utilization to support their business services. Existing configuration management database implementations are known to have serious problems, introducing security and maintenance issues. They use a centralized approach implemented via a complex logical database model, and this complexity reduces the possibility for enterprises to achieve competitive advantage; moreover, implementing such a complex model takes time. There is room for a new logical database model. Cfengine's approach treats the logical database not as a traditional inventory but as a knowledge base, a semantic web of information that connects various aspects of configuration management. This thesis considers the design of such a logical database model, and its Topic Maps model, for Cfengine 3, which is a machine-learning approach. The developed model is characterized as being easily manageable, easy to implement, extensible, and optimized for the updating process.
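
    As a hypothetical sketch of the general idea (configuration items as topics, typed relationships as associations, so the CMDB reads as a semantic network rather than a flat inventory), the following Python fragment models two configuration items and one association. The thesis' actual schema for Cfengine 3 is not reproduced; all identifiers are illustrative.

        # Hypothetical sketch of a Topic Maps backed CMDB: configuration items become
        # topics, and relationships between them (e.g. "runs-on", "depends-on") become
        # typed associations over which the CMDB can be navigated.
        from dataclasses import dataclass, field

        @dataclass
        class Topic:
            id: str
            type: str                      # e.g. "host", "service", "policy"
            names: list = field(default_factory=list)

        @dataclass
        class Association:
            type: str                      # e.g. "runs-on", "depends-on"
            roles: dict                    # role name -> topic id

        topics = {
            "web01": Topic("web01", "host", ["web01.example.org"]),
            "httpd": Topic("httpd", "service", ["Apache HTTP Server"]),
        }
        associations = [
            Association("runs-on", {"service": "httpd", "host": "web01"}),
        ]

        # Simple conceptual navigation: which services run on a given host?
        def services_on(host_id: str) -> list:
            return [a.roles["service"] for a in associations
                    if a.type == "runs-on" and a.roles.get("host") == host_id]

        print(services_on("web01"))        # -> ['httpd']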

    Survey over Existing Query and Transformation Languages

    A widely acknowledged obstacle to realizing the vision of the Semantic Web is the inability of many current Semantic Web approaches to cope with data available in such diverging representation formalisms as XML, RDF, or Topic Maps. A common query language is the first step to allow transparent access to data in any of these formats. To further the understanding of the requirements and approaches proposed for query languages in the conventional Web as well as the Semantic Web, this report surveys a large number of query languages for accessing XML, RDF, or Topic Maps. This is the first systematic survey to consider query languages from all these areas. From the detailed survey of these query languages, a common classification scheme is derived that is useful for understanding and differentiating languages within and among all three areas.

    Topic Maps : a bibliometric study

    Topic Maps is an international standard (ISO/IEC 13250) for describing and encoding knowledge structures and associating them with relevant information resources. This thesis investigates what has been written about Topic Maps from 2000 to 2011, as well as the research and publication trends in Topic Maps. The study was based on a quantitative methodology, namely bibliometric analysis. The data were collected from the Scopus and Web of Knowledge databases using the search keywords “topic map”, “topic maps”, and “ISO/IEC 13250”. A total of 356 publications (265 conference papers, 91 journal articles) from 2001 to 2011 were included in the analysis. The findings revealed that Topic Maps researchers preferred to present their findings at conferences rather than in journals. The authorship pattern leaned towards co-authorship, and most researchers co-authored locally, as international collaboration was very low. Computer science and library and information science related journals were the favourite publishing venues, and the majority of the conferences were related to computer science and education. The focus of Topic Maps research was on data integration and interoperability (2001-2004), information theory (2005-2008), and knowledge and intelligent based systems (2009-2011). Five themes were identified, namely content management, repository, ontology, information architecture, retrieval and navigation, and semantic web. Future research areas will possibly include collaborative e-learning systems, knowledge visualization systems, visualization construction, semantic metadata creation from relational databases, knowledge navigation and retrieval improvement, intelligent topic maps, distributed knowledge management based on extended topic maps, knowledge service systems, knowledge representation modeling, and multi-granularity and multi-level knowledge. Joint Master Degree in Digital Library Learning (DILL).
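
    The thesis' raw Scopus and Web of Knowledge exports are not available here; assuming a hypothetical table with one row per publication, a minimal pandas sketch of the kind of counting behind such a bibliometric analysis could look as follows.

        # Illustrative only: given a hypothetical record set with one row per
        # publication (year, document type), count conference papers vs. journal
        # articles per year and overall, as a bibliometric study would report.
        import pandas as pd

        records = pd.DataFrame({
            "year": [2001, 2005, 2005, 2010],
            "type": ["conference", "conference", "journal", "conference"],
        })

        by_year = records.groupby(["year", "type"]).size().unstack(fill_value=0)
        print(by_year)
        print(records["type"].value_counts())   # overall conference vs. journal split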

    Enriching open-world knowledge graphs with expressive negative statements

    Machine knowledge about entities and their relationships has been a long-standing goal for AI researchers. Over the last 15 years, thousands of public knowledge graphs have been automatically constructed from various web sources. They are crucial for use cases such as search engines. Yet, existing web-scale knowledge graphs focus on collecting positive statements, and store very little to no negatives. Due to their incompleteness, the truth of absent information remains unknown, which compromises the usability of the knowledge graph. In this dissertation: First, I make the case for selective materialization of salient negative statements in open-world knowledge graphs. Second, I present our methods to automatically infer them from encyclopedic and commonsense knowledge graphs, by locally inferring closed-world topics from reference comparable entities. I then discuss our evaluation findings on metrics such as correctness and salience. Finally, I conclude with open challenges and future opportunities.
    Machine knowledge about entities and their attributes is an important component of many AI applications. Web-scale knowledge graphs store almost only positive statements and overlook negative statements. Because open-world knowledge graphs are incomplete, missing statements are treated as unknown rather than false. This dissertation argues for enriching knowledge graphs with informative statements that do not hold, thereby improving their value for applications such as question answering and entity summarization. With potentially billions of candidate negative statements, we tackle four main challenges. 1. Correctness (or plausibility) of negative statements: under the open-world assumption (OWA), it is not enough to check that a negative candidate is not explicitly stated as positive in the knowledge graph, since it may simply be a missing statement. Methods for checking large sets of candidates and eliminating false positives are crucial. 2. Salience of negative statements: the set of correct negative statements is very large but full of trivial or nonsensical statements, e.g., “a cat cannot store data”. Methods for quantifying the informativeness of negatives are needed. 3. Coverage of topics: depending on the data source and the candidate retrieval methods, some topics or entities in the knowledge graph may receive no negative candidates. Methods must ensure the ability to discover negatives for almost any existing entity. 4. Complex negative statements: in some cases, expressing a negation requires more than a single knowledge graph triple. For example, “Einstein received no education” is an incorrect negation, but “Einstein received no education at a US university” is correct. Methods for generating complex negations are needed. This dissertation addresses these challenges as follows. 1. We first argue for the selective materialization of negative statements about entities in encyclopedic (well-canonicalized) open-world knowledge graphs, and formally define three kinds of negative statements: grounded, universally absent, and conditional negative statements. We introduce the peer-based negation inference method to produce lists of salient negations about entities. The method computes relevant peers for a given input entity and uses their positive properties to set expectations for the input entity. An expectation that is not met is an immediate negative candidate and is then scored using frequency, importance, and unexpectedness metrics. 2. We propose the pattern-based query-log extraction method to extract salient negations from large text sources. This method extracts salient negations about an entity by mining large corpora, e.g., search engine query logs, using a few hand-crafted patterns with negative keywords. 3. We introduce the UnCommonsense method to generate salient negative phrases about everyday concepts in less canonicalized commonsense knowledge graphs. This method is designed for the inference, checking, and ranking of short natural-language phrases as negations. It computes comparable concepts for a given target concept, derives negations from the comparison of their positive candidates, and checks these candidates against the knowledge graph itself as well as against language models (LMs) as an external knowledge source. Finally, the candidates are ranked using semantic similarity and frequency measures. 4. To facilitate the exploration of our methods and their results, we implement two prototype systems. Wikinegata is a system presenting the peer-based method, in which users can explore negative statements about 500K entities from 11 classes and adjust various parameters of the peer-based inference method; they can also query the knowledge graph using a search mask with negated predicates. In the UnCommonsense system, users can inspect exactly what the method produces at each step and browse negations for 8K everyday concepts. In addition, using the peer-based negation inference method, we build the first large-scale dataset on demographics and outliers in communities of interest and show its usefulness in use cases such as identifying underrepresented groups. 5. We release all datasets and source code produced in these projects at https://www.mpi-inf.mpg.de/negation-in-kbs and https://www.mpi-inf.mpg.de/Uncommonsense
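
    As a minimal, hypothetical sketch of the peer-based negation inference idea summarized above (peers' positive properties set expectations; unmet expectations become candidate negations, ranked here simply by peer frequency), consider the following Python fragment. The toy knowledge graph, entity names, and scoring are illustrative only and do not reproduce the dissertation's actual ranking metrics.

        # Sketch of peer-based negation inference: properties frequent among peers but
        # absent for the target entity become candidate negative statements.
        from collections import Counter

        kg = {   # entity -> set of positive properties (toy open-world KG)
            "physicist_a": {"won:NobelPrize", "field:physics", "education:university"},
            "physicist_b": {"won:NobelPrize", "field:physics"},
            "physicist_c": {"field:physics", "education:university"},
            "target":      {"field:physics"},
        }

        def peer_based_negations(target: str, peers: list, top_k: int = 3):
            expectations = Counter()
            for peer in peers:
                expectations.update(kg[peer])
            candidates = [(prop, count / len(peers))
                          for prop, count in expectations.items()
                          if prop not in kg[target]]      # unmet expectations
            # Higher peer frequency -> more salient candidate negation.
            return sorted(candidates, key=lambda x: -x[1])[:top_k]

        print(peer_based_negations("target", ["physicist_a", "physicist_b", "physicist_c"]))
        # -> candidates such as ('won:NobelPrize', ...) and ('education:university', ...)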