
    Deviation detection in text using conceptual graph interchange format and error tolerance dissimilarity function

    The rapid increase in the amount of textual data has generated a growing research interest in mining text to detect deviations. Specialized methods for specific domains have emerged to satisfy various needs in discovering rare patterns in text. This paper focuses on a graph-based approach to text representation and presents a novel error-tolerance dissimilarity algorithm for deviation detection. We address two non-trivial problems: the semantic representation of text and the complexity of graph matching. We employ the Conceptual Graph Interchange Format (CGIF), a knowledge representation formalism, to capture the structure and semantics of sentences, and we propose a novel error-tolerance dissimilarity algorithm to detect deviations in the CGIFs. We evaluate our method in the context of analyzing real-world financial statements to identify deviating performance indicators, and show that it performs better than two related text-based graph similarity measures. The proposed method identifies deviating sentences and correlates strongly with expert judgments. Furthermore, it offers error-tolerant matching of CGIFs and retains linear complexity as the number of CGIFs increases.
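
    As an illustration of the idea only (not the authors' published algorithm), the sketch below computes an error-tolerant dissimilarity between two conceptual graphs reduced to sets of (concept, relation, concept) triples: exact triple matches cost nothing, partially matching triples incur a reduced penalty, and unmatched triples a full one. The triple encoding, the 0.5 tolerance penalty, and the example graphs are assumptions made for illustration; the published method is also reported to scale linearly with the number of CGIFs, which this toy version does not attempt to reproduce.

        def triple_distance(t1, t2):
            """0.0 for an exact match, 0.5 for a tolerated partial match
            (same relation, one shared concept), 1.0 otherwise."""
            if t1 == t2:
                return 0.0
            c1, r1, c2 = t1
            d1, r2, d2 = t2
            if r1 == r2 and (c1 == d1 or c2 == d2):
                return 0.5
            return 1.0

        def dissimilarity(graph_a, graph_b):
            """Average, over graph_a's triples, of the best distance to any
            triple in graph_b; 0 means identical, 1 means disjoint."""
            if not graph_a or not graph_b:
                return 0.0 if graph_a == graph_b else 1.0
            total = sum(min(triple_distance(t, u) for u in graph_b) for t in graph_a)
            return total / len(graph_a)

        # A sentence whose triples deviate from a reference scores higher.
        g1 = {("revenue", "attr", "increase"), ("profit", "attr", "growth")}
        g2 = {("revenue", "attr", "decline"), ("profit", "attr", "growth")}
        print(dissimilarity(g1, g2))  # 0.25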

    Dissimilarity algorithm on conceptual graphs to mine text outliers

    Graphical text representation methods such as Conceptual Graphs (CGs) attempt to capture the structure and semantics of documents. As such, they are a preferred text representation approach for a wide range of problems in natural language processing, information retrieval and text mining. In a number of these applications, it is necessary to measure the dissimilarity (or similarity) between the knowledge represented in the CGs. In this paper, we present a dissimilarity algorithm to detect outliers in a collection of texts represented with the Conceptual Graph Interchange Format (CGIF). To avoid the NP-complete problem of graph matching, we introduce the use of a standard CG in the dissimilarity computation. We evaluate our method in the context of analyzing real-world financial statements to identify outlying performance indicators. For evaluation purposes, we compare the proposed dissimilarity function with a dice-coefficient similarity function used in related previous work. Experimental results indicate that our method outperforms the existing method and correlates better with human judgements. Compared with other text outlier detection methods, this approach captures the semantics of documents through the use of CGs and detects outliers with a simple dissimilarity function. Furthermore, the proposed algorithm retains linear complexity as the number of CGs increases.
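
    For reference, a minimal sketch of the dice-coefficient similarity used as the baseline in this evaluation, computed here over sets of CG triples, together with an outlier score against a single standard CG; comparing every document graph only against the standard graph, rather than against all other graphs, is what keeps the cost linear in the number of graphs. The triple representation, function names, and example data are illustrative assumptions, not the paper's implementation.

        def dice_similarity(graph_a, graph_b):
            """Dice coefficient over two sets of CG triples:
            2 * |A intersect B| / (|A| + |B|)."""
            if not graph_a and not graph_b:
                return 1.0
            return 2 * len(graph_a & graph_b) / (len(graph_a) + len(graph_b))

        def outlier_scores(graphs, standard):
            """Dissimilarity of each document graph to the standard CG;
            the highest-scoring graphs are candidate text outliers."""
            return [1.0 - dice_similarity(g, standard) for g in graphs]

        standard = {("company", "attr", "profit"), ("profit", "attr", "growth")}
        docs = [
            {("company", "attr", "profit"), ("profit", "attr", "growth")},  # typical
            {("company", "attr", "loss"), ("loss", "attr", "lawsuit")},     # outlier
        ]
        print(outlier_scores(docs, standard))  # [0.0, 1.0]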

    Legal compliance by design (LCbD) and through design (LCtD) : preliminary survey

    1st Workshop on Technologies for Regulatory Compliance, co-located with the 30th International Conference on Legal Knowledge and Information Systems (JURIX 2017). The purpose of this paper is twofold: (i) to carry out a preliminary survey of the literature and research projects on Compliance by Design (CbD); and (ii) to clarify the double process of (a) extending business management techniques to other regulatory fields, and (b) converging trends in legal theory, legal technology and Artificial Intelligence. The paper highlights the connections and differences we found across different domains and proposals. We distinguish three different policy-driven types of CbD: (i) business, (ii) regulatory, and (iii) legal. The recent deployment of ethical views and the implementation of general principles of privacy and data protection lead to the conclusion that, in order to appropriately define legal compliance, Compliance through Design (CtD) should be differentiated from CbD.

    Integrating data warehouses with web data : a survey

    This paper surveys the most relevant research on combining Data Warehouse (DW) and Web data. It studies the XML technologies that are currently being used to integrate, store, query, and retrieve Web data, and their application to DWs. The paper reviews different distributed DW architectures and the use of XML languages as an integration tool in these systems. It also introduces the problem of dealing with semistructured data in a DW. It studies Web data repositories, the design of multidimensional databases for XML data sources, and the XML extensions of On-Line Analytical Processing (OLAP) techniques. The paper addresses the application of information retrieval technology in a DW to exploit text-rich document collections. The authors hope that the paper will help to identify the main limitations and opportunities offered by the combination of the DW and Web fields, as well as open research lines.
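
    As a small, hypothetical example of the kind of integration the survey reviews, the sketch below flattens a semistructured XML feed from a Web source into rows suitable for a relational fact table. The element names, columns, and sample feed are invented for illustration and do not come from any of the surveyed systems.

        import xml.etree.ElementTree as ET

        XML_FEED = """
        <sales>
          <sale date="2024-01-05" region="EMEA"><amount>1200.50</amount></sale>
          <sale date="2024-01-06" region="APAC"><amount>830.00</amount></sale>
        </sales>
        """

        def xml_to_fact_rows(xml_text):
            """Map each <sale> element onto a (date, region, amount) tuple,
            i.e. one row of a hypothetical sales fact table."""
            root = ET.fromstring(xml_text)
            return [
                (sale.get("date"), sale.get("region"), float(sale.findtext("amount")))
                for sale in root.iter("sale")
            ]

        print(xml_to_fact_rows(XML_FEED))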

    Continuous Process Auditing (CPA): an Audit Rule Ontology Approach to Compliance and Operational Audits

    Continuous Auditing (CA) has been investigated over time and is, to some extent, in practice within financial and transactional auditing as a part of continuous assurance and monitoring. Enterprise Information Systems (EIS) that run their activities in the form of processes require continuous auditing of a process that invokes the action(s) specified in the policies and rules, in a continuous manner and sometimes in real time. This leads to the question: how closely could continuous auditing mimic the actual auditing procedures performed by auditing professionals? We investigate some of these questions through Continuous Process Auditing (CPA), relying on the heterogeneous activities of processes in the EIS, as well as detecting exceptions and evidence in current and historical databases to provide audit assurance.
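
    As a toy illustration of a continuously evaluated audit rule (the rule, field names, and threshold below are assumptions made for the example, not the paper's audit rule ontology), the sketch flags payment records that exceed an approval threshold without a recorded approver; in a continuous setting the same check would run as new records arrive.

        APPROVAL_THRESHOLD = 10_000  # hypothetical policy limit

        def audit_exceptions(transactions):
            """Return the transactions that violate the approval rule."""
            return [
                tx for tx in transactions
                if tx["amount"] > APPROVAL_THRESHOLD and not tx.get("approved_by")
            ]

        transactions = [
            {"id": 1, "amount": 2_500, "approved_by": None},
            {"id": 2, "amount": 15_000, "approved_by": None},   # flagged as an exception
            {"id": 3, "amount": 20_000, "approved_by": "cfo"},
        ]
        print(audit_exceptions(transactions))  # only the record with id 2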

    Development of linguistic linked open data resources for collaborative data-intensive research in the language sciences

    Making diverse data in linguistics and the language sciences open, distributed, and accessible: perspectives from language/language acquisition researchers and technical LOD (linked open data) researchers. This volume examines the challenges inherent in making diverse data in linguistics and the language sciences open, distributed, integrated, and accessible, thus fostering wide data sharing and collaboration. It is unique in integrating the perspectives of language researchers and technical LOD (linked open data) researchers. Reporting on both active research needs in the field of language acquisition and technical advances in the development of data interoperability, the book demonstrates the advantages of an international infrastructure for scholarship in the field of language sciences. With contributions by researchers who produce complex data content and scholars involved in both the technology and the conceptual foundations of LLOD (linguistics linked open data), the book focuses on the area of language acquisition because it involves complex and diverse data sets, cross-linguistic analyses, and urgent collaborative research. The contributors discuss a variety of research methods, resources, and infrastructures. Contributors: Isabelle Barrière, Nan Bernstein Ratner, Steven Bird, Maria Blume, Ted Caldwell, Christian Chiarcos, Cristina Dye, Suzanne Flynn, Claire Foley, Nancy Ide, Carissa Kang, D. Terence Langendoen, Barbara Lust, Brian MacWhinney, Jonathan Masci, Steven Moran, Antonio Pareja-Lora, Jim Reidy, Oya Y. Rieger, Gary F. Simons, Thorsten Trippel, Kara Warburton, Sue Ellen Wright, Claus Zinn.

    Managing Information System Integration Technologies--A Study of Text Mined Industry White Papers

    Industry white papers are increasingly being used to explain the philosophy and operation of a product in a marketplace or technology context. This explanation is used by senior managers for strategic planning in an organization. This research explores the effectiveness of white papers and strategies for managers to learn about technologies using white papers. The research is conducted by collecting industry white papers in the area of Information System Integration and gleaning relevant information through the text-mining tool Vantage Point. The text-mined information is analyzed to provide solutions for practical problems in the systems integration market. The indirect findings of the research are New System Integration Business Models, Methods for Calculating the ROI of System Integration Projects, and Managing Implementation Failures.