315,636 research outputs found

    Kaon-nucleon scattering lengths from kaonic deuterium experiments

    The extraction of the S-wave kaon-nucleon scattering lengths a0 and a1 from a combined analysis of existing kaonic hydrogen and synthetic deuterium data has been carried out within the framework of a low-energy effective field theory. It turns out that with the present DEAR central values for the kaonic hydrogen ground-state energy and width, a solution for a0 and a1 exists only in a restricted domain of input values for the kaon-deuteron scattering length. Consequently, measuring this scattering length imposes stringent constraints on the theoretical description of the kaon-deuteron interactions at low energies. Comment: 9 pages, 2 postscript figures

    Relational Data Mining Through Extraction of Representative Exemplars

    With the growing interest in Network Analysis, Relational Data Mining is becoming a prominent domain of Data Mining. This paper addresses the problem of extracting representative elements from a relational dataset. After defining the notion of degree of representativeness, computed using the Borda aggregation procedure, we present the extraction of exemplars, which are the representative elements of the dataset. We use these concepts to build a network on the dataset. We present the main properties of these notions and propose two typical applications of our framework. The first application consists of summarizing and structuring a set of binary images, and the second of mining co-authoring relations in a research team.

    Visual querying and analysis of large software repositories

    We present a software framework for mining software repositories. Our extensible framework enables the integration of data extraction from repositories with data analysis and interactive visualization. We demonstrate the applicability of the framework by presenting several case studies performed on industry-size software repositories. In each study we use the framework to answer one or several software engineering questions addressing a specific project. Next, we validate the answers by comparing them with existing project documentation, by interviewing domain experts and by detailed analyses of the source code. The results show that our framework can be used both for supporting case studies on mining software repository techniques and for building end-user tools for software maintenance.

    A knowledge-based approach to information extraction for semantic interoperability in the archaeology domain

    The paper presents a method for automatic semantic indexing of archaeological grey-literature reports using empirical (rule-based) Information Extraction techniques in combination with domain-specific knowledge organization systems. Performance is evaluated via the Gold Standard method. The semantic annotation system (OPTIMA) performs the tasks of Named Entity Recognition, Relation Extraction, Negation Detection and Word Sense Disambiguation using hand-crafted rules and terminological resources for associating contextual abstractions with classes of the standard ontology (ISO 21127:2006) CIDOC Conceptual Reference Model (CRM) for cultural heritage and its archaeological extension, CRM-EH, together with concepts from English Heritage thesauri and glossaries. Relation Extraction performance benefits from a syntax-based definition of relation extraction patterns derived from domain-oriented corpus analysis. The evaluation also shows clear benefit in the use of assistive NLP modules relating to word-sense disambiguation, negation detection and noun phrase validation, together with controlled thesaurus expansion. The semantic indexing results demonstrate the capacity of rule-based Information Extraction techniques to deliver interoperable semantic abstractions (semantic annotations) with respect to the CIDOC CRM and archaeological thesauri. Major contributions include recognition of relevant entities using shallow parsing NLP techniques driven by a complementary use of ontological and terminological domain resources, and empirical derivation of context-driven relation extraction rules for the recognition of semantic relationships from phrases of unstructured text. The semantic annotations have proven capable of supporting semantic query, document study and cross-searching via the ontology framework.

    Document Layout Analysis and Recognition Systems

    Automatic extraction of knowledge relevant to domain-specific questions from Optical Character Recognition (OCR) documents is critical for developing intelligent systems, such as document search engines, sentiment analysis, and information retrieval, since manual knowledge extraction by a domain expert from a large volume of documents is labor-intensive, unscalable, and time-consuming. A number of studies have automatically extracted relevant knowledge from OCR documents, using tools such as ABBYY and Stanford Natural Language Processing (NLP). Despite the progress, limitations remain to be solved. For instance, NLP often fails to analyze a large document. In this thesis, we propose a knowledge extraction framework that takes domain-specific questions as input and provides the sentence/paragraph in the document most relevant to the given questions. Overall, our proposed framework has two phases. First, an OCR document is reconstructed into a semi-structured document (a document with a hierarchical structure of (sub)sections and paragraphs). Then, the relevant sentence/paragraph for a given question is identified from the reconstructed semi-structured document. Specifically, we propose (1) a method that converts an OCR document into a semi-structured document using text attributes such as font size, font height, and boldface (in Chapter 2), (2) an image-based machine learning method that extracts the Table of Contents (TOC) to provide an overall structure of the document (in Chapter 3), (3) a document texture-based deep learning method (DoT-Net) that classifies types of blocks such as text, image, and table (in Chapter 4), and (4) a Question & Answer (Q&A) system that retrieves the most relevant sentence/paragraph for a domain-specific question. A large number of document intelligence systems can benefit from our proposed automatic knowledge extraction system to construct a Q&A system for OCR documents.
Our Q&A system has been applied to extract domain-specific information from business contracts at GE Power.

    Local Binary Patterns in Focal-Plane Processing. Analysis and Applications

    Feature extraction is the part of pattern recognition in which the sensor data is transformed into a form more suitable for the machine to interpret. The purpose of this step is also to reduce the amount of information passed to the next stages of the system, and to preserve the information essential for discriminating the data into different classes. For instance, in image analysis the actual image intensities are vulnerable to various environmental effects, such as lighting changes, and feature extraction can be used as a means of detecting features that are invariant to certain types of illumination changes. Finally, classification tries to make decisions based on the previously transformed data. The main focus of this thesis is on developing new methods for embedded feature extraction based on local non-parametric image descriptors. Feature analysis is also carried out for the selected image features. Low-level Local Binary Pattern (LBP) based features play a main role in the analysis. In the embedded domain, the pattern recognition system must usually meet strict performance constraints, such as high speed, compact size and low power consumption. The characteristics of the final system can be seen as a trade-off between these metrics, which is largely affected by the decisions made during the implementation phase. The implementation alternatives of LBP-based feature extraction are explored in the embedded domain in the context of focal-plane vision processors. In particular, the thesis demonstrates LBP extraction with the MIPA4k massively parallel focal-plane processor IC. Higher-level processing is also incorporated into this framework, by means of a framework for implementing a single-chip face recognition system. Furthermore, a new method for determining optical flow based on LBPs, designed in particular for the embedded domain, is presented.
Inspired by some of the principles observed through the feature analysis of the Local Binary Patterns, an extension to the well-known non-parametric rank transform is proposed, and its performance is evaluated in face recognition experiments with a standard dataset. Finally, an a priori model where the LBPs are seen as combinations of n-tuples is also presented.

    Syntactic and Semantic Patterns of Domain-specific Multiword Units in Marine Accident Investigation Reports

    The present study is a systematic corpus-based investigation of the domain-specific multiword units (henceforth MWUs) in marine accident investigation reports (henceforth MAIR), with a view to characterizing their most prominent syntactic, semantic and functional features. To achieve these principal objectives, the target MWUs were first identified by applying a new approach, which incorporates the notion of ‘meaning’ into statistics-based measures. This method ensures domain-specific MWU extraction to the largest extent and provides valid data for the subsequent analysis. By proposing a three-dimensional analytical framework, this study has obtained the following findings: First, the domain-specific MWUs are largely composed of two-word sequences, while occurrences of 4- and 5-word MWUs are relatively rare. Among all the target MWUs, only 1.10% of the expressions occur very commonly within the genre (≥1,000 times). By contrast, the majority of the expressions (70.97%) occur with a frequency of fewer than 100 times. The skewed distribution indicates that the MAIR genre tends to employ a wide variety of domain-specific MWUs rather than repeating a small number of common expressions. Second, in terms of the syntactic features of the domain-specific MWUs, the NP structure is the most commonly employed grammatical type. The abundant use of this structure implies that the domain-specific meaning of the MAIR genre is largely carried in the nominal group. Apart from the NP structure, there is also a marked prevalence of VP structures among the domain-specific MWUs in the MAIR genre, and these MWUs present structural variation. Of all the VP-based patterns, the ‘verb phrase with active verb’ pattern stands out since it incorporates a large number of action verbs, which are used to describe the actions done by people.
The wide use of these phrases implies that the MAIR genre tends to highlight people’s roles during the accidents, with particular attention to the information about what or who caused or performed the activity. Similarly, PP structures were also frequently adopted by the domain-specific MWUs, especially the pattern beginning with the preposition of. This pattern was mostly used to specify possessions. It can thus be inferred that the information provided in the MAIR genre tends to be concrete and specific. Third, by conducting a functional analysis of the target MWUs, it was found that the primary function of the domain-specific MWUs is to express referential meanings and contribute to the thematic development. Furthermore, due to their multifunctional nature, some referential MWUs also perform the functions of stance and discourse organizing. When expressing stance, most MWUs express impersonal epistemic stance, with the purpose of minimizing the imposition of the reporters’ opinions. Other word sequences appear to be deontic in nature, as they are mainly realized by MWUs incorporating require or modal verbs. The primary function of these MWUs is to set out the obligations and issue suggestions for the agents according to certain norms and regulations. When functioning as discourse organizers, the domain-specific MWUs usually adopt the pattern of ‘that-clause controlled by main verbs in active voice’ to introduce the topics. By contrast, when used to elaborate on the topics, they tend to clarify the logical relationships, especially the causative-resultative relation, rather than providing additional information in the MAIR genre. Fourth, the distinctive semantic features of the domain-specific MWUs are best reflected when these MWUs perform the functions of activity identification and specification. For instance, most domain-specific MWUs used for describing activities are of a general nature, but they convey specialized meaning in the MAIR genre.
Similarly, when domain-specific MWUs are used to provide tangible or intangible frames for specifying certain attributes, the use of these MWUs in the MAIR genre deviates significantly from their use in the general English register. In all, by gaining insights into the salient features of the domain-specific MWUs in the MAIR genre, the present study may contribute to the following areas: the construction of an extraction method for domain-specific MWUs, the compilation of a maritime-specific MWU list, the teaching and learning of maritime English, especially the maritime-specific MWUs, and the provision of a reference for writing MAIR for experts from non-native English-speaking countries.

    Abstract
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1. Background of this study
        1.2. Objectives of this study
        1.3. Significance of this study
        1.4. Terminological issues
        1.5. Organization of this dissertation
    Chapter 2 Theoretical background
        2.1. Understanding the notions of phraseology
            2.1.1. An overview of influential notions of phraseology
            2.1.2. Parameters of defining MWUs
            2.1.3. Operational definition of MWUs
            2.1.4. An overview of influential taxonomy of phraseology
        2.2. Theoretical discussion of MWUs
            2.2.1. Theoretical framework of this study
            2.2.2. Nature of multiword units
            2.2.3. Previous studies of phraseology
    Chapter 3 Analytical framework and research design
        3.1. Analytical framework
            3.1.1. Analytical framework for syntactic features of domain-specific MWUs
            3.1.2. Analytical framework for semantic features of domain-specific MWUs
            3.1.3. Analytical framework for functional features of domain-specific MWUs
        3.2. Research questions
        3.3. Corpora used in this study
            3.3.1. Corpus of Marine Accident Investigation Reports (COMAIR)
            3.3.2. British National Corpus Baby (BNC Baby)
        3.4. Tools and procedures for data analysis
            3.4.1. Tools for data processing
            3.4.2. Procedures for data analysis
            3.4.3. Inter-rater reliability
        3.5. Summary
    Chapter 4 Identification of domain-specific MWUs in the COMAIR
        4.1. Current approaches to MWU extraction
        4.2. My proposed approach to domain-specific MWU extraction
        4.3. The detailed process of domain-specific MWU extraction
            4.3.1. Step 1: N-gram retrieval
            4.3.2. Step 2: Keyword-gram extraction
            4.3.3. Step 3: Measuring the association strength of keyword-grams
            4.3.4. Step 4: Filtering out process
            4.3.5. Step 5: Domain-specific MWU identification
    Chapter 5 Frequency distributions and syntactic features of domain-specific MWUs
        5.1. Frequency distributions of domain-specific MWUs
            5.1.1. Frequency distributions of domain-specific MWUs in various lengths
            5.1.2. Overall frequency distribution across different frequency bands
        5.2. Syntactic features of domain-specific MWUs
    Chapter 6 Functional and semantic features of domain-specific MWUs
        6.1. Distributions across primary discourse functions
        6.2. Multiple functioning
        6.3. Stance MWUs
            6.3.1. Notion of stance MWUs
            6.3.2. Stance MWUs in COMAIR
        6.4. Discourse organizing MWUs
            6.4.1. Notion of discourse organizing MWUs
            6.4.2. Discourse organizing MWUs in COMAIR
        6.5. Referential MWUs
            6.5.1. Notion of referential MWUs
            6.5.2. Referential MWUs in COMAIR
        6.6. Summary
    Chapter 7 Conclusions and implications
        7.1. Summary of the major findings
        7.2. Implications of this study
        7.3. Limitations of this study
    References
    Appendix