
    Corpus-Driven Knowledge Acquisition for Discourse Analysis

    The availability of large on-line text corpora provides a natural and promising bridge between the worlds of natural language processing (NLP) and machine learning (ML). In recent years, the NLP community has been aggressively investigating statistical techniques to drive part-of-speech taggers, but application-specific text corpora can be used to drive knowledge acquisition at much higher levels as well. In this paper we show how ML techniques can be used to support knowledge acquisition for information extraction (IE) systems. It is often very difficult to specify an explicit domain model for an information extraction application, and it is always labor intensive to implement hand-coded heuristics for each new domain. We have found that it is nevertheless possible to use ML algorithms to capture knowledge that is only implicitly present in a representative text corpus. Our work addresses issues traditionally associated with discourse analysis and intersentential inference generation, and demonstrates the utility of ML algorithms at this higher level of language analysis. These results bear directly on the portability and scalability of IE technologies: when hand-coded heuristics are used to manage discourse analysis in an IE system, months of programming effort are easily needed to port a successful system to a new domain. We show how ML algorithms can reduce this effort.
    Comment: 6 pages, AAAI-9
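
    As a concrete illustration of the corpus-driven idea mentioned above (statistical taggers learned from annotated text rather than hand-coded rules), here is a minimal NLTK sketch; it shows the general technique only, not the authors' system, and the corpus, split point, and example sentence are arbitrary choices.

```python
# Minimal sketch: a statistical POS tagger trained purely from a corpus,
# with no hand-coded rules -- the corpus-driven idea in miniature.
import nltk
from nltk.corpus import treebank
from nltk.tag import BigramTagger, UnigramTagger

nltk.download("treebank")  # a small tagged corpus distributed with NLTK

sents = treebank.tagged_sents()
train, test = sents[:3000], sents[3000:]

# Bigram statistics back off to unigram statistics, both learned from data.
tagger = BigramTagger(train, backoff=UnigramTagger(train))

print(tagger.accuracy(test))  # held-out tagging accuracy (recent NLTK API)
print(tagger.tag("The company acquired a new subsidiary .".split()))
```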

    Knowledge Extraction from Work Instructions through Text Processing and Analysis

    The objective of this thesis is to design, develop and implement an automated approach to processing historical assembly data, in order to extract useful knowledge about assembly instructions and time studies and thereby facilitate the development of decision support systems for a large automotive original equipment manufacturer (OEM). At a conceptual level, this research establishes a framework for a sustainable and scalable approach to extracting knowledge from big data using techniques from Natural Language Processing (NLP) and Machine Learning (ML). Process sheets are text documents that contain detailed instructions for assembling a portion of the vehicle, specifications of the parts and tools to be used, and a time study. To maintain consistency in the authoring process, assembly process sheets are required to be written in a standardized structure using controlled language. To realize this goal, 567 work instructions from 236 process sheets are parsed with the Stanford parser, using the Natural Language Toolkit (NLTK) as a platform, and a standard vocabulary of 31 verbs is formed. A time study estimates assembly times from a predetermined motion time system, known as MTM, based on factors such as the activity performed by the associate, the difficulty of the assembly, the parts and tools used, and the distance covered. The MTM comprises a set of tables, constructed through statistical analysis and best suited for batch production; the appropriate table is suggested based on the activity described in the work instruction text. Performing time studies for the process sheets manually is time consuming, labor intensive and error-prone. By analyzing 1,019 time study steps from 236 process sheets, a set of IF-THEN rules is developed that guides the user to an appropriate MTM table; the rules are generated computationally by the decision tree algorithm J48 in WEKA, a machine learning software package. A decision support tool is developed to enable testing of the MTM mapping rules; it demonstrates how NLP techniques can read work instructions authored in free-form text and suggest MTM tables to the planner. The accuracy of the MTM mapping rules is found to be 84.6%.
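
    By way of illustration, the rule-learning step described above might look as follows in Python. This is a hypothetical sketch: the thesis uses J48 (WEKA's Java implementation of C4.5), for which scikit-learn's CART-based decision tree is substituted here, and the feature names and MTM table labels are invented.

```python
# Hypothetical sketch: learn IF-THEN rules mapping work-instruction
# features to an MTM table (stand-in for the J48/WEKA setup; feature
# names and table labels below are invented for illustration).
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier, export_text

steps = [  # toy training data: one dict of features per time study step
    {"verb": "install", "tool": 1, "distance_m": 2.0},
    {"verb": "tighten", "tool": 1, "distance_m": 0.5},
    {"verb": "walk",    "tool": 0, "distance_m": 5.0},
    {"verb": "pick",    "tool": 0, "distance_m": 1.0},
]
mtm_tables = ["PLACE", "PROCESS", "BODY_MOTION", "GET"]  # invented labels

vec = DictVectorizer(sparse=False)  # one-hot encodes the verb feature
X = vec.fit_transform(steps)
tree = DecisionTreeClassifier(max_depth=3).fit(X, mtm_tables)

# The learned tree reads directly as nested IF ... THEN rules.
print(export_text(tree, feature_names=list(vec.get_feature_names_out())))
```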

    Text Analytics for Android Project

    Most advanced text analytics and text mining tasks include text classification, text clustering, ontology building, concept/entity extraction, summarization, derivation of patterns within structured data, production of granular taxonomies, sentiment and emotion analysis, entity relation modelling, and interpretation of the output. Existing text analytics and text mining tools, however, cannot develop alternative versions of a text (perform a multivariant design), perform multiple criteria analysis, or automatically select the most effective variant according to different aspects, such as the citation indices of papers and authors (Scopus, ScienceDirect, Google Scholar), Top 25 papers, journal impact factors, supporting phrases, document name and contents, and keyword density; nor can they calculate utility degree and market value. The Text Analytics for Android Project can perform all of these functions. To the best of our knowledge, they have not been implemented previously; this is the first attempt to do so. The Text Analytics for Android Project is briefly described in this article.
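
    For orientation, a utility degree over text variants can in its simplest form be computed as a normalized weighted sum across criteria. The sketch below is hypothetical: the criteria, weights and scores are invented, and the project's actual calculation is not reproduced here.

```python
# Hypothetical sketch: score text variants on several criteria and pick
# the one with the highest utility (normalized weighted sum).
variants = {  # invented criterion values per variant
    "variant_A": {"citations": 120, "impact_factor": 3.2, "keyword_density": 0.04},
    "variant_B": {"citations": 45,  "impact_factor": 5.1, "keyword_density": 0.07},
}
weights = {"citations": 0.5, "impact_factor": 0.3, "keyword_density": 0.2}

# Normalize each (benefit) criterion by its column maximum, then weight.
maxima = {c: max(v[c] for v in variants.values()) for c in weights}
utility = {
    name: sum(weights[c] * vals[c] / maxima[c] for c in weights)
    for name, vals in variants.items()
}
best = max(utility, key=utility.get)
print(best, utility)
```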

    Dal testo alla conoscenza e ritorno: estrazione terminologica e annotazione semantica di basi documentali di dominio [From Text to Knowledge and Back: Terminology Extraction and Semantic Annotation of Domain Document Collections]

    The paper focuses on the automatic extraction of domain knowledge from Italian legal texts and presents a fully implemented ontology learning system (T2K, Text-2-Knowledge) that includes a battery of tools for natural language processing, statistical text analysis and machine learning. Evaluation results show the considerable potential of systems like T2K, which exploit an incremental interleaving of NLP and machine learning techniques for accurate, large-scale, semi-automatic extraction and structuring of domain-specific knowledge.
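
    To make the terminology extraction stage concrete, the following is a minimal, hypothetical sketch of shallow pattern-based term extraction. It is not the T2K implementation, and English text and NLTK are used purely for illustration (T2K itself targets Italian).

```python
# Hypothetical sketch: extract candidate domain terms as frequent
# adjective/noun sequences -- a common shallow pattern for terminology.
from collections import Counter

import nltk

nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

text = ("The contracting party shall notify the competent authority. "
        "The competent authority may suspend the contracting party.")

tagged = nltk.pos_tag(nltk.word_tokenize(text))

# Candidate terms: zero or more adjectives followed by one or more nouns.
chunks = nltk.RegexpParser("TERM: {<JJ>*<NN.*>+}").parse(tagged)
terms = Counter(
    " ".join(word for word, _ in subtree.leaves())
    for subtree in chunks.subtrees(lambda t: t.label() == "TERM")
)
print(terms.most_common(5))  # e.g. 'competent authority', 'contracting party'
```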

    First glances at the transversity parton distribution through dihadron fragmentation functions

    We present first observations of the transversity parton distribution, based on an analysis of pion-pair production in deep inelastic scattering off transversely polarized targets. The extraction of transversity relies on knowledge of dihadron fragmentation functions, which we take from electron-positron annihilation measurements. This is the first attempt to determine the transversity distribution in the framework of collinear factorization.
    Comment: Minor changes in text and references
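
    Schematically, the collinear-factorization relation underlying such an extraction couples transversity to the chiral-odd dihadron fragmentation function (kinematic prefactors and depolarization factors are suppressed here; notation follows the standard dihadron literature):

```latex
A_{UT}^{\sin(\phi_R + \phi_S)}(x, z, M_h) \;\propto\;
  \frac{\sum_q e_q^2 \, h_1^q(x) \, H_1^{\sphericalangle\, q}(z, M_h)}
       {\sum_q e_q^2 \, f_1^q(x) \, D_1^q(z, M_h)}
```

    Here h_1 is the transversity distribution, H_1^∢ the dihadron fragmentation function taken from the e+e- annihilation data, and f_1 and D_1 their unpolarized counterparts.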

    Similar Text Fragments Extraction for Identifying Common Wikipedia Communities

    Similar text fragment extraction from weakly formalized data is a task of natural language processing and intelligent data analysis, used to solve the problem of automatically identifying connected knowledge fields. In order to search for such common communities in Wikipedia, we propose to use, as an additional stage, a logical-algebraic model for extracting similar collocations. With the Stanford Part-Of-Speech tagger and the Stanford Universal Dependencies parser, we identify the grammatical characteristics of collocation words. With WordNet synsets, we choose their synonyms. Our dataset includes Wikipedia articles from different portals and projects. The experimental results show the frequencies of synonymous text fragments in Wikipedia articles that form common information spaces. The number of highly frequent synonymous collocations can serve as an indicator of key common, up-to-date Wikipedia communities.
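
    As a small illustration of the synonym step (choosing synonyms of collocation words via WordNet synsets), here is a hypothetical NLTK sketch; it is not the authors' code, and the example words are arbitrary.

```python
# Hypothetical sketch: gather WordNet synonyms for a collocation word so
# that synonymous text fragments across articles can be matched.
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet")

def synonyms(word, pos=wn.NOUN):
    """All WordNet lemma names that can stand in for `word`."""
    return {
        lemma.name().replace("_", " ")
        for synset in wn.synsets(word, pos=pos)
        for lemma in synset.lemmas()
    }

# Two collocation heads count as similar if their synonym sets overlap.
print(synonyms("field") & synonyms("domain"))  # e.g. {'sphere', 'area', ...}
```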