
    Constructive Ontology Engineering

    The proliferation of the Semantic Web depends on ontologies for knowledge sharing, semantic annotation, data fusion, and descriptions of data for machine interpretation. However, ontologies are difficult to create and maintain. In addition, their structure and content may vary depending on the application and domain. Several methods described in the literature have been used to create ontologies from various data sources, such as structured data in databases or unstructured text found in text documents or HTML documents. Various data mining techniques, natural language processing methods, syntactical analysis, machine learning methods, and other techniques have been used to build ontologies with automated and semi-automated processes. Due to the vast amount of unstructured text and its continued proliferation, the problem of constructing ontologies from text has attracted considerable research attention. However, the constructed ontologies may be noisy, with missing and incorrect knowledge. Thus, ontology construction continues to be a challenging research problem. The goal of this research is to investigate a new method for guiding a process of extracting and assembling candidate terms into domain-specific concepts and relationships. The process is part of an overall semi-automated system for creating ontologies from unstructured text sources and is driven by the user's goals in an incremental process. The system applies natural language processing techniques and uses a series of syntactical analysis tools to extract grammatical relations from a list of text terms representing the parts of speech of a sentence. The extraction process focuses on evaluating the subject-predicate-object sequences of the text for potential concept-relation-concept triples to be built into an ontology. Users can guide the system by selecting seedling concept-relation-concept triples to assist in building concepts from the extracted domain-specific terms. As a result, the ontology building process develops into an incremental one that allows the user to interact with the system, to guide the development of the ontology, and to tailor it to the user's application needs. The main contribution of this work is the implementation and evaluation of a new semi-automated methodology for constructing domain-specific ontologies from an unstructured text corpus.
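
    As an illustration of the extraction step described above, a minimal sketch follows, using spaCy's dependency parse to pull candidate subject-predicate-object triples; spaCy is an assumed stand-in, since the dissertation's own syntactic-analysis toolchain is not named in this abstract.

```python
# Minimal sketch: extract candidate subject-predicate-object triples from
# text via a dependency parse. spaCy is an illustrative choice; the
# dissertation's actual toolchain is not specified in the abstract.
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_spo_triples(text):
    """Yield (subject, predicate, object) candidates, one per verb."""
    doc = nlp(text)
    for token in doc:
        if token.pos_ != "VERB":
            continue
        subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
        objects = [c for c in token.children if c.dep_ in ("dobj", "attr")]
        for subj in subjects:
            for obj in objects:
                yield (subj.lemma_, token.lemma_, obj.lemma_)

for triple in extract_spo_triples("The firewall blocks malicious traffic."):
    print(triple)  # ('firewall', 'block', 'traffic')
```

    A user-selected seed triple could then be matched against these candidates to decide which concepts enter the growing ontology.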

    Social Learning Systems: The Design of Evolutionary, Highly Scalable, Socially Curated Knowledge Systems

    In recent times, great strides have been made towards the advancement of automated reasoning and knowledge management applications, along with their associated methodologies. The introduction of the World Wide Web piqued academicians' interest in harnessing the power of linked, online documents for the purpose of developing machine learning corpora, providing dynamic knowledge bases for question answering systems, fueling automated entity extraction applications, and performing graph analytic evaluations, such as uncovering the inherent structural semantics of linked pages. Even more recently, substantial attention in the wider computer science and information systems disciplines has been focused on the evolving study of social computing phenomena, primarily those associated with the use, development, and analysis of online social networks (OSNs). This work follows an independent effort to develop an evolutionary knowledge management system, and outlines a model for integrating the wisdom of the crowd into the process of collecting, analyzing, and curating data for dynamic knowledge systems. Throughout, we examine how relational data modeling, automated reasoning, crowdsourcing, and social curation techniques have been exploited to extend the utility of web-based, transactional knowledge management systems, creating a new breed of knowledge-based system in the process: the Social Learning System (SLS). The key questions this work has explored by way of elucidating the SLS model include considerations of 1) how it is possible to unify Web and OSN mining techniques to conform to a versatile, structured, and computationally efficient ontological framework, and 2) how large-scale knowledge projects may incorporate tiered collaborative editing systems in an effort to elicit knowledge contributions and curation activities from a diverse, participatory audience.
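
    To make the tiered collaborative editing idea concrete, here is a hypothetical sketch in which contributions stay pending until a user of sufficient tier curates them; every class and field name is invented for illustration and is not part of the SLS model itself.

```python
# Hypothetical sketch of tiered collaborative editing: contributors propose
# edits, which remain pending until a user of sufficient tier approves them.
# All class and field names are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class Edit:
    author: str
    statement: str          # e.g. an RDF-style assertion
    approved: bool = False

@dataclass
class KnowledgeBase:
    curator_tier: int = 2                    # minimum tier allowed to curate
    user_tiers: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)
    accepted: list = field(default_factory=list)

    def propose(self, author, statement):
        self.pending.append(Edit(author, statement))

    def curate(self, curator):
        if self.user_tiers.get(curator, 0) < self.curator_tier:
            raise PermissionError(f"{curator} lacks curation privileges")
        while self.pending:
            edit = self.pending.pop()
            edit.approved = True
            self.accepted.append(edit)

kb = KnowledgeBase(user_tiers={"alice": 1, "bob": 3})
kb.propose("alice", "(Paris, capitalOf, France)")
kb.curate("bob")
print(len(kb.accepted))  # 1
```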

    e-DOCPROS: exploring TEXPROS into the e-business era

    Document processing is a critical element of office automation. TEXPROS (TEXt PROcessing System) is a knowledge-based system designed to manage personal documents. However, as the Internet and e-Business have changed the way offices operate, there is a need to re-envision document processing, storage, retrieval, and sharing. In the current environment, people must be able to access documents remotely and to share those documents with others. e-DOCPROS (e-DOCument PROcessing System) is a new document processing system that takes advantage of many of TEXPROS's structures but adapts the system to this new environment. The new system is built to serve e-businesses, to take advantage of Internet protocols, and to provide remote access and document sharing. e-DOCPROS meets the challenge of providing wider usage, and eventually will improve the efficiency and effectiveness of office automation. It allows end users to access their data through any Web browser with Internet access, even over a wireless network, which will change the way we manage information in an evolutionary fashion. The application of e-DOCPROS to e-Business is considered. Four types of business models are considered here. The first is the Business-to-Business (B2B) model, which performs business-to-business transactions through an Extranet. The Extranet consists of multiple Intranets connected via the Internet. The second is the Business-to-Consumer (B2C) model, which performs business-to-consumer transactions through the Internet. The third is the Intranet model, which performs transactions within an organization through the organization's network. The fourth is the Consumer-to-Consumer (C2C) model, which performs consumer-to-consumer transactions through the Internet. A triple model is proposed in this dissertation to integrate the organization type hierarchy and the document type hierarchy into the folder organization. e-DOCPROS introduces new features into TEXPROS to support those four business models and to accommodate the system requirements. Extensible Markup Language (XML), an industrial standard protocol for data exchange, is employed to achieve the goal of information exchange between e-DOCPROS and the other systems, and also among the subsystems within e-DOCPROS. The Document Object Model (DOM) specification is followed throughout the implementation of e-DOCPROS to achieve portability. An agent-based Application Service Provider (ASP) implementation is employed in the e-DOCPROS system to achieve cost-effectiveness and accessibility.
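
    Since XML and the DOM are named as the exchange and portability layer, the following is a minimal sketch of reading a hypothetical frame instance with Python's standard-library DOM; the element names are invented, as the actual e-DOCPROS schema is not given in this abstract.

```python
# Minimal sketch: read a hypothetical e-DOCPROS frame instance via the DOM.
# The element names below are invented; the real e-DOCPROS schema is not
# specified in the abstract.
from xml.dom.minidom import parseString

frame_xml = """
<frameInstance docType="Invoice">
  <sender>Acme Corp</sender>
  <receiver>Widgets Ltd</receiver>
  <amount currency="USD">1250.00</amount>
</frameInstance>
"""

dom = parseString(frame_xml)
root = dom.documentElement
print(root.getAttribute("docType"))              # Invoice
for node in root.childNodes:
    if node.nodeType == node.ELEMENT_NODE:
        print(node.tagName, "=", node.firstChild.data.strip())
```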

    Semantic Relevance Analysis of Subject-Predicate-Object (SPO) Triples

    The goal of this thesis is to explore and integrate several existing measurements for ranking the relevance of a set of subject-predicate-object (SPO) triples to a given concept. As we are inundated with information from multiple sources on the World-Wide-Web, SPO similarity measures play a progressively important role in information extraction, information retrieval, document clustering and ontology learning. This thesis is applied in the Cyber Security domain for identifying and understanding the factors and elements of sociopolitical events relevant to cyberattacks. Our efforts are towards developing an algorithm that begins with an analysis of news articles by taking into account the semantic information and word order information in the SPOs extracted from the articles. The semantic cohesiveness of a user-provided concept and the extracted SPOs is then calculated using semantic similarity measures derived from 1) structured lexical databases; and 2) our own corpus statistics. The use of a lexical database enables our method to model human common sense knowledge, while the incorporation of our own corpus statistics allows our method to be adaptable to the Cyber Security domain. The model can be extended to other domains by simply changing the local corpus. The integration of different measures helps us triangulate the ranking of SPOs from multiple dimensions of semantic cohesiveness. Our results are compared to rankings gathered from surveys of human users, where each respondent ranks a list of SPOs based on their common knowledge and understanding of the relevance to a given concept. The comparison demonstrates that our integrated SPO similarity ranking scheme closely reflects human common sense knowledge in the specific domain it addresses.
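
    A minimal sketch of the kind of integration described, blending a WordNet-based lexical score with a corpus-statistics cosine score, follows; the specific measures and the equal weighting are illustrative assumptions, not the thesis's exact formulation.

```python
# Minimal sketch: blend a lexical-database similarity (WordNet path
# similarity via NLTK) with a corpus-statistics similarity (cosine over
# term-frequency vectors). The 50/50 weighting is an illustrative
# assumption, not the thesis's exact formulation.
import math
from collections import Counter
from nltk.corpus import wordnet as wn   # requires: nltk.download("wordnet")

def wordnet_sim(w1, w2):
    """Best path similarity over all synset pairs, 0 if none found."""
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(w1) for s2 in wn.synsets(w2)]
    return max(scores, default=0.0)

def corpus_cosine(ctx1, ctx2):
    """Cosine similarity of two bag-of-words contexts from a local corpus."""
    a, b = Counter(ctx1), Counter(ctx2)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def combined_sim(w1, w2, ctx1, ctx2, alpha=0.5):
    """Weighted blend of lexical-database and corpus-driven similarity."""
    return alpha * wordnet_sim(w1, w2) + (1 - alpha) * corpus_cosine(ctx1, ctx2)
```

    Swapping in a Cyber Security corpus for the context vectors is what makes such a blend adaptable to the domain, as the abstract notes.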

    Collaborative software agents support for the TEXPROS document management system

    This dissertation investigates the use of active rules that are embedded in markup documents. Active rules are used in a markup representation by integrating Collaborative Software Agents with TEXPROS (abbreviation for TEXt PROcessing System) [Liu and Ng 1996] to create a powerful distributed document management system. Such markup documents with embedded active rules are called Active Documents. For fast retrieval, when a customized Internet folder organization must be generated, we first define the Folder Organization Query Language (FO-QL) to solve data categorization problems. FO-QL defines the folder organization query process that automatically retrieves links of documents deposited into folders and then constructs a folder organization in either a centralized document repository or multiple distributed document repositories. Traditional documents are stored as static data that do not provide any dynamic capabilities for accessing or interacting with the document environment. Owing to the dynamic and distributed nature of both markup data and markup rules, Active Documents do not merely respond to requests for information, but intelligently anticipate, adapt, and actively seek ways to support the computing processes. This feature overcomes the static nature of traditional documents. An Office Automation Definition Language (OADL) with active rules is defined for constructing TEXPROS's dual modeling approach and workflow events representation. Active Documents are such agent-supported OADL documents. With embedded rules and self-describing data features, Active Documents provide the capability for collaborative interaction with software agents. Data transformation and data integration are both data processing problems, but little research has focused on using markup documents to generate a versatile folder organization. Some of the research merely provides manual browsing in a document repository to find the right document. This browsing is time-consuming and unrealistic, especially across multiple document repositories. With FO-QL, one can create a customized folder organization on demand.
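
    To make the Active Document idea concrete, the hypothetical sketch below extracts a rule embedded in a markup document and fires it on a matching workflow event; the tag names and rule syntax are invented, since OADL's concrete syntax is not reproduced in this abstract.

```python
# Hypothetical sketch of an Active Document: a rule embedded in the markup
# is extracted and fired when a matching workflow event arrives. The tag
# names and the on/do attributes are invented; OADL's concrete syntax is
# not reproduced in the abstract.
import xml.etree.ElementTree as ET

active_doc = """
<document type="PurchaseOrder">
  <content>Order #1007 for 12 units</content>
  <rule on="document.filed" do="notify:purchasing@example.com"/>
</document>
"""

def fire_rules(doc_xml, event):
    root = ET.fromstring(doc_xml)
    for rule in root.findall("rule"):
        if rule.get("on") == event:
            action, _, target = rule.get("do").partition(":")
            print(f"event={event} -> {action} {target}")

fire_rules(active_doc, "document.filed")
# event=document.filed -> notify purchasing@example.com
```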

    Automatic document classification and extraction system (ADoCES)

    Document processing is a critical element of office automation. Document image processing begins with the Optical Character Recognition (OCR) phase, followed by complex processing for document classification and extraction. Document classification is a process that classifies an incoming document into a particular predefined document type. Document extraction is a process that extracts information pertinent to the users from the content of a document and assigns the information as the values of the "logical structure" of the document type. Therefore, after document classification and extraction, a paper document is represented in a digital form, called a frame instance, instead of its original image file format. A frame instance is an operable and efficient form that can be processed and manipulated during document filing and retrieval. This dissertation describes a system to support a complete procedure, which begins with the scanning of the paper document into the system and ends with the output of an effective digital form of the original document. This is a general-purpose system with "learning" ability and, therefore, it can be adapted easily to many application domains. In this dissertation, the "logical closeness" segmentation method is proposed. A novel representation of document layout structure, the Labeled Directed Weighted Graph (LDWG), and a methodology for transforming document segmentation into the LDWG representation are described. To find a match between two LDWGs, string representation matching is applied first instead of comparing graphs directly, which reduces the time necessary to make the comparison. Applying artificial intelligence, the system is able to learn from experience and build samples of LDWGs to represent each document type. In addition, the concept of frame templates is used for the document logical structure representation. The concept of Document Type Hierarchy (DTH) is also enhanced to express the hierarchical relation over the logical structures existing among the documents.
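
    A minimal sketch of the two-stage comparison described, serializing each LDWG into a canonical string so that cheap string matching can pre-filter candidates before any full graph comparison, follows; the serialization scheme is an illustrative assumption.

```python
# Minimal sketch: serialize a labeled directed weighted graph (LDWG) into a
# canonical string so that cheap string comparison can pre-filter candidate
# document types before any costlier graph matching. The serialization
# scheme here is an illustrative assumption.
def ldwg_signature(edges):
    """edges: iterable of (src_label, dst_label, weight) triples."""
    parts = sorted(f"{s}>{d}:{w}" for s, d, w in edges)
    return "|".join(parts)

# Two toy layout graphs: nodes are block labels, weights are distances.
doc_a = [("title", "author", 1), ("author", "body", 2)]
doc_b = [("author", "body", 2), ("title", "author", 1)]
doc_c = [("title", "body", 3)]

sig_a, sig_b, sig_c = map(ldwg_signature, (doc_a, doc_b, doc_c))
print(sig_a == sig_b)  # True  -> same layout, no graph matching needed
print(sig_a == sig_c)  # False -> fall back to full graph comparison
```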

    Measuring Semantic Similarity of Documents by Using Named Entity Recognition Methods

    The work presented in this thesis was born from the desire to map documents that share semantic concepts. We decided to address this problem as a named entity recognition task, in which we identify key concepts in the texts we use and categorize them, so that named entity recognition techniques can automatically recognize these key concepts in other documents. In particular, we propose a classification method based on the recognition of named entities or key phrases, in which the method detects similarities between key concepts of the texts to be analyzed and, through the use of Poincaré embeddings, associates the existing relationships between these concepts. Thanks to the Poincaré embeddings' ability to capture relationships between words, we were able to implement this feature in our classifier. Consequently, for each word in a text we check whether there are words close to it that are also close to the words that make up the key phrases we use as the gold standard. Therefore, when potential close words that make up a named entity are detected, the classifier applies a series of features to classify it. The methodology used performed better than when we only considered the POS structure of the named entities and their n-grams. However, determining the POS structure and the n-grams was important for improving the recognition of named entities in our research. By reducing the time needed to recognize similar key phrases across documents, some common tasks in large companies can benefit considerably. An important example is the evaluation of resumes to determine the best professional for a specific position. This task is characterized by the large amount of time needed to find the best profiles for a position; our contribution in this research work considerably reduces that time. The experiments presented here consider job descriptions and real resumes, and the methodology used to represent each of these documents through its key phrases is explained.
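
    A minimal sketch of the Poincaré-embedding step follows, using gensim's implementation trained on a toy is-a hierarchy; the relation list and hyperparameters are illustrative assumptions, not the study's actual training data.

```python
# Minimal sketch: train Poincaré embeddings on a toy is-a hierarchy and use
# hyperbolic distance as a closeness signal between key-phrase terms. The
# relations and hyperparameters are illustrative assumptions; negative
# sampling is tuned down for the tiny vocabulary.
from gensim.models.poincare import PoincareModel

relations = [
    ("python", "programming_language"),
    ("java", "programming_language"),
    ("sql", "programming_language"),
    ("programming_language", "skill"),
    ("teamwork", "skill"),
    ("communication", "skill"),
]

model = PoincareModel(relations, size=2, negative=1)
model.train(epochs=100)

# Smaller distance -> closer in the hierarchy; usable as a feature when
# deciding whether a candidate word belongs to a known key phrase.
print(model.kv.distance("python", "java"))
print(model.kv.distance("python", "teamwork"))
```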

    A Latent Profile Analysis of Health-related Quality of Life Domains in Cancer Survivors

    Purpose: The aim of this research was to examine heterogeneity of Health-related Quality of Life (HrQOL) in cancer survivors (both undergoing and having completed treatment) using latent profile analysis and to determine whether these groups differed by demographic and health characteristics. Methods: Participants (n=229), recruited through an oncology day ward and outpatient department in a local hospital, completed height, weight and handgrip measures as well as the validated Patient Generated Subjective Global Assessment and EORTC QLQ-C30 questionnaires. A latent profile analysis was performed to identify subgroups based on HrQOL domain scores. Multinomial logistic regression was conducted to determine the relationship between these subgroups and demographic and health characteristics. Results: Three latent subtypes were identified: (1) high quality of life (n=122, 52.8%); (2) compromised quality of life (n=79, 34.2%); and (3) low quality of life (n=30, 12.99%). All subtypes scored lower for functioning scales (with the exception of the higher quality of life group for physical, role and emotional functioning) and higher for symptom scales than the reference norm population. There were large, clinically meaningful differences between the high quality of life group and the low quality of life group for all HrQOL scales. Those in the low quality of life group were slightly younger than those in the high quality of life group (OR = 0.956, p < .05, CI = 0.917–0.998). Workers were more than 7 times more likely to be in the low quality of life group than in the high quality of life group. Compared to the high quality of life group, the odds of belonging to the compromised quality of life group decreased significantly with higher handgrip strength (OR = 0.955, p < .05, CI = 0.924–0.988). The odds of belonging to the low quality of life group increased significantly for those with a higher number of nutrition impact symptoms (NIS) (OR = 1.375, p < .05, CI = 1.004–1.883). Conclusions: This is the first study to examine heterogeneity of HrQOL using latent profile analysis in Irish cancer survivors. In clinical practice, understanding how aspects of HrQOL group together may allow clinicians to better understand and treat cancer survivors, informing more individualised nutrition care.
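
    Latent profile analysis is typically run in specialized statistical software; as a rough, hypothetical Python analogue, a Gaussian mixture over continuous HrQOL domain scores with BIC-based selection of the number of profiles captures the same idea.

```python
# Rough Python analogue of latent profile analysis: fit Gaussian mixtures
# over continuous HrQOL domain scores and pick the number of profiles by
# BIC. This is an illustrative stand-in, not the software the study used;
# the data below are random placeholders, not the study's data.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
scores = rng.normal(50, 15, size=(229, 5))  # 229 participants, 5 HrQOL domains

best_k, best_bic, best_model = None, np.inf, None
for k in range(1, 6):
    gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(scores)
    bic = gm.bic(scores)
    if bic < best_bic:
        best_k, best_bic, best_model = k, bic, gm

profiles = best_model.predict(scores)   # profile membership per participant
print(best_k, np.bincount(profiles))
```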

    A Latent Class Analysis of Nutrition Impact Symptoms in Cancer Survivors

    Purpose: Those with a cancer diagnosis report experiencing a wide range of nutrition impact symptoms, with prevalence varying by study, group and cancer type. We aimed to identify groups of cancer survivors with specific patterns of nutrition impact symptoms. Methods: 229 individuals attending oncology day ward and outpatient clinics completed a series of questionnaires and physical measurements. A latent class analysis was performed to identify subgroups based on 13 nutrition impact symptoms taken from the Patient Generated Subjective Global Assessment Short Form. The identified classes were subsequently compared, using analysis of variance and chi-square tests, by sociodemographic, clinical and nutritional variables as well as by Global Health Status (GHS) and five functioning scales determined using the European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire (EORTC QLQ-C30). Results: Three latent subtypes were identified: (1) Fatigue (n=58, 28%); (2) Low Symptom Burden (n=146, 64%); and (3) High Symptom Burden (n=25, 11%). Those in the High Symptom Burden group were more likely to be female, to be currently receiving any form of treatment, and to have consumed less food than usual in the last month compared to those in the Low Symptom Burden group. Those in the Fatigue group were more likely to have reported consuming less food in the previous month and less likely to have reported their food intake to be unchanged than those in the Low Symptom Burden group. Those who received their diagnosis two or more years ago were most likely to be classed in the Fatigue group. The EORTC QLQ-C30 functioning and GHS scores were all significantly different between the three nutrition impact symptom classes (p < 0.001). Conclusion: This is the first study to examine heterogeneity of nutrition impact symptoms in Irish cancer survivors. The findings of this work will inform and allow for more individualised nutrition care.
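
    The between-class comparisons described reduce to standard chi-square tests of independence; a minimal sketch with placeholder counts follows (the numbers are not the study's data).

```python
# Minimal sketch of the chi-square comparison used to relate latent classes
# to categorical characteristics (e.g., sex by symptom class). The counts
# below are placeholders, not the study's data.
from scipy.stats import chi2_contingency

#                Fatigue  LowBurden  HighBurden
counts = [[30,     70,        20],   # female
          [28,     76,         5]]   # male

chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2={chi2:.2f}, dof={dof}, p={p:.4f}")
```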

    Strategies for Managing Linked Enterprise Data

    Data, information and knowledge have become key assets of our 21st-century economy. As a result, data and knowledge management have become key tasks with regard to sustainable development and business success. Often, knowledge is not explicitly represented; it resides in the minds of people or is scattered among a variety of data sources. Knowledge is inherently associated with semantics that conveys its meaning to a human or machine agent. The Linked Data concept facilitates the semantic integration of heterogeneous data sources. However, we still lack an effective knowledge integration strategy applicable to enterprise scenarios, one that balances large amounts of data stored in legacy information systems and data lakes with tailored domain-specific ontologies that formally describe real-world concepts. In this thesis, we investigate strategies for managing linked enterprise data, analyzing how actionable knowledge can be derived from enterprise data by leveraging knowledge graphs. Actionable knowledge provides valuable insights, supports decision makers with clear, interpretable arguments, and keeps its inference processes explainable. The benefits of employing actionable knowledge and a coherent management strategy for it span from a holistic semantic representation layer of enterprise data, i.e., representing numerous data sources as one consistent and integrated knowledge source, to unified interaction mechanisms with other systems that are able to effectively and efficiently leverage such actionable knowledge. Several challenges have to be addressed on different conceptual levels in pursuing this goal, i.e., means for representing knowledge, semantic data integration of raw data sources and subsequent knowledge extraction, communication interfaces, and implementation. To tackle those challenges, we present the concept of Enterprise Knowledge Graphs (EKGs), describing their characteristics and advantages compared to existing approaches. We study each challenge with regard to using EKGs and demonstrate their efficiency. In particular, EKGs are able to reduce the semantic data integration effort when processing large-scale heterogeneous datasets. Then, having built a consistent logical integration layer with heterogeneity behind the scenes, EKGs unify query processing and enable effective communication interfaces for other enterprise systems. The achieved results allow us to conclude that strategies for managing linked enterprise data based on EKGs exhibit reasonable performance, comply with enterprise requirements, and ensure integrated data and knowledge management throughout its life cycle.
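
    As a minimal sketch of the unified query layer an EKG provides, the snippet below loads facts from two heterogeneous sources into one RDF graph and answers a single SPARQL query over both; rdflib and the example vocabulary are illustrative assumptions.

```python
# Minimal sketch: an enterprise knowledge graph as a single RDF graph that
# integrates two sources and answers one SPARQL query over both. rdflib and
# the example vocabulary are illustrative assumptions.
from rdflib import Graph, Namespace, Literal, RDF

EX = Namespace("http://example.org/ekg/")
g = Graph()

# Source 1: facts lifted from a legacy CRM database.
g.add((EX.acme, RDF.type, EX.Customer))
g.add((EX.acme, EX.name, Literal("Acme Corp")))

# Source 2: facts lifted from an ERP data lake.
g.add((EX.order42, RDF.type, EX.Order))
g.add((EX.order42, EX.placedBy, EX.acme))

# One query over the integrated layer, hiding the source heterogeneity.
q = """
SELECT ?name WHERE {
  ?order a ex:Order ; ex:placedBy ?c .
  ?c ex:name ?name .
}
"""
for row in g.query(q, initNs={"ex": EX}):
    print(row.name)  # Acme Corp
```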