
    Reasoning & Querying – State of the Art

    Various query languages for Web and Semantic Web data have emerged in recent years, both for practical use and as a research area in the scientific community. At the same time, the broad adoption of the internet, where keyword search is used in many applications such as search engines, has familiarized casual users with keyword queries for retrieving information. Unlike this easy-to-use style of querying, traditional query languages require knowledge of the language itself as well as of the data to be queried. Keyword-based query languages for XML and RDF bridge this gap, aiming to enable simple querying of semi-structured data, which is relevant, e.g., in the context of the emerging Semantic Web. This article presents an overview of the field of keyword querying for XML and RDF.
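
    To make the contrast concrete, the following is a minimal Python sketch of keyword-based querying over RDF, assuming rdflib is available; the sample triples and the substring-matching strategy are illustrative assumptions, not any particular system surveyed in the article. Unlike a SPARQL query, the user needs no knowledge of the query language or of the graph's structure.

```python
# Keyword search over an RDF graph: match a keyword against literal
# values, with no knowledge of the schema required. The sample triples
# are illustrative assumptions.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.article1, EX.title, Literal("Keyword querying for XML and RDF")))
g.add((EX.article2, EX.title, Literal("A survey of SPARQL query engines")))

def keyword_query(graph, keyword):
    """Return all triples whose literal object contains the keyword."""
    kw = keyword.lower()
    return [(s, p, o) for s, p, o in graph
            if isinstance(o, Literal) and kw in str(o).lower()]

for triple in keyword_query(g, "keyword"):
    print(triple)
```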

    Log file anomaly detection using knowledge graphs and graph neural networks

    Log files contain valuable information for detecting abnormal behavior. To detect anomalies, researchers have proposed representing log files as knowledge graphs (KGs) and using KG completion (KGC) techniques to predict new facts. However, current research in this area is limited, and there is no end-to-end system that includes both KG generation and KGC for log-based anomaly detection. In this study, we present an end-to-end system that utilizes graph neural networks (GNNs) and KGC to detect anomalies in log files. The proposed system has two main components. The first employs templates to generate, from the logs, a KG that captures normal behavior. The second applies KG embedding models enhanced with GNN layers to the generated KG and employs KGC to determine the suspiciousness of new information through binary classification. We evaluated the proposed method on two public datasets using standard KGC metrics, and the experimental results demonstrate its promising potential.
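
    The following is a minimal Python sketch of the KGC scoring step, assuming PyTorch; the DistMult-style scorer is a stand-in for the paper's (unspecified) KG embedding models, and the GNN enhancement layers are omitted for brevity. A candidate triple built from new log information receives a plausibility score; thresholding that score gives the binary suspicious/normal classification.

```python
# A DistMult-style triple scorer: an assumed stand-in for the paper's
# KG embedding models, shown untrained and without the GNN layers.
import torch
import torch.nn as nn

class TripleScorer(nn.Module):
    def __init__(self, num_entities, num_relations, dim=64):
        super().__init__()
        self.ent = nn.Embedding(num_entities, dim)
        self.rel = nn.Embedding(num_relations, dim)

    def forward(self, heads, rels, tails):
        # Score each (head, relation, tail) triple and squash it to a
        # probability that the triple is plausible; a low probability
        # marks the new log information as suspicious.
        h, r, t = self.ent(heads), self.rel(rels), self.ent(tails)
        return torch.sigmoid((h * r * t).sum(dim=-1))

scorer = TripleScorer(num_entities=1000, num_relations=20)
# One candidate triple from a new log event: (entity 3, relation 1, entity 42).
prob = scorer(torch.tensor([3]), torch.tensor([1]), torch.tensor([42]))
print(f"plausibility: {prob.item():.3f}")
```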

    Metadata for Resource Description on Corporate Intranets

    Since resource discovery has become a difficult and time-consuming task, some corporations are instituting metadata initiatives to alleviate these problems. This paper reports on an exploratory study of the metadata schemas supporting corporate intranets. The study consists of two parts: an examination of the metadata elements currently in use on corporate intranets, and a comparison of these elements with the schema developed by the Dublin Core Metadata Initiative. Schemas were collected from ten such corporations, and the aggregate data were examined to uncover which types of elements are likely to be important for resource description on a corporate intranet. The results show that thirteen of the fifteen metadata elements most commonly used by the schemas in this study are supported by the Dublin Core; twenty additional elements not supported by the Dublin Core were also found. These additional elements show how corporations can extend the Dublin Core to meet the needs of their own intranets.
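
    A minimal Python sketch of the comparison step follows, assuming each schema is represented as a set of element names; the sample intranet schema is hypothetical, while the fifteen-element set is the standard Dublin Core element set. The set difference surfaces exactly the kind of corporate extensions the study reports.

```python
# Compare a (hypothetical) intranet metadata schema against the
# fifteen standard Dublin Core elements.
DUBLIN_CORE = {
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
}

intranet_schema = {"Title", "Author", "Date", "Department", "Keywords"}

supported = intranet_schema & DUBLIN_CORE   # covered by Dublin Core
extensions = intranet_schema - DUBLIN_CORE  # corporate additions

print("Covered by Dublin Core:", sorted(supported))
print("Corporate extensions:", sorted(extensions))
```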

    Semantic data integration for supply chain management: with a specific focus on applications in the semiconductor industry

    Supply Chain Management (SCM) is essential to monitor, control, and enhance the performance of SCs. Increasing globalization and diversity of Supply Chains (SCs) lead to complex SC structures, limited visibility among SC partners, and challenging collaboration caused by dispersed data silos. Digitalization is driving and transforming the SCs of fundamental sectors such as the semiconductor industry, further accelerated by the indispensable role that semiconductor products play in electronics, IoT, and security systems. Semiconductor SCM is unique in that its operations exhibit special features, e.g., long production lead times and short product life cycles. Hence, systematic SCM is required to establish information exchange, overcome inefficiency resulting from incompatibility, and adapt to industry-specific challenges.

    The Semantic Web is designed for linking data and establishing information exchange. Semantic models provide high-level descriptions of the domain that enable interoperability, and semantic data integration consolidates heterogeneous data into meaningful and valuable information. The main goal of this thesis is to investigate Semantic Web Technologies (SWT) for SCM, with a specific focus on applications in the semiconductor industry. As part of SCM, end-to-end SC modeling ensures visibility of SC partners and flows. Existing models are limited in how they represent operational SC relationships beyond one-to-one structures, and the scarcity of empirical data from multiple SC partners hinders both the analysis of how supply network partners affect each other and the benchmarking of overall SC performance. In our work, we investigate (i) how semantic models can be used to standardize and benchmark SCs. Moreover, in a volatile and unpredictable environment, SC experts require methodical and efficient approaches to integrate various data sources for informed decision-making about SC behavior; this work therefore addresses (ii) how semantic data integration can help make SCs more efficient and resilient. Finally, to secure a good position in a competitive market, semiconductor SCs strive to implement operational strategies that control demand variation, i.e., the bullwhip effect, while maintaining sustainable relationships with customers. We examine (iii) how semantic technologies can be applied to specifically support semiconductor SCs.

    In this thesis, we provide semantic models that integrate, in a standardized way, SC processes, structure, and flows, supporting both an elaborate understanding of the holistic SC and granular operational detail. We demonstrate that these models enable the instantiation of a synthetic SC for benchmarking. We contribute semantic data integration applications that enable interoperability and make SCs more efficient and resilient. Moreover, we leverage ontologies and Knowledge Graphs (KGs) to implement customer-oriented bullwhip-taming strategies, and we create semantic approaches intertwined with Artificial Intelligence (AI) algorithms to address semiconductor industry specifics and ensure operational excellence. The results show that relying on semantic technologies contributes to rigorous and systematic SCM. We deem that better standardization, simulation, benchmarking, and analysis, as elaborated in the contributions, will help master more complex SC scenarios. SC stakeholders can increasingly understand the domain and are thus better equipped with effective control strategies to restrain disruption accelerators such as the bullwhip effect. In essence, the proposed Semantic Web Technology-based strategies unlock the potential to increase the efficiency, resilience, and operational excellence of supply networks in general and the semiconductor SC in particular.
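
    The following is a minimal Python sketch of how such a semantic SC model might look, assuming rdflib; the ex: vocabulary (Supplier, suppliesTo, leadTimeDays) is a hypothetical stand-in for the thesis's semantic models, not their actual terms. Once partner data is expressed as triples, one SPARQL query can span the integrated graph regardless of which data silo each fact came from.

```python
# A toy supply-chain KG; the ex: vocabulary is a hypothetical
# stand-in for the thesis's semantic models.
from rdflib import Graph, Literal, Namespace, RDF, XSD

EX = Namespace("http://example.org/scm#")
g = Graph()
g.bind("ex", EX)

# Two supply-chain partners and one operational relationship.
g.add((EX.WaferFab, RDF.type, EX.Supplier))
g.add((EX.AssemblySite, RDF.type, EX.Manufacturer))
g.add((EX.WaferFab, EX.suppliesTo, EX.AssemblySite))
g.add((EX.WaferFab, EX.leadTimeDays, Literal(90, datatype=XSD.integer)))

# Once integrated, a single SPARQL query spans data from all partners.
query = """
SELECT ?supplier ?leadTime WHERE {
    ?supplier ex:suppliesTo ?customer ;
              ex:leadTimeDays ?leadTime .
}
"""
for row in g.query(query, initNs={"ex": EX}):
    print(row.supplier, row.leadTime)
```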

    Doctor of Philosophy

    Electronic Health Records (EHRs) provide a wealth of information for secondary uses. Methods are developed to improve the usefulness of free-text query and text processing, and their advantages for clinical research are demonstrated, specifically for cohort identification and enhancement. Cohort identification is a critical early step in clinical research; problems arise when too few patients are identified or the cohort is a nonrepresentative sample. Methods of improving query formation through query expansion are described, and the inclusion of free-text search alongside structured data search is investigated to determine the incremental improvement over structured data search alone. Query expansion using topic- and synonym-based expansion improved information retrieval performance; an ensemble method was not successful. Adding free-text search to structured data search increased cohort size in all cases, with dramatic increases in some, and yielded better representation of patients in subpopulations that might otherwise have been underrepresented. We demonstrate clinical impact by showing that a serious clinical condition, scleroderma renal crisis, can be predicted by adding free-text search.

    A novel information extraction algorithm, Regular Expression Discovery for Extraction (REDEx), is developed and evaluated for cohort enrichment. REDEx is demonstrated to accurately extract information, including temporal expressions and bodyweight-related measures, from free-text clinical narratives. Using these extracted values, additional patients and additional measurement occurrences are identified that were not identifiable through structured data alone. The REDEx algorithm also transfers the burden of machine learning training from annotators to domain experts.

    In summary, we developed automated query expansion methods that greatly improve the performance of keyword-based information retrieval; NLP methods for unstructured data that greatly increase cohort size, identify a more complete population, and detect important clinical conditions that are often missed otherwise; and a novel machine learning algorithm for information extraction, REDEx, that efficiently extracts clinical values from unstructured clinical text, adding information and observations beyond what is available in structured data alone.
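
    The following is a minimal Python sketch of regex-based extraction of bodyweight measures from a clinical note; the pattern and the sample note are illustrative, hand-written assumptions, whereas REDEx learns such patterns automatically from annotated examples.

```python
# Extract bodyweight measures from free-text clinical narratives.
# This hand-written pattern only illustrates the kind of regular
# expression that REDEx would discover from annotated snippets.
import re

WEIGHT_PATTERN = re.compile(
    r"(?:weight|wt)[:\s]+(\d{2,3}(?:\.\d+)?)\s*(kg|lbs?)",
    re.IGNORECASE,
)

note = "Pt seen today. Weight: 82.5 kg, stable. Wt 180 lbs at last visit."

for value, unit in WEIGHT_PATTERN.findall(note):
    print(f"extracted weight: {value} {unit}")
```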

    What CIOs and CTOs Need to Know About Big Data and Data-Intensive Computing

    This paper was completed as part of the final research component in the University of Oregon Applied Information Management Master's Degree Program [see http://aim.uoregon.edu]. The nature of business computing is changing due to the proliferation of massive data sets, referred to as big data, that can be used to produce business analytics (Borkar, Carey, & Li, 2012). This annotated bibliography presents literature published between 2000 and 2012. It provides information to CIOs and CTOs about big data by: (a) identifying business examples, (b) describing the relationship to data-intensive computing, (c) exploring opportunities and limitations, and (d) identifying cost factors.