
    Fuzzy rule based profiling approach for enterprise information seeking and retrieval

    With the exponential growth of information available on the Internet and on organisational intranets, there is a need for profile-based information seeking and retrieval (IS&R) systems that can support users with their context-aware information needs. This paper presents a new approach for enterprise IS&R systems that uses fuzzy logic to develop task, user and document profiles for modelling user information seeking behaviour. Relevance feedback was captured from real users engaged in IS&R tasks and used to develop a linear regression model for predicting document relevancy from implicit relevance indicators. Fuzzy relevance profiles were created using Term Frequency and Inverse Document Frequency (TF/IDF) analysis of the successful user queries. Fuzzy rule based summarisation was then used to integrate the three profiles into a unified index reflecting the semantic weight of the query terms relative to the task, user and document. The unified index was used to select the most relevant documents and experts for the query topic. The overall performance of the system was evaluated using standard precision and recall metrics, which show significant improvements in retrieving relevant documents in response to user queries.
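    The abstract does not reproduce the paper's rule base or regression coefficients, but the general shape of such a pipeline is easy to sketch. Below is a minimal, illustrative Python sketch in which TF/IDF supplies the document-profile term weight and `min` stands in as a simple fuzzy AND over the task, user and document profile memberships; all names, data and weights are hypothetical, not the paper's actual model.

```python
import math
from collections import Counter

def tf_idf(term, doc_tokens, corpus):
    """TF/IDF weight of a term in one tokenised document against a corpus."""
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for doc in corpus if term in doc)
    return tf * math.log(len(corpus) / (1 + df))

def unified_weight(task_w, user_w, doc_w):
    """Toy fuzzy aggregation: min() acts as a fuzzy AND over the
    three profile memberships for a query term."""
    return min(task_w, user_w, doc_w)

corpus = [["fuzzy", "retrieval"], ["enterprise", "search"],
          ["user", "profiles"], ["document", "ranking"]]
w_doc = tf_idf("fuzzy", corpus[0], corpus)   # document-profile weight
score = unified_weight(0.8, 0.6, w_doc)      # task and user weights assumed
print(round(score, 3))
```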

    Automating data preparation with statistical analysis

    Data preparation is the process of transforming raw data into a clean and consumable format. It is widely known as the bottleneck to extracting value and insights from data, owing to the number of possible tasks in the pipeline and the factors that can strongly affect the results, such as human expertise, application scenarios, and solution methodology. Researchers and practitioners have devised a great variety of techniques and tools over the decades, yet many of them still place a significant burden on the human side to configure suitable input rules and parameters. In this thesis, with the goal of reducing manual human effort, we explore using the power of statistical analysis techniques to automate three subtasks in the data preparation pipeline: data enrichment, error detection, and entity matching. Statistical analysis is the process of discovering underlying patterns and trends in data and deducing properties of an underlying probability distribution from a sample, for example by testing hypotheses and deriving estimates. We first discuss CrawlEnrich, which automatically determines the queries for data enrichment via web API data by estimating the potential benefit of issuing a given query. We then study how to derive reusable error detection configuration rules from a web table corpus, so that end users get results with no effort. Finally, we introduce AutoML-EM, which aims to automate the entity matching model development process; entity matching is the task of finding records that refer to the same real-world entity. Our work provides powerful angles for automating various data preparation steps, and we conclude this thesis by discussing future directions.
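    Of the three subtasks, entity matching is the easiest to illustrate compactly. The sketch below is not AutoML-EM's method, only a hand-written baseline showing the decision an entity matcher automates: it scores pairwise field similarity and applies a threshold, where a learned model would instead choose the features, weights and threshold itself. The records and the 0.6 cut-off are made up.

```python
from difflib import SequenceMatcher

def field_sim(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(rec_a: dict, rec_b: dict, fields) -> float:
    """Average per-field similarity; a learned matcher would weight
    fields and pick the decision threshold instead of fixing them."""
    return sum(field_sim(str(rec_a[f]), str(rec_b[f])) for f in fields) / len(fields)

a = {"name": "Jon Smith", "city": "New York"}
b = {"name": "John Smith", "city": "New York City"}
print(match_score(a, b, ["name", "city"]) > 0.6)  # crude match decision -> True
```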

    An Adaptive Fuzzy Based Recommender System For Enterprise Search


    i3MAGE: Incremental, Interactive, Inter-Model Mapping Generation

    Data integration is a highly important prerequisite for most enterprise data analyses. While hard in general, a particular concern is the human effort required to design a global integration schema, author queries against that schema, and create mappings that connect data sources with the global schema. Ontology-based data integration (OBDI), which employs ontologies as a target model, reduces the effort for schema design and usage. On the other hand, it requires mappings that are particularly difficult to create. Architects who work with OBDI therefore need systems that support the mapping development process. One key type of tooling for this is the automatic or semi-automatic generation of mapping suggestions. While many such tools exist in the wider sphere of data integration, few are built for OBDI, where the inter-model gap between relational input schemata and a target ontology has to be bridged. Among those that support OBDI at all, none so far are fully optimized for this specific case by performing a truly inter-model matching while also leveraging distinct but corresponding aspects of both models. We propose i3MAGE, an approach and system for automatic and semi-automatic generation of mappings in OBDI. The system is built on generic inter-model matching and is optimized in various ways for matching relational source schemata to target ontology schemata. To be truly semi-automatic in every respect, i3MAGE works both incrementally, building mappings pay-as-you-go, and interactively, in exchange with a human user. We introduce a specialized benchmark and evaluate i3MAGE against a number of other approaches. In addition, we provide examples where i3MAGE can be deployed in holistic data integration environments.
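    i3MAGE's actual matching combines several signals across the inter-model gap, which the abstract does not spell out. As a minimal, hedged illustration of the core idea only, the sketch below suggests a mapping for each relational column by lexical similarity to ontology property names after normalising naming conventions; the schema, property names and scoring are all invented for the example.

```python
from difflib import SequenceMatcher

columns = ["emp_name", "dept_id", "hire_date"]               # relational side
properties = ["employeeName", "departmentID", "hiringDate"]  # ontology side

def norm(s: str) -> str:
    # Flatten snake_case and camelCase before comparing.
    return s.replace("_", "").lower()

def sim(a: str, b: str) -> float:
    return SequenceMatcher(None, norm(a), norm(b)).ratio()

# Suggest, for each column, the best-matching ontology property.
for col in columns:
    best = max(properties, key=lambda p: sim(col, p))
    print(f"{col} -> {best} (score {sim(col, best):.2f})")
```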

    Content And Multimedia Database Management Systems

    A database management system is a general-purpose software system that facilitates the processes of defining, constructing, and manipulating databases for various applications. The main characteristic of the 'database approach' is that it increases the value of data by its emphasis on data independence. DBMSs, and in particular those based on the relational data model, have been very successful at the management of administrative data in the business domain. This thesis has investigated data management in multimedia digital libraries and its implications for the design of database management systems. The main problem of multimedia data management is providing access to the stored objects. The content structure of administrative data is easily represented in alphanumeric values, so database technology has primarily focused on handling the objects' logical structure. In the case of multimedia data, however, representation of content is far from trivial, and it is not supported by current database management systems.

    Direct Manipulation Querying of Database Systems.

    Database systems are tremendously powerful and useful, as evidenced by their popularity in modern business. Unfortunately, for non-expert users, using a database remains a daunting task because of its poor usability. This PhD dissertation examines the stages of the information seeking process and proposes techniques that help users interact with a database through direct manipulation, which has been proven a natural interaction paradigm. For the first stage of information seeking, query formulation, we proposed a spreadsheet algebra upon which a direct manipulation interface for database querying can be built; the algebra is powerful (capable of expressing at least all single-block SQL queries) and can be intuitively implemented in a spreadsheet. In addition, we proposed assisted querying by browsing, where we help users query the database through browsing. For the second stage, result review, instead of asking users to review possibly many results in a flat table, we proposed a hierarchical navigation scheme that lets users browse the results through representatives, with easy drill-down and filtering capabilities, and an efficient tree-based method for generating the representatives. For the query refinement stage, we proposed and implemented a provenance-based automatic refinement framework: users label a set of output tuples, and the framework produces a ranked list of changes that best improve the query. This dissertation significantly lowers the barrier for non-expert users and reduces the effort required for expert users to work with a database.
    Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/86282/1/binliu_1.pd
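    The dissertation defines its spreadsheet algebra formally; as a rough illustration of the claim that direct-manipulation gestures can compile to single-block SQL, here is a hypothetical compiler for a filter/group/aggregate gesture sequence. The operator set, function name and example query are assumptions for the sketch, not the dissertation's API.

```python
def compile_single_block(table, filters=None, group_by=None, aggregates=None):
    """Compile a sequence of direct-manipulation operations into one
    single-block (non-nested) SQL query, mirroring the expressiveness
    target stated in the abstract."""
    select = ", ".join((group_by or []) + (aggregates or [])) or "*"
    sql = f"SELECT {select} FROM {table}"
    if filters:
        sql += " WHERE " + " AND ".join(filters)
    if group_by:
        sql += " GROUP BY " + ", ".join(group_by)
    return sql

print(compile_single_block("orders",
                           filters=["total > 100"],
                           group_by=["region"],
                           aggregates=["SUM(total) AS revenue"]))
# SELECT region, SUM(total) AS revenue FROM orders WHERE total > 100 GROUP BY region
```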

    Engineering Agile Big-Data Systems

    To be effective, data-intensive systems require extensive ongoing customisation to reflect changing user requirements, organisational policies, and the structure and interpretation of the data they hold. Manual customisation is expensive, time-consuming, and error-prone. In large complex systems, the value of the data can be such that exhaustive testing is necessary before any new feature can be added to the existing design. In most cases, the precise details of requirements, policies and data will change during the lifetime of the system, forcing a choice between expensive modification and continued operation with an inefficient design. Engineering Agile Big-Data Systems outlines an approach to dealing with these problems in software and data engineering, describing a methodology for aligning these processes throughout product lifecycles. It discusses tools which can be used to achieve these goals, and, in a number of case studies, shows how the tools and methodology have been used to improve a variety of academic and business systems.

    Learning Ontology Relations by Combining Corpus-Based Techniques and Reasoning on Data from Semantic Web Sources

    The manual construction of formal domain conceptualizations (ontologies) is labor-intensive. Ontology learning, by contrast, provides (semi-)automatic ontology generation from input data such as domain text. This thesis proposes a novel approach for learning labels of non-taxonomic ontology relations that combines corpus-based techniques with reasoning on Semantic Web data. The corpus-based methods apply vector space similarity of the verbs co-occurring with labeled and unlabeled relations to compute relation label suggestions from a set of candidates. A meta ontology, in combination with Semantic Web sources such as DBpedia and OpenCyc, allows reasoning to improve the suggested labels. An extensive formal evaluation demonstrates the superior accuracy of the presented hybrid approach.
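    The corpus-based step compares verb co-occurrence profiles in a vector space. A minimal sketch of that idea, with toy counts and invented relation labels standing in for real corpus statistics, might look as follows; the reasoning step over DBpedia/OpenCyc is omitted.

```python
import math
from collections import Counter

def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    norm = lambda w: math.sqrt(sum(c * c for c in w.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0

# Verb co-occurrence counts around labeled relations (toy corpus statistics).
labeled = {
    "worksFor":  Counter({"employ": 5, "hire": 3, "manage": 1}),
    "locatedIn": Counter({"situate": 4, "reside": 2, "base": 3}),
}
# Verbs observed with an as-yet unlabeled relation.
unlabeled = Counter({"employ": 2, "hire": 4})

# Suggest the candidate label with the most similar verb profile.
print(max(labeled, key=lambda lbl: cosine(unlabeled, labeled[lbl])))  # worksFor
```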