
    Bootstrapping Uncertainty in Schema Covering

    Schema covering is the process of representing large and complex schemas by easily comprehensible common objects. This task is done by identifying a set of common concepts from a repository, called the concept repository, and generating a cover that describes the schema in terms of those concepts. The traditional schema covering approach has two shortcomings: it does not model the uncertainty in the covering process, and it requires the user to state an ambiguity constraint that is hard to define. We remedy these problems by incorporating a probabilistic model into schema covering to generate a probabilistic schema cover. The integrated probabilities not only enhance the coverage of the resulting covers but also eliminate the need to define the ambiguity parameter. Experiments on real-world datasets show the competitive performance of our approach.
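    The covering step can be pictured as scoring every repository concept against the schema and keeping the score as a probability rather than thresholding it with an ambiguity constraint. The sketch below is a minimal illustration under that reading; the Jaccard-style score, the toy repository, and the function names are illustrative assumptions, not the paper's actual probabilistic model.

```python
# Minimal sketch of a probabilistic schema cover, assuming a toy concept
# repository and a Jaccard-style overlap as a stand-in for the learned
# match probability.

def match_probability(concept_attrs, schema_attrs):
    """Pseudo-probability that a concept describes part of the schema."""
    overlap = len(concept_attrs & schema_attrs)
    return overlap / len(concept_attrs | schema_attrs)

def probabilistic_cover(schema_attrs, repository):
    """Attach a probability to every candidate concept instead of
    filtering candidates with a hard ambiguity threshold."""
    return {
        name: match_probability(attrs, schema_attrs)
        for name, attrs in repository.items()
        if attrs & schema_attrs  # keep any concept with some overlap
    }

repository = {                       # hypothetical concept repository
    "Address": {"street", "city", "zip"},
    "Person":  {"name", "birthdate", "city"},
}
schema = {"name", "street", "city", "zip"}
print(probabilistic_cover(schema, repository))
# -> {'Address': 0.75, 'Person': 0.4}
```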

    Joint Fusion Learning of Multiple Time Series Prediction

    Accurate traffic density estimation is essential for numerous purposes, such as developing successful transit policies or forecasting future traffic conditions for navigation. Recent developments in machine learning and computer systems give the transportation industry numerous possibilities to improve its operations through data analysis of traffic flow sensor data. However, even though state-of-the-art algorithms for time series forecasting perform well on some transportation problems, they still fail on some critical tasks. In particular, existing traffic flow forecasting methods that do not utilise causality relations between different data sources remain unsatisfactory for many real-world applications. In this report, we focus on a new method named joint fusion learning that uses the underlying causality in time series. We test our method in a detailed synthetic environment that we developed specifically to imitate a real-world traffic flow dataset. Finally, we apply joint fusion learning to a historical traffic flow dataset for Thessaloniki, Greece, published by the Hellenic Institute of Transport (HIT). We obtain better short-term forecasts compared to the widely used benchmark models that use a single time series to forecast the future.
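    As a rough illustration of exploiting causality between series, the sketch below fits a lagged linear model on both the target flow and an upstream series that drives it; the synthetic data, lag count, and plain least-squares fit are simplifications assumed here, not the joint fusion learning method from the report.

```python
# Minimal sketch: short-term forecast of a target flow using lags of the
# target itself plus lags of a causally related "upstream" series.
import numpy as np

def lagged_design(target, upstream, lags=3):
    """Build a design matrix from lags of the target and of the upstream
    (cause) series, aligned so that row t predicts target[t]."""
    rows, y = [], []
    for t in range(lags, len(target)):
        rows.append(np.r_[target[t - lags:t], upstream[t - lags:t], 1.0])
        y.append(target[t])
    return np.array(rows), np.array(y)

rng = np.random.default_rng(0)
upstream = rng.normal(size=300).cumsum()                          # synthetic "cause" flow
target = np.roll(upstream, 2) + rng.normal(scale=0.1, size=300)   # delayed effect

X, y = lagged_design(target, upstream)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)      # fit the joint model
x_next = np.r_[target[-3:], upstream[-3:], 1.0]
print("one-step forecast:", x_next @ coef)
```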

    Probabilistic Schema Covering

    Schema covering is the process of representing large and complex schemas by easily comprehensible common objects. This task is done by identifying a set of common concepts from a repository, called the concept repository, and generating a cover that describes the schema in terms of those concepts. The traditional schema covering approach has two shortcomings: it does not model the uncertainty in the covering process, and it requires the user to state an ambiguity constraint that is hard to define. We remedy these problems by incorporating a probabilistic model into schema covering to generate a probabilistic schema cover. The integrated probabilities not only enhance the coverage of the resulting covers but also eliminate the need to define the ambiguity parameter. Both probabilistic schema covering and traditional schema covering run on top of a concept repository. Experiments on real-world datasets show the competitive performance of our approach.
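    Complementing the scoring sketch above, the cover itself can be pictured as a greedy selection over the concept repository that keeps each chosen concept's match probability alongside it. The greedy rule and toy data below are assumptions for illustration only, not the algorithm evaluated in the paper.

```python
# Minimal sketch of generating a cover on top of a concept repository:
# greedily pick the concept that explains the most still-uncovered schema
# attributes, weighted by its match probability.

def greedy_cover(schema_attrs, scored_concepts, repository):
    """scored_concepts: concept name -> match probability (e.g. from a
    probabilistic matcher); repository: concept name -> attribute set."""
    uncovered, cover = set(schema_attrs), []
    while uncovered:
        best = max(
            scored_concepts,
            key=lambda c: len(repository[c] & uncovered) * scored_concepts[c],
            default=None,
        )
        if best is None or not repository[best] & uncovered:
            break                      # nothing left that helps
        cover.append((best, scored_concepts[best]))
        uncovered -= repository[best]
    return cover, uncovered            # concepts used and attributes left over

repository = {"Address": {"street", "city", "zip"}, "Person": {"name", "city"}}
scores = {"Address": 0.75, "Person": 0.4}            # hypothetical matcher output
print(greedy_cover({"name", "street", "city", "zip"}, scores, repository))
# -> ([('Address', 0.75), ('Person', 0.4)], set())
```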

    Team Integration

    We leverage theoretical advances and the multi-user nature of argumentation. The overall contributions of our work are as follows. We model the schema matching network and the reconciliation process, relating the experts' assertions and the constraints of the matching network to an argumentation framework. Our representation not only captures the experts' beliefs and their explanations, but also enables reasoning about these captured inputs. On top of this representation, we develop support techniques for experts to detect conflicts in a set of their assertions. We then guide conflict resolution by offering two primitives: conflict-structure interpretation and what-if analysis. While the former presents meaningful interpretations of the conflicts and various heuristic metrics, the latter helps the experts understand the consequences of their own decisions as well as those of others. Last but not least, we implement an argumentation-based negotiation support tool for schema matching (ArgSM), which realizes our methods to help the experts in this collaborative task.
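    One way to picture the conflict-detection primitive is a conflict-freeness check over an abstract argumentation framework, where arguments stand for experts' assertions and attacks encode the matching-network constraints they violate together. The arguments, attacks, and function below are hypothetical illustrations; ArgSM's actual encoding is richer than this.

```python
# Minimal sketch: find the attack pairs that make a set of accepted
# assertions not conflict-free in an abstract argumentation framework.

def conflicts(selected, attacks):
    """Return the attack pairs with both endpoints in `selected`."""
    return [(a, b) for (a, b) in attacks if a in selected and b in selected]

# Hypothetical assertions about two correspondences c1, c2 and a
# one-to-one constraint that forbids accepting both.
attacks = {
    ("accept_c1", "accept_c2"),   # constraint: c1 and c2 share a target
    ("accept_c2", "accept_c1"),
    ("reject_c1", "accept_c1"),   # one expert rejects what another accepts
}
selected = {"accept_c1", "accept_c2"}
print(conflicts(selected, attacks))   # -> the pairs the experts must resolve
```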

    Crowdsourcing Literal Review

    Our user feedback framework requires robust techniques to tackle the scalability issue of schema matching networks. One approach is to employ crowdsourcing/human computation models. Crowdsourcing is a cutting-edge research area that involves human workers performing pre-defined tasks. In this literature review, we explore key concepts such as tasks, workflows, feedback aggregation, quality control, and reward systems. We show that many of these aspects can be integrated into our user feedback framework.
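    To make one of the reviewed concepts concrete, the sketch below shows feedback aggregation by simple majority voting over crowd answers; real aggregation schemes also weight workers by estimated quality, which is omitted here, and the matching questions are hypothetical.

```python
# Minimal sketch of feedback aggregation: majority vote per task.
from collections import Counter

def majority_vote(answers_per_task):
    """answers_per_task: task id -> list of crowd answers."""
    return {
        task: Counter(answers).most_common(1)[0][0]
        for task, answers in answers_per_task.items()
    }

feedback = {                     # hypothetical matching questions
    "attr_A ~ attr_B": ["yes", "yes", "no"],
    "attr_C ~ attr_D": ["no", "no", "no"],
}
print(majority_vote(feedback))
# -> {'attr_A ~ attr_B': 'yes', 'attr_C ~ attr_D': 'no'}
```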

    A Survey of Privacy on Data Integration

    This survey is an integrated view of other surveys on privacy preservation for data integration. First, we review the database context, challenges, and research questions. Second, we formulate the privacy problems for schema matching and data matching. Next, we introduce the elements of privacy models. Then, we summarize the existing privacy techniques and the analyses (proofs) of their privacy guarantees. Finally, we describe the privacy frameworks and their applications.

    Fighting Rumours on Social Media

    With the advance of social platforms, people are sharing content at an unprecedented scale. This makes social platforms an ideal place for spreading rumors. As rumors may have negative impacts on the real world, many rumor detection techniques have been proposed. In this proposal, we summarize several works that focus on two important steps of rumor detection. The first step detects controversial events from the data streams that are candidates for rumors. The aim of the second step is to determine the truth values of these events, i.e. whether they are rumors or not. Although some techniques achieve state-of-the-art results, they do not cope well with the streaming nature of social platforms. In addition, they usually leverage only one type of information available on social platforms, such as only the posts. To overcome these limitations, we propose two research directions that emphasize 1) detecting rumors in a progressive manner and 2) combining different types of information for better detection.
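    As a rough illustration of the first step, the sketch below flags controversial events from a post stream using a sliding window and a simple support/denial disagreement score; the stance labels, window size, and threshold are illustrative assumptions, not the surveyed techniques.

```python
# Minimal sketch: progressive flagging of rumor candidates from a stream
# of stance-labelled posts about one event.
from collections import deque

def controversy(stances):
    """Disagreement in a window of stance labels ('support'/'deny')."""
    support = sum(s == "support" for s in stances)
    deny = sum(s == "deny" for s in stances)
    total = support + deny
    return 0.0 if total == 0 else min(support, deny) / total

window = deque(maxlen=50)            # per-event sliding window
for post_stance in ["support", "deny", "support", "deny", "deny"]:
    window.append(post_stance)
    if len(window) >= 5 and controversy(window) > 0.3:
        print("flag event as a rumor candidate")
```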

    Managing Quality of Crowdsourced Data

    The Web is the central medium for discovering knowledge via various sources such as blogs, social media, and wikis. It facilitates access to content provided by a large number of users, regardless of their geographical locations or cultural backgrounds. Such user-generated content is often referred to as crowdsourced data, which provides informational benefits in terms of variety and scale. Yet, the quality of crowdsourced data is hard to manage, due to the inherent uncertainty and heterogeneity of the Web. In this proposal, we summarize prior work on crowdsourced data that studies quality dimensions and techniques to assess data quality. However, such work often lacks mechanisms to collect data with high quality guarantees and to improve data quality. To overcome these limitations, we propose a research direction that emphasises (1) guaranteeing data quality at collection time, and (2) using expert knowledge to improve data quality for the cases where the data has already been collected.
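    One common way to enforce quality at collection time is to mix gold questions with known answers into each worker's batch and drop contributions from workers whose accuracy on them falls below a threshold. The sketch below illustrates that idea only; the data layout, threshold, and function are assumptions, not the mechanism proposed here.

```python
# Minimal sketch: filter worker submissions by accuracy on gold questions.

def filter_workers(submissions, gold, min_accuracy=0.8):
    """submissions: worker -> {question: answer}; gold: question -> answer."""
    kept = {}
    for worker, answers in submissions.items():
        scored = [q for q in answers if q in gold]
        if not scored:
            continue                                  # no gold seen, skip
        accuracy = sum(answers[q] == gold[q] for q in scored) / len(scored)
        if accuracy >= min_accuracy:
            # keep only the non-gold answers of trusted workers
            kept[worker] = {q: a for q, a in answers.items() if q not in gold}
    return kept
```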

    Provenance-based Reconciliation In Conflicting Data

    Data fusion is the process of resolving conflicting data from multiple data sources. As the data sources are inherently heterogeneous, an expert is needed to resolve the conflicting data. The traditional approach requires the expert to resolve a considerable number of conflicts in order to acquire a high-quality dataset. In this project, we consider how to acquire a high-quality dataset while keeping the expert effort minimal. First, we achieve this goal by building a model that leverages the provenance of the data when reconciling conflicting data. Second, we improve our model by taking the dependencies between data sources into account. Finally, we empirically show that our solution significantly reduces the user effort while obtaining a high-quality dataset in comparison with the traditional method.
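    To give a flavour of provenance-aware conflict resolution, the sketch below weights each claimed value by the estimated accuracy of the source it comes from and picks the highest-weighted value; the accuracies and claims are toy numbers, and source dependencies are ignored in this illustration.

```python
# Minimal sketch: resolve one conflicting attribute by source-weighted voting.
from collections import defaultdict

def resolve(claims, source_accuracy):
    """claims: list of (source, value); returns the highest-weighted value."""
    score = defaultdict(float)
    for source, value in claims:
        score[value] += source_accuracy.get(source, 0.5)  # default: unknown source
    return max(score, key=score.get)

claims = [("src1", "Geneva"), ("src2", "Lausanne"), ("src3", "Geneva")]
accuracy = {"src1": 0.9, "src2": 0.6, "src3": 0.4}   # hypothetical provenance-derived trust
print(resolve(claims, accuracy))   # -> 'Geneva' (0.9 + 0.4 > 0.6)
```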

    Reconciling Factor Graph with User Feedback

    A factor graph is a representative graphical model for handling the uncertainty of random variables. Factor graphs have been used in various application domains such as named entity recognition, social network analysis, and credibility evaluation. In this paper, we study the problem of reducing uncertainty in a factor graph towards reaching a common truth, or deterministic information. We propose a pay-as-you-go approach that leverages user feedback for uncertainty reduction. As the availability of human input is often limited, we develop techniques to identify the most uncertain spots in the factor graph to maximize the benefit of a given amount of user feedback. We demonstrate the efficiency of our techniques on real-world applications.
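    The pay-as-you-go idea can be sketched as ranking binary variables by the entropy of their marginals and spending the next piece of user feedback on the most uncertain one. The marginals below are toy numbers rather than the output of real factor-graph inference, and the selection rule is one simple possibility, not necessarily the paper's criterion.

```python
# Minimal sketch: pick the most uncertain variable to ask the user about.
import math

def entropy(p):
    """Entropy of a Bernoulli marginal p = P(variable is true)."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

marginals = {"x1": 0.95, "x2": 0.55, "x3": 0.80}    # hypothetical beliefs
next_question = max(marginals, key=lambda v: entropy(marginals[v]))
print("ask the user about", next_question)           # x2 is closest to 0.5

# After feedback, the variable is clamped and inference would be re-run
# to propagate the reduced uncertainty through the factor graph.
marginals[next_question] = 1.0
```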