72 research outputs found
Process Data Warehouse
Systems and/or methods are presented that can efficiently analyze and summarize large collections of data. A summarization component can employ mapping rules to map received data into specified states and observations of interest, which can be used to create relational tables that summarize a collection of data based in part on predefined summarization criteria. An optimizer component can employ pre-computation and materialization of the process behavior to facilitate optimizing data analysis. An adaptor enhancer component can monitor and evaluate system performance and can generate mapping rules that help improve system performance.
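A minimal sketch of the kind of rule-based mapping described in this abstract, with hypothetical event fields, rules, and a simple count-per-state summarization criterion; it only illustrates turning raw records into (state, observation) rows, not the patented components themselves.

```python
# Illustrative sketch (hypothetical fields and rules): map raw process events into
# states and observations of interest, then summarize the resulting rows.
from collections import defaultdict

# Mapping rules: a predicate over a raw event plus the (state, observation) it yields.
mapping_rules = [
    (lambda e: e["temp"] > 90, ("overheating", "high_temperature")),
    (lambda e: e["temp"] <= 90, ("normal", "temperature_ok")),
]

def to_rows(events):
    """Apply mapping rules to raw events, producing relational (ts, state, observation) rows."""
    rows = []
    for e in events:
        for predicate, (state, observation) in mapping_rules:
            if predicate(e):
                rows.append({"ts": e["ts"], "state": state, "observation": observation})
                break
    return rows

def summarize(rows):
    """Summarization criterion used here: number of events observed per state."""
    counts = defaultdict(int)
    for r in rows:
        counts[r["state"]] += 1
    return dict(counts)

events = [{"ts": 1, "temp": 95}, {"ts": 2, "temp": 70}, {"ts": 3, "temp": 99}]
print(summarize(to_rows(events)))   # {'overheating': 2, 'normal': 1}
```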
SLFTD: a subjective logic based framework for truth discovery
Finding the truth among conflicting candidate values provided by different data sources is called truth discovery, which is of vital importance in data integration. Several algorithms have been proposed in this area, and they usually follow a similar procedure: iteratively inferring the truth and each provider's reliability in providing truth until convergence. An accurate evaluation of provider reliability is therefore essential. However, no existing work pays attention to how reliably a provider continuously provides truth. We therefore introduce subjective logic, which records both (1) the provider's reliability in generating truth and (2) the reliability with which the provider continuously does so. Our proposed method provides a better evaluation of data providers, based on which truths are discovered more accurately. Our framework can handle both categorical and numerical data, and can identify truth in either a generative or a discriminative way. Experiments on two popular real-world datasets, Book and Population, validate that our proposed subjective logic based framework discovers truth much more accurately than state-of-the-art methods.
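A minimal sketch of the generic iterative truth-discovery loop described in this abstract, with each provider's reliability kept as a subjective-logic binomial opinion (belief, disbelief, uncertainty) built from evidence counts. The function names, the voting weight, and the evidence update are illustrative assumptions, not the exact SLFTD procedure.

```python
# Illustrative sketch (not the exact SLFTD algorithm): iterative truth discovery for
# categorical claims, with provider reliability as a subjective-logic binomial opinion.
from collections import Counter

W = 2.0  # non-informative prior weight used in subjective logic

def opinion(r, s, W=W):
    """Binomial opinion (belief, disbelief, uncertainty) from r positive and s negative observations."""
    total = r + s + W
    return r / total, s / total, W / total

def truth_discovery(claims, iterations=10):
    """claims: dict mapping object -> {provider: claimed_value}."""
    providers = {p for vals in claims.values() for p in vals}
    evidence = {p: [1.0, 1.0] for p in providers}        # [r, s] evidence counts per provider
    truths = {}
    for _ in range(iterations):
        # 1. Vote for each object's value, weighting providers by their expected reliability.
        for obj, vals in claims.items():
            votes = Counter()
            for p, v in vals.items():
                b, d, u = opinion(*evidence[p])
                votes[v] += b + 0.5 * u                   # expected probability of being right
            truths[obj] = votes.most_common(1)[0][0]
        # 2. Re-derive each provider's evidence from agreement with the inferred truths.
        evidence = {p: [1.0, 1.0] for p in providers}
        for obj, vals in claims.items():
            for p, v in vals.items():
                if v == truths[obj]:
                    evidence[p][0] += 1
                else:
                    evidence[p][1] += 1
    return truths, {p: opinion(*e) for p, e in evidence.items()}
```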
Efficient Data Fusion using the Tsetlin Machine
We propose a novel way of assessing and fusing noisy dynamic data using a Tsetlin Machine (TM). Our approach consists in monitoring how the explanations, in the form of logical clauses that a TM learns, change with possible noise in dynamic data. In this way the TM can recognize the noise by lowering the weights of previously learned clauses, or reflect it in the form of new clauses. We also perform a comprehensive experimental study using notably different datasets that demonstrates the high performance of the proposed approach.
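A conceptual sketch of the clause re-weighting idea mentioned above; it is not a real Tsetlin Machine implementation or the authors' method, only an illustration of lowering the weight of a previously learned clause that is contradicted by a new, possibly noisy, observation and adding a new clause for uncovered patterns.

```python
# Conceptual sketch only (not a real Tsetlin Machine): weighted logical clauses are
# re-weighted as new, possibly noisy, labeled observations arrive.

def clause_matches(clause, features):
    """A clause is a set of (feature, expected_value) literals; it fires if all literals hold."""
    return all(features.get(f) == v for f, v in clause["literals"].items())

def update_clauses(clauses, features, label, penalty=1, reward=1):
    """Lower the weight of clauses contradicted by the observation; reinforce agreeing ones."""
    covered = False
    for c in clauses:
        if clause_matches(c, features):
            covered = True
            if c["vote"] == label:
                c["weight"] += reward
            else:
                c["weight"] = max(0, c["weight"] - penalty)   # noise lowers the clause's influence
    if not covered:
        # Reflect the new pattern as a fresh clause with a small initial weight.
        clauses.append({"literals": dict(features), "vote": label, "weight": 1})
    return clauses

clauses = [{"literals": {"sensor_ok": True}, "vote": 1, "weight": 5}]
clauses = update_clauses(clauses, {"sensor_ok": True}, label=0)   # conflicting (noisy) sample
print(clauses[0]["weight"])                                       # 4
```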
Automatic Evaluation of Information Provider Reliability and Expertise
Q&A social media have gained a lot of attention in recent years. People rely on these sites to obtain information due to a number of advantages they offer compared to conventional sources of knowledge (e.g., asynchronous and convenient access). However, for the same question one may find highly contradicting answers, causing ambiguity with respect to the correct information. This can be attributed to the presence of unreliable and/or non-expert users. These two attributes (reliability and expertise) significantly affect the quality of the answer/information provided. We present a novel approach for estimating these user characteristics relying on human cognitive traits. In brief, we propose that each user monitor the activity of his peers (on the basis of responses to questions he has asked) and observe their compliance with predefined cognitive models. These observations lead to local assessments that can be further fused to obtain a reliability and expertise consensus for every other user in the social network (SN). For the aggregation part we use subjective logic. To the best of our knowledge this is the first study of this kind in the context of Q&A SNs. Our proposed approach is highly distributed; each user can individually estimate the expertise and the reliability of his peers using his direct interactions with them and our framework. The online SN (OSN), which can be considered a distributed database, performs continuous data aggregation for user expertise and reliability assessment in order to reach a consensus. In our evaluations, we first emulate a Q&A SN to examine various performance aspects of our algorithm (e.g., convergence time, responsiveness, etc.). Our evaluations indicate that it can accurately assess the reliability and the expertise of a user with a small number of samples and can successfully react to the latter's behavior change, provided that the cognitive traits hold in practice. Furthermore, the use of the consensus operator for the aggregation of multiple opinions on a specific user reduces the uncertainty with regard to the final assessment. However, as real data obtained from Yahoo! Answers imply, the pairwise interactions between specific users are limited. Hence, we consider the aggregate set of questions as posted from the system itself and we assess the expertise and reliability of users based on their response behavior. We observe that users have different behaviors depending on the level at which we observe them. In particular, while their activity is focused on a few general categories, making them appear reliable, their microscopic (within general category) activity is highly scattered.
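A minimal sketch of the subjective-logic consensus (cumulative fusion) operator used for the aggregation step described above: two independent binomial opinions (belief, disbelief, uncertainty) about the same user are fused into one opinion with lower uncertainty. The surrounding assessment pipeline and the cognitive models are not shown.

```python
# Minimal sketch of the subjective-logic consensus (cumulative fusion) operator.

def consensus(op_a, op_b):
    """Fuse two binomial opinions (b, d, u); assumes at least one has non-zero uncertainty."""
    b_a, d_a, u_a = op_a
    b_b, d_b, u_b = op_b
    k = u_a + u_b - u_a * u_b
    b = (b_a * u_b + b_b * u_a) / k
    d = (d_a * u_b + d_b * u_a) / k
    u = (u_a * u_b) / k
    return b, d, u

# Two observers' opinions about the same answerer's reliability.
print(consensus((0.7, 0.1, 0.2), (0.6, 0.1, 0.3)))   # fused opinion with reduced uncertainty
```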
Towards Open Domain Chatbots - A GRU Architecture for Data Driven Conversations.
Understanding of textual content, such as topic and intent recognition, is a critical part of chatbots, allowing the chatbot to provide relevant responses. Although traditional pattern recognition techniques are successful in several narrow domains, the potential diversity of content in broader and more open domains renders them inaccurate. In this paper, we propose a novel deep learning architecture for content recognition that consists of multiple levels of gated recurrent units (GRUs). The architecture is designed to capture complex sentence structure at multiple levels of abstraction, seeking content recognition for very wide domains through a distributed, scalable representation of content. To evaluate our architecture, we have compiled 10 years of questions and answers from a youth information service: 200,083 questions spanning a wide range of content, altogether 289 topics, involving law, health, and social issues. Despite the relatively open-domain dataset, our architecture is able to accurately categorize the 289 intents and topics. Indeed, it provides roughly an order of magnitude higher accuracy than more classical content recognition techniques, such as SVM, Naive Bayes, random forest, and K-nearest neighbor, which all seem to fail on this challenging open-domain dataset.
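A minimal sketch, assuming PyTorch, of a multi-layer GRU text classifier over 289 topic classes; the vocabulary size, embedding and hidden dimensions, and number of GRU levels are illustrative assumptions rather than the paper's exact architecture.

```python
# Minimal sketch (assumed hyperparameters, not the paper's exact architecture):
# token embeddings feed a stacked GRU whose final hidden state is classified into 289 topics.
import torch
import torch.nn as nn

class StackedGRUClassifier(nn.Module):
    def __init__(self, vocab_size=30000, embed_dim=128, hidden_dim=256,
                 num_layers=3, num_classes=289):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Multiple GRU levels capture sentence structure at increasing levels of abstraction.
        self.gru = nn.GRU(embed_dim, hidden_dim, num_layers=num_layers,
                          batch_first=True, dropout=0.2)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embed(token_ids)          # (batch, seq_len, embed_dim)
        _, hidden = self.gru(embedded)            # hidden: (num_layers, batch, hidden_dim)
        return self.classifier(hidden[-1])        # logits over the 289 topics

model = StackedGRUClassifier()
logits = model(torch.randint(1, 30000, (4, 50)))  # batch of 4 questions, 50 tokens each
print(logits.shape)                               # torch.Size([4, 289])
```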
Data Credence in IoT: Vision and Challenges
As the Internet of Things permeates every aspect of human life, assessing the credence or integrity of the data generated by "things" becomes a central exercise in making decisions or auditing events. In this paper, we present a vision of this exercise that includes the notion of data credence, assessing data credence in an efficient manner, and the use of technologies that are on the horizon for the very large scale Internet of Things.
- …