    SUPPORT EFFECTIVE DISCOVERY MANAGEMENT IN VISUAL ANALYTICS

    Visual analytics promises to give analysts the means to analyze complex datasets and make effective decisions in a timely manner. Although existing visual analytics systems have made significant progress toward effective data exploration, few of them provide systematic solutions for managing the large number of discoveries generated during exploration. Analysts have to use offline tools to manually annotate, browse, retrieve, organize, and connect their discoveries, and they have no convenient access to important discoveries captured by collaborators. As a consequence, the lack of effective discovery management approaches severely hinders analysts from using their discoveries to make effective decisions. In response to this challenge, this dissertation aims to support effective discovery management in visual analytics. It contributes a general discovery management framework built around the concept of patterns, namely the results of users' low-level analytic tasks. Patterns, together with users' mental models and evaluations, allow discoveries to be constructed. Unlike mental models, the categories of patterns that can be discovered from data are predictable and application-independent. In addition, patterns in the same category are often annotated with the same set of information. Visual analytics systems can therefore semi-automatically annotate patterns in a formalized format by predicting what should be recorded for patterns in popular categories. Using these formalized annotations, the framework also improves the automation and efficiency of a variety of discovery management activities, such as discovery browsing, retrieval, organization, association, and sharing, and integrates them seamlessly with visual interactive exploration to support effective decision making.
Guided by the discovery management framework, our second contribution is a variety of novel techniques that facilitate these discovery management activities. The proposed techniques and framework are implemented in a prototype system, ManyInsights, which supports discovery management in multidimensional data exploration. To evaluate the prototype, we present two long-term case studies that investigated how the discovery management techniques work together to benefit exploratory data analysis and collaborative analysis. The studies allowed us to understand the advantages, limitations, and design implications of ManyInsights and its underlying framework.
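The idea that patterns in the same category share a predictable set of annotation fields can be sketched as a simple template lookup that pre-fills what the system already captured and leaves the rest to the analyst. This is a minimal Python sketch under stated assumptions: the category names, field names, and the helpers `annotate` and `retrieve` are hypothetical and are not taken from ManyInsights.

```python
from dataclasses import dataclass

# Hypothetical pattern categories and the fields typically recorded for each.
# These names are illustrative assumptions, not the ManyInsights schema.
TEMPLATES = {
    "correlation": ["dimension_x", "dimension_y", "strength"],
    "outlier": ["record_id", "dimensions"],
    "cluster": ["member_ids", "dimensions"],
}


@dataclass
class Annotation:
    category: str
    fields: dict
    note: str = ""  # free-text part written by the analyst

    def missing(self):
        """Template fields the analyst still has to fill in by hand."""
        return [name for name in TEMPLATES[self.category] if self.fields.get(name) is None]


def annotate(category, captured, note=""):
    """Pre-fill a formalized annotation from values captured during exploration."""
    if category not in TEMPLATES:
        raise ValueError(f"unknown pattern category: {category}")
    fields = {name: captured.get(name) for name in TEMPLATES[category]}
    return Annotation(category, fields, note)


def retrieve(annotations, category, **criteria):
    """Find discoveries of one category whose fields match all criteria exactly."""
    return [
        a for a in annotations
        if a.category == category
        and all(a.fields.get(key) == value for key, value in criteria.items())
    ]


# Usage: the system captures what it can; the analyst completes the rest.
a = annotate("correlation", {"dimension_x": "price", "dimension_y": "mileage", "strength": -0.8})
b = annotate("outlier", {"record_id": 17}, note="suspicious sale")
print(b.missing())  # ['dimensions']
print(retrieve([a, b], "correlation", dimension_x="price") == [a])  # True
```

Because the fields are structured, retrieval becomes a field match instead of a free-text search, which is the kind of automation the framework description points to.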

    On cross-domain social semantic learning

    Approximately 2.4 billion people are now connected to the Internet, generating massive amounts of data through laptops, mobile phones, sensors, and other electronic devices. Not surprisingly, then, ninety percent of the world's digital data was created in the last two years. This explosion of data provides a tremendous opportunity to study, model, and improve the conceptual and physical systems that produce it. It also lets scientists test pre-existing hypotheses in various fields against large-scale experimental evidence. Developing computational algorithms that automatically explore this data is therefore a central goal for the current generation of computer scientists. Making sense of this data algorithmically is complex for three reasons. First, the data is generated by different devices, captures different aspects of information, and resides on different web resources and platforms. Therefore, even if two pieces of data are conceptually very similar, differences in their generation, format, and domain on the web can make them seem considerably dissimilar. Second, since humans are social creatures, the data often has inherent but murky correlations, caused primarily by direct or indirect social interactions. This drastically changes what algorithms must achieve: they need an intelligent understanding of the underlying social nature and semantic context of the disparate domain data, and a quantifiable way to transfer knowledge gained in one domain to another. Finally, the data often arrives as a stream rather than as static pages on the Internet, so models must learn and re-learn as the stream propagates. The main objective of this dissertation is to develop learning algorithms that identify specific patterns in one data domain that can then improve predictive performance in another.
The research explores the existence of specific data domains that can function in synergy with one another and, more importantly, proposes models to quantify the synergetic information transfer among such domains. Our study includes large-scale data from various domains: social media data from Twitter, multimedia video data from YouTube, video search query data from Bing Videos, natural language search queries from the web, Internet resources in the form of web logs (blogs), and spatio-temporal social trends from Twitter. Our work presents a series of solutions to the key challenges in cross-domain learning, particularly for social and semantic data. We propose bridging media from disparate sources by building a common latent topic space, one of the first attempts to answer sociological problems using cross-domain (social) media. This allows information transfer between social and non-social domains, enabling real-time socially relevant applications. We also engineer a concept network from the semantic web, called semNet, that helps identify concept relations and model information granularity for robust natural language search. Further, by studying spatio-temporal patterns in this data, we can discover categorical concepts that stimulate collective attention within user groups.
Includes bibliographical references (pages 210-214).
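A common latent topic space lets items from different platforms be compared once each is mapped onto the same topics. The Python sketch below assumes a tiny shared vocabulary and a fixed topic-word matrix (both hypothetical) and uses a crude fold-in (topic weights summed over observed words, then normalized) in place of proper topic-model inference such as LDA. It only illustrates how a tweet and a video title can land close together in topic space; it is not the dissertation's model.

```python
import math

# Hypothetical shared vocabulary and topic-word weights for K = 2 topics.
# A real system would learn these jointly from all domains (e.g. with a topic model).
VOCAB = ["goal", "match", "vote", "election"]
TOPIC_WORD = [
    [0.5, 0.5, 0.0, 0.0],  # topic 0: roughly "sport"
    [0.0, 0.0, 0.5, 0.5],  # topic 1: roughly "politics"
]


def to_topic_space(tokens):
    """Crude fold-in: score each topic by its weights on the observed words, then normalize."""
    counts = [tokens.count(word) for word in VOCAB]
    raw = [sum(w * c for w, c in zip(row, counts)) for row in TOPIC_WORD]
    total = sum(raw)
    return [x / total for x in raw] if total else [0.0] * len(TOPIC_WORD)


def cosine(u, v):
    """Cosine similarity between two topic vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Items from different domains, projected into the same topic space.
tweet = to_topic_space("what a goal in that match".split())
video_title = to_topic_space("match highlights every goal".split())
search_query = to_topic_space("election vote count".split())

print(cosine(tweet, video_title))   # 1.0: same topic despite different platforms
print(cosine(tweet, search_query))  # 0.0: unrelated topics
```

The design point is that the tweet and the video title share no platform-specific features, yet they become directly comparable once both are expressed as distributions over the same topics.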

    First IJCAI International Workshop on Graph Structures for Knowledge Representation and Reasoning (GKR@IJCAI'09)

    The development of effective techniques for knowledge representation and reasoning (KRR) is a crucial aspect of successful intelligent systems. Different representation paradigms, and their use in dedicated reasoning systems, have been extensively studied in the past. Nevertheless, new challenges, problems, and issues have emerged in the context of knowledge representation in Artificial Intelligence (AI), involving the logical manipulation of increasingly large information sets (for example, in the Semantic Web and bioinformatics). Improvements in storage capacity and computing performance have also affected the nature of KRR systems, shifting their focus toward representational power and execution performance. KRR research therefore faces the challenge of developing knowledge representation structures optimized for large-scale reasoning. This new generation of KRR systems includes graph-based knowledge representation formalisms such as Bayesian Networks (BNs), Semantic Networks (SNs), Conceptual Graphs (CGs), Formal Concept Analysis (FCA), CP-nets, and GAI-nets, all of which have been used successfully in a number of applications. The goal of this workshop is to bring together researchers involved in the development and application of graph-based knowledge representation formalisms and reasoning techniques.

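    Online Handwriting Recognition Using an Approach Combining Support Vector Machines and Hidden Markov Models (Reconnaissance de l'écriture manuscrite en-ligne par approche combinant systèmes à vastes marges et modèles de Markov cachés)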

    Handwriting recognition is one of the leading applications of pattern recognition and machine learning. Despite some limitations, handwriting recognition systems have been used as an input method for many electronic devices and help automate many manual tasks that require processing handwriting images. In general, a handwriting recognition system comprises three functional components: preprocessing, recognition, and post-processing. Improvements have been made within each component. However, to further expand its applications, the recognition capability of the system specifically needs to improve. Hidden Markov Models (HMMs) have been the dominant recognition method in both offline and online handwriting recognition. However, the use of Gaussian observation densities in HMMs, together with a representational model for word modeling, often does not lead to good classification. Hybrid Neural Network (NN)/HMM systems later improved word recognition by combining the discriminative property of NNs with the representational capability of HMMs. However, NNs do not optimize recognition capability, because training under the Empirical Risk Minimization (ERM) principle leads to poor generalization. In this thesis, we focus on improving the recognition capability of a cursive online handwritten word recognition system using an emerging method in machine learning, the support vector machine (SVM). We first evaluated SVMs for isolated character recognition using the IRONOFF and UNIPEN character databases. By applying the principle of Structural Risk Minimization (SRM), SVMs allowed simultaneous optimization of the representational and discriminative capability of the character recognizer. Finally, we demonstrate various practical issues in using SVMs within a hybrid setting with HMMs.
In addition, we tested the hybrid system on the IRONOFF word database and obtained favourable results.
Our work concerns handwriting recognition, which is one of the favoured domains for pattern recognition and learning algorithms. In online handwriting, applications cover all input devices that allow a user to communicate transparently with information systems. In this context, our work contributes a new architecture for recognizing handwritten words without style constraints. It belongs to the family of hybrid local/global approaches, in which the segmentation/recognition paradigm is resolved by the complementarity of a discriminative recognition system operating at the character level and a model-based system supervising the global level. We chose Support Vector Machines (SVMs) for the character classifier and dynamic programming algorithms derived from Hidden Markov Model (HMM) modelling. This SVM/HMM combination is unique in the field of handwriting recognition. Experiments were conducted first on isolated character recognition and then on the IRONOFF cursive word database. They showed that the SVM approaches outperform the convolutional neural network solutions (Time Delay Neural Network) that we had developed previously, and that they behave well in word recognition.
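In the hybrid architecture described above, a character-level discriminative classifier scores each candidate and an HMM-style dynamic program decodes the best global sequence. The following Python sketch shows that decoding step as standard Viterbi over log-scores. It assumes the classifier outputs have already been converted to log-probabilities (raw SVM margins are not probabilities, so some calibration step, such as Platt scaling, would be needed), and the two-character alphabet and all numbers are made up for illustration; it is not the thesis's actual system.

```python
import math


def viterbi(emission_logp, trans_logp, init_logp):
    """Most likely state sequence under an HMM, computed in log space.

    emission_logp[t][s]: log-score of state s at frame t (assumed here to come
        from a calibrated character classifier such as an SVM)
    trans_logp[p][s]:    log-probability of moving from state p to state s
    init_logp[s]:        log-probability of starting in state s
    """
    n_states = len(init_logp)
    score = [init_logp[s] + emission_logp[0][s] for s in range(n_states)]
    backpointers = []
    for t in range(1, len(emission_logp)):
        prev, score, ptr = score, [], []
        for s in range(n_states):
            # Best predecessor for state s at frame t.
            best = max(range(n_states), key=lambda p: prev[p] + trans_logp[p][s])
            ptr.append(best)
            score.append(prev[best] + trans_logp[best][s] + emission_logp[t][s])
        backpointers.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical example: two characters, three frames.
log = math.log
alphabet = "ab"
emissions = [[log(0.9), log(0.1)], [log(0.6), log(0.4)], [log(0.2), log(0.8)]]
transitions = [[log(0.7), log(0.3)], [log(0.3), log(0.7)]]
initial = [log(0.5), log(0.5)]
path = viterbi(emissions, transitions, initial)
print("".join(alphabet[s] for s in path))  # aab
```

Working in log space turns products of probabilities into sums, which avoids numerical underflow on long words.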

    Semantic Interpretation of User Queries for Question Answering on Interlinked Data

    The Web of Data contains a wealth of knowledge from a large number of domains, but retrieving data from these valuable interlinked knowledge bases is difficult. Because they take the structure of data into account, the upcoming generation of search engines is expected to move toward question answering systems, which answer user questions directly. However, developing question answering over interlinked data sources is still challenging because of two inherent characteristics. First, different datasets employ heterogeneous schemas, and each may contain only part of the answer to a given question. Second, constructing a federated formal query across different datasets requires exploiting links between these datasets at both the schema and instance levels. These characteristics raise several challenges, such as resource disambiguation, vocabulary mismatch, inference, and link traversal. In this dissertation, we address these challenges to build a question answering system for Linked Data. We present Sina, a question answering system that transforms user-supplied queries (either natural language queries or keyword queries) into conjunctive SPARQL queries over a set of interlinked data sources. The contributions of this work are as follows:
1. A novel approach for determining the most suitable resources for a user-supplied query from different datasets (disambiguation). We employed a Hidden Markov Model whose parameters were bootstrapped with different distribution functions.
2. A novel method for constructing federated formal queries using the disambiguated resources and the linking structure of the underlying datasets. This approach relies on a combination of domain and range inference and a link traversal method for constructing a connected graph, which ultimately yields a corresponding SPARQL query.
3. A two-part contribution to the problem of vocabulary mismatch. First, we introduce a number of new query expansion features based on semantic and linguistic inferencing over Linked Data, and we evaluate the effectiveness of each feature individually and in combination, employing Support Vector Machines and Decision Trees. Second, we propose a novel method for automatic query expansion that employs a Hidden Markov Model to obtain the optimal tuples of derived words.
4. Two benchmarks for two different tasks, provided to the question answering community: the first for question answering on interlinked datasets (i.e. federated queries over Linked Data), and the second for the vocabulary mismatch task.
We evaluate the accuracy of our approach using measures such as mean reciprocal rank, precision, recall, and F-measure on three interlinked life-science datasets as well as DBpedia. The results of this evaluation demonstrate the effectiveness of our approach. Moreover, we study the runtime of our approach in its sequential and parallel implementations and draw conclusions about its scalability on Linked Data.
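The evaluation measures named above are standard; the Python sketch below shows how they are typically computed from a ranked list of answers per question. The answer identifiers are placeholders, and this is a generic illustration rather than the dissertation's evaluation code.

```python
def reciprocal_rank(ranked, relevant):
    """1 / (rank of the first relevant answer), or 0.0 if none is returned."""
    for rank, answer in enumerate(ranked, start=1):
        if answer in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """runs: one (ranked_answers, relevant_answers) pair per question."""
    return sum(reciprocal_rank(ranked, relevant) for ranked, relevant in runs) / len(runs)


def precision_recall_f1(returned, relevant):
    """Set-based precision, recall, and F-measure (harmonic mean of the two)."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical results for three questions.
runs = [
    (["ex:x", "ex:y", "ex:z"], {"ex:y"}),  # first correct answer at rank 2
    (["ex:a", "ex:b"], {"ex:a"}),          # first correct answer at rank 1
    (["ex:q"], {"ex:w"}),                  # no correct answer returned
]
print(mean_reciprocal_rank(runs))  # 0.5
print(precision_recall_f1(["ex:a", "ex:b", "ex:c", "ex:d"], ["ex:a", "ex:b", "ex:e"]))
```

Mean reciprocal rank rewards placing a correct answer near the top of the list, while precision, recall, and F-measure judge the returned answer set as a whole.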

    Creating ontology-based metadata by annotation for the semantic web

    SInCom 2015

    2nd Baden-Württemberg Center of Applied Research Symposium on Information and Communication Systems, SInCom 2015, 13 November 2015, Konstanz

    Theory and practice of the ternary relations model of information management

    This thesis proposes a new, highly generalised and fundamental information-modelling framework called the TRM (Ternary Relations Model). The TRM was designed as a model for converging a number of differing paradigms of information management, some of which are quite isolated. These include hypertext navigation, relational databases, semi-structured databases, the Semantic Web, ZigZag, and workflow modelling. While many related works model a link as the connection of two ends, the TRM adds a third element, thereby enriching links with associative meaning. The TRM is a formal description of a technique that establishes bi-directional and dynamic node-link structures in which each link is an ordered triple of three other nodes. The key features that distinguish the TRM from other triple-based models (such as RDF) are the integration of bi-directionality, functional links, and simplicity in its definition and element hierarchy. The TRM has two useful applications. First, it can be used as a tool for analysing information models, to elucidate connections and parallels. Second, it can be used as a “construction kit” to build new paradigms and/or applications in information management. The TRM can provide a substrate for building diverse systems, such as adaptive hypertext, schemaless databases, query languages, hyperlink models, and workflow management systems. It is, however, highly generalised and by no means limited to these purposes.
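As a rough illustration of links that are ordered triples of nodes and can be followed in both directions, here is a small Python sketch. The class `TernaryStore`, the `link:N` identifiers, and the (source, relation, target) reading of the triple are assumptions made for illustration; this is not the TRM's formal definition.

```python
from collections import defaultdict


class TernaryStore:
    """Hypothetical sketch: every link is an ordered triple of other nodes,
    and each link gets its own id, so links can themselves be linked."""

    def __init__(self):
        self.links = {}                # link id -> (source, relation, target)
        self.index = defaultdict(set)  # node -> ids of links it participates in
        self._next_id = 0

    def link(self, source, relation, target):
        link_id = f"link:{self._next_id}"
        self._next_id += 1
        self.links[link_id] = (source, relation, target)
        # Index all three members so the link is reachable from any of them.
        for node in (source, relation, target):
            self.index[node].add(link_id)
        return link_id

    def _triples_of(self, node):
        return [self.links[i] for i in self.index.get(node, ())]

    def forward(self, source, relation):
        """Follow a link from source to target."""
        return {t for s, r, t in self._triples_of(source) if s == source and r == relation}

    def backward(self, target, relation):
        """Follow the same link in the opposite direction, from target to source."""
        return {s for s, r, t in self._triples_of(target) if t == target and r == relation}


store = TernaryStore()
cites = store.link("page:A", "cites", "page:B")
store.link(cites, "annotatedBy", "user:alice")      # a link about a link
print(store.forward("page:A", "cites"))             # {'page:B'}
print(store.backward("page:B", "cites"))            # {'page:A'}
print(store.backward("user:alice", "annotatedBy"))  # {'link:0'}
```

Indexing every member of the triple, including the relation node, is what makes navigation symmetric here: the same link can be reached from its source, its target, or its type.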
