755 research outputs found

    High level synthesis of RDF queries for graph analytics

    Get PDF
    In this paper we present a set of techniques that enable the synthesis of efficient custom accelerators for memory intensive, irregular applications. To address the challenges of irregular applications (large memory footprint, unpredictable fine-grained data accesses, and high synchronization intensity), and exploit their opportunities (thread level parallelism, memory level parallelism), we propose a novel accelerator design that employs an adaptive and Distributed Controller (DC) architecture, and a Memory Interface Controller (MIC) that supports concurrent and atomic memory operations on a multi-ported/multi-banked shared memory. Among the multitude of algorithms that may benefit from our solution, we focus on the acceleration of graph analytics applications and, in particular, on the synthesis of SPARQL queries on Resource Description Framework (RDF) databases. We achieve this objective by incorporating the synthesis techniques into Bambu, an Open Source high-level synthesis tools, and interfacing it with GEMS, the Graph database Engine for Multithreaded Systems. The GEMS' front-end generates optimized C implementations of the input queries, modeled as graph pattern matching algorithms, which are then automatically synthesized by Bambu. We validate our approach by synthesizing several SPARQL queries from the Lehigh University Benchmark (LUBM)

    Proceedings

    Get PDF
    Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories. Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti. NEALT Proceedings Series, Vol. 9 (2010), 268 pages. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/15891

    Heterogeneous Multi-Layered Network Model for Omics Data Integration and Analysis

    Get PDF
    Advances in next-generation sequencing and high-throughput techniques have enabled the generation of vast amounts of diverse omics data. These big data provide an unprecedented opportunity in biology, but impose great challenges in data integration, data mining, and knowledge discovery due to the complexity, heterogeneity, dynamics, uncertainty, and high-dimensionality inherited in the omics data. Network has been widely used to represent relations between entities in biological system, such as protein-protein interaction, gene regulation, and brain connectivity (i.e. network construction) as well as to infer novel relations given a reconstructed network (aka link prediction). Particularly, heterogeneous multi-layered network (HMLN) has proven successful in integrating diverse biological data for the representation of the hierarchy of biological system. The HMLN provides unparalleled opportunities but imposes new computational challenges on establishing causal genotype-phenotype associations and understanding environmental impact on organisms. In this review, we focus on the recent advances in developing novel computational methods for the inference of novel biological relations from the HMLN. We first discuss the properties of biological HMLN. Then we survey four categories of state-of-the-art methods (matrix factorization, random walk, knowledge graph, and deep learning). Thirdly, we demonstrate their applications to omics data integration and analysis. Finally, we outline strategies for future directions in the development of new HMLN models

    Leveraging structure for learning representations of words, sentences and knowledge bases

    Get PDF
    This thesis presents work on learning representations of text and Knowledge Bases by taking into consideration their respective structures. The tasks for which the methods are developed and evaluated on are: Short-text classification, Word Sense Induction and Disambiguation, Knowledge Base Completion with linked text corpora, and large-scale Knowledge Base Question Answering. An introductory chapter states the aims and scope of the thesis, followed by a chapter on technical background and definitions. In chapter 3, the impact of dependency syntax on word representation learning in the context of short-text classification is investigated. A new definition of context in dependency graphs is proposed, which generalizes and extends previous definitions used in word representation learning. The resulting word and dependency feature embeddings are used together to represent dependency graph substructures in text classifiers. In chapter 4, a probabilistic latent variable model for Word Sense Induction and Disambiguation is presented. The model estimates sense clusters using pretrained continuous feature vectors of multiple context types: syntactic, local lexical and global lexical, while the number of sense clusters is determined by the Integrated Complete Likelihood criterion. A model for Knowledge Base Completion with linked text corpora is presented in chapter 5. The proposed model represents potential facts by merging subgraphs of the knowledge base with text through linked entities. The model learns to embed the merged graphs into a lower dimensional space and score the plausibility of the fact with a Multilayer Perceptron. Chapter 6 presents a system for Question Answering on Knowledge Bases. The system learns to decompose questions into entity and relation mentions and score their compatibility with queries on the knowledge base expressed as subgraphs. The model consists of several components trained jointly in order to match parts of the question with parts of a potential query by embedding their corresponding structures in lower dimensional spaces

    The 4th Conference of PhD Students in Computer Science

    Get PDF

    A Survey of Source Code Search: A 3-Dimensional Perspective

    Full text link
    (Source) code search is widely concerned by software engineering researchers because it can improve the productivity and quality of software development. Given a functionality requirement usually described in a natural language sentence, a code search system can retrieve code snippets that satisfy the requirement from a large-scale code corpus, e.g., GitHub. To realize effective and efficient code search, many techniques have been proposed successively. These techniques improve code search performance mainly by optimizing three core components, including query understanding component, code understanding component, and query-code matching component. In this paper, we provide a 3-dimensional perspective survey for code search. Specifically, we categorize existing code search studies into query-end optimization techniques, code-end optimization techniques, and match-end optimization techniques according to the specific components they optimize. Considering that each end can be optimized independently and contributes to the code search performance, we treat each end as a dimension. Therefore, this survey is 3-dimensional in nature, and it provides a comprehensive summary of each dimension in detail. To understand the research trends of the three dimensions in existing code search studies, we systematically review 68 relevant literatures. Different from existing code search surveys that only focus on the query end or code end or introduce various aspects shallowly (including codebase, evaluation metrics, modeling technique, etc.), our survey provides a more nuanced analysis and review of the evolution and development of the underlying techniques used in the three ends. Based on a systematic review and summary of existing work, we outline several open challenges and opportunities at the three ends that remain to be addressed in future work.Comment: submitted to ACM Transactions on Software Engineering and Methodolog

    A Semantic Framework for Declarative and Procedural Knowledge

    Get PDF
    In any scientic domain, the full set of data and programs has reached an-ome status, i.e. it has grown massively. The original article on the Semantic Web describes the evolution of a Web of actionable information, i.e.\ud information derived from data through a semantic theory for interpreting the symbols. In a Semantic Web, methodologies are studied for describing, managing and analyzing both resources (domain knowledge) and applications (operational knowledge) - without any restriction on what and where they\ud are respectively suitable and available in the Web - as well as for realizing automatic and semantic-driven work\ud ows of Web applications elaborating Web resources.\ud This thesis attempts to provide a synthesis among Semantic Web technologies, Ontology Research, Knowledge and Work\ud ow Management. Such a synthesis is represented by Resourceome, a Web-based framework consisting of two components which strictly interact with each other: an ontology-based and domain-independent knowledge manager system (Resourceome KMS) - relying on a knowledge model where resource and operational knowledge are contextualized in any domain - and a semantic-driven work ow editor, manager and agent-based execution system (Resourceome WMS).\ud The Resourceome KMS and the Resourceome WMS are exploited in order to realize semantic-driven formulations of work\ud ows, where activities are semantically linked to any involved resource. In the whole, combining the use of domain ontologies and work ow techniques, Resourceome provides a exible domain and operational knowledge organization, a powerful engine for semantic-driven work\ud ow composition, and a distributed, automatic and\ud transparent environment for work ow execution
    corecore