1,009 research outputs found

    Effective semantic-based keyword search over relational databases for knowledge discovery

    Keyword-based search has been popularized by web search engines such as Google, the most commonly used engine for locating information on the web. Traditional database management systems, by contrast, offer powerful query languages such as SQL but do not provide keyword-based search comparable to that of web search engines. The amount of text data in relational databases is massive and growing fast, which makes it increasingly important for non-technical users to be able to search this information with simple keywords, just as they would search for text documents on the web. Keyword search over relational databases (KSRDBs) enables ordinary users to query relational databases by simply submitting keywords, without knowing SQL or the underlying structure of the data. In this research work, our primary focus is to enhance the effectiveness of keyword search over relational databases using semantic web technologies. We have also addressed some of the issues with the effectiveness of current keyword search over relational databases. In particular, we address the following. We have improved the existing state-of-the-art ranking functions, obtaining a significantly higher precision/recall curve, by incorporating the query keywords' proximity and the query keywords' quadgrams of text attributes with long strings into the scoring function. We have adopted a novel approach to making keyword search recommendations based on the text attributes in which the search terms were found, without relying on the user's past search criteria. A proof of concept (POC) prototype system called TupleRecommender has been implemented based on this approach.
    We have designed and implemented a proof of concept (POC) prototype system called database semantic search explorer (DBSemSXplorer), which answers traditional keyword searches over relational databases more effectively and presents the search results better. The system is based on semantic web technologies and is equipped with the faceted search and inference capabilities of the Semantic Web to ease the task of knowledge discovery for the end user.
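The abstract above mentions folding keyword proximity and quadgrams into a ranking function. The listing does not give the actual scoring function, so the following is only a minimal sketch under stated assumptions: quadgrams are taken to be character 4-grams, proximity is measured as the shortest token window covering all query keywords, and the names `char_quadgrams`, `smallest_window`, `score`, `w_prox` and `w_quad` are hypothetical.

```python
from collections import Counter


def char_quadgrams(text):
    """Set of character 4-grams of the lowercased text (assumed meaning of 'quadgram')."""
    s = text.lower()
    return {s[i:i + 4] for i in range(len(s) - 3)}


def smallest_window(tokens, keywords):
    """Length of the shortest token span containing every keyword, or None."""
    needed = set(keywords)
    if not needed:
        return None
    counts = Counter()
    covered = 0
    best = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok in needed:
            counts[tok] += 1
            if counts[tok] == 1:
                covered += 1
        # Shrink the window from the left while it still covers all keywords.
        while covered == len(needed):
            span = right - left + 1
            if best is None or span < best:
                best = span
            out = tokens[left]
            if out in needed:
                counts[out] -= 1
                if counts[out] == 0:
                    covered -= 1
            left += 1
    return best


def score(text, query, w_prox=1.0, w_quad=1.0):
    """Illustrative score: keyword matches + proximity bonus + quadgram overlap."""
    tokens = text.lower().split()
    keywords = set(query.lower().split())
    matched = sum(1 for k in keywords if k in tokens)
    window = smallest_window(tokens, keywords)
    proximity = len(keywords) / window if window else 0.0
    q_grams = char_quadgrams(query)
    overlap = len(q_grams & char_quadgrams(text)) / len(q_grams) if q_grams else 0.0
    return matched + w_prox * proximity + w_quad * overlap
```

With these assumptions, an attribute value in which the query keywords appear next to each other scores higher than one in which the same keywords are far apart, even though both contain every keyword.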

    Human Resources Recommender system based on discrete variables

    Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. Natural Language Processing and Understanding has become one of the most exciting and challenging fields in the area of Artificial Intelligence and Machine Learning. In a rapidly changing business environment, having data transformed so that it is easy to interpret is the greatest competitive advantage a company can have. The purpose of this dissertation is to implement a recommender system for the Human Resources department of a company that will aid the decision-making process of filling a specific job position with the right candidate. The recommender system will be fed with applicants, each represented by their skills, and will produce a subset of the most suitable candidates for a given job position. This work uses StarSpace, a novel neural embedding model whose aim is to represent entities in a common vector space and then compute similarity measures among them.
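The matching step described above (entities in a common vector space, ranked by similarity) can be illustrated without StarSpace itself. The sketch below does not call StarSpace; it assumes skill vectors have already been learned by some embedding model, represents a candidate or a job as the average of its skill vectors, and ranks by cosine similarity. The vectors, names and the functions `embed` and `rank_candidates` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def embed(skills, skill_vectors):
    """Average the vectors of the known skills (an assumed way to embed an entity)."""
    known = [skill_vectors[s] for s in skills if s in skill_vectors]
    if not known:
        return []
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def rank_candidates(job_skills, candidates, skill_vectors, top_k=3):
    """Return the names of the top_k candidates most similar to the job."""
    job_vec = embed(job_skills, skill_vectors)
    scored = [
        (cosine(job_vec, embed(skills, skill_vectors)), name)
        for name, skills in candidates.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


# Toy, hand-written vectors standing in for learned embeddings.
skill_vectors = {"python": [1.0, 0.0], "sql": [0.9, 0.1], "welding": [0.0, 1.0]}
candidates = {"ana": ["python", "sql"], "bo": ["welding"]}
ranking = rank_candidates(["python"], candidates, skill_vectors, top_k=2)
```

In this toy setup the candidate with Python and SQL skills ranks ahead of the candidate with an unrelated skill.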

    Rank-aware, Approximate Query Processing on the Semantic Web

    Search over the Semantic Web corpus frequently leads to queries with large result sets. To discover relevant data elements, users must therefore rely on ranking techniques that sort results by relevance. At the same time, applications often deal with information needs that do not require complete and exact results. In this thesis, we address the problem of how to process queries over Web data in an approximate and rank-aware fashion.

    Query Relaxation for Entity-relationship Search

    Solving empty result problem in keyword search over relational databases

    Computer Science & Technology Series : XXI Argentine Congress of Computer Science. Selected papers

    CACIC’15 was the 21st Congress in the CACIC series. It was organized by the School of Technology at the UNNOBA (North-West of Buenos Aires National University) in Junín, Buenos Aires. The Congress included 13 Workshops with 131 accepted papers, 4 Conferences, 2 invited tutorials, various meetings related to Computer Science Education (Professors, PhD students, Curricula) and an International School with 6 courses. CACIC 2015 was organized following the traditional Congress format, with 13 Workshops covering a diversity of dimensions of Computer Science Research. Each topic was supervised by a committee of 3-5 chairs from different Universities. The call for papers attracted a total of 202 submissions. An average of 2.5 review reports were collected for each paper, for a grand total of 495 review reports that involved about 191 different reviewers. A total of 131 full papers, involving 404 authors and 75 Universities, were accepted and 24 of them were selected for this book. Red de Universidades con Carreras en Informática (RedUNCI)

    Automatic extraction of concepts from texts and applications

    The extraction of relevant terms from texts is an extensively researched task in Text Mining. Relevant terms have been applied in areas such as Information Retrieval and document clustering and classification. However, relevance has a rather fuzzy nature, since the classification of some terms as relevant or not relevant is not consensual. For instance, while words such as "president" and "republic" are generally considered relevant by human evaluators, and words like "the" and "or" are not, terms such as "read" and "finish" gather no consensus about their semantics and informativeness. Concepts, on the other hand, have a less fuzzy nature. Therefore, instead of deciding on the relevance of a term during the extraction phase, as most extractors do, I propose to first extract from texts what I have called generic concepts (all concepts) and to postpone the decision about relevance to downstream applications, according to their needs. For instance, a keyword extractor may assume that the most relevant keywords are the most frequent concepts in the documents. Moreover, most statistical extractors are incapable of extracting single-word and multi-word expressions using the same methodology. These factors led to the development of the ConceptExtractor, a statistical and language-independent methodology which is explained in Part I of this thesis. In Part II, I will show that the automatic extraction of concepts has great applicability. For instance, for the extraction of keywords from documents, using the Tf-Idf metric only on concepts yields better results than using Tf-Idf without concepts, especially for multi-words. In addition, since concepts can be semantically related to other concepts, this allows us to build implicit document descriptors. These applications led to published work. Finally, I will present some work that, although not published yet, is briefly discussed in this document. Fundação para a Ciência e a Tecnologia - SFRH/BD/61543/200
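To make the idea of applying Tf-Idf only to concepts concrete, here is a small sketch. It assumes the concepts have already been extracted (the ConceptExtractor itself is not reproduced here), so each document is simply a list of single-word or multi-word concept strings. The function names `concept_tf_idf` and `top_keywords` and the toy documents are hypothetical, and the formula used is the plain textbook variant, tf(c, d) · log(N / df(c)).

```python
import math
from collections import Counter


def concept_tf_idf(docs):
    """Tf-Idf per document, where each document is a list of concept strings."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each concept once per document
    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        total = len(doc)
        scores.append({
            concept: (count / total) * math.log(n_docs / doc_freq[concept])
            for concept, count in term_freq.items()
        })
    return scores


def top_keywords(doc_scores, k=1):
    """The k concepts with the highest Tf-Idf score in one document."""
    ordered = sorted(doc_scores.items(), key=lambda item: (-item[1], item[0]))
    return [concept for concept, _ in ordered[:k]]


# Toy documents, already reduced to concepts (multi-word units stay whole).
docs = [
    ["semantic web", "keyword search", "semantic web"],
    ["keyword search", "relational database"],
]
scores = concept_tf_idf(docs)
```

Because multi-word concepts are scored as single units, "semantic web" competes as one keyword rather than as the separate words "semantic" and "web".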

    Graph-Based Weakly-Supervised Methods for Information Extraction & Integration

    The variety and complexity of potentially related data resources available for querying (webpages, databases, data warehouses) has been growing ever more rapidly. There is a growing need to pose integrative queries across multiple such sources, exploiting foreign keys and other means of interlinking data to merge information from diverse sources. This has traditionally been the focus of research within the Information Extraction (IE) and Information Integration (II) communities, with IE focusing on converting unstructured sources into structured sources, and II focusing on providing a unified view of diverse structured data sources. However, most current IE and II methods that could be applied to the problem of integration across sources require large amounts of human supervision, often in the form of annotated data. This need for extensive supervision makes existing methods expensive to deploy and difficult to maintain. In this thesis, we develop techniques that generalize from limited human input, via weakly-supervised methods for IE and II. In particular, we argue that graph-based representation of data and learning over such graphs can result in effective and scalable methods for large-scale Information Extraction and Integration. Within IE, we focus on the problem of assigning semantic classes to entities. First we develop a context pattern induction method to extend small initial entity lists of various semantic classes. We also demonstrate that features derived from such extended entity lists can significantly improve the performance of state-of-the-art discriminative taggers. The output of pattern-based class-instance extractors is often high-precision and low-recall in nature, which is inadequate for many real-world applications.
    We use Adsorption, a graph-based label propagation algorithm, to significantly increase the recall of an initial high-precision, low-recall pattern-based extractor by combining evidence from unstructured and structured text corpora. Building on Adsorption, we propose a new label propagation algorithm, Modified Adsorption (MAD), and demonstrate its effectiveness on various real-world datasets. We also show how class-instance acquisition performance in the graph-based semi-supervised learning (SSL) setting can be improved by incorporating additional semantic constraints available in independently developed knowledge bases. Within Information Integration, we develop a novel system, Q, which draws ideas from machine learning and databases to help a non-expert user construct data-integrating queries based on keywords (across databases) and interactive feedback on answers. We also present an information need-driven strategy for automatically incorporating new sources and their information in Q. Finally, we demonstrate that Q's learning strategy is highly effective in combining the outputs of "black box" schema matchers and in re-weighting bad alignments. This removes the need to develop an expensive mediated schema, which has been necessary for most previous systems.
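The general idea of graph-based label propagation can be sketched in a few lines. The code below is not Adsorption or MAD, whose details the abstract does not give; it is a basic iterative scheme in which seed nodes keep their labels and every other node repeatedly takes the weighted average of its neighbours' label scores. The graph, the labels and the function name `propagate` are hypothetical.

```python
def propagate(edges, seeds, iterations=20):
    """Basic label propagation with clamped seeds.

    edges: dict mapping each node to {neighbour: weight}; every neighbour
           must also appear as a key.
    seeds: dict mapping some nodes to their known label.
    Returns a dict node -> {label: score}.
    """
    labels = sorted(set(seeds.values()))
    scores = {node: {label: 0.0 for label in labels} for node in edges}
    for node, label in seeds.items():
        scores[node][label] = 1.0

    for _ in range(iterations):
        updated = {}
        for node, neighbours in edges.items():
            if node in seeds:
                # Seed nodes are clamped to their known label.
                updated[node] = dict(scores[node])
                continue
            total = sum(neighbours.values())
            updated[node] = {
                label: (
                    sum(w * scores[nb][label] for nb, w in neighbours.items()) / total
                    if total else 0.0
                )
                for label in labels
            }
        scores = updated
    return scores


# Toy chain graph a - b - c - d with one seed at each end.
edges = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
seeds = {"a": "city", "d": "person"}
result = propagate(edges, seeds)
```

Unlabeled nodes end up with a score distribution that favours the label of the closer seed, which is how a small set of high-precision seeds can extend to more instances and raise recall.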