1,484 research outputs found

    Schema-aware keyword search on linked data

    Keyword search is a popular technique for querying the ever-growing repositories of RDF graph data on the Web, because users do not need to master complex query languages (e.g., SQL, SPARQL) or know the underlying structure of the data to compose their queries. Keyword search is simple and flexible. At the same time, however, it is ambiguous, since a keyword query can be interpreted in different ways. This feature of keyword search poses at least two challenges: (a) identifying relevant results among a multitude of candidate results, and (b) dealing with the performance and scalability of the query evaluation algorithms. In the literature, multiple schema-unaware approaches have been proposed to cope with these challenges. Some of them identify as relevant only those candidate results which keep the keyword instances in close proximity. Other approaches filter out irrelevant results using their structural characteristics, or rank the retrieved results and process the top-k based on statistical information about the data. In any case, these approaches cannot disambiguate the query to identify the intent of the user, and they do not scale satisfactorily when the size of the data and the number of query keywords grow. In recent years, different approaches have tried to exploit the schema (structural summary) of the RDF (Resource Description Framework) data graph to address these problems. In this context, an original hierarchical clustering technique is introduced in this dissertation. This approach clusters the results based on a semantic interpretation of the keyword instances and takes advantage of relevance feedback from the user. The clustering hierarchy uses pattern graphs, which are structured queries that cluster together result graphs with the same structure. Pattern graphs represent possible interpretations of the keyword query. By navigating through the hierarchy, the user can select the pattern graph which is relevant to her intent. Nevertheless, structural summaries are approximate representations of the data and, therefore, might return empty answers or miss results which are relevant to the user's intent. To address this issue, a novel approach is presented which combines the use of the structural summary and the user feedback with a relaxation technique for pattern graphs to extract additional results potentially of interest to the user. Query caching and multi-query optimization techniques are leveraged for the efficient evaluation of relaxed pattern graphs. Although the approaches which consider the structural summary of the data graph are promising, they require interaction with the user. It is claimed in this dissertation that, without additional information from the user, it is not possible to produce results of high quality from keyword search on RDF data with the existing techniques. In this regard, an original keyword query language on RDF data is introduced which allows the user to convey her intent flexibly and effortlessly by specifying cohesive keyword groups. A cohesive group of keywords in a query indicates that its keywords should form a cohesive unit in the query results. It is experimentally demonstrated that cohesive keyword queries improve the result quality and prune the search space of the pattern graphs efficiently compared to traditional keyword queries. Most importantly, these benefits are achieved while retaining the simplicity and the convenience of traditional keyword search. The last issue addressed in this dissertation is the diversification problem for keyword search on RDF data. The goal of diversification is to trade off relevance and diversity in the result set of a keyword query in order to minimize the dissatisfaction of the average user. Novel metrics are developed for assessing relevance and diversity, along with techniques for generating a relevant and diversified set of query interpretations for a keyword query on an RDF data graph. Experimental results show the effectiveness of the metrics and the efficiency of the approach.
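    A minimal sketch of the pattern-graph idea described above, not the dissertation's actual implementation: each keyword-search result graph is abstracted by mapping its nodes to their schema classes, and results that share the same abstracted structure (the same "pattern graph", i.e., the same interpretation) are clustered together. The triple representation, the type_of mapping, and the toy data are illustrative assumptions.

```python
# Cluster keyword-search result graphs by their pattern graph (shared structure).
from collections import defaultdict

def pattern_of(result_triples, type_of):
    """Abstract a result graph (a set of (s, p, o) triples) by replacing each
    node with its class from the structural summary."""
    return frozenset(
        (type_of.get(s, s), p, type_of.get(o, o)) for (s, p, o) in result_triples
    )

def cluster_by_pattern(results, type_of):
    """Group result graphs that share the same pattern graph (query interpretation)."""
    clusters = defaultdict(list)
    for result in results:
        clusters[pattern_of(result, type_of)].append(result)
    return clusters

# Tiny illustrative data: two results with the same structure, one different.
type_of = {"bob": "Person", "alice": "Person", "paper1": "Paper", "paper2": "Paper",
           "paper3": "Paper", "db_conf": "Conference", "kdd": "Conference"}
results = [
    {("bob", "authorOf", "paper1"), ("paper1", "presentedAt", "db_conf")},
    {("alice", "authorOf", "paper2"), ("paper2", "presentedAt", "kdd")},
    {("alice", "reviewerOf", "paper3")},
]
for pattern, members in cluster_by_pattern(results, type_of).items():
    print(sorted(pattern), "->", len(members), "result(s)")
```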

    TSPOONS: Tracking Salience Profiles Of Online News Stories

    News space is a relatively nebulous term that describes the general discourse concerning events that affect the populace. Past research has focused on qualitatively analyzing news space in an attempt to answer big questions about how the populace relates to the news and how it responds. We want to ask: when do stories begin? Which stories stand out among the noise? In order to answer the big questions about news space, we need to track the course of individual stories in the news. By analyzing the specific articles that comprise stories, we can synthesize the information gained from several stories to see a more complete picture of the discourse. The individual articles, the groups of articles that become stories, and the overall themes that connect stories together all complete the narrative about what is happening in society. TSPOONS provides a framework for analyzing news stories and answering two main questions: what were the important stories during a given time frame, and what were the important stories involving a given topic? Drawing technical news stories from Techmeme.com, TSPOONS generates profiles of each news story, quantitatively measuring the importance, or salience, of news stories as well as quantifying the impact of these stories over time.
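    A hedged sketch of one way a story's salience profile could be tracked over time: bucket the story's articles by day and weight each article by a simple prominence score. The weighting scheme and the field names are illustrative assumptions, not TSPOONS's actual salience metric.

```python
# Build a per-day salience profile for one story from its articles.
from collections import Counter
from datetime import date

def salience_profile(articles):
    """Return {day: salience}, where each article contributes
    1 + 0.1 * inbound_links (an assumed prominence weight)."""
    profile = Counter()
    for a in articles:
        profile[a["published"]] += 1.0 + 0.1 * a.get("inbound_links", 0)
    return dict(sorted(profile.items()))

story = [
    {"published": date(2015, 3, 1), "inbound_links": 4},
    {"published": date(2015, 3, 1), "inbound_links": 0},
    {"published": date(2015, 3, 2), "inbound_links": 9},
]
print(salience_profile(story))
```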

    Origins of Modern Data Analysis Linked to the Beginnings and Early Development of Computer Science and Information Engineering

    The history of data analysis addressed here is underpinned by two themes: tabular data analysis, and the analysis of collected heterogeneous data. "Exploratory data analysis" is taken as the heuristic approach that begins with data and information and seeks underlying explanation for what is observed or measured. I also cover some of the evolving context of research and applications, including scholarly publishing, technology transfer, and the economic relationship of the university to society.

    A framework for the Comparative analysis of text summarization techniques

    Dissertation presented as partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science. With the boom of information technology and IoT (Internet of Things), the amount of information, which is fundamentally data, is increasing at an alarming rate. This information can be harnessed and, if channeled in the right direction, can yield meaningful insights. The problem is that this data is not always numerical; in many cases it is entirely textual, and meaning has to be derived from it. Going through these texts manually would take hours or even days to extract concise and meaningful information. This is where the need for an automatic summarizer arises: it eases manual intervention and reduces time and cost while retaining the key information held by these texts. In recent years, new methods and approaches have been developed to help with this task. They are applied in many domains; for example, search engines provide snippets as document previews, while news websites produce shortened descriptions of news subjects, usually as headlines, to make browsing easier. Broadly speaking, there are two main approaches to text summarization: extractive and abstractive. Extractive summarization selects the important sections of the text to form a condensed version. Abstractive summarization, in contrast, interprets and examines the text as a whole and, after discerning its meaning, generates new sentences that describe the important points concisely.
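    A minimal sketch of the extractive approach described above: score each sentence by the frequency of its words across the document and keep the top-scoring sentences. This is a generic word-frequency baseline for illustration, not one of the specific techniques compared in the dissertation.

```python
# Frequency-based extractive summarizer: keep the n highest-scoring sentences.
import re
from collections import Counter

def extractive_summary(text, n_sentences=2):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)
    ranked = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Preserve the original sentence order in the summary.
    return " ".join(s for s in sentences if s in ranked)

doc = ("Text summarization condenses a document. Extractive methods select "
       "existing sentences. Abstractive methods generate new sentences. "
       "Selecting sentences by word frequency is a simple extractive baseline.")
print(extractive_summary(doc))
```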

    The WEB Book experiments in electronic textbook design

    This paper describes a series of three evaluations of electronic textbooks on the Web, which focused on assessing how appearance and design can affect users' sense of engagement and directness with the material. The EBONI Project's methodology for evaluating electronic textbooks is outlined and each experiment is described, together with an analysis of results. Finally, some recommendations for successful design are suggested, based on an analysis of all experimental data. These recommendations underline the main findings of the evaluations: that users want some features of paper books to be preserved in the electronic medium, while also preferring electronic text to be written in a scannable style

    PUBLIC-KEY ENCRYPTION IN DUAL SERVER WITH KEYWORD SEARCH FOR SECURE CLOUD STORAGE

    Ranking and returning the most relevant results of a query has become the most popular paradigm in XML query processing. To support this, we first propose a framework of query relaxations for promoting approximate queries over XML data. The solutions under this framework are not compelled to strictly match the given query formulation; they may be based on properties inferable from the original query. However, the aforementioned proposals do not satisfactorily take structures into account, and they therefore cannot dynamically blend structures with contents to answer the relaxed queries. In our solution, we organize nodes into two groups, categorical attribute nodes and statistical attribute nodes, and devise the relevant approaches for the similarity relation assessments of categorical attribute nodes and statistical attribute nodes. We complement the design with a comprehensive set of experiments to illustrate the effectiveness of our proposed approach in terms of precision and recall. Querying XML data often becomes unmanageable in practical applications, because the hierarchical structure of XML documents may be heterogeneous, and any slight deviation from the document structure can easily lead to the formulation of unsatisfiable queries. This is genuinely challenging, especially given that such queries yield empty answers, not compilation errors. Additionally, we produce a clue-based directed acyclic graph to generate and organize structure relaxations, and develop an assessment coefficient for the similarity relation assessment on structures. We then build a novel top-k retrieval approach that can dynamically generate the most promising solutions in an order correlated with the ranking measure.
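    A hedged sketch of the node-similarity idea described above: categorical attribute nodes are compared by exact (or dictionary-based) matching, while statistical (numerical) attribute nodes are compared by how close their values are. The scoring formulas, the tolerance parameter, and the field names are illustrative assumptions, not the paper's actual assessments.

```python
# Similarity assessment for categorical vs. statistical attribute nodes.
def categorical_similarity(a, b):
    """1.0 for an exact match, else 0.0 (a synonym dictionary could refine this)."""
    return 1.0 if a == b else 0.0

def statistical_similarity(a, b, tolerance):
    """Decay linearly from 1.0 (equal values) to 0.0 (difference >= tolerance)."""
    return max(0.0, 1.0 - abs(a - b) / tolerance)

def node_similarity(query_node, data_node):
    if query_node["kind"] == "categorical":
        return categorical_similarity(query_node["value"], data_node["value"])
    return statistical_similarity(query_node["value"], data_node["value"],
                                  tolerance=query_node.get("tolerance", 10.0))

print(node_similarity({"kind": "categorical", "value": "paperback"},
                      {"kind": "categorical", "value": "paperback"}))   # 1.0
print(node_similarity({"kind": "statistical", "value": 25.0, "tolerance": 10.0},
                      {"kind": "statistical", "value": 29.0}))          # 0.6
```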

    MINING ERASABLE CLOSED PATTERNS OVER ITEM DATASETS WITH EFFICIENT ALGORITHMS

    Ranking and returning the most relevant results of a query has become the most popular paradigm in XML query processing. To deal with this issue, we first propose a framework of query relaxations for supporting approximate queries over XML data. The solutions under this framework are not compelled to strictly fulfill the given query formulation; rather, they may be founded on properties inferable from the original query. However, the present proposals do not adequately take structures into consideration, and they therefore lack the power to elegantly combine structures with contents to answer the relaxed queries. In our solution, we classify nodes into two groups, categorical attribute nodes and statistical attribute nodes, and design the related approaches for the similarity relation assessments of categorical attribute nodes and statistical attribute nodes. We complement the design with a comprehensive group of experiments to show the effectiveness of our suggested approach in terms of precision and recall metrics. Querying XML data frequently becomes intractable in practical applications, because the hierarchical structure of XML documents might be heterogeneous, and any slight misunderstanding of the document structure can easily result in the formulation of unsatisfiable queries. This is difficult, particularly in light of the fact that such queries yield empty solutions, not compilation errors. Additionally, we design a clue-based directed acyclic graph to generate and organize structure relaxations, and develop an effective assessment coefficient for the similarity relation assessment on structures. We then create a novel top-k retrieval approach that can smartly generate the most promising solutions in an order correlated with the ranking measure.
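    A hedged sketch of the relaxation-and-top-k idea: starting from the original query (a set of constraints), progressively drop constraints to obtain relaxed variants, score each variant by whether it returns answers and how much of the original query it preserves, and return the k best in ranking order. The scoring rule and the exhaustive enumeration are illustrative simplifications, not the paper's clue-based DAG or its assessment coefficient.

```python
# Generate relaxed queries and return the top-k by a simple ranking measure.
from itertools import combinations

def relaxations(query_constraints):
    """Yield every non-empty subset of the constraints, from least to most relaxed."""
    n = len(query_constraints)
    for kept in range(n, 0, -1):
        for subset in combinations(sorted(query_constraints), kept):
            yield frozenset(subset)

def top_k_relaxed(query_constraints, evaluate, k=3):
    """Rank relaxed queries by (answers found, fraction of constraints kept)."""
    scored = []
    for relaxed in relaxations(query_constraints):
        answers = evaluate(relaxed)
        score = (len(answers) > 0, len(relaxed) / len(query_constraints))
        scored.append((score, relaxed, answers))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:k]

# Toy "evaluator": the query succeeds only if it avoids the unsatisfiable constraint.
def evaluate(relaxed):
    return [] if "title=Databases" in relaxed else ["doc1", "doc2"]

query = {"title=Databases", "year=2005", "publisher=ACM"}
for score, relaxed, answers in top_k_relaxed(query, evaluate):
    print(score, sorted(relaxed), answers)
```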