775 research outputs found

    Overview of GINIX and Top-k Method

    Get PDF
    In today’s life more applications are web based and peoples may communicate with each other by using Internet. It involves more and more data retrieval from database system as per user demand. Inverted Index is a system use for searching in which searching is takes place as per index sequentially. So it require more time for searching. While Ginix can search as per word in which all files or related document that word is search appropriately. But it only search documents file which are save in database system but not search multimedia files. Hence the more competent technique for searching is top-k method in which all database is scan for finding appropriate result for given data. Also data is search on web pages. It provides more perfect result within less time as compare to Ginix. DOI: 10.17762/ijritcc2321-8169.15011

    Data mining by means of generalized patterns

    Get PDF
    The thesis is mainly focused on the study and the application of pattern discovery algorithms that aggregate database knowledge to discover and exploit valuable correlations, hidden in the analyzed data, at different abstraction levels. The aim of the research effort described in this work is two-fold: the discovery of associations, in the form of generalized patterns, from large data collections and the inference of semantic models, i.e., taxonomies and ontologies, suitable for driving the mining proces

    A comprehensive approach to data warehouse testing

    Full text link
    Testing is an essential part of the design life-cycle of any software product. Nevertheless, while most phases of data warehouse design have received considerable attention in the literature, not much has been said about data warehouse testing. In this paper we introduce a number of data mart-specific testing activities, we classify them in terms of what is tested and how it is tested, and we discuss how they can be framed within a reference design methodology. Categories and Subject Descriptor

    Data Mining Algorithms for Internet Data: from Transport to Application Layer

    Get PDF
    Nowadays we live in a data-driven world. Advances in data generation, collection and storage technology have enabled organizations to gather data sets of massive size. Data mining is a discipline that blends traditional data analysis methods with sophisticated algorithms to handle the challenges posed by these new types of data sets. The Internet is a complex and dynamic system with new protocols and applications that arise at a constant pace. All these characteristics designate the Internet a valuable and challenging data source and application domain for a research activity, both looking at Transport layer, analyzing network tra c flows, and going up to Application layer, focusing on the ever-growing next generation web services: blogs, micro-blogs, on-line social networks, photo sharing services and many other applications (e.g., Twitter, Facebook, Flickr, etc.). In this thesis work we focus on the study, design and development of novel algorithms and frameworks to support large scale data mining activities over huge and heterogeneous data volumes, with a particular focus on Internet data as data source and targeting network tra c classification, on-line social network analysis, recommendation systems and cloud services and Big data

    Business Intelligence on Non-Conventional Data

    Get PDF
    The revolution in digital communications witnessed over the last decade had a significant impact on the world of Business Intelligence (BI). In the big data era, the amount and diversity of data that can be collected and analyzed for the decision-making process transcends the restricted and structured set of internal data that BI systems are conventionally limited to. This thesis investigates the unique challenges imposed by three specific categories of non-conventional data: social data, linked data and schemaless data. Social data comprises the user-generated contents published through websites and social media, which can provide a fresh and timely perception about people’s tastes and opinions. In Social BI (SBI), the analysis focuses on topics, meant as specific concepts of interest within the subject area. In this context, this thesis proposes meta-star, an alternative strategy to the traditional star-schema for modeling hierarchies of topics to enable OLAP analyses. The thesis also presents an architectural framework of a real SBI project and a cross-disciplinary benchmark for SBI. Linked data employ the Resource Description Framework (RDF) to provide a public network of interlinked, structured, cross-domain knowledge. In this context, this thesis proposes an interactive and collaborative approach to build aggregation hierarchies from linked data. Schemaless data refers to the storage of data in NoSQL databases that do not force a predefined schema, but let database instances embed their own local schemata. In this context, this thesis proposes an approach to determine the schema profile of a document-based database; the goal is to facilitate users in a schema-on-read analysis process by understanding the rules that drove the usage of the different schemata. A final and complementary contribution of this thesis is an innovative technique in the field of recommendation systems to overcome user disorientation in the analysis of a large and heterogeneous wealth of data

    An automated pipeline for the discovery of conspiracy and conspiracy theory narrative frameworks: Bridgegate, Pizzagate and storytelling on the web

    Full text link
    Although a great deal of attention has been paid to how conspiracy theories circulate on social media and their factual counterpart conspiracies, there has been little computational work done on describing their narrative structures. We present an automated pipeline for the discovery and description of the generative narrative frameworks of conspiracy theories on social media, and actual conspiracies reported in the news media. We base this work on two separate repositories of posts and news articles describing the well-known conspiracy theory Pizzagate from 2016, and the New Jersey conspiracy Bridgegate from 2013. We formulate a graphical generative machine learning model where nodes represent actors/actants, and multi-edges and self-loops among nodes capture context-specific relationships. Posts and news items are viewed as samples of subgraphs of the hidden narrative network. The problem of reconstructing the underlying structure is posed as a latent model estimation problem. We automatically extract and aggregate the actants and their relationships from the posts and articles. We capture context specific actants and interactant relationships by developing a system of supernodes and subnodes. We use these to construct a network, which constitutes the underlying narrative framework. We show how the Pizzagate framework relies on the conspiracy theorists' interpretation of "hidden knowledge" to link otherwise unlinked domains of human interaction, and hypothesize that this multi-domain focus is an important feature of conspiracy theories. While Pizzagate relies on the alignment of multiple domains, Bridgegate remains firmly rooted in the single domain of New Jersey politics. We hypothesize that the narrative framework of a conspiracy theory might stabilize quickly in contrast to the narrative framework of an actual one, which may develop more slowly as revelations come to light.Comment: conspiracy theory, narrative structur
    • …
    corecore