21,883 research outputs found

    VAS (Visual Analysis System): An information visualization engine to interpret World Wide Web structure

    Get PDF
    People increasingly face the problem of interpreting and filtering vast quantities of information. The enormous growth of information systems on the World Wide Web demonstrates that we need systems that filter, interpret, organize and present information in ways that make these large collections usable. People need to be able to extract knowledge from this sometimes meaningful, sometimes useless mass of data in order to make informed decisions. Web users also benefit from knowing something about a page before they visit it, for example, whether it is rarely or often referenced. This master's thesis presents a method that addresses these problems using data mining and information visualization techniques.
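
    To make the rarely-vs-often-referenced idea concrete, here is a minimal Python sketch that counts in-links over a crawled link graph; the URLs and the cutoff threshold are hypothetical, and the thesis's actual method combines data mining with an information visualization engine rather than a printed report.

```python
from collections import Counter

# Hypothetical crawled link graph: (source_url, target_url) pairs.
links = [
    ("a.example/1", "b.example/2"),
    ("a.example/3", "b.example/2"),
    ("c.example/4", "b.example/2"),
    ("b.example/2", "a.example/1"),
]

# Count how often each page is referenced (its in-link count).
in_links = Counter(target for _, target in links)

THRESHOLD = 2  # illustrative cutoff, not a value from the thesis
for page, count in in_links.items():
    label = "often-referenced" if count >= THRESHOLD else "rarely referenced"
    print(f"{page}: {count} in-link(s) -> {label}")
```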

    The contribution of data mining to information science

    Get PDF
    The information explosion is a serious challenge for today's information institutions. Data mining, the search for valuable information in large volumes of data, is one way to meet this challenge. Over the past several years, data mining has made a significant contribution to the field of information science. This paper examines that impact by reviewing existing applications, including personalized environments, electronic commerce, and search engines, and discusses how data mining can enhance the functions of each of these three application types. The paper gives the reader an overview of the state-of-the-art research associated with these applications. Furthermore, we identify the limitations of current work and raise several directions for future research.

    A social media and crowd-sourcing data mining system for crime prevention during and post-crisis situations

    Get PDF
    A number of large crisis situations, such as natural disasters, have affected the planet over the last decade, with catastrophic outcomes for the infrastructures of modern societies. After large disasters, societies also face serious issues, such as the loss of human lives, missing persons and rising crime rates, and on many occasions they seem unprepared to deal with them. This paper presents an automated system for synchronizing the police and Law Enforcement Agencies (LEAs) to prevent criminal activity during and after a large crisis situation. The paper reviews the literature on the need to combine data mining with advanced web technologies, such as social media and crowd-sourcing, to address problems related to criminal activity arising during and after crises, and introduces examples of techniques and algorithms used to scan social media and crowd-sourced content, such as sentiment analysis and link analysis. The main focus of the paper is the ATHENA Crisis Management system, which relies on social media and crowd-sourcing to collect crisis-related information and applies a number of data mining techniques to gather and analyze data from social media for the purpose of crime prevention. A number of conclusions are drawn on the significance of social media and crowd-sourcing data mining techniques for resolving problems related to large crisis situations, with emphasis on the ATHENA system.
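
    As a hedged illustration of the sentiment analysis techniques the paper surveys, the following Python sketch scores crisis-related posts against a tiny hand-made lexicon; the lexicon, posts and alert rule are invented for illustration, and a system like ATHENA would use far richer models.

```python
# Toy lexicon-based sentiment scoring of crisis-related posts.
# Lexicon and posts are hypothetical, not taken from the ATHENA system.
POSITIVE = {"safe", "rescued", "helping", "secure"}
NEGATIVE = {"looting", "missing", "stolen", "danger", "attack"}

def sentiment(post: str) -> int:
    """Return (#positive words - #negative words) in the post."""
    words = post.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

posts = [
    "Family rescued near the river, everyone is safe",
    "Reports of looting downtown, several shops hit overnight",
]
for p in posts:
    score = sentiment(p)
    flag = "possible crime signal" if score < 0 else "no alert"
    print(f"{score:+d}  {flag}  :: {p}")
```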

    Report of the Stanford Linked Data Workshop

    No full text
    The Stanford University Libraries and Academic Information Resources (SULAIR), with the Council on Library and Information Resources (CLIR), conducted a week-long workshop on the prospects for a large-scale, multi-national, multi-institutional prototype of a Linked Data environment for discovery of and navigation among the rapidly, chaotically expanding array of academic information resources. In preparation for the workshop, CLIR sponsored a survey by Jerry Persons, Chief Information Architect emeritus of SULAIR, which was originally published for workshop participants as background and is now publicly available. The original intention of the workshop was to devise a plan for such a prototype. However, such was the diversity of knowledge, experience, and views of the potential of Linked Data approaches that the workshop participants turned to two more fundamental goals: building common understanding and enthusiasm on the one hand, and identifying the opportunities and challenges to be confronted in preparing and operating the intended prototype on the other. In pursuit of those objectives, the workshop participants produced:
    1. a value statement addressing the question of why a Linked Data approach is worth prototyping;
    2. a manifesto for Linked Libraries (and Museums and Archives and …);
    3. an outline of the phases in a life cycle of Linked Data approaches;
    4. a prioritized list of known issues in generating, harvesting and using Linked Data;
    5. a workflow, with notes, for converting library bibliographic records and other academic metadata to URIs;
    6. examples of potential “killer apps” using Linked Data; and
    7. a list of next steps and potential projects.
    This report includes a summary of the workshop agenda, a chart showing the use of Linked Data in cultural heritage venues, and short biographies and statements from each of the participants.
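
    As a toy illustration of item 5, the sketch below mints a URI for one bibliographic record and expresses it as Linked Data triples with rdflib; the namespace, record fields and identifier scheme are hypothetical, not the workshop's actual workflow.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS, RDF

# Hypothetical institutional namespace; a real deployment would use the
# library's own persistent-URI scheme.
LIB = Namespace("https://library.example.edu/id/")

record = {"id": "bib0042", "title": "On the Origin of Species",
          "creator": "Darwin, Charles"}

g = Graph()
subject = LIB[record["id"]]  # mint a URI for the record
g.add((subject, RDF.type, DCTERMS.BibliographicResource))
g.add((subject, DCTERMS.title, Literal(record["title"])))
g.add((subject, DCTERMS.creator, Literal(record["creator"])))

print(g.serialize(format="turtle"))
```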

    Spectral Ranking in Complex Networks Using Memristor Crossbars

    Get PDF
    Various centrality measures have been proposed to identify the influence of each node in a complex network. Among the most popular ranking metrics, spectral measures stand out. They rely on computing the dominant eigenvector of suitable matrices related to the graph: EigenCentrality, PageRank, Hyperlink-Induced Topic Search (HITS) and the Stochastic Approach for Link-Structure Analysis (SALSA). The simplest algorithm for this linear-algebra computation is the Power Method, which consists of repeated Matrix-Vector Multiplications (MVMs) and a normalization step to avoid divergent behaviour. In this work, we present an analog circuit that accelerates the Power Iteration algorithm, including current-mode termination for the memristor crossbars and a normalization circuit. The normalization step, together with the feedback loop of the complete circuit, ensures stability and convergence to the dominant eigenvector. We implement transistor-level peripheral circuitry around the memristor crossbar and take non-idealities such as wire parasitics, source-driver resistance and finite memristor precision into account. We compute the different spectral centralities to demonstrate the performance of the system, compare our results with those from conventional digital computers, and observe significant energy savings while maintaining competitive accuracy.
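
    For readers unfamiliar with it, here is a minimal NumPy sketch of the Power Method the analog crossbar accelerates; the toy graph, damping factor and tolerance are illustrative assumptions, not values from the paper. In the proposed circuit, the matrix-vector product is what the memristor crossbar computes in the analog domain, and the normalization is handled by a dedicated circuit in the feedback loop.

```python
import numpy as np

def power_iteration(M, tol=1e-9, max_iters=1000):
    """Dominant eigenvector of M via repeated MVMs plus normalization."""
    n = M.shape[0]
    v = np.ones(n) / n                  # initial guess
    for _ in range(max_iters):
        w = M @ v                       # the MVM step (done in analog on the crossbar)
        w /= np.linalg.norm(w)          # normalization: prevents divergence
        if np.linalg.norm(w - v) < tol:
            break
        v = w
    return v

# Illustrative 3-node graph; A[i, j] = 1 if page j links to page i.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)
P = A / A.sum(axis=0)                   # column-stochastic transition matrix
d = 0.85                                # damping factor (assumed, as in PageRank)
G = d * P + (1 - d) / 3 * np.ones((3, 3))
print(power_iteration(G))               # eigenvector proportional to PageRank scores
```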

    The Conspiracy Money Machine: Uncovering Telegram's Conspiracy Channels and their Profit Model

    Full text link
    In recent years, major social media platforms have implemented increasingly strict moderation policies, resulting in bans and restrictions on conspiracy theory-related content. To circumvent these restrictions, conspiracy theorists are turning to alternatives, such as Telegram, where they can express and spread their views with fewer limitations. Telegram offers channels -- virtual rooms where only administrators can broadcast messages -- and a more permissive content policy. These features have created the perfect breeding ground for a complex ecosystem of conspiracy channels. In this paper, we illuminate this ecosystem. First, we propose an approach to detect conspiracy channels. Then, we discover that conspiracy channels can be clustered into four distinct communities comprising over 17,000 channels. Finally, we uncover the "Conspiracy Money Machine," revealing how most conspiracy channels actively seek to profit from their subscribers. We find conspiracy theorists leverage e-commerce platforms to sell questionable products or lucratively promote them through affiliate links. Moreover, we observe that conspiracy channels use donation and crowdfunding platforms to raise funds for their campaigns. We determine that this business involves hundreds of donors and generates a turnover of over $90 million.

    Mining Web Dynamics for Search

    Get PDF
    Billions of web users collectively contribute to a dynamic web that preserves how information sources and their descriptions change over time. This dynamic process sheds light on the quality of web content and even indicates the temporal properties of the information needs expressed via queries. However, existing commercial search engines typically use a single crawl of web content (the latest) without considering the complementary information concealed in web dynamics. As a result, the generated rankings may be biased by the lack of knowledge about page and hyperlink evolution, and time-sensitive facets of search quality, e.g., freshness, are neglected. While previous research has explored the temporal dimension of the retrieval process, few efforts have shown consistent improvements on a large-scale, real-world archival web corpus with a broad time span. We investigate how to use the changes of web pages and hyperlinks to improve search quality, in terms of both the freshness and the relevance of search results. The three applications I have focused on are: (1) document representation, in which the importance of anchor text (the short descriptive text associated with hyperlinks) is estimated by considering its historical status; (2) web authority estimation, in which web freshness is quantified and used to control authority propagation; and (3) learning to rank, in which freshness and relevance are optimized simultaneously and adaptively depending on query type. The contributions of this thesis are: (1) incorporating web dynamics information into critical components of the search infrastructure in a principled way; and (2) empirically verifying the proposed methods through experiments on a large-scale, real-world archival web corpus, demonstrating their superiority over the existing state of the art.
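
    To make the second application concrete, here is a hedged Python sketch of authority propagation damped by page freshness; the per-page freshness scores and the simple column-scaling scheme are assumptions for illustration, not the thesis's actual formulation.

```python
import numpy as np

def fresh_authority(A, freshness, d=0.85, iters=100):
    """Toy authority propagation where each page's outgoing influence is
    scaled by a freshness score in [0, 1] (a stand-in for the thesis's
    quantified web freshness). A[i, j] = 1 if page j links to page i."""
    n = A.shape[0]
    W = A * freshness                   # damp links coming from stale pages
    col = W.sum(axis=0)
    col[col == 0] = 1.0                 # guard against dangling pages
    P = W / col                         # column-normalized propagation matrix
    r = np.ones(n) / n
    for _ in range(iters):
        r = d * (P @ r) + (1 - d) / n   # power iteration with teleportation
    return r

# Three pages: page 2 updated recently, page 0 stale (hypothetical scores).
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
freshness = np.array([0.2, 0.6, 1.0])
print(fresh_authority(A, freshness))    # fresher pages propagate more authority
```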