1,554 research outputs found

    Casual Information Visualization on Exploring Spatiotemporal Data

    Get PDF
    The goal of this thesis is to study how the diverse data on the Web which are familiar to everyone can be visualized, and with a special consideration on their spatial and temporal information. We introduce novel approaches and visualization techniques dealing with different types of data contents: interactively browsing large amount of tags linking with geospace and time, navigating and locating spatiotemporal photos or videos in collections, and especially, providing visual supports for the exploration of diverse Web contents on arbitrary webpages in terms of augmented Web browsing

    Mining the ‘Internet Graveyard’: Rethinking the Historians’ Toolkit

    Get PDF
    “Mining the Internet Graveyard” argues that the advent of massive quantity of born-digital historical sources necessitates a rethinking of the historians’ toolkit. The contours of a third wave of computational history are outlined, a trend marked by ever-increasing amounts of digitized information (especially web based), falling digital storage costs, a move to the cloud, and a corresponding increase in computational power to process these sources. Following this, the article uses a case study of an early born-digital archive at Library and Archives Canada – Canada’s Digital Collections project (CDC) – to bring some of these problems into view. An array of off-the-shelf data analysis solutions, coupled with code written in Mathematica, helps us bring context and retrieve information from a digital collection on a previously inaccessible scale. The article concludes with an illustration of the various computational tools available, as well as a call for greater digital literacy in history curricula and professional development.Social Sciences and Humanities Research Council || 430-2013-061

    What's Cookin'? Interpreting Cooking Videos using Text, Speech and Vision

    Get PDF
    We present a novel method for aligning a sequence of instructions to a video of someone carrying out a task. In particular, we focus on the cooking domain, where the instructions correspond to the recipe. Our technique relies on an HMM to align the recipe steps to the (automatically generated) speech transcript. We then refine this alignment using a state-of-the-art visual food detector, based on a deep convolutional neural network. We show that our technique outperforms simpler techniques based on keyword spotting. It also enables interesting applications, such as automatically illustrating recipes with keyframes, and searching within a video for events of interest.Comment: To appear in NAACL 201

    Hierarchical Classification and its Application in University Search

    Get PDF
    Web search engines have been adopted by most universities for searching webpages in their own domains. Basically, a user sends keywords to the search engine and the search engine returns a flat ranked list of webpages. However, in university search, user queries are usually related to topics. Simple keyword queries are often insufficient to express topics as keywords. On the other hand, most E-commerce sites allow users to browse and search products in various hierarchies. It would be ideal if hierarchical browsing and keyword search can be seamlessly combined for university search engines. The main difficulty is to automatically classify and rank a massive number of webpages into the topic hierarchies for universities. In this thesis, we use machine learning and data mining techniques to build a novel hybrid search engine with integrated hierarchies for universities, called SEEU (Search Engine with hiErarchy for Universities). Firstly, we study the problem of effective hierarchical webpage classification. We develop a parallel webpage classification system based on Support Vector Machines. With extensive experiments on the well-known ODP (Open Directory Project) dataset, we empirically demonstrate that our hierarchical classification system is very effective and outperforms the traditional flat classification approaches significantly. Secondly, we study the problem of integrating hierarchical classification into the ranking system of keywords-based search engines. We propose a novel ranking framework, called ERIC (Enhanced Ranking by hIerarchical Classification), for search engines with hierarchies. Experimental results on four large-scale TREC (Text REtrieval Conference) web search datasets show that our ranking system with hierarchical classification outperforms the traditional flat keywords-based search methods significantly. Thirdly, we propose a novel active learning framework to improve the performance of hierarchical classification, which is important for ranking webpages in hierarchies. From our experiments on the benchmark text datasets, we find that our active learning framework can achieve good classification performance yet save a considerable number of labeling effort compared with the state-of-the-art active learning methods for hierarchical text classification. Fourthly, based on the proposed classification and ranking methods, we present a novel hierarchical classification framework for mining academic topics from university webpages. We build an academic topic hierarchy based on the commonly accepted Wikipedia academic disciplines. Based on this hierarchy, we train a hierarchical classifier and apply it to mine academic topics. According to our comprehensive analysis, the academic topics mined by our method are reasonable and consistent with the real-world topic distribution in universities. Finally, we combine all the proposed techniques together and implement the SEEU search engine. According to two usability studies conducted in the ECE and the CS departments at our university, SEEU is favored by the majority of participants. To conclude, the main contribution of this thesis is a novel search engine, called SEEU, for universities. We discuss the challenges toward building SEEU and propose effective machine learning and data mining methods to tackle them. With extensive experiments on well-known benchmark datasets and real-world university webpage datasets, we demonstrate that our system is very effective. In addition, two usability studies of SEEU in our university show that SEEU has a great promise for university search

    Introduction to the special issue on cross-language algorithms and applications

    Get PDF
    With the increasingly global nature of our everyday interactions, the need for multilingual technologies to support efficient and efective information access and communication cannot be overemphasized. Computational modeling of language has been the focus of Natural Language Processing, a subdiscipline of Artificial Intelligence. One of the current challenges for this discipline is to design methodologies and algorithms that are cross-language in order to create multilingual technologies rapidly. The goal of this JAIR special issue on Cross-Language Algorithms and Applications (CLAA) is to present leading research in this area, with emphasis on developing unifying themes that could lead to the development of the science of multi- and cross-lingualism. In this introduction, we provide the reader with the motivation for this special issue and summarize the contributions of the papers that have been included. The selected papers cover a broad range of cross-lingual technologies including machine translation, domain and language adaptation for sentiment analysis, cross-language lexical resources, dependency parsing, information retrieval and knowledge representation. We anticipate that this special issue will serve as an invaluable resource for researchers interested in topics of cross-lingual natural language processing.Postprint (published version

    A Broad Evaluation of the Tor English Content Ecosystem

    Full text link
    Tor is among most well-known dark net in the world. It has noble uses, including as a platform for free speech and information dissemination under the guise of true anonymity, but may be culturally better known as a conduit for criminal activity and as a platform to market illicit goods and data. Past studies on the content of Tor support this notion, but were carried out by targeting popular domains likely to contain illicit content. A survey of past studies may thus not yield a complete evaluation of the content and use of Tor. This work addresses this gap by presenting a broad evaluation of the content of the English Tor ecosystem. We perform a comprehensive crawl of the Tor dark web and, through topic and network analysis, characterize the types of information and services hosted across a broad swath of Tor domains and their hyperlink relational structure. We recover nine domain types defined by the information or service they host and, among other findings, unveil how some types of domains intentionally silo themselves from the rest of Tor. We also present measurements that (regrettably) suggest how marketplaces of illegal drugs and services do emerge as the dominant type of Tor domain. Our study is the product of crawling over 1 million pages from 20,000 Tor seed addresses, yielding a collection of over 150,000 Tor pages. We make a dataset of the intend to make the domain structure publicly available as a dataset at https://github.com/wsu-wacs/TorEnglishContent.Comment: 11 page

    4th. International Conference on Advanced Research Methods and Analytics (CARMA 2022)

    Full text link
    Research methods in economics and social sciences are evolving with the increasing availability of Internet and Big Data sources of information. As these sources, methods, and applications become more interdisciplinary, the 4th International Conference on Advanced Research Methods and Analytics (CARMA) is a forum for researchers and practitioners to exchange ideas and advances on how emerging research methods and sources are applied to different fields of social sciences as well as to discuss current and future challenges. Due to the covid pandemic, CARMA 2022 is planned as a virtual and face-to-face conference, simultaneouslyDoménech I De Soria, J.; Vicente Cuervo, MR. (2022). 4th. International Conference on Advanced Research Methods and Analytics (CARMA 2022). Editorial Universitat PolitÚcnica de ValÚncia. https://doi.org/10.4995/CARMA2022.2022.1595

    Semantic enrichment for enhancing LAM data and supporting digital humanities. Review article

    Get PDF
    With the rapid development of the digital humanities (DH) field, demands for historical and cultural heritage data have generated deep interest in the data provided by libraries, archives, and museums (LAMs). In order to enhance LAM data’s quality and discoverability while enabling a self-sustaining ecosystem, “semantic enrichment” becomes a strategy increasingly used by LAMs during recent years. This article introduces a number of semantic enrichment methods and efforts that can be applied to LAM data at various levels, aiming to support deeper and wider exploration and use of LAM data in DH research. The real cases, research projects, experiments, and pilot studies shared in this article demonstrate endless potential for LAM data, whether they are structured, semi-structured, or unstructured, regardless of what types of original artifacts carry the data. Following their roadmaps would encourage more effective initiatives and strengthen this effort to maximize LAM data’s discoverability, use- and reuse-ability, and their value in the mainstream of DH and Semantic Web

    Leveraging Mobile App Classification and User Context Information for Improving Recommendation Systems

    Get PDF
    Mobile apps play a significant role in current online environments where there is an overwhelming supply of information. Although mobile apps are part of our daily routine, searching and finding mobile apps is becoming a nontrivial task due to the current volume, velocity and variety of information. Therefore, app recommender systems provide users’ desired apps based on their preferences. However, current recommender systems and their underlying techniques are limited in effectively leveraging app classification schemes and context information. In this thesis, I attempt to address this gap by proposing a text analytics framework for mobile app recommendation by leveraging an app classification scheme that incorporates the needs of users as well as the complexity of the user-item-context information in mobile app usage pattern. In this recommendation framework, I adopt and empirically test an app classification scheme based on textual information about mobile apps using data from Google Play store. In addition, I demonstrate how context information such as user social media status can be matched with app classification categories using tree-based and rule-based prediction algorithms. Methodology wise, my research attempts to show the feasibility of textual data analysis in profiling apps based on app descriptions and other structured attributes, as well as explore mechanisms for matching user preferences and context information with app usage categories. Practically, the proposed text analytics framework can allow app developers reach a wider usage base through better understanding of user motivation and context information
    • 

    corecore