
    Discovery of Points of Interest with Different Granularities for Tour Recommendation Using a City Adaptive Clustering Framework

    Increasing demand for personalized tours among tourists traveling in urban areas has drawn growing attention to point-of-interest (POI) and tour recommendation services. Recently, the granularity of POIs has been discussed as a way to provide more detailed information for tour planning, supporting both indoor and outdoor routes and thereby improving tourists' travel experience. Such tour recommendation systems require a predefined POI database with different granularities, but existing POI discovery methods do not consider POI granularity well and treat all POIs as being at the same scale. Moreover, clustering parameters need to be re-tuned for each city, which is not a trivial process. To this end, we propose a city-adaptive clustering framework for discovering POIs with different granularities. Our method combines two clustering algorithms and adapts to different cities by automatically identifying suitable parameters for each dataset. Experiments on two real-world social image datasets demonstrate the effectiveness of the proposed framework. Finally, the discovered POIs at two levels of granularity are successfully applied to indoor and outdoor tour planning.
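    The abstract does not detail the two clustering algorithms or the parameter-identification step, so the following is only a rough illustration of the general idea: deriving a clustering radius from each city's own data (via the k-nearest-neighbor distance curve) and re-clustering each coarse cluster to obtain a finer granularity. All function names and constants are hypothetical, not the authors' implementation.

```python
# Hypothetical sketch of city-adaptive, two-level POI discovery.
# NOT the paper's method: the DBSCAN radius is derived from each
# dataset's own k-distance statistics, so no per-city hand tuning.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

def adaptive_eps(coords, k=4):
    """Choose eps from the data: median distance to the k-th nearest neighbor."""
    nn = NearestNeighbors(n_neighbors=k).fit(coords)
    dists, _ = nn.kneighbors(coords)
    return float(np.median(dists[:, -1]))

def discover_pois(coords, k=4):
    """Coarse POIs via DBSCAN; finer sub-POIs by re-clustering each coarse POI."""
    coarse = DBSCAN(eps=adaptive_eps(coords, k), min_samples=k).fit_predict(coords)
    fine = np.full(len(coords), -1)
    for label in set(coarse) - {-1}:                # -1 is DBSCAN noise
        idx = np.where(coarse == label)[0]
        if len(idx) > k:
            sub_eps = adaptive_eps(coords[idx], k) / 2  # tighter radius inside a POI
            sub = DBSCAN(eps=sub_eps, min_samples=k).fit_predict(coords[idx])
            fine[idx] = np.where(sub >= 0, label * 1000 + sub, -1)
    return coarse, fine                             # two granularity levels
```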

    Using social media for sub-event detection during disasters

    Social media platforms have become fundamental tools for sharing information during natural disasters or catastrophic events. This paper presents SEDOM-DD (Sub-Events Detection on sOcial Media During Disasters), a new method that analyzes user posts to discover sub-events that occurred after a disaster (e.g., collapsed buildings, broken gas pipes, floods). SEDOM-DD has been evaluated on datasets of different sizes that contain real posts from social media related to different natural disasters (e.g., earthquakes, floods, and hurricanes). Starting from such data, we generated synthetic datasets with different features, such as different percentages of relevant posts and/or geotagged posts. Experiments performed on both real and synthetic datasets showed that SEDOM-DD identifies sub-events with high accuracy. For example, with 80% relevant posts and 15% geotagged posts, our method detects sub-events and their areas with an accuracy of 85%, demonstrating the effectiveness of the proposed approach.
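    SEDOM-DD's full pipeline is not reproduced here; as a minimal sketch of one core step, the snippet below groups relevant geotagged posts spatially, so that each dense cluster approximates one sub-event and its area. The function name and thresholds are hypothetical, chosen only for illustration.

```python
# Minimal sketch: spatial grouping of geotagged posts into candidate sub-events.
# Assumes posts were already filtered for relevance; this is an illustration,
# not the SEDOM-DD method itself.
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0

def group_subevents(latlon_deg, radius_km=0.5, min_posts=5):
    """Cluster posts within radius_km of each other; label -1 means unclustered."""
    coords = np.radians(latlon_deg)  # haversine metric expects radians
    db = DBSCAN(eps=radius_km / EARTH_RADIUS_KM,
                min_samples=min_posts, metric="haversine").fit(coords)
    return db.labels_

# Toy example: posts around two distinct damaged areas in a city
posts = np.array([[40.7128, -74.0060]] * 6 + [[40.7306, -73.9866]] * 6)
print(group_subevents(posts))  # expect two cluster labels: six 0s and six 1s
```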

    Incentive-Centered Design for User-Contributed Content

    We review incentive-centered design for user-contributed content (UCC) on the Internet. UCC systems, produced (in part) through voluntary contributions made by non-employees, face fundamental incentive problems. In particular, to succeed, users need to be motivated to contribute in the first place ("getting stuff in"). Further, given heterogeneity in content quality and variety, the degree of success will depend on incentives to contribute a desirable mix of quality and variety ("getting good stuff in"). Third, because UCC systems generally function as open-access publishing platforms, there is a need to prevent or reduce the amount of negative-value (polluting or manipulating) content. The work to date on incentive problems facing UCC is limited and uneven in coverage. Much of the empirical research concerns specific settings and does not provide readily generalizable results. And although there are well-developed theoretical literatures on, for example, the private provision of public goods (the "getting stuff in" problem), this literature is only applicable to UCC in a limited way because it focuses on contributions of (homogeneous) money and thus does not address the many problems associated with heterogeneous information content contributions (the "getting good stuff in" problem). We believe that our review of the literature has identified more open questions for research than it has pointed to known results.
    http://deepblue.lib.umich.edu/bitstream/2027.42/100229/1/icd4ucc.pdf

    Extracting and Harnessing Interpretation in Data Mining

    Machine learning, especially the recent deep learning techniques, has driven significant advances in various data mining applications, including recommender systems, misinformation detection, outlier detection, and health informatics. Unfortunately, while complex models have achieved unprecedented prediction capability, they are often criticized as "black boxes" due to their multiple layers of non-linear transformations and hard-to-understand working mechanisms. To tackle this opacity issue, interpretable machine learning has attracted increasing attention. Traditional interpretation methods mainly focus on explaining the predictions of classification models with gradient-based methods or local approximation methods. However, the natural characteristics of data mining applications are not considered, and the internal mechanisms of models are not fully explored. Meanwhile, it is unclear how to utilize interpretation to improve models. To bridge this gap, I developed a series of interpretation methods that gradually increase the transparency of data mining models. First, a fundamental goal of interpretation is providing the attribution of input features to model outputs. To adapt feature attribution to explaining outlier detection, I propose Contextual Outlier Interpretation (COIN). Second, to overcome the limitation of attribution methods that do not explain internal information inside models, I further propose representation interpretation methods to extract knowledge as a taxonomy. However, these post-hoc methods may suffer from limited interpretation accuracy and the inability to directly control the model training process. Therefore, I propose an interpretable network embedding framework to explicitly control the meaning of latent dimensions. Finally, besides obtaining explanations, I propose using interpretation to discover the vulnerability of models in adversarial circumstances, and then actively preparing models through adversarial training to improve their robustness against potential threats. My research on interpretable machine learning enables data scientists to better understand their models and discover defects for further improvement, and it improves the experience of customers who benefit from data mining systems. It broadly impacts fields such as Information Retrieval, Information Security, Social Computing, and Health Informatics.
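    None of the thesis's specific methods (COIN, the taxonomy extraction, the interpretable embedding framework) are spelled out in this abstract, so the sketch below shows only the generic starting point they build on: gradient-based feature attribution, where the gradient of a class score with respect to the input indicates how much each feature contributes. The toy model and names are assumptions for illustration.

```python
# Generic gradient-based feature attribution (saliency) on a toy classifier.
# A standard baseline technique, not the thesis's COIN method.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

def attribute(model, x, target_class):
    """Return d(score)/d(input): each feature's local contribution to the class score."""
    x = x.clone().requires_grad_(True)
    model(x)[target_class].backward()
    return x.grad.detach()

x = torch.tensor([0.5, -1.2, 3.0, 0.0])
print(attribute(model, x, target_class=1))  # one attribution score per input feature
```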

    Spatiotemporal enabled Content-based Image Retrieval


    SoK: Anti-Facial Recognition Technology

    The rapid adoption of facial recognition (FR) technology by both government and commercial entities in recent years has raised concerns about civil liberties and privacy. In response, a broad suite of so-called "anti-facial recognition" (AFR) tools has been developed to help users avoid unwanted facial recognition. The set of AFR tools proposed in the last few years is wide-ranging and rapidly evolving, necessitating a step back to consider the broader design space of AFR systems and long-term challenges. This paper aims to fill that gap and provides the first comprehensive analysis of the AFR research landscape. Using the operational stages of FR systems as a starting point, we create a systematic framework for analyzing the benefits and tradeoffs of different AFR approaches. We then consider both the technical and social challenges facing AFR tools and propose directions for future research in this field.

    Advanced Location-Based Technologies and Services

    Since the publication of the first edition in 2004, advances in mobile devices, positioning sensors, WiFi fingerprinting, and wireless communications, among others, have paved the way for developing new and advanced location-based services (LBSs). This second edition provides up-to-date information on LBSs, including WiFi fingerprinting, mobile computing, geospatial clouds, geospatial data mining, location privacy, and location-based social networking. It also includes new chapters on application areas such as LBSs for public health, indoor navigation, and advertising. In addition, the chapter on remote sensing has been revised to address recent advancements.

    Efficient processing of similarity queries with applications

    Today, a myriad of data sources, from the Internet to business operations to scientific instruments, produce large volumes of diverse types of data. Many application scenarios, e.g., marketing analysis, sensor networks, and medical and biological applications, call for identifying and processing similarities in big data. As a result, it is imperative to develop new similarity query processing approaches and systems that scale from low-dimensional to high-dimensional data, from a single machine to clusters of hundreds of machines, and from disk-based to memory-based processing. This dissertation introduces and studies several similarity-aware query operators and analyzes and optimizes their performance. The first contribution of this dissertation is an SQL-based Similarity Group-by operator (SGB, for short) that extends the semantics of the standard SQL Group-by operator to group data with similar but not necessarily equal values. We realize these SGB operators by extending the standard SQL Group-by and introduce two new SGB operators for multi-dimensional data. We implement and test the new SGB operators and their algorithms inside an open-source centralized database server (PostgreSQL). In the second contribution of this dissertation, we study how to efficiently process Hamming-distance-based similarity queries (Hamming-distance select and Hamming-distance join) that are crucial to many applications. We introduce a new index, termed the HA-Index, that speeds up distance comparisons and eliminates redundancies when performing the two flavors of Hamming-distance range queries (namely, selects and joins). In the third and last contribution of this dissertation, we develop a system for similarity query processing and optimization in an in-memory and distributed setup for big spatial data. We propose a query scheduler and a distributed query optimizer that use a new cost model to optimize the cost of similarity query processing in this in-memory distributed setup. The scheduler and query optimizer generate query execution plans that minimize the effect of query skew. The query scheduler employs new spatial indexing techniques based on Bloom filters to forward queries to the appropriate local sites. The proposed query processing and optimization techniques are prototyped inside Spark, a distributed main-memory computation system.
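    The HA-Index itself is not specified in enough detail here to reproduce, but the baseline it accelerates is easy to state: a Hamming-distance range select scans all bit signatures and keeps those within distance tau of the query. A minimal sketch of that exhaustive baseline, with hypothetical names, follows.

```python
# Exhaustive Hamming-distance range select over bit signatures; an index such
# as the dissertation's HA-Index is designed to avoid this full scan.
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two equal-width bit signatures."""
    return (a ^ b).bit_count()  # Python 3.10+; earlier: bin(a ^ b).count("1")

def range_select(query: int, signatures: list[int], tau: int) -> list[int]:
    """Indices of all signatures within Hamming distance tau of the query."""
    return [i for i, s in enumerate(signatures) if hamming(query, s) <= tau]

sigs = [0b1010_1100, 0b1010_1111, 0b0101_0011]
print(range_select(0b1010_1110, sigs, tau=2))  # -> [0, 1]
```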

    Advances in Knowledge Discovery and Data Mining, Part II

    19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II.