

    As efficient online academic information repositories and information channels built on crowd contributions, online research social platforms have become a useful tool for various research and management applications. Social network platforms have also become a major source for finding field experts. Compared with traditional database and search-engine-based approaches, they offer crowd contributions, easy access without geographic restrictions, and avoidance of conflicts of interest. However, current research attempts to find experts based on features such as published research work, social relationships, and online behaviours (e.g. reads and downloads of publications) on social platforms, but neglects to verify the reliability of the identified experts. To bridge this gap, this research proposes an innovative Topic Sensitive SimRank (TSSR) model to identify “real” experts on social network platforms. The TSSR model includes three components: LDA for Expertise Extension, Topic Sensitive Network for Reputation Measurement, and Topic Sensitive SimRank for detecting unsuitable experts. We also design a parallel computing strategy to improve the efficiency of the proposed methods. Finally, to verify the effectiveness of the proposed model, we design an experiment on one research social platform, ScholarMate, to find experts for companies that need academic-industry collaboration.
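
The abstract above does not spell out how TSSR extends SimRank, so the following is only a minimal sketch of plain (not topic-sensitive) SimRank, the similarity measure the model builds on. The function name `simrank`, the `in_neighbors` input format, and the decay value 0.8 are illustrative assumptions, not details taken from the paper.

```python
def simrank(in_neighbors, decay=0.8, iterations=10):
    """Plain SimRank on a directed graph (assumption: not the paper's TSSR).

    in_neighbors maps every node to the list of nodes pointing at it;
    every node that appears in a list must also be a key.
    """
    nodes = list(in_neighbors)
    # Start from the identity: a node is fully similar only to itself.
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[a][b] = 1.0
                    continue
                ia, ib = in_neighbors[a], in_neighbors[b]
                if not ia or not ib:
                    # Nodes without in-neighbors are similar to nothing else.
                    new[a][b] = 0.0
                    continue
                # Average similarity over all pairs of in-neighbors, damped.
                total = sum(sim[i][j] for i in ia for j in ib)
                new[a][b] = decay * total / (len(ia) * len(ib))
        sim = new
    return sim


# Hypothetical toy graph: researcher "c" is linked to both "a" and "b".
scores = simrank({"c": [], "a": ["c"], "b": ["c"]})
```

Two nodes that share all their in-neighbors end up with similarity equal to the decay factor, which is why `a` and `b` score 0.8 here while neither is similar to `c`.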

    Finding Top-k Dominance on Incomplete Big Data Using Map-Reduce Framework

    Incomplete data is one major kind of multi-dimensional dataset that has randomly distributed missing nodes in its dimensions. It is very difficult to retrieve information from this type of dataset when it becomes huge. Finding top-k dominant values in this type of dataset is a challenging procedure. Some algorithms exist to speed up this process, but they are mostly efficient only on small incomplete datasets. One of the algorithms that make the top-k dominance (TKD) query possible is the Bitmap Index Guided (BIG) algorithm. This algorithm strongly improves performance on incomplete data, but it was not designed for, and is not capable of, finding top-k dominant values in incomplete big data. Several other algorithms have been proposed to answer the TKD query, such as the Skyband Based and Upper Bound Based algorithms, but their performance is also questionable. Previously developed algorithms were among the first attempts to apply the TKD query to incomplete data; however, all of them performed weakly or were not compatible with incomplete data. This thesis proposes the MapReduced Enhanced Bitmap Index Guided Algorithm (MRBIG) to deal with the aforementioned issues. MRBIG uses the MapReduce framework to improve the performance of top-k dominance queries on huge incomplete datasets. The framework divides the tasks among several computing nodes that work independently and simultaneously to find the result. This method achieves processing times up to twice as fast as previously presented algorithms when computing the TKD query result.
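
To make the query itself concrete, here is a brute-force sketch of top-k dominance on incomplete data. It is not BIG or MRBIG. It assumes a commonly used dominance definition that the abstract does not state: missing values are `None`, smaller values are better, and one object dominates another if it is no worse on every dimension both have observed and strictly better on at least one. The names `dominates` and `top_k_dominating` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Row = Sequence[Optional[float]]


def dominates(a: Row, b: Row) -> bool:
    """True if a dominates b on the dimensions observed in both (smaller is better)."""
    strictly_better = False
    for x, y in zip(a, b):
        if x is None or y is None:
            continue  # Skip dimensions missing in either object.
        if x > y:
            return False
        if x < y:
            strictly_better = True
    return strictly_better


def top_k_dominating(rows: List[Row], k: int) -> List[Tuple[int, int]]:
    """Return (row index, score) for the k rows that dominate the most other rows."""
    scores = [
        sum(dominates(a, b) for j, b in enumerate(rows) if j != i)
        for i, a in enumerate(rows)
    ]
    # Highest score first; ties broken by original position.
    order = sorted(range(len(rows)), key=lambda i: (-scores[i], i))
    return [(i, scores[i]) for i in order[:k]]


rows = [(1, None, 2), (2, 3, None), (None, 1, 1), (3, 4, 5)]
result = top_k_dominating(rows, 2)
```

Each object's score depends only on comparisons against the rest of the dataset, so the scoring loop is the part that can be partitioned across workers; how MRBIG actually does this is not described in the abstract.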

    Diamond Dicing

    In OLAP, analysts often select an interesting sample of the data. For example, an analyst might focus on products bringing revenues of at least 100 000 dollars, or on shops having sales greater than 400 000 dollars. However, current systems do not allow both thresholds to be applied simultaneously, selecting the products and shops that satisfy both. For such purposes, we introduce the diamond cube operator, filling a gap among existing data warehouse operations. Because of the interaction between dimensions, the computation of diamond cubes is challenging. We compare and test various algorithms on large data sets of more than 100 million facts. We find that while it is possible to implement diamonds in SQL, it is inefficient. Indeed, our custom implementation can be a hundred times faster than popular database engines (including a row-store and a column-store). Comment: 29 pages
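
The interaction between dimensions can be illustrated with a small sketch: dropping a product lowers the totals of the shops that sold it, which may push a shop below its threshold, and so on. The code below repeats the pruning until nothing changes. It is a naive illustration under assumptions (non-negative amounts, "at least" thresholds on both dimensions, a hypothetical function name `diamond`), not one of the algorithms evaluated in the paper.

```python
from collections import defaultdict


def diamond(facts, min_product_total, min_shop_total):
    """Keep only facts whose product and shop both meet their thresholds.

    facts is a list of (product, shop, amount) tuples with amount >= 0.
    Pruning repeats until a full pass removes nothing.
    """
    facts = list(facts)
    while True:
        by_product = defaultdict(float)
        by_shop = defaultdict(float)
        for product, shop, amount in facts:
            by_product[product] += amount
            by_shop[shop] += amount
        kept = [
            (product, shop, amount)
            for product, shop, amount in facts
            if by_product[product] >= min_product_total
            and by_shop[shop] >= min_shop_total
        ]
        if len(kept) == len(facts):
            return kept
        facts = kept


sales = [
    ("A", "X", 70), ("A", "Y", 40),
    ("B", "X", 20), ("B", "Y", 50),
    ("C", "Z", 150), ("C", "X", 30),
]
remaining = diamond(sales, min_product_total=100, min_shop_total=80)
```

In this toy data, product A and shop X both pass their thresholds on the first pass, yet neither survives: removing B shrinks shop Y, removing Y shrinks A, and removing A shrinks X. Only the fact `("C", "Z", 150)` remains.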

    Location Selection Query in Google Maps using Voronoi-based Spatial Skyline (VS2) Algorithm

    Google Maps is one of the popular location selection systems, and one of its popular features is nearby search. For example, someone who wants to find the restaurants closest to their location can use the nearby search feature. This feature considers only one specific location when suggesting places. In a real-world situation, there may be a need to consider more than one location when selecting the desired place. Assume someone would like to choose a hotel close to a conference hall, a museum, a beach, and a souvenir store. In this situation, the nearby search feature in Google Maps may not be able to suggest a list of hotels that are interesting for them based on the distance to each destination. In this paper, we develop a web-based Google Maps search application that uses the Voronoi-based Spatial Skyline (VS2) algorithm and lets users choose several Points Of Interest (POIs) from Google Maps as the locations to consider when selecting a desired place. We use the Google Maps API to provide POI information for the application. The experimental results show that the execution time increases as the number of considered locations increases.
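
The hotel example corresponds to a spatial skyline query: a candidate is kept unless some other candidate is at least as close to every considered location and strictly closer to at least one. The sketch below is a brute-force version using straight-line distances on plane coordinates. It is not the VS2 algorithm, it does not call the Google Maps API, and the name `spatial_skyline` and the sample coordinates are made up for illustration.

```python
import math


def spatial_skyline(candidates, query_points):
    """Return candidates not spatially dominated with respect to query_points.

    candidates and query_points are lists of (x, y) tuples; candidates
    are assumed to be distinct.
    """
    # Distance vector of every candidate to all query points.
    dist = {c: [math.dist(c, q) for q in query_points] for c in candidates}

    def dominates(a, b):
        pairs = list(zip(dist[a], dist[b]))
        return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)

    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates if other != c)
    ]


# Hypothetical data: two places to be near, four candidate hotels.
places = [(0, 0), (4, 0)]
hotels = [(2, 0), (2, 3), (0, 1), (5, 5)]
skyline = spatial_skyline(hotels, places)
```

Here `(2, 0)` beats `(2, 3)` and `(5, 5)` on both distances, while `(0, 1)` stays in the result because it is closer to the first place than any other hotel. The brute-force comparison is quadratic in the number of candidates.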

    Providing Diversity in K-Nearest Neighbor Query Results

    Given a point query Q in multi-dimensional space, K-Nearest Neighbor (KNN) queries return the K answers in the database closest to Q according to a given distance metric. In this scenario, it is possible that a majority of the answers may be very similar to one another, especially when the data has clusters. For a variety of applications, such homogeneous result sets may not add value to the user. In this paper, we consider the problem of providing diversity in the results of KNN queries, that is, producing the closest result set such that each answer is sufficiently different from the rest. We first propose a user-tunable definition of diversity, and then present an algorithm, called MOTLEY, for producing a diverse result set as per this definition. Through a detailed experimental evaluation on real and synthetic data, we show that MOTLEY can produce diverse result sets by reading only a small fraction of the tuples in the database. Further, it imposes no additional overhead on the evaluation of traditional KNN queries, thereby providing a seamless interface between diversity and distance. Comment: 20 pages, 11 figures
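
One simple way to picture the trade-off is a greedy pass: visit points in order of distance from Q and accept a point only if it is far enough from everything already accepted. The sketch below does exactly that. It is an assumption-laden illustration, not MOTLEY: the paper's diversity definition is not given in the abstract, so plain Euclidean distance with a hypothetical `min_div` threshold stands in for it, and the whole dataset is sorted rather than read selectively.

```python
import math


def diverse_knn(points, query, k, min_div):
    """Greedily pick up to k points near query that are pairwise >= min_div apart."""
    chosen = []
    # Nearest candidates are considered first.
    for p in sorted(points, key=lambda p: math.dist(p, query)):
        if all(math.dist(p, c) >= min_div for c in chosen):
            chosen.append(p)
            if len(chosen) == k:
                break
    return chosen


points = [(1, 0), (1.1, 0), (0, 2), (0, 2.1), (3, 3)]
plain = diverse_knn(points, (0, 0), k=3, min_div=0.0)
diverse = diverse_knn(points, (0, 0), k=3, min_div=1.0)
```

With `min_div=0.0` the function reduces to an ordinary KNN query and returns two near-duplicates from the same cluster; with `min_div=1.0` it skips them and reaches further out for a third, distinct answer.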