Search CORE

505 research outputs found

Skyline queries computation on crowdsourced- enabled incomplete database

Author: Aljuboori Ali A.Alwan
Gulzar Yonis
Ibrahim Hamidah
Swidan Marwa
Turaev Sherzod
Zaid Abualkishik Abedallah
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020
Field of study

Data incompleteness becomes a frequent phenomenon in a large number of contemporary database applications such as web autonomous databases, big data, and crowd-sourced databases. Processing skyline queries over incomplete databases impose a number of challenges that negatively influence processing the skyline queries. Most importantly, the skylines derived from incomplete databases are also incomplete in which some values are missing. Retrieving skylines with missing values is undesirable, particularly, for recommendation and decision-making systems. Furthermore, running skyline queries on a database with incomplete data raises a number of issues influence processing skyline queries such as losing the transitivity property of the skyline technique and cyclic dominance between the tuples. The issue of estimating the missing values of skylines has been discussed and examined in the database literature. Most recently, several studies have suggested exploiting the crowd-sourced databases in order to estimate the missing values by generating plausible values using the crowd. Crowd-sourced databases have proved to be a powerful solution to perform user-given tasks by integrating human intelligence and experience to process the tasks. However, task processing using crowd-sourced incurs additional monetary cost and increases the time latency. Also, it is not always possible to produce a satisfactory result that meets the user's preferences. This paper proposes an approach for estimating the missing values of the skylines by first exploiting the available data and utilizes the implicit relationships between the attributes in order to impute the missing values of the skylines. This process aims at reducing the number of values to be estimated using the crowd when local estimation is inappropriate. Intensive experiments on both synthetic and real datasets have been accomplished. The experimental results have proven that the proposed approach for estimating the missing values of the skylines over crowd-sourced enabled incomplete databases is scalable and outperforms the other existing approaches

Universiti Putra Malaysia Institutional Repository

The International Islamic University Malaysia Repository

Processing Incomplete k Nearest Neighbor Search

Author: CHEN Gang
CUI Huiyong
GAO Yunjun
MIAO Xiaoye
ZHENG Baihua
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2016
Field of study

Institutional Knowledge at Singapore Management University

Searching and mining in enriched geo-spatial data

Author: Schmid Klaus Arthur
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 09/12/2016
Field of study

The emergence of new data collection mechanisms in geo-spatial applications paired with a heightened tendency of users to volunteer information provides an ever-increasing flow of data of high volume, complex nature, and often associated with inherent uncertainty. Such mechanisms include crowdsourcing, automated knowledge inference, tracking, and social media data repositories. Such data bearing additional information from multiple sources like probability distributions, text or numerical attributes, social context, or multimedia content can be called multi-enriched. Searching and mining this abundance of information holds many challenges, if all of the data's potential is to be released. This thesis addresses several major issues arising in that field, namely path queries using multi-enriched data, trend mining in social media data, and handling uncertainty in geo-spatial data. In all cases, the developed methods have made significant contributions and have appeared in or were accepted into various renowned international peer-reviewed venues. A common use of geo-spatial data is path queries in road networks where traditional methods optimise results based on absolute and ofttimes singular metrics, i.e., finding the shortest paths based on distance or the best trade-off between distance and travel time. Integrating additional aspects like qualitative or social data by enriching the data model with knowledge derived from sources as mentioned above allows for queries that can be issued to fit a broader scope of needs or preferences. This thesis presents two implementations of incorporating multi-enriched data into road networks. In one case, a range of qualitative data sources is evaluated to gain knowledge about user preferences which is subsequently matched with locations represented in a road network and integrated into its components. Several methods are presented for highly customisable path queries that incorporate a wide spectrum of data. In a second case, a framework is described for resource distribution with reappearance in road networks to serve one or more clients, resulting in paths that provide maximum gain based on a probabilistic evaluation of available resources. Applications for this include finding parking spots. Social media trends are an emerging research area giving insight in user sentiment and important topics. Such trends consist of bursts of messages concerning a certain topic within a time frame, significantly deviating from the average appearance frequency of the same topic. By investigating the dissemination of such trends in space and time, this thesis presents methods to classify trend archetypes to predict future dissemination of a trend. Processing and querying uncertain data is particularly demanding given the additional knowledge required to yield results with probabilistic guarantees. Since such knowledge is not always available and queries are not easily scaled to larger datasets due to the #P-complete nature of the problem, many existing approaches reduce the data to a deterministic representation of its underlying model to eliminate uncertainty. However, data uncertainty can also provide valuable insight into the nature of the data that cannot be represented in a deterministic manner. This thesis presents techniques for clustering uncertain data as well as query processing, that take the additional information from uncertainty models into account while preserving scalability using a sampling-based approach, while previous approaches could only provide one of the two. The given solutions enable the application of various existing clustering techniques or query types to a framework that manages the uncertainty.Das Erscheinen neuer Methoden zur Datenerhebung in räumlichen Applikationen gepaart mit einer erhöhten Bereitschaft der Nutzer, Daten über sich preiszugeben, generiert einen stetig steigenden Fluss von Daten in großer Menge, komplexer Natur, und oft gepaart mit inhärenter Unsicherheit. Beispiele für solche Mechanismen sind Crowdsourcing, automatisierte Wissensinferenz, Tracking, und Daten aus sozialen Medien. Derartige Daten, angereichert mit mit zusätzlichen Informationen aus verschiedenen Quellen wie Wahrscheinlichkeitsverteilungen, Text- oder numerische Attribute, sozialem Kontext, oder Multimediainhalten, werden als multi-enriched bezeichnet. Suche und Datamining in dieser weiten Datenmenge hält viele Herausforderungen bereit, wenn das gesamte Potenzial der Daten genutzt werden soll. Diese Arbeit geht auf mehrere große Fragestellungen in diesem Feld ein, insbesondere Pfadanfragen in multi-enriched Daten, Trend-mining in Daten aus sozialen Netzwerken, und die Beherrschung von Unsicherheit in räumlichen Daten. In all diesen Fällen haben die entwickelten Methoden signifikante Forschungsbeiträge geleistet und wurden veröffentlicht oder angenommen zu diversen renommierten internationalen, von Experten begutachteten Konferenzen und Journals. Ein gängiges Anwendungsgebiet räumlicher Daten sind Pfadanfragen in Straßennetzwerken, wo traditionelle Methoden die Resultate anhand absoluter und oft auch singulärer Maße optimieren, d.h., der kürzeste Pfad in Bezug auf die Distanz oder der beste Kompromiss zwischen Distanz und Reisezeit. Durch die Integration zusätzlicher Aspekte wie qualitativer Daten oder Daten aus sozialen Netzwerken als Anreicherung des Datenmodells mit aus diesen Quellen abgeleitetem Wissen werden Anfragen möglich, die ein breiteres Spektrum an Anforderungen oder Präferenzen erfüllen. Diese Arbeit präsentiert zwei Ansätze, solche multi-enriched Daten in Straßennetze einzufügen. Zum einen wird eine Reihe qualitativer Datenquellen ausgewertet, um Wissen über Nutzerpräferenzen zu generieren, welches darauf mit Örtlichkeiten im Straßennetz abgeglichen und in das Netz integriert wird. Diverse Methoden werden präsentiert, die stark personalisierbare Pfadanfragen ermöglichen, die ein weites Spektrum an Daten mit einbeziehen. Im zweiten Fall wird ein Framework präsentiert, das eine Ressourcenverteilung im Straßennetzwerk modelliert, bei der einmal verbrauchte Ressourcen erneut auftauchen können. Resultierende Pfade ergeben einen maximalen Ertrag basieren auf einer probabilistischen Evaluation der verfügbaren Ressourcen. Eine Anwendung ist die Suche nach Parkplätzen. Trends in sozialen Medien sind ein entstehendes Forscchungsgebiet, das Einblicke in Benutzerverhalten und wichtige Themen zulässt. Solche Trends bestehen aus großen Mengen an Nachrichten zu einem bestimmten Thema innerhalb eines Zeitfensters, so dass die Auftrittsfrequenz signifikant über den durchschnittlichen Level liegt. Durch die Untersuchung der Fortpflanzung solcher Trends in Raum und Zeit präsentiert diese Arbeit Methoden, um Trends nach Archetypen zu klassifizieren und ihren zukünftigen Weg vorherzusagen. Die Anfragebearbeitung und Datamining in unsicheren Daten ist besonders herausfordernd, insbesondere im Hinblick auf das notwendige Zusatzwissen, um Resultate mit probabilistischen Garantien zu erzielen. Solches Wissen ist nicht immer verfügbar und Anfragen lassen sich aufgrund der \P-Vollständigkeit des Problems nicht ohne Weiteres auf größere Datensätze skalieren. Dennoch kann Datenunsicherheit wertvollen Einblick in die Struktur der Daten liefern, der mit deterministischen Methoden nicht erreichbar wäre. Diese Arbeit präsentiert Techniken zum Clustering unsicherer Daten sowie zur Anfragebearbeitung, die die Zusatzinformation aus dem Unsicherheitsmodell in Betracht ziehen, jedoch gleichzeitig die Skalierbarkeit des Ansatzes auf große Datenmengen sicherstellen

MOVING OBJECTS MANAGEMENT FOR LOCATION-BASED SERVICES

Author: GUO LONG
Publication venue
Publication date: 17/08/2015
Field of study

Ph.DDOCTOR OF PHILOSOPH

ScholarBank@NUS

Leveraging Client Processing for Location Privacy in Mobile Local Search

Author: Eltarjaman Wisam Mohamed
Publication venue: Digital Commons @ DU
Publication date: 01/01/2016
Field of study

Usage of mobile services is growing rapidly. Most Internet-based services targeted for PC based browsers now have mobile counterparts. These mobile counterparts often are enhanced when they use user\u27s location as one of the inputs. Even some PC-based services such as point of interest Search, Mapping, Airline tickets, and software download mirrors now use user\u27s location in order to enhance their services. Location-based services are exactly these, that take the user\u27s location as an input and enhance the experience based on that. With increased use of these services comes the increased risk to location privacy. The location is considered an attribute that user\u27s hold as important to their privacy. Compromise of one\u27s location, in other words, loss of location privacy can have several detrimental effects on the user ranging from trivial annoyance to unreasonable persecution. More and more companies in the Internet economy rely exclusively on the huge data sets they collect about users. The more detailed and accurate the data a company has about its users, the more valuable the company is considered. No wonder that these companies are often the same companies that offer these services for free. This gives them an opportunity to collect more accurate location information. Research community in the location privacy protection area had to reciprocate by modeling an adversary that could be the service provider itself. To further drive this point, we show that a well-equipped service provider can infer user\u27s location even if the location information is not directly available by using other information he collects about the user. There is no dearth of proposals of several protocols and algorithms that protect location privacy. A lot of these earlier proposals require a trusted third party to play as an intermediary between the service provider and the user. These protocols use anonymization and/or obfuscation techniques to protect user\u27s identity and/or location. This requirement of trusted third parties comes with its own complications and risks and makes these proposals impractical in real life scenarios. Thus it is preferable that protocols do not require a trusted third party. We look at existing proposals in the area of private information retrieval. We present a brief survey of several proposals in the literature and implement two representative algorithms. We run experiments using different sizes of databases to ascertain their practicability and performance features. We show that private information retrieval based protocols still have long ways to go before they become practical enough for local search applications. We propose location privacy preserving mechanisms that take advantage of the processing power of modern mobile devices and provide configurable levels of location privacy. We propose these techniques both in the single query scenario and multiple query scenario. In single query scenario, the user issues a query to the server and obtains the answer. In the multiple query scenario, the user keeps sending queries as she moves about in the area of interest. We show that the multiple query scenario increases the accuracy of adversary\u27s determination of user\u27s location, and hence improvements are needed to cope with this situation. So, we propose an extension of the single query scenario that addresses this riskier multiple query scenario, still maintaining the practicability and acceptable performance when implemented on a modern mobile device. Later we propose a technique based on differential privacy that is inspired by differential privacy in statistical databases. All three mechanisms proposed by us are implemented in realistic hardware or simulators, run against simulated but real life data and their characteristics ascertained to show that they are practical and ready for adaptation. This dissertation study the privacy issues for location-based services in mobile environment and proposes a set of new techniques that eliminate the need for a trusted third party by implementing efficient algorithms on modern mobile hardware

University of Denver

Recommended from our members

Robust Algorithms for Clustering with Applications to Data Integration

Author: Galhotra Sainyam
Publication venue: ScholarWorks@UMass Amherst
Publication date: 20/10/2021
Field of study

A growing number of data-based applications are used for decision-making that have far-reaching consequences and significant societal impact. Entity resolution, community detection and taxonomy construction are some of the building blocks of these applications and for these methods, clustering is the fundamental underlying concept. Therefore, the use of accurate, robust and scalable methods for clustering cannot be overstated. We tackle the various facets of clustering with a multi-pronged approach described below. 1. While identification of clusters that refer to different entities is challenging for automated strategies, it is relatively easy for humans. We study the robustness of clustering methods that leverage supervision through an oracle i.e an abstraction of crowdsourcing. Additionally, we focus on scalability to handle web-scale datasets. 2. In community detection applications, a common setback in evaluation of the quality of clustering techniques is the lack of ground truth data. We propose a generative model that considers dependent edge formation and devise techniques for efficient cluster recovery

ScholarWorks@UMass Amherst

Big data analytics for large-scale wireless networks: Challenges and opportunities

Author: Dai HN
Vasilakos AV
Wang H
Wong RCW
Zheng Z
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019
Field of study

© 2019 Association for Computing Machinery. The wide proliferation of various wireless communication systems and wireless devices has led to the arrival of big data era in large-scale wireless networks. Big data of large-scale wireless networks has the key features of wide variety, high volume, real-time velocity, and huge value leading to the unique research challenges that are different from existing computing systems. In this article, we present a survey of the state-of-art big data analytics (BDA) approaches for large-scale wireless networks. In particular, we categorize the life cycle of BDA into four consecutive stages: Data Acquisition, Data Preprocessing, Data Storage, and Data Analytics. We then present a detailed survey of the technical solutions to the challenges in BDA for large-scale wireless networks according to each stage in the life cycle of BDA. Moreover, we discuss the open research issues and outline the future directions in this promising area

arXiv.org e-Print Archive

OPUS - University of Technology Sydney

NORA - Norwegian Open Research Archives

Advances in database technology - EDBT 2016: 19th International Conference on Extending Database Technology, Bordeaux, France, March 15-18, 2016 : proceedings

Author
Publication venue: University of Konstanz, University Library
Publication date: 01/01/2016
Field of study

Digitale Bibliothek Thüringen