112 research outputs found

    Processing count queries over event streams at multiple time granularities

    Get PDF
    Management and analysis of streaming data has become crucial with its applications in web, sensor data, network tra c data, and stock market. Data streams consist of mostly numeric data but what is more interesting is the events derived from the numerical data that need to be monitored. The events obtained from streaming data form event streams. Event streams have similar properties to data streams, i.e., they are seen only once in a fixed order as a continuous stream. Events appearing in the event stream have time stamps associated with them in a certain time granularity, such as second, minute, or hour. One type of frequently asked queries over event streams is count queries, i.e., the frequency of an event occurrence over time. Count queries can be answered over event streams easily, however, users may ask queries over di erent time granularities as well. For example, a broker may ask how many times a stock increased in the same time frame, where the time frames specified could be hour, day, or both. This is crucial especially in the case of event streams where only a window of an event stream is available at a certain time instead of the whole stream. In this paper, we propose a technique for predicting the frequencies of event occurrences in event streams at multiple time granularities. The proposed approximation method e ciently estimates the count of events with a high accuracy in an event stream at any time granularity by examining the distance distributions of event occurrences. The proposed method has been implemented and tested on di erent real data sets and the results obtained are presented to show its e ectiveness

    A look ahead approach to secure multi-party protocols

    Get PDF
    Secure multi-party protocols have been proposed to enable non-colluding parties to cooperate without a trusted server. Even though such protocols prevent information disclosure other than the objective function, they are quite costly in computation and communication. Therefore, the high overhead makes it necessary for parties to estimate the utility that can be achieved as a result of the protocol beforehand. In this paper, we propose a look ahead approach, specifically for secure multi-party protocols to achieve distributed k-anonymity, which helps parties to decide if the utility benefit from the protocol is within an acceptable range before initiating the protocol. Look ahead operation is highly localized and its accuracy depends on the amount of information the parties are willing to share. Experimental results show the effectiveness of the proposed methods

    Implementation of parallel nested transactions for nested rule execution in active databases

    Get PDF
    Ankara : Department of Computer Engineering and Information Science and the Institute of Engineering and Science of Bilkent University, 1996.Thesis (Master's) -- Bilkent University, 1996.Includes bibliographical references leaves 53-56.(Jonventional, passive datal)ases, ex('cute transcictions or queries in response to the requests from a user or an application program. In contrcist, an Active Database Management System (ADI3MS) allows users to specify actions to be executed when some specific evcMits are signaled. ADBMSs ¿ichieve tliis feciture by mecins of rules. Execution of ruh's is an important part of an ADBMS which may affect the overall performanc'e of the system. Nested transactions are proposed as a rule execution model for ADBMSs. The nested trcinsciction model, in contrast to flat transactions, allows transactions to be started inside some other trcinsactions forming a transaction hierarchy. In this thesis, implementation issues of pcirallel nested transactions, wluM’e all the transactions in the hierarchy may run in pcirallel, aix' discussed for parallel rule execution in ADBMSs. Implementation of nested transactions ha.s I^een performed by extending the flat trcuisaction semantics of OpenOODB using Solaris threads. A formal specification of the proposed (xxec.ution model using ACTA framework is also provided.Saygın, YücelM.S

    Secret charing vs. encryption-based techniques for privacy preserving data mining

    Get PDF
    Privacy preserving querying and data publishing has been studied in the context of statistical databases and statistical disclosure control. Recently, large-scale data collection and integration efforts increased privacy concerns which motivated data mining researchers to investigate privacy implications of data mining and how data mining can be performed without violating privacy. In this paper, we first provide an overview of privacy preserving data mining focusing on distributed data sources, then we compare two technologies used in privacy preserving data mining. The first technology is encryption based, and it is used in earlier approaches. The second technology is secret-sharing which is recently being considered as a more efficient approach

    New features for sentiment analysis: Do sentences matter?

    Get PDF
    1st International Workshop on Sentiment Discovery from Affective Data 2012, SDAD 2012 - In Conjunction with ECML-PKDD 2012; Bristol; United Kingdom; 28 September 2012 through 28 September 2012In this work, we propose and evaluate new features to be used in a word polarity based approach to sentiment classification. In particular, we analyze sentences as the first step before estimating the overall review polarity. We consider different aspects of sentences, such as length, purity, irrealis content, subjectivity, and position within the opinionated text. This analysis is then used to find sentences that may convey better information about the overall review polarity. The TripAdvisor dataset is used to evaluate the effect of sentence level features on polarity classification. Our initial results indicate a small improvement in classification accuracy when using the newly proposed features. However, the benefit of these features is not limited to improving sentiment classification accuracy since sentence level features can be used for other important tasks such as review summarization.European Commission, FP7, under UBIPOL (Ubiquitous Participation Platform for Policy Making) Projec

    Privacy preserving spatio-temporal clustering on horizontally partitioned data

    Get PDF
    Time-stamped location information is regarded as spatio-temporal data and, by its nature, such data is highly sensitive from the perspective of privacy. In this paper, we propose a privacy preserving spatio-temporal clustering method for horizontally partitioned data which, to the best of our knowledge, was not done before. Our methods are based on building the dissimilarity matrix through a series of secure multi-party trajectory comparisons managed by a third party. Our trajectory comparison protocol complies with most trajectory comparison functions and complexity analysis of our methods shows that our protocol does not introduce extra overhead when constructing dissimilarity matrix, compared to the centralized approach. This work was funded by the Information Society Technologies programme of the European Commission, Future and Emerging Technologies under IST-014915 GeoPKDD project

    Towards trajectory anonymization: a generalization-based approach

    Get PDF
    Trajectory datasets are becoming popular due to the massive usage of GPS and locationbased services. In this paper, we address privacy issues regarding the identification of individuals in static trajectory datasets. We first adopt the notion of k-anonymity to trajectories and propose a novel generalization-based approach for anonymization of trajectories. We further show that releasing anonymized trajectories may still have some privacy leaks. Therefore we propose a randomization based reconstruction algorithm for releasing anonymized trajectory data and also present how the underlying techniques can be adapted to other anonymity standards. The experimental results on real and synthetic trajectory datasets show the effectiveness of the proposed techniques

    Discovering the prerequisite relationships among instructional videos from subtitles

    Get PDF
    Nowadays, students prefer to complement their studies with online video materials. While there are many video e-learning resources available on the internet, video sharing platforms which provide these resources, such as YouTube, do not structure the presented materials in a prerequisite order. As a result, learners are not able to use the existing materials effectively since they do not know in which order they need to be studied. Our aim is to overcome this limitation of existing video sharing systems and improve the learning experience of their users by discovering prerequisite relationships among videos where basic materials are covered prior to more advanced ones. Experiments performed on commonly used gold standard datasets show the effectiveness of the proposed approach utilizing measures based on phrase similarity scores

    Discovering private trajectories using background information

    Get PDF
    Trajectories are spatio-temporal traces of moving objects which contain valuable information to be harvested by spatio-temporal data mining techniques. Applications like city traffic planning, identification of evacuation routes, trend detection, and many more can benefit from trajectory mining. However, the trajectories of individuals often contain private and sensitive information, so anyone who possess trajectory data must take special care when disclosing this data. Removing identifiers from trajectories before the release is not effective against linkage type attacks, and rich sources of background information make it even worse. An alternative is to apply transformation techniques to map the given set of trajectories into another set where the distances are preserved. This way, the actual trajectories are not released, but the distance information can still be used for data mining techniques such as clustering. In this paper, we show that an unknown private trajectory can be reconstructed using the available background information together with the mutual distances released for data mining purposes. The background knowledge is in the form of known trajectories and extra information such as the speed limit. We provide analytical results which bound the number of the known trajectories needed to reconstruct private trajectories. Experiments performed on real trajectory data sets show that the number of known samples is surprisingly smaller than the actual theoretical bounds
    corecore