132 research outputs found

    Blogs Search Engine Adopting RSS Syndication Using Fuzzy Logic

    The rapid development of the Internet has increased the number of blog-site authors. These blogs often focus on solving important problems, but finding specific blogs is hard for users because many blogs contain unhelpful content such as online advertisements, notices, and noise, which lowers a blog site's rank. Retrieving relevant blogs is a further problem that lowers search performance. This study proposes a blog search engine that adopts RSS syndication and fuzzy logic. The search engine consists of three main phases: a crawling algorithm using RSS feeds, a weblog-indexing algorithm, and a search technique based on fuzzy logic. In the crawling phase, RSS feeds are gathered and useful information is extracted from them, such as the title, links, publication time, and description. The indexing phase follows the links to retrieve the blog sites, performs text processing, and constructs the index database. To retrieve the information a user needs, a user interface lets the user search for keywords with importance degrees, and the keyword density is computed from the index database. Page rank is computed as a fuzzy weighted average. A prototype, a Windows application communicating over HTTP, was built in Visual Basic 2008 to validate the proposed search engine. The system evaluation used two performance measures, precision and mean average precision; the precision parameters (the total retrieved links and the total relevant links for each keyword search) were determined by respondents. Five keyword pairs were used to test the system. The experimental results show a mean average precision of 81.7% for the whole system. Of the respondents, 80% knew and used blogs and 20% did not. The execution time reported by respondents was 3-5 minutes for 70% of them and under 3 minutes for 30%. These results are good considering that 80% of respondents were satisfied with the system and 20% were strongly satisfied
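
    The abstract ranks pages by a fuzzy weighted average of keyword densities and user-supplied importance degrees. Below is a minimal sketch of that ranking idea; the function names and the exact density formula are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of ranking by a fuzzy weighted average of keyword
# densities; names and formulas are illustrative, not the paper's own.

def keyword_density(term_count: int, doc_length: int) -> float:
    """Fraction of a blog post's tokens that match the query term."""
    return term_count / doc_length if doc_length else 0.0

def fuzzy_weighted_average(densities, importance_weights):
    """Rank score: weighted average of per-keyword densities, with the
    user-supplied importance degrees in [0, 1] as the weights."""
    total_weight = sum(importance_weights)
    if total_weight == 0:
        return 0.0
    return sum(d * w for d, w in zip(densities, importance_weights)) / total_weight

# Example: a two-keyword query where the first keyword matters twice as much.
densities = [keyword_density(12, 400), keyword_density(3, 400)]
weights = [1.0, 0.5]
print(fuzzy_weighted_average(densities, weights))  # higher score ranks first
```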

    Artificial intelligence for ocean science data integration: current state, gaps, and way forward


    Entropy-based privacy against profiling of user mobility

    Location-based services (LBSs) flood mobile phones nowadays, but their use poses an evident privacy risk. The locations accompanying LBS queries can be exploited by the LBS provider to build a profile of the locations the user visits, which might disclose sensitive data such as home or work locations. The classic concept of entropy is widely used to evaluate privacy in these scenarios, where the information is represented as a sequence of independent samples of categorized data. However, since LBS queries might be sent very frequently, location profiles can be refined by adding temporal dependencies, turning them into mobility profiles, where location samples are no longer independent and might disclose the user's mobility patterns. Once the time dimension is factored in, the classic entropy concept falls short of evaluating the real privacy level, which also depends on the time component. We therefore propose to extend the entropy-based privacy metric to the entropy rate in order to evaluate mobility profiles. Two perturbative mechanisms are then considered to protect locations and mobility profiles under gradual utility constraints. We further use the proposed privacy metric, comparing it to classic ones, to evaluate both synthetic and real mobility profiles when the proposed perturbative methods are applied. The results prove the usefulness of the proposed metric for mobility profiles and the need to tailor the perturbative methods to the features of mobility profiles in order to improve privacy without completely losing utility. This work is partially supported by the Spanish Ministry of Science and Innovation through the CONSEQUENCE (TEC2010-20572-C02-01/02) and EMRISCO (TEC2013-47665-C4-4-R) projects. The work of Das was partially supported by NSF Grants IIS-1404673, CNS-1355505, CNS-1404677, and DGE-1433659. Part of the work by Rodriguez-Carrion was conducted while she was visiting the Computer Science Department at Missouri University of Science and Technology in 2013–2014
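
    The core argument is that classic entropy treats location samples as independent, while the entropy rate also captures temporal dependence. The sketch below contrasts the two on a toy location trace; the first-order Markov estimator for the entropy rate is an illustrative choice and is not necessarily the estimator used in the paper.

```python
# Minimal sketch: classic entropy vs. a first-order Markov estimate of the
# entropy rate for a location trace (an assumed, illustrative estimator).
from collections import Counter
from math import log2

def entropy(trace):
    """Classic Shannon entropy: treats samples as i.i.d. draws."""
    counts = Counter(trace)
    n = len(trace)
    return -sum(c / n * log2(c / n) for c in counts.values())

def markov_entropy_rate(trace):
    """First-order estimate: average uncertainty of the next location
    given the current one, which captures temporal dependence."""
    pair_counts = Counter(zip(trace, trace[1:]))
    state_counts = Counter(trace[:-1])
    n = len(trace) - 1
    rate = 0.0
    for (s, t), c in pair_counts.items():
        p_pair = c / n                 # probability of the transition s -> t
        p_cond = c / state_counts[s]   # probability of t given current state s
        rate -= p_pair * log2(p_cond)
    return rate

trace = ["home", "work", "home", "work", "home", "gym", "home", "work"]
print(entropy(trace), markov_entropy_rate(trace))  # the rate is lower here:
# the regular home/work pattern is predictable once order is accounted for
```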

    Pervasive Data Access in Wireless and Mobile Computing Environments

    The rapid advance of wireless and portable computing technology has brought considerable research interest and momentum to the area of mobile computing. One research focus is pervasive data access: with wireless connections, users can access information at any place, at any time. However, constraints such as limited client capability, limited bandwidth, weak connectivity, and client mobility raise many challenging technical issues. Over the past years, tremendous research effort has been put into addressing the issues related to pervasive data access, and a number of interesting results have been reported in the literature. This survey reviews important work along two key dimensions of pervasive data access: data broadcast and client caching. In addition, it covers data access techniques aimed at various application requirements, such as time, location, semantics, and reliability
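
    One of the two dimensions the survey reviews is client caching. As a concrete illustration, here is a minimal sketch of one caching policy (LRU) that a resource-limited mobile client could apply to broadcast data; the policy choice is an assumption for illustration, since the survey covers many caching and broadcast schemes.

```python
# Minimal sketch of an LRU client cache for broadcast data; an illustrative
# policy, not a specific scheme from the survey.
from collections import OrderedDict

class ClientCache:
    """Fixed-capacity LRU cache: suits limited client memory, and a hit
    avoids waiting a full broadcast cycle for the item to reappear."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None                      # miss: fetch from the channel
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict least recently used

cache = ClientCache(capacity=2)
cache.put("weather", "rain")
cache.put("traffic", "heavy")
cache.get("weather")                         # refreshes recency
cache.put("stocks", "up")                    # evicts "traffic"
```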

    Anonymization procedures for tabular data: an explanatory technical and legal synthesis

    In the European Union, Data Controllers and Data Processors who work with personal data have to comply with the General Data Protection Regulation and other applicable laws. This affects the storing and processing of personal data. However, some data processing in data mining or statistical analyses does not require any personal reference in the data, so the personal context can be removed. For these use cases, to comply with applicable laws, any existing personal information has to be removed by applying so-called anonymization. At the same time, anonymization should maintain data utility. The concept of anonymization is therefore a double-edged sword with an intrinsic trade-off: privacy enforcement vs. utility preservation. The former might not be entirely guaranteed when anonymized data are published as Open Data. In theory and practice, there exist diverse approaches to conducting and scoring anonymization. This explanatory synthesis discusses the technical perspectives on the anonymization of tabular data, with a special emphasis on the European Union's legal basis. The studied methods for conducting anonymization, as well as the scores for the anonymization procedure and the resulting anonymity, are explained in unified terminology. The examined methods and scores cover both categorical and numerical data, and the scores involve data utility, information preservation, and privacy models. In practice-relevant examples, methods and scores are experimentally tested on records from the UCI Machine Learning Repository's "Census Income (Adult)" dataset
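
    Among the privacy models the synthesis examines, k-anonymity is a standard score for tabular data. Below is a minimal sketch of checking k-anonymity over generalized records; the quasi-identifier columns mirror typical "Census Income (Adult)" attributes but are an assumption here, not the paper's exact setup.

```python
# Minimal sketch of a k-anonymity check on tabular data; the columns and
# generalizations are illustrative assumptions.
from collections import Counter

def k_anonymity(rows, quasi_identifiers):
    """Smallest equivalence-class size over the quasi-identifier columns:
    each record is indistinguishable from at least k-1 others."""
    classes = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(classes.values())

rows = [  # age and zip already generalized to ranges / masked prefixes
    {"age": "30-39", "sex": "F", "zip": "537**", "income": "<=50K"},
    {"age": "30-39", "sex": "F", "zip": "537**", "income": ">50K"},
    {"age": "40-49", "sex": "M", "zip": "537**", "income": "<=50K"},
    {"age": "40-49", "sex": "M", "zip": "537**", "income": ">50K"},
]
print(k_anonymity(rows, ["age", "sex", "zip"]))  # 2: the table is 2-anonymous
```

    The trade-off named in the abstract shows up directly here: coarser generalizations raise k (more privacy) but destroy detail (less utility).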

    An experimental study of learned cardinality estimation

    Cardinality estimation is a fundamental but long unresolved problem in query optimization. Recently, multiple papers from different research groups consistently report that learned models have the potential to replace existing cardinality estimators. In this thesis, we ask a forward-thinking question: Are we ready to deploy these learned cardinality models in production? Our study consists of three main parts. Firstly, we focus on the static environment (i.e., no data updates) and compare five new learned methods with eight traditional methods on four real-world datasets under a unified workload setting. The results show that learned models are indeed more accurate than traditional methods, but they often suffer from high training and inference costs. Secondly, we explore whether these learned models are ready for dynamic environments (i.e., frequent data updates). We find that they cannot catch up with fast data updates and return large errors for different reasons. For less frequent updates, they can perform better but there is no clear winner among themselves. Thirdly, we take a deeper look into learned models and explore when they may go wrong. Our results show that the performance of learned methods can be greatly affected by the changes in correlation, skewness, or domain size. More importantly, their behaviors are much harder to interpret and often unpredictable. Based on these findings, we identify two promising research directions (control the cost of learned models and make learned models trustworthy) and suggest a number of research opportunities. We hope that our study can guide researchers and practitioners to work together to eventually push learned cardinality estimators into real database systems
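
    Comparisons like the one above are usually scored with the q-error metric, which penalizes over- and under-estimation symmetrically. Here is a minimal sketch of that metric; whether the thesis uses exactly this definition is an assumption.

```python
# Minimal sketch of the q-error metric commonly used to compare cardinality
# estimators; an assumed, illustrative definition.

def q_error(estimate: float, truth: float) -> float:
    """Symmetric multiplicative error: 1.0 is perfect, and over- and
    under-estimation by the same factor are penalized equally."""
    estimate, truth = max(estimate, 1.0), max(truth, 1.0)  # avoid div by zero
    return max(estimate / truth, truth / estimate)

# Example: two estimates for a query whose true cardinality is 1000 rows.
print(q_error(estimate=900, truth=1000))   # ~1.11: close
print(q_error(estimate=50, truth=1000))    # 20.0: a large error
```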