10 research outputs found

    PLDANet: Reasonable Combination of PCA and LDA Convolutional Networks

    Integrating deep learning with traditional machine learning methods is an intriguing research direction. For example, PCANet and LDANet adopt Principal Component Analysis (PCA) and Fisher Linear Discriminant Analysis (LDA), respectively, to learn convolutional kernels. However, it is not reasonable to adopt LDA to learn filter kernels in every convolutional layer, because local features of images from different classes, such as background areas, may be similar. It is therefore meaningful to adopt LDA to learn filter kernels only when all the patches carry information from the whole image. To our knowledge, no existing work studies how to combine PCA and LDA to learn convolutional kernels so as to achieve the best performance. In this paper, we propose the convolutional coverage theory. Furthermore, we propose the PLDANet model, which adopts PCA and LDA reasonably in different convolutional layers based on the coverage theory. The experimental study shows the effectiveness of the proposed PLDANet model.

    Query with Assumptions for Probabilistic Relational Databases

    Users may have prior knowledge about a probabilistic database and prefer to query it under that knowledge, which cannot be written as component clauses of conventional SQL queries. A naive approach is to query a new database version, generated by transforming the original probabilistic database to satisfy the users' prior knowledge; however, it is impractical to generate a different probabilistic database version for each piece of prior knowledge. In this paper, we propose the concept of the query with assumptions, which allows users to describe their prior knowledge with a newly introduced ASSUMPTION clause of SQL. We also propose an approach to obtain the result of a query based on assumption clauses. The experimental studies show that our approach performs better than the naive approach.

    An Efficient Top-k Query Scheme Based on Multilayer Grouping

    A top-k query finds the k data items with the highest scores in a candidate dataset. Sorting is a common way to obtain top-k results. However, most existing methods are not efficient enough. To address this issue, we propose an efficient top-k query scheme based on multilayer grouping. First, we find the reference item by computing the average score of the candidate dataset. Second, we group the candidate dataset into three datasets based on the reference item: a winner set, a middle set and a loser set. Third, we further group the winner set into three second-layer datasets according to the value of k, and so on, until the number of items in the winner set is close to k. Meanwhile, if k is larger than the number of items in the winner set, we directly return the winner set to the user as part of the top-k results, almost without sorting. In this case, we also return the highest-scoring results from the middle set, almost without sorting. With these innovations, we nearly minimize the amount of sorting. Experimental results show that our scheme significantly outperforms the current classical method in memory consumption and top-k query performance.
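
    The grouping steps described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the middle set is defined or what "close to k" means, so the sketch assumes that the middle set holds the items whose score equals the average, and that "close" means at most `slack * k` items. The names `top_k` and `slack` are hypothetical.

```python
def top_k(scores, k, slack=2):
    """Return the k highest scores, in no particular order.

    Assumptions (not stated in the abstract): the middle set is the set of
    items equal to the average score, and the winner set counts as "close
    to k" once it holds at most slack * k items.
    """
    candidates = list(scores)
    result = []
    while k > 0 and candidates:
        if len(candidates) <= k:
            # Everything that is left belongs to the answer; no sorting.
            result.extend(candidates)
            break
        if len(candidates) <= slack * k:
            # Candidate set is close to k: one small sort finishes the job.
            result.extend(sorted(candidates, reverse=True)[:k])
            break

        # Step 1: the reference item is the average score.
        reference = sum(candidates) / len(candidates)

        # Step 2: group into winner, middle and loser sets.
        winners = [s for s in candidates if s > reference]
        middle = [s for s in candidates if s == reference]
        losers = [s for s in candidates if s < reference]

        if len(winners) == len(candidates) or len(losers) == len(candidates):
            # Guard against float rounding of the average, which could
            # otherwise leave the candidate set unchanged forever.
            result.extend(sorted(candidates, reverse=True)[:k])
            break

        if len(winners) >= k:
            # Step 3: regroup the winner set in the next layer.
            candidates = winners
            continue

        # k exceeds the winner set: return all winners without sorting,
        # then fill from the middle set (all equal, so no sorting either).
        result.extend(winners)
        k -= len(winners)
        take = min(k, len(middle))
        result.extend(middle[:take])
        k -= take
        # Under this sketch's assumptions, any remainder comes from losers.
        candidates = losers
    return result


if __name__ == "__main__":
    print(sorted(top_k([5, 1, 9, 3, 7, 7, 2, 8], 3), reverse=True))
```

    Under these assumptions, a sort is only performed once the candidate set has shrunk to roughly k items. Whether this matches the memory and runtime behaviour reported in the paper depends on details the abstract does not give.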

    Enabling Access Control for Encrypted Multi-Dimensional Data in Cloud Computing through Range Search

    With the growing popularity of cloud computing, data owners are increasingly opting to outsource their data to cloud servers due to the numerous benefits it offers. However, this outsourcing raises concerns about data privacy since the data stored on remote cloud servers is not directly controlled by the owners. Encryption of the data is an effective approach to mitigate these privacy concerns. However, encrypted data lacks distinguishability, leading to limitations in supporting common operations such as range search and access control. In this research paper, we propose a method called RSAC (Range Search Supporting Access Control) for encrypted multi-dimensional data in cloud computing. Our method leverages policy design, bucket embedding, algorithm design, and Ciphertext Policy-Attribute Based Encryption (CPABE) to achieve its objectives. We present extensive experimental results that demonstrate the efficiency of our method and conduct a thorough security analysis to ensure its robustness. Our proposed RSAC method addresses the challenges of range search and access control over encrypted multi-dimensional data, thus contributing to enhancing privacy and security in cloud computing environments

    A Method for Automatically Generating Join Queries Based on Relations-Attributes Distance Matrix over Data Lakes

    No full text
    Techniques for identifying joinable or unionable tables in data lakes can yield valuable information for data scientists. However, more than half of their working time is spent familiarizing themselves with the metadata and correlations of datasets. Simplifying the use of information in data lakes is crucial for enhancing their utilization. The existing solution of integrating correlated relations into a single large data table via full disjunction requires integration updating when either data or metadata changes, complicating data maintenance. This paper proposes a method for automatically generating join queries based on the distance matrix of relations and attributes in data lakes. The distance matrix only requires updating when metadata changes, simplifying data maintenance. Experimental results demonstrate that once the distance matrix is generated, the time required to generate the join queries is negligible. Compared to the existing solution, the time cost for executing join queries over correlated tables is nearly identical to that of selection queries over integrated tables. The results of these two queries are also the same, showcasing the effectiveness and efficiency of our method

    Privacy-Guarding Optimal Route Finding with Support for Semantic Search on Encrypted Graph in Cloud Computing Scenario

    No full text
    The arrival of the cloud computing age makes data outsourcing an important and convenient application. More and more individuals and organizations outsource large amounts of graph data to the cloud computing platform (CCP) to save cost. As the server on the CCP is not completely honest and trustworthy, the outsourced graph data are usually encrypted before they are sent to the CCP. Optimal route finding on graph data is a popular operation that is frequently used in many fields. Optimal route finding with support for semantic search has stronger query capabilities: a consumer can use words similar to the graph vertices as query terms. Because the outsourced graph data are encrypted before they are sent to the CCP, it is not easy for data consumers to manipulate and further use the encrypted graph data. In this paper, we present a solution for privacy-guarding optimal route finding with support for semantic search on the encrypted graph in the cloud computing scenario (PORF). We design a scheme that builds a secure query index to implement optimal route finding with support for semantic search, based on the idea of searchable encryption and a stemmer mechanism. We give a formal security analysis of our scheme. We also analyze the efficiency of our scheme through experimental evaluation.

    A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium

    No full text
    We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the US Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA sequencing (RNA-seq) performance for junction discovery and differential expression profiling and compare it to microarray and quantitative PCR (qPCR) data using complementary metrics. At all sequencing depths, we discover unannotated exon-exon junctions, with >80% validated by qPCR. We find that measurements of relative expression are accurate and reproducible across sites and platforms if specific filters are used. In contrast, RNA-seq and microarrays do not provide accurate absolute measurements, and gene-specific biases are observed for all examined platforms, including qPCR. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling. The complete SEQC data sets, comprising >100 billion reads (10Tb), provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings