    SCFM: Social and crowdsourcing factorization machines for recommendation

    With the rapid development of social networks, the exponential growth of social information has attracted much attention. Social information has great value in recommender systems for alleviating the sparsity and cold-start problems. On the other hand, crowd computing empowers recommender systems by utilizing human wisdom: user reviews can be exploited as the wisdom of the crowd to contribute information. In this paper, we propose social and crowdsourcing factorization machines, called SCFM. Our approach fuses social and crowd computing into the factorization machine model. For social computing, we calculate the influence value between users by taking users’ social information and user similarity into account. For crowd computing, we apply Latent Dirichlet Allocation (LDA) to user reviews to obtain sets of underlying topic probabilities. Furthermore, we impose two important constraints, called social regularization and domain inner regularization. The experimental results show that our approach outperforms other state-of-the-art methods. This project is supported by the National Natural Science Foundation of China (Nos. 61672340, 61472240, 61572268).
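
    The model sketched above combines a second-order factorization machine with a social regularization term that pulls each user's latent factors toward those of influential neighbours. Below is a minimal NumPy sketch of both pieces; the variable names and the influence matrix S are illustrative assumptions, not the paper's notation or its actual objective.

        import numpy as np

        def fm_predict(x, w0, w, V):
            # Second-order factorization machine:
            #   y = w0 + <w, x> + sum_{i<j} <V_i, V_j> x_i x_j
            # computed with the usual O(n*k) reformulation.
            xv = V.T @ x                          # per factor f: sum_i V[i, f] * x[i]
            x2v2 = (V ** 2).T @ (x ** 2)          # per factor f: sum_i V[i, f]^2 * x[i]^2
            return w0 + w @ x + 0.5 * np.sum(xv ** 2 - x2v2)

        def social_regularizer(U, S):
            # Penalty pulling each user's latent vector toward those of
            # influential neighbours: sum_{u,f} S[u, f] * ||U[u] - U[f]||^2.
            loss = 0.0
            for u, f in zip(*np.nonzero(S)):
                loss += S[u, f] * np.sum((U[u] - U[f]) ** 2)
            return loss

        rng = np.random.default_rng(0)
        x = rng.random(6)                         # one feature vector (user, item, topics, ...)
        y = fm_predict(x, 0.1, rng.random(6), rng.random((6, 3)))
        S = np.array([[0.0, 0.8], [0.3, 0.0]])    # pairwise influence values between two users
        penalty = social_regularizer(rng.random((2, 3)), S)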

    An MDA approach for developing secure OLAP applications: Metamodels and transformations

    Decision makers query enterprise information stored in Data Warehouses (DW) by using tools (such as On-Line Analytical Processing (OLAP) tools) which employ specific views or cubes from the corporate DW or Data Marts, based on multidimensional modelling. Since the information managed is critical, security constraints have to be established correctly in order to avoid unauthorized access. In previous work we defined a model-driven approach for developing a secure DW repository following a relational approach. Nevertheless, it is also important to define security constraints in the metadata layer that connects the DW repository with the OLAP tools, that is, over the same multidimensional structures that end users manage. This paper incorporates a proposal for developing secure OLAP applications into our previous approach: it improves a UML profile for conceptual modelling; it defines a logical metamodel for OLAP applications; and it defines and implements transformations from conceptual to logical models, as well as from logical models to a secure implementation in a specific OLAP tool (SQL Server Analysis Services). © 2015 ComSIS Consortium. All rights reserved. This research is part of the following projects: SIGMA-CC (TIN2012-36904), GEODAS-BC (TIN2012-37493-C01) and GEODAS-BI (TIN2012-37493-C03), funded by the Ministerio de Economía y Competitividad and Fondo Europeo de Desarrollo Regional (FEDER), and SERENIDAD (PEII11-037-7035) and MOTERO (PEII11-0399-9449), funded by the Consejería de Educación, Ciencia y Cultura de la Junta de Comunidades de Castilla-La Mancha and Fondo Europeo de Desarrollo Regional (FEDER).
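
    The core of the proposal is a model-to-model transformation that carries security constraints from the conceptual (UML) level down to logical OLAP metadata. The toy Python sketch below captures only the spirit of that step; the class names and the level-to-role policy are invented for illustration, and the paper's actual transformations operate on MDA metamodels and SQL Server Analysis Services metadata rather than ad hoc classes.

        from dataclasses import dataclass

        # Illustrative stand-ins for metamodel elements; these are not the
        # paper's UML-profile stereotypes or its OLAP metamodel classes.
        @dataclass
        class ConceptualDimension:
            name: str
            security_level: str                # e.g. "Confidential", "Secret"

        @dataclass
        class LogicalDimension:
            name: str
            allowed_roles: tuple               # role restriction in the OLAP metadata

        LEVEL_TO_ROLES = {                     # assumed level-to-role policy
            "Public": ("AllUsers",),
            "Confidential": ("Managers", "Auditors"),
            "Secret": ("Auditors",),
        }

        def to_logical(dim: ConceptualDimension) -> LogicalDimension:
            # Conceptual-to-logical step: the security level annotated on the
            # conceptual element becomes an explicit role restriction on the
            # logical element, so the OLAP tool can enforce it on the cube.
            return LogicalDimension(dim.name, LEVEL_TO_ROLES[dim.security_level])

        print(to_logical(ConceptualDimension("Salary", "Confidential")))
        # LogicalDimension(name='Salary', allowed_roles=('Managers', 'Auditors'))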

    Advance of the Access Methods

    The goal of this paper is to outline the advances in access methods over the last ten years and to review all methods available in the accessible bibliography.

    Post-authorship attribution using regularized deep neural network

    Post-authorship attribution is a scientific process of using stylometric features to identify the genuine writer of an online text snippet such as an email, blog, forum post, or chat log. It has useful applications in manifold domains, for instance, in a verification process to proactively detect misogynistic, misandrist, xenophobic, and abusive posts on the internet or social networks. The process assumes that texts can be characterized by sequences of words that combine the function and content word choices of a writer. However, defining an appropriate characterization of text that captures the unique writing style of an author is a complex endeavor in computational linguistics. Moreover, posts are typically short texts with obfuscating vocabularies that can impact the accuracy of authorship attribution; these vocabularies include idioms, onomatopoeias, homophones, phonemes, synonyms, acronyms, anaphora, and polysemy. The regularized deep neural network (RDNN) method is introduced in this paper to circumvent the intrinsic challenges of post-authorship attribution. It is based on a convolutional neural network, a bidirectional long short-term memory encoder, and a distributed highway network. The convolutional network extracts lexical stylometric features that are fed into the bidirectional encoder to obtain a syntactic feature-vector representation. The feature vector is then supplied to the distributed highway network for regularization to minimize the network-generalization error. The regularized feature vector is ultimately passed to the bidirectional decoder to learn the writing style of an author, and the classification layer consists of a fully connected network and a softmax function that makes the prediction. The RDNN method was tested against thirteen state-of-the-art methods on four benchmark datasets. Experimental results demonstrate the effectiveness of the method compared to the existing state-of-the-art methods on three datasets, with comparable results on the fourth. Supported by the Department of Science and Technology (DST) and the Council for Scientific and Industrial Research (CSIR). https://www.mdpi.com/journal/applsci
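
    A rough PyTorch sketch of the described pipeline (a convolutional layer for lexical features, a bidirectional LSTM encoder, a highway layer, and a softmax classifier) is given below. All layer sizes and names are assumptions, and the distributed aspect of the highway network and the bidirectional decoder of the full RDNN are not reproduced here.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class Highway(nn.Module):
            # Highway layer: y = t * h(x) + (1 - t) * x, with learned gate t.
            def __init__(self, dim):
                super().__init__()
                self.h = nn.Linear(dim, dim)
                self.t = nn.Linear(dim, dim)

            def forward(self, x):
                t = torch.sigmoid(self.t(x))
                return t * F.relu(self.h(x)) + (1 - t) * x

        class PostAttributor(nn.Module):
            # CNN -> BiLSTM -> highway -> softmax, loosely following the
            # pipeline the abstract describes (all sizes are assumptions).
            def __init__(self, vocab=5000, emb=64, conv=64, hid=64, authors=10):
                super().__init__()
                self.emb = nn.Embedding(vocab, emb)
                self.conv = nn.Conv1d(emb, conv, kernel_size=3, padding=1)
                self.lstm = nn.LSTM(conv, hid, batch_first=True, bidirectional=True)
                self.highway = Highway(2 * hid)
                self.out = nn.Linear(2 * hid, authors)

            def forward(self, tokens):                    # (batch, seq)
                x = self.emb(tokens).transpose(1, 2)      # (batch, emb, seq)
                x = F.relu(self.conv(x)).transpose(1, 2)  # (batch, seq, conv)
                _, (h, _) = self.lstm(x)                  # h: (2, batch, hid)
                x = torch.cat([h[0], h[1]], dim=1)        # (batch, 2*hid)
                x = self.highway(x)
                return F.log_softmax(self.out(x), dim=1)

        model = PostAttributor()
        logits = model(torch.randint(0, 5000, (4, 32)))   # 4 posts, 32 tokens each
        print(logits.shape)                               # torch.Size([4, 10])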

    Observation-based Fine Grained Access Control of Data

    In this paper, we propose an observation-based fine-grained access control (OFGAC) mechanism in which data are made accessible at various levels of abstraction according to their sensitivity levels. In this setting, unauthorized users cannot infer the exact content of confidential data but are allowed to obtain partial information from it, according to their access rights. Traditional fine-grained access control (FGAC) can be seen as a special case of the OFGAC framework.
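
    A toy illustration of the OFGAC idea, with invented abstraction levels: the same value is released exactly, as a coarse range, or fully masked, depending on the requester's clearance. Traditional FGAC corresponds to the degenerate policy that keeps only the first and last levels.

        # Levels and policy are assumptions, not the paper's formalization:
        # the same datum is observed at different levels of abstraction.
        def observe_salary(salary: int, clearance: int) -> str:
            if clearance >= 2:                  # full access: exact value
                return str(salary)
            if clearance == 1:                  # partial access: a coarse range
                lo = (salary // 10_000) * 10_000
                return f"[{lo}, {lo + 10_000})"
            return "***"                        # no access: fully abstracted

        for level in (2, 1, 0):
            print(level, observe_salary(57_250, level))
        # 2 57250
        # 1 [50000, 60000)
        # 0 ***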

    On Pattern Mining in Graph Data to Support Decision-Making

    In recent years, graph data models have become increasingly important in both research and industry. Their core is a generic data structure of things (vertices) and connections among those things (edges). Rich graph models such as the property graph model promise extraordinary analytical power because relationships can be evaluated without knowledge of a domain-specific database schema. This dissertation studies the use of graph models for data integration and data mining of business data.

    Although a typical company's business data implicitly describes a graph, it is usually stored in multiple relational databases. We therefore propose the first semi-automated approach to transform data from multiple relational databases into a single graph whose vertices represent domain objects and whose edges represent their mutual relationships. This transformation is the basis of our conceptual framework BIIIG (Business Intelligence with Integrated Instance Graphs). We further propose a graph-based approach to data integration that is executed after the transformation.

    In established data mining approaches, interrelated input data is mostly represented by tuples of measure values and dimension values. In the context of graphs, these values must be attached to the graph structure, and aggregated measure values become graph attributes. Since the latter was not supported by any existing model, we propose the use of collections of property graphs as the data structure of the novel Extended Property Graph Model (EPGM). The model supports vertices and edges that may appear in different graphs, as well as graph properties. We further propose operators that benefit from this data structure, for example, graph-based aggregation of measure values.

    A primitive operation of graph pattern mining is frequent subgraph mining (FSM). Existing algorithms, however, provided no support for directed multigraphs; we extend the popular gSpan algorithm to overcome this limitation. Some patterns might not be frequent while their generalizations are. Generalized graph patterns can be mined by attaching vertices to taxonomies. We propose a novel approach to Generalized Multidimensional Frequent Subgraph Mining (GM-FSM), in particular the first solution to generalized FSM that supports not only directed multigraphs but also multiple dimensional taxonomies. In scenarios that compare patterns of different categories, e.g., fraudulent or not, FSM is not sufficient since pattern frequencies may differ by category, and determining all pattern frequencies without frequency pruning is not an option due to the computational complexity of FSM. We therefore develop Characteristic Subgraph Mining (CSM), an FSM extension that extracts patterns which are characteristic for a specific category according to a user-defined interestingness function.

    Parts of this work were done in the context of GRADOOP, a framework for distributed graph analytics. To make frequent subgraph mining available in this framework, we developed Distributed In-Memory gSpan (DIMSpan), a frequent subgraph miner tailored to the characteristics of shared-nothing clusters and distributed dataflow systems. Finally, we present the results of use-case evaluations conducted in cooperation with a large-scale enterprise, including a report of the practical experience gained in implementing and applying the proposed algorithms.
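
    The EPGM's central idea, collections of property graphs whose aggregated measure values become graph-level properties, can be sketched in a few lines of plain Python. The class and property names below are illustrative and do not reflect GRADOOP's actual API.

        class PropertyGraph:
            # Minimal property graph: measure values live on edges,
            # aggregates become graph-level properties.
            def __init__(self, gid):
                self.gid = gid
                self.vertices = {}     # vertex id -> {property: value}
                self.edges = []        # (source, target, {property: value})
                self.properties = {}   # graph-level properties

        def aggregate(graphs, key="amount"):
            # Graph-based aggregation over a collection: roll an edge
            # measure up into a property on each graph of the collection.
            for g in graphs:
                g.properties["total_" + key] = sum(
                    props.get(key, 0) for _, _, props in g.edges)
            return graphs

        g = PropertyGraph("order-42")
        g.vertices["c1"] = {"type": "Customer"}
        g.vertices["o1"] = {"type": "Order"}
        g.edges.append(("c1", "o1", {"type": "placed", "amount": 99.0}))
        aggregate([g])
        print(g.properties)            # {'total_amount': 99.0}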