42 research outputs found

    A scalable framework for stylometric analysis query processing

    This is an accepted manuscript of an article published by IEEE in 2016 IEEE 16th International Conference on Data Mining (ICDM) on 02/02/2017, available online: https://ieeexplore.ieee.org/document/7837960 The accepted version of the publication may differ from the final published version. Stylometry is the statistical analysis of variations in an author's literary style. The technique has been used in many linguistic analysis applications, such as author profiling, authorship identification, and authorship verification. Over the past two decades, authorship identification has been extensively studied by researchers in the area of natural language processing. However, these studies are generally limited to (i) a small number of candidate authors and (ii) documents with similar lengths. In this paper, we propose a novel solution that models authorship attribution as a set similarity problem to overcome these two limitations. We conducted extensive experimental studies on a real dataset collected from an online book archive, Project Gutenberg. Experimental results show that, in comparison to existing stylometry studies, our proposed solution can handle a larger number of documents of different lengths written by a larger pool of candidate authors with high accuracy.
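As a rough illustration of what treating authorship attribution as a set similarity problem can look like, the sketch below reduces each text to a set of character trigrams and picks the candidate author whose sample has the highest Jaccard similarity to the query. The abstract does not say which features or which similarity measure the paper uses, so the trigram features, the Jaccard measure, and the function names `char_ngrams`, `jaccard`, and `attribute` are all assumptions made for this example.

```python
def char_ngrams(text, n=3):
    """Return the set of lowercase character n-grams in text.

    Character n-grams are an assumed feature choice, not the paper's.
    """
    normalized = " ".join(text.lower().split())  # collapse whitespace
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def attribute(document, author_samples, n=3):
    """Return (best_author, scores) for a query document.

    author_samples maps an author name to one sample text (hypothetical input format).
    """
    query = char_ngrams(document, n)
    scores = {
        author: jaccard(query, char_ngrams(sample, n))
        for author, sample in author_samples.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    samples = {
        "author_a": "the quick brown fox jumps over the lazy dog",
        "author_b": "lorem ipsum dolor sit amet consectetur",
    }
    best, scores = attribute("the quick brown dog", samples)
    print(best, scores)
```

With many candidate authors, a linear scan like this becomes the bottleneck. Scalable set-similarity search usually replaces it with an index over the sets, which this sketch does not attempt.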

    One for All, All for One: Learning and Transferring User Embeddings for Cross-Domain Recommendation

    Cross-domain recommendation is an important method to improve recommender system performance, especially when observations in target domains are sparse. However, most existing techniques focus on single-target or dual-target cross-domain recommendation (CDR) and are hard to generalize to CDR with multiple target domains. In addition, the negative transfer problem is prevalent in CDR, where the recommendation performance in a target domain may not always be enhanced by knowledge learned from a source domain, especially when the source domain has sparse data. In this study, we propose CAT-ART, a multi-target CDR method that learns to improve recommendations in all participating domains through representation learning and embedding transfer. Our method consists of two parts: a self-supervised Contrastive AuToencoder (CAT) framework to generate global user embeddings based on information from all participating domains, and an Attention-based Representation Transfer (ART) framework which transfers domain-specific user embeddings from other domains to assist with target domain recommendation. CAT-ART boosts the recommendation performance in any target domain through the combined use of the learned global user representation and knowledge transferred from other domains, in addition to the original user embedding in the target domain. We conducted extensive experiments on a collected real-world CDR dataset spanning 5 domains and involving a million users. Experimental results demonstrate the superiority of the proposed method over a range of prior art. We further conducted ablation studies to verify the effectiveness of the proposed components. Our collected dataset will be open-sourced to facilitate future research in the field of multi-domain recommender systems and user modeling. Comment: 9 pages, accepted by WSDM 202

    Tenrec: A Large-scale Multipurpose Benchmark Dataset for Recommender Systems

    Existing benchmark datasets for recommender systems (RS) are either created at a small scale or involve very limited forms of user feedback. RS models evaluated on such datasets often lack practical value for large-scale real-world applications. In this paper, we describe Tenrec, a novel and publicly available data collection for RS that records various user feedback from four different recommendation scenarios. To be specific, Tenrec has the following five characteristics: (1) it is large-scale, containing around 5 million users and 140 million interactions; (2) it has not only positive user feedback, but also true negative feedback (vs. one-class recommendation); (3) it contains overlapped users and items across four different scenarios; (4) it contains various types of positive user feedback, in the form of clicks, likes, shares, follows, etc.; (5) it contains additional features beyond the user IDs and item IDs. We verify Tenrec on ten diverse recommendation tasks by running several classical baseline models per task. Tenrec has the potential to become a useful benchmark dataset for a majority of popular recommendation tasks.

    TransRec: Learning Transferable Recommendation from Mixture-of-Modality Feedback

    Learning large-scale pre-trained models on broad-ranging data and then transferring them to a wide range of target tasks has become the de facto paradigm in many machine learning (ML) communities. Such big models are not only strong performers in practice but also offer a promising way to break out of task-specific modeling restrictions, thereby enabling task-agnostic and unified ML systems. However, this popular paradigm is mainly unexplored by the recommender systems (RS) community. A critical issue is that standard recommendation models are primarily built on categorical identity features. That is, the users and the interacted items are represented by their unique IDs, which are generally not shareable across different systems or platforms. To pursue transferable recommendation, we propose studying pre-trained RS models in a novel scenario where a user's interaction feedback involves mixture-of-modality (MoM) items, e.g., text and images. We then present TransRec, a very simple modification of the popular ID-based RS framework. TransRec learns directly from the raw features of the MoM items in an end-to-end training manner and thus enables effective transfer learning under various scenarios without relying on overlapped users or items. We empirically study the transferring ability of TransRec across four different real-world recommendation settings. We also examine the effect of scaling the source and target data size. Our results suggest that learning neural recommendation models from MoM feedback provides a promising way to realize universal RS.

    (1)H, (13)C, (15)N backbone and side-chain resonance assignments of the human Raf-1 kinase inhibitor protein

    Raf-1 kinase inhibitor protein (RKIP) plays a pivotal role in modulating multiple signaling networks. Here we report backbone and side-chain resonance assignments of uniformly (15)N, (13)C labeled human RKIP. Natural Science Foundation of China [30900233, 30730026]; Program of Shanghai Subject Chief Scientist [09XD1405100

    Privacy-Preserving Assessment of Location Data Trustworthiness

    Assessing the trustworthiness of location data corresponding to individuals is essential in several applications, such as forensic science and epidemic control. To obtain accurate and trustworthy location data, analysts must often gather and correlate information from several independent sources, e.g., physical observation, witness testimony, surveillance footage, etc. However, such information may be fraudulent, its accuracy may be low, and its volume may be insufficient to ensure highly trustworthy data. On the other hand, recent advancements in mobile computing and positioning systems, e.g., GPS-enabled cell phones, highway sensors, etc., bring new and effective technological means to track the location of an individual. Nevertheless, collection and sharing of such data must be done in ways that do not violate an individual’s right to personal privacy. Previous research efforts acknowledged the importance of assessing location data trustworthiness, but they assume that data is available to the analyst in direct, unperturbed form. Such an assumption is not realistic, because repositories of personal location data must conform to privacy regulations. In this paper, we study the challenging problem of refining trustworthiness of location data with the help of large repositories of anonymized information. We show how two important trustworthiness evaluation techniques, namely common pattern analysis and conflict/support analysis, can benefit from the use of anonymized location data. We have implemented a prototype of the proposed privacy-preserving trustworthiness evaluation techniques, and the experimental results demonstrate that using anonymized data can significantly help in improving the accuracy of location trustworthiness assessment.

    A Semi-Analytical Model and Parameter Analysis of a Collaborative Drainage Scheme for a Deeply Buried Tunnel and Parallel Adit in Water-Rich Ground

    For a railway or highway tunnel under high water pressure during operation, various factors such as the design of the drainage system, material aging, and pipeline blockage must be considered for the tunnel to work with the parallel adit to drain and control the external water pressure on the tunnel lining. A simplified steady-state seepage model in a semi-infinite multi-connected domain for the tunnel and parallel adit was established and solved iteratively using the complex variable method and the Schwarz alternating method. After verifying the numerical simulation, parametric analysis, orthogonal tests, and multivariate nonlinear regression were also carried out. Results show that the simplified theoretical model and its semi-analytical algorithm converge quickly, and the obtained regression formula is simple, which makes it suitable for calculation and parameter analysis. A scheme that primarily relies on the parallel adit for drainage would make the external water pressure on the lining facing the parallel adit side less than that on the opposite side. Therefore, to reduce pressure uniformly and meet the requirements of surrounding rock stability, the horizontal net distance between the parallel adit and the tunnel should be no less than the tunnel diameter. Drainage volume of the parallel adit is linearly negatively correlated with water pressure on the tunnel lining and has the most significant effect on pressure reduction. The influence of the vertical distance between the parallel adit and the tunnel on water pressure is small.

    Reverse Design and Additive Manufacturing of Furniture Protective Foot Covers

    Reverse design and additive manufacturing technologies are fast ways to develop customised products. In this study, furniture protective foot covers were taken as the design object. Using flexible filaments of polylactic acid (PLA) and a development process running from reverse design to additive manufacturing, the protective foot covers were designed and manufactured to fit the shape of the chair feet. Furniture protective foot covers have high practical value. First, they have a certain buffering effect, avoiding the damage caused by the collision of furniture feet with the ground when moving furniture; second, they reduce the noise generated by the collision of furniture feet with the ground, creating a quiet and comfortable home environment. According to the finite element simulation results, the maximum stress value of the European-style chair fitted with protective foot covers decreased by 90.8% in the case of a vertical fall, which verifies that the protective foot covers have an obvious buffering effect. Noise test results show that the noise of the European-style chair fitted with protective foot covers decreased by 51.0%, which verifies that the protective foot covers have an obvious quieting effect.