
    An Inferable Representation Learning for Fraud Review Detection with Cold-start Problem

    © 2019 IEEE. Fraud reviews significantly damage business reputations and customers' trust in products, and they have become a serious problem on social media. Various efforts have been made to tackle this problem. However, in the cold-start case, where a review is posted by a new user who has just appeared on the platform, common fraud detection methods may fail because most of them depend heavily on information about a user's historical behavior and social relations to other users, and such information is lacking for new users. This paper presents a novel Joint-bEhavior-and-Social-relaTion-infERable (JESTER) embedding method that leverages user reviewing behavior and social relations for cold-start fraud review detection. JESTER embeds the deep characteristics of existing user behavior and the social relations of users and items in an inferable user-item-review-rating representation space, in which the representation of a new user can be efficiently inferred by a closed-form solution and reflects the user's most probable behavior and social relations. A cold-start fraud review can thus be effectively detected. Our experiments show that JESTER (i) performs significantly better in detecting fraud reviews on four real-life social media data sets, and (ii) effectively infers new user representations in the cold-start setting, compared to three state-of-the-art and two baseline competitors.
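The abstract does not spell out JESTER's closed-form inference. As a purely illustrative sketch of the general idea of inferring a new user's representation without retraining, a ridge-regression-style least-squares solve against item embeddings gives a closed-form solution; the function name, regularization term, and toy data below are assumptions, not the paper's actual formulation:

```python
import numpy as np

def infer_new_user_embedding(item_embeddings, ratings, lam=0.1):
    """Closed-form ridge least-squares fit of a new user's embedding
    to the embeddings of the items the user has rated."""
    V = np.asarray(item_embeddings, dtype=float)  # (n_items, d)
    r = np.asarray(ratings, dtype=float)          # (n_items,)
    d = V.shape[1]
    # Solve (V^T V + lam * I) u = V^T r -- a single linear solve, no training.
    return np.linalg.solve(V.T @ V + lam * np.eye(d), V.T @ r)

# Toy data: three items in a 2-D representation space, one new user's ratings.
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
r = np.array([5.0, 1.0, 4.0])
u = infer_new_user_embedding(V, r)  # leans toward the first dimension
```

Because the solution is a single linear solve, a cold-start user's representation is available immediately after their first few interactions, with no iterative optimization.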

    Reliable Sentiment Analysis in Social Media

    University of Technology Sydney, Faculty of Engineering and Information Technology. Sentiment analysis in social media is critical yet challenging because the source materials (i.e., reviews posted on social media) exhibit high complexity, low quality, and uncertain credibility. For example, words and sentences in a textual review may be coupled with each other, and they may have heterogeneous meanings in different contexts or language locales. These couplings and heterogeneities essentially determine the sentiment polarity of the review but are too complex to be captured and modeled. Also, social reviews contain a large number of informal words and typos (i.e., noise) but draw on a limited vocabulary (i.e., sparsity). As a result, most existing natural language processing (NLP) methods may fail to represent social reviews effectively. Furthermore, a large proportion of social reviews are posted by fraudsters. These fraud reviews manipulate social opinion and thus disturb sentiment analysis. This research focuses on reliable sentiment analysis in social media. It systematically investigates sentiment analysis techniques to tackle three major challenges in social media: high data complexity, low data quality, and uncertain credibility. Specifically, it addresses two research problems: general sentiment analysis and fraudulent sentiment analysis in social media. General sentiment analysis tackles the high complexity and low quality of credible social articles. Fraudulent sentiment analysis handles the uncertain-credibility issue, which is common and profoundly affects precise sentiment analysis in social media. Based on these investigations, this research proposes a series of methods to achieve reliable sentiment analysis. It studies the polarity-shift and non-IID characteristics of general paragraphs to capture sentiment more accurately. It further models multi-granularity noise and sparsity in short text, the most common data in social media, for robust short-text sentiment analysis. Finally, it tackles the uncertain-credibility problem by studying fraudulent sentiment analysis in both supervised and unsupervised scenarios. This research evaluates the performance and properties of the proposed methods through extensive experiments on large real-world data sets, demonstrating that they are superior and reliable for social media sentiment analysis.

    Non-IID representation learning on complex categorical data

    University of Technology Sydney, Faculty of Engineering and Information Technology. Learning complex categorical data requires proper vector or metric representations of the intricate characteristics of that data. Existing methods for categorical data representation usually assume data is independent and identically distributed (IID). However, real-world data is often hierarchically associated with diverse couplings and heterogeneities (i.e., non-IIDness, e.g., couplings such as value co-occurrences and attribute correlation and dependency, as well as heterogeneities such as heterogeneous distributions or complementary and inconsistent relations). Existing methods either capture only some of these couplings and heterogeneities or simply assume IID data in building their representations. This thesis aims to deeply understand and effectively represent non-IIDness in categorical data. Specifically, it focuses on (1) modeling heterogeneous couplings within and between attributes in categorical data; (2) disentangling attribute couplings with a mixture of heterogeneous distributions; (3) hierarchically learning heterogeneous couplings; (4) integrating complementary and inconsistent heterogeneous couplings; and (5) adaptively identifying and learning dynamic couplings and heterogeneities.
Accordingly, this thesis proposes (1) a non-IID similarity metrics learning framework to model complex interactions within and between attributes in non-IID categorical data; (2) a decoupled non-IID learning framework to capture and embed heterogeneous distributions in non-IID categorical data with bounded information loss; (3) a heterogeneous metric learning method with hierarchical couplings to learn and integrate the heterogeneous dependencies and distributions in non-IID categorical data into a representation of a similarity metric; (4) an unsupervised heterogeneous coupling learning approach to integrate the complementary and inconsistent heterogeneous couplings in non-IID categorical data; and (5) an unsupervised hierarchical and heterogeneous coupling learning method to learn hierarchical and heterogeneous couplings on dynamic non-IID categorical data. Theoretical analyses support the effectiveness of the proposed methods and bound the information loss in their generated high-quality representations. Extensive experiments demonstrate that the proposed non-IID representation methods for complex categorical data perform significantly better than state-of-the-art methods in terms of multiple downstream learning tasks and representation-quality evaluation metrics.
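The thesis's coupling-learning methods are not detailed in this abstract. As a minimal illustration of the underlying idea of inter-attribute couplings, two values of one categorical attribute can be deemed similar when they induce similar conditional distributions over another attribute; the sketch below (function names and toy data are hypothetical, not the thesis's metric) measures that overlap:

```python
from collections import Counter

def conditional_dist(rows, a, b, value):
    """Empirical P(attribute b | attribute a == value)."""
    matched = [row[b] for row in rows if row[a] == value]
    return {v: c / len(matched) for v, c in Counter(matched).items()}

def coupled_similarity(rows, a, b, v1, v2):
    """Similarity of two values of attribute a as the overlap of their
    conditional distributions over attribute b (1.0 = identical couplings)."""
    p = conditional_dist(rows, a, b, v1)
    q = conditional_dist(rows, a, b, v2)
    return sum(min(p.get(v, 0.0), q.get(v, 0.0)) for v in set(p) | set(q))

# Toy categorical data: "red" and "crimson" co-occur with the same sizes,
# so they are coupled-similar; "blue" is not.
rows = [
    {"color": "red", "size": "S"}, {"color": "red", "size": "M"},
    {"color": "crimson", "size": "S"}, {"color": "crimson", "size": "M"},
    {"color": "blue", "size": "L"}, {"color": "blue", "size": "L"},
]
sim_rc = coupled_similarity(rows, "color", "size", "red", "crimson")  # 1.0
sim_rb = coupled_similarity(rows, "color", "size", "red", "blue")     # 0.0
```

A plain value-matching (IID) similarity would treat "red" and "crimson" as entirely dissimilar; exploiting the value co-occurrence coupling recovers their relatedness.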

    Privacy Protection in Conversations

    Leakage of personal information in online conversations raises serious privacy concerns. For example, malicious users might collect sensitive personal information from vulnerable users via deliberately designed conversations. This thesis tackles the problem of privacy leakage in textual conversations and proposes to mitigate the risks of privacy disclosure by detecting and rewriting risky utterances. Previous research on privacy protection in text has focused on manipulating implicit semantic representations in a continuous high-dimensional space, mostly to eliminate trails of personal information in machine learning models. Our research focuses on the explicit expressions of conversations, namely sequences of words or tokens, which are generally used between human interlocutors or in human-computer interactions. This new setting for privacy protection in text can be applied to conversations by individual human users, such as vulnerable people, and by artificial conversational bots, such as digital personal assistants. This thesis consists of two parts, answering two research questions: how to detect utterances that risk privacy leakage, and how to modify or rewrite those utterances into ones with less private information. In the first part, we aim to detect utterances with privacy-leakage risk and report the sensitive utterances to authorized users for approval. One of the essential challenges of the detection task is that we cannot acquire a large-scale aligned corpus for supervised training of natural language inference over private information. A compact dataset is collected merely to validate the privacy-leakage detection models. We investigate weakly supervised methods to learn utterance-level inference from coarse set-level alignment signals. Then, we propose novel alignment models for utterance inference. Our approaches outperform competitive baseline alignment methods. Additionally, we develop a privacy-leakage detection system integrated into Facebook Messenger to demonstrate the utility of the proposed task in real-world usage scenarios. In the second part, we present two pieces of work that automatically rewrite privacy-leaking sentences into less sensitive ones. The first obscures personal information in the form of classifiable attributes. We propose to reduce the bias of sensitive attributes, such as gender, political slant, and race, using obscured text rewriting models guided by classifiers for the personal attributes to be obscured. Adversarial training and fairness risk measurement are proposed to enhance the fairness of the generators, alleviating privacy leakage of the target attributes. The second work protects personal information in the form of open-domain textual descriptions. We explore three feasible rewriting strategies for privacy-aware text rewriting: deleting, obscuring, and steering. We investigate the possibility of fine-tuning a pre-trained language model for privacy-aware text rewriting. Based on our dataset, we further observe the relation of rewriting strategies to their semantic spaces in a knowledge graph, and develop a simple but effective decoding method to incorporate these semantic spaces into the rewriting models. As a whole, this thesis presents a comprehensive study and the first solutions in varying settings for protecting privacy in conversations. We demonstrate that both privacy-leakage detection and privacy-aware text rewriting are feasible using machine learning methods. Our contributions also include novel ideas for text alignment in natural language inference, training techniques for attribute obfuscation, and open-domain knowledge guidance for text rewriting. This thesis opens up inquiries into protecting sensitive user information in conversations from the perspective of explicit text representation.

    Outrageous Fortune and the Criminalization of Mass Torts

    The case of the blameworthy-but-fortunate defendant has emerged as one of the most perplexing scenarios in mass tort litigation today. One need look no further than the front page of the newspaper to find examples of mass tort defendants said to have engaged in irresponsible conduct - even conduct that one might regard as morally outrageous in character - but that nonetheless advance eminently plausible contentions that they have not caused harm to others. This issue is not merely a matter for abstract speculation. A now-familiar mass tort scenario involves a defendant that markets a product without informing consumers about tentative suspicions of some health hazard. These initial suspicions, however, ultimately may not prove true. In fact, subsequent scientific research may support, perhaps convincingly, a contention by such a defendant that there simply is no causal link between its product and the malady from which the plaintiff suffers. The most prominent example of this first scenario is the ongoing controversy over silicone gel breast implants. A second and even more problematic situation involves a defendant that does not merely fail to warn consumers about a risk that may be associated with its product, but that affirmatively induces repeated use through outright fraud or obfuscation of the product's risk. This second scenario describes the allegations advanced in the current controversy over nicotine in tobacco products. Here, the problem also is one of causation, but of a different sort: the centuries-old awareness of the hazards of smoking, including the widespread recognition that the practice can be exceedingly hard to quit, makes it difficult to attribute the maladies of current smokers to an informational shortfall brought about by the tobacco industry.
Fraud is the cause of harm, after all, only when one's fraudulent misrepresentations are apt to be believed and acted upon - only when, at the very least, other people are not saying loudly that one is lying. In this respect, the causal problem posed by the tobacco litigation is by no means restricted to that particular context. Rather, the same concerns likely would surface with respect to the many variations on the same theme that one might envision in the future for the alcohol industry, the fast-food industry, or the purveyors of other products with long-recognized health risks.

    Impact of Federal Acquisition Regulation Change on Contractor Misconduct

    The Project on Government Oversight listed 632 reported acts of government contractor misconduct since 2007 that resulted in settlements or fines totaling $41.95 billion in the government contracting industry. Government contracting officials changed the Federal Acquisition Regulation (FAR) in 2009 to reduce acts of misconduct. The purpose of this causal-comparative study was to discover whether the change to the FAR in 2009 significantly reduced the rate of reported contractor misconduct and to investigate the impact of the change on government contractor ethics business processes. Deterrence theory guided this study of how the change to the FAR in 2009 impacted the rate of reported government contractor misconduct (dependent variable) and government contractor ethics business processes (dependent variable). Data were collected on the top 100 government contractors over 2 separate 3-year time periods (independent variable), 2006 through 2008 and 2010 through 2012, before and after the change to the FAR. Data extracted from official government databases and government oversight organizations included annual contract awards (n = 600), contractor misconduct reports (n = 600), and contractor ethics business process records (n = 600). A Wilcoxon Signed-Ranks test resulted in 2 findings. First, the rate of reported government contractor misconduct was not significantly reduced by the change to the FAR in 2009, z = -0.949, p = .34, r = -.072. Second, government contractor ethics business processes were significantly impacted by the change to the FAR in 2009, z = -12.263, p < .001, r = -.763. This study may contribute to positive social change by informing federal contracting authorities and corporate executives that implementing ethics business processes did not reduce misconduct. These findings call for further action to improve corporate ethical behavior.
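The Wilcoxon Signed-Ranks test reported above can be sketched with its standard normal approximation, which yields the kind of z and p statistics the study cites. The following is a minimal illustration on invented paired counts; the data, and the omission of tie and continuity corrections, are assumptions, and the study's actual data are not reproduced here:

```python
import math
import numpy as np

def average_ranks(a):
    """Ranks 1..n, with tied values assigned the average of their positions."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="stable")
    ranks = np.empty(len(a))
    s = a[order]
    i = 0
    while i < len(a):
        j = i
        while j < len(a) and s[j] == s[i]:
            j += 1
        ranks[order[i:j]] = (i + j + 1) / 2.0  # mean of positions i+1 .. j
        i = j
    return ranks

def wilcoxon_signed_rank_z(before, after):
    """Wilcoxon signed-rank test (normal approximation) for paired samples.
    Returns (z, two-sided p). Zero differences are dropped; no tie or
    continuity correction in this minimal sketch."""
    d = np.asarray(after, dtype=float) - np.asarray(before, dtype=float)
    d = d[d != 0.0]
    n = len(d)
    rk = average_ranks(np.abs(d))
    w_plus = rk[d > 0].sum()                 # sum of positive-difference ranks
    mu = n * (n + 1) / 4.0
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mu) / sigma
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return z, p

# Hypothetical paired misconduct counts for eight contractors,
# before and after a regulatory change (all numbers invented).
before = [3, 5, 2, 6, 4, 7, 3, 5]
after = [2, 4, 2, 5, 3, 6, 4, 4]
z, p = wilcoxon_signed_rank_z(before, after)
```

As in the study, the paired, non-parametric design compares the same contractors across the two periods without assuming normally distributed misconduct counts; the effect size r the study reports is conventionally z divided by the square root of the number of pairs.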

    BARCH: a business analytics problem formulation and solving framework

    The BARCH framework is a business framework specifically formulated to help analysts and managers identify and formulate a scenario to which Analytics can be applied so that the outcome has a direct impact on the business. It is the overarching public work that I have used extensively in various projects and research. The framework was initially developed in the banking sector and has evolved progressively with successive projects. Its name represents the five aspects of formulating and identifying an area that Analytics can address: Business, Analytics, Revenue, Cost and Human. Together, the five aspects represent the entire system and approach to the identification, formulation, understanding and modelling of Analytics problems. The aspects are not necessarily sequential but are interrelated, with certain aspects dependent on others. For example, revenue and cost are related to, and derived from, the business. However, in most practice, Analytics is conducted independently of the business, and Analytics techniques are not derived from the business directly. This lack of harmony between business and Analytics creates an unfortunate combination of factors that has led to the failure of Analytics projects in many businesses. In intensely practising Analytics and critically reflecting on every piece of work I have done, I have learned the importance of combining knowledge with skills and experience to create new knowledge and a form of practical wisdom. I also now realize the importance of understanding fields that are not directly related to my specialization.
Through this context statement I have been able to better articulate my thinking and the complexities of practice through approaches to knowledge such as transdisciplinarity, which further supports translating what I can do and what needs to be done in a way that business clients can understand. Having the opportunity to explore concepts new to me from other academic fields, and seeking their relevance and application in my own area of expertise, has helped me considerably in the ongoing development of the BARCH framework and the successful implementation of Analytics projects. I have selected the results of three projects, published in the papers listed in Appendices A-C, to demonstrate how the model can be applied to solve problems successfully compared with other frameworks. The evolution of the model involves a continual feedback loop of learning from each successive project, which enables the BARCH model not only to continuously demonstrate its applicability to various problems but also to consistently produce better and more refined results. The majority of analytical models applied to problems in the business environment address those problems only superficially (Bose, 2009; Krioukov et al., 2011), that is, without understanding the impact on the business as a whole. Many Analytics projects have not delivered the promised impact because the models applied are too complicated (Stubbs, 2013) to address the root causes of the business problem. This situation is compounded by an increasing number of analysts applying Analytics to business problems without a proper understanding of the context, techniques and environment (Stubbs, 2013). While many experts in the field interpret the problem as multidisciplinary, in my opinion it is transdisciplinary in nature.

    Smart technologies and beyond: exploring how a smart band can assist in monitoring children’s independent mobility & well-being

    The problem investigated in this thesis is the lack of a device or method appropriate for monitoring a child's vitals and tracking a child's location. This aspect is being explored by other researchers, who have yet to find a viable solution. This work focuses on providing a solution that uses the Internet of Things to measure and improve children's health. Additionally, the research focuses on the use of technology for health and the needs of parents who are concerned about their child's physical health and well-being. This work also provides insight into how technology was used during the pandemic. The thesis is based on a mixture of quantitative and qualitative research, used to review the following key aspects of this study: (i) children's independent mobility, (ii) physical activity for children, (iii) children's emotions, (iv) smart technologies, and (v) children's smart wearables. This allows a detailed review of the problem and of how technology can help the health sector, especially for children. The deliverable of this study is a recommendation for a suitable smart band device that enables location tracking, activity tracking, and monitoring of the child's health and wellbeing. The research also includes practical elements: (i) surveys on the use of smart technology and parents' perspectives on the solution; (ii) a focus group, in the form of a survey, collecting opinions and information on the child and on what parents think of smart technology and how it could help address their fears; and (iii) observation, collecting data from children who were given six activities to complete while wearing the Fitbit Charge HR. The information gained from these elements helps provide guidelines for a proposed solution. 
The thesis presents three frameworks: (i) the research process for this study, (ii) Key Performance Indicators (KPIs) derived from the literature review, and (iii) a proposed framework for the solution. Combined, the three frameworks can help health professionals and parents who want an efficient and reliable device, and support the deployment of technologies used in the health industry for children's independent mobility. Current frameworks address some considerations in the technology and medical fields but are not up to date with the latest elements, such as parents' fears in today's world and the advanced features of modern technology.