
    UNDERSTANDING THE MASSIVE ONLINE REVIEWS: A NOVEL REPRESENTATIVE SUBSET EXTRACTION METHOD

    Online reviews have long been recognized as an important sales assistant helping consumers make purchase decisions. However, with the rapid development of electronic commerce, overwhelming information overload and review manipulation leave consumers lost in an ocean of reviews and under heavy cognitive stress. To address this issue, online marketplaces have developed different types of online reviews. In particular, beyond the traditional types (positive, neutral, and negative), several new types of online review (reviews with pictures and additional reviews) contain not only plain text but also pictures, and consumers can attach additional reviews to their original reviews some time later to further share their experience. Few studies have examined which types of online reviews influence consumers' decisions most efficiently, and research on the new types of reviews remains open. Using data from Taobao.com, the largest electronic marketplace in China, this study conducts an empirical investigation to bridge this gap. We investigate whether and how traditional text reviews and the new types of reviews influence consumers' purchase decision making. The results show that, in a context of information overload and review manipulation, traditional reviews are still influential but less effective than the new types of reviews. Although reviews with pictures and additional reviews do not express valence directly, they provide more reliable references to product quality and attract consumers' attention more efficiently. More interestingly, the new types of online review give consumers an effective channel to voice their dissatisfaction and thereby affect the purchase decisions of potential consumers. The findings provide useful implications for researchers by highlighting the roles of different types of online review in consumers' decision making. The empirical investigation should also remind business vendors to pay attention to online reviews, especially the new types, and to adopt targeted marketing strategies that increase competitive advantage and improve sales performance.
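
    A minimal sketch (my illustration, not the paper's model) of the kind of regression such an empirical study might estimate: sales as a function of counts of each review type. All data below is synthetic and the variable names are hypothetical.

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 300
    # synthetic listing-level data: counts of each review type
    df = pd.DataFrame({
        "n_positive": rng.poisson(50, n),
        "n_neutral": rng.poisson(10, n),
        "n_negative": rng.poisson(5, n),
        "n_picture": rng.poisson(8, n),      # reviews with pictures
        "n_additional": rng.poisson(4, n),   # additional (follow-up) reviews
    })
    # fabricate an outcome so the script runs end to end
    df["log_sales"] = (0.01 * df["n_positive"]
                       + 0.03 * df["n_picture"]
                       + rng.normal(0, 1, n))

    # log-linear model: how does each review type relate to sales?
    model = smf.ols(
        "log_sales ~ n_positive + n_neutral + n_negative + n_picture + n_additional",
        data=df,
    ).fit()
    print(model.params)
    ```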

    Information Technology, Artificial Intelligence and Machine Learning in Smart Grid – Performance Comparison between Topology Identification Methodology and Neural Network Identification Methodology for the Branch Number Approximation of Overhead Low-Voltage Broadband over Power Lines Network Topologies

    Broadband over Power Lines (BPL) networks that are deployed across the smart grid can benefit from the use of machine learning, as smarter grid diagnostics are collected and analyzed. In this paper, a neural network identification methodology for Overhead Low-Voltage (OV LV) BPL networks is proposed that aims to identify the number of branches of a given OV LV BPL topology from its channel attenuation behavior; it is denoted NNIM-BNI. To identify the branch number of an OV LV BPL topology through its channel attenuation behavior, NNIM-BNI exploits the Deterministic Hybrid Model (DHM), which has been extensively tested in OV LV BPL networks for channel attenuation determination, and the OV LV BPL topology database of the Topology Identification Methodology (TIM). The results of NNIM-BNI for the branch number identification of OV LV BPL topologies are compared against those of a newly proposed TIM-based methodology, denoted TIM-BNI.
    Citation: Lazaropoulos, A. G. (2021). Information Technology, Artificial Intelligence and Machine Learning in Smart Grid - Performance Comparison between Topology Identification Methodology and Neural Network Identification Methodology for the Branch Number Approximation of Overhead Low-Voltage Broadband over Power Lines Network Topologies. Trends in Renewable Energy, 7, 87-113. DOI: 10.17737/tre.2021.7.1.0013
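
    A minimal sketch of the NNIM-BNI idea: learn the branch number of a topology from its channel attenuation profile. The real method uses DHM-generated attenuation data and the TIM topology database; here X is a hypothetical array of attenuation values sampled over frequency and y the known branch counts, so the numbers are stand-ins only.

    ```python
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    n_topologies, n_freq_samples = 500, 100
    X = rng.normal(size=(n_topologies, n_freq_samples))  # stand-in attenuation curves
    y = rng.integers(1, 6, size=n_topologies)            # stand-in branch numbers (1..5)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    # small feed-forward network mapping attenuation behavior -> branch number
    clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
    clf.fit(X_tr, y_tr)
    print("branch-number accuracy:", clf.score(X_te, y_te))
    ```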

    Semantical rule-based false positive detection for IDS


    Flipping 419 Scams: Targeting the Weak and the Vulnerable

    Most cyberscam-related studies focus on threats perpetrated against Western society, with particular attention to the USA and Europe. Regrettably, no research has been done on scams targeting African countries, especially Nigeria, where the notorious and (in)famous 419 advance-fee scam, aimed at other countries, originated. Cybercrime, however, is a global problem affecting all parties. In this study, we investigate a form of advance-fee fraud unique to Nigeria and targeted at Nigerians, but unknown to the Western world. The study relies substantially on almost two years' worth of data harvested from an online discussion forum used by criminals, complemented with recent data from three other active forums to consolidate and generalize the findings. We apply machine learning to the data to understand the criminals' modus operandi. We show that the criminals exploit the socio-political and economic problems prevalent in the country to craft fraud schemes that defraud vulnerable groups such as secondary school students and unemployed graduates. The results of our research can help potential victims and policy makers develop measures to counter the activities of these criminal groups.
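
    A minimal sketch (my illustration, not the paper's pipeline) of applying unsupervised learning to harvested forum posts to surface recurring scheme types. The `posts` list and its contents are hypothetical.

    ```python
    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    # hypothetical harvested post texts
    posts = [
        "admission assistance for secondary school students, pay processing fee",
        "guaranteed job placement for unemployed graduates, registration required",
        "fuel subsidy payout, send account details to claim",
    ]

    # vectorize posts and group them into candidate scheme clusters
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(posts)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

    for label, text in zip(km.labels_, posts):
        print(label, text[:60])
    ```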

    Data Summarization with Social Contexts

    While social data is widely used in applications such as sentiment analysis and trend prediction, its sheer size presents great challenges for storing, sharing, and processing it. These challenges can be addressed by data summarization, which transforms the original dataset into a smaller, yet still useful, subset. Existing methods find such subsets with objective functions based on data properties such as representativeness or informativeness, but do not exploit social contexts, which are distinctive characteristics of social data. Further, to date very little work has focused on topic-preserving data summarization, despite the abundant work on topic modeling. This is a challenging task for two reasons. First, since topic models are built on latent variables, existing methods are not well suited to capturing latent topics. Second, it is difficult to find social contexts that provide valuable information for building an effective topic-preserving summarization model. To tackle these challenges, we exploit social contexts to summarize social data while preserving the topics in the original dataset, taking Twitter data as a case study. Through analyzing Twitter data, we discover two social contexts that are important for topic generation and dissemination: (i) a CrowdExp topic score that captures the influence of both crowd and expert users on Twitter, and (ii) a Retweet topic score that captures the influence of Twitter users' actions. Extensive experiments on two real-world Twitter datasets with two applications show that, by leveraging social contexts, our proposed solution can enhance topic-preserving data summarization and improve application performance by up to 18%.
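
    A minimal sketch of topic-preserving summarization with a social-context weight. The paper's CrowdExp and Retweet topic scores are replaced here by a generic per-tweet scalar score, and the topic distributions are random stand-ins for real topic-model output; a greedy pass picks the subset whose weighted topic mix best matches the corpus.

    ```python
    import numpy as np

    def greedy_summary(topic_dist, social_score, k):
        """topic_dist: (n_tweets, n_topics) rows summing to 1;
        social_score: (n_tweets,) e.g. a retweet-based topic score."""
        corpus = topic_dist.mean(axis=0)          # target topic mix to preserve
        chosen, summary = [], np.zeros_like(corpus)
        for _ in range(k):
            best, best_err = None, np.inf
            for i in range(len(topic_dist)):
                if i in chosen:
                    continue
                trial = summary + social_score[i] * topic_dist[i]
                err = np.abs(trial / trial.sum() - corpus).sum()
                if err < best_err:
                    best, best_err = i, err
            chosen.append(best)
            summary += social_score[best] * topic_dist[best]
        return chosen

    rng = np.random.default_rng(0)
    topics = rng.dirichlet(np.ones(5), size=50)   # hypothetical topic-model output
    scores = rng.random(50) + 0.5                 # hypothetical social-context scores
    print(greedy_summary(topics, scores, k=5))
    ```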

    Advances in Data Mining Knowledge Discovery and Applications

    Advances in Data Mining Knowledge Discovery and Applications aims to help data miners, researchers, scholars, and PhD students who wish to apply data mining techniques. Its primary contribution is to highlight frontier fields and implementations of knowledge discovery and data mining. Although some approaches may appear to repeat across chapters, the same techniques often prove useful in different fields and areas of expertise. The book presents knowledge discovery and data mining applications in two sections. Data mining draws on statistics, machine learning, data management and databases, pattern recognition, artificial intelligence, and other areas, and most of these areas are covered here through different data mining applications. The eighteen chapters are organized into two parts: Knowledge Discovery and Data Mining Applications.

    Efficient algorithms in analyzing genomic data

    With the development of high-throughput, low-cost genotyping technologies, immense data can be produced cheaply and efficiently for various genetic studies. A typical dataset may contain hundreds of samples with millions of genotypes/haplotypes. To keep data analysis from becoming a bottleneck, fast and efficient analysis methods are clearly needed. This thesis focuses on two important genetic analysis problems.

    Genome-wide association mapping. The goal of genome-wide association mapping is to identify genes or narrow regions in the genome that have significant statistical correlations with the given phenotypes. Discovering these genes offers the potential for an increased understanding of the biological processes affecting phenotypes such as body weight and blood pressure.

    Sample selection for maximal genetic diversity. Given a large set of samples, it is usually more efficient to first conduct experiments on a small subset, which raises the question of which subset to use. In many experimental scenarios the ultimate objective is to maintain, or at least maximize, the genetic diversity within relatively small breeding populations.

    The thesis develops efficient and effective algorithms for these problems. For phylogeny-based genome-wide association mapping: TreeQA uses local perfect phylogeny trees in genome-wide genotype/phenotype association mapping; samples are partitioned according to the subtrees they belong to, and the association between a tree and the phenotype is measured by statistical tests. TreeQA+ inherits all the advantages of TreeQA and improves on it by incorporating sample correlations into the association study. For sample selection for maximal genetic diversity: in biallelic SNP data, samples are selected based on their genetic diversity over a set of SNPs, and the algorithms search for the minimum subset that retains all diversity (or a high percentage of it); for more general, non-biallelic data, information-theoretic measures such as entropy and mutual information quantify the diversity of a sample subset, and samples are selected to maximize the information retained.
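
    A minimal sketch of the diversity-driven selection idea: greedily pick samples that maximize the total per-SNP entropy retained by the subset. The genotype matrix is synthetic, and the thesis' actual algorithms (TreeQA, TreeQA+, and the biallelic/non-biallelic selectors) are more sophisticated than this greedy loop.

    ```python
    import numpy as np

    def subset_entropy(genotypes):
        """Sum of empirical allele entropies over SNP columns."""
        total = 0.0
        for col in genotypes.T:
            _, counts = np.unique(col, return_counts=True)
            p = counts / counts.sum()
            total += -(p * np.log2(p)).sum()
        return total

    def greedy_select(genotypes, k):
        """Greedily add the sample that most increases retained entropy."""
        chosen = []
        for _ in range(k):
            remaining = [i for i in range(len(genotypes)) if i not in chosen]
            gains = [subset_entropy(genotypes[chosen + [i]]) for i in remaining]
            chosen.append(remaining[int(np.argmax(gains))])
        return chosen

    rng = np.random.default_rng(0)
    G = rng.integers(0, 2, size=(100, 30))   # 100 samples x 30 biallelic SNPs
    print(greedy_select(G, k=10))
    ```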

    Direct Manipulation Querying of Database Systems.

    Database systems are tremendously powerful and useful, as evidenced by their popularity in modern business. Unfortunately, using a database is still a daunting task for non-expert users because of poor usability. This PhD dissertation examines the stages of the information-seeking process and proposes techniques that help users interact with a database through direct manipulation, a proven natural interaction paradigm. For the first stage, query formulation, we propose a spreadsheet algebra upon which a direct-manipulation interface for database querying can be built; the algebra is powerful (capable of expressing at least all single-block SQL queries) and can be implemented intuitively in a spreadsheet. In addition, we propose assisted querying by browsing, which helps users query the database through browsing. For the second stage, result review, instead of asking users to review possibly many results in a flat table, we propose a hierarchical navigation scheme that lets users browse the results through representatives, with easy drill-down and filtering, along with an efficient tree-based method for generating the representatives. For the query refinement stage, we propose and implement a provenance-based automatic refinement framework: users label a set of output tuples, and the framework produces a ranked list of changes that best improve the query. This dissertation significantly lowers the barrier for non-expert users and reduces the effort required of expert users.
    Ph.D. Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/86282/1/binliu_1.pd
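
    A minimal sketch of the spreadsheet-algebra idea: direct-manipulation operations (filter a column, group, aggregate, sort) compose into a single-block SQL query. The operation names and table are my own illustration, not the dissertation's actual algebra.

    ```python
    def to_sql(table, filters=(), group_by=None, agg=None, order_by=None):
        """Compose spreadsheet-style operations into one single-block SQL query."""
        sql = f"SELECT {agg or '*'} FROM {table}"
        if filters:
            sql += " WHERE " + " AND ".join(filters)
        if group_by:
            sql += f" GROUP BY {group_by}"
        if order_by:
            sql += f" ORDER BY {order_by}"
        return sql

    # e.g. the user filters the price column, groups by category, asks for averages:
    print(to_sql("products",
                 filters=["price < 100"],
                 group_by="category",
                 agg="category, AVG(price)"))
    ```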