2,525 research outputs found

    To Index or Not to Index: Optimizing Exact Maximum Inner Product Search

    Exact Maximum Inner Product Search (MIPS) is an important task that is widely pertinent to recommender systems and high-dimensional similarity search. The brute-force approach to solving exact MIPS is computationally expensive, thus spurring recent development of novel indexes and pruning techniques for this task. In this paper, we show that a hardware-efficient brute-force approach, blocked matrix multiply (BMM), can outperform the state-of-the-art MIPS solvers by over an order of magnitude, for some, but not all, inputs. We also present a novel MIPS solution, MAXIMUS, that takes advantage of hardware efficiency and pruning of the search space. Like BMM, MAXIMUS is faster than other solvers by up to an order of magnitude, but again only for some inputs. Since no single solution offers the best runtime performance for all inputs, we introduce a new data-dependent optimizer, OPTIMUS, that selects online with minimal overhead the best MIPS solver for a given input. Together, OPTIMUS and MAXIMUS outperform state-of-the-art MIPS solvers by 3.2× on average, and up to 10.9×, on widely studied MIPS datasets. Comment: 12 pages, 8 figures, 2 tables
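
    To make the brute-force baseline in the abstract above concrete, here is a minimal sketch of exact top-k MIPS that scores items one block at a time. It is a pure-Python stand-in, not the paper's BMM implementation: a real system would score each block with an optimized matrix-multiply kernel. The names `blocked_exact_mips`, `USERS`, `ITEMS` and the `block_size` parameter are assumptions made for illustration.

```python
import heapq
from typing import List, Sequence, Tuple


def inner_product(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def blocked_exact_mips(
    users: Sequence[Sequence[float]],
    items: Sequence[Sequence[float]],
    k: int,
    block_size: int = 2,
) -> List[List[Tuple[float, int]]]:
    """Exact top-k inner-product search by exhaustive, block-wise scoring.

    Returns, for each user, a list of (score, item_index) pairs sorted by
    descending score. Every user-item pair is scored, so the result is exact.
    """
    top: List[List[Tuple[float, int]]] = [[] for _ in users]
    for start in range(0, len(items), block_size):
        block = items[start:start + block_size]
        for u, user in enumerate(users):
            # Score the whole block for this user (the "matrix multiply" step).
            scored = [
                (inner_product(user, item), start + j)
                for j, item in enumerate(block)
            ]
            # Merge the block's scores into the running top-k for this user.
            top[u] = heapq.nlargest(k, top[u] + scored)
    return top


# Hypothetical toy data: two user vectors and four item vectors.
USERS = [[1.0, 0.0], [0.0, 1.0]]
ITEMS = [[0.5, 2.0], [3.0, 0.1], [1.0, 1.0], [-1.0, 4.0]]

if __name__ == "__main__":
    for user_id, hits in enumerate(blocked_exact_mips(USERS, ITEMS, k=2)):
        print(user_id, hits)
```

    Because every pair is scored, the block size affects only the memory access pattern, never the answer. Index-based and pruning-based solvers instead try to skip pairs that cannot reach the top k.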

    What attracts vehicle consumers’ buying: A Saaty scale-based VIKOR (SSC-VIKOR) approach from after-sales textual perspective?

    Purpose: The booming development of e-commerce has stimulated vehicle consumers to post individual reviews on online forums. The purpose of this paper is to probe vehicle consumers' consumption behavior and make recommendations for potential consumers from the viewpoint of textual comments. Design/methodology/approach: A big data analytics-based approach is designed to discover vehicle consumer consumption behavior from an online perspective. To reduce the subjectivity of expert-based approaches, a parallel Naïve Bayes approach is designed to perform sentiment analysis, and the Saaty scale-based (SSC) scoring rule is employed to obtain the specific sentiment value of each attribute class, contributing to multi-grade sentiment classification. To achieve intelligent recommendation for potential vehicle customers, a novel SSC-VIKOR approach is developed to prioritize vehicle brand candidates from a big data analytics viewpoint. Findings: The big data analytics indicate that “cost-effectiveness” is the most important factor that vehicle consumers care about, and the data mining results enable automakers to better understand consumer consumption behavior. Research limitations/implications: The case study illustrates the effectiveness of the integrated method, contributing to more precise operations management in marketing strategy, quality improvement and intelligent recommendation. Originality/value: Research on consumer consumption behavior is usually based on survey methods, and most previous studies of comment analysis focus on binary analysis. The hybrid SSC-VIKOR approach is developed to fill this gap from the big data perspective.
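
    As a rough illustration of the ranking step named in the abstract above, here is a minimal sketch of the standard VIKOR compromise ranking. It is plain VIKOR, not the paper's SSC-VIKOR variant: the Saaty-scale sentiment scoring is not reproduced, and the acceptable-advantage and stability checks of full VIKOR are omitted. The brand scores, the weights, and the names `vikor` and `rank_alternatives` are hypothetical; all criteria are assumed to be benefit criteria (higher is better).

```python
from typing import List, Sequence, Tuple


def _scale(x: float, lo: float, hi: float) -> float:
    """Min-max scale x into [0, 1]; return 0.0 when the range is degenerate."""
    return (x - lo) / (hi - lo) if hi != lo else 0.0


def vikor(
    scores: Sequence[Sequence[float]],
    weights: Sequence[float],
    v: float = 0.5,
) -> Tuple[List[float], List[float], List[float]]:
    """Compute the VIKOR measures S (group utility), R (individual regret), Q.

    scores[i][j] is the value of alternative i on benefit criterion j.
    v weights group utility against individual regret. Lower Q is better.
    """
    n_crit = len(weights)
    best = [max(row[j] for row in scores) for j in range(n_crit)]
    worst = [min(row[j] for row in scores) for j in range(n_crit)]

    S: List[float] = []
    R: List[float] = []
    for row in scores:
        # Weighted, normalized distance of this alternative from the best value.
        regrets = [
            weights[j] * _scale(best[j] - row[j], 0.0, best[j] - worst[j])
            for j in range(n_crit)
        ]
        S.append(sum(regrets))
        R.append(max(regrets))

    Q = [
        v * _scale(s, min(S), max(S)) + (1 - v) * _scale(r, min(R), max(R))
        for s, r in zip(S, R)
    ]
    return S, R, Q


def rank_alternatives(Q: Sequence[float]) -> List[int]:
    """Indices of alternatives ordered from best (lowest Q) to worst."""
    return sorted(range(len(Q)), key=lambda i: Q[i])


# Hypothetical sentiment scores for three brands on two criteria
# (for example cost-effectiveness and comfort), with assumed weights.
BRAND_SCORES = [[9.0, 3.0], [5.0, 5.0], [1.0, 9.0]]
WEIGHTS = [0.6, 0.4]

if __name__ == "__main__":
    S, R, Q = vikor(BRAND_SCORES, WEIGHTS)
    print(rank_alternatives(Q), Q)
```

    The parameter `v` trades off majority satisfaction (S) against the worst single-criterion shortfall (R); 0.5 is a common neutral choice.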

    Semantic data integration for supply chain management: with a specific focus on applications in the semiconductor industry

    Supply Chain Management (SCM) is essential to monitor, control, and enhance the performance of Supply Chains (SCs). Increasing globalization and diversity of SCs lead to complex SC structures, limited visibility among SC partners, and challenging collaboration caused by dispersed data silos. Digitalization is responsible for driving and transforming SCs of fundamental sectors such as the semiconductor industry. This is further accelerated due to the inevitable role that semiconductor products play in electronics, IoT, and security systems. Semiconductor SCM is unique as the SC operations exhibit special features, e.g., long production lead times and short product life. Hence, systematic SCM is required to establish information exchange, overcome inefficiency resulting from incompatibility, and adapt to industry-specific challenges. The Semantic Web is designed for linking data and establishing information exchange. Semantic models provide high-level descriptions of the domain that enable interoperability. Semantic data integration consolidates the heterogeneous data into meaningful and valuable information. The main goal of this thesis is to investigate Semantic Web Technologies (SWT) for SCM with a specific focus on applications in the semiconductor industry. As part of SCM, End-to-End SC modeling ensures visibility of SC partners and flows. Existing models are limited in the way they represent operational SC relationships beyond one-to-one structures. The scarcity of empirical data from multiple SC partners hinders the analysis of the impact of supply network partners on each other and the benchmarking of the overall SC performance. In our work, we investigate (i) how semantic models can be used to standardize and benchmark SCs. Moreover, in a volatile and unpredictable environment, SC experts require methodical and efficient approaches to integrate various data sources for informed decision-making regarding SC behavior. 
Thus, this work addresses (ii) how semantic data integration can help make SCs more efficient and resilient. Moreover, to secure a good position in a competitive market, semiconductor SCs strive to implement operational strategies to control demand variation, i.e., bullwhip, while maintaining sustainable relationships with customers. We examine (iii) how we can apply semantic technologies to specifically support semiconductor SCs. In this thesis, we provide semantic models that integrate, in a standardized way, SC processes, structure, and flows, ensuring both an elaborate understanding of the holistic SCs and including granular operational details. We demonstrate that these models enable the instantiation of a synthetic SC for benchmarking. We contribute with semantic data integration applications to enable interoperability and make SCs more efficient and resilient. Moreover, we leverage ontologies and KGs to implement customer-oriented bullwhip-taming strategies. We create semantic-based approaches intertwined with Artificial Intelligence (AI) algorithms to address semiconductor industry specifics and ensure operational excellence. The results prove that relying on semantic technologies contributes to achieving rigorous and systematic SCM. We deem that better standardization, simulation, benchmarking, and analysis, as elaborated in the contributions, will help master more complex SC scenarios. SC stakeholders can increasingly understand the domain and thus are better equipped with effective control strategies to restrain disruption accelerators, such as the bullwhip effect. In essence, the proposed Semantic Web Technology-based strategies unlock the potential to increase the efficiency, resilience, and operational excellence of supply networks and the semiconductor SC in particular.

    Deep Learning for Learning Representation and Its Application to Natural Language Processing

    As the web evolves even faster than expected, the exponential growth of data becomes overwhelming. Textual data is being generated at an ever-increasing pace via emails, documents on the web, tweets, online user reviews, blogs, and so on. As the amount of unstructured text data grows, so does the need for intelligently processing and understanding it. The focus of this dissertation is on developing learning models that automatically induce representations of human language to solve higher level language tasks. In contrast to most conventional learning techniques, which employ certain shallow-structured learning architectures, deep learning is a newly developed machine learning technique which uses supervised and/or unsupervised strategies to automatically learn hierarchical representations in deep architectures and has been employed in varied tasks such as classification or regression. Deep learning was inspired by biological observations on human brain mechanisms for processing natural signals and has attracted tremendous attention from both academia and industry in recent years due to its state-of-the-art performance in many research domains such as computer vision, speech recognition, and natural language processing. This dissertation focuses on how to represent unstructured text data and how to model it with deep learning models in different natural language processing applications such as sequence tagging, sentiment analysis, and semantic similarity. Specifically, my dissertation addresses the following research topics: in Chapter 3, we examine one of the fundamental problems in NLP, text classification, by leveraging contextual information [MLX18a]; in Chapter 4, we propose a unified framework for generating an informative map from a review corpus [MLX18b]; Chapter 5 discusses the tagging of address queries in map search [Mok18], research that was performed in collaboration with Microsoft; and in Chapter 6, we discuss ongoing research work on the neural language sentence matching problem. We are working on extending this work to a recommendation system.

    An Intensive Spectrum for Intention Mining Analysis

    There is a huge volume of data in social networks. This data can be retrieved and integrated to extract useful meaning and to derive insights, which are called intentions. These can be used in different fields such as business, recommender systems, education, scientific research, and games. There are also various intention mining techniques that can be applied to fields such as information retrieval and business. There is no specific definition of intention mining, and very little existing literature. Accordingly, there is a need to conduct a systematic literature review of this very recent research area, covering what intention mining is, its purpose, and its categories and techniques. The paper proposes a spectrum for intention mining so that further literature review of intention mining can be completed. We validate our work through dimensions, categories and techniques for intention mining.

    Three Essays on Consumers' Activities in the Online Domain

    Nowadays, with the explosive growth in the usage of the Internet, consumers are performing all kinds of activities over the Internet like searching or buying. We want to study the different activities of consumers in the online domain. In our daily lives, people are often making various kinds of product purchases. When making such purchases, a lot of factors can affect consumers' decisions. This includes the nature of the product category, and especially in the online domain, the nature of their search activities. In the first essay/chapter, we develop an econometric model to understand the relationships between different dimensions of on-line search and purchase behavior. Our approach uses endogeneity corrections to develop a model that is more correct than the typical non-endogeneity corrected model. Thus we believe our results to be truly reflective of what is happening in the search-buying domain. We use extensive empirical data to test several hypotheses that we developed. Parameters from our model estimations reveal that there are interesting variations in the search-purchase behavior relationships across types of product categories. This difference is especially evident between utilitarian and hedonic goods. Our findings have important theoretical and managerial implications. The amount of information in text reviews is tremendously greater than that in typical numerical data. A major challenge for marketers is how to extract the most relevant information from this big data source. In our second essay/chapter, we do this by using a text mining methodology that draws on machine learning algorithms. We collect data using a Java WebCrawler type programming approach. We use a word-based model to predict consumers' recommendations. Model prediction accuracy was high. In the marketing literature there has been almost no work where such a methodology has been used to make predictions of recommendations based on big data stemming from textual information. 
An interesting finding from our research is that as the number of textual features increases, the predictive accuracy of the model increases only up to a point. Beyond that, inclusion of more words in the model leads to a decrease in predictive accuracy. We also use a diagnostic approach to identify key words that are determinants of user recommendations. Since our model deals with big data, we address in detail the issue of scalability; our computations show that our approach is very scalable. Potential for marketing implications seems considerable. Marketers are always interested in predicting market sales so that they can arrange the firm's activities accordingly. At the same time, this market sales information can also help consumers make the right buying decisions. However, the high cost and long lag involved in collecting the available data make it inconvenient and out of date. With the rise of multi-social media sharing websites such as YouTube, Flickr, and various blogs, consumers can search and learn various types of information from these websites. The availability of large amounts of data on the Internet enables us to use large scale data mining algorithms for solving complex problems. The users' online searching activities can be captured for predicting the market sales. In the third essay/chapter, we focus on the impacts of different search behaviors on marketing outcomes like product sales. We examined the three major online search areas including text, image, and video from search engines like Google to help us accurately and easily predict the sales of automobiles. We believe that our work here opens a brand new arena for using multimedia search activities and will have a big impact on marketing sciences.

    WiFi-Based Human Activity Recognition Using Attention-Based BiLSTM

    Recently, significant efforts have been made to explore human activity recognition (HAR) techniques that use information gathered by existing indoor wireless infrastructures through WiFi signals without requiring the monitored subject to carry a dedicated device. The key intuition is that different activities introduce different multi-paths in WiFi signals and generate different patterns in the time series of channel state information (CSI). In this paper, we propose and evaluate a full pipeline for a CSI-based human activity recognition framework for 12 activities in three different spatial environments using two deep learning models: ABiLSTM and CNN-ABiLSTM. Evaluation experiments have demonstrated that the proposed models outperform state-of-the-art models. The experiments also show that the proposed models can be applied to other environments with different configurations, albeit with some caveats. The proposed ABiLSTM model achieves an overall accuracy of 94.03%, 91.96%, and 92.59% across the three target environments, while the proposed CNN-ABiLSTM model reaches an accuracy of 98.54%, 94.25%, and 95.09% across those same environments.