    Large-Scale information extraction from textual definitions through deep syntactic and semantic analysis

    We present DEFIE, an approach to large-scale Information Extraction (IE) based on a syntactic-semantic analysis of textual definitions. Given a large corpus of definitions, we leverage syntactic dependencies to reduce data sparsity, then disambiguate the arguments and content words of the relation strings, and finally exploit the resulting information to organize the acquired relations hierarchically. The output of DEFIE is a high-quality knowledge base consisting of several million automatically acquired semantic relations.

    Automatic domain ontology extraction for context-sensitive opinion mining

    Automated analysis of the sentiments expressed in online consumer feedback can support both organizations' business strategy development and individual consumers' comparison shopping. However, existing opinion mining methods either adopt a context-free sentiment classification approach or rely on a large number of manually annotated training examples to perform context-sensitive sentiment classification. Guided by the design science research methodology, we illustrate the design, development, and evaluation of a novel context-sensitive opinion mining system based on a fuzzy domain ontology. Our novel ontology extraction mechanism, underpinned by a variant of Kullback-Leibler divergence, can automatically acquire contextual sentiment knowledge across various product domains to improve the sentiment analysis process. Evaluated on a benchmark dataset and on real consumer reviews collected from Amazon.com, our system shows a remarkable performance improvement over the context-free baseline.
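    The abstract does not say which variant of Kullback-Leibler divergence the ontology extractor uses. As a rough illustration only, the Python sketch below uses the standard divergence D_KL(P || Q) = sum over terms t of P(t) log(P(t) / Q(t)), comparing a domain corpus P against a background corpus Q and ranking terms by their per-term contribution. The whitespace tokenizer, the smoothing floor `eps`, and all function names are hypothetical choices, not the authors' method.

```python
import math
from collections import Counter


def term_distribution(docs):
    """Relative term frequencies over a list of raw-text documents.

    Hypothetical preprocessing: lowercase + whitespace tokenization.
    """
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()}


def kl_contributions(p, q, eps=1e-9):
    """Per-term contribution p(t) * log(p(t) / q(t)) to D_KL(P || Q), in nats.

    `eps` is an assumed floor for terms unseen in the background corpus,
    which would otherwise make the divergence infinite.
    """
    return {t: pv * math.log(pv / max(q.get(t, 0.0), eps))
            for t, pv in p.items() if pv > 0}


def kl_divergence(p, q, eps=1e-9):
    """Standard KL divergence (not the paper's unspecified variant).

    Exact when Q assigns nonzero mass to every term of P; otherwise the
    `eps` floor makes it an approximation.
    """
    return sum(kl_contributions(p, q, eps).values())


def domain_terms(domain_docs, background_docs, k=5):
    """Hypothetical helper: top-k terms over-represented in the domain corpus."""
    p = term_distribution(domain_docs)
    q = term_distribution(background_docs)
    contrib = kl_contributions(p, q)
    return sorted(contrib, key=contrib.get, reverse=True)[:k]
```

    For example, `domain_terms(["battery life is long", "battery drains fast"], ["the movie is long", "the plot is fast"], k=3)` ranks "battery" first, because it is frequent in the product-review corpus and absent from the background one, while shared words such as "is" contribute little or negatively.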

    Designing text mining-based competitive intelligence systems

    Mining the Web for Lexical Knowledge to Improve Keyphrase Extraction: Learning from Labeled and Unlabeled Data

    A journal article is often accompanied by a list of keyphrases, composed of about five to fifteen important words and phrases that capture the article's main topics. Keyphrases are useful for a variety of purposes, including summarizing, indexing, labeling, categorizing, clustering, highlighting, browsing, and searching. The task of automatic keyphrase extraction is to select keyphrases from within the text of a given document. Automatic keyphrase extraction makes it feasible to generate keyphrases for the huge number of documents that do not have manually assigned keyphrases. Good performance on this task has been obtained by approaching it as a supervised learning problem. An input document is treated as a set of candidate phrases that must be classified as either keyphrases or non-keyphrases. For this classification, the most important features (attributes) appear to be the frequency and location of the candidate phrase in the document. Recent work has demonstrated that it is also useful to know how often the candidate phrase occurs as a manually assigned keyphrase for other documents in the same domain as the given document (e.g., the domain of computer science). Unfortunately, this keyphrase-frequency feature is domain-specific (the learning process must be repeated for each new domain) and training-intensive (good performance requires a relatively large number of training documents in the given domain, with manually assigned keyphrases). The aim of the work described here is to remove these limitations. In this paper, I introduce new features that are conceptually related to keyphrase-frequency, and I present experiments showing that the new features improve keyphrase extraction even though they are neither domain-specific nor training-intensive. The new features are generated by issuing queries to a Web search engine, based on the candidate phrases in the input document. The feature values are calculated from the number of hits for the queries (the number of matching Web pages). In essence, these new features are derived by mining lexical knowledge from a very large collection of unlabeled data, consisting of approximately 350 million Web pages without manually assigned keyphrases.
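    The abstract names frequency and location as the most informative features for classifying a candidate phrase. A minimal Python sketch of those two features follows, assuming a simple regex tokenizer and defining location as the relative position of the phrase's first occurrence (0.0 = start of the document). The function name and the 1.0 default for absent phrases are illustrative assumptions, and the Web hit-count features introduced in the paper are not reproduced here.

```python
import re


def candidate_features(document, candidate):
    """Frequency and relative first-occurrence position of a candidate phrase.

    Assumptions (not from the paper): a lowercase alphanumeric tokenizer,
    exact token-sequence matching, and position 1.0 for absent phrases.
    """
    tokens = re.findall(r"[a-z0-9]+", document.lower())
    phrase = re.findall(r"[a-z0-9]+", candidate.lower())
    n = len(phrase)
    # Token offsets at which the whole phrase matches.
    positions = [i for i in range(len(tokens) - n + 1)
                 if n > 0 and tokens[i:i + n] == phrase]
    frequency = len(positions)
    # 0.0 means the phrase opens the document; values near 1.0 mean it appears late.
    first_position = positions[0] / len(tokens) if positions else 1.0
    return {"frequency": frequency, "first_position": first_position}
```

    In a supervised setup, these values would form part of each candidate's feature vector, alongside any domain or Web-derived features, before the keyphrase/non-keyphrase classifier is trained.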

    A Probabilistic Generative Model for Latent Business Networks Mining

    The structural embeddedness theory posits that a company's embeddedness in a business network affects its competitive performance. This highlights the theoretical and practical value of business network mining and analysis. Because latent business relationships may exist and business networks continuously evolve over time, a manual approach to discovering and analyzing business networks is ineffective. Although much research has been devoted to social network discovery and analysis, relatively little has addressed business network discovery. Guided by the design science research methodology, the main contribution of our research is the design and development of a novel probabilistic generative model for latent business relationship mining. The proposed method can effectively and efficiently discover evolving latent business networks over time. Our experimental results confirm that the proposed method outperforms the well-known vector-space-model-based latent business relationship mining method by 28% in terms of AUC value.
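    AUC, the metric used in the comparison above, can be read as the probability that a randomly chosen true relationship receives a higher score than a randomly chosen non-relationship. The Python sketch below implements that pairwise definition (ties count as half), assuming binary labels where 1 marks a true business relationship and the scores come from some relationship-mining model. It is an illustration of the metric, not the authors' evaluation code.

```python
def auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition.

    Assumes labels are 1 (true relationship) or 0 (no relationship).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))
```

    The pairwise loop is O(P·N) in the number of positives and negatives; for large evaluation sets, the equivalent rank-sum formulation, or scikit-learn's `roc_auc_score`, is the more practical choice.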

    Analysis of Competitor Intelligence in the Era of Big Data: An Integrated System Using Text Summarization Based on Global Optimization

    Automatic text summarization can be applied to extract summaries from competitor intelligence (CI) corpora that organizations create by gathering textual data from the Internet. Such a representation of CI text is easier for managers to interpret and use when making decisions. This research investigates the design of an integrated system for CI analysis that comprises clustering and automatic text summarization, and it evaluates the quality of extractive summaries generated automatically by various text-summarization techniques based on global optimization. The research is conducted through experimentation and empirical analysis of the results. A survey of practicing managers is also carried out to understand how effective the automatically generated summaries are from a CI perspective. First, the research shows that global optimization-based techniques generate good-quality extractive summaries for CI analysis from the topical clusters created by the clustering step of the integrated system. Second, it demonstrates the usefulness of the generated summaries through their evaluation by practicing managers from a CI perspective. Finally, the implications of this research for theory and practice are discussed.

    Analyzing user reviews of messaging Apps for competitive analysis

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science. The rise of various messaging apps has resulted in fierce competition, and the era of Web 2.0 enables business managers to gain competitive intelligence from user-generated content (UGC). Text-mining UGC for competitive intelligence has drawn considerable interest from researchers. However, relevant studies mostly focus on industries such as hospitality and products, and few studies have applied such techniques to perform competitive analysis for messaging apps. Here, we conducted a competitive analysis based on topic modeling and sentiment analysis by text-mining 27,479 user reviews of four iOS messaging apps, namely Messenger, WhatsApp, Signal, and Telegram. The results show that the performance of topic modeling and sentiment analysis is encouraging, and that combining the extracted aspect-based app topics with the adjusted sentiment scores can effectively reveal meaningful competitive insights into user concerns, competitive strengths and weaknesses, and changes in user sentiment over time. We anticipate that this study will not only advance the existing literature on competitive analysis using text mining techniques for messaging apps, but also help existing players and new entrants in the market sharpen their competitive edge by better understanding user needs and industry trends.