127 research outputs found

    Are You Finding the Right Person? A Name Translation System Towards Web 2.0

    In a multilingual world, the amount of information available in global information systems is increasing rapidly. Searching for proper names in foreign languages has become an important task in multilingual search and knowledge discovery. However, these names are the most difficult to handle: they are often unknown words that cannot be found in a translation dictionary, and even human experts struggle with the variation introduced during translation. Furthermore, existing research on name translation has focused on translation algorithms, while the user experience during name translation and name search is often ignored. As Web technology moves towards Web 2.0, creating a platform that allows easier distributed collaboration and information sharing, we seek methods to incorporate Web 2.0 technologies into a name translation system. In this research, we review challenges in name translation and propose an interactive name translation and search system: NameTran. This system translates English names into Chinese using a hybrid approach that combines Hidden Markov Model-based (HMM-based) transliteration with web mining. Evaluation results showed that web mining consistently boosted the performance of a pure HMM approach. Our system achieved a top-1 accuracy of 0.64 and a top-8 accuracy of 0.96. To cope with changing popularity and variation in name translations, we demonstrated the feasibility of letting users rank translations, with the new rankings serving as feedback to the originally trained HMM model. We believe that such user input will significantly improve system usability.
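
    The top-1 and top-8 figures above are top-k accuracies: the share of test names whose reference translation appears among the system's first k ranked candidates. The sketch below shows how such a metric is typically computed. The candidate lists are hypothetical, and the function is an illustration, not NameTran's evaluation code.

```python
from typing import Sequence


def top_k_accuracy(ranked_candidates: Sequence[Sequence[str]],
                   gold: Sequence[str], k: int) -> float:
    """Return the fraction of names whose reference translation appears
    among the first k ranked candidates."""
    if len(ranked_candidates) != len(gold):
        raise ValueError("need exactly one candidate list per reference")
    if k < 1:
        raise ValueError("k must be at least 1")
    if not gold:
        return 0.0
    hits = sum(1 for cands, ref in zip(ranked_candidates, gold)
               if ref in cands[:k])
    return hits / len(gold)


# Hypothetical system output: ranked Chinese candidates for three English names.
ranked = [["史密斯", "斯密斯"],   # Smith: reference form ranked first
          ["琼森", "约翰逊"],     # Johnson: reference form ranked second
          ["布朗", "布朗恩"]]     # Brown: reference form ranked first
gold = ["史密斯", "约翰逊", "布朗"]

print(top_k_accuracy(ranked, gold, 1))  # 2 of 3 names correct at rank 1
print(top_k_accuracy(ranked, gold, 2))  # all 3 correct within the top 2
```

    Because top-k accuracy can only grow as k increases, a large gap between top-1 and top-8 suggests the correct form is usually generated but not always ranked first, which is the gap user re-ranking aims to close.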

    Building domain-specific web collections for scientific digital libraries: A meta-search enhanced focused crawling method

    Collecting domain-specific documents from the Web using focused crawlers has been considered one of the most important strategies for building digital libraries that serve the scientific community. However, because most focused crawlers use local search algorithms to traverse the Web, they can easily become trapped within a limited sub-graph of the Web surrounding the starting URLs, and so build domain-specific collections that are not comprehensive or diverse enough for scientists and researchers. In this study, we investigated the problems that local search algorithms cause for traditional focused crawlers and proposed a new crawling approach, meta-search enhanced focused crawling, to address them. We conducted two user evaluation experiments to examine the performance of the proposed approach, and the results showed that it could build domain-specific collections of higher quality than traditional focused crawling techniques.
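
    The trap described above can be reproduced with a toy best-first crawler: it only follows links out of pages it judges relevant, so a relevant region with no links from the seed region is never reached. In this sketch the meta-search step is reduced to a hypothetical list of extra seed URLs (standing in for results returned by querying several search engines); the scoring function, the in-memory link graph, and all names are assumptions for illustration, not the authors' crawler.

```python
import heapq
from typing import Dict, List, Set


def relevance(text: str, topic_terms: Set[str]) -> float:
    """Share of a page's words that are topic terms (a crude relevance score)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in topic_terms for w in words) / len(words)


def focused_crawl(graph: Dict[str, List[str]], pages: Dict[str, str],
                  seeds: List[str], topic_terms: Set[str],
                  max_pages: int = 100, threshold: float = 0.0) -> List[str]:
    """Best-first crawl: repeatedly fetch the highest-scoring frontier URL and
    follow links only out of pages judged relevant (a local search)."""
    frontier = [(-relevance(pages.get(u, ""), topic_terms), u) for u in seeds]
    heapq.heapify(frontier)
    visited: Set[str] = set()
    collected: List[str] = []
    while frontier and len(visited) < max_pages:
        neg_score, url = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)
        if -neg_score <= threshold:
            continue  # irrelevant page: do not collect it or expand its links
        collected.append(url)
        for out in graph.get(url, []):
            if out not in visited:
                score = relevance(pages.get(out, ""), topic_terms)
                heapq.heappush(frontier, (-score, out))
    return collected


# Hypothetical toy Web: two relevant regions with no links between them.
pages = {"a1": "protein folding simulation", "a2": "protein structure data",
         "b1": "protein genomics database", "b2": "genomics protein sequence",
         "x": "sports news"}
graph = {"a1": ["a2", "x"], "a2": [], "b1": ["b2"], "b2": [], "x": []}
topic = {"protein", "genomics", "sequence", "structure"}

start_urls = ["a1"]
meta_search_hits = ["b1"]  # stands in for URLs returned by a meta-search query

print(focused_crawl(graph, pages, start_urls, topic))                     # stays in a1/a2
print(focused_crawl(graph, pages, start_urls + meta_search_hits, topic))  # also reaches b1/b2
```

    Seeding the frontier from meta-search results is only one way to inject non-local URLs; the published method may call the meta-search component differently during the crawl.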

    Employee Satisfaction and Corporate Performance: Mining Employee Reviews on Glassdoor.com

    In recent years, Big Data has created significant opportunities for academic research on a wide range of topics within the social sciences. We contribute to this growing field by exploiting unique social media data from Glassdoor.com. We extract anonymous employee reviews for textual analysis to reveal the relation between employee satisfaction and company performance. Using categories from corporate value studies, our analysis not only provides a "bird's-eye view" but also identifies the specific aspects of employee satisfaction that drive these correlations. We found that while Innovation is the most important category for the technology industry, the Quality category drives the retail and financial industries. We confirmed a significant correlation between overall employee satisfaction and corporate performance and discovered three categories that are negatively correlated with performance: Safety, Communication, and Integrity. We hope that this research encourages other researchers to consider the rich environment that a text analytics methodology makes possible.
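
    The category-level findings above come down to correlating a per-firm category score with a per-firm performance measure. The sketch below assumes the category scores have already been extracted from the review text; the firm data, the performance measure, and the choice of Pearson correlation are illustrative assumptions, not the paper's exact pipeline.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def category_correlations(category_scores: Dict[str, List[float]],
                          performance: List[float]) -> Dict[str, float]:
    """Correlate each per-firm category score with per-firm performance."""
    return {cat: pearson(scores, performance)
            for cat, scores in category_scores.items()}


# Hypothetical per-firm data for four firms (values are made up).
performance = [0.02, 0.05, 0.03, 0.08]   # e.g. some annual return measure
category_scores = {"Innovation": [3.1, 3.9, 3.4, 4.5],
                   "Safety": [4.2, 3.5, 4.0, 3.1]}
print(category_correlations(category_scores, performance))
```

    A negative coefficient for a category (as reported above for Safety, Communication, and Integrity) means firms whose reviews score higher on that category tend to perform worse; it does not by itself establish causation.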

    Turning Unstructured and Incoherent Group Discussion into DATree: A TBL Coherence Analysis Approach

    Despite the rapid growth of user-generated unstructured text from online group discussions, business decision-makers face the challenge of understanding its highly incoherent content. Coherence analysis attempts to reconstruct the order of discussion messages. However, existing methods focus only on system and cohesion features. While they work for asynchronous discussions, they fail for synchronous discussions, where these features rarely appear. We believe that discussion logic features play an important role in coherence analysis. Therefore, we propose a TBL-based coherence analysis (TCA) method composed of a novel message similarity measure algorithm, a subtopic segmentation algorithm, and a TBL-based classification algorithm. System, cohesion, and discussion logic features are all incorporated into the TCA method. Experimental results showed that the TCA method achieved significantly better performance than existing methods. Furthermore, we illustrate that the DATree generated by the TCA method can enhance decision-makers' content analysis capability.

    Examining Competitive Intelligence Using External and Internal Data Sources: A Text Mining Approach

    Competitive intelligence (CI) is the practice of studying competitors and the competitive environment to support a firm's strategic decision-making process. Currently, competitors are usually studied using business profile information and reports edited by CI professionals. This approach is inefficient and expensive in labor and resources, and its results are often incomplete and lack objectivity. Some existing studies have introduced text mining to leverage Web information for CI. Although these analyses improve coverage, most of them identify competitors using name co-occurrences from a single data source, so their validity and reliability remain questionable. Our experiment demonstrates that syntactic-level text mining can improve CI performance. It also shows that the choice of online data sources and of competitor name extraction methods has different implications for CI outcomes.
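
    For contrast with the syntactic-level approach, the single-source co-occurrence baseline criticized above can be sketched in a few lines: any firm mentioned in the same document as the focal firm counts as evidence of competition. The firm names and documents are hypothetical. Note how the third document ("hired staff from") is counted even though it may not describe a competitive relation, which illustrates the validity concern.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def cooccurrence_competitors(focal: str, candidates: Iterable[str],
                             documents: Iterable[str]) -> List[Tuple[str, int]]:
    """Rank candidate firms by how many documents mention them together
    with the focal firm (the co-occurrence baseline)."""
    focal_pat = re.compile(r"\b" + re.escape(focal) + r"\b", re.IGNORECASE)
    patterns = {name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
                for name in candidates if name != focal}
    counts: Counter = Counter()
    for doc in documents:
        if not focal_pat.search(doc):
            continue  # only documents that mention the focal firm count
        for name, pat in patterns.items():
            if pat.search(doc):
                counts[name] += 1
    return counts.most_common()


# Hypothetical single-source corpus.
docs = ["Acme and Globex cut prices.",
        "Globex sued Acme.",
        "Acme hired staff from Initech.",
        "Initech and Globex merged."]
print(cooccurrence_competitors("Acme", ["Acme", "Globex", "Initech"], docs))
```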

    Entanglement Negativity and Defect Extremal Surface

    We study entanglement negativity for an evaporating black hole based on a holographic model with a defect brane. We introduce a defect extremal surface formula for entanglement negativity. Based on partial reduction, we show the equivalence between the defect extremal surface formula and the island formula for entanglement negativity in AdS₃/BCFT₂. Extending the study to a model of an eternal black hole plus a CFT bath, we find that the black hole-radiation negativity follows the Page curve, the black hole-black hole negativity decreases until it vanishes, and the radiation-radiation negativity increases and then saturates at a time later than the Page time. In all the time-dependent cases, the defect extremal surface formula agrees with the island formula.
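
    For reference, the quantity discussed above is usually taken to be the logarithmic negativity of a bipartite state, defined through the partial transpose with respect to one subsystem. The block below gives only this standard textbook definition; the holographic defect extremal surface and island prescriptions that compute it in the paper are not reproduced here.

```latex
% Logarithmic negativity of a bipartite state \rho_{AB};
% T_B denotes the partial transpose on subsystem B.
\mathcal{E}(A:B) = \ln \left\lVert \rho_{AB}^{T_B} \right\rVert_1 ,
\qquad
\lVert X \rVert_1 = \operatorname{Tr}\sqrt{X^{\dagger} X}
```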

    Big Data in Fashion Industry: Color Cycle Mining from Runway Data

    Color is a powerful selling tool, especially in the fashion and textile industry, whose products aim to inspire consumers visually. Color Cycle Analysis studies the recurring cycles of color trends. Traditional fashion color cycle analysis and prediction is performed by observing and extrapolating from trends apparent on fashion runways. With the emergence of big data, there is potential to apply data analytics methods in the fashion industry. We propose and develop a data-driven methodology to analyze color trends by mining online textual data on global fashion runways collected from the Style.com website. By capturing three important elements of color (hue, saturation, and brightness), we are able to effectively extract their presence and variation in textual data. We illustrate the recurrence of seven Color Cycle phases: High Chroma, Multicolored, Subdued, Earth Tones, Achromatic, and Purple Phase from runway review data.
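
    One plausible way to turn color words in review text into hue, saturation, and brightness values is to map each word to an RGB value and convert it with Python's standard `colorsys` module. The five-word lexicon below (using CSS named-color RGB values) is a hypothetical stand-in; the paper's actual color vocabulary and extraction rules are not given here.

```python
import colorsys
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical color lexicon: color words mapped to CSS named-color RGB values.
COLOR_RGB: Dict[str, Tuple[int, int, int]] = {
    "red": (255, 0, 0),
    "navy": (0, 0, 128),
    "beige": (245, 245, 220),
    "black": (0, 0, 0),
    "white": (255, 255, 255),
}


def extract_hsb(review: str) -> List[Tuple[str, float, float, float]]:
    """Return (color word, hue, saturation, brightness) for each color word
    found in a runway review; all three components are scaled to [0, 1]."""
    found = []
    for word in re.findall(r"[a-z]+", review.lower()):
        if word in COLOR_RGB:
            r, g, b = (c / 255 for c in COLOR_RGB[word])
            h, s, v = colorsys.rgb_to_hsv(r, g, b)  # HSV "value" = brightness
            found.append((word, h, s, v))
    return found


def mean_saturation_brightness(review: str) -> Optional[Tuple[float, float]]:
    """Average saturation and brightness of the colors a review mentions,
    or None if it mentions no known color."""
    hsb = extract_hsb(review)
    if not hsb:
        return None
    n = len(hsb)
    return (sum(s for _, _, s, _ in hsb) / n, sum(v for _, _, _, v in hsb) / n)


print(extract_hsb("A red coat over black and beige knits."))
```

    Averaging these values over all reviews from a season gives a simple per-season series; rises and falls in saturation or brightness across seasons are the kind of variation from which phases such as High Chroma or Subdued could be identified.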

    Mining Firm-level Uncertainty in Stock Market: A Text Mining Approach

    The traditional finance paradigm seeks to understand uncertainty and its impact on the stock market. However, most previous studies quantify uncertainty at the macro level, for example with the Economic Policy Uncertainty (EPU) index, and few studies examine firm-level uncertainty. In this paper, we address this empirical gap by using text mining tools to measure a firm-level uncertainty score from news content. We focus on companies listed in the S&P 1500. We crawled a total of 2,196,975 news articles from the LexisNexis database, covering April 2007 to July 2017. We extracted uncertainty-related information as features using named entity extraction, the Loughran-McDonald (LM) dictionary, and other linguistic features. We employed nonlinear machine learning models to investigate the impact of these uncertainty-related features on stocks' future returns. To address the theoretical question, we use traditional asset pricing techniques to test the relationship between information-derived uncertainty and financial market performance.
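
    The dictionary-based part of the feature extraction can be sketched as a word-share score: the fraction of an article's tokens that appear in an uncertainty word list, averaged over a firm's articles. The word set below is a small illustrative subset chosen for this example, not the full LM Uncertainty list, and the firm key and articles are hypothetical; the named-entity and other linguistic features described above are not shown.

```python
import re
from statistics import mean
from typing import Dict, List

# Illustrative subset only; the full Loughran-McDonald Uncertainty list is much
# longer and would be loaded from the published dictionary file.
UNCERTAINTY_WORDS = {"uncertain", "uncertainty", "may", "possibly", "risk", "unclear"}


def uncertainty_score(article: str) -> float:
    """Share of an article's tokens that appear in the uncertainty word list."""
    tokens = re.findall(r"[a-z]+", article.lower())
    if not tokens:
        return 0.0
    return sum(t in UNCERTAINTY_WORDS for t in tokens) / len(tokens)


def firm_uncertainty(articles_by_firm: Dict[str, List[str]]) -> Dict[str, float]:
    """Average the article-level scores over each firm's news coverage."""
    return {firm: mean(uncertainty_score(a) for a in articles)
            for firm, articles in articles_by_firm.items() if articles}


# Hypothetical news snippets for one firm.
print(firm_uncertainty({"ACME": ["Demand may weaken amid an uncertain outlook",
                                 "Solid quarter for the retailer"]}))
```

    A firm-level series built this way could then be used as one feature among others in a return-prediction model or an asset pricing test.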

    SpidersRUs: Automated development of vertical search engines in different domains and languages

    In this paper, we discuss the architecture of a tool designed to help users develop vertical search engines in different domains and languages. We present the design of the tool and report an evaluation study showing that the system is easier to use than other existing tools.

    Building Knowledge Management System for Researching Terrorist Groups on the Web

    Terrorist organizations have found a cost-effective way to advance their causes by posting high-impact Web sites on the Internet. This alternate side of the Web is referred to as the "Dark Web." While counterterrorism researchers seek to obtain and analyze information from the Dark Web, several problems prevent effective and efficient knowledge discovery: the dynamic and hidden character of terrorist Web sites, information overload, and language barriers. This study proposes an intelligent knowledge management system to support the discovery and analysis of multilingual terrorist-created Web data. We developed a systematic approach to identify, collect, and store up-to-date multilingual terrorist Web data. We also propose to build an intelligent Web-based knowledge portal that integrates advanced text and Web mining techniques, such as summarization, categorization, and cross-lingual retrieval, to facilitate knowledge discovery from Dark Web resources. We believe our knowledge portal will provide counterterrorism research communities with valuable datasets and tools for knowledge discovery and sharing.