
    Web crawler research methodology

    In the economic and social sciences, it is crucial to test theoretical models against reliable and sufficiently large databases. The general research challenge is to build a well-structured database that suits the given research question and is cost-efficient at the same time. In this paper we focus on crawler programs, which have proved to be an effective tool for database building in very different problem settings. First, we explain how crawler programs work and illustrate a complex research process that maps business relationships using social media information sources. In this case, we show how search robots can be used to collect data for mapping complex network relationships in order to characterize business relationships in a well-defined environment. We then extend the case and present a framework of three structurally different research models in which crawler programs can be applied successfully: exploration, classification, and time series analysis. For exploration, we present findings about the Hungarian web agency industry, for which no previous statistical data on operations were available. For classification, we show how the most visited Hungarian web domains can be divided into predefined categories of e-business models. In the third study, we used a crawler to gather the values of concrete, predefined records containing ticket prices of low-cost airlines from a single site. Based on these experiences, we highlight some conceptual conclusions and opportunities for crawler-based research in e-business. Keywords: e-business research, web search, web crawler, Hungarian web, social network analysis
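As a rough illustration of how a crawler program of the kind described above might work, the following minimal Python sketch extracts links from HTML pages and follows them breadth-first. It is not the authors' implementation: the names `LinkExtractor`, `crawl`, `fetch`, and `FAKE_WEB` are hypothetical, and an in-memory dictionary stands in for real HTTP requests so that the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href targets of all <a> tags in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; returns {url: [absolute outgoing links]}.

    `fetch` is any callable mapping a URL to HTML text, or None on failure
    (an assumption made for this sketch).
    """
    seen = {start_url}
    queue = deque([start_url])
    graph = {}
    while queue and len(graph) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be retrieved; skip it
        parser = LinkExtractor()
        parser.feed(html)
        # Resolve relative links against the page they were found on.
        links = [urljoin(url, href) for href in parser.links]
        graph[url] = links
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return graph


# Hypothetical in-memory "web" standing in for real HTTP requests.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a>',
    "http://example.test/b": '<a href="/">home</a>',
}

link_graph = crawl("http://example.test/", FAKE_WEB.get)
```

The resulting `link_graph` is a page-to-links mapping, which is the kind of raw relational data that a network analysis could start from. A real crawler would also need polite request rates, `robots.txt` handling, and error handling, all of which are omitted here.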

    Discovery of Depression-Associated Factors From a Nationwide Population-Based Survey: Epidemiological Study Using Machine Learning and Network Analysis

    Background: In epidemiological studies, finding the best subset of factors is challenging when the number of explanatory variables is large. Objective: Our study had two aims. First, we aimed to identify essential depression-associated factors using the extreme gradient boosting (XGBoost) machine learning algorithm from big survey data (the Korea National Health and Nutrition Examination Survey, 2012-2016). Second, we aimed to achieve a comprehensive understanding of multifactorial features in depression using network analysis. Methods: An XGBoost model was trained and tested to classify "current depression" and "no lifetime depression" for a data set of 120 variables for 12,596 cases. The optimal XGBoost hyperparameters were set by an automated machine learning tool (TPOT), and a high-performance sparse model was obtained by feature selection using the feature importance value of XGBoost. We performed statistical tests on the model and nonmodel factors using survey-weighted multiple logistic regression and drew a correlation network among factors. We also adopted statistical tests for the confounder or interaction effect of selected risk factors when it was suspected on the network. Results: The XGBoost-derived depression model consisted of 18 factors with an area under the weighted receiver operating characteristic curve of 0.86. Two nonmodel factors could be found using the model factors, and the factors were classified into direct (P<.05) and indirect (P≥.05), according to the statistical significance of the association with depression. Perceived stress and asthma were the most remarkable risk factors, and urine specific gravity was a novel protective factor. The depression-factor network showed clusters of socioeconomic status and quality of life factors and suggested that educational level and sex might be predisposing factors. 
Indirect factors (eg, diabetes, hypercholesterolemia, and smoking) were involved in confounding or interaction effects of direct factors. Triglyceride level was a confounder of hypercholesterolemia and diabetes, smoking had a significant risk in females, and weight gain was associated with depression involving diabetes. Conclusions: XGBoost and network analysis were useful to discover depression-related factors and their relationships and can be applied to epidemiological studies using big survey data.

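    Application of Social Network Analysis in Analyzing the Collaboration Between Tokopedia and the Korean Boy Band BTS (Penerapan Social Network Analysis Dalam Menganalisis Kerjasama Tokopedia Dengan Boyband Korea BTS)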

    Tokopedia is an Indonesian technology company with a mission to achieve digital economic equity. Since its founding in 2009, Tokopedia has grown into an influential unicorn not only in Indonesia but also in Southeast Asia. On October 7, 2019, Tokopedia announced the South Korean music group BTS as its new brand ambassador. BTS is a global megastar group from South Korea under the auspices of Big Hit Entertainment. Consisting of seven members, RM, Jin, SUGA, j-hope, Jimin, V, and Jung Kook, BTS was founded in 2013 and has had worldwide success. The extraordinary growth and record-breaking achievements of BTS in recent years led to the group being designated as the persona of the Tokopedia brand. Through this collaboration, the public and BTS fans are expected to feel closer to their inspirational figures. The various types of marketing that Tokopedia currently carries out in collaboration with BTS have had a considerable impact on Tokopedia's sales. One of Tokopedia's marketing efforts uses Twitter, one of the social media platforms used to attract consumers to buy products sold on Tokopedia. Social Network Analysis (SNA) provides a statistical tool for examining relational data: rather than looking only at the characteristic attributes of individual actors, it focuses on explaining the patterns of relationships between actors and analyzing the structure of these patterns. A social network is represented as a graph, because the graph is the most fundamental type of social network representation. SNA holds that the relationships between nodes are important; its focus is on knowing which actors/nodes are involved and how relationships occur. This study uses SNA to produce a structure of relationship data patterns for the collaboration between Tokopedia and the Korean boy band BTS, which can help Tokopedia review the collaboration that has been carried out. Tokopedia can then decide whether to continue the collaboration with BTS, stop it, or replace BTS with new brand ambassadors.
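To make the graph representation described above concrete, here is a minimal Python sketch that builds an undirected network from an edge list and computes normalized degree centrality, one common SNA measure. The abstract does not say which measures the study used, so the choice of degree centrality is an assumption, and the account names and edges are invented for illustration rather than taken from the study's Twitter data.

```python
from collections import defaultdict

# Hypothetical mention edges (who mentioned whom); not data from the study.
edges = [
    ("fan_1", "tokopedia"),
    ("fan_2", "tokopedia"),
    ("fan_2", "bts"),
    ("fan_3", "bts"),
    ("tokopedia", "bts"),
]


def degree_centrality(edge_list):
    """Return {node: degree / (n - 1)} for an undirected graph."""
    neighbors = defaultdict(set)
    for source, target in edge_list:
        # Treat each mention as an undirected tie between two actors.
        neighbors[source].add(target)
        neighbors[target].add(source)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


centrality = degree_centrality(edges)

# List actors from most to least connected.
for node, score in sorted(centrality.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.2f}")
```

In this toy network the two brand accounts have the highest scores because they are tied to the most other actors, which is the kind of structural pattern SNA is meant to surface.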

    What country, university or research institute, performed the best on COVID-19? Bibliometric analysis of scientific literature

    In this article, we conduct data mining to discover the countries, universities and companies that produced or collaborated on the most research on COVID-19 since the pandemic started. We present some interesting findings, but despite analysing all available records on COVID-19 from the Web of Science Core Collection, we failed to reach any significant conclusions on how the world responded to the COVID-19 pandemic. Therefore, we expanded our analysis to include all available data records on pandemics and epidemics from 1900 to 2020. We discover some interesting results on the countries, universities and companies that produced or collaborated the most on research on pandemics and epidemics. We then compared these results with the analysis of the COVID-19 data records. This produced some interesting findings that are explained and graphically visualised in the article.