6,028 research outputs found

    Design and Implementation of Network Public Opinion Analysis System

    Network public opinion analysis is an important form of information analysis and processing. Based on a study of the related technologies, this paper designs and implements a new network public opinion analysis system. The system consists of four parts: network data fetching, processing of the fetched data, analysis of the processed data, and display of the public opinion analysis results. For document extraction, the Larbin web crawler is used to collect web content. For public opinion analysis, new topics are detected with an improved Single-Pass clustering algorithm. The algorithm uses multiple centers and compares the title and body vectors in both directions, which better reflects the dynamics of public opinion topics. Finally, the system was tested repeatedly in the network environment of a university. The results show that the new public opinion analysis system runs stably and efficiently. The work has some value for the development of other Internet information analysis systems.
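The abstract does not give the details of the improved algorithm, so the following is only a minimal sketch of single-pass clustering with several centers per cluster and a combined title/body similarity. The term-frequency vectors, the `threshold` and `title_weight` parameters, and all function names are assumptions made for illustration, not the paper's method.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector (an assumed stand-in for TF-IDF)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(docs, threshold=0.3, title_weight=0.5):
    """Cluster (title, body) pairs in one pass over the input.

    Each cluster keeps every member as a center (the "multi-center" idea);
    a document joins the cluster whose best-matching center is most similar,
    provided that similarity reaches the threshold. Otherwise it starts a
    new cluster. Returns a list of clusters, each a list of document indices.
    """
    clusters = []
    for i, (title, body) in enumerate(docs):
        tv, bv = vectorize(title), vectorize(body)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = max(
                title_weight * cosine(tv, ct) + (1 - title_weight) * cosine(bv, cb)
                for ct, cb in cluster["centers"]
            )
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            best["centers"].append((tv, bv))
        else:
            clusters.append({"members": [i], "centers": [(tv, bv)]})
    return [c["members"] for c in clusters]


docs = [
    ("campus network outage", "the campus network was down all morning"),
    ("network outage on campus", "students report the campus network down"),
    ("cafeteria menu change", "new dishes arrive at the cafeteria next week"),
]
clusters = single_pass(docs)
```

Because documents are processed in arrival order and never revisited, the result depends on that order, which is the usual trade-off of single-pass methods.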

    A Latent Dirichlet Allocation and Fuzzy Clustering Based Machine Learning Model for Text Thesaurus

    It is impractical to process huge amounts of structured and semi-structured data manually. This study aims to solve the problem of processing such data with machine learning algorithms. We collected text data on the company’s public opinion through crawlers, used the Latent Dirichlet Allocation (LDA) algorithm to extract keywords from the text, and used fuzzy clustering to group the keywords into topics. The topic keywords are used as a seed dictionary for new word discovery. To verify the efficiency of machine learning in new word discovery, algorithms based on association rules, N-Gram, PMI, and Word2vec were compared. The experimental results show that the Word2vec algorithm, which is based on a machine learning model, has the highest accuracy, recall, and F-value.
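Of the compared baselines, PMI (pointwise mutual information) is the simplest to illustrate. The sketch below scores adjacent token pairs as candidate new words; the tokenization, the `min_count` cutoff, and the function name are assumptions for illustration, and the abstract does not say how the study configured its PMI baseline.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent token pairs by PMI = log2(p(x, y) / (p(x) * p(y))).

    Pairs seen fewer than min_count times are skipped, since PMI is
    unreliable for rare events. Returns (pair, score) tuples, best first.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


tokens = "big data is big and big data is new data".split()
ranked = pmi_bigrams(tokens)
```

A high score means the two tokens co-occur more often than their individual frequencies would predict, which is why such pairs are treated as candidates for a single new word.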

    Internet rumor audience response prediction algorithm based on machine learning in big data environment

    Rumors are an important factor affecting social stability in certain periods. The mechanisms of rumor dissemination, prevention, and control have therefore long been of concern to the academic community and widely discussed by experts and scholars. Although people have begun to pay attention to online rumors now that the Internet has become a new type of media, research on them is still relatively fragmented, especially cross-domain research on the social influence of online rumors. There is no clear, specific definition of online rumors, nor a detailed analysis of the internal connection between their influence and group behavior. This article therefore uses actual cases to explore and analyze how online rumors spread and exert influence, and to show their social influence, in the hope of enriching research on online rumors. Today the Internet has become the most important carrier of public grievances. Internet users express their opinions on hot issues such as enterprises, people’s livelihood, and government management, forming a powerful public opinion pressure that far exceeds that of traditional media. The resulting security risks cannot be ignored, so how to monitor network public opinion in a large amount of network data is a difficult problem that urgently needs to be solved. First, the proposed design consists of four modules: information collection, web page preprocessing, public opinion analysis, and public information report. Second, text clustering, the core technology of network public opinion analysis, is optimized, and a single-pass algorithm based on a double threshold is proposed. The dual-threshold single-pass algorithm is then optimized using the MapReduce parallel computing model, and finally a network public opinion collection technology for the big data setting is formed. Simulation results show that the approach greatly improves the performance of text clustering and that the design can be effectively optimized using the MapReduce-based parallel computing model. After optimization, the average miss rate is 0.7569 times, the average false alarm rate is 0.5556 times, and C_det is 0.5714 times. This shows that the machine-learning-based collection technology for the big data setting is effective and performs well.
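The abstract reports the miss rate, the false alarm rate, and C_det without defining them. A common definition in topic detection evaluation is the detection cost C_det = C_miss * P_miss * P_target + C_fa * P_fa * (1 - P_target). The sketch below assumes that definition; the default cost weights and target prior are illustrative assumptions, not values taken from the paper.

```python
def detection_cost(misses, targets, false_alarms, non_targets,
                   c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Detection cost from raw error counts.

    misses / targets           -> miss rate P_miss
    false_alarms / non_targets -> false alarm rate P_fa
    c_miss, c_fa, p_target are assumed weights; adjust to the task.
    """
    p_miss = misses / targets
    p_fa = false_alarms / non_targets
    return c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)


# Hypothetical counts, used only to show the call.
cost = detection_cost(misses=10, targets=100, false_alarms=5, non_targets=1000)
```

Lower values are better, so a ratio below 1 between the optimized and baseline costs indicates an improvement.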