91 research outputs found

    Research on Application of Distributed Clustering Algorithms Based on MapReduce in Social Networking Services

    In the big data era of information explosion, the way people live, work, and think is gradually changing. For data analysis, traditional sampling methods are at odds with the growth in data volume, and using the full data set in place of random samples has become a requirement of the times. Relying on Moore's Law alone to raise computing performance is far from enough to reach this goal, so elastic computing architectures such as cloud computing are attracting growing attention. Social networking services, an important and successful application area in the history of the Internet, are also one of the major data sources of the big data era. Their data is of great value to the service providers themselves, to their business partners, and to social science research.
    This dissertation addresses the lack of automatic topic detection and classification on mainstream domestic microblogging sites. Drawing on distributed clustering algorithms and information retrieval techniques, combined with a semantic similarity model, it implements an application that clusters microblog posts by topic according to their content and, on that basis, recommends posts on similar topics to users. The main work of the dissertation is as follows.
    First, it examines the basic principles of the MapReduce programming model, analyzes how the open-source Hadoop framework implements its workflow, fault tolerance, and task scheduling, and discusses the core ideas and basic process of using MapReduce for big data processing.
    Second, it describes the principles of two classical clustering algorithms, k-Means and Canopy, and how they are combined in practice, studies the feasibility of and strategies for parallelizing them, and implements them on the MapReduce model.
    Finally, it summarizes the respective strengths and weaknesses of the vector space model and semantic methods for computing text similarity, proposes a text similarity measure that combines TF-IDF with word semantics, describes the idea and calculation process of that measure in detail, and uses it as the distance metric for clustering microblog text.
    Experimental results show that the techniques and methods in the dissertation are practicable: they identify topics in microblog posts fairly effectively and give users targeted recommendations and feedback, which changes how users browse microblogs and gives the approach a degree of practical value.
    Degree: Master of Engineering. Department and major: School of Software, Software Engineering. Student ID: 2432011115228
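The abstract does not give the dissertation's parallel implementation, so the following is only a minimal single-process sketch of how one k-Means iteration is commonly split into a map step (assign each point to its nearest centroid) and a reduce step (average the points per centroid). The function names, the Euclidean distance, and the sample points are assumptions for illustration; the dissertation itself uses a combined TF-IDF and semantic distance and runs on Hadoop.

```python
from collections import defaultdict
from math import dist


def kmeans_map(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def kmeans_reduce(points):
    """Reduce step: average all points that share one centroid key."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """One k-Means pass: map every point, group by key, reduce each group."""
    groups = defaultdict(list)
    for p in points:
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)
    # A centroid that received no points keeps its previous position.
    return [kmeans_reduce(groups[i]) if i in groups else centroids[i]
            for i in range(len(centroids))]


# Hypothetical 2-D points standing in for document vectors.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
new_centroids = kmeans_iteration(points, [(1.0, 1.0), (9.0, 9.0)])
print(new_centroids)
```

In a real MapReduce job the grouping by key is done by the framework's shuffle phase, and the iteration is repeated until the centroids stop moving. Canopy clustering is typically run first to pick the initial centroids.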

    Development of Geographic Profiling Software - Spatial Analysis Methods of Offender’s Nodes: SAMON -

    Existing geographic profiling software that implements the widely tested probability distance strategies has issues when applied to criminal investigation in Japan. Therefore, we developed the Spatial Analysis Methods of Offender’s Nodes (SAMON) software based on a free software environment, R. Given the issues involving existing software, SAMON includes the following three features: (1) prediction of an offender’s home base using different distance decay functions constructed from Japanese burglars’ Journey-to-Crime distances; (2) validation of prediction accuracy on solved cases; and (3) calibration of the distance decay functions using a sample of solved cases of a type and in a region that the user is interested in. We expect that SAMON will improve the availability of probability distance strategies and their accuracy in the Japanese context.
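As a rough illustration of a probability distance strategy, the sketch below scores each cell of a grid by summing a distance decay value over all crime sites and takes the highest-scoring cell as the predicted home base. It is not SAMON's code (SAMON is written in R). The negative-exponential form, the `rate` parameter, the grid, and the crime sites are all assumptions chosen for the example, not the calibrated functions described in the abstract.

```python
from math import dist, exp


def decay(distance, rate=1.0):
    """Hypothetical negative-exponential distance decay function."""
    return exp(-rate * distance)


def score_grid(crime_sites, grid, rate=1.0):
    """Score each grid cell by the summed decay value over all crime sites."""
    return {cell: sum(decay(dist(cell, site), rate) for site in crime_sites)
            for cell in grid}


# Made-up crime locations and a 5 x 5 search grid (arbitrary units).
crime_sites = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
grid = [(float(x), float(y)) for x in range(5) for y in range(5)]

scores = score_grid(crime_sites, grid)
predicted_home_base = max(scores, key=scores.get)
print(predicted_home_base)
```

Calibration, in this framing, would mean fitting `rate` (or swapping in a different decay shape) so that the predicted cells match the known home bases in a sample of solved cases.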

    Evaluation of Local Skeletal Muscle Blood Flow in Manipulative Therapy Using Diffuse Correlation Spectroscopy

    Published in: Yasuhiro Matsuda, Mikie Nakabayashi, Tatsuya Suzuki, Sinan Zhang, Masashi Ichinose, Yumie Ono (2022). Evaluation of Local Skeletal Muscle Blood Flow in Manipulative Therapy by Diffuse Correlation Spectroscopy, Frontiers in Bioengineering and Biotechnology, 9: 80005