11 research outputs found

    A Modified Hierarchical Agglomerative Approach for Efficient Document Clustering System

    In today’s world, the growing volume of text documents poses challenges for their effective and efficient organization, creating an enormous demand for tools that turn data into valuable knowledge. Document clustering is one technique that can play an important role toward this objective: it automatically groups documents so that the documents within a cluster are very similar to one another but dissimilar to the documents in other clusters. This research proposes a Modified Agglomerative Hierarchical Clustering (MAHC) algorithm. Many traditional systems build the data representation matrix from term frequencies; the modified algorithm instead builds it only from term occurrence, ignoring frequency. The proposed algorithm can improve clustering quality because it merges related or similar documents into the same cluster efficiently, and it reduces processing time compared with existing methods. In this paper, the clustering performance of the proposed and original algorithms is compared and evaluated using the F-measure.
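
The paper's central idea, building the document representation from term occurrence (0/1) rather than term frequency and then clustering agglomeratively, can be sketched as below. The average-link merge rule and all names are illustrative assumptions, not the authors' exact MAHC algorithm.

```python
# Sketch: occurrence-based (binary) document vectors + agglomerative clustering.
# Assumption: average-link cosine merging stands in for the paper's merge rule.
import math

def occurrence_matrix(docs, vocab):
    # 1 if the term appears in the document at all, else 0 (no counts).
    return [[1 if t in doc else 0 for t in vocab] for doc in docs]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def agglomerate(vectors, k):
    # Plain average-link agglomerative clustering down to k clusters.
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best, pair = -1.0, (0, 1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = sum(cosine(vectors[a], vectors[b])
                          for a in clusters[i] for b in clusters[j])
                sim /= len(clusters[i]) * len(clusters[j])
                if sim > best:
                    best, pair = sim, (i, j)
        i, j = pair
        clusters[i] += clusters.pop(j)
    return clusters

docs = [{"nuclear", "energy"}, {"nuclear", "safety"}, {"sports", "games"}]
vocab = sorted(set().union(*docs))
clusters = agglomerate(occurrence_matrix(docs, vocab), 2)
```

Because the vectors are binary, repeated terms in a long document cannot dominate the similarity, which is the intuition behind the paper's claimed quality gain.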

    多観点類似度を用いたクラスタリングに関する研究

    Clustering is the task of partitioning a data set into subsets called clusters without prior supervised learning from labeled training data. Its basic principle is that similar data items should belong to the same cluster, so the definition of similarity between data items is crucial. Representative similarity measures include the Euclidean distance between multidimensional vectors in Euclidean space and cosine similarity; cosine similarity is widely used for high-dimensional, sparse data such as text documents. Nguyen et al. proposed the Multiviewpoint-Based Similarity (MVS), a cosine-style similarity that uses multiple origins, and showed that applying MVS to partitional (non-hierarchical) clustering yields excellent results on document data. Partitional clustering, however, requires the number of clusters to be specified manually in advance. This thesis addresses two topics related to multiviewpoint similarity. The first is a method that applies Nguyen et al.'s multiviewpoint cosine similarity to hierarchical clustering. Unlike partitional clustering, hierarchical clustering needs no predefined cluster count and extracts a hierarchical partition structure; however, MVS is computationally more expensive than cosine similarity and threatens to worsen the overall complexity of hierarchical clustering. The proposed method therefore accelerates the inter-cluster similarity computation and achieves the same complexity as standard hierarchical clustering, O(mn^2 + n^2 log n). Experiments on document data confirm that hierarchical clustering with MVS attains higher classification accuracy in roughly the same computation time as existing methods. The second topic is applying the multiviewpoint idea beyond cosine similarity. The thesis proposes the Multiviewpoint-Based Distance (MVD), a new distance definition in which reference points influence the Euclidean distance, and applies it to k-means, a representative partitional clustering method. Experiments show that the partitional clustering method using MVD improves on the clustering results of k-means. The University of Electro-Communications, 201
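
The multiviewpoint idea can be sketched as follows: instead of measuring cosine similarity from the origin only, measure it from every document outside the pair and average. This is a naive O(n)-per-pair version for illustration, not the accelerated cluster-level computation the thesis develops, and the function names are assumptions.

```python
# Sketch of Multiviewpoint-Based Similarity (MVS) per Nguyen et al.'s idea:
# average the cosine of (d_i - d_h) and (d_j - d_h) over viewpoints d_h.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mvs(i, j, vectors):
    # Every document other than d_i and d_j serves as a viewpoint (origin).
    total, count = 0.0, 0
    for h, vh in enumerate(vectors):
        if h in (i, j):
            continue
        di = [x - y for x, y in zip(vectors[i], vh)]
        dj = [x - y for x, y in zip(vectors[j], vh)]
        total += cosine(di, dj)
        count += 1
    return total / count if count else 0.0

# Two near-duplicate pairs of documents: MVS should rank the true
# neighbor (index 1) above the distant one (index 2).
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
```

The O(n) loop over viewpoints is exactly why a straightforward use of MVS inside hierarchical clustering is expensive, motivating the thesis's accelerated inter-cluster computation.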

    文節の係り受け関係を用いた観点に基づく意見クラスタリング

    On the Web, opinions exist on many topics, and opinions on a given topic mix many different viewpoints. Opinions on the topic "nuclear power," for example, mix viewpoints such as safety, energy, and health. Classifying opinions by viewpoint makes them easy to grasp and compare per viewpoint and provides clues for discovering opinions from new viewpoints. Little research classifies opinions by viewpoint, and most existing work either fixes the target viewpoints in advance or ignores differences between viewpoints. This study therefore proposes a clustering method that, without predefining viewpoints, automatically identifies viewpoints suited to an opinion set and classifies the opinions accordingly, using contextual information, in particular noun-verb dependency relations. Under the assumption that "differences in an opinion's viewpoint are reflected in differences in noun-verb dependency relations," the proposed method extracts noun-verb pairs 〈N, V〉 from bunsetsu (phrase) dependency relations and uses them for clustering. Concretely, from the dependency relations in each opinion it extracts pairs 〈N, V〉 of a noun and the verb it depends on. It then computes the similarity between 〈N, V〉 pairs from the noun-noun and verb-verb similarities obtained with Japanese WordNet and latent semantic indexing; in particular, the more similar the nouns, the more strongly the verb similarity influences the pair similarity. Finally, opinion-opinion similarity is computed from the pair similarities, and hierarchical clustering with Ward's method is performed. In the evaluation, classification performance was measured by how closely the clusterings produced by the proposed and conventional methods matched a manual viewpoint-based classification of the opinion sets. The proposed method achieved higher classification performance than conventional methods, demonstrating its usefulness. The University of Electro-Communications, 201
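
The pair-similarity rule described above, where the verb similarity matters more as the noun similarity grows, can be sketched as a small gating function. The exact blending formula here is an assumption for illustration; the paper computes the underlying word similarities with Japanese WordNet and latent semantic indexing.

```python
# Sketch of a <N, V> pair-similarity rule in which noun similarity gates
# the verb similarity's contribution. `alpha` is a hypothetical weight.
def pair_similarity(noun_sim, verb_sim, alpha=0.5):
    # With dissimilar nouns (noun_sim -> 0) the verb contributes almost
    # nothing; with identical nouns the verb term carries full weight.
    return alpha * noun_sim + (1 - alpha) * noun_sim * verb_sim
```

With this shape, two opinions sharing the noun "reactor" are separated mainly by what is *done* to the reactor, while opinions about unrelated nouns stay dissimilar regardless of their verbs.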

    Pro-socially motivated interaction for knowledge integration in crowd-based open innovation

    Purpose: The purpose of this paper is to study how the online temporary crowd shares knowledge in a way that fosters the integration of their diverse knowledge. Having the crowd integrate its knowledge to offer solution-ideas to ill-structured problems posed by organizations is one of the desired outcomes of crowd-based open innovation because, by integrating others’ knowledge, the ideas are more likely to consider the many divergent issues related to solving the ill-structured problem. Unfortunately, the diversity of knowledge content offered by heterogeneous specialists in the online temporary crowd makes integration difficult, and the lean social context of the crowd makes extensive dialogue to resolve integration issues impractical. The authors address this issue by exploring theoretically how the manner in which interaction is organically conducted during open innovation challenges enables the generation of integrative ideas. The authors hypothesize that, as online crowds organically share knowledge based upon successful pro-socially motivated interaction, they become more productive in generating integrative ideas. Design/methodology/approach: Using a multilevel mixed-effects model, this paper analyzed 2,244 posts embedded in 747 threads with 214 integrative ideas taken from 10 open innovation challenges. Findings: Integrative ideas were more likely to occur after pro-socially motivated interactions. Research limitations/implications: Ideas that integrate knowledge about the variety of issues that relate to solving an ill-structured problem are desired outcomes of crowd-based open innovation challenges. Given that members of the crowd in open innovation challenges rarely engage in dialogue, a new theory is needed to explain why integrative ideas emerge at all. The authors’ adaptation of pro-social motivation interaction theory helps to provide such a theoretical explanation. 
Practical implications: Practitioners of crowd-based open innovation should endeavor to implement systems that encourage crowd members to maintain a high level of activeness in pro-socially motivated interaction, so that their knowledge is integrated as solutions are generated. Originality/value: The present study extends the crowd-based open innovation literature by identifying new forms of social interaction that foster more integrated ideas from the crowd, suggesting that pro-socially motivated interaction mitigates the negative relationship between knowledge diversity and knowledge integration. This study fills a gap in knowledge management research, which has called for conceptual frameworks explaining how to manage the increasing complexity of knowledge in the context of crowd-based collaboration for innovation.

    Spectacularly Binocular: Exploiting Binocular Luster Effects for HCI Applications

    Ph.D. thesis (Doctor of Philosophy).

    Proceedings of the 2004 ONR Decision-Support Workshop Series: Interoperability

    In August of 1998 the Collaborative Agent Design Research Center (CADRC) of the California Polytechnic State University in San Luis Obispo (Cal Poly) approached Dr. Phillip Abraham of the Office of Naval Research (ONR) with the proposal for an annual workshop focusing on emerging concepts in decision-support systems for military applications. The proposal was considered timely by the ONR Logistics Program Office for at least two reasons. First, rapid advances in information systems technology over the past decade had produced distributed collaborative computer-assistance capabilities with profound potential for providing meaningful support to military decision makers. Indeed, some systems based on these new capabilities, such as the Integrated Marine Multi-Agent Command and Control System (IMMACCS) and the Integrated Computerized Deployment System (ICODES), had already reached the field-testing and final product stages, respectively. Second, over the past two decades the US Navy and Marine Corps had been increasingly challenged by missions demanding the rapid deployment of forces into hostile or devastated territories with minimal or non-existent indigenous support capabilities. Under these conditions Marine Corps forces had to rely mostly, if not entirely, on sea-based support and sustainment operations. Particularly today, operational strategies such as Operational Maneuver From The Sea (OMFTS) and Sea To Objective Maneuver (STOM) are very much in need of intelligent, near real-time and adaptive decision-support tools to assist military commanders and their staff under conditions of rapid change and overwhelming data loads. In light of these developments the Logistics Program Office of ONR considered it timely to provide an annual forum for the interchange of ideas, needs and concepts that would address the decision-support requirements and opportunities in combined Navy and Marine Corps sea-based warfare and humanitarian relief operations. 
The first ONR Workshop was held April 20-22, 1999 at the Embassy Suites Hotel in San Luis Obispo, California. It focused on advances in technology, with particular emphasis on an emerging family of powerful computer-based tools, and concluded that the most able members of this family of tools appear to be computer-based agents that are capable of communicating within a virtual environment of the real world. From 2001 onward the venue of the Workshop moved from the West Coast to Washington, and in 2003 the sponsorship was taken over by ONR's Littoral Combat/Power Projection (FNC) Program Office (Program Manager: Mr. Barry Blumenthal). Themes and keynote speakers of past Workshops have included:
1999: 'Collaborative Decision Making Tools': Vadm Jerry Tuttle (USN Ret.); LtGen Paul Van Riper (USMC Ret.); Radm Leland Kollmorgen (USN Ret.); and Dr. Gary Klein (Klein Associates)
2000: 'The Human-Computer Partnership in Decision-Support': Dr. Ronald DeMarco (Associate Technical Director, ONR); Radm Charles Munns; Col Robert Schmidle; and Col Ray Cole (USMC Ret.)
2001: 'Continuing the Revolution in Military Affairs': Mr. Andrew Marshall (Director, Office of Net Assessment, OSD); and Radm Jay M. Cohen (Chief of Naval Research, ONR)
2002: 'Transformation ...': Vadm Jerry Tuttle (USN Ret.); and Steve Cooper (CIO, Office of Homeland Security)
2003: 'Developing the New Infostructure': Richard P. Lee (Assistant Deputy Under Secretary, OSD); and Michael O'Neil (Boeing)
2004: 'Interoperability': MajGen Bradley M. Lott (USMC), Deputy Commanding General, Marine Corps Combat Development Command; Donald Diggs, Director, C2 Policy, OASD (NII)

    Modeling and Simulation in Engineering

    This book provides an open platform to establish and share knowledge developed by scholars, scientists, and engineers from all over the world about various applications of modeling and simulation in the design process of products, in various engineering fields. The book consists of 12 chapters arranged in two sections (3D Modeling and Virtual Prototyping), reflecting the multidimensionality of applications related to modeling and simulation. Some of the most recent modeling and simulation techniques, as well as some of the most accurate and sophisticated software for treating complex systems, are applied. All the original contributions in this book are joined by the basic principle of a successful modeling and simulation process: as complex as necessary, and as simple as possible. The idea is to manipulate the simplifying assumptions in a way that reduces the complexity of the model (in order to enable real-time simulation) without altering the precision of the results.

    Designing new network adaptation and ATM adaptation layers for interactive multimedia applications

    Multimedia services, audiovisual applications composed of a combination of discrete and continuous data streams, will be a major part of the traffic flowing in the next generation of high-speed networks. The cornerstones for multimedia are Asynchronous Transfer Mode (ATM), foreseen as the technology for the future Broadband Integrated Services Digital Network (B-ISDN), and audio and video compression algorithms such as MPEG-2 that reduce applications' bandwidth requirements. Powerful desktop computers available today can seamlessly integrate network access and applications and thus bring the new multimedia services to home and business users. Among these services, those based on multipoint capabilities are expected to play a major role.    Interactive multimedia applications, unlike traditional data transfer applications, have stringent simultaneous requirements in terms of loss and delay jitter due to the nature of audiovisual information. In addition, such stream-based applications deliver data at a variable rate, in particular if constant quality is required.    ATM is able to integrate traffic of different natures within a single network, creating interactions of different types that translate into delay jitter and loss. Traditional protocol layers do not have the appropriate mechanisms to provide the required network quality of service (QoS) for such interactive variable bit rate (VBR) multimedia multipoint applications. This lack of functionality calls for the design of protocol layers with the appropriate functions to handle the stringent requirements of multimedia.    This thesis contributes to the solution of this problem by proposing new Network Adaptation and ATM Adaptation Layers for interactive VBR multimedia multipoint services.    The foundations on which these new multimedia protocol layers are built are twofold: the requirements of real-time multimedia applications and the nature of compressed audiovisual data.    
On this basis, we present a set of design principles we consider mandatory for a generic Multimedia AAL capable of handling interactive VBR multimedia applications in point-to-point as well as multicast environments. These design principles are then used as a foundation to derive a first set of functions for the MAAL, namely: cell loss detection via sequence numbering, packet delineation, dummy cell insertion, and cell loss correction via RSE FEC techniques.    The proposed functions, partly based on theoretical studies, are implemented and evaluated in a simulated environment. Performance is evaluated from the network point of view using classic metrics such as cell and packet loss. We also study the behavior of the cell loss process in order to evaluate the efficiency to be expected from the proposed cell loss correction method. We also discuss the difficulty of mapping network QoS parameters to user QoS parameters for multimedia applications, especially for video information. In order to present a complete performance evaluation that is also meaningful to the end-user, we use the MPQM metric to map the obtained network performance results to the user level. We evaluate the impact that cell loss has on video and the improvements achieved with the MAAL.    All performance results are compared to an equivalent implementation based on AAL5, as specified by the current ITU-T and ATM Forum standards.    An AAL has to be, by definition, generic. But to fully exploit the functionalities of the AAL layer, it is necessary to have a protocol layer that efficiently interfaces the network and the applications. This role is devoted to the Network Adaptation Layer.    The network adaptation layer (NAL) we propose aims to interface the applications efficiently to the underlying network so as to achieve reliable but low-overhead transmission of video streams. 
Since this requires a priori knowledge of the information structure to be transmitted, we propose that the NAL be codec specific.    The NAL targets interactive multimedia applications. These applications share a set of common requirements independent of the encoding scheme used. This calls for the definition of a set of design principles that should be shared by any NAL, even if the implementation of the functions themselves is codec specific. On the basis of these design principles, we derive the common functions that NALs have to perform, which are mainly two: the segmentation and reassembly of data packets, and selective data protection.    On this basis, we develop an MPEG-2-specific NAL. It provides perceptual syntactic information protection, the PSIP, which results in an intelligent, minimum-overhead protection of video information. The PSIP takes advantage of the hierarchical organization of compressed video data, common to the majority of compression algorithms, to perform selective data protection based on the perceptual relevance of the syntactic information.    Transmission over the combined NAL-MAAL layers shows significant improvement in terms of CLR and perceptual quality compared to equivalent transmissions over AAL5 with the same overhead.    The usage of the MPQM as a performance metric, which is one of the main contributions of this thesis, leads to a very interesting observation. The experimental results show that for unexpectedly high CLRs, the average perceptual quality remains close to the original value. The economic potential of this observation is significant. Given that the data flows are VBR, it is possible to improve network utilization by means of statistical multiplexing. It is therefore possible to reduce the cost per communication by increasing the number of connections with a minimal loss in quality.    
This conclusion could not have been derived without the combined usage of perceptual and network QoS metrics, which unveiled the economic potential of perceptually protected streams.    The proposed concepts are finally tested in a real environment, where a proof-of-concept implementation of the MAAL shows behavior close to the simulated results, thereby validating the proposed multimedia protocol layers.
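
The first MAAL function named in the abstract, cell loss detection via sequence numbering, can be sketched with a small modular counter. The 3-bit field width is an illustrative assumption, not the thesis's actual header layout.

```python
# Sketch: detect lost ATM cells from gaps in a wrapping sequence number.
# Assumption: a hypothetical 3-bit sequence field (numbers wrap at 8).
SEQ_BITS = 3
SEQ_MOD = 1 << SEQ_BITS

def count_lost(prev_seq, curr_seq):
    # Cells missing between two consecutively *received* cells.
    return (curr_seq - prev_seq - 1) % SEQ_MOD

def detect_losses(received_seqs):
    # Walk the received stream and total the gaps in the numbering.
    lost = 0
    for prev, curr in zip(received_seqs, received_seqs[1:]):
        lost += count_lost(prev, curr)
    return lost

# Cells 0..9 sent with seq = n % 8; cells 3 and 4 are dropped in transit.
received = [n % SEQ_MOD for n in (0, 1, 2, 5, 6, 7, 8, 9)]
```

A known limitation of such modular numbering is that a burst of SEQ_MOD or more consecutive losses aliases onto a smaller gap, which is one reason loss *correction* (the RSE FEC mentioned above) is layered on top of detection.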