7 research outputs found

    Constrained Hierarchical Clustering via Graph Coarsening and Optimal Cuts

    Motivated by extracting and summarizing relevant information in short sentence settings, such as satisfaction questionnaires, hotel reviews, and X/Twitter, we study the problem of clustering words in a hierarchical fashion. In particular, we focus on the problem of clustering with horizontal and vertical structural constraints. Horizontal constraints are typically cannot-link and must-link constraints among words, while vertical constraints are precedence constraints among cluster levels. We overcome state-of-the-art bottlenecks by formulating the problem in two steps. First, we pose it as a soft-constrained regularized least-squares problem, which guides the result of a sequential graph coarsening algorithm towards the horizontal feasible set. Then, flat clusters are extracted from the resulting hierarchical tree by computing optimal cut heights based on the available constraints. We show that the resulting approach compares favorably with existing algorithms and is computationally light.
    Comment: 5 pages, appeared at the Asilomar Conference on Signals, Systems, and Computers, 11/202

    Personalized Recommendation of Web Pages Using Group Average Agglomerative Hierarchical Clustering (GAAHC)

    Entrepreneurs are investing heavily in marketing and promoting their businesses through websites to enhance their online reputation and draw the attention of web users. Website structure plays a vital role in attracting web users. Creating a personalized website structure for each individual user by restructuring the website is a tedious and endless job. If users do not find the required information easily on a website, they abandon it. Hence, personalized recommendation of web pages increases users' interest and the time they spend on the website. Personalization is the process of creating a customized experience for each user of a website, rather than providing a generic one. It allows the website to present users with a unique experience tailored to their demands and interests. Personalized recommendation is a challenging task that has drawn the attention of many researchers. Personalization has to trace the behavior of individual users. Usage behavior can be traced by observing individual navigation patterns in the web log file of the specific website. This method requires session identification, clustering of similar sessions, and building a model for personalized recommendations using access time length and frequency of access. Most existing work on this topic has used K-Means clustering with Euclidean distance. K-Means is sensitive to the random choice of initial centers and does not consider the sequence of page visits. The proposed research work uses Group Average Agglomerative Hierarchical Clustering (GAAHC), with Modified Levenshtein
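The session-clustering step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard Levenshtein edit distance over page sequences (the abstract's "Modified" variant is not specified in the text shown here), and the function names and toy sessions are hypothetical.

```python
from itertools import combinations


def levenshtein(a, b):
    """Standard edit distance between two sequences of page identifiers.

    Assumption: the plain distance stands in for the abstract's modified variant.
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def group_average_clustering(sessions, num_clusters):
    """Repeatedly merge the two clusters with the smallest average pairwise distance."""
    clusters = [[i] for i in range(len(sessions))]
    # Precompute distances between every pair of sessions (keys have i < j).
    dist = {(i, j): levenshtein(sessions[i], sessions[j])
            for i, j in combinations(range(len(sessions)), 2)}

    def average_linkage(c1, c2):
        total = sum(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > num_clusters:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b keeps a valid
        del clusters[b]
    return clusters


# Hypothetical navigation sessions: each is an ordered list of visited pages.
sessions = [
    ["home", "products", "cart"],
    ["home", "products", "checkout"],
    ["blog", "about"],
    ["blog", "contact"],
]
print(group_average_clustering(sessions, num_clusters=2))
```

Because the distance is computed over ordered page sequences, the order of visits influences the grouping, which is the limitation of Euclidean K-Means that the abstract points out.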

    Machine Learning Tips and Tricks for Power Line Communications

    Tonello, A. M.; Letizia, N. A.; Righini, D.; Marcuzzi, F.

    An exploration of methodologies to improve semi-supervised hierarchical clustering with knowledge-based constraints

    Clustering algorithms with constraints (also known as semi-supervised clustering algorithms) have been introduced to the field of machine learning as a significant variant of conventional unsupervised clustering algorithms. They have been demonstrated to achieve better performance by integrating prior knowledge during the clustering process, which enables them to uncover relevant, useful information from the data being clustered. However, the development of semi-supervised hierarchical clustering techniques is still an open and active area of investigation. The majority of current semi-supervised clustering algorithms are developed as partitional clustering (PC) methods, and only a few research efforts have been made to develop semi-supervised hierarchical clustering methods. The aim of this research is to enhance hierarchical clustering (HC) algorithms based on prior knowledge, by adopting novel methodologies. [Continues.]

    On Two Web IR Boosting Tools: Clustering and Ranking

    This thesis investigates several research problems which arise in modern Web Information Retrieval (WebIR). The Holy Grail of modern WebIR is to find a way to organize and rank results so that the most "relevant" come first. The first breakthrough technique was the exploitation of the link structure of the Web graph in order to rank the result pages, using the well-known HITS and PageRank algorithms. These link-analysis approaches have been improved and extended, yet they seem insufficient to provide a satisfying search experience. In a number of situations a flat list of search results is not enough, and users might want search results grouped on-the-fly into folders of similar topics. In addition, the folders should be annotated with meaningful labels for rapid identification of the desired group of results. In other situations, users may have different search goals even when they express them with the same query. In this case the search results should be personalized according to the users' on-line activities. In order to address this need, we will discuss the algorithmic ideas behind SnakeT, a hierarchical clustering meta-search engine which personalizes searches according to the clusters selected by users on-the-fly. There are also situations where users might want to access fresh information. In these cases, traditional link analysis may not be suitable, because there may not have been enough time for many links to point to a recently produced piece of information. In order to address this need, we will discuss the algorithmic and numerical ideas behind a new ranking algorithm suitable for ranking fresh types of information, such as news articles or blogs. When link analysis suffices to produce good-quality search results, the huge amount of Web information calls for fast ranking methodologies. We will discuss numerical methodologies for accelerating the eigenvector-like computation commonly used by link analysis. An important result of this thesis is that we show how to address the above predominant issues of Web Information Retrieval by using clustering and ranking methodologies. We will demonstrate that clustering and ranking have a mutual reinforcement property which has not yet been studied intensively. This property can be exploited to boost the precision of both methodologies.
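For readers unfamiliar with the eigenvector-like computation used by link analysis, the following is a minimal sketch of the textbook PageRank power iteration. It is the plain, unaccelerated baseline, not the thesis's accelerated method; the damping value, tolerance, and toy graph are illustrative assumptions.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Basic PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Assumption: pages without out-links spread their rank uniformly.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank mass held by dangling pages is redistributed evenly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            outs = links.get(v, [])
            for target in outs:
                new_rank[target] += damping * rank[v] / len(outs)
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:  # stop once the vector has stopped changing
            break
    return rank


# Hypothetical three-page graph.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = pagerank(toy_links)
print(sorted(scores, key=scores.get, reverse=True))
```

Each iteration is one sparse matrix-vector product, so the cost of ranking grows with the number of links and the number of iterations needed to converge, which is why acceleration techniques matter at Web scale.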