
    SOMvisua: Data Clustering and Visualization Based on SOM and GHSOM

    Text on web pages reflects the opinions of many people, including the views of its authors. These views are shaped by cultural or community factors, which makes extracting information from text very difficult. Text search usually finds similarities between paragraphs in documents. This paper proposes a framework for data clustering and visualization called SOMvisua. SOMvisua is based on a graph representation of the input data for the Self-Organizing Map (SOM) and Growing Hierarchical Self-Organizing Map (GHSOM) algorithms. In SOMvisua, sentences from an input article are represented as a graph model instead of a vector space model. The SOM and GHSOM clustering algorithms then construct knowledge from this article.
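    The SOM half of this pipeline can be illustrated with a minimal sketch — a one-dimensional map whose prototype vectors are pulled toward each sample and, more weakly, toward their grid neighbours. This is a generic textbook SOM, not the SOMvisua implementation; all names and parameters are illustrative.

    ```python
    import math
    import random

    def train_som(data, n_units=4, n_iter=200, lr0=0.5, sigma0=2.0, seed=0):
        """Train a minimal 1-D Self-Organizing Map on 2-D points.

        Returns the list of unit prototype vectors after training.
        """
        rng = random.Random(seed)
        units = [list(rng.choice(data)) for _ in range(n_units)]
        for t in range(n_iter):
            x = rng.choice(data)
            # best-matching unit: nearest prototype in squared Euclidean distance
            bmu = min(range(n_units),
                      key=lambda k: sum((u - v) ** 2 for u, v in zip(units[k], x)))
            lr = lr0 * (1 - t / n_iter)                   # decaying learning rate
            sigma = max(sigma0 * (1 - t / n_iter), 0.5)   # shrinking neighbourhood
            for k in range(n_units):
                # neighbourhood kernel: units near the BMU on the grid move more
                h = math.exp(-((k - bmu) ** 2) / (2 * sigma ** 2))
                units[k] = [u + lr * h * (v - u) for u, v in zip(units[k], x)]
        return units

    # Two well-separated clusters; prototypes stay inside the data's bounding box.
    data = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.1, 4.9)]
    protos = train_som(data)
    ```

    GHSOM extends this idea by growing the map and recursing into dense regions, which the sketch above does not attempt.
    
    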

    Self-Organizing Maps for clustering and visualization of bipartite graphs

    Graphs (also frequently called networks) have attracted a burst of attention in recent years, with applications in social science, biology, and computer science. The present paper proposes a data mining method for visualizing and clustering the nodes of a particular class of graphs: bipartite graphs. The method is based on a self-organizing map algorithm and relies on an extension of this approach to data described by a dissimilarity matrix.
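    The dissimilarity input such a method expects can be sketched as follows. Shortest-path distance between nodes is one plausible choice of dissimilarity for a bipartite graph, used here purely for illustration — the paper's actual dissimilarity may differ.

    ```python
    from collections import deque

    def bipartite_dissimilarity(edges):
        """Shortest-path dissimilarity matrix for a bipartite graph.

        `edges` is a list of (top_node, bottom_node) pairs; returns (nodes, D)
        where D[i][j] is the graph distance between nodes[i] and nodes[j].
        """
        adj = {}
        for u, v in edges:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        nodes = sorted(adj)
        D = []
        for s in nodes:
            # breadth-first search from s gives hop counts to every node
            dist = {s: 0}
            q = deque([s])
            while q:
                u = q.popleft()
                for w in adj[u]:
                    if w not in dist:
                        dist[w] = dist[u] + 1
                        q.append(w)
            D.append([dist.get(t, float("inf")) for t in nodes])
        return nodes, D

    # Hypothetical authors a, b linked to papers p1, p2.
    nodes, D = bipartite_dissimilarity([("a", "p1"), ("b", "p1"), ("b", "p2")])
    ```

    A dissimilarity-based SOM (such as the relational variant below in this listing) can then be run directly on `D` without any vector representation of the nodes.
    
    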

    From Keyword Search to Exploration: How Result Visualization Aids Discovery on the Web

    A key to the Web's success is the power of search. The elegant way in which search results are returned is usually remarkably effective. However, for exploratory search, in which users need to learn, discover, and understand novel or complex topics, there is substantial room for improvement. Human-computer interaction researchers and web browser designers have developed novel strategies to improve Web search by enabling users to conveniently visualize, manipulate, and organize their Web search results. This monograph offers fresh ways to think about search-related cognitive processes and describes innovative design approaches to browsers and related tools. For instance, while keyword search presents users with results for specific information (e.g., what is the capital of Peru), other methods may let users see and explore the contexts of their requests for information (related or previous work, conflicting information), or the properties that associate groups of information assets (group legal decisions by lead attorney). We also consider both the traditional and novel ways in which these strategies have been evaluated. From our review of cognitive processes, browser design, and evaluations, we reflect on future opportunities and new paradigms for exploring and interacting with Web search results.

    Individualized Storyline-based News Topic Retrospection

    It takes great effort for news readers to track events promptly, let alone to recall them precisely long after they occurred. Although topic detection and tracking techniques have been developed to promptly identify and keep track of similar events in a topic and monitor their progress, the cognitive load of digesting these reports remains with the reader. A storyline-based summarization may help readers recall the events in a topic by extracting informative sentences from news reports to compose a concise summary of the essential episodes. This paper proposes SToRe (Storyline-based Topic Retrospection), which identifies events from news reports and composes a storyline summary to portray the event evolution in a topic. It consists of three main functions: event identification, main storyline construction, and storyline-based summarization. The main storyline guides the extraction of representative sentences from news articles to summarize the events that occurred. This study demonstrates that different topic term sets result in different storylines and, in turn, different summaries. This adaptation is useful for users reviewing past news topics along different storylines.

    On-line relational and multiple relational SOM

    In some applications, and in order to address real-world situations better, data may be more complex than simple numerical vectors. In some cases, data are known only through their pairwise dissimilarities, or through multiple dissimilarities, each describing a particular feature of the data set. Several variants of the Self-Organizing Map (SOM) algorithm have been introduced to generalize the original algorithm to the framework of dissimilarity data. Whereas median SOM is based on a rough representation of the prototypes, relational SOM represents these prototypes by a virtual linear combination of all elements in the data set, within a pseudo-Euclidean framework. In the present article, an on-line version of relational SOM is introduced and studied. Similarly to the situation in the Euclidean framework, this on-line algorithm provides a better organization and is much less sensitive to prototype initialization than standard (batch) relational SOM. More generally, this stochastic version allows us to integrate an additional stochastic gradient descent step into the algorithm, which can tune the respective weights of several dissimilarities in an optimal way: the resulting multiple relational SOM thus has the ability to integrate several sources of data of different types, or to reach a consensus between several dissimilarities describing the same data. The algorithms introduced in this manuscript are tested on several data sets, including categorical data and graphs. On-line relational SOM is currently available in the R package SOMbrero, which can be downloaded at http://sombrero.r-forge.r-project.org or tested directly through its Web User Interface at http://shiny.nathalievilla.org/sombrero.
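    The key trick of relational SOM — measuring distances to prototypes that exist only as convex combinations of the data — can be computed directly from the dissimilarity matrix via the standard relational distance formula d²(x_j, w) = (Dβ)_j − ½ βᵀDβ. The sketch below is a generic illustration of that formula, not code from SOMbrero.

    ```python
    def relational_distance(D, beta, j):
        """Implicit squared distance between data point j and the prototype
        represented by convex-combination weights `beta` over the data set,
        given only the matrix D of pairwise squared dissimilarities:
            d^2(x_j, w) = (D beta)_j - 1/2 * beta^T D beta
        """
        n = len(D)
        Db_j = sum(D[j][i] * beta[i] for i in range(n))
        bDb = sum(beta[i] * D[i][k] * beta[k]
                  for i in range(n) for k in range(n))
        return Db_j - 0.5 * bDb

    # Sanity check against the Euclidean case: points 0, 1, 3 on the line,
    # prototype = 0.5*x0 + 0.5*x1 = 0.5, so d^2(x2, w) = (3 - 0.5)^2 = 6.25.
    x = [0.0, 1.0, 3.0]
    D = [[(a - b) ** 2 for b in x] for a in x]
    beta = [0.5, 0.5, 0.0]
    d2 = relational_distance(D, beta, 2)
    ```

    When D holds squared Euclidean distances the formula recovers the ordinary squared distance exactly, which is what makes the pseudo-Euclidean generalization to arbitrary dissimilarities plausible.
    
    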

    Role of Business Intelligence and Knowledge Management in Solving Business Problems

    The term “business intelligence” describes a plan or strategy in which operations such as reporting, data analysis, data mining, and event processing are performed to improve the production and growth of a business enterprise. “Knowledge management,” on the other hand, is the well-organized management of resources and information within a commercial organization. Almost every business faces limitations and challenges, also known as business problems. One of the main business problems is demand: business plans must adapt to consumer demand. Analyzing demand answers questions such as: What is the business trend? What do users need? What improvements should be made in production? Where does the enterprise currently stand? Who are the competitors? For the predictive analysis, a Bitcoin dataset is used. The major aim of the study is to implement strategies to overcome business problems, mainly demand prediction. The objective is to identify relevant issues and remedies for common business problems using knowledge management and business intelligence. The dataset has columns for lowest price, highest price, open price, close price, trading volume, and market capitalization. The research methodology is predictive analysis using PCA and the K-means clustering algorithm. From this dataset, predictive plots are produced for easy analysis. PCA and K-means are the algorithms used for prediction. The importance of the study lies in predicting future sales, as it is essential for a business enterprise to estimate future demand so that the organization can improve production.
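    The clustering step of such a pipeline can be sketched with plain k-means on hypothetical price-style rows. This is an illustrative simplification — the paper's actual Bitcoin dataset and its PCA preprocessing are not reproduced here, and all values below are made up.

    ```python
    import random

    def kmeans(points, k, n_iter=50, seed=0):
        """Plain k-means on lists of floats; returns (centroids, labels)."""
        rng = random.Random(seed)
        centroids = rng.sample(points, k)
        labels = [0] * len(points)
        for _ in range(n_iter):
            # assignment step: each point joins its nearest centroid
            for i, p in enumerate(points):
                labels[i] = min(
                    range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            # update step: each centroid moves to the mean of its members
            for c in range(k):
                members = [p for i, p in enumerate(points) if labels[i] == c]
                if members:
                    centroids[c] = [sum(col) / len(members)
                                    for col in zip(*members)]
        return centroids, labels

    # Hypothetical daily (low, high, volume) rows: a quiet and a volatile regime.
    rows = [[10.0, 11.0, 100.0], [10.0, 12.0, 110.0],
            [40.0, 55.0, 900.0], [42.0, 58.0, 950.0]]
    cents, labels = kmeans(rows, k=2)
    ```

    In practice the features would first be standardized and reduced with PCA before clustering, since raw volume dominates the raw price columns in Euclidean distance.
    
    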

    Topic Retrospection with Storyline-based Summarization on News Reports

    Electronic newspapers have become a main source for online news readers. When facing numerous stories about a series of events, news readers need support to review a topic efficiently. Beyond identifying events and presenting search results with news titles and keywords, as TDT (Topic Detection and Tracking) does, a summarized text presenting event evolution is necessary for general news readers to review events under a news topic. This paper proposes a topic retrospection process and implements the SToRe system, which identifies the events under a news topic and composes a summary from which news readers can grasp the sketch of event evolution in the topic. It consists of three main functions: event identification, main storyline construction, and storyline-based summarization. The constructed main storyline removes irrelevant events and presents a main theme. The summarization extracts representative sentences and uses the main theme as the template to compose the summary. The summary not only provides enough information to comprehend the development of a topic, but also serves as an index to help readers find more detailed information. A lab experiment was conducted to evaluate the SToRe system in a question-and-answer (Q&A) setting. The experimental results show that the SToRe system enables news readers to capture the evolution of a news topic effectively and efficiently.
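    The extraction step can be caricatured as scoring sentences against a topic term set and keeping the top scorers in chronological order. This is a deliberately simplified illustration, not the SToRe algorithm; sentence texts and terms are invented.

    ```python
    def storyline_summary(sentences, topic_terms, max_sentences=2):
        """Score each sentence by overlap with the topic term set and keep
        the top-scoring ones in their original (chronological) order."""
        def score(s):
            words = set(s.lower().replace(".", "").split())
            return len(words & topic_terms)

        # rank sentence indices by descending score, keep the best few,
        # then restore document order so the summary reads chronologically
        ranked = sorted(range(len(sentences)), key=lambda i: -score(sentences[i]))
        keep = sorted(ranked[:max_sentences])
        return [sentences[i] for i in keep]

    news = [
        "The storm formed over the Atlantic on Monday.",
        "Local markets were unaffected by the weather.",
        "The storm made landfall on Wednesday causing floods.",
    ]
    summary = storyline_summary(news, {"storm", "landfall", "floods"})
    ```

    The abstract's observation that different topic term sets yield different storylines falls out directly: changing `topic_terms` changes which sentences survive the cut.
    
    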

    Formal concept matching and reinforcement learning in adaptive information retrieval

    The superiority of the human brain in information retrieval (IR) tasks seems to come firstly from its ability to read and understand the concepts, ideas, or meanings central to documents, in order to reason out the usefulness of documents to information needs, and secondly from its ability to learn from experience and adapt to its environment. In this work we attempt to incorporate these properties into the development of an IR model to improve document retrieval. We investigate the applicability of concept lattices, which are based on the theory of Formal Concept Analysis (FCA), to the representation of documents. This allows the use of more elegant representation units, as opposed to keywords, in order to better capture the concepts and ideas expressed in natural language text. We also investigate the use of a reinforcement learning strategy to learn and improve document representations, based on the information present in query statements and user relevance feedback. Features or concepts of each document/query, formulated using FCA, are weighted separately with respect to the documents they appear in, and organised into separate concept lattices according to a subsumption relation. Furthermore, each concept lattice is encoded in a two-layer neural network structure known as a Bidirectional Associative Memory (BAM), for efficient manipulation of the concepts in the lattice representation. This avoids implementation drawbacks faced by other FCA-based approaches. Retrieval of a document for an information need is based on concept matching between the concept lattice representations of a document and a query. The learning strategy works by strengthening the similarity of relevant documents and weakening that of non-relevant documents for each query, depending on the users' relevance judgements on retrieved documents.
    Our approach differs radically from existing FCA-based approaches in the following respects: concept formulation; weight assignment to object-attribute pairs; the representation of each document in a separate concept lattice; and the encoding of concept lattices in BAM structures. Furthermore, in contrast to the traditional relevance feedback mechanism, our learning strategy makes use of relevance feedback information to enhance document representations, making them dynamic and adaptive to user interactions. The results obtained on the CISI, CACM and ASLIB Cranfield collections are presented and compared with published results. In particular, the performance of the system is shown to improve significantly as the system learns from experience.
    The School of Computing, University of Plymouth, UK
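    The FCA machinery this thesis builds on rests on two derivation operators: from a set of objects to the attributes they all share, and back. A minimal sketch over a toy document-term context follows (the weighting scheme and BAM encoding are not shown; the documents and terms are invented).

    ```python
    def common_attributes(objects, context):
        """A': attributes shared by every object in the set (FCA derivation)."""
        all_attrs = set().union(*context.values())
        if not objects:
            return all_attrs
        return set.intersection(*(context[o] for o in objects))

    def common_objects(attributes, context):
        """B': objects possessing every attribute in the set (dual derivation)."""
        return {o for o, attrs in context.items() if attributes <= attrs}

    # Toy document-term context: documents as objects, index terms as attributes.
    context = {
        "doc1": {"retrieval", "lattice"},
        "doc2": {"retrieval", "learning"},
        "doc3": {"retrieval", "lattice", "learning"},
    }
    extent = common_objects({"lattice"}, context)
    intent = common_attributes(extent, context)
    ```

    The pair (extent, intent) is a formal concept precisely when each set maps back onto the other under the two operators — the Galois connection that gives the concept lattice its structure.
    
    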

    A framework for an Integrated Mining of Heterogeneous data in decision support systems

    The volume of information available on the Internet and corporate intranets continues to increase, along with a corresponding increase in the data (structured and unstructured) stored by many organizations. Over the past years, data mining techniques have been used to explore large volumes of structured data in order to discover knowledge, often in the form of a decision support system. For effective decision making, there is a need to discover knowledge from both structured and unstructured data for completeness and comprehensiveness. The aim of this paper is to present a framework for discovering this kind of knowledge and to report on work in progress in an ongoing research effort. The proposed framework is composed of three basic phases: extraction and integration, data mining, and finally the relevance of such a system to the business decision support system. In the first phase, both the structured and unstructured data are combined to form an XML database, the combined data warehouse (CDW). Efficiency is enhanced by clustering the unstructured data (documents) using the SOM (Self-Organizing Map) clustering algorithm and extracting keyphrases based on training and TF/IDF (Term Frequency/Inverse Document Frequency) using the KEA (Keyphrase Extraction Algorithm) toolkit. In the second phase, association rule mining is applied to discover knowledge from the combined data warehouse. The final phase reflects the changes such a system will bring to the marketing decision support system. The paper also describes a developed system that evaluates the association rules mined from structured data, which forms the first phase of the research. The proposed system is expected to improve the quality of decisions, and this will be evaluated using standard metrics for the interestingness of association rules based on statistical independence and correlation analysis.
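    The association-rule phase rests on the standard support and confidence metrics. A minimal sketch with hypothetical transactions follows — the item names are invented, not drawn from the paper's warehouse.

    ```python
    def rule_metrics(transactions, antecedent, consequent):
        """Support and confidence of the rule antecedent -> consequent.

        support    = P(antecedent and consequent)
        confidence = P(consequent | antecedent)
        """
        n = len(transactions)
        both = sum(1 for t in transactions if antecedent | consequent <= set(t))
        ante = sum(1 for t in transactions if antecedent <= set(t))
        support = both / n
        confidence = both / ante if ante else 0.0
        return support, confidence

    # Hypothetical market-basket style transactions from the combined warehouse.
    tx = [{"promo", "sale"}, {"promo", "sale"}, {"promo"}, {"sale"}]
    s, c = rule_metrics(tx, {"promo"}, {"sale"})
    ```

    Interestingness measures based on statistical independence, as the abstract mentions, compare the observed confidence against the baseline frequency of the consequent (e.g. lift), flagging rules that merely reflect a popular item.
    
    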