Search CORE

866 research outputs found

Summarizing information from Web sites on distributed power generation and alternative energy development

Author: Fung C.C.
Thanadechteemapat W.
Wong K.P.
Publication venue
Publication date: 01/01/2009
Field of study

The World Wide Web (WWW) has become a huge repository of information and knowledge, and an essential channel for information exchange. Many sites and thousands of pages of information on distributed power generation and alternate energy development are being added or modified constantly and the task of finding the most appropriate information is getting difficult. While search engines are capable to return a collection of links according to key terms and some forms of ranking mechanism, it is still necessary to access the Web page and navigate through the site in order to find the information. This paper proposes an interactive summarization framework called iWISE to facilitate the process by providing a summary of the information on the Web site. The proposed approach makes use of graphical visualization, tag clouds and text summarization. A number of cases are presented and compared in this paper with a discussion on future work

Research Repository

Software Newsroom – an approach to automation of news search and editing

Author: Gross Oskar
Huovelin Juhani
Linden Krister
Maisala Sami Petri Tapio
Niemi Jyrki
Oittinen Tero
Silfverberg Miikka
Solin Otto
Toivonen Hannu
Publication venue
Publication date: 07/11/2013
Field of study

We have developed tools and applied methods for automated identification of potential news from textual data for an automated news search system called Software Newsroom. The purpose of the tools is to analyze data collected from the internet and to identify information that has a high probability of containing new information. The identified information is summarized in order to help understanding the semantic contents of the data, and to assist the news editing process. It has been demonstrated that words with a certain set of syntactic and semantic properties are effective when building topic models for English. We demonstrate that words with the same properties in Finnish are useful as well. Extracting such words requires knowledge about the special characteristics of the Finnish language, which are taken into account in our analysis. Two different methodological approaches have been applied for the news search. One of the methods is based on topic analysis and it applies Multinomial Principal Component Analysis (MPCA) for topic model creation and data profiling. The second method is based on word association analysis and applies the log-likelihood ratio (LLR). For the topic mining, we have created English and Finnish language corpora from Wikipedia and Finnish corpora from several Finnish news archives and we have used bag-of-words presentations of these corpora as training data for the topic model. We have performed topic analysis experiments with both the training data itself and with arbitrary text parsed from internet sources. The results suggest that the effectiveness of news search strongly depends on the quality of the training data and its linguistic analysis. In the association analysis, we use a combined methodology for detecting novel word associations in the text. For detecting novel associations we use the background corpus from which we extract common word associations. In parallel, we collect the statistics of word co-occurrences from the documents of interest and search for associations with larger likelihood in these documents than in the background. We have demonstrated the applicability of these methods for Software Newsroom. The results indicate that the background-foreground model has significant potential in news search. The experiments also indicate great promise in employing background-foreground word associations for other applications. A combined application of the two methods is planned as well as the application of the methods on social media using a pre-translator of social media language.Peer reviewe

Helsingin yliopiston digitaalinen arkisto

Developing a Formal Model for Mind Maps

Author: Papatheodorou Christos
Siochos Vasilis
Publication venue
Publication date: 01/03/2011
Field of study

Mind map is a graphical technique, which is used to represent words, concepts, tasks or other connected items or arranged around central topic or idea. Mind maps are widely used, therefore exist plenty of software programs to create or edit them, while there is none format for the model representation, neither a standard format. This paper presents and effort to propose a formal mind map model aiming to describe the structure, content, semantics and social connections. The structure describes the basic mind map graph consisted of a node set, an edge set, a cloud set and a graphical connections set. The content includes the set of the texts and objects linked to the nodes. The social connections are the mind maps of other users, which form the neighborhood of the mind map owner in a social networking system. Finally, the mind map semantics is any true logic connection between mind map textual parts and a concept. Each of these elements of the model is formally described building the suggested mind map model. Its establishment will support the application of algorithms and methods towards their information extraction