
    Tracking Events in Social Media

    Tracking topical events in social media streams, such as Twitter, provides a means for users to keep up-to-date on topics of interest to them. This tracking may last for days or even weeks. These events and topics might be provided by users explicitly, or generated for users from selected news articles. Push notification from social media provides a method to deliver updates directly to users on their mobile devices or desktops. In this thesis, we start with a lexical comparison between carefully edited prose and social media posts, providing an improved understanding of word usage within social media. Compared with carefully edited prose, such as news articles and Wikipedia articles, the language of social media is informal in the extreme. By using word embeddings, we identify words whose usage differs greatly between a Wikipedia corpus and a Twitter corpus. Following from this work, we explore a general method for developing succinct queries, reflecting the topic of a given news article, for the purpose of tracking the associated news event within a social media stream. A series of probe queries are generated from an initial set of candidate keywords extracted from the article. By analyzing the results of these probes, we rank and trim the candidate set to create a succinct query. The method can also be used for linking and searching among different collections. Given a query for topical events, push notification to users directly from social media streams provides a method for them to keep up-to-date on topics of personal interest. We determine that the key to effective notification lies in controlling update volume, by establishing and maintaining appropriate thresholds for pushing updates. We explore and evaluate multiple threshold-setting strategies. Push notifications should be relevant to the user's personal interests; timely, with pushes occurring as soon after the actual event as possible; and novel, providing non-duplicate information. An analysis of existing evaluation metrics for push notification reflects different assumptions regarding user requirements. This analysis leads to a framework that places different weights and penalties on different behaviours and can guide the future development of a family of evaluation metrics that more accurately models user needs. Throughout the thesis, rank similarity measures are applied to compare rankings generated by various experiments. As a final component, we develop a family of rank similarity metrics based on maximized effectiveness difference, each derived from a traditional information retrieval evaluation measure. Computing this maximized effectiveness difference (MED) requires the solution of an optimization problem that varies in difficulty, depending on the associated measure. We present solutions for several standard effectiveness measures, including nDCG, MAP, and ERR. Through experimental validation, we show that MED reveals meaningful differences between retrieval runs. Mathematically, MED is a metric, regardless of the associated measure. Prior work has established a number of other desiderata for rank similarity in the context of search, and we demonstrate that MED satisfies these requirements. Unlike previous proposals, MED allows us to directly translate assumptions about user behavior from any established effectiveness measure to create a corresponding rank similarity measure. In addition, MED cleanly accommodates partial relevance judgments, and if complete relevance information is available, it reduces to a simple difference between effectiveness values.
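    For a simple effectiveness measure such as precision@k, the optimization behind MED is easy to see: unjudged documents are assigned whichever relevance values maximize the difference between the two runs. The following is a minimal sketch of that case only (function and variable names are illustrative, not from the thesis; the nDCG, MAP, and ERR variants require more involved optimization):

        def med_precision_at_k(run_a, run_b, qrels, k=10):
            """Maximized effectiveness difference for precision@k.
            run_a, run_b: ranked lists of document ids.
            qrels: dict mapping judged doc ids to 0/1 relevance;
            unjudged documents are simply absent from the dict."""
            top_a, top_b = set(run_a[:k]), set(run_b[:k])

            def max_diff(first, second):
                # Judged documents contribute a fixed amount to the difference.
                diff = sum(qrels.get(d, 0) for d in first) \
                     - sum(qrels.get(d, 0) for d in second)
                # Unjudged documents appearing only in `first`'s top k are
                # assumed relevant (+1 each); those only in `second`'s top k
                # are assumed non-relevant; documents in both cancel out.
                diff += sum(1 for d in first - second if d not in qrels)
                return diff / k

            # MED takes the larger of the two directions of the difference.
            return max(max_diff(top_a, top_b), max_diff(top_b, top_a))

    With complete judgments the unjudged term vanishes, and the sketch reduces to a plain difference of precision values, matching the reduction property noted above.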

    Event summarization on social media stream: retrospective and prospective tweet summarization

    User-generated content on social media, such as Twitter, provides in many cases the latest news before traditional media, allowing users both to obtain a retrospective summary of events and to be updated in a timely fashion whenever a new development occurs. However, social media, while being a valuable source of information, can also be overwhelming given the volume and velocity of published information. To shield users from irrelevant and redundant posts, retrospective summarization and prospective notification (real-time summarization) were introduced as two complementary tasks of information seeking on document streams. The former aims to select a list of relevant and non-redundant tweets that capture "what happened". In the latter, systems monitor the live post stream and push relevant and novel notifications as soon as possible. Our work falls within these frameworks and focuses on developing tweet summarization approaches for the two aforementioned scenarios. It aims at providing summaries that capture the key aspects of the event of interest, helping users to efficiently acquire information and follow the development of long ongoing events on social media. Nevertheless, the tweet summarization task faces many challenges that stem, on the one hand, from the high volume, velocity, and variety of the published information and, on the other hand, from the quality of tweets, which can vary significantly. In prospective notification, the core task is relevance and novelty detection in real time. A system may choose to push new updates as soon as they are detected, or to trade timeliness for higher notification quality. Our contributions address these challenges: First, we introduce the Word Similarity Extended Boolean Model (WSEBM), a relevance model that does not rely on stream statistics and takes advantage of a word embedding model. We use word similarity instead of traditional weighting techniques, which mitigates the shortness and word-mismatch issues of tweets. The intuition behind our proposition is that the context-aware similarity measure of word2vec can match different words carrying the same semantic meaning, offsetting the vocabulary mismatch when calculating the similarity between a tweet and a topic. Second, we propose to compute the novelty score of an incoming tweet against the words of all tweets already pushed to the user, instead of using pairwise tweet-to-tweet comparison. The proposed novelty detection method scales better and reduces the execution time, which fits real-time tweet filtering. Third, we propose an adaptive learning-to-filter approach that leverages social signals as well as query-dependent features. To overcome the issue of relevance threshold setting, we use a binary classifier that predicts the relevance of the incoming tweet, and we show the gain that can be achieved by exploiting ongoing relevance feedback to retrain the classification model.
    Finally, we adopt a real-time push strategy and show that the proposed approach achieves promising performance in terms of quality (relevance and novelty) at a low latency cost, whereas state-of-the-art approaches tend to trade latency for higher quality. This thesis also explores a novel approach to generating a retrospective summary that follows a different paradigm from the majority of state-of-the-art methods. We consider summary generation as an optimization problem that takes into account topical and temporal diversity. Tweets are filtered and incrementally grouped into two cluster types, namely topical clusters based on content similarity and temporal clusters based on publication time. Summary generation is formulated as an integer linear program in which the unknown variables are binary, the objective function is maximized, and the constraints ensure that at most one post per cluster is selected, subject to the predefined summary length limit.
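    The vocabulary-based novelty test described above admits a very compact implementation. The sketch below is illustrative only (the tokenization, threshold value, and all names are assumptions, not details from the thesis):

        def novelty_score(tweet_tokens, pushed_vocabulary):
            # Fraction of the tweet's terms not seen in any pushed update;
            # avoids comparing the tweet against every pushed tweet pairwise.
            if not tweet_tokens:
                return 0.0
            unseen = [t for t in tweet_tokens if t not in pushed_vocabulary]
            return len(unseen) / len(tweet_tokens)

        pushed_vocabulary = set()   # grows as notifications are pushed
        NOVELTY_THRESHOLD = 0.6     # illustrative value

        def maybe_push(tweet_tokens):
            if novelty_score(tweet_tokens, pushed_vocabulary) >= NOVELTY_THRESHOLD:
                pushed_vocabulary.update(tweet_tokens)
                return True         # novel enough: notify the user
            return False            # mostly seen before: suppress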
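    The integer linear program for retrospective summarization can likewise be sketched directly. The following minimal illustration uses the PuLP library; the scores, clusters, and length limit are made-up placeholders, and the thesis's actual objective and incremental clustering are richer:

        from pulp import LpMaximize, LpProblem, LpVariable, lpSum

        tweets = ["t0", "t1", "t2", "t3", "t4"]
        score = {"t0": 0.9, "t1": 0.4, "t2": 0.7, "t3": 0.8, "t4": 0.3}
        clusters = [["t0", "t1"], ["t2"], ["t3", "t4"]]  # topical/temporal groups
        MAX_LEN = 2                                      # summary length limit

        prob = LpProblem("tweet_summary", LpMaximize)
        x = {t: LpVariable(f"x_{t}", cat="Binary") for t in tweets}

        # Objective: total score of the selected tweets (to be maximized).
        prob += lpSum(score[t] * x[t] for t in tweets)

        # Diversity constraint: at most one tweet per cluster.
        for cluster in clusters:
            prob += lpSum(x[t] for t in cluster) <= 1

        # Respect the summary length limit.
        prob += lpSum(x.values()) <= MAX_LEN

        prob.solve()
        summary = [t for t in tweets if x[t].value() == 1]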

    Pretrained Transformers for Text Ranking: BERT and Beyond

    The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing applications. This survey provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example. The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in natural language processing (NLP), information retrieval (IR), and beyond. In this survey, we provide a synthesis of existing work as a single point of entry for practitioners who wish to gain a better understanding of how to apply transformers to text ranking problems and researchers who wish to pursue work in this area. We cover a wide range of modern techniques, grouped into two high-level categories: transformer models that perform reranking in multi-stage architectures and dense retrieval techniques that perform ranking directly. There are two themes that pervade our survey: techniques for handling long documents, beyond typical sentence-by-sentence processing in NLP, and techniques for addressing the tradeoff between effectiveness (i.e., result quality) and efficiency (e.g., query latency, model and index size). Although transformer architectures and pretraining techniques are recent innovations, many aspects of how they are applied to text ranking are relatively well understood and represent mature techniques. However, there remain many open research questions, and thus in addition to laying out the foundations of pretrained transformers for text ranking, this survey also attempts to prognosticate where the field is heading.
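    As a concrete illustration of the reranking category, the sketch below scores query-document pairs with a publicly available cross-encoder in the monoBERT style; the model name, example texts, and the upstream first-stage retriever are assumptions for illustration, not prescriptions from the survey:

        import torch
        from transformers import AutoModelForSequenceClassification, AutoTokenizer

        model_name = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # illustrative choice
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForSequenceClassification.from_pretrained(model_name)

        query = "what causes heart attacks"
        candidates = [  # in practice produced by a cheap first stage, e.g. BM25
            "Plaque buildup narrows the coronary arteries over time ...",
            "The 1998 World Cup final was held in Paris ...",
        ]

        # The cross-encoder reads each (query, document) pair jointly and
        # emits a single relevance logit per pair.
        inputs = tokenizer([query] * len(candidates), candidates,
                           padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            scores = model(**inputs).logits.squeeze(-1)

        reranked = [doc for _, doc in sorted(zip(scores.tolist(), candidates),
                                             reverse=True)]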

    Explicit web search result diversification

    Queries submitted to a web search engine are typically short and often ambiguous. With the enormous size of the Web, a misunderstanding of the information need underlying an ambiguous query can misguide the search engine, ultimately leading the user to abandon the originally submitted query. In order to overcome this problem, a sensible approach is to diversify the documents retrieved for the user's query. As a result, the likelihood that at least one of these documents will satisfy the user's actual information need is increased. In this thesis, we argue that an ambiguous query should be seen as representing not one, but multiple information needs. Based upon this premise, we propose xQuAD---Explicit Query Aspect Diversification, a novel probabilistic framework for search result diversification. In particular, the xQuAD framework naturally models several dimensions of the search result diversification problem in a principled yet practical manner. To this end, the framework represents the possible information needs underlying a query as a set of keyword-based sub-queries. Moreover, xQuAD accounts for the overall coverage of each retrieved document with respect to the identified sub-queries, so as to rank highly diverse documents first. In addition, it accounts for how well each sub-query is covered by the other retrieved documents, so as to promote novelty---and hence penalise redundancy---in the ranking. The framework also models the importance of each of the identified sub-queries, so as to appropriately cater for the interests of the user population when diversifying the retrieved documents. Finally, since not all queries are equally ambiguous, the xQuAD framework caters for the ambiguity level of different queries, so as to appropriately trade off relevance for diversity on a per-query basis. The xQuAD framework is general and can be used to instantiate several diversification models, including the most prominent models described in the literature. In particular, within xQuAD, each of the aforementioned dimensions of the search result diversification problem can be tackled in a variety of ways. In this thesis, as additional contributions besides the xQuAD framework, we introduce novel machine learning approaches for addressing each of these dimensions. These include a learning to rank approach for identifying effective sub-queries as query suggestions mined from a query log, an intent-aware approach for choosing the ranking models most likely to be effective for estimating the coverage and novelty of multiple documents with respect to a sub-query, and a selective approach for automatically predicting how much to diversify the documents retrieved for each individual query. In addition, we perform the first empirical analysis of the role of novelty as a diversification strategy for web search. As demonstrated throughout this thesis, the principles underlying the xQuAD framework are general, sound, and effective. In particular, to validate the contributions of this thesis, we thoroughly assess the effectiveness of xQuAD under the standard experimentation paradigm provided by the diversity task of the TREC 2009, 2010, and 2011 Web tracks. The results of this investigation demonstrate the effectiveness of our proposed framework.
Indeed, xQuAD attains consistent and significant improvements in comparison to the most effective diversification approaches in the literature, and across a range of experimental conditions, comprising multiple input rankings, multiple sub-query generation and coverage estimation mechanisms, as well as queries with multiple levels of ambiguity. Altogether, these results corroborate the state-of-the-art diversification performance of xQuAD.
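    The greedy selection rule at the heart of xQuAD can be sketched compactly. In the sketch below, the probability estimates are illustrative inputs supplied by the caller; real instantiations derive them from retrieval scores and query-log mining, as the thesis describes:

        from math import prod

        def xquad(candidates, p_d_q, p_s_q, p_d_s, lam=0.5, k=10):
            # candidates: document ids; p_d_q[d] ~ P(d|q);
            # p_s_q[s] ~ P(s|q) for each sub-query s; p_d_s[(d, s)] ~ P(d|s).
            selected, remaining = [], set(candidates)
            while remaining and len(selected) < k:
                def gain(d):
                    # Coverage of each sub-query, discounted by how well the
                    # already-selected documents cover it (novelty term).
                    diversity = sum(
                        p_s_q[s] * p_d_s.get((d, s), 0.0)
                        * prod(1.0 - p_d_s.get((d2, s), 0.0) for d2 in selected)
                        for s in p_s_q)
                    return (1 - lam) * p_d_q[d] + lam * diversity
                best = max(remaining, key=gain)
                selected.append(best)
                remaining.remove(best)
            return selected

    The interpolation parameter lam plays the role of the per-query relevance-diversity trade-off discussed above: a value near 0 reduces to plain relevance ranking, while a value near 1 ranks purely by diversity.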

    Condensing Information: From Supervised To Crowdsourced Learning

    The main focus of this dissertation is new and improved ways of bringing high-quality content to users by leveraging the power of machine learning. Starting with a large amount of data, we want to condense it into an easily digestible form by removing redundant and irrelevant parts and retaining only important information that is of interest to the user. Learning how to perform this from data allows us to use more complex models that better capture the notion of good content. Starting with supervised learning, this thesis proposes using structured prediction in conjunction with support vector machines to learn how to produce extractive summaries of textual documents. Representing summaries as multivariate objects allows for modeling the dependencies between the summary components. An efficient approach to learning and predicting summaries is still possible, despite the complex output space, by using a submodular objective/scoring function. The discussed approach can also be adapted to an unsupervised setting and used to condense information in novel ways while retaining the same efficient submodular framework. Incorporating a temporal dimension into the summarization objective leads to a new way of visualizing the flow of ideas and identifying novel contributions in a time-stamped corpus, which in turn helps users gain high-level insight into its evolution. Lastly, instead of trying to explicitly define an automated function used to condense information, one can leverage crowdsourcing. In particular, this thesis considers user feedback on online user-generated content to construct and improve content rankings. An analysis of a real-world dataset is presented, and the results suggest more accurate models of actual user voting patterns. Based on this new knowledge, an improved content ranking algorithm is proposed that delivers good content to users in a shorter timeframe.
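    A common way to exploit a submodular scoring function for extractive summarization is greedy selection, which enjoys a (1 - 1/e) approximation guarantee for monotone submodular objectives. The sketch below uses a facility-location-style coverage score as an illustrative stand-in for the learned objective; all names and the budget are assumptions, not details from the dissertation:

        def coverage(summary, sentences, similarity):
            # Facility-location objective: each document sentence is credited
            # with its best match in the summary. Marginal gains shrink as the
            # summary grows, which makes the function submodular.
            return sum(max((similarity(s, t) for t in summary), default=0.0)
                       for s in sentences)

        def greedy_summary(sentences, similarity, budget=3):
            summary = []
            while len(summary) < budget:
                rest = [s for s in sentences if s not in summary]
                if not rest:
                    break
                best = max(rest, key=lambda s: coverage(summary + [s],
                                                        sentences, similarity))
                summary.append(best)
            return summary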

    Social informatics

    5th International Conference, SocInfo 2013, Kyoto, Japan, November 25-27, 2013, Proceedings.

    Analyzing Granger causality in climate data with time series classification methods

    Attribution studies in climate science aim to scientifically ascertain the influence of climatic variations on natural or anthropogenic factors. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships, while utilizing traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested.
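    For reference, the classical autoregressive baseline that such studies build on can be run in a few lines with statsmodels; the synthetic data and lag settings below are purely illustrative:

        import numpy as np
        from statsmodels.tsa.stattools import grangercausalitytests

        rng = np.random.default_rng(0)
        x = rng.normal(size=200)                         # a hypothetical driver series
        y = np.roll(x, 2) + 0.5 * rng.normal(size=200)   # responds to x with a lag of 2

        # Tests whether the second column Granger-causes the first, comparing
        # autoregressive models of y with and without lagged values of x.
        data = np.column_stack([y, x])
        results = grangercausalitytests(data, maxlag=3)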

    Fostering awareness and collaboration in large-class lectures

    For decades, higher education has been shaped by large-class lectures, which are characterized by large anonymous audiences. Well-known issues of large-class lectures are a rather low degree of interactivity and a notable passivity of students, which are aggravated by the social environment created by large audiences. However, research indicates that active involvement is indispensable for learning to be successful. Active partaking in lectures is thus often a goal of technology-supported lectures. An outstanding feature of social media is certainly their capability of facilitating interactions in large groups of participants. Social media thus seem to be a suitable basis for technology-enhanced learning in large-class lectures. However, existing general-purpose social media are often accompanied by several shortcomings that are assumed to hinder their proper use in lectures. This thesis therefore deals with the conception of a social medium, called Backstage, specially tailored for use in large-class lectures. Backstage provides both lecturer- and student-initiated communication by means of an Audience Response System and a backchannel. Audience Response Systems allow running quizzes in lectures, e.g., to assess knowledge, and can thus be seen as a technological support for question asking by the lecturer. These systems collect and aggregate the students' answers and report the results back to the audience in real time. Audience Response Systems have been shown to be a very effective means for sustaining lecture-relevant interactivity in lectures. Using a backchannel, students can initiate communication with peers or the lecturer. The backchannel is built upon microblogging, which has become a very popular communication medium in recent years. A key characteristic of microblogging is that messages are very concise, comprising only a few words. This brief form of communication makes microblogging quite appealing for a backchannel in lectures. A preliminary evaluation of a first prototype conducted at an early stage of the project, however, indicated that a conventional digital backchannel is prone to information overload. Even a relatively small group can quickly render the backchannel discourse incomprehensible. This incomprehensibility is rooted in a lack of interactional coherence, a rather low communication efficiency, a high information entropy, and a lack of connection between the backchannel and the frontchannel, i.e., the lecture's discourse. This thesis investigates remedies to these issues. To this aim, lecture slides are integrated into the backchannel to structure and provide context for the backchannel discourse. The backchannel communication is revised to realize a collaborative annotation of slides by typed backchannel posts. To reduce information entropy, backchannel posts have to be assigned to predefined categories. To establish a connection with the frontchannel, backchannel posts have to be attached to appropriate locations on slides. The lecture slides also improve communication efficiency by routing, which means that the backchannel can be filtered to show only the posts belonging to the currently displayed slide. Further improvements and modifications, e.g., of the Audience Response System, are described in this thesis. This thesis also reports on an evaluation of Backstage in four courses. The outcomes are promising. Students welcomed the use of Backstage.
Backstage not only succeeded in increasing interactivity but also contributed to social awareness, which is a prerequisite of active participation. Furthermore, the backchannel communication was highly lecture-relevant. As another important result, an additional study conducted in collaboration with educational scientists showed that students in Backstage-supported lectures used their mobile devices to a greater extent for lecture-relevant activities compared to students in conventional lectures, in which mobile devices were mostly used for lecture-unrelated activities. To establish social control of the backchannel, this thesis investigates rating and ranking of backchannel posts. Furthermore, this thesis proposes a reputation system that aims at incentivizing desirable behavior in the backchannel. The reputation system is based on an eigenvector centrality similar to Google's PageRank. It is highly customizable and also allows quiz performance to be considered in the computation of reputation. All of these approaches (rating, ranking, and reputation systems) have proven to be very effective mechanisms of social control in general-purpose social media.
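    An eigenvector-centrality reputation score of this kind can be computed by PageRank-style power iteration. The sketch below is a minimal illustration, not Backstage's actual algorithm: the endorsement graph, damping factor, and all names are assumptions, where an edge u -> v might mean "u up-voted a post by v" on the backchannel.

        import numpy as np

        def reputation(endorsements, n_users, damping=0.85, iters=100):
            # endorsements: list of (u, v) pairs meaning "u endorses v".
            # Build a column-stochastic matrix: column u spreads u's weight.
            M = np.zeros((n_users, n_users))
            for u, v in endorsements:
                M[v, u] = 1.0
            col_sums = M.sum(axis=0)
            M[:, col_sums > 0] /= col_sums[col_sums > 0]
            M[:, col_sums == 0] = 1.0 / n_users  # users who endorse nobody

            r = np.full(n_users, 1.0 / n_users)
            for _ in range(iters):
                r = (1 - damping) / n_users + damping * (M @ r)
            return r

        scores = reputation([(0, 1), (2, 1), (1, 3)], n_users=4)

    Quiz performance could, for instance, be folded in by mixing a normalized quiz-score vector into the teleportation term, one of many possible customizations.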

    24th International Conference on Information Modelling and Knowledge Bases

    In the last three decades, information modelling and knowledge bases have become essentially important subjects, not only in academic communities related to information systems and computer science but also in the business area where information technology is applied. The series of European-Japanese Conferences on Information Modelling and Knowledge Bases (EJC) originally started as a co-operation initiative between Japan and Finland in 1982. The practical operations were then organised by Professor Ohsuga in Japan and Professors Hannu Kangassalo and Hannu Jaakkola in Finland (Nordic countries). The geographical scope has since expanded to cover Europe and other countries as well. A workshop characteristic - discussion, ample time for presentations, and a limited number of participants (50) and papers (30) - is typical of the conference. Suggested topics include, but are not limited to: 1. Conceptual modelling: Modelling and specification languages; Domain-specific conceptual modelling; Concepts, concept theories and ontologies; Conceptual modelling of large and heterogeneous systems; Conceptual modelling of spatial, temporal and biological data; Methods for developing, validating and communicating conceptual models. 2. Knowledge and information modelling and discovery: Knowledge discovery, knowledge representation and knowledge management; Advanced data mining and analysis methods; Conceptions of knowledge and information; Modelling information requirements; Intelligent information systems; Information recognition and information modelling. 3. Linguistic modelling: Models of HCI; Information delivery to users; Intelligent informal querying; Linguistic foundation of information and knowledge; Fuzzy linguistic models; Philosophical and linguistic foundations of conceptual models. 4. Cross-cultural communication and social computing: Cross-cultural support systems; Integration, evolution and migration of systems; Collaborative societies; Multicultural web-based software systems; Intercultural collaboration and support systems; Social computing, behavioral modeling and prediction. 5. Environmental modelling and engineering: Environmental information systems (architecture); Spatial, temporal and observational information systems; Large-scale environmental systems; Collaborative knowledge base systems; Agent concepts and conceptualisation; Hazard prediction, prevention and steering systems. 6. Multimedia data modelling and systems: Modelling multimedia information and knowledge; Content-based multimedia data management; Content-based multimedia retrieval; Privacy and context enhancing technologies; Semantics and pragmatics of multimedia data; Metadata for multimedia information systems. Overall we received 56 submissions. After careful evaluation, 16 papers were selected as long papers, 17 as short papers, 5 as position papers, and 3 for presentation of perspective challenges. We thank all colleagues for their support of this issue of the EJC conference, especially the program committee, the organising committee, and the programme coordination team. The long and short papers presented at the conference are revised after the conference and published in the series “Frontiers in Artificial Intelligence” by IOS Press (Amsterdam). The books “Information Modelling and Knowledge Bases” are edited by the Editing Committee of the conference. We believe that the conference will be productive and fruitful in advancing the research and application of information modelling and knowledge bases. Bernhard Thalheim, Hannu Jaakkola, Yasushi Kiyoki