
    Semantic Knowledge Graphs for the News: A Review

    ICT platforms for news production, distribution, and consumption must exploit the ever-growing availability of digital data. These data originate from different sources and in different formats; they arrive at different velocities and in different volumes. Semantic knowledge graphs (KGs) are an established technique for integrating such heterogeneous information. The technique is therefore well aligned with the needs of news producers and distributors, and it is likely to become increasingly important for the news industry. This article reviews the research on using semantic knowledge graphs for the production, distribution, and consumption of news. The purpose is to present an overview of the field, to investigate what it encompasses, and to suggest opportunities and needs for further research and development.
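As a minimal illustration of the integration idea (not taken from the article), heterogeneous news metadata can be stored as subject-predicate-object triples and queried uniformly; the entity and relation names below are invented:

```python
# Minimal sketch of a semantic knowledge graph as subject-predicate-object
# triples, held in memory. Entity and relation names are illustrative
# assumptions, not drawn from the reviewed article.
from collections import defaultdict

class NewsKG:
    def __init__(self):
        # index: subject -> predicate -> set of objects
        self._spo = defaultdict(lambda: defaultdict(set))

    def add(self, subj, pred, obj):
        self._spo[subj][pred].add(obj)

    def objects(self, subj, pred):
        """Return all objects linked from subj via pred."""
        return self._spo[subj][pred]

kg = NewsKG()
# Integrate facts from two hypothetical heterogeneous sources.
kg.add("article:42", "mentions", "entity:ACME")
kg.add("article:42", "publishedBy", "source:Reuters")
kg.add("entity:ACME", "type", "Company")

print(kg.objects("article:42", "mentions"))  # {'entity:ACME'}
```

Because every source contributes plain triples, new feeds can be merged without schema changes, which is the integration property the review highlights.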

    Using linguistic graph similarity to search for sentences in news articles

    With the volume of daily news growing too large for any individual to handle, there is a clear need for effective search algorithms. Traditional bag-of-words approaches are inherently limited because they ignore much of the information embedded in the structure of the text, so in this paper we propose a linguistic approach to search called Destiny. With Destiny, sentences, both from news items and from user queries, are represented as graphs in which nodes represent the words of the sentence and edges represent the grammatical relations between them. The proposed algorithm is evaluated against a TF-IDF baseline using a custom corpus of user-rated sentences. Destiny significantly outperforms TF-IDF in terms of Mean Average Precision, normalized Discounted Cumulative Gain, and Spearman's Rho.
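The core matching idea can be sketched as follows, assuming hand-written dependency triples and plain Jaccard overlap of labelled edges; Destiny's actual graph-matching algorithm is richer than this simplification:

```python
# Simplified sketch of graph-based sentence similarity in the spirit of
# Destiny: each sentence is a graph of words linked by grammatical
# relations, and similarity here is the Jaccard overlap of labelled
# edges. The dependency triples are hand-written for illustration; a
# real system would derive them with a parser.

def edge_similarity(edges_a, edges_b):
    """Jaccard similarity over (head, relation, dependent) edges."""
    a, b = set(edges_a), set(edges_b)
    return len(a & b) / len(a | b) if a | b else 0.0

query = [("bank", "nsubj", "announced"), ("announced", "dobj", "merger")]
sent1 = [("bank", "nsubj", "announced"), ("announced", "dobj", "profit")]
sent2 = [("team", "nsubj", "won"), ("won", "dobj", "match")]

print(edge_similarity(query, sent1))  # 1 shared edge of 3 distinct -> 0.333...
print(edge_similarity(query, sent2))  # no shared edges -> 0.0
```

Unlike bag-of-words, this scoring rewards sentences that share grammatical structure with the query, not just vocabulary.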

    Automated Detection of Financial Events in News Text

    Today’s financial markets are inextricably linked with financial events like acquisitions, profit announcements, or product launches. Information extracted from news messages that report on such events could hence be beneficial for financial decision making. The ubiquity of news, however, makes manual analysis impossible, and due to the unstructured nature of text, the (semi-)automatic extraction and application of financial events remains a non-trivial task. Therefore, the studies composing this dissertation investigate 1) how to accurately identify financial events in news text, and 2) how to effectively use such extracted events in financial applications. Based on a detailed evaluation of current event extraction systems, this thesis presents a competitive, knowledge-driven, semi-automatic system for financial event extraction from text. A novel pattern language, which makes clever use of the system’s underlying knowledge base, allows for the definition of simple yet expressive event extraction rules that can be applied to natural language texts. The system’s knowledge-driven internals remain synchronized with the latest market developments through an accompanying event-triggered update language for knowledge bases, which enables the definition of update rules. Additional research covered by this dissertation investigates the practical applicability of extracted events. In automated stock trading experiments, the best performing trading rules make use not only of traditional numerical signals but also of news-based event signals. Moreover, when stock data are cleaned of disruptions caused by financial events, financial risk analyses yield more accurate results. These results suggest that events detected in news can be used advantageously as supplementary parameters in financial applications.
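A toy version of a knowledge-driven extraction rule might look like the sketch below; the company gazetteer and the rule form are invented stand-ins for the dissertation's knowledge base and pattern language:

```python
# Hedged sketch of knowledge-driven event extraction: an acquisition
# event fires when one known company name precedes an acquisition
# trigger verb and another known company name. The company list stands
# in for the system's knowledge base; the real pattern language is far
# more expressive than this regex rule.
import re

companies = {"ACME Corp", "Globex"}  # would come from the knowledge base

def extract_acquisitions(text):
    events = []
    for buyer in companies:
        for target in companies:
            if buyer == target:
                continue
            pattern = re.compile(
                re.escape(buyer) + r"\s+(?:acquires|acquired|buys)\s+" + re.escape(target)
            )
            if pattern.search(text):
                events.append(("ACQUISITION", buyer, target))
    return events

print(extract_acquisitions("ACME Corp acquires Globex for $2bn."))
# [('ACQUISITION', 'ACME Corp', 'Globex')]
```

Keeping the gazetteer in an updatable knowledge base, as the dissertation's update language does, lets the same rule track newly listed companies without rewriting patterns.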

    Personalized Financial News Recommendation Algorithm Based on Ontology

    To deal with the challenge of information overload, in this paper we propose a financial news recommendation algorithm that helps users find articles they are interested in reading. To address the ambiguity problem, a newly presented OF-IDF method is employed to represent unstructured text data in the form of key concepts, synonyms, and synsets, all of which are stored in the domain ontology. The recommendation algorithm builds user profiles from observed behavior to detect genuine interests, and predicts current interests automatically and in real time by applying the idea of relevance feedback. Finally, an experiment conducted on a financial news dataset demonstrates that the proposed algorithm significantly outperforms a traditional recommender.
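The concept-level weighting idea can be sketched as follows; the synonym table and the exact weighting formula are illustrative assumptions, since the abstract does not spell out the OF-IDF computation:

```python
# Sketch of concept-based weighting in the spirit of OF-IDF: words are
# first mapped to ontology concepts via a synonym table, so "stock" and
# "share" count toward the same concept before TF-IDF-style weighting.
# The synonym table and formula are assumptions for illustration, not
# the paper's exact method.
import math
from collections import Counter

synonyms = {"stock": "EQUITY", "share": "EQUITY", "bond": "DEBT"}

def concept_tfidf(doc_tokens, corpus):
    concepts = [synonyms.get(t, t) for t in doc_tokens]
    tf = Counter(concepts)
    n_docs = len(corpus)
    scores = {}
    for concept, freq in tf.items():
        # document frequency counted over concept-mapped documents
        df = sum(1 for d in corpus if concept in {synonyms.get(t, t) for t in d})
        scores[concept] = freq * math.log((1 + n_docs) / (1 + df))
    return scores

corpus = [["stock", "rises"], ["bond", "falls"], ["share", "offer"]]
print(concept_tfidf(["stock", "share", "rises"], corpus))
```

Mapping synonyms to one concept before weighting is what lets the recommender treat "stock" and "share" as the same interest, addressing the ambiguity problem the paper targets.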

    Personalized News Recommender using Twitter

    Online news reading has become a widely popular way to read articles from news sources around the globe. With the enormous number of news articles available, users are easily swamped by information of little interest to them. News recommender systems are one approach to helping users find interesting articles: they present articles to individual users based on their interests rather than in order of occurrence. In this thesis, we present our research on developing a personalized news recommendation system with the help of the popular micro-blogging service Twitter. News articles are ranked based on their popularity, which is identified from tweets on Twitter's public timeline. In addition, user profiles are built from each user's interests, and articles are ranked by matching their characteristics against the profile. Combining these two approaches, we present a hybrid news recommendation model that recommends interesting news stories based on both their popularity and their relevance to the user profile.
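One plausible way to combine the two signals is a weighted sum of normalized popularity and profile similarity; the weights, tweet-derived popularity values, and profile vectors below are invented for illustration:

```python
# Hedged sketch of the hybrid ranking idea: each article's score mixes
# Twitter-derived popularity with cosine similarity to the user profile.
# The mixing weight alpha, the popularity values, and the term vectors
# are all illustrative assumptions, not the thesis's actual parameters.
import math

def cosine(u, v):
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in keys)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def hybrid_score(article, profile, alpha=0.5):
    # popularity is assumed pre-normalized to [0, 1]
    return alpha * article["popularity"] + (1 - alpha) * cosine(article["terms"], profile)

profile = {"politics": 1.0, "economy": 0.5}
articles = [
    {"id": "a", "popularity": 0.9, "terms": {"sports": 1.0}},
    {"id": "b", "popularity": 0.4, "terms": {"politics": 1.0, "economy": 1.0}},
]
ranked = sorted(articles, key=lambda a: hybrid_score(a, profile), reverse=True)
print([a["id"] for a in ranked])  # ['b', 'a']: relevance outweighs raw popularity
```

Tuning alpha trades off trending stories against personally relevant ones, which is exactly the balance a hybrid recommender must strike.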

    Recommender systems in model-driven engineering: A systematic mapping review

    Recommender systems are information filtering systems used in many online applications, such as music and video broadcasting and e-commerce platforms. They are also increasingly being applied to facilitate software engineering activities. Following this trend, we are witnessing growing research interest in recommendation approaches that assist with modelling tasks and model-based development processes. In this paper, we report on a systematic mapping review (based on the analysis of 66 papers) that classifies the existing research work on recommender systems for model-driven engineering (MDE). This study aims to serve as a guide for tool builders and researchers in understanding the MDE tasks that might be subject to recommendations, the applicable recommendation techniques and evaluation methods, and the open challenges and opportunities in this field of research. This work has been funded by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 813884 (Lowcomote [134]), by the Spanish Ministry of Science (projects MASSIVE, RTI2018-095255-B-I00, and FIT, PID2019-108965GB-I00), and by the R&D programme of Madrid (Project FORTE, P2018/TCS-431).

    Approaches to implement and evaluate aggregated search

    Aggregated search, or aggregated retrieval, can be seen as a third paradigm for information retrieval after Boolean retrieval and ranked retrieval. The first two return, respectively, sets and ranked lists of search results, and it is up to the time-poor user to scroll through this set or list, scan different documents, and assemble the needed information, which may be spread across several documents. Aggregated search, in contrast, aims not only to identify relevant information nuggets but also to assemble these nuggets into a coherent answer. In this work, we first analyze related work on aggregated search using a general framework composed of three steps: query dispatching, nugget retrieval, and result aggregation. Existing approaches are grouped by related domain, such as relational search, federated search, question answering, and natural language generation. We then focus on the two directions we believe are the most promising: relational aggregated search and cross-vertical aggregated search.

* Relational aggregated search targets not only relevant information but also the relations between relevant information nuggets, which are used to assemble the final answer in a reasoned way. Three types of queries benefit most directly from this paradigm: attribute queries (e.g. president of France, GDP of Italy, mayor of Glasgow), instance queries (e.g. France, Italy, Glasgow, Nokia e72), and class queries (e.g. countries, French cities, Nokia mobile phones). We call these relational queries, and we tackle three important problems concerning retrieval and aggregation for them. First, we propose an attribute-retrieval approach, arguing that attribute retrieval is one of the crucial problems to be solved. Our approach relies on HTML tables on the Web: it identifies useful and relevant tables, taking table quality and attribute relevance into account, and extracts relevant attributes for arbitrary queries, independently of their class. Experimental results show that the approach is effective, answers many queries with high coverage, and outperforms state-of-the-art techniques. Second, we deal with result aggregation, where we are given relevant instances and attributes for a query. This problem is particularly interesting for class queries, whose final answer is a table with many instances (rows) and attributes (columns). To guarantee the quality of the aggregated result, we weight instances and attributes so as to promote the most representative and important ones. The third problem concerns instances of the same class (e.g. France, Germany, and Italy are all instances of the same class); here we propose an approach that massively extracts such instances from HTML lists on the Web. All proposed approaches work at Web scale and are important, complementary building blocks for relational aggregated search. Finally, we present four prototype applications for relational aggregated search. They answer different types of queries with relevant, relational results: they retrieve and assemble not only attributes and their values, but also passages and images, into a final focused answer. For example, the query "Nokia e72" is answered with attributes (e.g. price, weight, battery life), passages (e.g. description, reviews), and images. Results are encouraging and illustrate the utility of relational aggregated search.

* The second direction, cross-vertical aggregated search, consists of assembling results from several vertical search engines (e.g. image search, video search, traditional Web search) into a single interface. Approaches exist in both research and industry; our contribution concerns mainly the evaluation and the advantages of this paradigm. We propose four studies that simulate different search situations, each tested with 100 queries and 9 vertical sources. From these studies we clearly identify advantages of cross-vertical aggregated search as well as several issues with evaluation setups; in particular, traditional information-retrieval evaluation, while not the fastest, remains the most realistic.

To conclude, we propose different approaches and studies along two promising research directions. On one hand, we address three important problems of relational aggregated search, leading to four prototype applications with encouraging results. On the other hand, we investigate the interest and evaluation of cross-vertical aggregated search, identifying its advantages and its evaluation issues. In a long-term perspective, we foresee combining these two kinds of approaches to provide relational and cross-vertical information retrieval that incorporates more focus, structure, and multimedia in search results.
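The attribute-retrieval idea of reading HTML tables as (attribute, value) pairs can be sketched minimally as below; the real system additionally scores table quality and attribute relevance, which is omitted here:

```python
# Minimal sketch of attribute extraction from an HTML table: two-column
# rows are read as (attribute, value) pairs. The thesis's full approach
# also weighs table quality and attribute relevance; this sketch only
# shows the structural extraction step, using the stdlib parser.
from html.parser import HTMLParser

class TwoColTable(HTMLParser):
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell, self._in_cell = [], [], "", False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell, self._cell = True, ""

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
            self._row.append(self._cell.strip())
        elif tag == "tr" and len(self._row) == 2:
            self.rows.append(tuple(self._row))

    def handle_data(self, data):
        if self._in_cell:
            self._cell += data

markup = ("<table><tr><td>Weight</td><td>128 g</td></tr>"
          "<tr><td>Battery life</td><td>490 h</td></tr></table>")
p = TwoColTable()
p.feed(markup)
print(p.rows)  # [('Weight', '128 g'), ('Battery life', '490 h')]
```

Aggregating such pairs across many tables, with per-table quality weights, is what lets a query like "Nokia e72" be answered with a ranked attribute list.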

    Personalization of news web portal content using information extraction techniques and weighted Voronoi diagrams

    News web portals present information, organized in a predefined topic taxonomy and covering all aspects of our daily lives, in both multimedia and textual formats. The presented information has a high refresh rate and thus offers both a local and a global snapshot of the world. This thesis presents information extraction techniques for web news portals and their use in standardizing categorization schemes and automatically classifying newly published content. Weighted Voronoi diagrams are proposed as the personalization method. The aim of the study is to create a virtual profile, at the individual level, based on the semantic value of the information in visited nodes (HTML-formatted web pages). The results can greatly contribute to applying personalization data to specific information sources, including various web news portals, and a publicly available collection of prepared data enables future research in this domain. The scientific contributions of this doctoral thesis are therefore: a universal classification scheme based on ODP taxonomy data; a method for extracting information about user preferences from user behaviour data collected while using a Web browser; and a personalization system based on weighted Voronoi diagrams.

One way to address the problems caused by the overproduction of information is to personalize information sources, in our case the WWW environment, by creating virtual profiles based on an analysis of users' behavioural characteristics, with the goal of grading the importance of information on an individual basis. Personalization itself is most widely used in the field of information retrieval. A review of prior research highlights several different approaches to personalizing available content: ontological approaches, contextual models, and data mining. These approaches are the most prevalent in the reviewed literature. The literature analysis also revealed the lack of a unified taxonomy of terms used to annotate information nodes. The prevailing annotation approach is a tagging system based on user input. The reviewed papers indicate that, for popular tags, users on different systems attach the same annotations to the same and/or similar objects; that the synonym problem exists but is negligible given a sufficient amount of data; and that annotations used by ordinary users and domain experts overlap in 52% of cases. These findings point to the absence of a unified system for labelling information nodes. Tagging systems carry a large amount of "information noise" because annotation is individual in nature and directly tied to the user's knowledge of the node's domain. As a potential solution to this shortcoming, the use of existing taxonomies defined by web directories is proposed. Among several candidate web directories, the literature most often cites the ODP web directory as the highest-quality taxonomy for hierarchical domain categorization of information nodes. The use of ODP as a taxonomy is mentioned in several papers studied during the preliminary research. Classifying information nodes with the ODP taxonomy makes it possible to determine domain membership, which in turn allows assigning each information node a value of membership in a particular domain. Given the complex structure of the ODP taxonomy (12 hierarchical levels, 17 top-level categories) and the large number of potential categories, the thesis proposes using the ODP taxonomy to classify information nodes down to level 6. Besides recommending the number of hierarchical levels to use when analysing the ODP structure, it also stresses the need for deep document classification.

The literature analysis further showed that personalization is addressed primarily in the domain of information retrieval through WWW interfaces, and that personalization of information available through web portals is under-researched. The numerous works consulted during the preliminary research phase drew on various data sources for analysis: server log files, personal browsing history from browser log files, applications that track the user's interaction with the system, cookies, and others. Data collected from one or more of these sources provide insight into an individual user's movement within a defined information and time frame. In the reviewed literature, such data are used to personalize information, but not at the individual level; instead, users are grouped into thematically similar clusters. The goal of this thesis is to test existing methods recognized as useful for further work and to improve them with weighted Voronoi diagrams in order to achieve personalization at the individual level. The use of weighted Voronoi diagrams has not previously been reported in the literature and thus represents an innovation in the field of information personalization. Works that focus on recognizing usage patterns of information nodes, of which there is a significant number, will also be helpful in this process. The existence of a behavioural pattern linked to long-term and/or short-term data on the user's movement through the information space enables better filtering and personalization of available information. Since the aim of this thesis is to demonstrate the possibility of individual personalization, the potential of weighted Voronoi diagrams for building a virtual semantic profile and personalizing information has been recognized.
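A hedged sketch of how a multiplicatively weighted Voronoi diagram can grade interests: each interest concept is a site in a feature plane with a weight derived from observed behaviour, and a new page (point) is assigned to the site minimizing distance divided by weight, so heavily weighted interests claim larger regions. The coordinates and weights below are invented; the thesis builds the profile from real browsing data:

```python
# Sketch of interest assignment via a multiplicatively weighted Voronoi
# diagram: a point belongs to the site minimizing dist(point, site) / w.
# Sites, coordinates, and weights are illustrative assumptions standing
# in for concepts in the user's virtual semantic profile.
import math

sites = {
    # concept: ((x, y), weight)
    "sports":   ((0.0, 0.0), 1.0),
    "politics": ((4.0, 0.0), 3.0),  # stronger interest -> larger region
}

def assign(point):
    def weighted_dist(item):
        (x, y), w = item[1]
        return math.dist(point, (x, y)) / w
    return min(sites.items(), key=weighted_dist)[0]

print(assign((1.5, 0.0)))  # 'politics': 1.5/1.0 = 1.5 vs 2.5/3.0 ≈ 0.83
print(assign((0.1, 0.0)))  # 'sports': the point is almost on that site
```

Raising a concept's weight as the user visits related pages enlarges its Voronoi cell, which is one way the profile can grade importance at the individual level.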
