Determining the most representative image on a Web page

Author: Frasincar F. (Flavius)
Vyas K. (Krishna)
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019

We investigate how to determine the most representative image on a Web page. This problem has not been thoroughly investigated and, to date, only expert-based algorithms have been proposed in the literature. We attempt to improve the performance of known algorithms with the use of Support Vector Machines (SVM). In addition, our algorithm distinguishes itself from the existing literature by introducing novel image features, including previously unused meta-data protocols. We also design and test a less restrictive ranking methodology in the image preprocessing stage of our algorithm. We find that applying the SVM framework with our improved classification methodology increases the F1 score from 27.2% to 38.5% compared to a state-of-the-art method. Introducing novel image features and applying backward feature selection raises the F1 score to 40.0%. Lastly, we use a class-weighted SVM to resolve the imbalance in the number of representative images. This final modification improves the classification performance of our algorithm even further to 43.9%, outperforming our benchmark algorithms, including those of Facebook and Google. Given this superior performance, suggested beneficiaries are the search engine and image retrieval communities, including the commercial sector.
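The abstract does not say how the class weights are chosen or how the scores are computed in detail. As a rough illustration only, the sketch below assumes the common inverse-frequency heuristic for class weights and the standard definition of F1 for the positive ("representative") class. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency (an assumed heuristic).

    weight(c) = n_samples / (n_classes * count(c)), so rare classes such as
    "representative image" receive a larger misclassification penalty.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: one representative image (label 1) among ten candidates.
weights = balanced_class_weights([1] + [0] * 9)
score = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
```

With weights of this kind, an error on the rare positive class costs more during training than an error on the majority class, which is the general idea behind a class-weighted SVM.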

A Temporal Web Ontology Language

Author: Frasincar F. (Flavius)
Kaymak U. (Uzay)
Milea V. (Viorel)
Publication venue: Milea, V. (Viorel)
Publication date: 01/01/2009

The Web Ontology Language (OWL) is the most expressive standard language for modeling ontologies on the Semantic Web. In this paper, we present a temporal extension of the very expressive fragment SHIN(D) of the OWL-DL language, resulting in the tOWL language. Through a layered approach we introduce three extensions: i) Concrete Domains, which allows the representation of restrictions using concrete domain binary predicates, ii) Temporal Representation, which introduces timepoints, relations between timepoints, intervals, and Allen’s 13 interval relations into the language, and iii) TimeSlices/Fluents, which implements a perdurantist view on individuals and allows for the representation of complex temporal aspects, such as process state transitions. We illustrate the expressiveness of the newly introduced language by providing a TBox representation of Leveraged Buy Out (LBO) processes in financial applications and an ABox representation of one specific LBO.
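To make Allen's 13 interval relations concrete, here is a minimal sketch that classifies the relation between two intervals given as numeric `(start, end)` pairs. It is not tOWL syntax and not the authors' implementation; the function name and the tuple representation are assumptions chosen for illustration.

```python
def allen_relation(a, b):
    """Return the name of the Allen relation that holds from interval a to b.

    Each interval is a (start, end) tuple with start < end.
    """
    (a_start, a_end), (b_start, b_end) = a, b
    if a_start >= a_end or b_start >= b_end:
        raise ValueError("intervals must satisfy start < end")
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    # Only proper overlaps remain: both endpoints differ and the
    # intervals share some, but not all, of their extent.
    return "overlaps" if a_start < b_start else "overlapped-by"


relation = allen_relation((1, 4), (2, 6))
```

Exactly one of the 13 relations holds for any pair of proper intervals, which is why the checks can be written as a simple cascade.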

EUR Research Repository

News recommendations using CF-IDF

Author: Frasincar F.
Hogenboom A.C.
Jong de, F.M.G.
Kaymak U.
Publication venue: BNAIC
Publication date: 01/01/2011

Most traditional recommendation algorithms are based on TF-IDF, a term-based weighting method. This paper proposes a new method for recommending news items based on the weighting of the occurrences of references to concepts, which we call Concept Frequency-Inverse Document Frequency (CF-IDF). In an experimental setup we apply CF-IDF to a set of newswires in which we detect 1,167 instances of a set of 65 concepts from a domain ontology. The proposed method yields significantly better results with respect to accuracy, recall, and F1 than the TF-IDF method we use as a basis for comparison.
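The abstract does not give the CF-IDF formula. A minimal sketch, assuming it mirrors TF-IDF with concept references in place of terms (relative concept frequency within a news item multiplied by the log inverse document frequency), could look as follows. The concept identifiers are hypothetical, and detecting concepts in text with a domain ontology is outside this sketch.

```python
import math
from collections import Counter


def cf_idf(docs):
    """Compute assumed CF-IDF weights.

    docs: list of news items, each a list of detected concept identifiers.
    Returns one dict per item, mapping concept -> weight.
    """
    n_docs = len(docs)
    # Number of items in which each concept occurs at least once.
    doc_freq = Counter()
    for concepts in docs:
        doc_freq.update(set(concepts))

    weights = []
    for concepts in docs:
        counts = Counter(concepts)
        total = sum(counts.values())
        weights.append({
            c: (n / total) * math.log(n_docs / doc_freq[c])
            for c, n in counts.items()
        })
    return weights


# Hypothetical concept references detected in three news items.
items = [["Google", "Google", "Apple"], ["Apple"], ["Apple", "Microsoft"]]
w = cf_idf(items)
```

In this toy input, "Apple" occurs in every item and therefore receives weight 0, while "Google" is both frequent in the first item and rare across items, so it dominates that item's profile.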

A lexical approach for taxonomy mapping

Author: Frasincar F. (Flavius)
Nederstigt L.J. (Lennart)
Vandic D. (Damir)
Publication venue
Publication date: 01/03/2016

Obtaining a useful, complete overview of Web-based product information has become difficult due to the ever-growing amount of information available in online shops. Findings from previous studies suggest that better search capabilities, such as the exploitation of annotated data, are needed to keep online shopping transparent for the user. Annotations can, for example, help present information from multiple sources in a uniform manner. In order to support the product data integration process, we propose an algorithm that can autonomously map heterogeneous product taxonomies from different online shops. The proposed approach uses word sense disambiguation techniques, approximate lexical matching, and a mechanism that deals with composite categories. Our algorithm’s performance compared favorably against two other state-of-the-art taxonomy mapping algorithms on three real-life datasets. The results show that the F1-measure for our algorithm is on average 60% higher than that of a state-of-the-art product taxonomy mapping algorithm.
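The abstract does not specify which approximate lexical matching measure is used. As one possible illustration, the sketch below assumes a normalized Levenshtein similarity between category names; the function names and example category names are hypothetical, and the word sense disambiguation and composite-category steps are not covered.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def lexical_similarity(a, b):
    """Case-insensitive similarity in [0, 1]; 1.0 means identical names."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(a, b) / longest


# Hypothetical category names from two different shop taxonomies.
sim = lexical_similarity("Televisions", "Television")
```

A mapping step could then accept a candidate target category only when its similarity to the source category exceeds a chosen threshold.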

Prediction of the MSCI EURO index based on fuzzy grammar fragments extracted from European central bank statements

Author: Almeida R.J.
Frasincar F.
Kaymak U.
Milea D.V.
Sharef N.M.
Publication venue: Institute of Electrical and Electronics Engineers
Publication date: 01/01/2010

We focus on predicting the movement of the MSCI EURO index based on European Central Bank (ECB) statements. For this purpose we learn and extract fuzzy grammars from the text of the ECB statements. Based on a set of selected General Inquirer (GI) categories, the extracted fuzzy grammars are grouped around individual content categories. The frequencies at which these fuzzy grammars are encountered in the text constitute the input to a Fuzzy Inference System (FIS). The FIS maps these frequencies to the levels of the MSCI EURO index. Ultimately, the goal is to predict whether the MSCI EURO index will exhibit upward or downward movement based on the content of ECB statements, as quantified through the use of fuzzy grammars and GI content categories.

Financial news analysis using a semantic web approach

Author: Frasincar F.
Kaymak U.
Mast L.
Micu A.
Milea D.V.
Publication venue: 'IGI Global'
Publication date: 01/01/2008

In this paper we present StockWatcher, an OWL-based web application that enables the extraction of relevant news items from RSS feeds concerning the NASDAQ-100 listed companies. The application's goal is to present a customized, aggregated view of the news categorized by different topics. We distinguish between four relevant news categories: i) news regarding the company itself, ii) news regarding direct competitors of the company, iii) news regarding important people of the company, and iv) news regarding the industry in which the company is active. At the same time, the system presented in this chapter is able to rate these news items based on their relevance. We identify three possible effects that a news message can have on the company, and thus on the stock price of that company: i) positive, ii) negative, and iii) neutral. Currently, StockWatcher provides support for the NASDAQ-100 companies. The selection of the relevant news items is based on a customizable user portfolio that may consist of one or more of these companies

Computational content analysis of European Central Bank statements

Author: Almeida R.J.
Frasincar F.
Kaymak U.
Milea D.V.
Sharef N.M.
Publication venue
Publication date: 01/01/2012

In this paper we present a framework for the computational content analysis of European Central Bank (ECB) statements. Based on this framework, we provide two approaches that can be used in a practical context. Both approaches use the content of ECB statements to predict upward and downward movement in the MSCI EURO index. General Inquirer (GI) is used for the quantification of the content of the statements. In the first approach, we rely on the frequency of adjectives in the text of the ECB statements in relation to the content categories they represent. The second approach uses fuzzy grammar fragments composed of economic terms and content categories. Our results indicate that the two proposed approaches perform better than a random classifier for predicting upward or downward movement of the MSCI EURO index.

Semantic web-based knowledge acquisition using key events from news

Author: F Frasincar
F P Hogenboom
U Kaymak
Publication venue
Publication date: 30/04/2020

Hermes is an ontology-based framework for building news personalization services, which focuses on news classification and knowledge base updating. The framework also allows for news querying and result presentation. In this paper, we focus on the techniques involved in keeping Hermes' internal knowledge base up-to-date. Essentially, our semi-automatic approach to knowledge acquisition from news is based on ontologies and lexico-semantic patterns.

CiteSeerX

Automatically Building Financial Sentiment Lexicons While Accounting for Negation

Author: Bos T. (Thomas)
Frasincar F. (Flavius)
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/02/2021

Financial investors make trades based on available information. Previous research has shown that microblogs are a useful source for supporting stock market decisions. However, the financial domain lacks specific sentiment lexicons that could be utilized to extract the sentiment from these microblogs. In this research, we investigate automatic approaches that can be used to build financial sentiment lexicons. We introduce weighted versions of the Pointwise Mutual Information approaches to build sentiment lexicons automatically. Furthermore, existing approaches often neglect negation when building sentiment lexicons. We therefore also propose two methods (Negated Word and Flip Sentiment) that extend the lexicon building approaches to take negation into account when constructing a sentiment lexicon. We build the financial sentiment lexicons by leveraging 200,000 messages from StockTwits. We evaluate the constructed financial sentiment lexicons in two different sentiment classification tasks (unsupervised and supervised). In addition, the created financial sentiment lexicons are compared with each other and with other existing sentiment lexicons. The best performing financial sentiment lexicon is built by combining our Weighted Normalized Pointwise Mutual Information approach with the Negated Word appro