72 research outputs found

Event detection in Twitter with an event knowledge base

Author: Grobelnik Marko
Mladenić Dunja
Rei Luis
Publication venue: Institut Jožef Stefan
Publication date: 08/01/2016

Digital repository of Slovenian research organizations

Autonomous Sensor Data Cleaning in Stream Mining Setting

Author: Dunja Mladenić
Klemen Kenda
Publication venue: IRENET, Society for Advancing Innovation and Research in Economy
Publication date: 01/01/2018

Background: Internet of Things (IoT), earth observation and big scientific experiments are sources of extensive amounts of sensor big data today. We are faced with large amounts of data with low measurement costs. A standard approach in such cases is a stream mining approach, implying that we look at a particular measurement only once during the real-time processing. This requires the methods to be completely autonomous. In the past, very little attention was given to the most time-consuming part of the data mining process, i.e. data pre-processing. Objectives: In this paper we propose an algorithm for data cleaning, which can be applied to real-world streaming big data. Methods/Approach: We use the short-term prediction method based on the Kalman filter to detect admissible intervals for future measurements. The model can be adapted to the concept drift and is useful for detecting random additive outliers in a sensor data stream. Results: For datasets with low noise, our method has proven to perform better than the method currently commonly used in batch processing scenarios. Our results on higher noise datasets are comparable. Conclusions: We have demonstrated a successful application of the proposed method in real-world scenarios including the groundwater level, server load and smart-grid data.
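
The idea of using a short-term Kalman prediction to define an admissible interval for the next measurement can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a scalar random-walk state model, and the names `flag_outliers`, `q` (process noise), `r` (measurement noise) and `k` (interval width in standard deviations) are hypothetical choices made only for illustration. It also does not include the concept-drift adaptation mentioned in the abstract.

```python
import math

def flag_outliers(stream, q=1e-3, r=0.05, k=3.0):
    """Flag readings outside a predicted interval (scalar random-walk Kalman filter).

    q, r and k are assumed values, not taken from the paper.
    """
    x, p = stream[0], 1.0          # initial state estimate and variance (assumed)
    flags = [False]                # the first reading only initialises the filter
    for z in stream[1:]:
        p_pred = p + q             # predict: state unchanged, uncertainty grows
        s = p_pred + r             # variance of the predicted measurement
        half_width = k * math.sqrt(s)
        is_outlier = abs(z - x) > half_width   # outside admissible interval?
        flags.append(is_outlier)
        if is_outlier:
            p = p_pred             # skip the update so the outlier does not move x
        else:
            gain = p_pred / s      # Kalman gain
            x = x + gain * (z - x)
            p = (1.0 - gain) * p_pred
    return flags
```

Each reading is inspected once and then discarded, which matches the single-pass constraint of stream mining described above; for example, in `[10.0, 10.1, 9.9, 25.0, 10.0, 10.2]` only the reading `25.0` falls outside its predicted interval.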

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

ASSIGNING KEYWORDS TO DOCUMENTS USING MACHINE LEARNING

Author: Dunja Mladenić
Marko Grobelnik
Publication venue: Faculty of Organization and Informatics University of Zagreb
Publication date: 01/01/1999

This paper describes the usage of machine learning techniques to assign keywords to documents. The large hierarchy of documents available on the Web, the Yahoo hierarchy, is used here as a real-world problem domain. Machine learning techniques developed for learning on text data are used here in the hierarchical classification structure. The high number of features is reduced by taking into account the hierarchical structure and using a feature subset selection based on the method used in information retrieval. Documents are represented as word-vectors that include word sequences (n-grams) instead of just single words. The hierarchical structure of the examples and class values is taken into account when defining the subproblems and forming training examples for them. Additionally, a hierarchical structure of class values is used in classification, where only promising paths in the hierarchy are considered.
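
The document representation mentioned in this abstract, word-vectors that include word sequences (n-grams), can be sketched as follows. This is an illustrative assumption rather than the paper's implementation: the function name `ngram_vector`, the whitespace tokenisation and the default `max_n=2` are hypothetical, and the sketch covers neither the feature subset selection nor the hierarchical classification.

```python
from collections import Counter

def ngram_vector(text, max_n=2):
    """Count single words and word sequences up to max_n words long."""
    tokens = text.lower().split()      # naive whitespace tokenisation (assumed)
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts
```

Calling `ngram_vector("machine learning on text data")` produces a sparse vector with entries for both `"machine"` and `"machine learning"`, so multi-word terms become features in their own right.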

CiteSeerX

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

Editorial

Author: Dunja Mladenić
Marko Tadić
Ray J. Paul
Publication venue: SRCE - University Computing Centre
Publication date: 01/01/2010

Hrčak - Portal of scientific journals of Croatia

Contextualized Question Answering

Author: Blaž Fortuna
Boštjan Pajntar
Dunja Mladenić
Inna Novalija
Lorand Dali
Luka Bradeško
Marko Grobelnik
Publication venue: University of Zagreb - University Computing Centre
Publication date: 01/01/2010

The paper describes a system that enables accurate and easy-to-use contextualized question answering and provides document overview functionalities. The possibility of asking natural language questions enables a friendly interaction for the user. The contextualization is achieved by using an ontology. The answers are provided based on a domain-specific document collection of choice. The approach consists of several phases: data preparation, data enhancement, data indexing and handling questions. Every module uses state-of-the-art technologies that are shown to work in a complex pipeline to make question answering available on top of a given document repository with the context of ontologies, such as Cyc, ASFA and WordNet. The functioning of the proposed approach is demonstrated on English document collections on Aquatic Sciences and Fisheries (ASFA), using the Cyc ontology, the ASFA thesaurus as a domain-specific ontology and WordNet as a general ontology. Experimental evaluation has shown that the usage of ontologies increases the number of answers retrieved by about 60%. However, the number of answers that are actually correct increases by only 40% when using ontologies.

Crossref

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

NewsMeSH: a new classifier designed to annotate health news with MeSH headings

Author: Arrúe Gabaráin Mónica
Belar Oihana
Bidaurrazaga Joseba
Carlin Paul
Epelde Gorka
Fuart Flavio
Grobelnik Marko
Henderson Christine
Konttila Jenni
Mladenić Dunja
Novalija Inna
Pita-Costa Joao
Pääkkönen Jarmo
Rei Luis
Staines Anthony
Stopar Luka
Wallace J. G.
Publication venue: Elsevier BV
Publication date: 01/01/2021

Motivation: In the age of big data, the amount of scientific information available online dwarfs the ability of current tools to support researchers in locating and securing access to the necessary materials. Well-structured open data and the smart systems that make the appropriate use of it are invaluable and can help health researchers and professionals to find the appropriate information by, e.g., configuring the monitoring of information or refining a specific query on a disease. Methods: We present an automated text classifier approach based on the MEDLINE/MeSH thesaurus, trained on the manual annotation of more than 26 million expert-annotated scientific abstracts. The classifier was developed tailor-fit to the public health and health research domain experts, in the light of their specific challenges and needs. We have applied the proposed methodology on three specific health domains: the Coronavirus, Mental Health and Diabetes, considering the pertinence of the first, and the known relations with the other two health topics. Results: A classifier is trained on the MEDLINE dataset that can automatically annotate text, such as scientific articles, news articles or medical reports with relevant concepts from the MeSH thesaurus. Conclusions: The proposed text classifier shows promising results in the evaluation of health-related news. The application of the developed classifier enables the exploration of news and extraction of health-related insights, based on the MeSH thesaurus, through a similar workflow as in the usage of PubMed, with which most health researchers are familiar.

Open Research Online

University of Oulu Repository - Jultika

Ulster University's Research Portal