Negative Statements Considered Useful
Knowledge bases (KBs), pragmatic collections of knowledge about notable entities, are an important asset in applications such as search, question answering and dialogue. Rooted in a long tradition in knowledge representation, all popular KBs store only positive information and abstain from taking any stance towards statements not contained in them. In this paper, we make the case for explicitly stating interesting statements which are not true. Negative statements would be important to overcome current limitations of question answering, yet due to their potential abundance, any effort towards compiling them needs a tight coupling with ranking. We introduce two approaches towards compiling negative statements. (i) In peer-based statistical inference, we compare entities with highly related entities in order to derive potential negative statements, which we then rank using supervised and unsupervised features. (ii) In query-log-based text extraction, we use a pattern-based approach for harvesting search engine query logs. Experimental results show that both approaches hold promising and complementary potential. Along with this paper, we publish the first datasets on interesting negative information, containing over 1.1M statements for 100K popular Wikidata entities.
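The peer-based idea in (i) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `candidate_negatives`, the toy knowledge base, and the scoring rule (the fraction of peers holding a statement) are all assumptions made for illustration, standing in for the supervised and unsupervised ranking features the abstract mentions.

```python
from collections import Counter

def candidate_negatives(entity, peers, kb):
    """Rank statements that peers hold but `entity` does not.

    kb maps an entity name to a set of (property, value) statements.
    The score is the fraction of peers holding the statement, a simple
    unsupervised stand-in (assumption) for the paper's ranking features.
    """
    if not peers:
        return []
    known = kb.get(entity, set())
    # Count how many peers assert each statement.
    counts = Counter(stmt for p in peers for stmt in kb.get(p, set()))
    scored = [(stmt, n / len(peers)) for stmt, n in counts.items()
              if stmt not in known]
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda item: (-item[1], item[0]))

# Hypothetical toy data, not taken from Wikidata.
toy_kb = {
    "A": {("award", "X")},
    "B": {("award", "X"), ("award", "Y")},
    "C": {("award", "Y")},
    "D": {("award", "Y"), ("occupation", "physicist")},
}
ranked = candidate_negatives("A", ["B", "C", "D"], toy_kb)
```

Here the top candidate for entity `A` is "did not receive award Y", because all three peers hold that statement while `A` does not.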
Forecasting with time series imaging
Feature-based time series representations have attracted substantial
attention in a wide range of time series analysis methods. Recently, the use of
time series features for forecast model averaging has been an emerging research
focus in the forecasting community. Nonetheless, most of the existing
approaches depend on the manual choice of an appropriate set of features.
Exploiting machine learning methods to extract features from time series
automatically becomes crucial in state-of-the-art time series analysis. In this
paper, we introduce an automated approach to extract time series features based
on time series imaging. We first transform time series into recurrence plots,
from which local features can be extracted using computer vision algorithms.
The extracted features are used for forecast model averaging. Our experiments
show that forecasting based on automatically extracted features, with less
human intervention and a more comprehensive view of the raw time series data,
yields performance highly comparable to the best methods on the largest
forecasting competition dataset (M4) and outperforms the top methods on the
Tourism forecasting competition dataset.
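To make the imaging step concrete, here is a minimal sketch of a thresholded recurrence plot for a univariate series. The abstract does not specify the distance, threshold, or any embedding, so the absolute-difference distance and the `eps` parameter below are assumptions, and `recurrence_plot` is a hypothetical helper name.

```python
def recurrence_plot(series, eps):
    """Return a binary recurrence matrix for a univariate series.

    Cell (i, j) is 1 when observations i and j are within `eps` of each
    other, else 0. The absolute difference and the fixed threshold are
    illustrative assumptions, not the paper's stated configuration.
    """
    x = [float(v) for v in series]
    return [[1 if abs(a - b) <= eps else 0 for b in x] for a in x]

# A short alternating series produces a checkerboard pattern.
rp = recurrence_plot([0, 1, 0, 1], eps=0.5)
```

The resulting matrix can be treated as an image, which is what allows computer vision feature extractors to be applied to it.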
BlogForever D2.4: Weblog spider prototype and associated methodology
The purpose of this document is to present the evaluation of different solutions for capturing blogs and the established methodology, and to describe the developed blog spider prototype.
Fuzzy Content Mining for Targeted Advertisement
Content-targeted advertising systems are becoming an increasingly important funding source for free web services. Highly efficient content analysis is the key to such a system. This project aims to establish a content analysis engine based on fuzzy logic that can automatically analyze real user-posted Web documents such as blog entries. Based on the analysis result, the system matches and retrieves the most appropriate Web advertisements. The focus and the main complexity lie in how to better estimate and acquire the keywords that represent a given Web document. A fuzzy Web mining concept is applied to consider multiple factors of Web content together. A Fuzzy Ranking System is established based on certain fuzzy (and some crisp) rules, fuzzy sets, and membership functions to get the best candidate keywords. Once it has obtained the keywords, the system retrieves corresponding advertisements from certain providers through Web services as matched advertisements, similar to retrieving a product list from Amazon.com. In 87% of the cases, the results of this system can match the accuracy of the Google Adwords system. Furthermore, this expandable system will also be a solid base for further research and development on this topic.
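A keyword ranking that combines fuzzy and crisp rules might look like the sketch below. The abstract does not name the factors, rules, or membership functions used, so everything here is hypothetical: the two factors (term frequency and presence in the title), the ramp membership function, the rule weights, and the names `ramp`, `keyword_score`, and `rank_keywords`.

```python
def ramp(x, lo, hi):
    """Membership rising linearly from 0 at `lo` to 1 at `hi`."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)

def keyword_score(freq, in_title):
    """Score a candidate keyword from two hypothetical factors."""
    high_freq = ramp(freq, 1, 5)        # fuzzy set: "frequency is high"
    title = 1.0 if in_title else 0.0    # crisp factor
    # Rule 1: IF frequency is high AND word is in title THEN strong (min = AND).
    strong = min(high_freq, title)
    # Rule 2: IF frequency is high THEN moderate (assumed weight 0.6).
    moderate = 0.6 * high_freq
    # Aggregate the rule outputs with max (fuzzy OR).
    return max(strong, moderate)

def rank_keywords(candidates):
    """Sort candidate words by descending score, then alphabetically."""
    return sorted(candidates,
                  key=lambda w: (-keyword_score(*candidates[w]), w))

# Hypothetical candidates: word -> (frequency, appears in title)
candidates = {"fuzzy": (5, True), "mining": (5, False), "the": (1, False)}
ranking = rank_keywords(candidates)
```

With these assumed rules, a frequent word that also appears in the title outranks an equally frequent word that does not.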