PERCEIVE: Precipitation Data Characterization by means of Frequent Spatio-Temporal Sequences
Nowadays, large amounts of climatology data, including daily precipitation data, are collected by sensors located in different parts of the world. The data-driven analysis of these large data sets by means of scalable machine learning and data mining techniques makes it possible to extract interesting knowledge from the data, infer patterns and correlations among sets of spatio-temporal events, and characterize them. In this paper, we describe the PERCEIVE framework. PERCEIVE is a data-driven framework based on frequent spatio-temporal sequences that aims at extracting frequent correlations among spatio-temporal precipitation events. It is implemented using R and Apache Spark, for scalability reasons, and also provides a visualization module that can be used to intuitively show the extracted patterns. A preliminary set of experiments shows the efficiency and effectiveness of PERCEIVE.
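The abstract does not spell out the mining algorithm; as a rough illustration of frequent spatio-temporal sequence mining, the toy sketch below counts contiguous length-2 patterns over sequences of (region, intensity) precipitation events. All names and thresholds here are illustrative, not taken from PERCEIVE, whose actual mining runs at scale on Spark:

```python
from collections import Counter

def frequent_pairs(sequences, min_support):
    """Count contiguous length-2 event patterns across spatio-temporal
    sequences and keep those meeting the minimum support."""
    counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    return {pat: c for pat, c in counts.items() if c >= min_support}

# Toy daily precipitation events as (region, intensity) pairs.
seqs = [
    [("north", "heavy"), ("south", "light"), ("south", "heavy")],
    [("north", "heavy"), ("south", "light"), ("north", "light")],
    [("north", "heavy"), ("south", "light")],
]
patterns = frequent_pairs(seqs, min_support=3)
```

Here the only pattern with support 3 is heavy rain in the north followed by light rain in the south; a real miner would also handle longer patterns and time gaps.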
Learning Sentence-internal Temporal Relations
In this paper we propose a data-intensive approach for inferring
sentence-internal temporal relations. Temporal inference is relevant for
practical NLP applications which either extract or synthesize temporal
information (e.g., summarisation, question answering). Our method bypasses the
need for manual coding by exploiting the presence of markers like "after", which
overtly signal a temporal relation. We first show that models trained on main
and subordinate clauses connected with a temporal marker achieve good
performance on a pseudo-disambiguation task simulating temporal inference
(during testing the temporal marker is treated as unseen and the models must
select the right marker from a set of possible candidates). Secondly, we assess
whether the proposed approach holds promise for the semi-automatic creation of
temporal annotations. Specifically, we use a model trained on noisy and
approximate data (i.e., main and subordinate clauses) to predict
intra-sentential relations present in TimeBank, a corpus annotated with rich
temporal information. Our experiments compare and contrast several
probabilistic models differing in their feature space, linguistic assumptions
and data requirements. We evaluate performance against gold standard corpora
and also against human subjects.
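The pseudo-disambiguation setup described above can be sketched with a simple count-based model: train on clause pairs overtly joined by a temporal marker, then hide the marker and ask the model to pick it from a candidate set. The verbs, markers, and corpus below are invented for illustration; the paper's actual models are richer probabilistic ones:

```python
from collections import Counter

def train(clause_pairs):
    """Count (main-verb, subordinate-verb, marker) triples from clauses
    overtly joined by a temporal marker such as 'after' or 'before'."""
    counts = Counter()
    for main_v, sub_v, marker in clause_pairs:
        counts[(main_v, sub_v, marker)] += 1
    return counts

def predict_marker(counts, main_v, sub_v, candidates):
    """Pseudo-disambiguation: the marker is treated as unseen and the
    model must select the most likely candidate for this verb pair."""
    return max(candidates, key=lambda m: counts[(main_v, sub_v, m)])

corpus = [
    ("left", "finished", "after"),
    ("left", "finished", "after"),
    ("left", "finished", "before"),
    ("slept", "arrived", "before"),
]
model = train(corpus)
guess = predict_marker(model, "left", "finished", ["after", "before"])
```

For the pair (left, finished) the marker "after" is more frequent in the toy corpus, so it is selected.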
Creating Full Individual-level Location Timelines from Sparse Social Media Data
In many domain applications, a continuous timeline of human locations is
critical; for example for understanding possible locations where a disease may
spread, or the flow of traffic. While data sources such as GPS trackers or Call
Data Records are temporally-rich, they are expensive, often not publicly
available or garnered only in select locations, restricting their wide use.
Conversely, geo-located social media data are publicly and freely available,
but present challenges especially for full timeline inference due to their
sparse nature. We propose a stochastic framework, Intermediate Location
Computing (ILC) which uses prior knowledge about human mobility patterns to
predict every missing location from an individual's social media timeline. We
compare ILC with a state-of-the-art RNN baseline as well as methods that are
optimized for next-location prediction only. For three major cities, ILC
predicts the top-1 location for every missing slot in a timeline, at 1- and
2-hour resolution, with up to 77.2% accuracy (up to 6% higher than all
compared methods). Notably, ILC also outperforms the RNN in low-data
settings: both with very small numbers of users (under 50) and with more
users but sparser timelines. In general, the RNN model
needs a higher number of users to achieve the same performance as ILC. Overall,
this work illustrates the tradeoff between heuristics based on prior knowledge
and more data, for the important societal problem of filling in entire timelines
using freely available but sparse social media data.
Comment: 10 pages, 8 figures, 2 tables
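ILC's exact formulation is not given in the abstract; below is a minimal sketch of the underlying idea, filling a missing timeline slot with the location that best bridges the surrounding observations under a first-order mobility prior. The locations and transition probabilities are hypothetical:

```python
def infer_intermediate(prev_loc, next_loc, transition, locations):
    """Pick the candidate location maximizing P(loc | prev) * P(next | loc)
    under a first-order mobility prior."""
    def p(a, b):
        return transition.get((a, b), 1e-6)  # small floor for unseen moves
    return max(locations, key=lambda loc: p(prev_loc, loc) * p(loc, next_loc))

# Hypothetical hourly transition probabilities for one user.
prior = {
    ("home", "work"): 0.6, ("home", "gym"): 0.1,
    ("work", "home"): 0.5, ("gym", "home"): 0.3,
    ("work", "work"): 0.4, ("home", "home"): 0.3,
}
missing = infer_intermediate("home", "home", prior, ["home", "work", "gym"])
```

With this prior, a gap between two "home" observations is most plausibly bridged by "work" (0.6 * 0.5), illustrating how prior mobility knowledge can substitute for dense data.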
REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads.
Repeat elements are important components of eukaryotic genomes. One limitation in our understanding of repeat elements is that most analyses rely on reference genomes that are incomplete and often contain missing data in highly repetitive regions that are difficult to assemble. To overcome this problem, we develop a new method, REPdenovo, which assembles repeat sequences directly from raw shotgun sequencing data. REPdenovo can construct various types of repeats that are highly repetitive and have low sequence divergence within copies. We show that REPdenovo is substantially better than existing methods in terms of both the number and the completeness of the repeat sequences that it recovers. The key advantage of REPdenovo is that it can reconstruct long repeats from sequence reads. We apply the method to human data and discover a number of potentially new repeat sequences that have been missed by previous repeat annotations. Many of these sequences are incorporated into various parasite genomes, possibly because the filtering process for host DNA involved in the sequencing of the parasite genomes failed to exclude the host-derived repeat sequences. REPdenovo is a powerful new computational tool for annotating genomes and for addressing questions regarding the evolution of repeat families. The software is available for download at https://github.com/Reedwarbler/REPdenovo
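REPdenovo's pipeline is more involved than the abstract suggests, but its starting point, identifying k-mers that occur far more often than expected for single-copy sequence, can be sketched as follows (the reads and thresholds are toy values, not REPdenovo's actual parameters):

```python
from collections import Counter

def repeat_kmers(reads, k, min_count):
    """Collect k-mers whose raw-read frequency meets a threshold; such
    high-copy k-mers are candidate seeds for assembling repeats."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return {kmer for kmer, c in counts.items() if c >= min_count}

# Toy shotgun reads containing a short repeated motif.
reads = ["ACGTACGTAC", "TACGTACGTT", "GGGCACGTAC"]
seeds = repeat_kmers(reads, k=5, min_count=3)
```

The real method would additionally normalize for sequencing depth and assemble the surviving k-mers into full repeat consensus sequences.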
A log mining approach for process monitoring in SCADA
SCADA (Supervisory Control and Data Acquisition) systems are used for controlling and monitoring industrial processes. We propose a methodology to systematically identify potential process-related threats in SCADA. Process-related threats take place when an attacker gains user access rights and performs actions that look legitimate but are intended to disrupt the SCADA process. To detect such threats, we propose a semi-automated log-processing approach. We conduct experiments on a real-life water treatment facility. A preliminary case study suggests that our approach is effective in detecting anomalous events that might alter the regular process workflow.
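The abstract leaves the log-processing step abstract; one simple instance of the idea is to flag (user, action) events that were rarely or never seen in historical logs. The operators and actions below are invented for illustration:

```python
from collections import Counter

def flag_anomalies(history, new_events, min_seen):
    """Return new events whose (user, action) pair occurred fewer than
    min_seen times in the historical log; rare pairs may indicate a
    process-related threat performed with legitimate access rights."""
    seen = Counter(history)
    return [e for e in new_events if seen[e] < min_seen]

# Hypothetical operator logs from a water treatment facility.
history = [("op1", "open_valve")] * 20 + [("op2", "read_level")] * 15
new = [("op1", "open_valve"), ("op1", "disable_alarm"), ("op2", "read_level")]
suspicious = flag_anomalies(history, new, min_seen=5)
```

Here only the never-before-seen "disable_alarm" action by op1 is flagged, even though the session itself looks like a legitimate login.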
A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and point-of-interests, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on a daily basis. Due to the worldwide coverage of its users and
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts have been devoted to
dealing with the new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim at offering an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions.
Comment: Accepted to TKDE. 30 pages, 1 figure
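As a toy instance of content-based home-location prediction, one of the input signals this survey covers, the sketch below scores candidate cities by word-city co-occurrence counts. The cities, tweets, and scoring rule are illustrative only, not a method from the survey:

```python
from collections import Counter, defaultdict

def train_word_city(tweets):
    """Count how often each word appears in tweets from users whose
    home city is known."""
    counts = defaultdict(Counter)
    for city, text in tweets:
        for word in text.lower().split():
            counts[word][city] += 1
    return counts

def predict_home(counts, text, cities):
    """Score each candidate city by summing per-word counts over the
    words of an unseen tweet."""
    scores = {c: 0 for c in cities}
    for word in text.lower().split():
        for city, n in counts[word].items():
            scores[city] += n
    return max(cities, key=lambda c: scores[c])

data = [
    ("nyc", "subway delays again"),
    ("nyc", "bagels and the subway"),
    ("sf", "fog over the bridge"),
]
model = train_word_city(data)
city = predict_home(model, "stuck on the subway", ["nyc", "sf"])
```

The word "subway" dominates the score, so the tweet is attributed to the toy "nyc" profile; real systems combine such content cues with network and context signals.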
Adaptive text mining: Inferring structure from sequences
Text mining is about inferring structure from sequences representing natural language text, and may be defined as the process of analyzing text to extract information that is useful for particular purposes. Although hand-crafted heuristics are a common practical approach for extracting information from text, a general, and generalizable, approach requires adaptive techniques. This paper studies the way in which the adaptive techniques used in text compression can be applied to text mining. It develops several examples: extraction of hierarchical phrase structures from text, identification of keyphrases in documents, locating proper names and quantities of interest in a piece of text, text categorization, word segmentation, acronym extraction, and structure recognition. We conclude that compression forms a sound unifying principle that allows many text mining problems to be tackled adaptively.
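The compression-as-unifying-principle idea can be illustrated for one of the listed tasks, text categorization: assign a document to the category whose training text compresses it best. The sketch below uses zlib as a stand-in for the adaptive models (e.g., PPM) used in text compression research; the categories and texts are toy examples:

```python
import zlib

def compressed_size(text):
    """Compressed length in bytes at maximum zlib compression."""
    return len(zlib.compress(text.encode("utf-8"), 9))

def classify(document, categories):
    """Assign the document to the category whose training text yields
    the smallest increase in compressed size when the document is
    appended, i.e., the model that 'predicts' the document best."""
    def cost(train):
        return compressed_size(train + " " + document) - compressed_size(train)
    return min(categories, key=lambda c: cost(categories[c]))

cats = {
    "weather": "rain storm cloud wind rain forecast storm wind cloud",
    "sport": "goal match team score goal player match team score",
}
label = classify("storm and rain in the forecast", cats)
```

Because the document shares substrings with the weather text, the compressor encodes it more cheaply in that context, which is exactly the adaptivity the paper exploits.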
Classifying types of gesture and inferring intent
To infer intent from gesture, we introduce a rudimentary classification of gestures into five main classes. The classification is intended as a basis for incorporating the understanding of gesture into human-robot interaction (HRI). Some requirements for the operational classification of gesture by a robot interacting with humans are also suggested.