
    A Survey of Location Prediction on Twitter

    Locations, e.g., countries, states, cities, and points-of-interest, are central to news, emergency events, and people's daily lives. Automatic identification of locations associated with or mentioned in documents has been explored for decades. As one of the most popular online social network platforms, Twitter has attracted a large number of users who send millions of tweets on a daily basis. Owing to the worldwide coverage of its users and the real-time freshness of tweets, location prediction on Twitter has gained significant attention in recent years. Research efforts have been devoted to the new challenges and opportunities brought by the noisy, short, and context-rich nature of tweets. In this survey, we aim to offer an overall picture of location prediction on Twitter. Specifically, we concentrate on the prediction of user home locations, tweet locations, and mentioned locations. We first define the three tasks and review the evaluation metrics. By summarizing the Twitter network, tweet content, and tweet context as potential inputs, we then structurally highlight how the problems depend on these inputs. Each dependency is illustrated by a comprehensive review of the corresponding strategies adopted in state-of-the-art approaches. In addition, we briefly review two related problems, i.e., semantic location prediction and point-of-interest recommendation. Finally, we list future research directions.
    Comment: Accepted to TKDE. 30 pages, 1 figure
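    Two evaluation metrics commonly reported in this literature are accuracy within 161 km (roughly 100 miles) of the true coordinates and the median error distance. A minimal sketch of both, using the haversine great-circle distance (the function names and the 161 km threshold are illustrative conventions, not taken from the survey itself):

    ```python
    import math

    def haversine_km(lat1, lon1, lat2, lon2):
        # Great-circle distance in kilometres between two (lat, lon) points.
        r = 6371.0
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp = math.radians(lat2 - lat1)
        dl = math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    def location_metrics(predicted, actual, threshold_km=161.0):
        # Acc@161 (share of predictions within ~100 miles of the truth)
        # and the median prediction error in kilometres.
        errors = sorted(haversine_km(p[0], p[1], a[0], a[1])
                        for p, a in zip(predicted, actual))
        acc = sum(e <= threshold_km for e in errors) / len(errors)
        mid = len(errors) // 2
        median = errors[mid] if len(errors) % 2 else (errors[mid - 1] + errors[mid]) / 2
        return acc, median
    ```

    For example, a prediction list with one exact hit and one miss of several thousand kilometres yields an Acc@161 of 0.5.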

    European exchange trading funds trading with locally weighted support vector regression

    In this paper, two different Locally Weighted Support Vector Regression (wSVR) algorithms are generated and applied to the task of forecasting and trading five European Exchange Traded Funds. The trading application covers the recent European Monetary Union debt crisis. The performance of the proposed models is benchmarked against traditional Support Vector Regression (SVR) models. The Radial Basis Function, Wavelet, and Mahalanobis kernels are explored and tested as SVR kernels. Finally, a novel statistical SVR input-selection procedure is introduced, based on principal component analysis and the Hansen, Lunde, and Nason (2011) model confidence test. The results demonstrate the superiority of the wSVR models over the traditional SVRs and of the ν-SVR over the ε-SVR algorithms. We note that the performance of all models varies and deteriorates considerably at the peak of the debt crisis. In terms of the kernels, our results do not confirm the belief that the Radial Basis Function is the optimal choice for financial series.
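    The core idea of local weighting is that training observations near the query point influence the prediction more than distant ones. As a stand-in for the paper's wSVR (which is not reproduced here), the following sketch illustrates that idea with a simple Gaussian-weighted local average (Nadaraya-Watson style); all names and the bandwidth default are illustrative:

    ```python
    import math

    def rbf_weight(x, x0, bandwidth=1.0):
        # Gaussian (RBF) weight: training points near x0 count more.
        return math.exp(-((x - x0) ** 2) / (2 * bandwidth ** 2))

    def locally_weighted_predict(train_x, train_y, query_x, bandwidth=1.0):
        # Kernel-weighted average of the training targets around the query point.
        weights = [rbf_weight(x, query_x, bandwidth) for x in train_x]
        total = sum(weights)
        return sum(w * y for w, y in zip(weights, train_y)) / total
    ```

    With a small bandwidth the prediction tracks the nearest observations closely; widening the bandwidth smooths the estimate toward the global mean, which is the same bias-variance trade-off the locally weighted SVR variants navigate.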

    Spectral-spatial classification of hyperspectral images: three tricks and a new supervised learning setting

    Spectral-spatial classification of hyperspectral images has been the subject of many studies in recent years. In the presence of only very few labeled pixels, this task becomes challenging. In this paper we address the following two research questions: 1) Can a simple neural network with just a single hidden layer achieve state-of-the-art performance in the presence of few labeled pixels? 2) How is the performance of hyperspectral image classification methods affected when using disjoint train and test sets? We give a positive answer to the first question by using three tricks within a very basic shallow Convolutional Neural Network (CNN) architecture: a tailored loss function, and smooth- and label-based data augmentation. The tailored loss function enforces that neighboring wavelengths have similar contributions to the features generated during training. A new label-based technique proposed here favors the selection of pixels in smaller classes, which is beneficial in the presence of very few labeled pixels and skewed class distributions. To address the second question, we introduce a new sampling procedure to generate disjoint train and test sets. The train set is used to obtain the CNN model, which is then applied to pixels in the test set to estimate their labels. We assess the efficacy of the simple neural network method on five publicly available hyperspectral images, on which it significantly outperforms the considered baselines. Notably, with just 1% of labeled pixels per class, our method achieves an accuracy that ranges from 86.42% (challenging dataset) to 99.52% (easy dataset). Furthermore, we show that the simple neural network method improves over the other baselines in the new, more challenging supervised setting. Our analysis substantiates the highly beneficial effect of using the entire image (i.e., both train and test data) for constructing a model.
    Comment: Remote Sensing 201
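    One way to favor pixels in smaller classes, as the label-based technique described above aims to do, is to sample with probability inversely proportional to class frequency. This sketch is an illustrative reconstruction of that idea, not the paper's exact procedure (all names are hypothetical):

    ```python
    from collections import Counter
    import random

    def class_balanced_sample(pixels, labels, k, seed=0):
        # Draw k labeled pixels with probability inversely proportional to the
        # size of each pixel's class, so rare classes are over-represented.
        counts = Counter(labels)
        weights = [1.0 / counts[lab] for lab in labels]
        rng = random.Random(seed)
        idx = rng.choices(range(len(pixels)), weights=weights, k=k)
        return [pixels[i] for i in idx], [labels[i] for i in idx]
    ```

    With inverse-frequency weights, every class contributes the same total probability mass, so a class holding 10% of the pixels still supplies roughly half of a two-class sample.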

    A survey of outlier detection methodologies

    Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error, or simply natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can also identify errors and remove their contaminating effect on the data set, thereby purifying the data for processing. The original outlier detection methods were arbitrary, but principled and systematic techniques are now used, drawn from the full gamut of computer science and statistics. In this paper, we present a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.
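    Among the statistical techniques such surveys cover, the simplest is the classic z-score rule: flag any observation more than a fixed number of standard deviations from the mean. A minimal sketch (the threshold of 3 is a common convention, not a value from this survey):

    ```python
    def zscore_outliers(data, threshold=3.0):
        # Flag points more than `threshold` standard deviations from the mean.
        n = len(data)
        mean = sum(data) / n
        std = (sum((x - mean) ** 2 for x in data) / n) ** 0.5
        return [x for x in data if std > 0 and abs(x - mean) / std > threshold]
    ```

    This illustrates both the appeal and the limits of simple parametric rules: a single gross error in otherwise uniform data is flagged immediately, but the mean and standard deviation are themselves distorted by the outliers being sought, which is one motivation for the more robust methods a comparative survey reviews.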

    APPLICATION OF SOFT COMPUTING METHODOLOGIES IN PREDICTING THE 28-DAY COMPRESSIVE STRENGTH OF SHOTCRETE: A COMPARATIVE ASSESSMENT OF INDIVIDUAL AND HYBRID MODELS

    Shotcreting is a popular construction technique with wide-ranging applications in mining and civil engineering. Compressive strength is a primary mechanical property of shotcrete, with particular importance for project safety, and it depends strongly on the mix design. In practice, however, there is no reliable and accurate method to predict this strength. In this study, existing experimental data on shotcretes with 59 different mix designs are used to develop a series of soft computing methodologies, including an individual artificial neural network, support vector regression, and an M5P model tree, as well as their hybrids with the fuzzy c-means clustering algorithm, so as to predict the 28-day compressive strength of shotcrete. Analysis of the results shows the superiority of the hybrid models over the individual models in predicting the compressive strength of shotcrete. Overall, clustering the data prior to applying machine learning techniques leads to a certain improvement in their performance and in the reliability and generalizability of their results. In particular, the M5P model tree exhibits excellent capability in predicting the compressive strength of shotcrete.
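    The clustering step that distinguishes the hybrid models assigns each mix design a soft membership in every cluster before a separate predictor is fit per cluster. A minimal one-dimensional fuzzy c-means sketch of that first step (deterministic initialization and all defaults are illustrative, not the paper's implementation):

    ```python
    def fuzzy_c_means(data, c=2, m=2.0, iters=50):
        # Minimal 1-D fuzzy c-means: alternate between updating soft
        # memberships u[i][j] and the cluster centres they imply.
        lo, hi = min(data), max(data)
        centres = [lo + (hi - lo) * j / (c - 1) for j in range(c)]
        for _ in range(iters):
            u = []
            for x in data:
                dists = [abs(x - v) + 1e-12 for v in centres]
                row = [1.0 / sum((dists[j] / dists[k]) ** (2 / (m - 1))
                                 for k in range(c))
                       for j in range(c)]
                u.append(row)
            # Each centre is the membership-weighted mean of the data.
            centres = [sum(u[i][j] ** m * data[i] for i in range(len(data))) /
                       sum(u[i][j] ** m for i in range(len(data)))
                       for j in range(c)]
        return centres, u
    ```

    In a hybrid pipeline, the memberships (or a hard assignment to the strongest cluster) would route each mix design to its own regression model, which is the sense in which clustering precedes the machine learning step described above.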

    Advanced Optimization Methods and Big Data Applications in Energy Demand Forecast

    The use of data collectors in energy systems is growing steadily. For example, smart sensors are now widely used in energy production and consumption systems. This means that huge amounts of data are generated and need to be analyzed in order to extract useful insights. Such big data give rise to a number of opportunities and challenges for informed decision making. In recent years, researchers have worked very actively to devise effective and powerful techniques for dealing with the huge amount of data available. Such approaches can be used in the context of energy production and consumption, considering the amount of data produced by all samples and measurements, as well as many additional features. With them, automated machine learning methods for extracting relevant patterns, high-performance computing, and data visualization are being successfully applied to energy demand forecasting. In light of the above, this Special Issue collects the latest research on relevant topics, in particular energy demand forecasting and the use of advanced optimization methods and big data techniques. Here, by energy we mean any kind of energy, e.g., electrical, solar, microwave, or wind.