
    Mostly-Unsupervised Statistical Segmentation of Japanese Kanji Sequences

    Given the lack of word delimiters in written Japanese, word segmentation is generally considered a crucial first step in processing Japanese texts. Typical Japanese segmentation algorithms rely either on a lexicon and syntactic analysis or on pre-segmented data; but these are labor-intensive, and the lexico-syntactic techniques are vulnerable to the unknown word problem. In contrast, we introduce a novel, more robust statistical method utilizing unsegmented training data. Despite its simplicity, the algorithm yields performance on long kanji sequences comparable to and sometimes surpassing that of state-of-the-art morphological analyzers over a variety of error metrics. The algorithm also outperforms another mostly-unsupervised statistical algorithm previously proposed for Chinese. Additionally, we present a two-level annotation scheme for Japanese to incorporate multiple segmentation granularities, and introduce two novel evaluation metrics, both based on the notion of a compatible bracket, that can account for multiple granularities simultaneously. Comment: 22 pages. To appear in Natural Language Engineering

    Tsallis non-extensive statistics, intermittent turbulence, SOC and chaos in the solar plasma. Part one: Sunspot dynamics

    In this study, the nonlinear analysis of the sunspot index is embedded in the non-extensive statistical theory of Tsallis. The Tsallis triplet, as well as the correlation dimension and the Lyapunov exponent spectrum, were estimated for the SVD components of the sunspot index time series. The multifractal scaling exponent spectrum, the generalized Renyi dimension spectrum and the spectrum of the structure function exponents were also estimated experimentally and theoretically by using the entropy principle included in Tsallis non-extensive statistical theory, following Arimitsu and Arimitsu. Our analysis clearly showed the following: a) a phase transition process in the solar dynamics from a high-dimensional non-Gaussian SOC state to a low-dimensional non-Gaussian chaotic state; b) strong intermittent solar turbulence and an anomalous (multifractal) diffusion solar process, which is strengthened as the solar dynamics makes a phase transition to low-dimensional chaos, in accordance with the studies of Ruzmaikin, Zeleny and Milovanov; c) faithful agreement of Tsallis non-equilibrium statistical theory with the experimental estimations of i) the non-Gaussian probability distribution function, ii) the multifractal scaling exponent spectrum and generalized Renyi dimension spectrum, and iii) the exponent spectrum of the structure functions estimated for the sunspot index and its underlying non-equilibrium solar dynamics. Comment: 40 pages, 11 figures
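For orientation, the standard definitions behind the non-extensive framework named in this abstract can be written as follows. This is a reminder of the textbook forms, not a reproduction of the paper's own derivation; the symbols $k$, $p_i$ and $q$ are the conventional ones and are assumed here rather than taken from the abstract.

```latex
% Tsallis entropy for a discrete distribution {p_i} with entropic index q
S_q = k \, \frac{1 - \sum_i p_i^{\,q}}{q - 1},
\qquad
\lim_{q \to 1} S_q = -k \sum_i p_i \ln p_i .

% q-exponential, which replaces the ordinary exponential in the
% resulting non-Gaussian distributions (defined where 1 + (1-q)x > 0)
e_q(x) = \bigl[\, 1 + (1 - q)\, x \,\bigr]^{\frac{1}{1 - q}},
\qquad
\lim_{q \to 1} e_q(x) = e^{x} .
```

The Tsallis triplet mentioned in the abstract is conventionally the set of three indices $(q_{\mathrm{sen}}, q_{\mathrm{rel}}, q_{\mathrm{stat}})$, associated with sensitivity to initial conditions, relaxation, and the stationary distribution, respectively.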

    Financial advisors: a case of babysitters?

    We merge administrative information from a large German discount brokerage firm with regional data to examine whether financial advisors improve portfolio performance. Our data track accounts of 32,751 randomly selected individual customers over 66 months and allow direct comparison of performance across self-managed accounts and accounts run by, or in consultation with, independent financial advisors (IFAs). In contrast to the picture painted by simple descriptive statistics, econometric analysis that corrects for the endogeneity of the choice of having a financial advisor suggests that advisors are associated with lower total and excess account returns, higher portfolio risk and probabilities of losses, and higher trading frequency and portfolio turnover relative to what account owners of given characteristics tend to achieve on their own. Regression analysis of who uses an IFA suggests that IFAs are matched with richer, older investors rather than with poorer, younger ones.

    Measures of Analysis of Time Series (MATS): A MATLAB Toolkit for Computation of Multiple Measures on Time Series Data Bases

    In many applications, such as physiology and finance, large time series data bases are to be analyzed, requiring the computation of linear, nonlinear and other measures. Such measures have been developed and implemented in commercial and freeware software rather selectively and independently. The Measures of Analysis of Time Series (MATS) MATLAB toolkit is designed to handle an arbitrarily large set of scalar time series and compute a large variety of measures on them, allowing for the specification of varying measure parameters as well. The variety of options, with added facilities for visualization of the results, supports different settings of time series analysis, such as the detection of dynamics changes in long data records, resampling (surrogate or bootstrap) tests for independence and linearity with various test statistics, and the discrimination power of different measures and of different combinations of their parameters. The basic features of MATS are presented and the implemented measures are briefly described. The usefulness of MATS is illustrated on some empirical examples along with screenshots. Comment: 25 pages, 9 figures, two tables, the software can be downloaded at http://eeganalysis.web.auth.gr/indexen.ht
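The resampling tests for independence mentioned in this abstract can be illustrated with a minimal sketch. It is written in Python rather than MATLAB and does not use or reproduce the MATS toolkit. The function names `lag1_autocorr` and `shuffle_surrogate_test`, the choice of lag-1 autocorrelation as the test statistic, and the synthetic AR(1) input are all assumptions made for illustration.

```python
import random


def lag1_autocorr(x):
    """Lag-1 sample autocorrelation of a sequence (0.0 if it is constant)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        return 0.0
    cov = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    return cov / var


def shuffle_surrogate_test(x, n_surrogates=199, seed=0):
    """Shuffle-surrogate test of the null hypothesis that x is independent.

    Random permutations destroy any temporal structure while keeping the
    marginal distribution, so they serve as surrogates under the null.
    Returns a rank-based p-value for the absolute lag-1 autocorrelation.
    """
    rng = random.Random(seed)
    observed = abs(lag1_autocorr(x))
    exceed = 0
    for _ in range(n_surrogates):
        surrogate = list(x)
        rng.shuffle(surrogate)
        if abs(lag1_autocorr(surrogate)) >= observed:
            exceed += 1
    # Count the original series itself, so the p-value is never exactly 0.
    return (exceed + 1) / (n_surrogates + 1)


# Hypothetical input: a strongly autocorrelated AR(1) series.
_rng = random.Random(1)
ar_series = [0.0]
for _ in range(299):
    ar_series.append(0.9 * ar_series[-1] + _rng.gauss(0, 1))

p_value = shuffle_surrogate_test(ar_series)
print(f"p-value under the independence null: {p_value:.3f}")
```

A small p-value indicates that the observed serial correlation is unlikely under independence. Swapping in a nonlinear statistic, or phase-randomized surrogates for a linearity null, follows the same pattern.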