2,341 research outputs found

    Econometrics meets sentiment : an overview of methodology and applications

    Get PDF
    The advent of massive amounts of textual, audio, and visual data has spurred the development of econometric methodology to transform qualitative sentiment data into quantitative sentiment variables, and to use those variables in an econometric analysis of the relationships between sentiment and other variables. We survey this emerging research field and refer to it as sentometrics, which is a portmanteau of sentiment and econometrics. We provide a synthesis of the relevant methodological approaches, illustrate with empirical results, and discuss useful software

    Advances in Image Processing, Analysis and Recognition Technology

    Get PDF
    For many decades, researchers have been trying to make computers’ analysis of images as effective as the system of human vision is. For this purpose, many algorithms and systems have previously been created. The whole process covers various stages, including image processing, representation and recognition. The results of this work can be applied to many computer-assisted areas of everyday life. They improve particular activities and provide handy tools, which are sometimes only for entertainment, but quite often, they significantly increase our safety. In fact, the practical implementation of image processing algorithms is particularly wide. Moreover, the rapid growth of computational complexity and computer efficiency has allowed for the development of more sophisticated and effective algorithms and tools. Although significant progress has been made so far, many issues still remain, resulting in the need for the development of novel approaches

    mARC: Memory by Association and Reinforcement of Contexts

    Full text link
    This paper introduces the memory by Association and Reinforcement of Contexts (mARC). mARC is a novel data modeling technology rooted in the second quantization formulation of quantum mechanics. It is an all-purpose incremental and unsupervised data storage and retrieval system which can be applied to all types of signal or data, structured or unstructured, textual or not. mARC can be applied to a wide range of information clas-sification and retrieval problems like e-Discovery or contextual navigation. It can also for-mulated in the artificial life framework a.k.a Conway "Game Of Life" Theory. In contrast to Conway approach, the objects evolve in a massively multidimensional space. In order to start evaluating the potential of mARC we have built a mARC-based Internet search en-gine demonstrator with contextual functionality. We compare the behavior of the mARC demonstrator with Google search both in terms of performance and relevance. In the study we find that the mARC search engine demonstrator outperforms Google search by an order of magnitude in response time while providing more relevant results for some classes of queries

    Classifying the suras by their lexical semantics :an exploratory multivariate analysis approach to understanding the Qur'an

    Get PDF
    PhD ThesisThe Qur'an is at the heart of Islamic culture. Careful, well-informed interpretation of it is fundamental both to the faith of millions of Muslims throughout the world, and also to the non-Islamic world's understanding of their religion. There is a long and venerable tradition of Qur'anic interpretation, and it has necessarily been based on literary-historical methods for exegesis of hand-written and printed text. Developments in electronic text representation and analysis since the second half of the twentieth century now offer the opportunity to supplement traditional techniques by applying the newly-emergent computational technology of exploratory multivariate analysis to interpretation of the Qur'an. The general aim of the present discussion is to take up that opportunity. Specifically, the discussion develops and applies a methodology for discovering the thematic structure of the Qur'an based on a fundamental idea in a range of computationally oriented disciplines: that, with respect to some collection of texts, the lexical frequency profiles of the individual texts are a good indicator of their semantic content, and thus provide a reliable criterion for their conceptual categorization relative to one another. This idea is applied to the discovery of thematic interrelationships among the suras that constitute the Qur'an by abstracting lexical frequency data from them and then analyzing that data using exploratory multivariate methods in the hope that this will generate hypotheses about the thematic structure of the Qur'an. The discussion is in eight main parts. The first part introduces the discussion. The second gives an overview of the structure and thematic content of the Qur'an and of the tradition of Qur'anic scholarship devoted to its interpretation. The third part xvi defines the research question to be addressed together with a methodology for doing so. The fourth reviews the existing literature on the research question. The fifth outlines general principles of data creation and applies them to creation of the data on which the analysis of the Qur'an in this study is based. The sixth outlines general principles of exploratory multivariate analysis, describes in detail the analytical methods selected for use, and applies them to the data created in part five. The seventh part interprets the results of the analyses conducted in part six with reference to the existing results in Qur'anic interpretation described in part two. And, finally, the eighth part draws conclusions relative to the research question and identifies directions along which the work presented in this study can be developed

    Information Preserving Processing of Noisy Handwritten Document Images

    Get PDF
    Many pre-processing techniques that normalize artifacts and clean noise induce anomalies due to discretization of the document image. Important information that could be used at later stages may be lost. A proposed composite-model framework takes into account pre-printed information, user-added data, and digitization characteristics. Its benefits are demonstrated by experiments with statistically significant results. Separating pre-printed ruling lines from user-added handwriting shows how ruling lines impact people\u27s handwriting and how they can be exploited for identifying writers. Ruling line detection based on multi-line linear regression reduces the mean error of counting them from 0.10 to 0.03, 6.70 to 0.06, and 0.13 to 0.02, com- pared to an HMM-based approach on three standard test datasets, thereby reducing human correction time by 50%, 83%, and 72% on average. On 61 page images from 16 rule-form templates, the precision and recall of form cell recognition are increased by 2.7% and 3.7%, compared to a cross-matrix approach. Compensating for and exploiting ruling lines during feature extraction rather than pre-processing raises the writer identification accuracy from 61.2% to 67.7% on a 61-writer noisy Arabic dataset. Similarly, counteracting page-wise skew by subtracting it or transforming contours in a continuous coordinate system during feature extraction improves the writer identification accuracy. An implementation study of contour-hinge features reveals that utilizing the full probabilistic probability distribution function matrix improves the writer identification accuracy from 74.9% to 79.5%

    Exploring novel designs of NLP solvers: Architecture and Implementation of WORHP

    Get PDF
    Mathematical Optimization in general and Nonlinear Programming in particular, are applied by many scientific disciplines, such as the automotive sector, the aerospace industry, or the space agencies. With some established NLP solvers having been available for decades, and with the mathematical community being rather conservative in this respect, many of their programming standards are severely outdated. It is safe to assume that such usability shortcomings impede the wider use of NLP methods; a representative example is the use of static workspaces by legacy FORTRAN codes. This dissertation gives an account of the construction of the European NLP solver WORHP by using and combining software standards and techniques that have not previously been applied to mathematical software to this extent. Examples include automatic code generation, a consistent reverse communication architecture and the elimination of static workspaces. The result is a novel, industrial-grade NLP solver that overcomes many technical weaknesses of established NLP solvers and other mathematical software

    Theory and Applications for Advanced Text Mining

    Get PDF
    Due to the growth of computer technologies and web technologies, we can easily collect and store large amounts of text data. We can believe that the data include useful knowledge. Text mining techniques have been studied aggressively in order to extract the knowledge from the data since late 1990s. Even if many important techniques have been developed, the text mining research field continues to expand for the needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques. They are various techniques from relation extraction to under or less resourced language. I believe that this book will give new knowledge in the text mining field and help many readers open their new research fields
    corecore