Econometrics meets sentiment: an overview of methodology and applications
The advent of massive amounts of textual, audio, and visual data has spurred the development of econometric methodology to transform qualitative sentiment data into quantitative sentiment variables, and to use those variables in an econometric analysis of the relationships between sentiment and other variables. We survey this emerging research field and refer to it as sentometrics, which is a portmanteau of sentiment and econometrics. We provide a synthesis of the relevant methodological approaches, illustrate with empirical results, and discuss useful software.
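The core step the abstract describes, turning qualitative text into a quantitative sentiment variable, can be sketched as follows. This is a minimal lexicon-based illustration, not the survey's actual methodology; the toy lexicon, the `sentiment_score` helper, and the per-period averaging are all assumptions made for the example:

```python
# Minimal sketch of lexicon-based sentiment quantification.
# The lexicon values and aggregation scheme are illustrative assumptions.
LEXICON = {"gain": 1.0, "growth": 1.0, "strong": 0.5,
           "loss": -1.0, "decline": -1.0, "weak": -0.5}

def sentiment_score(text):
    """Average lexicon value over the words of a text (0 if no word matches)."""
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def sentiment_index(docs_by_period):
    """Aggregate document scores into one sentiment variable per time period."""
    return {period: sum(sentiment_score(d) for d in docs) / len(docs)
            for period, docs in docs_by_period.items()}

docs = {"2020Q1": ["strong growth in earnings", "a sharp decline in sales"],
        "2020Q2": ["heavy loss and weak demand"]}
index = sentiment_index(docs)  # one numeric sentiment value per quarter
```

The resulting per-period series is the kind of quantitative sentiment variable that can then enter a standard econometric regression alongside other variables.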
Advances in Image Processing, Analysis and Recognition Technology
For many decades, researchers have been trying to make computers' analysis of images as effective as human vision. For this purpose, many algorithms and systems have been created. The whole process covers various stages, including image processing, representation and recognition. The results of this work can be applied to many computer-assisted areas of everyday life. They improve particular activities and provide handy tools, which are sometimes only for entertainment, but quite often significantly increase our safety. In fact, the range of practical implementations of image processing algorithms is particularly wide. Moreover, the rapid growth of computing power has allowed for the development of more sophisticated and effective algorithms and tools. Although significant progress has been made so far, many issues remain, resulting in the need for the development of novel approaches.
mARC: Memory by Association and Reinforcement of Contexts
This paper introduces the memory by Association and Reinforcement of Contexts (mARC). mARC is a novel data modeling technology rooted in the second quantization formulation of quantum mechanics. It is an all-purpose incremental and unsupervised data storage and retrieval system which can be applied to all types of signal or data, structured or unstructured, textual or not. mARC can be applied to a wide range of information classification and retrieval problems such as e-Discovery or contextual navigation. It can also be formulated in the artificial life framework, a.k.a. Conway's "Game of Life" theory. In contrast to Conway's approach, the objects evolve in a massively multidimensional space. In order to start evaluating the potential of mARC, we have built a mARC-based Internet search engine demonstrator with contextual functionality. We compare the behavior of the mARC demonstrator with Google search, both in terms of performance and relevance. In the study we find that the mARC search engine demonstrator outperforms Google search by an order of magnitude in response time while providing more relevant results for some classes of queries.
Classifying the suras by their lexical semantics: an exploratory multivariate analysis approach to understanding the Qur'an
PhD thesis. The Qur'an is at the heart of Islamic culture. Careful, well-informed interpretation of
it is fundamental both to the faith of millions of Muslims throughout the world, and
also to the non-Islamic world's understanding of their religion. There is a long and
venerable tradition of Qur'anic interpretation, and it has necessarily been based on
literary-historical methods for exegesis of hand-written and printed text.
Developments in electronic text representation and analysis since the second half of
the twentieth century now offer the opportunity to supplement traditional techniques
by applying the newly-emergent computational technology of exploratory
multivariate analysis to interpretation of the Qur'an. The general aim of the present
discussion is to take up that opportunity.
Specifically, the discussion develops and applies a methodology for discovering the
thematic structure of the Qur'an based on a fundamental idea in a range of
computationally oriented disciplines: that, with respect to some collection of texts, the
lexical frequency profiles of the individual texts are a good indicator of their semantic
content, and thus provide a reliable criterion for their conceptual categorization
relative to one another. This idea is applied to the discovery of thematic
interrelationships among the suras that constitute the Qur'an by abstracting lexical
frequency data from them and then analyzing that data using exploratory multivariate
methods in the hope that this will generate hypotheses about the thematic structure of
the Qur'an.
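The methodological idea above, that lexical frequency profiles indicate semantic content and support conceptual categorization, can be sketched as follows. This is a minimal illustration under assumptions of my own, not the thesis's actual pipeline: the toy texts and the `profile`/`cosine` helpers are invented for the example.

```python
# Sketch: lexical frequency profiles as a basis for comparing texts.
# Texts with similar word-frequency profiles score high on cosine similarity.
from collections import Counter
import math

def profile(text):
    """Relative word-frequency profile of a text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def cosine(p, q):
    """Cosine similarity between two frequency profiles (1.0 = identical)."""
    dot = sum(v * q.get(w, 0.0) for w, v in p.items())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm

texts = {  # toy stand-ins for texts with differing themes
    "A": "mercy and forgiveness and mercy",
    "B": "forgiveness and mercy and mercy",
    "C": "battle and victory and battle",
}
profiles = {k: profile(t) for k, t in texts.items()}
sim_ab = cosine(profiles["A"], profiles["B"])  # same vocabulary: high
sim_ac = cosine(profiles["A"], profiles["C"])  # mostly disjoint: lower
```

A full exploratory multivariate analysis would then cluster or project such profiles (e.g. hierarchical clustering or principal component analysis) to surface groupings among the suras.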
The discussion is in eight main parts. The first part introduces the discussion. The
second gives an overview of the structure and thematic content of the Qur'an and of
the tradition of Qur'anic scholarship devoted to its interpretation. The third part
defines the research question to be addressed together with a methodology for doing
so. The fourth reviews the existing literature on the research question. The fifth
outlines general principles of data creation and applies them to creation of the data on
which the analysis of the Qur'an in this study is based. The sixth outlines general
principles of exploratory multivariate analysis, describes in detail the analytical
methods selected for use, and applies them to the data created in part five. The
seventh part interprets the results of the analyses conducted in part six with reference
to the existing results in Qur'anic interpretation described in part two. And, finally, the
eighth part draws conclusions relative to the research question and identifies
directions along which the work presented in this study can be developed.
Information Preserving Processing of Noisy Handwritten Document Images
Many pre-processing techniques that normalize artifacts and clean noise induce anomalies due to discretization of the document image. Important information that could be used at later stages may be lost. A proposed composite-model framework takes into account pre-printed information, user-added data, and digitization characteristics. Its benefits are demonstrated by experiments with statistically significant results. Separating pre-printed ruling lines from user-added handwriting shows how ruling lines impact people's handwriting and how they can be exploited for identifying writers. Ruling line detection based on multi-line linear regression reduces the mean error of counting them from 0.10 to 0.03, 6.70 to 0.06, and 0.13 to 0.02, compared to an HMM-based approach on three standard test datasets, thereby reducing human correction time by 50%, 83%, and 72% on average. On 61 page images from 16 rule-form templates, the precision and recall of form cell recognition are increased by 2.7% and 3.7%, compared to a cross-matrix approach. Compensating for and exploiting ruling lines during feature extraction rather than pre-processing raises the writer identification accuracy from 61.2% to 67.7% on a 61-writer noisy Arabic dataset. Similarly, counteracting page-wise skew by subtracting it or transforming contours in a continuous coordinate system during feature extraction improves the writer identification accuracy. An implementation study of contour-hinge features reveals that utilizing the full probability distribution function matrix improves the writer identification accuracy from 74.9% to 79.5%.
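The regression step behind the ruling-line detection described above can be sketched as an ordinary least-squares line fit to candidate dark-pixel coordinates. This is a minimal single-line sketch under assumptions of my own, not the dissertation's multi-line method; the synthetic `pixels` data and the `fit_line` helper are invented for illustration.

```python
# Sketch: fit y = a*x + b by ordinary least squares to pixel coordinates
# presumed to lie along one nearly horizontal ruling line.
def fit_line(points):
    """Least-squares slope and intercept for a list of (x, y) points."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Hypothetical dark-pixel coordinates along a slightly skewed ruling line.
pixels = [(x, 100 + 0.01 * x) for x in range(0, 500, 50)]
slope, intercept = fit_line(pixels)
```

The fitted slope gives the line's skew, which can then be subtracted during feature extraction rather than destructively removed in pre-processing, in the spirit of the approach the abstract describes.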
Exploring novel designs of NLP solvers: Architecture and Implementation of WORHP
Mathematical Optimization in general, and Nonlinear Programming in particular, are applied in many fields, such as the automotive sector, the aerospace industry, and space agencies. With some established NLP solvers having been available for decades, and with the mathematical community being rather conservative in this respect, many of their programming standards are severely outdated. It is safe to assume that such usability shortcomings impede the wider use of NLP methods; a representative example is the use of static workspaces by legacy FORTRAN codes. This dissertation gives an account of the construction of the European NLP solver WORHP by using and combining software standards and techniques that have not previously been applied to mathematical software to this extent. Examples include automatic code generation, a consistent reverse communication architecture and the elimination of static workspaces. The result is a novel, industrial-grade NLP solver that overcomes many technical weaknesses of established NLP solvers and other mathematical software.
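The reverse communication architecture mentioned above can be sketched as follows: instead of the solver calling user code back, it returns control to the caller with a request (e.g. "evaluate the objective"), and the caller supplies the result before re-entering the solver. This is a toy illustration of the pattern with an invented API and a trivial update rule; it is not WORHP's actual interface.

```python
# Sketch of the reverse-communication pattern (hypothetical API, not WORHP's).
class Solver:
    """The solver never calls user code; it requests actions via `action`."""
    def __init__(self, x0):
        self.x = x0
        self.f = None                 # caller writes the evaluation here
        self.action = "evaluate_f"    # what the caller must do next

    def step(self):
        if self.action == "evaluate_f":
            if abs(self.f) < 1e-8:
                self.action = "done"
            else:
                self.x -= 0.5 * self.f  # toy update for the toy f(x) = x
                self.action = "evaluate_f"

def objective(x):
    return x  # user-side evaluation, performed outside the solver

solver = Solver(x0=4.0)
while solver.action != "done":
    solver.f = objective(solver.x)  # caller performs the requested evaluation
    solver.step()                   # then hands control back to the solver
```

The benefit of the pattern is that the caller keeps full control of function evaluations (parallelism, caching, error handling) without the solver needing function pointers or callbacks, which simplifies use from other languages.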
Theory and Applications for Advanced Text Mining
Due to the growth of computer and web technologies, we can easily collect and store large amounts of text data, and it is reasonable to believe that such data contain useful knowledge. Text mining techniques have been studied actively since the late 1990s in order to extract this knowledge. Even though many important techniques have been developed, the text mining research field continues to expand to meet the needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques, ranging from relation extraction to the processing of under-resourced languages. I believe that this book will bring new knowledge to the text mining field and help many readers open up new research fields.