How to Rank Answers in Text Mining
In this thesis, we focus mainly on case studies of answers. We present the CEW-DTW methodology and assess its ranking quality. We then improve on it by combining CEW-DTW with the Kullback-Leibler divergence, which measures the difference between the probability distributions of two sequences.
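To make the Kullback-Leibler step concrete, here is a minimal Python sketch of the divergence between the word distributions of two token sequences. It is not the thesis's KL-CEW-DTW combination, which the abstract does not spell out. The function names, the add-one smoothing, and the sample tokens are assumptions chosen for illustration.

```python
import math
from collections import Counter


def smoothed_distribution(tokens, vocabulary, alpha=1.0):
    """Word probabilities over a shared vocabulary.

    Additive smoothing (alpha) is an assumption of this sketch; it keeps
    every probability positive so the divergence stays finite.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocabulary)
    return {word: (counts[word] + alpha) / total for word in vocabulary}


def kl_divergence(p, q):
    """D_KL(p || q) in nats for two distributions over the same keys."""
    return sum(pk * math.log(pk / q[key]) for key, pk in p.items() if pk > 0)


# Hypothetical token sequences standing in for two answers.
answer_a = ["battery", "lasts", "long", "battery", "good"]
answer_b = ["screen", "good", "battery", "ok"]
vocabulary = sorted(set(answer_a) | set(answer_b))

p = smoothed_distribution(answer_a, vocabulary)
q = smoothed_distribution(answer_b, vocabulary)
print(kl_divergence(p, q))
```

The divergence is zero when the two distributions coincide and is not symmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ.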
However, CEW-DTW and KL-CEW-DTW do not account for the effect of noise and keywords from the viewpoint of probability distributions. Therefore, we develop a new methodology, the General Entropy, to examine how the probabilities of noise and keywords affect answer quality. We first analyze some properties of the General Entropy, such as its value range. In particular, we look for an objective goal that can serve as a standard for assessing answers, and to that end we introduce the maximum general entropy. We use the general entropy methodology to find, from a mathematical viewpoint, an imaginary answer with the maximum entropy (though this answer may not exist). This answer can be regarded as an “ideal” answer. Comparing the maximum entropy probabilities with the global probabilities of noise and keywords, we find that the maximum entropy probability of noise is smaller than the global probability of noise, and that, under some conditions, the maximum entropy probabilities of the chosen keywords are larger than their global probabilities. This allows us to select the maximum number of keywords deterministically. We also use an Amazon dataset and a small survey to assess the general entropy.
Although these methodologies can analyze answer quality, they do not incorporate the inner connections among keywords and noise. Based on the Markov transition matrix, we develop the Jump Probability Entropy. We again use the Amazon dataset to compare the maximum jump entropy probabilities with the global jump probabilities of noise and keywords, respectively.
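The abstract does not define the Jump Probability Entropy itself, so the following Python sketch covers only the ingredient it names: a first-order Markov transition matrix estimated from a token sequence. The function name `transition_matrix` and the sample tokens (with `"noise"` standing in for any non-keyword) are assumptions for illustration.

```python
from collections import defaultdict


def transition_matrix(tokens):
    """Estimate first-order transition probabilities from consecutive tokens.

    Returns a nested dict: matrix[current][next] = P(next | current).
    Tokens that never appear as a predecessor have no row.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1

    matrix = {}
    for current, row in counts.items():
        total = sum(row.values())
        matrix[current] = {nxt: c / total for nxt, c in row.items()}
    return matrix


# Hypothetical answer reduced to keywords and a generic "noise" label.
tokens = ["battery", "noise", "battery", "screen", "noise", "battery"]
matrix = transition_matrix(tokens)
for current, row in matrix.items():
    print(current, row)
```

Each row sums to one, so a row can be read directly as the distribution of "jumps" out of that keyword or noise state.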
Finally, we describe the steps for obtaining answers from the Amazon dataset, including extracting the original answers, removing stop words, and removing collinearity. We compare the methodologies we developed to see whether they are consistent. We also introduce the Wald–Wolfowitz runs test and compare it with these methodologies to verify their relationships. Based on the comparison results, we draw conclusions about the consistency of these methodologies and outline future plans.
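For readers unfamiliar with the Wald–Wolfowitz runs test, a minimal Python sketch of its standard normal-approximation form follows. How the thesis encodes answers for the test is not stated in the abstract; the two-symbol encoding below (`"K"` for keyword, `"N"` for noise) and the function name `runs_test` are assumptions.

```python
import math


def runs_test(sequence):
    """Wald-Wolfowitz runs test on a two-valued sequence.

    Returns (number of runs, z statistic, two-sided p-value) using the
    normal approximation, which is only reasonable for longer sequences.
    """
    values = list(sequence)
    labels = sorted(set(values))
    if len(labels) != 2:
        raise ValueError("sequence must contain exactly two distinct values")

    n1 = values.count(labels[0])
    n2 = values.count(labels[1])
    n = n1 + n2

    # A new run starts wherever two neighbouring symbols differ.
    runs = 1 + sum(1 for a, b in zip(values, values[1:]) if a != b)

    mean = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    if variance == 0:
        raise ValueError("sequence too short for the normal approximation")

    z = (runs - mean) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return runs, z, p_value


# Hypothetical keyword/noise pattern: strictly alternating symbols.
runs, z, p_value = runs_test("KNKNKNKNKN")
print(runs, round(z, 3), round(p_value, 4))
```

A large positive z means more runs than randomness would suggest (symbols alternate too regularly); a large negative z means too few runs (symbols cluster).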
A Theory of Collective Cell Migration and the Design of Stochastic Surveillance Strategies
In nature, complex emergent behavior arises in groups of biological entities, often as a result of simple local interactions between neighbors in space or on a network. In such cases, scientific inquiry is typically aimed at inferring these local rules. Conversely, in teams of robots, the goal is to create decentralized control laws that result in efficient global behavior. These behaviors are designed for tasks such as maintaining formation control, performing effective coverage control, or persistently monitoring an environment. With this in mind, we consider the following: (1) the emergence of collective cell migration from local contact and mechanical feedback and (2) the design of unpredictable surveillance strategies for teams of robots.
Collective cell migration is an essential part of tissue and organ morphogenesis during embryonic development, as well as of various disease processes, such as cancer. The vast majority of theoretical descriptions of collective cell behavior focus on large numbers of cells but fail to accurately capture the dynamics of small groups of cells. Here we introduce a low-dimensional theoretical description that successfully describes single cell migration, cell collisions, collective dynamics in small groups of cells, and force propagation during sheet expansion, all within a common theoretical framework. We also explain the counter-intuitive observation that pairs of cells repel each other upon collision while they coordinate their motion in larger clusters.
Conventional monitoring strategies used by teams of robots are deterministic in nature, making it possible for intelligent intruders who study the motion of the patrolling agent to compromise the patrol route. This problem can be solved by designing random walkers on graphs, which naturally incorporate unpredictability.
Within this framework, we study and provide the first analytic expression for the first meeting time of multiple random walkers, in terms of their transition matrices. We also study two problems related to maximizing unpredictability: given graph and visit frequency constraints, (1) maximize the entropy rate generated by a Markov chain, and (2) maximize the return time entropy associated with the Markov chain, where the return time entropy is the weighted average over all graph nodes of the entropy of the first return times of the Markov chain.
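As an illustration of the quantity in problem (1), the Python sketch below evaluates the standard entropy rate of a Markov chain, H = -Σ_i π_i Σ_j P_ij log P_ij, where π is the stationary distribution of the transition matrix P. It only evaluates the entropy rate for a given matrix and does not perform the constrained maximization studied in the thesis. The function names are assumptions, and the power iteration assumes an irreducible, aperiodic chain.

```python
import math


def stationary_distribution(P, iterations=10_000, tol=1e-12):
    """Approximate the stationary distribution of P by power iteration.

    Assumes P is row-stochastic, irreducible and aperiodic, so that
    repeated multiplication converges to a unique distribution.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def entropy_rate(P):
    """Entropy rate in nats: -sum_i pi_i sum_j P_ij log P_ij."""
    pi = stationary_distribution(P)
    return -sum(
        pi[i] * p * math.log(p)
        for i, row in enumerate(P)
        for p in row
        if p > 0
    )


# Uniform random walk on two fully connected nodes (self-loops allowed).
uniform_walk = [[0.5, 0.5], [0.5, 0.5]]
# A more predictable walker that mostly stays at node 0.
sticky_walk = [[0.9, 0.1], [0.5, 0.5]]

print(entropy_rate(uniform_walk))
print(entropy_rate(sticky_walk))
```

The uniform walk attains log 2 nats per step, the largest possible value on two nodes, while the sticky walk scores lower because its next move is easier to guess.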