70 research outputs found
Parallels of human language in the behavior of bottlenose dolphins
A short review of similarities between dolphins and humans with the help of
quantitative linguistics and information theory
Exploring the law of text geographic information
Textual geographic information is indispensable and heavily relied upon in
practical applications. The absence of clear distribution poses challenges in
effectively harnessing geographic information, thereby driving our quest for
exploration. We contend that geographic information is influenced by human
behavior, cognition, expression, and thought processes, and given our intuitive
understanding of natural systems, we hypothesize its conformity to the Gamma
distribution. Through rigorous experiments on a diverse range of 24 datasets
encompassing different languages and types, we have substantiated this
hypothesis, unearthing the underlying regularities governing the dimensions of
quantity, length, and distance in geographic information. Furthermore,
theoretical analyses and comparisons with Gaussian distributions and Zipf's law
have refuted the contingency of these laws. Significantly, we have estimated
the upper bounds of human utilization of geographic information, pointing
towards the existence of uncharted territories. We also provide guidance for
geographic information extraction. We hope this work helps lift the veil on
geographic information.
Comment: IP
Exploring the Law of Numbers: Evidence from China's Real Estate
The renowned proverb "Numbers do not lie" underscores the reliability and
insight that lie beneath numbers, a concept of undisputed importance,
especially in economics and finance. Despite the success of Benford's
Law in first-digit analysis, its scope is not comprehensive enough
to decipher the laws of numbers. This paper examines number
laws using the financial statements of China's real estate sector as a
representative sample, quantitatively studying not only the first digit but also
two other dimensions of numbers: frequency and length. The research
outcomes go beyond reservations about data manipulation and open the door
to discussions of number diversity and the delineation of usage
insights. This study has both economic significance and the capacity to
foster a deeper comprehension of numerical phenomena.
Comment: DS
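For readers unfamiliar with the first-digit analysis mentioned in the abstract above, the following is a minimal sketch of a Benford's Law check. It is not the paper's method or data: the function names (benford_expected, first_digit, first_digit_frequencies) and the sample of powers of 2 are hypothetical, chosen only to illustrate the standard formula P(d) = log10(1 + 1/d) for leading digits d = 1..9.

```python
import math
from collections import Counter


def benford_expected(d: int) -> float:
    """Benford's Law probability that the leading digit equals d (1..9)."""
    return math.log10(1 + 1 / d)


def first_digit(x: float) -> int:
    """Leading non-zero decimal digit of a non-zero number."""
    # Scientific notation always starts with the leading digit, e.g. '4.56e+02'.
    return int(f"{abs(x):.15e}"[0])


def first_digit_frequencies(values):
    """Observed share of each leading digit 1..9 among the non-zero values."""
    digits = [first_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one non-zero value")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


# Hypothetical sample (an assumption, not data from the paper): powers of 2.
sample = [2 ** k for k in range(1, 101)]
observed = first_digit_frequencies(sample)

# Print observed versus expected share for each leading digit.
for d in range(1, 10):
    print(d, round(observed[d], 3), round(benford_expected(d), 3))
```

A large gap between the observed and expected columns is often treated as a prompt for closer inspection of the figures rather than as proof of manipulation.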
Scientists' bounded mobility on the epistemic landscape
Despite persistent efforts in revealing the temporal patterns in scientific
careers, little attention has been paid to the spatial patterns of scientific
activities in the knowledge space. Here, drawing on millions of papers in six
disciplines, we consider scientists' publication sequence as "walks" on the
quantifiable epistemic landscape constructed from large-scale bibliometric
corpora by combining embedding and manifold learning algorithms, aiming to
reveal individual research topic dynamics and the association between research
radius and academic performance along their careers. Intuitively, the
visualization shows the localized and bounded nature of mobile trajectories. We
further find that the distributions of scientists' transition radius and
transition pace are both left-skewed compared with the results of controlled
experiments. Then, we observe the mixed exploration and exploitation pattern
and the corresponding strategic trade-off in the research transition, where
scientists both deepen their previous research with frequency bias and explore
new research with knowledge proximity bias. We further develop a bounded
exploration-exploitation (BEE) model to reproduce the observed patterns.
Moreover, the association between scientists' research radius and academic
performance shows that extensive exploration will not lead to a sustained
increase in academic output but a decrease in impact. In addition, we also note
that disruptive findings more often derive from extensive transitions, although
this association saturates. Our study contributes to the
comprehension of the mobility patterns of scientists in the knowledge space,
thereby providing significant implications for the development of scientific
policy-making.
Comment: article paper, 47 pages, 29 figures, 4 tables
Statistical language learning
Theoretical arguments based on the "poverty of the stimulus" have denied a
priori the possibility that abstract linguistic representations can be learned
inductively from exposure to the environment, given that the linguistic input
available to the child is both underdetermined and degenerate. I reassess such
learnability arguments by exploring a) the type and amount of statistical
information implicitly available in the input in the form of distributional and
phonological cues; b) psychologically plausible inductive mechanisms for
constraining the search space; c) the nature of linguistic representations,
algebraic or statistical. To do so I use three methodologies: experimental
procedures, linguistic analyses based on large corpora of naturally occurring
speech and text, and computational models implemented in computer
simulations.
In Chapters 1, 2, and 5, I argue that long-distance structural dependencies
- traditionally hard to explain with simple distributional analyses based on n-gram
statistics - can indeed be learned associatively provided the amount of
intervening material is highly variable or invariant (the Variability effect). In
Chapter 3, I show that simple associative mechanisms instantiated in Simple
Recurrent Networks can replicate the experimental findings under the same
conditions of variability. Chapter 4 presents successes and limits of such results
across perceptual modalities (visual vs. auditory) and perceptual presentation
(temporal vs. sequential), as well as the impact of long and short training
procedures. In Chapter 5, I show that generalisation to abstract categories from
stimuli framed in non-adjacent dependencies is also modulated by the Variability
effect. In Chapter 6, I show that the putative separation of algebraic and
statistical styles of computation based on successful speech segmentation versus
unsuccessful generalisation experiments (as published in a recent Science paper)
is premature and is the effect of a preference for phonological properties of the
input. In Chapter 7, computer simulations of learning irregular constructions
suggest that it is possible to learn from positive evidence alone, despite Gold's
celebrated arguments on the unlearnability of natural languages. Evolutionary
simulations in Chapter 8 show that irregularities in natural languages can emerge
from full regularity and remain stable across generations of simulated agents. In
Chapter 9, I conclude that the brain may be endowed with a powerful statistical
device for detecting structure, generalising, segmenting speech, and recovering
from overgeneralisations. The experimental and computational evidence gathered
here suggests that statistical language learning is more powerful than heretofore
acknowledged by the current literature.
Modeling semantic compositionality of relational patterns
Vector representation is a common approach for expressing the meaning of a relational pattern. Most previous work obtained a vector of a relational pattern based on the distribution of its context words (e.g., arguments of the relational pattern), regarding the pattern as a single "word". However, this approach suffers from the data sparseness problem, because relational patterns are productive, i.e., produced by combinations of words. To address this problem, we propose a novel method for computing the meaning of a relational pattern based on the semantic compositionality of constituent words. We extend the Skip-gram model (Mikolov et al., 2013) to handle semantic compositions of relational patterns using recursive neural networks. The experimental results show the superiority of the proposed method for modeling the meanings of relational patterns, and demonstrate the contribution of this work to the task of relation extraction.
Sentiment analysis on Bangla conversation using machine learning approach
Nowadays, online communication is more convenient and popular than face-to-face conversation. Therefore, people prefer online communication over face-to-face meetings. An enormous number of people use online chatting systems to speak with their loved ones at any given time throughout the world. People create massive quantities of conversation every second because of their online engagement. People's feelings during the conversation period can be gleaned as useful information from these conversations. Text analysis and summarization of any material can be done using sentiment analysis through natural language processing. The use of communication for customer service portals in various e-commerce platforms and crime investigations based on digital evidence is increasing the need for sentiment analysis of a conversation. Other languages, such as English, have well-developed libraries and resources for natural language processing, yet few studies have been conducted on Bangla. It is more challenging to extract sentiments from Bangla conversational data due to the language's grammatical complexity. As a result, it opens vast study opportunities. To this end, support vector machine, multinomial naïve Bayes, k-nearest neighbors, logistic regression, decision tree, and random forest classifiers were used. Information extracted from the dataset was labeled as positive or negative.
Statistical language learning
Theoretical arguments based on the "poverty of the stimulus" have denied a priori the possibility that abstract linguistic representations can be learned inductively from exposure to the environment, given that the linguistic input available to the child is both underdetermined and degenerate. I reassess such learnability arguments by exploring a) the type and amount of statistical information implicitly available in the input in the form of distributional and phonological cues; b) psychologically plausible inductive mechanisms for constraining the search space; c) the nature of linguistic representations, algebraic or statistical. To do so I use three methodologies: experimental procedures, linguistic analyses based on large corpora of naturally occurring speech and text, and computational models implemented in computer simulations. In Chapters 1, 2, and 5, I argue that long-distance structural dependencies - traditionally hard to explain with simple distributional analyses based on n-gram statistics - can indeed be learned associatively provided the amount of intervening material is highly variable or invariant (the Variability effect). In Chapter 3, I show that simple associative mechanisms instantiated in Simple Recurrent Networks can replicate the experimental findings under the same conditions of variability. Chapter 4 presents successes and limits of such results across perceptual modalities (visual vs. auditory) and perceptual presentation (temporal vs. sequential), as well as the impact of long and short training procedures. In Chapter 5, I show that generalisation to abstract categories from stimuli framed in non-adjacent dependencies is also modulated by the Variability effect.
In Chapter 6, I show that the putative separation of algebraic and statistical styles of computation based on successful speech segmentation versus unsuccessful generalisation experiments (as published in a recent Science paper) is premature and is the effect of a preference for phonological properties of the input. In Chapter 7, computer simulations of learning irregular constructions suggest that it is possible to learn from positive evidence alone, despite Gold's celebrated arguments on the unlearnability of natural languages. Evolutionary simulations in Chapter 8 show that irregularities in natural languages can emerge from full regularity and remain stable across generations of simulated agents. In Chapter 9, I conclude that the brain may be endowed with a powerful statistical device for detecting structure, generalising, segmenting speech, and recovering from overgeneralisations. The experimental and computational evidence gathered here suggests that statistical language learning is more powerful than heretofore acknowledged by the current literature.
EThOS - Electronic Theses Online Service; European Union (EU) (HPRN-CT-1999-00065); GB; United Kingdo
Acoustic sequences in non-human animals: a tutorial review and prospectus.
Animal acoustic communication often takes the form of complex sequences, made up of multiple distinct acoustic units. Apart from the well-known example of birdsong, other animals such as insects, amphibians, and mammals (including bats, rodents, primates, and cetaceans) also generate complex acoustic sequences. Occasionally, such as with birdsong, the adaptive role of these sequences seems clear (e.g. mate attraction and territorial defence). More often however, researchers have only begun to characterise - let alone understand - the significance and meaning of acoustic sequences. Hypotheses abound, but there is little agreement as to how sequences should be defined and analysed. Our review aims to outline suitable methods for testing these hypotheses, and to describe the major limitations to our current and near-future knowledge on questions of acoustic sequences. This review and prospectus is the result of a collaborative effort between 43 scientists from the fields of animal behaviour, ecology and evolution, signal processing, machine learning, quantitative linguistics, and information theory, who gathered for a 2013 workshop entitled, 'Analysing vocal sequences in animals'. Our goal is to present not just a review of the state of the art, but to propose a methodological framework that summarises what we suggest are the best practices for research in this field, across taxa and across disciplines. We also provide a tutorial-style introduction to some of the most promising algorithmic approaches for analysing sequences. We divide our review into three sections: identifying the distinct units of an acoustic sequence, describing the different ways that information can be contained within a sequence, and analysing the structure of that sequence. Each of these sections is further subdivided to address the key questions and approaches in that area. 
We propose a uniform, systematic, and comprehensive approach to studying sequences, with the goal of clarifying research terms used in different fields, and facilitating collaboration and comparative studies. Allowing greater interdisciplinary collaboration will facilitate the investigation of many important questions in the evolution of communication and sociality.
This review was developed at an investigative workshop, "Analyzing Animal Vocal Communication Sequences", that took place on October 21–23, 2013 in Knoxville, Tennessee, sponsored by the National Institute for Mathematical and Biological Synthesis (NIMBioS). NIMBioS is an Institute sponsored by the National Science Foundation, the U.S. Department of Homeland Security, and the U.S. Department of Agriculture through NSF Awards #EF-0832858 and #DBI-1300426, with additional support from The University of Tennessee, Knoxville. In addition to the authors, Vincent Janik participated in the workshop. D.T.B.'s research is currently supported by NSF DEB-1119660. M.A.B.'s research is currently supported by NSF IOS-0842759 and NIH R01DC009582. M.A.R.'s research is supported by ONR N0001411IP20086 and NOPP (ONR/BOEM) N00014-11-1-0697. S.L.DeR.'s research is supported by the U.S. Office of Naval Research. R.F.-i-C.'s research was supported by the grant BASMATI (TIN2011-27479-C04-03) from the Spanish Ministry of Science and Innovation. E.C.G.'s research is currently supported by a National Research Council postdoctoral fellowship. E.E.V.'s research is supported by CONACYT, Mexico, award number I010/214/2012.
This is the accepted manuscript. The final version is available at http://dx.doi.org/10.1111/brv.1216
- …