Multiplex Communities and the Emergence of International Conflict
Advances in community detection reveal new insights into multiplex and
multilayer networks. Less work, however, investigates the relationship between
these communities and outcomes in social systems. We leverage these advances to
shed light on the relationship between the cooperative mesostructure of the
international system and the onset of interstate conflict. We detect
communities based upon weaker signals of affinity expressed in United Nations
votes and speeches, as well as stronger signals observed across multiple layers
of bilateral cooperation. Communities of diplomatic affinity display an
expected negative relationship with conflict onset. Ties in communities based
upon observed cooperation, however, display no effect under a standard model
specification and a positive relationship with conflict under an alternative
specification. These results align with some extant hypotheses but also point
to a paucity in our understanding of the relationship between community
structure and behavioral outcomes in networks.
Comment: arXiv admin note: text overlap with arXiv:1802.0039
A New Similarity Measure for Document Classification and Text Mining
Accurate, efficient, and fast processing of textual data and classification of electronic documents have become a key factor in knowledge management and related businesses. Text mining, information retrieval, and document classification systems have a strong positive impact on digital libraries and electronic content management, e-marketing, electronic archives, customer relationship management, decision support systems, copyright infringement, and plagiarism detection, all of which affect economics, businesses, and organizations. In this study, we propose a new similarity measure that can be used with the k-nearest neighbors (k-NN) and Rocchio algorithms, two well-known algorithms for document classification, information retrieval, and other text mining purposes. We tested the new similarity measure on structured textual data sets and compared the results with standard distance metrics and similarity measures such as cosine similarity, Euclidean distance, and the Pearson correlation coefficient. The results are promising and show that the proposed similarity measure could be used as an alternative within all suitable algorithms, methods, and models for text mining, document classification, and relevant knowledge management systems.
Keywords: text mining, document classification, similarity measures, k-NN, Rocchio algorithm
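The abstract above names three baseline measures (cosine similarity, Euclidean distance, Pearson correlation) and the k-NN classifier, but does not define the proposed measure itself. As a minimal sketch of the baselines only, assuming documents are already represented as equal-length numeric vectors, the comparison setup might look like the following. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_correlation(a, b):
    """Pearson correlation, computed as the cosine of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


def knn_classify(query, labeled_docs, k=3, similarity=cosine_similarity):
    """Majority vote over the k labeled vectors most similar to the query.

    labeled_docs is a list of (vector, label) pairs. The similarity
    argument is the slot where an alternative measure could be plugged in.
    """
    ranked = sorted(labeled_docs, key=lambda d: similarity(query, d[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    docs = [
        ([1.0, 0.0, 0.0], "sports"),
        ([0.9, 0.1, 0.0], "sports"),
        ([0.0, 1.0, 1.0], "politics"),
    ]
    print(knn_classify([1.0, 0.1, 0.0], docs, k=3))
```

Passing the measure as a parameter keeps the classifier unchanged while different similarity functions are swapped in, which is the kind of comparison the abstract describes. Note that a distance such as Euclidean would need to be negated or sorted in ascending order before being used in this ranking.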
Automatic coding of short text responses via clustering in educational assessment
Automatic coding of short text responses opens new doors in assessment. We implemented and integrated baseline methods of natural language processing and statistical modelling by means of software components that are available under open licenses. The accuracy of automatic text coding is demonstrated by using data collected in the Programme for International Student Assessment (PISA) 2012 in Germany. Free text responses of 10 items with [formula] responses in total were analyzed. We further examined the effect of different methods, parameter values, and sample sizes on performance of the implemented system. The system reached fair to good up to excellent agreement with human codings [formula]. Especially items that are solved by naming specific semantic concepts appeared properly coded. The system performed equally well with [formula] and somewhat poorer but still acceptable down to [formula]. Based on our findings, we discuss potential innovations for assessment that are enabled by automatic coding of short text responses. (DIPF/Orig.)
From Frequency to Meaning: Vector Space Models of Semantics
Computers understand very little of the meaning of human language. This
profoundly limits our ability to give instructions to computers, the ability of
computers to explain their actions to us, and the ability of computers to
analyse and process text. Vector space models (VSMs) of semantics are beginning
to address these limits. This paper surveys the use of VSMs for semantic
processing of text. We organize the literature on VSMs according to the
structure of the matrix in a VSM. There are currently three broad classes of
VSMs, based on term-document, word-context, and pair-pattern matrices, yielding
three classes of applications. We survey a broad range of applications in these
three categories and we take a detailed look at a specific open source project
in each category. Our goal in this survey is to show the breadth of
applications of VSMs for semantics, to provide a new perspective on VSMs for
those who are already familiar with the area, and to provide pointers into the
literature for those who are less familiar with the field.
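Of the three matrix types named in the abstract above, the term-document matrix is the simplest to illustrate. The following is a minimal sketch, assuming whitespace tokenization and raw counts with no weighting. The function name is hypothetical and the code does not come from the survey or from any of the open source projects it discusses.

```python
from collections import Counter


def term_document_matrix(documents):
    """Build a term-document count matrix from a list of strings.

    Returns (vocabulary, matrix), where matrix[i][j] is the number of
    times vocabulary[i] occurs in documents[j]. Tokenization is a plain
    lowercase whitespace split, which is a simplifying assumption.
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    # Rows are terms and columns are documents.
    matrix = [[doc_counts[term] for doc_counts in counts] for term in vocabulary]
    return vocabulary, matrix


if __name__ == "__main__":
    vocab, matrix = term_document_matrix(["the cat sat", "the dog sat down"])
    for term, row in zip(vocab, matrix):
        print(term, row)
```

Each column of the matrix is a document vector, so two documents can be compared by comparing their columns, and each row is a term vector over the documents in which the term occurs.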