In complex, high dimensional and unstructured data it is often difficult to
extract meaningful patterns. This is especially the case when dealing with
textual data. Recent studies in machine learning, information theory and
network science have developed several novel instruments to extract the
semantics of unstructured data, and harness it to build a network of relations.
Such approaches serve as an efficient tool for dimensionality reduction and
pattern detection. This paper applies semantic network science to extract
ideological proximity in the international arena, by focusing on the data from
General Debates in the UN General Assembly on the topics of high salience to
international community. UN General Debate corpus (UNGDC) covers all high-level
debates in the UN General Assembly from 1970 to 2014, covering all UN member
states. The research proceeds in three main steps. First, Latent Dirichlet
Allocation (LDA) is used to extract the topics of the UN speeches, and
therefore semantic information. Each country is then assigned a vector
specifying the exposure to each of the topics identified. This intermediate
output is then used in to construct a network of countries based on information
theoretical metrics where the links capture similar vectorial patterns in the
topic distributions. Topology of the networks is then analyzed through network
properties like density, path length and clustering. Finally, we identify
specific topological features of our networks using the map equation framework
to detect communities in our networks of countries