2,905 research outputs found

    Retrieving with good sense

    Although always present in text, word sense ambiguity only recently came to be regarded as a potentially solvable problem for information retrieval. The growth of interest in word senses resulted from new directions taken in disambiguation research. This paper first outlines this research and surveys the resulting efforts in information retrieval. Although the majority of attempts to improve retrieval effectiveness were unsuccessful, much was learnt from the research. Most notably, a notion emerged of the circumstances under which disambiguation may prove useful to retrieval.

    TALKING TO ME? CREATING NETWORKS FROM ONLINE COMMUNITY LOGS

    Online communities offer many potential sources of value to individuals and organisations. However, the effectiveness of online communities in delivering benefits such as knowledge sharing depends on the network of social relations within a community. Research in this area aims to understand and optimize such networks. Researchers in this area employ diverse network creation methods, with little focus on the selection process, the fit of the selected method, or its relative accuracy. In this study we evaluate and compare the performance of four network creation methods. First we review the literature to identify four network creation methods (algorithms) and their underlying assumptions. Using several data sets from an online community we test and compare the accuracy of each method against a baseline (‘actual’) network determined by content analysis. We use visual inspection, network correlation analysis and sensitivity analysis to highlight similarities and differences between the methods, and find some differences significant enough to impact study results. Based on our observations we argue for more careful selection of network creation methods. We propose two key guidelines for research into social networks that uses unstructured data from online communities. The study contributes to the rigour of methodological decisions underpinning research in this area.

    The early history and emergence of molecular functions and modular scale-free network behavior

    The formation of protein structural domains requires that biochemical functions, defined by conserved amino acid sequence motifs, be embedded into a structural scaffold. Here we trace domain history onto a bipartite network of elementary functional loop (EFL) sequences and domain structures defined at the fold superfamily (FSF) level of Structural Classification of Proteins (SCOP). The resulting ‘elementary functionome’ network and its EFL and FSF graph projections unfold evolutionary ‘waterfalls’ describing emergence of primordial functions. Waterfalls reveal how ancient EFLs are shared by FSF structures in two initial waves of functional innovation that involve founder ‘p-loop’ and ‘winged helix’ domain structures. They also uncover a dynamics of modular motif embedding in domain structures that is ongoing, which transfers ‘preferential’ cooption properties of ancient EFLs to emerging FSFs. Remarkably, we find that the emergence of molecular functions induces hierarchical modularity and power law behavior in network evolution as the networks of motifs and structures expand metabolic pathways and translation

    Institutional entrepreneurship and change in consumer protection policy in the telecommunications sector: innovations in the text-based analysis approach

    This article analyses the institutional entrepreneurship within independent regulatory agencies (IRAs) as a variable explaining policy change over time and seeks to offer new insight for the identification and analysis of structure-agency relationships. The article contributes to the institutional entrepreneurship research agenda by connecting changes in IRA consumer protection policy to changes in agency leadership (specifically, agency presidents). The method used relies upon a quantitative and qualitative text analysis approach to connect and pinpoint structure-agency dynamics over time. The empirical sections compare and contrast the results obtained through the content analysis of the annual reports issued between 2000 and 2015 by the Italian Communications Authority (Agcom), and illustrate variations between periodic changes to Agcom's presidency and changes in ideas, strategies and tools in the field of consumer protection in the telecommunications sector.

    Implementing Semantic Document Search Using a Bounded Random Walk in a Probabilistic Graph

    Given a set of documents and an input query that is expressed using natural language, the problem of document search is retrieving all relevant documents ordered by the degree of relevance. Semantic document search fetches not only documents that contain words from the input query, but also documents that are semantically relevant. For example, the query "friendly pets" will consider documents that contain the words "dog" and "cat", among others. One way to implement semantic search is to use a probabilistic graph in which the input query is connected to the documents through paths that contain semantically similar words and phrases, where we use WordNet to initially populate the graph. Each edge in the graph is labeled with the conditional probability that the destination node is relevant given that the source node is relevant. Our semantic document search algorithm works in two phases. In the first phase, we find all documents in the graph that are close to the input query and create a bounded subgraph that includes the query, the found documents, and the paths that connect them. In the second phase, we simulate multiple random walks. Each random walk starts at the input query and continues until a document is reached, a jump outside the bounding subgraph is made, or the number of allowed jumps is exhausted. This allows us to rank the documents based on the number of random walks that terminated in them. We experimentally validated the algorithm on the Cranfield benchmark that contains 1400 documents and 225 natural language queries. We show that we achieve a higher value for the mean average precision (MAP) measure than a keywords-based search algorithm and a previously published algorithm that relies on a variation of the probabilistic graph.
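
The second phase described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `rank_documents`, the toy graph, the `allowed` node set standing in for the bounded subgraph, and the choice to treat edge probabilities as relative weights when picking the next jump are all assumptions made for illustration.

```python
import random
from collections import Counter


def rank_documents(graph, query, documents, allowed, walks=2000, max_jumps=6, seed=0):
    """Rank documents by how many bounded random walks terminate in them.

    graph     : dict mapping node -> {neighbour: edge probability}
    documents : set of nodes that are documents
    allowed   : set of nodes inside the bounded subgraph (assumed precomputed)
    """
    rng = random.Random(seed)
    hits = Counter()
    for _ in range(walks):
        node = query
        for _ in range(max_jumps):          # stop when the jump budget is exhausted
            edges = graph.get(node, {})
            if not edges:
                break                       # dead end: nothing to jump to
            neighbours = list(edges)
            # Assumption: edge probabilities act as relative weights for the jump.
            weights = [edges[n] for n in neighbours]
            node = rng.choices(neighbours, weights=weights)[0]
            if node not in allowed:
                break                       # jumped outside the bounded subgraph
            if node in documents:
                hits[node] += 1             # walk terminated in a document
                break
    return [doc for doc, _ in hits.most_common()]


# Hypothetical toy graph; node names and probabilities are made up.
graph = {
    "friendly pets": {"dog": 0.6, "cat": 0.3, "tiger": 0.1},
    "dog": {"doc_dogs": 1.0},
    "cat": {"doc_cats": 1.0},
    "tiger": {"doc_zoo": 1.0},
}
documents = {"doc_dogs", "doc_cats", "doc_zoo"}
allowed = {"friendly pets", "dog", "cat", "doc_dogs", "doc_cats"}

ranking = rank_documents(graph, "friendly pets", documents, allowed)
print(ranking)
```

In this toy setup the node "tiger" lies outside the bounded subgraph, so walks that jump to it are discarded and "doc_zoo" never appears in the ranking.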

    The seriation problem in the presence of a double Fiedler value

    Seriation is a problem consisting of seeking the best enumeration order of a set of units whose interrelationship is described by a bipartite graph. An algorithm for spectral seriation based on the use of the Fiedler vector of the Laplacian matrix associated with the problem was developed by Atkins et al. under the assumption that the Fiedler value is simple. In this paper, we analyze the case in which the Fiedler value of the Laplacian is not simple, discuss its effect on the set of the admissible solutions, and study possible approaches to actually perform the computation. Examples and numerical experiments illustrate the effectiveness of the proposed methods.
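
For orientation, the simple-Fiedler-value case that this abstract takes as its starting point can be sketched as follows. This is an illustrative sketch only, not the method of the paper or of Atkins et al.: it assumes a symmetric, non-negative similarity matrix for a connected set of at least two items, and it approximates the Fiedler vector with a shifted power iteration in pure Python in place of a proper eigensolver. The name `fiedler_order` and the example matrix are hypothetical.

```python
import math
import random


def fiedler_order(similarity, iterations=500, seed=0):
    """Order items by the entries of an approximate Fiedler vector.

    similarity : symmetric list-of-lists with non-negative entries.
    The Laplacian is L = D - S, with D the diagonal matrix of row sums.
    """
    n = len(similarity)
    degree = [sum(row) for row in similarity]
    # Gershgorin bound: every eigenvalue of L is at most 2 * max degree,
    # so shift*I - L is positive definite and reverses the eigenvalue order.
    shift = 2 * max(degree) + 1
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(iterations):
        # Remove the constant component (the eigenvector for eigenvalue 0).
        mean = sum(v) / n
        v = [x - mean for x in v]
        # w = (shift*I - L) v = shift*v - D v + S v
        w = [
            (shift - degree[i]) * v[i]
            + sum(similarity[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return sorted(range(n), key=lambda i: v[i])


# Hypothetical example: five items whose hidden order is the chain 2-0-4-1-3.
S = [
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
]
order = fiedler_order(S)
print(order)
```

The recovered order is defined only up to reversal, since the sign of the Fiedler vector is arbitrary. When the Fiedler value is not simple, a single vector no longer determines the ordering, which is the case the paper addresses and which this sketch does not handle.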

    Computational Sociolinguistics: A Survey

    Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of "Computational Sociolinguistics" that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved, by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges. Comment: To appear in Computational Linguistics. Accepted for publication: 18th February, 201

    Illicit Activity Detection in Large-Scale Dark and Opaque Web Social Networks

    Many online chat applications live in a grey area between the legitimate web and the dark net. The Telegram network in particular can aid criminal activities. Telegram hosts “chats” which consist of varied conversations and advertisements. These chats take place among automated “bots” and human users. Distinguishing legitimate from illegitimate activity can aid law enforcement in finding criminals. Social network analysis of Telegram chats presents a difficult problem. Users can change their username or create new accounts. Users involved in criminal activity often do this to obscure their identity. This makes establishing the unique identity behind a given username challenging. Thus we explored classifying users from their language usage in their chat messages. The volume and velocity of Telegram chat data place it well within the domain of big data. Machine learning and natural language processing (NLP) tools are necessary to classify this chat data. We developed NLP tools for classifying users and the chat group to which their messages belong. We found that legitimate and illegitimate chat groups could be classified with high accuracy. We were also able to classify bots, humans, and advertisements within conversations.