
    Network Aided Classification and Detection of Data

    Two important technological aspects of the Big Data paradigm have been the emergence of massive-scale Online Social Networks (OSNs), such as Facebook and Twitter, and the rise of the open data movement, which has resulted in the creation of richly structured online datasets such as Wikipedia, Citeseer and the US federal government's data.gov initiative. The examples of OSNs and online datasets cited above share the common feature that they can be thought of as Online Information Graphs, in the sense that the information embedded in them has a natural graph structure. In this thesis, we consider using this underlying Online Information Graph as a statistical prior to enhance the classification accuracy of some hard machine learning problems. Specifically, we look at instances where the graph is undirected and propose using the graph to define an Ising-Markov Random Field (MRF) prior. To begin with, we validate the Ising prior using a novel approach based on a hypothesis testing framework. Having validated the Ising prior, we demonstrate its utility by showcasing Network Aided Vector classification (NAC) of real-world data from fields as varied as vote prediction in the US Senate, movie earnings level classification (using the IMDb dataset) and county crime-level classification (using US census data). We then consider a special case of the classification problem which involves Network Aided Detection (NAD) of a global sentiment in an OSN. To this end, we consider Latent Sentiment (LS) detection as well as Majority Sentiment detection. We analyze the performance of the trivial sentiment detector for LS detection using a novel communications-oriented viewpoint, where we view the underlying network as providing a weak channel code that transmits one bit of information (the binary sentiment), and perform error exponent analysis for various underlying graph models. We also address the problem of optimal Maximum A Posteriori Probability (MAP) detection of majority sentiment in the highly noisy labels, weak network effect (NW) scenario, deriving the High Temperature (HT) expansion formula for the partial partition function of the Ising model using the code-puncturing idea from coding theory and then proposing an approximate MAP detector that outperforms the Maximum Likelihood (ML) detector and the trivial detector.
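To make the majority sentiment detection problem in the abstract concrete, the following is a minimal sketch and not the thesis's method. It computes the exact MAP decision by brute-force enumeration on a tiny graph, whereas the abstract describes an approximate detector derived from the HT expansion. Everything here is an illustrative assumption: the function names, the coupling parameter `theta`, the independent label-flip noise model with probability `flip_prob`, the 5-node cycle, and the reading of the "trivial detector" as a plain majority vote over the noisy labels.

```python
import itertools
import math


def ising_weight(x, edges, theta):
    """Unnormalised Ising prior weight: exp(theta * sum over edges of x_i * x_j)."""
    return math.exp(theta * sum(x[i] * x[j] for i, j in edges))


def label_likelihood(y, x, flip_prob):
    """P(y | x), assuming each label is flipped independently with prob. flip_prob."""
    agree = sum(1 for yi, xi in zip(y, x) if yi == xi)
    return (1.0 - flip_prob) ** agree * flip_prob ** (len(y) - agree)


def map_majority(y, edges, theta, flip_prob):
    """Exact MAP decision on the majority sign of the hidden states x.

    Enumerates all 2^n states, so it is only usable for very small n.
    """
    n = len(y)
    mass = {1: 0.0, -1: 0.0}  # posterior mass (up to a constant) per majority sign
    for x in itertools.product((-1, 1), repeat=n):
        total = sum(x)
        if total == 0:
            continue  # tied states have no majority; impossible when n is odd
        sign = 1 if total > 0 else -1
        mass[sign] += ising_weight(x, edges, theta) * label_likelihood(y, x, flip_prob)
    return 1 if mass[1] >= mass[-1] else -1


def trivial_majority(y):
    """Assumed 'trivial' detector: majority vote over the noisy labels."""
    return 1 if sum(y) >= 0 else -1


# Hypothetical example: a 5-node cycle graph with noisy +/-1 labels.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
noisy_labels = (1, 1, -1, 1, -1)
print(map_majority(noisy_labels, edges, theta=0.3, flip_prob=0.2))
print(trivial_majority(noisy_labels))
```

The enumeration costs O(2^n), which is why an approximation such as the one the abstract describes becomes necessary on realistic network sizes.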