4 research outputs found

    Discovering semantic aspects of socially constructed knowledge hierarchy to boost the relevance of Web searching

    This research aims to improve the relevance of Web search results by classifying Web snippets into socially constructed hierarchical search concepts, such as the most comprehensive human-edited knowledge structure, the Open Directory Project (ODP). The semantic aspects of the search concepts (categories) in socially constructed hierarchical knowledge repositories are extracted from the associated textual information contributed by societies. This textual information is explored and analyzed to construct a category-document set, which is subsequently employed to represent the semantics of the socially constructed search concepts. Simple API for XML (SAX), a component of JAXP (Java API for XML Processing), is used to read and analyze the two RDF-format ODP data files, structure.rdf and content.rdf. A kNN classifier, trained on the constructed category-document set, is used to categorize the Web search results. The categorized Web search results are then ontologically filtered based on the interactions of Web information seekers. Initial experimental results demonstrate that the proposed approach can improve precision by 23.5%.
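The kNN step in this abstract can be illustrated with a minimal sketch. It assumes a bag-of-words representation and cosine similarity, neither of which the abstract specifies; the category names, example documents, and function names below are hypothetical and are not taken from the ODP data or from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace, keeping alphabetic tokens only."""
    return [t for t in text.lower().split() if t.isalpha()]


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(snippet, category_docs, k=3):
    """Return the category with the largest similarity-weighted vote
    among the k documents most similar to the snippet."""
    query = Counter(tokenize(snippet))
    scored = sorted(
        ((cosine(query, Counter(tokenize(doc))), cat) for cat, doc in category_docs),
        reverse=True,
    )
    votes = Counter()
    for score, cat in scored[:k]:
        votes[cat] += score
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical category-document set (assumption: one text per entry).
category_docs = [
    ("Computers/Programming", "python programming language tutorials and code"),
    ("Computers/Programming", "java software development code libraries"),
    ("Science/Biology", "snake species reptiles python habitat biology"),
    ("Science/Biology", "animal biology genetics species evolution"),
]

print(knn_classify("learn python code with programming tutorials", category_docs))
print(knn_classify("reptiles species habitat", category_docs))
```

Weighting votes by similarity rather than counting neighbors equally is a design choice made here for the toy data; the abstract does not say which voting scheme was used.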

    Probabilistic Personalized Recommendation Models For Heterogeneous Social Data

    Content recommendation has risen to a new dimension with the advent of platforms like Twitter, Facebook, FriendFeed, Dailybooth, and Instagram. Although this surge of data has provided us with a goldmine of real-world information, the problem of information overload has become a major barrier to developing predictive models. Therefore, the objective of this Thesis is to propose various recommendation, prediction and information retrieval models that are capable of leveraging such vast heterogeneous content. More specifically, this Thesis focuses on proposing models based on probabilistic generative frameworks for the following tasks: (a) recommending backers and projects in the Kickstarter crowdfunding domain and (b) point-of-interest recommendation in Foursquare. Through a comprehensive set of experiments over a variety of datasets, we show that our models are capable of providing practically useful results for recommendation and information retrieval tasks.

    Data and Text Mining Techniques for In-Domain and Cross-Domain Applications

    In the big data era, a vast amount of data has been generated in different domains, from social media to news feeds, from health care to genomic functionalities. When addressing a problem, we usually need to harness multiple disparate datasets. Data from different domains may follow different modalities, each of which has a different representation, distribution, scale and density. For example, text is usually represented as discrete sparse word count vectors, whereas an image is represented by pixel intensities, and so on. Many Data Mining and Machine Learning techniques have been proposed in the literature, and they have already achieved significant success in many knowledge engineering areas, including classification, regression and clustering. Nevertheless, some challenging issues remain when tackling a new problem: How should the problem be represented? Which approach is best among the huge number of possibilities? What information should be used in the Machine Learning task, and how should it be represented? Are there other domains from which knowledge can be borrowed? This dissertation proposes some possible representation approaches for problems in different domains, from text mining to genomic analysis. In particular, one of the major contributions is a different way to represent a classical classification problem: instead of using an instance related to each object (a document, a gene, a social post, etc.) to be classified, it is proposed to use a pair of objects or an object-class pair, using the relationship between them as the label. The application of this approach is tested on both flat and hierarchical text categorization datasets, where it potentially allows the efficient addition of new categories during classification. Furthermore, the same idea is used to extract conversational threads from an unregulated pool of messages and also to classify the biomedical literature based on the genomic features treated.
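The object-class pair representation described in this abstract can be sketched as follows. This is only a toy illustration under stated assumptions: the single overlap feature, the midpoint threshold, the class profiles, and all function names are hypothetical and are not the dissertation's actual method.

```python
from statistics import mean


def tokens(text):
    """Lowercase whitespace tokenization."""
    return text.lower().split()


def build_pairs(docs, labels, classes):
    """Turn (doc, label) instances into (doc, class) pairs,
    labelled 1 if the class is the document's label, else 0."""
    return [((doc, c), int(c == y)) for doc, y in zip(docs, labels) for c in classes]


def pair_feature(doc, class_profile):
    """Fraction of document tokens that also occur in the class profile."""
    toks = tokens(doc)
    if not toks:
        return 0.0
    return sum(t in class_profile for t in toks) / len(toks)


def fit_threshold(pairs, profiles):
    """Midpoint between the mean feature of related and unrelated pairs.
    Assumes both kinds of pairs are present."""
    pos = [pair_feature(d, profiles[c]) for (d, c), y in pairs if y == 1]
    neg = [pair_feature(d, profiles[c]) for (d, c), y in pairs if y == 0]
    return (mean(pos) + mean(neg)) / 2


def predict(doc, profiles, threshold):
    """Return every class whose pair with the document is judged related."""
    return sorted(c for c, p in profiles.items() if pair_feature(doc, p) > threshold)


# Hypothetical training data.
docs = ["stock market shares", "football match goal", "market prices fall"]
labels = ["finance", "sport", "finance"]
profiles = {
    "finance": {"stock", "market", "shares", "prices", "fall"},
    "sport": {"football", "match", "goal"},
}

pairs = build_pairs(docs, labels, list(profiles))
threshold = fit_threshold(pairs, profiles)

# A new category is added by supplying a profile; the pair model is not refit.
profiles["science"] = {"genome", "protein", "cell"}
print(predict("protein cell study", profiles, threshold))
print(predict("market shares rise", profiles, threshold))
```

Because the model scores the relationship between a document and a class rather than the document alone, a new class only needs its own profile, which is one way to read the abstract's remark about adding categories efficiently.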