
    A Personalized Facet-Weight Based Ranking Method for Service Component Retrieval

    With recent advances in computing, networking technologies, and embedded systems, the computing paradigm has shifted from mainframe and desktop computing to ubiquitous computing, one of whose visions is to provide intelligent, personalized, and comprehensive services to users. Active Services is a new paradigm proposed to generate such services by retrieving, adapting, and composing existing service components to satisfy user requirements. As the popularity of this paradigm, and hence the number of service components, increases, efficiently retrieving components that maximally meet user requirements has become a fundamental and significant problem. However, traditional facet-based retrieval methods simply list all results without any ranking and place no emphasis on the differing importance of facet values in user requirements, which makes it hard for users to quickly select suitable components from the resulting list. To address these problems, this paper proposes a novel personalized facet-weight based ranking method for service component retrieval, which assigns a weight to each facet to distinguish its importance and constructs a personalized model that automatically calculates facet weights for each user from their historical retrieval records of facet values and weight settings. We optimize the parameters of the personalized model, evaluate the performance of the proposed retrieval method, and compare it with traditional facet-based matching methods. The experiments show promising results in terms of both retrieval accuracy and execution time.
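    As a rough illustration of the facet-weighting idea, the Python sketch below derives per-facet weights from how often each facet appeared in a user's past queries and ranks components by a weighted sum of exact facet-value matches. The facet names, the frequency heuristic, and the scoring function are illustrative assumptions, not the paper's actual model.

```python
# Minimal sketch of facet-weighted ranking; facets, the frequency-based
# weighting, and the exact-match scoring are illustrative assumptions.
from collections import Counter

def facet_weights(history):
    """Weight each facet by how often it appeared in past queries."""
    counts = Counter(facet for query in history for facet in query)
    total = sum(counts.values())
    return {facet: n / total for facet, n in counts.items()}

def score(component, query, weights):
    """Weighted sum of exact facet-value matches."""
    return sum(weights.get(facet, 0.0)
               for facet, value in query.items()
               if component.get(facet) == value)

history = [{"domain", "language"}, {"domain"}, {"domain", "platform"}]
weights = facet_weights(history)  # "domain" gets the highest weight (0.6)
query = {"domain": "telecom", "language": "Java", "platform": "Linux"}
components = [
    {"domain": "finance", "language": "Java", "platform": "Linux"},
    {"domain": "telecom", "language": "C++", "platform": "Linux"},
]
ranked = sorted(components, key=lambda c: score(c, query, weights), reverse=True)
# Both components match two facets, but the telecom one ranks first
# because the heavily weighted "domain" facet dominates the score.
print(ranked[0])
```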

    Enriching ontological user profiles with tagging history for multi-domain recommendations

    Many advanced recommendation frameworks employ ontologies of various complexities to model individuals and items, providing a mechanism for the expression of user interests and the representation of item attributes. As a result, complex matching techniques can be applied to support individuals in the discovery of items according to explicit and implicit user preferences. Recently, the rapid adoption of Web 2.0 and the proliferation of social networking sites have resulted in more and more users providing an increasing amount of information about themselves that could be exploited for recommendation purposes. However, unifying personal information with ontologies using the contemporary knowledge representation methods often associated with Web 2.0 applications, such as community tagging, is a non-trivial task. In this paper, we propose a method for the unification of tags with ontologies by grounding tags to a shared representation in the form of WordNet and Wikipedia. We incorporate individuals' tagging history into their ontological profiles by matching tags with ontology concepts. This approach is preliminarily evaluated by extending an existing news recommendation system with user tagging histories harvested from popular social networking sites.
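    The tag-grounding step could look roughly like the sketch below, which maps a free-form tag to the closest ontology concept label via WordNet path similarity. The concept labels, the threshold, and the restriction to first noun synsets are assumptions for illustration; the paper also grounds tags via Wikipedia. Requires NLTK with the WordNet corpus downloaded.

```python
# Rough sketch of grounding a free-form tag to an ontology concept via
# WordNet (requires NLTK and nltk.download('wordnet')). Concept labels
# and the similarity threshold are hypothetical.
from nltk.corpus import wordnet as wn

def ground_tag(tag, concept_labels, threshold=0.3):
    """Return the concept label whose first noun synset is closest to
    the tag's first noun synset, by WordNet path similarity."""
    tag_synsets = wn.synsets(tag, pos=wn.NOUN)
    if not tag_synsets:
        return None  # tag unknown to WordNet; fall back to Wikipedia
    best_label, best_sim = None, threshold
    for label in concept_labels:
        for synset in wn.synsets(label, pos=wn.NOUN)[:1]:
            sim = tag_synsets[0].path_similarity(synset) or 0.0
            if sim > best_sim:
                best_label, best_sim = label, sim
    return best_label

# 'soccer' is a direct hyponym of 'football' in WordNet:
print(ground_tag("soccer", ["football", "politics", "music"]))  # football
```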

    A Specification and Discovery Environment for Software Component Reuse in Distributed Software Development

    Our work aims to develop an effective solution for the discovery and reuse of software components in existing and commonly used development environments. We propose an ontology for describing and discovering atomic software components. The description covers both the functional and the non-functional properties of software components, the latter expressed as QoS parameters. Our search process is based on a function that calculates the semantic distance between a component's interface signature and the signature of a given query, thus achieving a sound comparison. We also use the notion of subsumption to compare the inputs and outputs of the query with those of the components. After the appropriate components are selected, the non-functional properties are used as a distinguishing factor to refine the search result. We propose an approach, based on the shared ontology, for discovering composite components when no atomic component is found. To integrate the resulting component into the project under development, we developed an integration ontology and two services, "input/output convertor" and "output Matching".
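    A minimal sketch of the subsumption-based input/output comparison, with a toy is-a hierarchy standing in for the thesis's shared ontology; the concept names and the edge-count distance are illustrative assumptions, not the actual semantic-distance function.

```python
# Toy is-a hierarchy standing in for the shared ontology; concept names
# and the edge-count distance are illustrative assumptions.
HIERARCHY = {"JPEGImage": "Image", "PNGImage": "Image", "Image": "Media"}

def ancestors(concept):
    """The concept itself followed by its chain of is-a parents."""
    chain = [concept]
    while concept in HIERARCHY:
        concept = HIERARCHY[concept]
        chain.append(concept)
    return chain

def subsumes(general, specific):
    """True if `specific` is-a `general`."""
    return general in ancestors(specific)

def distance(a, b):
    """Edge count between two concepts on the same is-a chain,
    or None if they are unrelated."""
    if b in ancestors(a):
        return ancestors(a).index(b)
    if a in ancestors(b):
        return ancestors(b).index(a)
    return None

# A component producing JPEGImage satisfies a query asking for Image,
# at semantic distance 1:
print(subsumes("Image", "JPEGImage"), distance("JPEGImage", "Image"))
```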

    Understanding PubMed Search Results using Topic Models and Interactive Information Visualization

    With data increasing exponentially, extracting and understanding information, themes, and relationships from large collections of documents is becoming more and more important to researchers in many areas. PubMed, which comprises more than 25 million citations, uses Medical Subject Headings (MeSH) to index articles and thereby facilitate their management, searching, and indexing. However, researchers are still challenged to find, and then get a meaningful overview of, a set of documents in a specific area of interest. This is due in part to several limitations of MeSH terms, including the need to monitor and expand the vocabulary, the lack of concept coverage for newly developing areas, human inconsistency in assigning codes, and the time required to manually index an exponentially growing corpus. Another reason for this challenge is that neither PubMed itself nor its related Web tools can help users see high-level themes and hidden semantic structures in the biomedical literature. Topic models are a class of statistical machine learning algorithms that, given a set of natural-language documents, extract the semantic themes (topics) from those documents, describe each document in terms of these topics, and quantify the semantic similarity of topics and documents. Researchers have shown that these latent themes can help humans better understand and search documents. Unlike MeSH terms, which are created from important concepts throughout the literature, topics extracted from a subset of documents are specific to those documents; they can therefore capture document-specific themes that may not exist in MeSH terms. Such themes can offer a subject-specific set of themes for browsing search results and provide a broader overview of them. The first part of this dissertation presents the TopicalMeSH representation, which exploits the 'correspondence' between topics generated using latent Dirichlet allocation (LDA) and MeSH terms to create new document representations that combine MeSH terms and latent topic vectors. In an evaluation with 15 systematic drug review corpora, TopicalMeSH performed better than MeSH in both document retrieval and classification tasks. The second part of this work introduces the "Hybrid Topic", an alternative LDA approach that uses a 'bag-of-MeSH&words' representation, instead of just 'bag-of-words', to test whether the addition of labels (e.g. MeSH descriptors) can improve the quality and facilitate the interpretation of LDA-generated topics. An evaluation of the quality and interpretability of these topics demonstrated that the coherence of 'hybrid topics' is higher than that of regular bag-of-words topics in both a specialized and a general corpus. The last part of this dissertation presents a visualization tool based on the 'hybrid topics' model that allows users to interactively use topic models and MeSH terms to efficiently and effectively retrieve relevant information from large sets of PubMed search results. A preliminary user study was conducted with 6 participants, all of whom agreed that this tool can quickly help them understand PubMed search results and identify target articles.
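    The 'bag-of-MeSH&words' idea can be sketched as below: MeSH descriptors are appended to each document as distinctively prefixed tokens before LDA is trained, so the learned topics mix ordinary words with descriptors. The toy corpus, the token prefix, and the gensim parameters are assumptions for illustration, not the dissertation's actual setup.

```python
# Toy illustration of 'bag-of-MeSH&words': MeSH descriptors are added to
# each document as prefixed tokens before LDA training (requires gensim;
# corpus, prefix, and parameters are hypothetical).
from gensim.corpora import Dictionary
from gensim.models import LdaModel

docs = [
    ["aspirin", "reduces", "platelet", "aggregation", "MeSH=Aspirin"],
    ["statin", "therapy", "lowers", "cholesterol", "MeSH=Statins"],
    ["aspirin", "trial", "cardiovascular", "outcomes", "MeSH=Aspirin"],
    ["statin", "dose", "cholesterol", "response", "MeSH=Statins"],
]
dictionary = Dictionary(docs)
bow = [dictionary.doc2bow(doc) for doc in docs]
lda = LdaModel(bow, num_topics=2, id2word=dictionary,
               passes=20, random_state=0)
for topic_id in range(2):
    # Hybrid topics mix ordinary words with MeSH=... tokens, which is
    # what makes them easier to label and interpret.
    print(topic_id, lda.print_topic(topic_id, topn=4))
```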

    Unstable and Stable Classifications of Scombroid Fishes

    Many cladists believe that a classification should strictly reflect a cladistic hypothesis. Consequently, they propose classifications that often differ markedly from existing ones and are potentially unstable due to phylogenetic uncertainty. This is problematic for economically or ecologically important organisms, since changing classifications can cause confusion in their management as resources. The classification of the 44 genera of scombroid fishes (the mackerels, tunas, billfishes, and their relatives) illustrates this problem of instability. Previous cladistic analyses, and the analyses presented in this paper using different data sets, result in many different cladistic hypotheses. In addition, the inferred cladograms are unstable because of different plausible interpretations of character coding: a slight change in the coding of a single character, the presence of splint-like gill rakers, changes cladistic relationships substantially. These many alternative cladistic hypotheses for scombroids can be converted into various cladistic classifications, all of which differ substantially from the classification currently in use. In contrast, a quantitative evolutionary systematic method produces a classification that is unchanged despite variations in the cladistic hypothesis. The evolutionary classification has the advantages of being consistent with the classification currently in use and of summarizing anagenetic information, and it can be considered a new form of cladistic classification, since a cladistic hypothesis can be unequivocally retrieved from an annotated form of the classification.

    Feature Extraction and Duplicate Detection for Text Mining: A Survey

    Text mining, also known as Intelligent Text Analysis, is an important research area. It is very difficult to focus on the most appropriate information due to the high dimensionality of the data. Feature extraction is one of the important data reduction techniques for discovering the most important features, since processing massive amounts of data stored in unstructured form is a challenging task, and several pre-processing methods and algorithms are needed to extract useful features from it. The survey covers different text summarization, classification, and clustering methods for discovering useful features, as well as the discovery of query facets, which are multiple groups of words or phrases that explain and summarize the content covered by a query, thereby reducing the time taken by the user. When dealing with collections of text documents, it is also very important to filter out duplicate data; once duplicates are detected and removed, it is recommended to replace them. Hence we also review the literature on duplicate detection and data fusion (removing and replacing duplicates). The survey presents existing text mining techniques for extracting relevant features, detecting duplicates, and replacing duplicate data, so as to deliver fine-grained knowledge to the user.
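    One common baseline among the surveyed duplicate-detection techniques is word-shingle Jaccard similarity; the sketch below, with an assumed similarity threshold and toy documents, keeps the first copy of each near-duplicate pair. A real data-fusion step would then merge information from the removed copies into the kept one.

```python
# Baseline near-duplicate detection with word-shingle Jaccard similarity;
# the threshold and toy documents are assumptions for illustration.
def shingles(text, k=3):
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def deduplicate(docs, threshold=0.7):
    """Keep the first copy of each near-duplicate pair; a data-fusion
    step would then merge information from the removed copies."""
    kept = []
    for doc in docs:
        if all(jaccard(shingles(doc), shingles(seen)) < threshold
               for seen in kept):
            kept.append(doc)
    return kept

docs = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox jumps over the lazy cat",   # near-duplicate
    "a completely different sentence about text mining",
]
print(deduplicate(docs))  # drops the near-duplicate second document
```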
