1,167 research outputs found

    Semantically intelligent semi-automated ontology integration

    An ontology is a way of categorizing and storing information. Web ontologies help in retrieving precise, relevant information over the web. However, heterogeneity between ontologies can arise when multiple ontologies of the same domain are used. Ontology integration addresses this heterogeneity and, with it, the problem of interoperability in knowledge-based systems. It provides a mechanism to find the semantic association between a pair of reference ontologies based on their concepts. Many researchers have worked on ontology integration; however, multiple issues remain unaddressed. This dissertation investigates the ontology integration problem and proposes a layer-based enhanced framework as a solution. In the concept matching process, concepts of the reference ontologies are compared on their semantics as well as their syntax. The semantic relationships of a concept with other concepts between ontologies, and user confirmation (only for the problematic cases), are also taken into account in this process. The proposed framework is implemented and validated by comparing the proposed concept matching technique with existing techniques. Test case scenarios are provided to compare and analyse the proposed framework in the analysis phase. The experimental results demonstrate the efficacy of the proposed framework.
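    The concept matching step described in this abstract (syntactic and semantic comparison, with user confirmation only for problematic cases) could be sketched roughly as follows. This is a minimal illustration, not the dissertation's method: the `SYNONYMS` table, the function names and the `accept`/`reject` thresholds are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table standing in for a real lexical resource.
SYNONYMS = {
    "car": {"automobile", "vehicle"},
    "author": {"writer"},
}


def syntactic_score(a: str, b: str) -> float:
    """String similarity in [0, 1] based on matching character runs."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def semantic_score(a: str, b: str) -> float:
    """1.0 if the labels are equal or listed as synonyms, else 0.0."""
    a, b = a.lower(), b.lower()
    if a == b or b in SYNONYMS.get(a, set()) or a in SYNONYMS.get(b, set()):
        return 1.0
    return 0.0


def match_concepts(a: str, b: str, accept: float = 0.8, reject: float = 0.4) -> str:
    """Classify a concept pair as 'match', 'no match' or 'ask user'.

    The thresholds are illustrative assumptions; borderline scores are
    deferred to the user instead of being decided automatically.
    """
    score = max(syntactic_score(a, b), semantic_score(a, b))
    if score >= accept:
        return "match"
    if score < reject:
        return "no match"
    return "ask user"
```

    In this sketch only pairs whose score falls between the two thresholds are handed to the user, which mirrors the idea of asking for confirmation only in the problematic cases.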

    Open source software ecosystems quality analysis from data sources

    Background: Open source software (OSS) and software ecosystems (SECOs) are two consolidated research areas in software engineering. The adoption of OSS by firms, governments, researchers and practitioners has increased rapidly in recent decades, and in consequence they find themselves in a new kind of ecosystem composed of software communities, foundations, developers and partners, namely the Open Source Software Ecosystem (OSSECO). To perform a systematic quality evaluation of a SECO, it is necessary to define certain types of concrete elements. This means that measures and evaluations should be described (e.g., through thresholds or expert judgment). The quality evaluation of an OSSECO may serve several purposes, for example: adopters of the products of the OSSECO may want to know about its liveliness (e.g., recent updates); software developers may want to know about its activeness (e.g., how many collaborators are involved and how active they are); and the OSSECO community itself may want to know about the health of the OSSECO (e.g., whether it is evolving in the right direction). However, the current approaches for evaluating software quality (even those specific to open source software) do not cover all the aspects relevant to an OSSECO from an ecosystem perspective. Goal: The main goal of this PhD thesis is to support OSSECO quality evaluation by designing a framework for the quality evaluation of OSSECOs. Methods: To accomplish this goal, we used an approach based on the design science methodology by Wieringa [1] and the characterization of software engineering proposed by M. Shaw [2], in order to produce a set of artefacts that contribute to the quality evaluation of OSSECOs and to learn about the effects of using these artefacts in practice. Results: We conducted a systematic mapping to characterize OSSECOs and designed the QuESo framework (a framework to evaluate OSSECO quality), composed of three artefacts: (i) QuESo-model, a quality model for OSSECOs; (ii) QuESo-process, a process for conducting OSSECO quality evaluations using the QuESo-model; and (iii) QuESo-tool, a software component to support semi-automatic quality evaluation of OSSECOs. Furthermore, this framework has been validated with a case study on Eclipse. Conclusions: This thesis has contributed to increasing the knowledge and understanding of OSSECOs, and to supporting the quality evaluation of OSSECOs.

    Open source software ecosystems : a systematic mapping

    Context: Open source software (OSS) and software ecosystems (SECOs) are two consolidated research areas in software engineering. OSS influences the way organizations develop, acquire, use and commercialize software. SECOs have emerged as a paradigm to understand dynamics and heterogeneity in collaborative software development. For this reason, SECOs appear as a valid instrument to analyze OSS systems. However, there are few studies that blend both topics together. Objective: The purpose of this study is to evaluate the current state of the art in OSS ecosystems (OSSECOs) research, specifically: (a) what the most relevant definitions related to OSSECOs are; (b) what the particularities of this type of SECO are; and (c) how the knowledge about OSSECO is represented. Method: We conducted a systematic mapping following recommended practices. We applied automatic and manual searches on different sources and used a rigorous method to elicit the keywords from the research questions and selection criteria to retrieve the final papers. As a result, 82 papers were selected and evaluated. Threats to validity were identified and mitigated whenever possible. Results: The analysis allowed us to answer the research questions. Most notably, we did the following: (a) identified 64 terms related to the OSSECO and arranged them into a taxonomy; (b) built a genealogical tree to understand the genesis of the OSSECO term from related definitions; (c) analyzed the available definitions of SECO in the context of OSS; and (d) classified the existing modelling and analysis techniques of OSSECOs. Conclusion: As a summary of the systematic mapping, we conclude that existing research on several topics related to OSSECOs is still scarce (e.g., modelling and analysis techniques, quality models, standard definitions, etc.). This situation calls for further investigation efforts on how organizations and OSS communities actually understand OSSECOs. Peer Reviewed. Postprint (author's final draft).

    Shangri-La: a medical case-based retrieval tool

    Large amounts of medical visual data are produced in hospitals daily and are continuously made available through publications in the scientific literature, which represent current medical knowledge. However, it is not always easy to find the desired information, and in clinical routine the time available to fulfil an information need is often very limited. Information retrieval systems are a useful tool for giving medical professionals access to the documents and images in the biomedical literature that relate to their information needs. Shangri-La is a medical retrieval system that can potentially help clinicians make decisions on difficult cases. Given a case description and attached images as a query, it retrieves articles from the biomedical literature. The system is based on a multimodal retrieval approach with a focus on the integration of visual information connected to text. The approach includes a query-adaptive multimodal fusion criterion that analyses whether visual features are suitable to be fused with text for the retrieval. Furthermore, image modality information is integrated in the retrieval step. The approach is evaluated using the ImageCLEFmed 2013 medical retrieval benchmark and can thus be compared to other approaches. Results show that the final approach outperforms the best multimodal approach submitted to ImageCLEFmed 2013.
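    A query-adaptive fusion of text and visual scores, as mentioned in this abstract, might look roughly like the sketch below. The abstract does not state the actual criterion, so the one used here (fuse only when the visual scores are spread widely enough to discriminate between documents) is purely an assumption, as are the function name `fuse` and the parameters `visual_weight` and `min_spread`.

```python
def fuse(text_scores, visual_scores, visual_weight=0.3, min_spread=0.2):
    """Rank documents by text score, adding visual evidence only when useful.

    text_scores / visual_scores: dicts mapping document id -> score in [0, 1].
    Assumed criterion (hypothetical): visual scores are fused only if their
    spread (max - min) for this query is at least `min_spread`.
    """
    if visual_scores:
        spread = max(visual_scores.values()) - min(visual_scores.values())
    else:
        spread = 0.0
    use_visual = spread >= min_spread

    # With the visual modality switched off, the weight drops to zero and
    # the ranking falls back to text scores alone.
    weight = visual_weight if use_visual else 0.0
    fused = {}
    for doc, text_score in text_scores.items():
        visual_score = visual_scores.get(doc, 0.0)
        fused[doc] = (1 - weight) * text_score + weight * visual_score

    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)
```

    Deciding per query keeps an uninformative visual signal from diluting a good text ranking, which is the general motivation for making the fusion query-adaptive.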

    A Semantic neighborhood approach to relatedness evaluation on well-founded domain ontologies

    In natural language processing and information retrieval, ontologies can improve the results of word sense disambiguation (WSD) techniques. By making the semantics of a term explicit, ontology-based semantic measures play a crucial role in determining how far different ontology classes have a similar or related meaning. In this context, it is common to use semantic similarity as a basis for WSD. However, similarity measures generally consider only taxonomic relationships, which hampers the discrimination of two ontology classes that are related by other relationship types. Semantic relatedness measures, on the other hand, consider diverse types of relationships to determine how much two classes in the ontology are related. However, these measures, especially the path-based approaches, have as their main drawback the high computational complexity of calculating the relatedness value. Also, for both types of semantic measures, it is impractical to store all similarity or relatedness values between all ontology classes in memory, especially for ontologies with a large number of classes. In this work, we propose a novel approach based on semantic neighbors that aims to improve the performance of knowledge-based measures in relatedness analysis. We also explain how to apply this proposal to path-based and feature-based measures. We evaluate our proposal on WSD using an existing domain ontology for well-core description. This ontology contains 929 classes related to rock facies. We also use a set of sentences from four different corpora in the Oil&Gas domain. In the experiments, we compare our proposal with state-of-the-art semantic relatedness measures, such as path-based, feature-based, information content, and hybrid methods, regarding F-score, evaluation time, and memory consumption. The experimental results show that the proposed method obtains F-score gains in WSD, as well as low evaluation time and memory consumption compared with the traditional knowledge-based measures.
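    One way to picture a neighborhood-restricted, path-based relatedness measure is the sketch below. It is not the authors' algorithm: the bounded breadth-first search, the `1 / (1 + distance)` score, the `radius` parameter and the sample class names are all assumptions chosen for illustration.

```python
from collections import deque


def neighborhood(graph, start, radius):
    """Map each class reachable within `radius` edges to its hop distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand beyond the neighborhood boundary
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def relatedness(graph, a, b, radius=2):
    """Return 1 / (1 + hops) if b lies in a's neighborhood, else 0.0."""
    d = neighborhood(graph, a, radius).get(b)
    return 0.0 if d is None else 1.0 / (1.0 + d)


# Hypothetical classes and relations (any relationship type counts as an
# edge); these names are not taken from the ontology in the abstract.
GRAPH = {
    "Sandstone": ["Rock", "Quartz"],
    "Rock": ["Sandstone", "Shale"],
    "Quartz": ["Sandstone"],
    "Shale": ["Rock", "Clay"],
    "Clay": ["Shale"],
}
```

    Bounding the search radius limits both the work per query and the number of values worth caching, which is the kind of time and memory saving the abstract reports for the neighborhood approach.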

    Process Productivity Improvements through Semantic and Linked Data Technologies

    Doctoral Programme in Computer Science and Technology, Universidad Carlos III de Madrid. Committee chair: José María Álvarez Rodríguez. Secretary: Rafael Valencia García. Member: Alejandro Rodríguez Gonzále

    A semantic metadata enrichment software ecosystem (SMESE) : its prototypes for digital libraries, metadata enrichments and assisted literature reviews

    Contribution 1: Initial design of a semantic metadata enrichment ecosystem (SMESE) for Digital Libraries. The Semantic Metadata Enrichments Software Ecosystem (SMESE V1) for Digital Libraries (DLs) proposed in this paper implements a Software Product Line Engineering (SPLE) process using a metadata-based software architecture approach. It integrates a components-based ecosystem, including metadata harvesting, text and data mining and machine learning models. SMESE V1 is based on a generic model for standardizing meta-entity metadata and a mapping ontology to support the harvesting of various types of documents and their metadata from the web, databases and linked open data. SMESE V1 supports a dynamic metadata-based configuration model using multiple thesauri. The proposed model defines rules-based crosswalks that create pathways to different sources of data and metadata. Each pathway checks the metadata source structure and performs data and metadata harvesting. SMESE V1 proposes a metadata model in six categories of metadata instead of the four currently proposed in the literature for DLs; this makes it possible to describe content by defined entity, thus increasing usability. In addition, to tackle the issue of varying degrees of depth, the proposed metadata model describes the most elementary aspects of a harvested entity. A mapping ontology model has been prototyped in SMESE V1 to identify specific text segments based on thesauri in order to enrich content metadata with topics and emotions; this mapping ontology also allows interoperability between existing metadata models. Contribution 2: Metadata enrichments ecosystem based on topics and interests. The second contribution extends the original SMESE V1 proposed in Contribution 1. Contribution 2 proposes a set of topic- and interest-based content semantic enrichments. The improved prototype, SMESE V3 (see following figure), uses text analysis approaches for sentiment and emotion detection and provides machine learning models to create a semantically enriched repository, thus enabling topic- and interest-based search and discovery. SMESE V3 has been designed to find short descriptions in terms of topics, sentiments and emotions. It allows efficient processing of large collections while keeping the semantic and statistical relationships that are useful for tasks such as: 1. topic detection, 2. contents classification, 3. novelty detection, 4. text summarization, 5. similarity detection. Contribution 3: Metadata-based scientific assisted literature review. The third contribution proposes an assisted literature review (ALR) prototype, STELLAR V1 (Semantic Topics Ecosystem Learning-based Literature Assisted Review), based on machine learning models and a semantic metadata ecosystem. Its purpose is to identify, rank and recommend relevant papers for a literature review (LR). This third prototype can assist researchers, in an iterative process, in finding, evaluating and annotating relevant papers harvested from different sources and input into the SMESE V3 platform, available at any time. The key elements and concepts of this prototype are: 1. text and data mining, 2. machine learning models, 3. classification models, 4. researchers' annotations, 5. semantically enriched metadata. STELLAR V1 helps the researcher to build a list of relevant papers according to a selection of metadata related to the subject of the ALR. The following figure presents the model, the related machine learning models and the metadata ecosystem used to assist the researcher in the task of producing an ALR on a specific topic.

    Enriching Affect Analysis Through Emotion and Sarcasm Detection

    Affect detection from text is the task of detecting affective states such as sentiment, mood and emotions from natural language text including news comments, product reviews, discussion posts, tweets and so on. Broadly speaking, affect detection includes the related tasks of sentiment analysis, emotion detection and sarcasm detection, amongst others. In this dissertation, we seek to enrich textual affect analysis from two perspectives: emotion and sarcasm. Emotion detection entails classifying the text into fine-grained categories of emotions such as happiness, sadness, surprise, and so on, whereas sarcasm detection seeks to identify the presence or absence of sarcasm in text. The task of emotion detection is particularly challenging because resources are limited and because classification involves a greater number of emotion categories, with no fixed number or types of emotions. Similarly, the recently proposed task of sarcasm detection is complicated by the inherently sophisticated nature of sarcasm, where one typically says or writes the opposite of what one means. This dissertation consists of five contributions. First, we address word-emotion association, a fundamental building block of most, if not all, emotion detection systems. Current approaches to emotion detection rely on a handful of manually annotated resources such as lexicons and datasets for deriving word-emotion association. Instead, we propose novel models for augmenting word-emotion association to support unsupervised learning, which does not require labeled training data and can be extended to flexible taxonomies of emotions. Second, we study the problem of affective word representations, where affectively similar words are projected into neighboring regions of an n-dimensional embedding space. While existing techniques usually consider the lexical semantics and syntax of co-occurring words, thus rating emotionally dissimilar words occurring in similar contexts as highly similar, we integrate a rich spectrum of emotions into representation learning in order to cluster emotionally similar words closer together, and emotionally dissimilar words farther from each other. The generated emotion-enriched word representations are found to be better at capturing relevant features useful for sentence-level emotion classification and emotion similarity tasks. Third, we investigate the problem of computational sarcasm detection. Generally, sarcasm detection is treated as a linguistic and lexical phenomenon, with limited emphasis on the emotional aspects of sarcasm. To address this gap, we propose novel models that enrich sarcasm detection by incorporating affective knowledge. In particular, document-level features obtained from affective word representations are utilized in designing classification systems. Through extensive evaluation on six datasets from three diverse domains of text, we demonstrate the potential of exploiting automatically induced features without the need for considerable manual feature engineering. Motivated by the importance of affective knowledge in detecting sarcasm, the fourth contribution of this thesis seeks to dig deeper and study the role of transitions and relationships between different emotions in order to discover which emotions serve as more informative and discriminative features for distinguishing sarcastic utterances in text. Lastly, we show the usefulness of our proposed affective models by applying them in a non-affective framework for predicting the helpfulness of online reviews.