362 research outputs found

    An Interactive Platform for Multilingual Linguistic Resource Enrichment

    The world is extremely diverse, and this diversity is evident in cultural differences and in the large number of languages spoken around the globe. Consequently, we need to collect and organize a huge amount of knowledge obtained from multiple resources that differ from one another in many aspects. One possible approach is to design effective tools for constructing and maintaining linguistic resources, based on well-defined knowledge representation methodologies capable of dealing with diversity and the continuous evolution of human knowledge. In this paper, we present a linguistic resource management platform that organizes knowledge in a language-independent manner and provides the appropriate mapping from a language-independent concept to one or more language-specific lexicalizations. The paper explains the knowledge representation methodology used in constructing the platform, together with the iterative process followed in designing and implementing its first version, named UKC-1, and the updated, refined version, named UKC-2.
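
    The concept-to-lexicalization mapping described above can be sketched as a small data structure; the class and identifiers below are illustrative assumptions, not the actual UKC schema.

```python
# Hypothetical sketch: a language-independent concept store in which each
# concept ID maps to one or more language-specific lexicalizations.
class ConceptStore:
    def __init__(self):
        self._lex = {}  # concept_id -> {lang: [words]}

    def add_lexicalization(self, concept_id, lang, word):
        self._lex.setdefault(concept_id, {}).setdefault(lang, []).append(word)

    def lexicalize(self, concept_id, lang):
        """Return the language-specific words for a concept, if any."""
        return self._lex.get(concept_id, {}).get(lang, [])

store = ConceptStore()
store.add_lexicalization("c:body_of_water", "en", "lake")
store.add_lexicalization("c:body_of_water", "it", "lago")
print(store.lexicalize("c:body_of_water", "it"))  # ['lago']
```

    Keeping the concept identifier language-neutral is what lets new languages be added without touching the knowledge organization itself.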

    An analysis of word embedding spaces and regularities

    Word embeddings are widely used in many applications due to their ability to capture semantic relationships between words as relations between vectors in high-dimensional spaces. One of the main obstacles to extracting this information is the phenomenon known as the curse of dimensionality: some intuitive results for well-known distances no longer hold in high-dimensional contexts. In this thesis we explore the problem of distinguishing synonym or antonym word pairs from unrelated pairs using only the distance between the two words of the pair. We consider several norms and study the problem in the two principal kinds of embeddings, GloVe and Word2Vec.
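
    The distance test described above can be sketched as follows, comparing word-pair separation under several p-norms; the tiny 3-dimensional vectors are illustrative stand-ins for real GloVe/Word2Vec embeddings, which typically have 50-300 dimensions.

```python
# Minimal sketch: L_p distances between toy word vectors.
import math

def minkowski(u, v, p):
    """L_p distance between two equal-length vectors (p may be math.inf)."""
    if p == math.inf:
        return max(abs(a - b) for a, b in zip(u, v))
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

hot  = [0.9, 0.1, 0.3]   # toy embedding
cold = [0.8, 0.2, 0.2]   # antonym: often *close* in embedding space
sofa = [0.1, 0.9, 0.8]   # unrelated word

for p in (1, 2, math.inf):
    print(f"p={p}: related={minkowski(hot, cold, p):.3f} "
          f"unrelated={minkowski(hot, sofa, p):.3f}")
```

    Note the thesis's point: antonyms, like synonyms, tend to sit close together, so distance alone separates related from unrelated pairs better than synonyms from antonyms.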

    Sharing Semantic Resources

    The Semantic Web is an extension of the current Web in which information, so far created for human consumption, becomes machine readable, “enabling computers and people to work in cooperation”. To turn this vision into reality, several challenges remain open, the most important of which is sharing meaning formally represented with ontologies or, more generally, with semantic resources. This long-term goal of the Semantic Web converges in many respects with activities in the field of Human Language Technology, in particular the development of Natural Language Processing applications, where there is a great need for multilingual lexical resources. For instance, WordNet, one of the most important lexical resources, is also commonly regarded and used as an ontology. Another important phenomenon today is the explosion of social collaboration: Wikipedia, the largest encyclopedia in the world, is an object of research as an up-to-date, all-encompassing semantic resource. The main topic of this thesis is the collaborative management and exploitation of semantic resources, reusing already available resources such as Wikipedia and WordNet. This work presents a general environment able to turn the vision of shared and distributed semantic resources into reality, and describes a distributed three-layer architecture enabling rapid prototyping of cooperative applications for developing semantic resources.

    01 Text Processing 1 - Data Mining - Ingegneria e Scienze Informatiche, Cesena

    Structured, semi-structured and unstructured data; information retrieval and text mining; document representation; Boolean retrieval models; the document indexing process; tokenization, normalization, lemmatization; stemming algorithms; searching with indexes; other search optimizations.
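
    The indexing steps listed above can be sketched compactly: tokenization, normalization, a crude suffix-stripping stemmer (a toy stand-in for a real algorithm such as Porter's), and an inverted index for Boolean retrieval. All names here are illustrative.

```python
# Minimal indexing pipeline sketch: tokenize -> stem -> inverted index.
import re
from collections import defaultdict

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())  # normalize case + split

def stem(token):
    # Toy stemmer: strip a few common suffixes from long-enough tokens.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def build_index(docs):
    index = defaultdict(set)  # term -> set of doc ids containing it
    for doc_id, text in docs.items():
        for tok in tokenize(text):
            index[stem(tok)].add(doc_id)
    return index

docs = {1: "Indexing documents",
        2: "Ranked retrieval of documents",
        3: "Boolean models"}
index = build_index(docs)
print(sorted(index["document"]))  # -> [1, 2]
```

    A Boolean AND query is then just a set intersection over the postings of the stemmed query terms.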

    SEARCH RESULT SYNONYMY INDEXING FOR SOCIAL NETWORK USING LATENT SEMANTIC ANALYSIS

    Information retrieval (IR) has shaped the way people acquire information on the Internet, most visibly through search features. Although semantic search is increasingly popular, not every website has adopted the technology, often for reasons of cost. Social networks in particular are known for their abstract and inconsistent semantic data. This work uses term frequency vectors of Wikipedia content to gather the most frequent terms within a corpus as synonym data. After an overview of how traditional search engine mechanisms work and how synonyms of a word are associated with its meaning, the approach broadens the data retrieval index by collecting same-meaning queries from the search-data registry of a social network, abstracted via a Latent Semantic Analysis synset. Conceptual relationships within a set of queries can be specified by a taxonomy or derived statistically from related words.
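
    The LSA step can be sketched as follows: terms that co-occur across the same documents end up close in the reduced space, so near-synonyms can be grouped into synset-like clusters. The tiny term-document counts below are illustrative, not real Wikipedia frequencies.

```python
# Hedged LSA sketch: truncated SVD of a term-document matrix, then cosine
# similarity between term vectors in the latent space.
import numpy as np

terms = ["car", "automobile", "banana"]
# rows: terms, columns: 4 documents (raw term frequencies)
A = np.array([
    [3, 2, 0, 0],   # car
    [2, 3, 0, 0],   # automobile
    [0, 0, 4, 3],   # banana
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                               # keep the top-k latent dimensions
term_vecs = U[:, :k] * s[:k]        # term coordinates in latent space

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cos(term_vecs[0], term_vecs[1]))  # car vs automobile: near 1.0
print(cos(term_vecs[0], term_vecs[2]))  # car vs banana: near 0.0
```

    Thresholding these latent-space similarities is one simple way to harvest synonym candidates for the search index.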

    Ontologies Supporting Intelligent Agent-Based Assistance

    Intelligent agent-based assistants are computer systems that try to simplify people's work. Recent research on intelligent assistance has produced significant results in several different settings. Building such a system is a difficult task that requires expertise in numerous artificial intelligence and engineering disciplines. A key point in this kind of system is knowledge handling. The use of ontologies for representing domain knowledge and for supporting reasoning is becoming widespread in many areas, including intelligent assistance. In this paper we present how ontologies can support intelligent assistance in a multi-agent system context. We show how ontologies may be spread over the multi-agent system architecture, highlighting their role in controlling user interaction and in service description. We present in detail an ontology-based conversational interface for personal assistants, showing how to design an ontology for semantic interpretation and how the interpretation process uses it for semantic analysis. We also show how ontologies are used to describe decentralized services in a multi-agent architecture.

    Summarizing Text Using Lexical Chains

    Automatic text summarization plays an important role in information retrieval and text classification, and offers a practical answer to the information overload problem. Text summarization is the process of reducing the size of a text while preserving its information content. Considering the size and number of documents available on the Internet and from other sources, the need for a highly efficient tool that produces usable summaries is clear. We present an improved algorithm based on lexical chain computation, one that makes lexical chains computationally feasible. Using these lexical chains, summaries can be generated that are more effective than those of available solutions and closer to human-generated summaries.
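
    Lexical chain construction can be sketched with a greedy pass, assuming a toy relatedness lexicon in place of WordNet: each candidate word joins the first chain that already contains a related word, otherwise it starts a new chain. Sentences covered by the strongest chain would then form the summary.

```python
# Hypothetical sketch of greedy lexical chain building (toy lexicon).
RELATED = {
    ("car", "vehicle"), ("vehicle", "truck"), ("car", "truck"),
    ("apple", "fruit"),
}

def related(a, b):
    return a == b or (a, b) in RELATED or (b, a) in RELATED

def build_chains(words):
    chains = []
    for w in words:
        for chain in chains:
            if any(related(w, c) for c in chain):
                chain.append(w)   # join the first compatible chain
                break
        else:
            chains.append([w])    # no compatible chain: start a new one
    return chains

chains = build_chains(["car", "apple", "vehicle", "truck", "fruit"])
print(chains)  # [['car', 'vehicle', 'truck'], ['apple', 'fruit']]
```

    Real systems score chains (by length, homogeneity, relation strength) and handle word-sense ambiguity, which this sketch deliberately omits.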

    SemPCA-Summarizer: Exploiting Semantic Principal Component Analysis for Automatic Summary Generation

    Text summarization is the task of condensing a document while keeping its relevant information. Integrated into wider information systems, it can help users access key information without having to read everything, allowing for higher efficiency. In this research work, we have developed and evaluated a single-document extractive summarization approach, named SemPCA-Summarizer, which reduces the dimension of a document using Principal Component Analysis enriched with semantic information. A concept-sentence matrix is built from the textual input document; PCA is then used to identify and rank the relevant concepts, which in turn drive the selection of the most important sentences through different heuristics, leading to various types of summaries. The results show that the generated summaries are very competitive, both quantitatively and qualitatively, indicating that our approach is appropriate for briefly providing key information, and thus helps cope with the huge amount of information available in a quicker, more efficient manner.
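
    A loose sketch of the concept-sentence idea, under stated assumptions: sentences are columns of a concept occurrence matrix, an SVD of that raw matrix (uncentered, LSA-style, rather than SemPCA-Summarizer's actual pipeline) yields a principal concept direction, and sentences are scored by the magnitude of their projection on it. The matrix values are invented for illustration.

```python
# Sketch: rank sentences by projection onto the first singular direction
# of a concept-sentence matrix.
import numpy as np

# rows: concepts, columns: sentences (concept occurrence counts)
M = np.array([
    [3, 1, 0],   # concept "summarization"
    [2, 2, 0],   # concept "evaluation"
    [0, 0, 1],   # concept "weather" (off-topic)
], dtype=float)

U, s, Vt = np.linalg.svd(M, full_matrices=False)
scores = np.abs(Vt[0])              # |projection| on the first component
ranking = np.argsort(scores)[::-1]  # most important sentence first
print(ranking)                      # -> [0 1 2] (sentence 0 first)
```

    Heuristics such as position or length bonuses would then be applied on top of these raw scores to produce different summary types.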

    Provenance in Open Data Entity-Centric Aggregation

    An increasing number of web services these days require combining data from several providers into an aggregated database, usually based on the linked data approach. The entity-centric model, on the other hand, is a promising data model that outperforms the linked data approach because it solves the lack of explicit semantics and the semantic heterogeneity problems. However, current open data, available on the web as raw datasets, cannot be used in the entity-centric model before being processed by an import process that extracts the data elements and inserts them correctly into the aggregated entity-centric database. It is essential to certify the quality of these imported data elements, especially the background knowledge part, which acts as input to semantic computations, because its quality directly affects the quality of the web services built on top of it. Furthermore, aggregating entities and their attribute values from different sources raises three problems: the need to trace the source of each element, the need to trace the links between entities that can be considered equivalent, and the need to handle possible conflicts between values imported from different data sources. In this thesis, we introduce a new model to certify the quality of a background knowledge base that separates linguistic and language-independent elements. We also present a pipeline to import entities from open data repositories, adding the missing implicit semantics and eliminating the semantic heterogeneity. Finally, we show how to trace the source of attribute values coming from different data providers, how to choose a strategy for handling possible conflicts between these values, and how to keep the links between identical entities that represent the same real-world entity.
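
    The three aggregation problems above can be sketched as one small structure: every imported value keeps its source, equivalence links are stored alongside, and a pluggable strategy resolves conflicts between providers. The names and the trust-ranking strategy are illustrative assumptions, not the thesis's actual model.

```python
# Hypothetical provenance-aware entity: values carry their source,
# conflicts are resolved by a (assumed) source-trust ranking.
TRUST = {"census.gov": 2, "wiki": 1}  # higher = more trusted (illustrative)

class Entity:
    def __init__(self, entity_id):
        self.entity_id = entity_id
        self.values = {}        # attribute -> [(value, source), ...]
        self.same_as = set()    # links to equivalent entities elsewhere

    def add(self, attribute, value, source):
        self.values.setdefault(attribute, []).append((value, source))

    def resolve(self, attribute):
        """Pick the value coming from the most trusted source."""
        candidates = self.values.get(attribute, [])
        return max(candidates, key=lambda vs: TRUST.get(vs[1], 0))

e = Entity("geo:trento")
e.add("population", 118142, "census.gov")
e.add("population", 117000, "wiki")
e.same_as.add("dbpedia:Trento")
print(e.resolve("population"))  # (118142, 'census.gov')
```

    Because the raw (value, source) pairs are retained even after resolution, a different conflict strategy (e.g. most recent, majority vote) can be swapped in later without re-importing the data.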