
    Magpie: towards a semantic web browser

    Web browsing involves two tasks: finding the right web page and then making sense of its content. So far, research has focused on supporting the task of finding web resources through ‘standard’ information retrieval mechanisms, or semantics-enhanced search. Much less attention has been paid to the second problem. In this paper we describe Magpie, a tool which supports the interpretation of web pages. Magpie offers complementary knowledge sources, which a reader can call upon to quickly gain access to any background knowledge relevant to a web resource. Magpie automatically associates an ontology-based semantic layer to web resources, allowing relevant services to be invoked within a standard web browser. Hence, Magpie may be seen as a step towards a semantic web browser. The functionality of Magpie is illustrated using examples of how it has been integrated with our lab’s web resources.
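
    As a rough illustration of what associating a semantic layer with page text could look like, the sketch below matches text against a small term-to-class lexicon. It is not Magpie's implementation: the lexicon entries, the class names, and the `annotate` function are all hypothetical, and a simple longest-match-first strategy is assumed.

```python
import re

# Hypothetical lexicon derived from an ontology: surface term -> ontology class.
# Entries and class names are illustrative assumptions, not taken from Magpie.
ONTOLOGY_LEXICON = {
    "magpie": "Project",
    "semantic web": "Research-Area",
}


def annotate(text, lexicon):
    """Return (start, end, matched_text, ontology_class) tuples in text order.

    Longer terms are matched first, and a span that overlaps an earlier
    match is skipped, so each character belongs to at most one annotation.
    """
    taken = [False] * len(text)
    annotations = []
    for term in sorted(lexicon, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            start, end = match.start(), match.end()
            if any(taken[start:end]):
                continue  # overlaps a longer term that was already annotated
            for i in range(start, end):
                taken[i] = True
            annotations.append((start, end, match.group(0), lexicon[term]))
    return sorted(annotations)


if __name__ == "__main__":
    sample = "Magpie is a step towards a semantic web browser."
    for start, end, surface, cls in annotate(sample, ONTOLOGY_LEXICON):
        print(f"{start}-{end}: {surface!r} -> {cls}")
```

    In a browser setting, each annotated span would presumably be the hook for invoking services relevant to its ontology class; that wiring is outside the scope of this sketch.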

    Advanced Knowledge Technologies at the Midterm: Tools and Methods for the Semantic Web

    The University of Edinburgh and research sponsors are authorised to reproduce and distribute reprints and on-line copies for their purposes notwithstanding any copyright annotation hereon. The views and conclusions contained herein are the author’s and shouldn’t be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of other parties. In a celebrated essay on the new electronic media, Marshall McLuhan wrote in 1962: Our private senses are not closed systems but are endlessly translated into each other in that experience which we call consciousness. Our extended senses, tools, technologies, through the ages, have been closed systems incapable of interplay or collective awareness. Now, in the electric age, the very instantaneous nature of co-existence among our technological instruments has created a crisis quite new in human history. Our extended faculties and senses now constitute a single field of experience which demands that they become collectively conscious. Our technologies, like our private senses, now demand an interplay and ratio that makes rational co-existence possible. As long as our technologies were as slow as the wheel or the alphabet or money, the fact that they were separate, closed systems was socially and psychically supportable. This is not true now when sight and sound and movement are simultaneous and global in extent. (McLuhan 1962, p.5, emphasis in original) Over forty years later, the seamless interplay that McLuhan demanded between our technologies is still barely visible. McLuhan’s predictions of the spread, and increased importance, of electronic media have of course been borne out, and the worlds of business, science and knowledge storage and transfer have been revolutionised.
Yet the integration of electronic systems as open systems remains in its infancy. Advanced Knowledge Technologies (AKT) aims to address this problem, to create a view of knowledge and its management across its lifecycle, to research and create the services and technologies that such unification will require. Halfway through its six-year span, the results are beginning to come through, and this paper will explore some of the services, technologies and methodologies that have been developed. We hope to give a sense in this paper of the potential for the next three years, to discuss the insights and lessons learnt in the first phase of the project, to articulate the challenges and issues that remain. The WWW provided the original context that made the AKT approach to knowledge management (KM) possible. AKT was initially proposed in 1999; it brought together an interdisciplinary consortium with the technological breadth and complementarity to create the conditions for a unified approach to knowledge across its lifecycle. The combination of this expertise, and the time and space afforded the consortium by the IRC structure, suggested the opportunity for a concerted effort to develop an approach to advanced knowledge technologies, based on the WWW as a basic infrastructure. The technological context of AKT altered for the better in the short period between the development of the proposal and the beginning of the project itself with the development of the semantic web (SW), which foresaw much more intelligent manipulation and querying of knowledge. The opportunities that the SW provided for, e.g., more intelligent retrieval put AKT in the centre of information technology innovation and knowledge management services; the AKT skill set would clearly be central for the exploitation of those opportunities. The SW, as an extension of the WWW, provides an interesting set of constraints to the knowledge management services AKT tries to provide.
As a medium for the semantically-informed coordination of information, it has suggested a number of ways in which the objectives of AKT can be achieved, most obviously through the provision of knowledge management services delivered over the web as opposed to the creation and provision of technologies to manage knowledge. AKT is working on the assumption that many web services will be developed and provided for users. The KM problem in the near future will be one of deciding which services are needed and of coordinating them. Many of these services will be largely or entirely legacies of the WWW, and so the capabilities of the services will vary. As well as providing useful KM services in their own right, AKT will be aiming to exploit this opportunity, by reasoning over services, brokering between them, and providing essential meta-services for SW knowledge service management. Ontologies will be a crucial tool for the SW. The AKT consortium brings a lot of expertise on ontologies together, and ontologies were always going to be a key part of the strategy. All kinds of knowledge sharing and transfer activities will be mediated by ontologies, and ontology management will be an important enabling task. Different applications will need to cope with inconsistent ontologies, or with the problems that will follow the automatic creation of ontologies (e.g. merging of pre-existing ontologies to create a third). Ontology mapping, and the elimination of conflicts of reference, will be important tasks. All of these issues are discussed along with our proposed technologies. Similarly, specifications of tasks will be used for the deployment of knowledge services over the SW, but in general it cannot be expected that in the medium term there will be standards for task (or service) specifications.
The brokering metaservices that are envisaged will have to deal with this heterogeneity. The emerging picture of the SW is one of great opportunity but it will not be a well-ordered, certain or consistent environment. It will comprise many repositories of legacy data, outdated and inconsistent stores, and requirements for common understandings across divergent formalisms. There is clearly a role for standards to play to bring much of this context together; AKT is playing a significant role in these efforts. But standards take time to emerge, they take political power to enforce, and they have been known to stifle innovation (in the short term). AKT is keen to understand the balance between principled inference and statistical processing of web content. Logical inference on the Web is tough. Complex queries using traditional AI inference methods bring most distributed computer systems to their knees. Do we set up semantically well-behaved areas of the Web? Is any part of the Web in which semantic hygiene prevails interesting enough to reason in? These and many other questions need to be addressed if we are to provide effective knowledge technologies for our content on the web.

    Normalizing Resource Identifiers using Lexicons in the Global Change Information System: Linking Earth Science Identifiers, Concepts, and Communities

    Earth Science informatics involves collaboration between multiple groups of people with diverse specializations and goals, often using variations in terminology to refer to common resources. The uniformity of the resource identifiers often does not cross organizational boundaries. Because of this, permanent, widely used, unambiguous identifiers for resources are elusive. We examine real-world cases of changing and inconsistent identifiers which inherently work against persistence and uniformity. We also present a solution which mediates factors in these situations, namely the creation of lexicons: mappings of sets of terms to URIs which are curated within the Global Change Information System (GCIS). We discuss aspects of the GCIS which facilitate the use of lexicons: an information model which disambiguates resources, a RESTful API which provides metadata through content negotiation, and a strategy for long-term curation of URIs, including mechanisms for handling changes to URIs and variations in terms used by different communities while providing persistent URIs and preserving relationships between resources. We provide working definitions of terms, contexts, and lexicons, and relate them to the practical challenges of disambiguation and curation. We also discuss the mechanisms employed and the architecture of the GCIS, and how these choices facilitate representation of persistent identifiers and mappings of them to identifiers used colloquially within various earth science communities of practice.
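
    The idea of a lexicon as a mapping from a community's terms to persistent URIs can be sketched in a few lines. This is an assumption-laden illustration rather than the GCIS data model or API: the community names, terms, `example.org` URIs, and the `normalize` and `resolve` helpers are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lexicon: (context, normalized term) -> persistent URI.
# Context names, terms, and example.org URIs are illustrative assumptions only.
LEXICON: Dict[Tuple[str, str], str] = {
    ("community-a", "landsat 8"): "https://example.org/platform/landsat-8",
    ("community-b", "ldcm"): "https://example.org/platform/landsat-8",
    ("community-b", "terra"): "https://example.org/platform/terra",
}


def normalize(term: str) -> str:
    """Lowercase a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def resolve(context: str, term: str) -> Optional[str]:
    """Look up the URI for a term as used within a given community context.

    Returns None when the lexicon has no entry for that context and term.
    """
    return LEXICON.get((context, normalize(term)))


if __name__ == "__main__":
    # Two communities use different names for the same resource.
    print(resolve("community-a", "Landsat  8"))
    print(resolve("community-b", "LDCM"))
    print(resolve("community-a", "unknown term"))
```

    Keying on the pair of context and term, rather than on the term alone, is what lets two communities use different (or even conflicting) names while still resolving to one persistent identifier.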

    Sentiment analysis in context: Investigating the use of BERT and other techniques for ChatBot improvement

    In an increasingly digitized world, where large amounts of data are generated daily, the efficient analysis of that data has become more and more pressing. Natural Language Processing (NLP) offers a solution by exploiting the power of artificial intelligence to process texts, to understand their content and to perform specific tasks. The thesis is based on an internship at Pat Srl, a company devoted to creating solutions to support digital innovation, process automation, and service quality with the ultimate goal of improving leadership and customer satisfaction. The primary objective of this thesis is to develop a sentiment analysis model in order to improve the customer experience for clients using the ChatBot system created by the company itself. This task has gained significant attention in recent years as it can be applied to different fields, including social media monitoring, market research, brand monitoring or customer experience and feedback analysis. Following a careful analysis of the available data, a comprehensive evaluation of various models was conducted. Notably, BERT, a large language model that has provided promising results in several NLP tasks, stood out among them. Different approaches utilizing the BERT models were explored, such as the fine-tuning modality or the architectural structure. Moreover, some preprocessing steps of the data were emphasized and studied, due to the particular nature of the sentiment analysis task. During the course of the internship, the dataset underwent revisions aimed at mitigating the problem of inaccurate predictions. Additionally, techniques for data balancing were tested and evaluated, enhancing the overall quality of the analysis. Another important aspect of this project involved the deployment of the model. In a business environment, it is essential to carefully consider and balance resources before transitioning to production. The model distribution was carried out using specific tools, such as Docker and Kubernetes.
These specialized technologies played a pivotal role in ensuring efficient and seamless deployment.

    Natural language interfaces to relational databases

    Máster Universitario en Lógica, Computación e Inteligencia Artificial (University Master's Degree in Logic, Computation and Artificial Intelligence)

    Data analytics 2016: proceedings of the fifth international conference on data analytics

    Spoken content retrieval: A survey of techniques and technologies

    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR.

    Bits and bytes of financial regulation

    Since 2008, banks have spent more than €342 billion on settlements, enforcement actions, and fines, and according to Reuters this figure is expected to rise to €400 billion by 2020. As a result, technological solutions were implemented to help Financial Institutions deal with the increasing compliance burden and to help regulators address the constant difficulties of enforcing and monitoring regulatory requirements to limit risks and promote financial stability. This led to the emergence of a whole new movement in the Financial Industry: Regulatory Technology (Regtech). In this dissertation, the aim is to analyze how technology can help Financial Institutions deal with risky behavior and regulatory demands in the most efficient and cost-effective way and to show how extremely complex this process can be, by following the deployment of an electronic communications surveillance tool within a top-tier firm. Electronic communications are crucial parts of investigations such as the subprime mortgage crisis, the London Interbank Offered Rate and the currency market manipulation scandals or the COMEX gold and silver futures markets spoofing scandal. To appropriately address the nature of these threats, holistic risk assessment tools that gather these records (e-mail, chat, voice, trade logs, etc.), discover correlations and provide a credible output that necessitates supervisory review are of extreme importance. The challenge for Front-Office Supervisors is finding the proverbial “needle in a haystack” (the combination of emails, chats, transaction records, voice logs, and other reports) that should be flagged for suspicious activity and reviewed in conjunction with Compliance and Anti-Fraud teams.