Search CORE

1,005 research outputs found

DB&IR Integration: Report on the Dagstuhl Seminar "Ranked XML Querying"

Author: Amer-Yahia S.
Hiemstra Djoerd
Roelleke T.
Srivastava D.
Weikum G.
Publication venue: Dagstuhl
Publication date: 01/01/2008

University of Twente Research Information

Towards Efficient Novel Materials Discovery

Author: Lenz-Himmer Maja-Olivia
Publication venue: Humboldt-Universität zu Berlin
Publication date: 25/01/2022

The discovery of novel materials with specific functional properties is one of the most important goals in materials science. Screening the structural and chemical space for potential new material candidates is often facilitated by high-throughput methods. Fast yet precise computations are a main tool for such screenings and often start with a geometry relaxation to find the nearest low-energy configuration relative to the input structure. In Part I of this work, a new constrained geometry relaxation is presented that maintains the perfect symmetry of a crystal, saves time and resources, and enables relaxations of metastable phases and of systems with local symmetries or distortions. Apart from improving such computations for a quicker screening of the materials space, better use of existing data is another pillar that can accelerate novel materials discovery. While many different databases exist that make computational results accessible, their usability depends largely on how the data is presented. Here we investigate how semantic technologies and graph representations can improve data annotation. A number of ontologies and knowledge graphs are developed that enable the semantic representation of crystal structures, materials properties, and experimental results in the field of heterogeneous catalysis. We discuss how the approach of separating ontologies from knowledge graphs breaks down when new knowledge is created using artificial intelligence, and propose an intermediate information layer as a solution. The underlying ontologies can provide background knowledge for possible autonomous intelligent agents in the future.
We conclude that there is still a long way to go before materials science data can be made understandable to machines, and that the direct usefulness of semantic technologies in materials science is currently very limited.

Dokumenten-Publikationsserver der Humboldt-Universität zu Berlin

MPG.PuRe

Finding Images of Rare and Ambiguous Entities

Author: Kacimi El Hassani M.
Taneva B.
Weikum G.
Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/2011

MPG.PuRe

Intuitionistic fuzzy XML query matching and rewriting

Author: Alzebdi M.
Publication date: 01/01/2013

With the emergence of XML as a standard for data representation, particularly on the web, the need for intelligent query languages that can operate on structurally heterogeneous XML documents has grown considerably. Traditional Information Retrieval and Database approaches have limitations in such scenarios, so fuzzy (flexible) approaches have become predominant. In this thesis, we propose a new approach to approximate XML query matching and rewriting that aims at soft matching of XML queries against XML data sources following different schemas. Unlike traditional querying approaches, which require exact matching, the proposed approach uses Intuitionistic Fuzzy Trees to achieve approximate (soft) query matching. With this approach, not only the exact answer to a query but also approximate answers are retrieved. Furthermore, partial results can be obtained from multiple data sources and merged to produce a single answer to a query. The proposed approach introduces a new tree similarity measure that considers the minimum and maximum degrees of similarity/inclusion of trees, based on arc matching. New techniques for soft node and arc matching are presented for matching queries against data sources with highly varied structures. A prototype was developed to test the proposed ideas; it demonstrated approximate matching of pattern queries against a number of XML schemas and rewrote the original query so that it obtains results from the underlying data sources. This was achieved through several novel algorithms, which in testing proved efficient, with low CPU and memory cost even for a large number of data sources.

WestminsterResearch

Fast Computation Methods for Personalized PageRank on Large Graphs

Author: 박성찬
Publication venue: Seoul National University Graduate School
Publication date: 01/08/2020

Doctoral thesis -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, August 2020. 이상구.
Computation of Personalized PageRank (PPR) in graphs is an important function that is widely used in application domains such as search, recommendation, and knowledge discovery. Because computing PPR is expensive, a good number of innovative and efficient algorithms for it have been developed. However, efficient computation of PPR on very large graphs with millions of nodes or more is still an open problem. Moreover, previously proposed algorithms cannot handle updates efficiently, which severely limits their ability to handle dynamic graphs. In this paper, we present a fast-converging algorithm that guarantees high and controlled precision. We improve the convergence rate of the traditional Power Iteration method by adopting successive over-relaxation and initial guess revision, a vector reuse strategy. The proposed method vastly improves on traditional Power Iteration in convergence rate and computation time while retaining its simplicity and strictness. Since it can reuse previously computed vectors when refreshing PPR vectors, its update performance is also greatly enhanced. Also, since the algorithm halts as soon as it reaches a given error threshold, the trade-off between accuracy and time can be controlled flexibly, a feature lacking in both sampling-based approximation methods and fully exact methods. Experiments show that the proposed algorithm is at least 20 times faster than Power Iteration and outperforms other state-of-the-art algorithms.
Contents:
1 Introduction
2 Preliminaries: Personalized PageRank. 2.1 Random Walk, PageRank, and Personalized PageRank (2.1.1 Basics on Random Walk; 2.1.2 PageRank; 2.1.3 Personalized PageRank); 2.2 Characteristics of Personalized PageRank; 2.3 Applications of Personalized PageRank; 2.4 Previous Work on Personalized PageRank Computation (2.4.1 Basic Algorithms; 2.4.2 Enhanced Power Iteration; 2.4.3 Bookmark Coloring Algorithm; 2.4.4 Dynamic Programming; 2.4.5 Monte-Carlo Sampling; 2.4.6 Enhanced Direct Solving); 2.5 Summary
3 Personalized PageRank Computation with Initial Guess Revision. 3.1 Initial Guess Revision and Relaxation; 3.2 Finding Optimal Weight of Successive Over Relaxation for PPR; 3.3 Initial Guess Construction Algorithm for Personalized PageRank
4 Fully Personalized PageRank Algorithm with Initial Guess Revision. 4.1 FPPR with IGR; 4.2 Optimization; 4.3 Experiments
5 Personalized PageRank Query Processing with Initial Guess Revision. 5.1 PPR Query Processing with IGR; 5.2 Optimization; 5.3 Experiments
6 Conclusion
Bibliography
Appendix
Abstract (In Korean)
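The two ideas the abstract leans on, stopping at an error threshold and reusing a previously computed vector as the starting guess, can be illustrated with plain power iteration. The sketch below is not the thesis's algorithm: it omits successive over-relaxation and the thesis's initial guess revision scheme, and the function name, the `alpha` restart probability, the dangling-node rule, and the toy graph are all assumptions made for illustration.

```python
def personalized_pagerank(out_links, seed, alpha=0.15, tol=1e-10,
                          initial=None, max_iter=1000):
    """Plain power iteration for Personalized PageRank (illustrative only).

    out_links maps every node to the list of nodes it links to.
    Returns (scores, iterations_used).
    """
    nodes = list(out_links)
    # Start from a supplied guess (vector reuse) or from the seed indicator vector.
    if initial is not None:
        x = dict(initial)
    else:
        x = {v: 1.0 if v == seed else 0.0 for v in nodes}
    for iteration in range(1, max_iter + 1):
        nxt = {v: 0.0 for v in nodes}
        nxt[seed] += alpha  # restart mass always returns to the seed
        for u in nodes:
            # Assumption: a node without out-links sends its mass back to the seed.
            targets = out_links[u] or [seed]
            share = (1.0 - alpha) * x[u] / len(targets)
            for v in targets:
                nxt[v] += share
        # L1 change between successive iterates serves as the stopping criterion.
        err = sum(abs(nxt[v] - x[v]) for v in nodes)
        x = nxt
        if err < tol:
            return x, iteration
    return x, max_iter


# Hypothetical toy graph; every link target also appears as a key.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores, cold_iters = personalized_pagerank(graph, "a")
# Restarting from the converged vector needs far fewer iterations than a cold start.
_, warm_iters = personalized_pagerank(graph, "a", initial=scores)
print(cold_iters, warm_iters)
```

Raising `tol` trades accuracy for time, which is the kind of control the abstract describes; how much a warm start helps after a real graph update depends on how far the update moves the solution.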

SNU Open Repository and Archive

A specification and discovery environment for the reuse of software components in distributed software development

Author: Khemakhem Sofien
Publication date: 08/07/2011

Our work aims to develop an effective solution for the discovery and reuse of software components in existing, commonly used development environments. We propose an ontology for describing and discovering atomic software components. The description covers both the functional properties and the non-functional properties of components, the latter expressed as QoS parameters. Our search process is based on a function that calculates the semantic distance between a component's interface signature and the signature of a given query, thus achieving an appropriate comparison. We also use the notion of "subsumption" to compare the input/output of the query with the input/output of the components.
After the appropriate components have been selected, the non-functional properties are used as a distinguishing factor to refine the search result. If no atomic component is found, we propose an approach for discovering composite components, based on the shared ontology. To integrate the resulting component into the project under development, we developed an integration ontology and two services, "input/output convertor" and "output Matching".

Thèses en Ligne

Scientific Publications of the University of Toulouse II Le Mirail

HAL-INSA Toulouse

Thèses en ligne de l'Université Toulouse III - Paul Sabatier

Improving an Open Source Geocoding Server

Author: Garcia Paje Victor
Publication venue: Lunds universitet/Institutionen för elektro- och informationsteknik
Publication date: 01/01/2015

A common problem in geocoding is that postal addresses as requested by the user differ from the addresses as described in the database. The online, open source geocoder Nominatim is one of the most widely used geocoders today. However, it lacks the interactivity that most online geocoders already offer: it provides no feedback while the user is typing an address, and it cannot deal with misspellings in the requested address. This thesis extends the functionality of the Nominatim geocoder to provide fuzzy search and autocomplete features. In this work I propose a new index and search strategy for the OpenStreetMap reference dataset. I also extend the search algorithm to geocode new address types such as street intersections. The original Nominatim geocoder and the proposed solution are compared using metrics such as the precision of the results, the match rate, and the keystrokes saved by the autocomplete feature. The test addresses used in this work are a subset of the Swedish addresses available in the OpenStreetMap dataset. The results show that the proposed geocoder performs better than the original Nominatim geocoder. In the proposed geocoder, users get address suggestions as they type, which adds interactivity to the original geocoder. The proposed geocoder is also able to find the right address in the presence of errors in the user query, with a match rate of 98%.
Demand for geospatial information has been increasing in recent years. More and more mobile applications and services require users to enter information about where they are, or the address of a place they want to find. The systems that convert postal addresses or place descriptions into coordinates are called geocoders.
How good a geocoder is depends not only on the information it contains, but also on how easy it is for users to find the addresses they want. There are many well-known websites that we use in everyday life to find the location of an address; sites like Google Maps, Bing Maps, or Yahoo Maps are accessed by millions of users every day for this purpose. Among the main features of these geocoders are the ability to predict the address the user is writing in the search box and, sometimes, to correct misspellings introduced by the user. To make things more complicated, these predictions and error corrections are performed in real time. The owners of these address search engines usually restrict the number of addresses a user is allowed to search per month, above which the user must pay a fee to keep using the system. This limit is usually high enough for the end user, but it may not be enough for software developers who want to use geospatial data in their products. There is a free alternative to the address search engines mentioned above, called Nominatim. Nominatim is an open source project whose purpose is to search addresses in the OpenStreetMap dataset. OpenStreetMap is a collaborative project that tries to map places in the real world to coordinates. The main drawback of Nominatim is that its usability is not as good as that of its competitors: it is unable to find addresses that are not correctly spelled, nor does it predict what the user needs. For this address search engine to be among the most used, prediction and error correction features need to be added. In this thesis work I extend the search algorithms of Nominatim to add this functionality. The address search engine proposed in this thesis offers a free and open source alternative for users and systems that require unrestricted access to geospatial data.
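To make the combination of autocomplete and misspelling tolerance concrete, here is a minimal sketch of typo-tolerant prefix matching using Levenshtein edit distance. It is not the index or search strategy proposed in the thesis, and it does not use Nominatim's code or API; the function names, the `max_errors` parameter, the linear scan over a small list, and the sample addresses are all assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def suggest(typed, addresses, max_errors=1, limit=5):
    """Return addresses with a prefix within max_errors edits of the typed text."""
    query = typed.lower()
    scored = []
    for address in addresses:
        candidate = address.lower()
        # Try prefixes slightly shorter and longer than the query so that a
        # missing or extra character still counts as a single edit.
        lengths = range(max(0, len(query) - max_errors),
                        len(query) + max_errors + 1)
        best = min(edit_distance(query, candidate[:n]) for n in lengths)
        if best <= max_errors:
            scored.append((best, address))
    scored.sort()  # fewest edits first, then alphabetical
    return [address for _, address in scored[:limit]]


# Hypothetical sample data, not taken from the thesis's test set.
addresses = ["Storgatan 1, Lund", "Stortorget 3, Malmö", "Kungsgatan 12, Stockholm"]
print(suggest("Stor", addresses))      # both addresses starting with "Stor"
print(suggest("Storgtan", addresses))  # tolerates the missing "a"
```

A linear scan like this only suits small lists; a production geocoder would need an index structure to answer such queries as the user types.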