452 research outputs found

    Building a semantic search engine with games and crowdsourcing

    Semantic search engines aim to improve conventional search with semantic information, or meta-data, about the data searched for and/or about the searchers. So far, approaches to semantic search exploit characteristics of the searchers, such as age, education, or spoken language, for selecting and/or ranking search results. Such data make it possible to build a semantic search engine as an extension of a conventional search engine. The crawlers of well-established search engines like Google, Yahoo! or Bing can index documents but, so far, their ability to recognize the intentions of searchers remains rather limited. Indeed, taking into account characteristics of the searchers considerably extends both the quantity of data to analyse and the dimensionality of the search problem. Well-established search engines therefore still focus on general search, that is, "search for all", not on specialized search, that is, "search for a few". This thesis reports on techniques that have been adapted or conceived, deployed, and tested for building a semantic search engine for the very specific context of artworks. In contrast to, for example, the interpretation of X-ray images, the interpretation of artworks is far from being fully automatable. Therefore, artwork interpretation has been based on Human Computation, that is, a software-based gathering of contributions by many humans. The approach reported in this thesis first relies on so-called Games With A Purpose, or GWAPs, for this gathering: casual games provide an incentive for a potentially unlimited community of humans to contribute their appreciations of artworks. Designing suitable incentives is less trivial than it might seem at first. An ecosystem of games is needed to collect the intended meta-data on artworks: one game generates the data that can serve as input to another game. This results in semantically rich meta-data that can be used for building a successful semantic search engine. 
Thus, a first part of this thesis reports on a "game ecosystem" specifically designed from one known game and including several novel games belonging to the following game classes: (1) Description Games for collecting obvious and trivial meta-data, essentially the well-known ESP (extrasensory perception) game of Luis von Ahn, (2) the Dissemination Game Eligo, which generates translations, (3) the Diversification Game Karido, which aims at sharpening differences between the objects interpreted, that is, the artworks, and (4) the Integration Games Combino, Sentiment and TagATag, which generate structured meta-data. Secondly, the approach to building a semantic search engine reported in this thesis relies on Higher-Order Singular Value Decomposition (SVD). More precisely, the data and meta-data on artworks gathered with the aforementioned GWAPs are collected in a tensor, that is, a mathematical structure generalising matrices to more than two dimensions (columns and rows). The dimensions considered are the artwork descriptions, the players, and the artworks themselves. A Higher-Order SVD of this tensor is first used for noise reduction, following the method of so-called Latent Semantic Analysis (LSA). This thesis also reports on deploying a Higher-Order LSA. The parallel Higher-Order SVD algorithm applied for the Higher-Order LSA and its implementation have been validated on an application related to, but independent from, the semantic search engine for artworks striven for: image compression. This thesis reports on the surprisingly good image compression that can be achieved with Higher-Order SVD. While compression methods based on matrix SVD work on each color separately, the approach reported in this thesis relies on one single (higher-order) SVD of the whole tensor. This results both in better quality of the compressed image and in a significant reduction of the memory space needed. Higher-Order SVD is extremely time-consuming, which calls for parallel computation. 
Thus, a step towards automating the construction of a semantic search engine for artworks was parallelizing the higher-order SVD method used and running the resulting parallel algorithm on a super-computer. This thesis reports on using Hestenes' method and R-SVD for parallelising the higher-order SVD. This method is an unconventional choice, which is explained and motivated. As for the super-computer needed, this thesis reports on turning the web browsers of the players or searchers into a distributed parallel computer. This is done by a novel specific system and a novel implementation of the MapReduce data framework for data parallelism. Harnessing the web browsers of the players or searchers saves computational power on the server side. It also scales extremely well with the number of players or searchers because both playing with and searching for artworks require human reflection and therefore result in idle local processors that can be brought together into a distributed super-computer.
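
The abstract above names Hestenes' method as one building block for the parallel higher-order SVD. As a rough illustration only, the following sketch shows the sequential core of a one-sided Jacobi (Hestenes-style) iteration on an ordinary matrix: pairs of columns are rotated until they are mutually orthogonal, and the column norms are then the singular values. The function name `hestenes_singular_values` and its parameters are assumptions made for this example; the thesis's parallel, higher-order variant and its combination with R-SVD are not reproduced here.

```python
import math


def hestenes_singular_values(matrix, max_sweeps=30, tol=1e-12):
    """Singular values of a dense matrix (list of rows) via one-sided Jacobi rotations.

    Illustrative sketch only: sequential, matrix case, singular values only.
    """
    m, n = len(matrix), len(matrix[0])
    # Store the matrix column-wise, because the rotations act on pairs of columns.
    cols = [[matrix[i][j] for i in range(m)] for j in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])
                beta = sum(x * x for x in cols[q])
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                # Skip pairs that are already (numerically) orthogonal.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Rotation angle that makes the two columns orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for i in range(m):
                    xp, xq = cols[p][i], cols[q][i]
                    cols[p][i] = c * xp - s * xq
                    cols[q][i] = s * xp + c * xq
        if not rotated:
            break
    # After convergence the columns are orthogonal; their norms are the singular values.
    return sorted((math.sqrt(sum(x * x for x in col)) for col in cols), reverse=True)


singular_values = hestenes_singular_values([[1.0, 1.0], [0.0, 1.0]])
```

Rotations of disjoint column pairs are independent of one another, which is the usual reason this family of methods is considered for parallel execution. A higher-order SVD can then be obtained by applying a matrix SVD to each mode unfolding of the tensor.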

    Behaviour Profiling using Wearable Sensors for Pervasive Healthcare

    In recent years, sensor technology has advanced in terms of hardware sophistication and miniaturisation. This has led to the incorporation of unobtrusive, low-power sensors into networks centred on human participants, called Body Sensor Networks. Amongst the most important applications of these networks is their use in healthcare and healthy living. The technology has the potential to reduce the burden on healthcare systems by providing care at home, enabling early detection of symptoms, monitoring recovery remotely, and avoiding serious chronic illnesses by promoting healthy living through objective feedback. In this thesis, machine learning and data mining techniques are developed to estimate medically relevant parameters from a participant's activity and behaviour parameters, derived from simple, body-worn sensors. The first abstraction from raw sensor data is the recognition and analysis of activity. Machine learning analysis is applied to a study of activity profiling to detect impaired limb and torso mobility. One of the advances this thesis makes to activity recognition research is the application of machine learning to the analysis of 'transitional activities': transient activity that occurs as people change their activity. A framework is proposed for the detection and analysis of transitional activities. To demonstrate the utility of transition analysis, we apply the algorithms to a study of participants undergoing and recovering from surgery. We demonstrate that it is possible to see meaningful changes in the transitional activity as the participants recover. Assuming long-term monitoring, we expect a large historical database of activity to accumulate quickly. We develop algorithms to mine temporal associations in activity patterns. This gives an outline of the user's routine. Methods for visual and quantitative analysis of routine using this summary data structure are proposed and validated. 
The activity and routine mining methodologies developed for specialised sensors are adapted to a smartphone application, enabling large-scale use. Validation of the algorithms is performed using datasets collected in laboratory settings and in free-living scenarios. Finally, future research directions and potential improvements to the techniques developed in this thesis are outlined.

    Master/worker parallel discrete event simulation

    The execution of parallel discrete event simulation across metacomputing infrastructures is examined. A master/worker architecture for parallel discrete event simulation is proposed that provides robust execution under a dynamic set of services, with system-level support for fault tolerance, semi-automated client-directed load balancing, portability across heterogeneous machines, and the ability to run codes on idle or time-sharing clients without significant interaction by users. Research questions and challenges concerning the limitations of the work distribution paradigm, the targeted computational domain, performance metrics, and the intended class of applications to be used in this context are analyzed and discussed. A portable web services approach to master/worker parallel discrete event simulation is proposed and evaluated, with subsequent optimizations to increase the efficiency of large-scale simulation execution through distributed master service design and intrinsic overhead reduction. New techniques are proposed and examined for addressing the challenges of optimistic parallel discrete event simulation across metacomputing infrastructures, such as rollbacks and message unsending, using an inherently different computation paradigm based on master services and time windows. Results indicate that a master/worker approach utilizing loosely coupled resources is a viable means for high-throughput parallel discrete event simulation by enhancing existing computational capacity or providing alternate execution capability for less time-critical codes.
    Ph.D. Committee Chair: Fujimoto, Richard; Committee Member: Bader, David; Committee Member: Perumalla, Kalyan; Committee Member: Riley, George; Committee Member: Vuduc, Richar

    Time Series Management Systems: A 2022 Survey

    A Risk-Based Approach in Rehabilitation of Water Distribution Networks

    A risk-based approach to support water utilities in defining pipe rehabilitation priorities is presented. In the risk analysis step of the risk management process, the probability that a given event will happen and the consequences if it does happen have to be estimated and combined. In quantitative risk analysis, numerical values are assigned to both consequence and probability. In this study, the risk event addressed was the inability to supply water due to pipe breaks. Therefore, on the probability side, the probability of pipes breaking was assessed, and on the consequence side, the reduced ability to satisfy the water demand (hydraulic reliability) due to pipe breakage was computed. Random Forest analysis was implemented for the probability side, while the Asset Vulnerability Analysis Toolkit was used to analyse the network's hydraulic reliability. Pipes could then be ranked based on the corresponding risk magnitude, thereby feeding a risk evaluation step; at this step, decisions are made concerning which risks need treatment, and also concerning the treatment priorities, i.e., rehabilitation priorities. The water distribution network of Trondheim, Norway, was used as a case study area, and this study illustrates how the developed method aids the development of a risk-based rehabilitation plan.
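
The ranking step described above can be sketched in a few lines. The abstract states only that probability and consequence are "estimated and combined"; the product used below is a common convention and is an assumption here, as are the `Pipe` fields, the pipe identifiers, and the numbers, which are invented for illustration and are not results from the Trondheim case study.

```python
from dataclasses import dataclass


@dataclass
class Pipe:
    pipe_id: str
    break_probability: float  # e.g. the output of a break-prediction model
    demand_shortfall: float   # consequence: fraction of demand not met if the pipe breaks


def risk(pipe):
    # Assumption: risk is combined as probability times consequence.
    return pipe.break_probability * pipe.demand_shortfall


def rank_by_risk(pipes):
    """Return pipes ordered from highest to lowest risk magnitude."""
    return sorted(pipes, key=risk, reverse=True)


# Hypothetical network with made-up values.
network = [
    Pipe("A", break_probability=0.10, demand_shortfall=0.50),
    Pipe("B", break_probability=0.40, demand_shortfall=0.20),
    Pipe("C", break_probability=0.05, demand_shortfall=0.30),
]
priorities = [pipe.pipe_id for pipe in rank_by_risk(network)]
```

Pipe B ranks first even though pipe A has the larger consequence, which is the point of combining both sides rather than ranking on either one alone.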

    Condition Monitoring of Machine Tool Ball Screw Feed Drives Through Signal Analysis and Artificial Intelligence

    This thesis is set in the context of the large volume of work directed to improving the overall equipment effectiveness (OEE) of manufacturing machines. Of the three OEE factors, performance receives much research attention since it provides simple metrics of parts per hour produced. However, availability and quality, which are the other two factors, can play an equally important role. In the past, high availability has been achieved by time-based or preventive maintenance techniques, which can be expensive and wasteful due to needless repairs and replacement of useful parts. This research aims to develop a cost-effective strategy for machine tool maintenance that improves availability and accuracy by adopting a condition-based or predictive maintenance approach. The approaches under investigation use both machine learning and deep learning techniques to analyse continuous time-series signals to assess a machine tool's condition. For this research, the focus is on applying the techniques to the ball screw assembly of axis feed drives. This is one of the most common machine tool parts, and its degradation can affect the machine's availability and positional accuracy. The research data is obtained from experiments on a gantry-type machine tool with two ball screws, where one is good and the other is worn. In the machine learning approach, wavelet and fast Fourier transforms are employed for data processing on time-series vibration readings before extracting useful features for model training. These extracted features consistently show better accuracy across several machine learning algorithms than those obtained via classical methods. Deep learning is then investigated as an alternative method of analysing time-series data. The chosen approach utilises a pre-trained deep learning neural network based on convolution, which had been successfully used to learn from image files. 
The novelty in this research arises from the use of convolution-based deep learning on time-series data, which is achieved by converting the vibration signals to image files. The method of converting time-series data streams to images relevant for this analysis has been established and verified. Test results show that the wavelet and fast Fourier transform (FFT) features used in the machine learning approach can outperform the statistical features in classifying the condition of the ball screw, with at least 98 % accuracy across the examined machine learning algorithms, compared to a range of 87 % (support vector machine) to 96 % (k-nearest neighbour) for the statistical features. On the other hand, the deep learning technique can achieve at least 98 % or 100 % accuracy when trained with raw and processed data, respectively. The deep learning approach has the advantage of requiring less data processing and achieving better accuracy than the machine learning approach. This research project will contribute to the manufacturing industry by improving the overall equipment effectiveness at a low cost. Furthermore, it can lead to real-time online condition monitoring with less overhead since there is no need for a data processing stage. The natural progression of this research would be to apply this approach to other parts of a machine tool or equipment. Furthermore, investigating and identifying specific faults and their progression would lead to a more sophisticated system for widespread deployment.
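
To make the frequency-domain feature extraction step more concrete, here is a minimal sketch that derives a few spectral features from a vibration-like signal. The abstract does not list the features actually used, so `dominant_hz`, `peak_magnitude` and `spectral_energy` are illustrative assumptions, as are the sample rate and the synthetic 50 Hz test signal. A naive DFT is used to keep the example dependency-free; a real pipeline would use an FFT routine.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Normalised one-sided DFT magnitudes, computed naively in O(n^2)."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
        spectrum.append(abs(coeff) / n)
    return spectrum


def spectral_features(signal, sample_rate):
    """Hypothetical feature set: dominant frequency, its magnitude, and total spectral energy."""
    mags = magnitude_spectrum(signal)
    n = len(signal)
    # Ignore bin 0 (the constant offset) when looking for the dominant frequency.
    peak_bin = max(range(1, len(mags)), key=lambda k: mags[k])
    return {
        "dominant_hz": peak_bin * sample_rate / n,
        "peak_magnitude": mags[peak_bin],
        "spectral_energy": sum(m * m for m in mags[1:]),
    }


# Synthetic stand-in for a vibration reading: a pure 50 Hz tone sampled at 400 Hz.
sample_rate = 400.0
signal = [math.sin(2 * math.pi * 50.0 * i / sample_rate) for i in range(64)]
features = spectral_features(signal, sample_rate)
```

A feature dictionary of this kind, computed per signal window, is the sort of input a classifier such as a support vector machine or k-nearest neighbour model would be trained on.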

    Context & Semantics in News & Web Search

    Place Recognition by Per-Location Classifiers

    Place recognition is formulated as the task of finding the location where the query image was captured. This is an important task that has many practical applications in robotics, autonomous driving, augmented reality, 3D reconstruction, and systems that organize imagery in a geographically structured manner. Place recognition is typically done by finding a reference image in a large structured geo-referenced database. In this work, we first address the problem of building a geo-referenced dataset for place recognition. We describe a framework for building the dataset from the street-side imagery of Google Street View, which provides panoramic views from positions along many streets, cities and rural areas worldwide. Besides downloading the panoramic views and transforming them into a set of perspective images, the framework can retrieve the underlying scene depth information. Second, we aim at localizing a query photograph by finding other images depicting the same place in a large geotagged image database. This is a challenging task due to changes in viewpoint, imaging conditions and the large size of the image database. The contribution of this work is twofold: (i) we cast the place recognition problem as a classification task and use the available geotags to train a classifier for each location in the database in a similar manner to per-exemplar SVMs in object recognition, and (ii) as only a few positive training examples are available for each location, we propose two methods to calibrate all the per-location SVM classifiers without the need for additional positive training data. The first method relies on p-values from statistical hypothesis testing and uses only the available negative training data. The second method performs an affine calibration by appropriately normalizing the learned classifier hyperplane and does not need any additional labeled training data. 
We test the proposed place recognition method with the bag-of-visual-words and Fisher vector image representations suitable for large-scale indexing. Experiments are performed on three datasets: 25,000 and 55,000 geotagged street view images of Pittsburgh, and the 24/7 Tokyo benchmark containing 76,000 images with varying illumination conditions. The results show improved place recognition accuracy of the learned image representation over direct matching of raw image descriptors.
    Department of Cybernetics
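
The first calibration idea, p-values computed from negative data only, can be illustrated as follows. This is a sketch of the general principle rather than the authors' exact procedure: each location's raw classifier score is replaced by the fraction of that location's negative training scores that are at least as large, which makes scores from different classifiers comparable. The function name, the add-one smoothing, and all scores below are assumptions made for the example.

```python
from bisect import bisect_left


def make_p_value_calibrator(negative_scores):
    """Build a function that maps a raw score to an empirical p-value under the negatives."""
    ordered = sorted(negative_scores)
    n = len(ordered)

    def p_value(score):
        # Count the negatives that score at least as high as the query.
        at_least = n - bisect_left(ordered, score)
        # Add-one smoothing (an assumption) keeps the p-value strictly positive.
        return (at_least + 1) / (n + 1)

    return p_value


# Hypothetical per-location classifiers whose raw score ranges differ widely.
calibrators = {
    "A": make_p_value_calibrator([-2.0, -1.0, 0.0, 1.0]),
    "B": make_p_value_calibrator([5.0, 6.0, 7.0, 8.0]),
}
raw_scores = {"A": 0.5, "B": 6.5}
p_values = {loc: calibrators[loc](score) for loc, score in raw_scores.items()}
# The smallest p-value marks the location whose score is least typical of a negative.
best_location = min(p_values, key=p_values.get)
```

Location B has the higher raw score, but relative to its own negatives that score is unremarkable, so the calibrated comparison selects location A instead.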