53 research outputs found

    LifeLogging: personal big data

    We have recently observed a convergence of technologies to foster the emergence of lifelogging as a mainstream activity. Computer storage has become significantly cheaper, and advancements in sensing technology allow for the efficient sensing of personal activities, locations and the environment. This is best seen in the growing popularity of the quantified self movement, in which life activities are tracked using wearable sensors in the hope of better understanding human performance in a variety of tasks. This review aims to provide a comprehensive summary of lifelogging, to cover its research history, current technologies, and applications. Thus far, most lifelogging research has focused predominantly on visual lifelogging in order to capture details of life activities, hence we maintain this focus in this review. However, we also reflect on the challenges lifelogging poses to an information retrieval scientist. This review is a suitable reference for those seeking an information retrieval scientist’s perspective on lifelogging and the quantified self.

    A User-driven Annotation Framework for Scientific Data

    Annotations play an increasingly crucial role in scientific exploration and discovery, as the amount of data and the level of collaboration among scientists increase. There are many systems today focusing on annotation management, querying, and propagation. Although all such systems are implemented to take user input (i.e., the annotations themselves), very few systems are user-driven, taking into account user preferences on how annotations should be propagated and applied over data. In this thesis, we propose to treat annotations as first-class citizens for scientific data by introducing a user-driven, view-based annotation framework. Under this framework, we try to resolve two critical questions. Firstly, how do we support annotations that are scalable both from a system point of view and from a user point of view? Secondly, how do we support annotation queries both from an annotator point of view and from a user point of view, in an efficient and accurate way? To address these challenges, we propose the VIew-base annotation Propagation (ViP) framework to empower users to express their preferences over the time semantics of annotations and over the network semantics of annotations, and define three query types for annotations. To efficiently support such novel functionality, ViP utilizes database views and introduces new annotation caching techniques. The use of views also brings a more compact representation of annotations, making our system easier to scale. Through an extensive experimental study on a real system (with both synthetic and real data), we show that the ViP framework can seamlessly introduce user-driven annotation propagation semantics while at the same time significantly improving the performance (in terms of query execution time) over the current state of the art.

    Data Management for Dynamic Multimedia Analytics and Retrieval

    Multimedia data in its various manifestations poses a unique challenge from a data storage and data management perspective, especially if search, analysis and analytics in large data corpora are considered. The inherently unstructured nature of the data itself and the curse of dimensionality that afflicts the representations we typically work with in its stead cause a broad range of issues that require sophisticated solutions at different levels. This has given rise to a huge corpus of research that focuses on techniques that allow for effective and efficient multimedia search and exploration. Many of these contributions have led to an array of purpose-built multimedia search systems. However, recent progress in multimedia analytics and interactive multimedia retrieval has demonstrated that several of the assumptions usually made for such multimedia search workloads do not hold once a session has a human user in the loop. Firstly, many of the required query operations cannot be expressed by mere similarity search and, since the concrete requirement cannot always be anticipated, one needs a flexible and adaptable data management and query framework. Secondly, the widespread notion of staticity of data collections does not hold if one considers analytics workloads, whose purpose is to produce and store new insights and information. And finally, it is impossible even for an expert user to specify exactly how a data management system should produce and arrive at the desired outcomes of the potentially many different queries. Guided by these shortcomings and motivated by the fact that similar questions have once been answered for structured data in classical database research, this thesis presents three contributions that seek to mitigate the aforementioned issues. We present a query model that generalises the notion of proximity-based query operations and formalises the connection between those queries and high-dimensional indexing. We complement this with a cost model that makes the often implicit trade-off between query execution speed and result quality transparent to the system and the user. And we describe a model for the transactional and durable maintenance of high-dimensional index structures. All contributions are implemented in the open-source multimedia database system Cottontail DB, on top of which we present an evaluation that demonstrates the effectiveness of the proposed models. We conclude by discussing avenues for future research in the quest for converging the fields of databases on the one hand and (interactive) multimedia retrieval and analytics on the other.

    Crowdsensed Mobile Data Analytics

    Mobile devices, especially smartphones, are nowadays an essential part of everyday life. They are used worldwide and across all demographic groups, and serve many functions, including but not limited to communication, gaming, social interaction, maps and navigation, leisure, work, and education. With a large on-device sensor base, mobile devices provide a rich source of data. Understanding how these devices are used also helps us increase our knowledge of people's everyday habits, needs, and rituals. Data collection and analysis can thus be utilized in different recommendation and feedback systems that further improve the experience of using smart devices. Crowdsensed computing describes a paradigm where multiple autonomous devices are used together to collect large-scale data. In the case of smartphones, this kind of data can include running and installed applications, different system settings, such as network connection and screen brightness, and various subsystem variables, such as CPU and memory usage. In addition to the autonomous data collection, user questionnaires can be used to provide a wider view of the user community. To understand smartphone usage as a whole, different procedures are needed for cleaning missing and misleading values and preprocessing information from various sets of variables. Analyzing large-scale data sets, which can grow to terabytes in size, requires understanding of different Big Data management tools, distributed computing environments, and efficient algorithms to perform suitable data analysis and machine learning tasks. Together, these procedures and methodologies aim to provide actionable feedback, such as recommendations and visualizations, for the benefit of smartphone users, researchers, and application development. This thesis provides an approach to large-scale crowdsensed mobile analytics. First, this thesis describes procedures for cleaning and preprocessing mobile data collected in real-life conditions, such as current system settings and running applications. It shows how interdependencies between different data items are important to consider when analyzing the smartphone system state as a whole. Second, this thesis provides suitable distributed machine learning and statistical analysis methods for analyzing large-scale mobile data. The algorithms, such as the decision tree-based classification and recommendation system, and the information analysis methods presented in this thesis are implemented in the distributed cloud-computing environment Apache Spark. Third, this thesis provides approaches to generate actionable feedback, such as energy consumption and application recommendations, which can be utilized in the mobile devices themselves or when understanding large crowds of smartphone users. The application areas especially covered in this thesis are smartphone energy consumption analysis in the case of system settings and subsystem variables, a trend-based application recommendation system, and analysis of demographic, geographic, and cultural factors in smartphone usage.
    Smart devices of various kinds, especially smartphones, have become an essential part of everyday electronics use. Smartphone use is not limited to traditional communication functions; smartphones have been able to replace many other devices and services, such as games, maps, social media, and many services accessible over the Internet. Because the devices are available in many price ranges, they are generally within reach of almost everyone, also worldwide. Using a smartphone that is always carried along produces a great deal of personal data, which offers an opportunity to analyse users' daily lives. Through personal recommendations, users can be offered information that helps improve the user experience and broaden the ways a smart device can be used. Crowdsensing refers to a data collection method in which several separate devices automatically contribute to accumulating a larger data set. For phones, such collected data includes, among other things, information on running and installed applications, various system settings, such as network connection details and screen brightness, and numerous other system-level parameters, such as CPU and memory usage. Automatic data collection can be complemented with questionnaires sent to users. Analysing data collected from smartphones involves many stages, which make the whole process challenging. Errors and gaps easily end up in automatically collected data, and their handling must be managed. The amount of data easily grows to the terabyte scale, in which case the analysis requires distributed computing platforms and algorithms suited to processing large data sets. To generate useful recommendations, phone-related analysis is often expected to run in real time, which places further demands on the performance of the analysis. This dissertation presents methods for processing crowdsensed smartphone data efficiently and in a way that produces useful information. The dissertation begins by describing smartphone data collection as a process, along with preprocessing and cleaning the data into a useful and workable form. It argues that the state of the phone should be considered as a whole, in which several factors, such as concurrently running applications and interrelated system settings, affect one another. The dissertation then presents some suitable statistical analysis and machine learning methods that were used in the research to analyse smartphone data. All of these methods can be run in a distributed computing environment and are implemented using Apache Spark. Finally, the dissertation shows how the analysis is applied in practice to generate feedback and recommendations for users. The main focus is on analysing phone energy consumption, detecting trends in phone applications, and accounting for various cultural and socio-economic background factors when studying mobile use.

    From social data mining to forecasting socio-economic crises

    The purpose of this White Paper of the EU Support Action "Visioneer" (see www.visioneer.ethz.ch) is to address the following goals: 1. Develop strategies to quickly increase the objective knowledge about social and economic systems. 2. Describe requirements for efficient large-scale scientific data mining of anonymized social and economic data. 3. Formulate strategies for collecting stylized facts extracted from large data sets. 4. Sketch ways to successfully build up centers for computational social science. 5. Propose plans for creating centers for risk analysis and crisis forecasting. 6. Elaborate ethical standards regarding the storage, processing, evaluation, and publication of social and economic data.

    High-Performance Modelling and Simulation for Big Data Applications

    This open access book was prepared as a Final Publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)” project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. When their level of abstraction rises to give a better discernment of the domain at hand, their representation becomes increasingly demanding of computational and data resources. On the other hand, High Performance Computing typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. A seamless interaction of High Performance Computing with Modelling and Simulation is therefore arguably required in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.

    Highly efficient low-level feature extraction for video representation and retrieval.

    Witnessing the omnipresence of digital video media, the research community has raised the question of its meaningful use and management. Stored in immense multimedia databases, digital videos need to be retrieved and structured in an intelligent way, relying on the content and the rich semantics involved. Current Content Based Video Indexing and Retrieval systems face the problem of the semantic gap between the simplicity of the available visual features and the richness of user semantics. This work focuses on the issues of efficiency and scalability in video indexing and retrieval to facilitate a video representation model capable of semantic annotation. A highly efficient algorithm for temporal analysis and key-frame extraction is developed. It is based on the prediction information extracted directly from the compressed domain features and on robust scalable analysis in the temporal domain. Furthermore, a hierarchical quantisation of the colour features in the descriptor space is presented. Derived from the extracted set of low-level features, a video representation model that enables semantic annotation and contextual genre classification is designed. Results demonstrate the efficiency and robustness of the temporal analysis algorithm, which runs in real time while maintaining the high precision and recall of the detection task. Adaptive key-frame extraction and summarisation achieve a good overview of the visual content, while the colour quantisation algorithm efficiently creates a hierarchical set of descriptors. Finally, the video representation model, supported by the genre classification algorithm, achieves excellent results in an automatic annotation system by linking the video clips with a limited lexicon of related keywords.

    Lightweight Federation of Non-Cooperating Digital Libraries

    This dissertation studies the challenges and issues faced in federating heterogeneous digital libraries (DLs). The objective of this research is to demonstrate the feasibility of interoperability among non-cooperating DLs by presenting a lightweight, data-driven approach, or Data Centered Interoperability (DCI). We build a Lightweight Federated Digital Library (LFDL) system to provide federated search service for existing digital libraries with no prior coordination. We describe the motivation, architecture, design and implementation of the LFDL. We develop, deploy, and evaluate key services of the federation. The major difference from existing DL interoperability approaches is that we do not insist on cooperation among DLs; that is, they do not have to change anything in their systems or processes. The underlying approach is to have a dynamic federation where digital libraries can be added to (or removed from) the federation in real time. This is made possible by describing the behavior of participating DLs in an XML-based language that the federation engine understands. The major contributions of this work are: (1) This dissertation addresses the interoperability issues among non-cooperating DLs and presents a practical and efficient approach toward providing federated search service for those DLs. The DL itself remains autonomous and does not need to change its structure, data format, protocol and other internal features when it is added to the federation. (2) The implementation of the LFDL is based on a lightweight, dynamic, data-centered and rule-driven architecture. To add a DL to the federation, all that is needed is to observe the DL's interaction with the user and store the interaction specification in a human-readable and highly maintainable format. The federation engine provides the federated service based on the specification of a DL. A registration service allows dynamic DL registration, removal, or modification. No code needs to be rewritten or recompiled to add or change a DL. This is achieved by designing a new specification language in XML format and a powerful processing engine that enforces and implements the rules specified using the language. (3) In this thesis we explore an alternate approach where searches are distributed to participating DLs in real time. We have addressed the performance and reliability problems associated with other distributed search approaches. This is achieved by a locally maintained metadata repository extracted from DLs, as well as an efficient caching system based on the repository.

    Content-based indexing of low resolution documents

    At multimedia presentations, it is increasingly common for attendees to photograph slides that interest them using capture devices. To make these images more useful, they could be linked to an image or video database. Such a database can serve file archiving, teaching and learning, research and knowledge management, all of which involve image search. However, such devices, including cameras and mobile phones, produce low resolution images affected by poor lighting and noise. Content-Based Image Retrieval (CBIR) is considered among the most interesting and promising fields as far as image search is concerned. Image search means finding images in a given image database that are similar to a known query image. This thesis is concerned with methods for identifying documents that are captured using image capturing devices, and with a technique for retrieving images from an indexed image database. Both apply digital image processing techniques. To build an index structure for fast and high-quality content-based image retrieval, some existing representative signatures and the key indexes used have been revised. Retrieval performance depends heavily on how the indexing is done. Existing retrieval approaches make use of shape, colour and texture features. Considering these features relative to individual databases, the majority of retrieval approaches perform poorly on low resolution documents: they consume a lot of time and, in some cases, return irrelevant images for the given query image. The identification and indexing method proposed in the thesis uses a Visual Signature (VS). The VS consists of the graphical information of the captured slide's textual layout, shape moments and the spatial distribution of colour. This signature-based approach is designed for fast and efficient matching to fulfil the needs of real-time applications. It can also cope with the problems of low resolution documents, such as noisy images, varying lighting conditions and complex backgrounds. We present hierarchical indexing techniques based on trees and clustering. K-means clustering is used for visual features like colour, since their spatial distribution gives good global information about an image. Tree indexing for the extracted layout and shape features is structured hierarchically, and Euclidean distance is used to find similar images for CBIR. The proposed indexing scheme is assessed using recall and precision, the standard CBIR retrieval performance measures. We develop a CBIR system and conduct various retrieval experiments with the fundamental aim of comparing retrieval accuracy. A new algorithm that can be used with integrated visual signatures, especially in late-fusion queries, is introduced. The algorithm can reduce the shortcomings associated with normalisation in the initial fusion technique. Slides from conference, lecture and meeting presentations are used as real data to compare the performance of the proposed technique with that of existing approaches. The findings of the thesis present exciting possibilities, as the CBIR system is able to produce high-quality results even for queries that use low resolution documents. For future work, multimodal signatures, relevance feedback and artificial intelligence techniques are recommended to further enhance the performance of the CBIR system.
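
    As a rough illustration of the matching and evaluation steps this abstract mentions (Euclidean-distance similarity, scored with precision and recall), a minimal Python sketch follows. The three-number signatures, the slide identifiers and the function names are hypothetical; the thesis's Visual Signature, tree index and K-means stages are not reproduced here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_distance(query, database):
    """Return image ids ordered from most to least similar to the query."""
    return sorted(database, key=lambda image_id: euclidean(query, database[image_id]))


def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved list against a set of relevant ids."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy signatures; real signatures would come from feature extraction.
database = {
    "slide_a": [0.0, 0.0, 0.0],
    "slide_b": [1.0, 0.0, 0.0],
    "slide_c": [5.0, 5.0, 5.0],
    "slide_d": [0.0, 2.0, 0.0],
}
query = [0.2, 0.0, 0.0]

ranking = rank_by_distance(query, database)
top_two = ranking[:2]
# Assumed ground truth for this toy query.
precision, recall = precision_recall(top_two, {"slide_a", "slide_d"})
print(ranking, precision, recall)
```

    A linear scan like this compares the query against every stored signature; the hierarchical index described in the abstract is meant to avoid that cost on large collections.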