
    Highly efficient low-level feature extraction for video representation and retrieval.

    PhD thesis. Witnessing the omnipresence of digital video media, the research community has raised the question of its meaningful use and management. Stored in immense multimedia databases, digital videos need to be retrieved and structured intelligently, relying on their content and the rich semantics involved. Current content-based video indexing and retrieval systems face the problem of the semantic gap between the simplicity of the available visual features and the richness of user semantics. This work focuses on the issues of efficiency and scalability in video indexing and retrieval, with the aim of facilitating a video representation model capable of semantic annotation. A highly efficient algorithm for temporal analysis and key-frame extraction is developed. It is based on prediction information extracted directly from compressed-domain features and on robust, scalable analysis in the temporal domain. Furthermore, a hierarchical quantisation of the colour features in the descriptor space is presented. Derived from the extracted set of low-level features, a video representation model that enables semantic annotation and contextual genre classification is designed. Results demonstrate the efficiency and robustness of the temporal analysis algorithm, which runs in real time while maintaining high precision and recall in the detection task. Adaptive key-frame extraction and summarisation provide a good overview of the visual content, while the colour quantisation algorithm efficiently creates a hierarchical set of descriptors. Finally, the video representation model, supported by the genre classification algorithm, achieves excellent results in an automatic annotation system by linking video clips with a limited lexicon of related keywords.
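The compressed-domain cue mentioned in this abstract can be illustrated with a toy sketch: in MPEG-style streams, a hard cut typically forces the encoder to intra-code most macroblocks of the first frame after the cut, so thresholding the per-frame intra-coded ratio flags candidate shot boundaries. The function name, threshold value, and sample ratios below are illustrative assumptions, not the thesis' actual algorithm.

```python
def detect_cuts(intra_ratios, threshold=0.6):
    """Flag frame indices whose intra-coded macroblock ratio exceeds
    the threshold -- a simplified compressed-domain cue for a hard cut.
    (Hypothetical sketch; real detectors also use motion-vector stats.)"""
    return [i for i, r in enumerate(intra_ratios) if r >= threshold]

# Frames 2 and 5 have sudden intra-coding spikes, suggesting two cuts.
ratios = [0.05, 0.04, 0.90, 0.06, 0.07, 0.85, 0.05]
print(detect_cuts(ratios))  # → [2, 5]
```

Because such statistics come straight from the bitstream, no full decoding is needed, which is what makes this family of methods fast enough for real-time temporal analysis.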

    Diluting the Scalability Boundaries: Exploring the Use of Disaggregated Architectures for High-Level Network Data Analysis

    Traditional data centers are designed with a rigid architecture of fit-for-purpose servers that provision resources beyond the average workload in order to deal with occasional peaks of data. Heterogeneous data centers are pushing towards more cost-efficient architectures with better resource provisioning. In this paper we study the feasibility of using disaggregated architectures for intensive data applications, in contrast to the monolithic approach of server-oriented architectures. In particular, we have tested a proactive network analysis system in which the workload demands are highly variable. In the context of the dReDBox disaggregated architecture, the results show that the overhead caused by using remote memory resources is significant, between 66% and 80%, but we have also observed that memory usage is one order of magnitude higher in the stress case than under average workloads. Therefore, dimensioning memory for the worst case in conventional systems results in a notable waste of resources. Finally, we found that, for the selected use case, parallelism is limited by memory. A disaggregated architecture will therefore allow for increased parallelism, which, at the same time, will mitigate the overhead caused by remote memory.
    Comment: 8 pages, 6 figures, 2 tables, 32 references. Pre-print. The paper will be presented at the IEEE International Conference on High Performance Computing and Communications in Bangkok, Thailand, 18-20 December 2017, and published in the conference proceedings.
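A back-of-the-envelope reading of the provisioning argument above can be sketched as follows. Only the "one order of magnitude" ratio comes from the abstract; the absolute gigabyte figures are illustrative assumptions.

```python
avg_gb = 8.0                 # illustrative average working set (assumed number)
peak_gb = 10 * avg_gb        # stress case: ~1 order of magnitude higher (per the abstract)

# Fraction of statically provisioned (worst-case) memory sitting idle
# at average load, if a conventional server is sized for the peak:
idle_fraction = (peak_gb - avg_gb) / peak_gb
print(f"idle at average load: {idle_fraction:.0%}")  # → idle at average load: 90%
```

This is the waste that pooling memory across a disaggregated fabric aims to reclaim, at the cost of the 66-80% remote-memory overhead the paper measures.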

    Natural language processing

    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems (text summarization, information extraction, information retrieval, etc., including domain-specific applications); (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.

    A bottom-up approach to real-time search in large networks and clouds


    PanDA Workload Management System Meta-data Segmentation

    The PanDA (Production and Distributed Analysis) workload management system (WMS) was developed to meet the scale and complexity of LHC distributed computing for the ATLAS experiment. PanDA currently distributes jobs among more than 100,000 cores at well over 120 Grid sites, supercomputing centers, and commercial and academic clouds. ATLAS physicists submit more than 1.5M data processing, simulation and analysis PanDA jobs per day, and the system keeps all meta-information about job submissions and execution events in an Oracle RDBMS. This information is used for monitoring and accounting purposes. One of the most challenging monitoring issues is tracking errors that have occurred during the execution of jobs. The current meta-data storage technology lacks built-in tools for the data aggregation needed to build error summary tables, charts and graphs, and delegating these tasks to the monitor slows down the execution of requests. We describe a project aimed at optimizing the interaction between the PanDA front-end and back-end by segmenting meta-data storage into two parts: operational and archived. Active meta-data remain in the Oracle database (operational part), due to the high requirements for data integrity. Historical (read-only) meta-data used for system analysis and accounting are exported to NoSQL storage (archived part). A new data model based on Cassandra as the NoSQL backend has been designed as a set of query-specific data structures. This made it possible to remove most of the data preparation workload from the PanDA Monitor and to improve its scalability and performance. Segmentation and synchronization between the operational and archived parts of the job meta-data are provided by a Hybrid Meta-data Storage Framework (HMSF). The PanDA Monitor was partly adapted to interact with HMSF: operational data queries are forwarded to the primary SQL-based repository, and analytic data requests are processed by the NoSQL database.
    The results of performance and scalability tests of the HMSF-adapted part of the PanDA Monitor show that the presented optimization method, in conjunction with a properly configured NoSQL database and a reasonable data model, provides performance improvements and scalability.
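The query-routing idea behind the hybrid storage split can be caricatured as a tiny dispatcher: operational queries go to the SQL repository, analytic/historical ones to the NoSQL store. This is a hypothetical sketch; the class name, method signatures and the callable-based "stores" are assumptions, as HMSF's real interface is not described in the abstract.

```python
class HybridMetaStore:
    """Toy stand-in for the HMSF routing idea: operational queries hit
    the SQL store (data-integrity side), analytic/read-only queries hit
    the NoSQL store (query-specific, pre-aggregated side). Both 'stores'
    here are just callables supplied by the caller."""

    def __init__(self, sql_store, nosql_store):
        self.sql = sql_store
        self.nosql = nosql_store

    def query(self, kind, payload):
        if kind == "operational":        # live job meta-data
            return self.sql(payload)
        return self.nosql(payload)       # archived/analytic meta-data

# Demo with trivial callables standing in for Oracle and Cassandra backends.
store = HybridMetaStore(lambda q: ("oracle", q), lambda q: ("cassandra", q))
print(store.query("operational", "job 42"))       # → ('oracle', 'job 42')
print(store.query("analytic", "error summary"))   # → ('cassandra', 'error summary')
```

The design point this illustrates is that the front-end never needs to know which backend holds a record; it only classifies the request, which is what lets the archived side be remodeled around query-specific Cassandra tables without touching the operational path.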