CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap
After addressing the state of the art during the first year of CHORUS and establishing the existing landscape of multimedia search engines, we identified and analyzed gaps in the European research effort during our second year. In this period we focused on three directions: technological issues, user-centred issues and use cases, and socio-economic and legal aspects. These were assessed through two central studies: first, a concerted vision of the functional breakdown of a generic multimedia search engine; and second, a set of representative use-case descriptions with a related discussion of the requirements they place on technological research. Both studies were carried out in cooperation and consultation with the community at large through EC concertation meetings (the multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to coordinators of EU projects and national initiatives. Based on the feedback obtained, we identified two types of gaps: core technological gaps that involve research challenges, and "enablers", which are not necessarily technical research challenges but nonetheless affect the progress of innovation. New socio-economic trends are presented, as well as emerging legal challenges.
Ranking for Scalable Information Extraction
Information extraction systems are complex software tools that discover structured information in natural language text. For instance, an information extraction system trained to extract tuples for an Occurs-in(Natural Disaster, Location) relation may extract the tuple ⟨tsunami, Hawaii⟩ from the sentence: "A tsunami swept the coast of Hawaii." Having information in structured form enables more sophisticated querying and data mining than what is possible over the natural language text. Unfortunately, information extraction is a time-consuming task. For example, a state-of-the-art information extraction system to extract Occurs-in tuples may take up to two hours to process only 1,000 text documents. Since document collections routinely contain millions of documents or more, improving the efficiency and scalability of the information extraction process over these collections is critical. As a significant step towards this goal, this dissertation presents approaches for (i) enabling the deployment of efficient information extraction systems and (ii) scaling the information extraction process to large volumes of text.
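As a rough illustration of the kind of tuple extraction described above, the sketch below applies a hand-written gazetteer and a simple co-occurrence rule to the example sentence. The term lists and pairing rule are assumptions for illustration only; they are not the dissertation's actual extraction system.

```python
# Minimal, illustrative sketch of rule-based tuple extraction for an
# Occurs-in(Natural Disaster, Location) relation. The gazetteers and the
# pairing rule are hypothetical, not the dissertation's method.
import re

DISASTERS = {"tsunami", "earthquake", "hurricane", "flood", "wildfire"}
LOCATIONS = {"Hawaii", "Japan", "Chile", "Florida"}

def extract_occurs_in(sentence: str) -> list[tuple[str, str]]:
    """Return (disaster, location) pairs found in a single sentence."""
    tokens = re.findall(r"[A-Za-z]+", sentence)
    disasters = [t.lower() for t in tokens if t.lower() in DISASTERS]
    locations = [t for t in tokens if t in LOCATIONS]
    # Pair every disaster mention with every location mention in the sentence.
    return [(d, loc) for d in disasters for loc in locations]

if __name__ == "__main__":
    print(extract_occurs_in("A tsunami swept the coast of Hawaii."))
    # -> [('tsunami', 'Hawaii')]
```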
To enable the deployment of efficient information extraction systems, we have developed two crucial building blocks for this task. As a first contribution, we have created REEL, a toolkit to easily implement, evaluate, and deploy full-fledged relation extraction systems. REEL, in contrast to existing toolkits, effectively modularizes the key components involved in relation extraction systems and can integrate other long-established text processing and machine learning toolkits. To define a relation extraction system for a new relation and text collection, users only need to specify the desired configuration, which makes REEL a powerful framework for both research and application building. As a second contribution, we have addressed the problem of building representative extraction task-specific document samples from collections, a step often required by approaches for efficient information extraction. Specifically, we devised fully automatic document sampling techniques for information extraction that can produce better-quality document samples than the state-of-the-art sampling strategies; furthermore, our techniques are substantially more efficient than the existing alternative approaches.
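The modularization idea can be pictured as a pipeline whose stages sit behind small interfaces, so that candidate generators, classifiers, or learning toolkits can be swapped independently. The sketch below is a hypothetical illustration of that structure; it is not REEL's actual API, and all names are invented for this example.

```python
# Hypothetical sketch of a modular relation extraction pipeline: each stage
# is an interchangeable component behind a small interface. Not REEL's API.
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Candidate:
    sentence: str
    entity_a: str
    entity_b: str

class CandidateGenerator(Protocol):
    def generate(self, document: str) -> list[Candidate]: ...

class RelationClassifier(Protocol):
    def is_relation(self, candidate: Candidate) -> bool: ...

@dataclass
class RelationExtractionPipeline:
    generator: CandidateGenerator
    classifier: RelationClassifier

    def extract(self, document: str) -> list[tuple[str, str]]:
        # Different generators/classifiers can be plugged in without
        # changing this driver code.
        return [(c.entity_a, c.entity_b)
                for c in self.generator.generate(document)
                if self.classifier.is_relation(c)]
```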
To scale the information extraction process to large volumes of text, we have developed approaches that address the efficiency and scalability of the extraction process by focusing the extraction effort on the collections, documents, and sentences worth processing for a given extraction task. For collections, we have studied both (adaptations of) state-of-the-art approaches and information extraction-specific approaches for estimating the number of documents in a collection that lead to the extraction of tuples. Using these estimations we can identify the collections worth processing and ignore the rest, for efficiency. For documents, we have developed an adaptive document ranking approach that relies on learning-to-rank techniques to prioritize the documents that are likely to produce tuples for an extraction task of choice. Our approach revises the (learned) ranking decisions periodically as the extraction process progresses and new characteristics of the useful documents are revealed. Finally, for sentences, we have developed an approach based on the sparse group selection problem that identifies sentences, modeled as groups of words, that best characterize the extraction task. Beyond identifying sentences worth processing, our approach aims at selecting sentences that lead to the extraction of unseen, novel tuples. Our approaches are lightweight and efficient, and dramatically improve the efficiency and scalability of the information extraction process. We can often complete the extraction task by focusing on just a very small fraction of the available text, namely, the text that contains relevant information for the extraction task at hand. Our approaches therefore constitute a substantial step towards efficient and scalable information extraction over large volumes of text.
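A simplified picture of the adaptive ranking loop: documents are re-scored periodically from the evidence accumulated so far, and the most promising ones are processed first. The word-overlap scoring rule below is a stand-in assumption for the learning-to-rank model described in the text, and the function names are invented for this sketch.

```python
# Simplified sketch of adaptive document ranking for extraction, assuming
# that documents sharing vocabulary with previously useful documents are
# more likely to yield tuples. A stand-in for the learned ranking model.
from collections import Counter

def score(doc: str, useful_words: Counter) -> float:
    # Higher score when the document contains words seen in useful documents.
    return sum(useful_words[w] for w in set(doc.lower().split()))

def adaptive_extraction(documents: list[str], extract, batch_size: int = 2):
    """`extract` is any per-document tuple extractor returning a list of tuples."""
    useful_words: Counter = Counter()   # evidence gathered so far
    remaining = list(documents)
    results = []
    while remaining:
        # Revise the ranking of the remaining documents with current evidence.
        remaining.sort(key=lambda d: score(d, useful_words), reverse=True)
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        for doc in batch:
            tuples = extract(doc)
            results.extend(tuples)
            if tuples:                  # this document was useful; learn from it
                useful_words.update(set(doc.lower().split()))
    return results
```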
Fish4Knowledge: Collecting and Analyzing Massive Coral Reef Fish Video Data
This book gives a start-to-finish overview of the whole Fish4Knowledge project, in 18 short chapters, each describing one aspect of the project. The Fish4Knowledge project explored the possibilities of big video data, in this case from undersea video. Recording and analyzing 90 thousand hours of video from ten camera locations, the project gives a 3-year view of fish abundance in several tropical coral reefs off the coast of Taiwan. The project built a remote recording network, over 100 TB of storage, supercomputer processing, video target detection and
Intents-based Service Discovery and Integration
With the proliferation of Web services, when developing a new application it makes sense to seek and leverage existing Web services rather than implementing the corresponding components from scratch. Therefore, significant research efforts have been devoted to techniques for service discovery and integration. However, most of the existing techniques are based on the ternary participant classification of the Web service architecture, which only takes into consideration the involvement of service providers, service brokers, and application developers. The activities of application end users are usually ignored.
This thesis presents an Intents-based service discovery and integration approach at the conceptual level, inspired by two industrial protocols: Android Intents and Web Intents. The proposed approach is characterized by allowing application end users to participate in the process of service seeking. Instead of directly binding with remote services, application developers can set an intent, which semantically represents their service goal. An Intents user agent can resolve the intent and generate a list of candidate services. Then application end users can choose a service as the ultimate working service. This thesis classifies intents into explicit intents, authoritative intents, and naïve intents, and examines in depth the issue of naïve intent resolution analytically and empirically. Based on the empirical analysis, an adaptive intent resolution approach is devised. This thesis also presents a design for the Intents user agent and demonstrates its proof-of-concept prototype. Finally, Intents and the Intents user agent are applied to integrate Web applications and native applications on mobile devices.
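The resolution flow can be sketched as follows: the developer expresses an intent, a user agent matches it against a registry of services, and the end user picks the working service from the candidates. The types and fields below are illustrative assumptions, not the Android Intents or Web Intents APIs and not the thesis's actual design.

```python
# Conceptual sketch of intent resolution: match a developer-stated intent
# against registered services; the end user chooses among the candidates.
# Field names and the registry are hypothetical, for illustration only.
from dataclasses import dataclass

@dataclass
class Intent:
    action: str          # e.g. "share"
    data_type: str       # e.g. "image/png"

@dataclass
class Service:
    name: str
    actions: set
    data_types: set

def resolve(intent: Intent, registry: list[Service]) -> list[Service]:
    """Return candidate services able to handle the intent."""
    return [s for s in registry
            if intent.action in s.actions and intent.data_type in s.data_types]

registry = [
    Service("PhotoShareApp", {"share"}, {"image/png", "image/jpeg"}),
    Service("PlainTextNotes", {"edit"}, {"text/plain"}),
]
candidates = resolve(Intent("share", "image/png"), registry)
# The end user, not the developer, would choose among `candidates`.
print([s.name for s in candidates])   # -> ['PhotoShareApp']
```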
2017 DWH Long-Term Data Management Coordination Workshop Report
On June 7 and 8, 2017, the Coastal Response Research Center (CRRC)[1], NOAA Office of Response and Restoration (ORR) and NOAA National Marine Fisheries Service (NMFS) Restoration Center (RC), co-sponsored the Deepwater Horizon Oil Spill (DWH) Long Term Data Management (LTDM) workshop at the ORR Gulf of Mexico (GOM) Disaster Response Center (DRC) in Mobile, AL.
There has been a focus on restoration planning, implementation and monitoring of the on-going DWH-related research in the wake of the DWH Natural Resource Damage Assessment (NRDA) settlement. This means that data management, accessibility, and distribution must be coordinated among various federal, state, and local agencies, non-governmental organizations (NGOs), and academic and private sector partners. The scope of DWH far exceeded any other spill in the U.S., with an immense amount of data (e.g., 100,000 environmental samples, 15 million publicly available records) gathered during the response and damage assessment phases of the incident, as well as data that continues to be produced from research and restoration efforts. The challenge with this influx of data is checking its quality, documenting data collection, storing the data, integrating it into useful products, managing it, and archiving it for long-term use. In addition, data must be available to the public in an easily queried and accessible format. Answering questions regarding the success of the restoration efforts will be based on data generated for years to come. The data sets must be readily comparable, representative and complete; be collected using cross-cutting field protocols; be as interoperable as possible; meet standards for quality assurance/quality control (QA/QC); and be unhindered by conflicting or ambiguous terminology.
During the data management process for the NOAA Natural Resource Damage Assessment (NRDA) for the DWH disaster, NOAA developed a data management warehouse and visualization system that will be used as a long-term repository for accessing and archiving NRDA injury assessment data. This serves as a foundation for the restoration project planning and monitoring data for the next 15 or more years. The main impetus for this workshop was to facilitate public access to the DWH data collected and managed by all entities by developing linkages to, or data exchanges among, applicable GOM data management systems.
There were 66 workshop participants (Appendix A), representing a variety of organizations, who met at NOAA's GOM Disaster Response Center (DRC) to determine the characteristics of a successful common operating picture for DWH data, to understand the systems that are currently in place to manage DWH data, and to make the DWH data interoperable between data generators, users and managers. The external partners for these efforts include, but are not limited to: the RESTORE Council, the Gulf of Mexico Research Initiative (GoMRI), the Gulf of Mexico Research Initiative Information and Data Cooperative (GRIIDC), the National Academy of Sciences (NAS) Gulf Research Program, the Gulf of Mexico Alliance (GOMA), and the National Fish and Wildlife Foundation (NFWF).
The workshop objectives were to: Foster collaboration among the GOM partners with respect to data management and integration for restoration planning, implementation and monitoring; Identify standards, protocols and guidance for LTDM being used by these partners for DWH NRDA, restoration, and public health efforts; Obtain feedback and identify next steps for the work completed by the Environmental Disasters Data Management (EDDM) Working Groups; and Work towards best practices on public distribution and access of this data.
The workshop consisted of plenary presentations and breakout sessions. The workshop agenda (Appendix B) was developed by the organizing committee. The presentation topics included: results of a pre-workshop survey, an overview of data generation, the uses of DWH long-term data, an overview of LTDM, an overview of existing LTDM systems, an overview of data management standards/protocols, results from the EDDM working groups, flow diagrams of existing data management systems, and a vision for managing big data.
The breakout sessions included discussions of: issues/concerns for data stakeholders (e.g., data users, generators, managers), interoperability, ease of discovery/searchability, data access, data synthesis, data usability, and metadata/data documentation.
[1] A list of acronyms is provided on Page 1 of this report
Multimodal Content Delivery for Geo-services
This thesis describes a body of work carried out over several research projects in the area of multimodal interaction for location-based services. Research in this area has progressed from using simulated mobile environments to demonstrate the visual modality, to the ubiquitous delivery of rich media using multimodal interfaces (geo-services). To effectively deliver these services, research focused on innovative solutions to real-world problems in a number of disciplines including geo-location, mobile spatial interaction, location-based services, rich media interfaces and auditory user interfaces. My original contributions to knowledge are made in the areas of multimodal interaction, underpinned by advances in geo-location technology and supported by the proliferation of mobile device technology into modern life. Accurate positioning is a known problem for location-based services; contributions in the area of mobile positioning demonstrate a hybrid positioning technology for mobile devices that uses terrestrial beacons to trilaterate position. Information overload is an active concern for location-based applications that struggle to manage large amounts of data; contributions in the area of egocentric visibility, which filters data based on field of view, demonstrate novel forms of multimodal input. One of the more pertinent characteristics of these applications is the delivery or output modality employed (auditory, visual or tactile). Further contributions are made in the area of multimodal content delivery, where multiple modalities are used to deliver information using graphical user interfaces, tactile interfaces and, more notably, auditory user interfaces. It is demonstrated how a combination of these interfaces can be used to synergistically deliver context-sensitive rich media to users in a responsive way, based on usage scenarios that consider the affordance of the device, the geographical position and bearing of the device, and also the location of the device.
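As an illustration of egocentric, field-of-view filtering, the sketch below keeps only the points of interest that fall within a viewing cone around the device's bearing. It assumes planar coordinates and a fixed cone width; the thesis's actual geo-spatial pipeline would work with geodetic bearings and distances.

```python
# Minimal sketch of egocentric (field-of-view) filtering: keep points of
# interest inside a viewing cone around the device's compass bearing.
# Coordinates are treated as planar for simplicity; names are illustrative.
import math

def bearing_to(x: float, y: float, px: float, py: float) -> float:
    """Bearing in degrees (from north, clockwise) from (x, y) to (px, py)."""
    return math.degrees(math.atan2(px - x, py - y)) % 360

def in_field_of_view(device_xy, heading_deg, point_xy, fov_deg=60.0) -> bool:
    b = bearing_to(*device_xy, *point_xy)
    diff = (b - heading_deg + 180) % 360 - 180   # signed angular difference
    return abs(diff) <= fov_deg / 2

points = {"cafe": (10.0, 2.0), "museum": (-5.0, 8.0)}
device, heading = (0.0, 0.0), 75.0               # device facing roughly east
visible = {name: p for name, p in points.items()
           if in_field_of_view(device, heading, p, fov_deg=60)}
print(visible)   # only points within the 60-degree cone around the heading
```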