Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art
Information extraction, data integration, and uncertain data management are distinct areas of research that have received considerable attention over the last two decades. Many studies have tackled these areas individually. However, information extraction systems should be integrated with data integration methods to make use of the extracted information. Handling uncertainty in the extraction and integration processes is important for enhancing the quality of the data in such integrated systems. This article presents the state of the art in these areas, identifies their common ground, and shows how information extraction and data integration can be combined under the cover of uncertainty management.
Concept-based Interactive Query Expansion Support Tool (CIQUEST)
This report describes a three-year project (2000-03) undertaken in the Information Studies
Department at The University of Sheffield and funded by Resource, The Council for
Museums, Archives and Libraries. The overall aim of the research was to provide user
support for query formulation and reformulation in searching large-scale textual resources
including those of the World Wide Web. More specifically the objectives were: to investigate
and evaluate methods for the automatic generation and organisation of concepts derived from
retrieved document sets, based on statistical methods for term weighting; and to conduct
user-based evaluations on the understanding, presentation and retrieval effectiveness of
concept structures in selecting candidate terms for interactive query expansion.
The TREC test collection formed the basis for the seven evaluative experiments conducted in
the course of the project. These formed four distinct phases in the project plan. In the first
phase, a series of experiments was conducted to investigate further techniques for concept
derivation and hierarchical organisation and structure. The second phase was concerned with
user-based validation of the concept structures. The results of phases 1 and 2 informed the
design of the test system, and the user interface was developed in phase 3. The final phase
entailed a user-based summative evaluation of the CiQuest system.
The main findings demonstrate that concept hierarchies can effectively be generated from
sets of retrieved documents and displayed to searchers in a meaningful way. The approach
provides the searcher with an overview of the contents of the retrieved documents, which in
turn facilitates the viewing of documents and selection of the most relevant ones. Concept
hierarchies are a good source of terms for query expansion and can improve precision. The
extraction of descriptive phrases as an alternative source of terms was also effective. With
respect to presentation, cascading menus were easy to browse for selecting terms and for
viewing documents. In conclusion, the project dissemination programme and future work are
outlined.
Industrial-Strength Documentation for ACL2
The ACL2 theorem prover is a complex system. Its libraries are vast.
Industrial verification efforts may extend this base with hundreds of thousands
of lines of additional modeling tools, specifications, and proof scripts. High
quality documentation is vital for teams that are working together on projects
of this scale. We have developed XDOC, a flexible, scalable documentation tool
for ACL2 that can incorporate the documentation for ACL2 itself, the Community
Books, and an organization's internal formal verification projects, and which
has many features that help to keep the resulting manuals up to date. Using
this tool, we have produced a comprehensive, publicly available ACL2+Books
Manual that brings better documentation to all ACL2 users. We have also
developed an extended manual for use within Centaur Technology that extends the
public manual to cover Centaur's internal books. We expect that other
organizations using ACL2 will wish to develop similarly extended manuals.
Comment: In Proceedings ACL2 2014, arXiv:1406.123
BlogForever D2.6: Data Extraction Methodology
This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report then proposes a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.
Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims at providing a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provide a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
at the Social Web level. At the Enterprise level, Web Data Extraction
techniques emerge as a key tool to perform data analysis in Business and
Competitive Intelligence systems as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques make it
possible to gather a large amount of structured data continuously generated and
disseminated by Web 2.0, Social Media and Online Social Network users, and this
offers unprecedented opportunities to analyze human behavior at a very large
scale. We also discuss the potential for cross-fertilization, i.e., the
possibility of re-using Web Data Extraction techniques originally designed to
work in a given domain, in other domains.
Comment: Knowledge-based System
Using Project Management Techniques to Design a PMP Mathematics Study App for the Windows Universal Platform
Background
As a latecomer to the smartphone market, Microsoft has fallen behind the Apple and Google app ecosystems in the quantity and quality of apps offered. To attract developer talent, Microsoft released the Universal Windows Platform, which enables apps to run across Windows devices with few additional modifications. Although the number of apps in the Windows ecosystem has increased, few apps related to project management are currently available.
About the project
This project will design a PMP Certification Mathematics Study App for the Universal Windows Platform which will serve as a reference and study aid for the PMP certification exam. The app will be available to mobile and PC users who are utilizing the Microsoft Windows 10 and Windows 8 operating systems. Features of the app will include project management formula lookup, formula flashcards, and practice problems. At the completion of the project, the app will be submitted to the Windows Store for review and publishing to the Windows 10 application ecosystem.
Approach
The project scope will include the design of the app from requirements gathering to completion. Project deliverables will be aligned with the Windows Store application evaluation criteria for responsiveness, reliability, and style. This project will conclude with submission of a completed application design to the project sponsor.
Title Page / Table of Contents / List of Exhibits / Abstract / Background / About the project / Approach / Keywords / Introduction / Project Purpose / Project Approach / Research and Analysis / Research Approach / Research Analysis / Application Design Rating Verification / Research Objective 1: Investigate the preferred learning style of potential users / Research Objective 1: Design Conclusions and Implications / Flashcards Module / Formula Builder Module / Formula Reference Module / Research Objective 2: Investigate the most important aspect of user satisfaction / Research Objective 2: Design Conclusions and Implications / Research Conclusions / Requirements Gathering / User Interface Design / Project Deliverable Design / ViTech CORE / Input Application Requirements / Identify Application Components / Identify Component Functions / Identify Use Cases and Test Activities / Project Deliverables / Application Design Documents / Application Hierarchy / Conclusions and Recommendations / ViTech CORE Software Conclusions / Graphing Capabilities / Diagnostics Capabilities / Requirements Mapping and Verification / Final Project Deliverables / Recommendations for Further Research and Development / Application Publishing / Further Development and Product Updates / User Feedback Collection / Application Update Opportunities / Application Expansion Opportunities
Adaptive text mining: Inferring structure from sequences
Text mining is about inferring structure from sequences representing natural language text, and may be defined as the process of analyzing text to extract information that is useful for particular purposes. Although hand-crafted heuristics are a common practical approach for extracting information from text, a general, and generalizable, approach requires adaptive techniques. This paper studies the way in which the adaptive techniques used in text compression can be applied to text mining. It develops several examples: extraction of hierarchical phrase structures from text, identification of keyphrases in documents, locating proper names and quantities of interest in a piece of text, text categorization, word segmentation, acronym extraction, and structure recognition. We conclude that compression forms a sound unifying principle that allows many text mining problems to be tackled adaptively.
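To illustrate how a compressor can act as a text categorizer, as the abstract above suggests, here is a minimal sketch in Python. It assigns a text to the category whose training corpus lets the text be encoded in the fewest additional bytes. The sketch uses zlib (DEFLATE) from the standard library purely for convenience; the abstract does not say which compression model the paper uses, and the function names and toy corpora below are hypothetical.

```python
import zlib


def compressed_size(text: str) -> int:
    """Number of bytes zlib needs to encode the text at maximum compression."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def extra_bytes(corpus: str, text: str) -> int:
    """Additional bytes needed to encode `text` once `corpus` has been seen.

    A small value means the compressor could reuse patterns from the corpus,
    which is taken here as a sign that the text resembles that corpus.
    """
    return compressed_size(corpus + " " + text) - compressed_size(corpus)


def categorize(text: str, corpora: dict[str, str]) -> str:
    """Return the label whose corpus makes `text` cheapest to encode."""
    return min(corpora, key=lambda label: extra_bytes(corpora[label], text))


# Toy training corpora (assumed data, for illustration only).
CORPORA = {
    "sports": (
        "the striker scored a late goal and the team won the match after "
        "the goalkeeper saved a penalty in the final minute of the "
        "championship game"
    ),
    "cooking": (
        "simmer the onions in butter then add garlic and chopped tomatoes "
        "and stir the sauce gently before seasoning with salt pepper and "
        "fresh basil"
    ),
}

if __name__ == "__main__":
    sample = "the goalkeeper saved a penalty in the final minute"
    print(categorize(sample, CORPORA))
```

With corpora this small the result is only indicative. A realistic setup would use much larger training texts per category and, most likely, a stronger adaptive model than DEFLATE.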