
35,745 research outputs found

Text Summarization

Author: Kim Youn S.
Publication venue: SJSU ScholarWorks
Publication date: 01/04/2012

With the overwhelming amount of textual information available in electronic formats on the web, there is a need for an efficient text summarizer capable of condensing large bodies of text into shorter versions while keeping the relevant information intact. Such a technology would allow users to get their information in a shortened form, saving valuable time. Since 1997, Microsoft Word has included a summarizer for documents, and there are currently companies that summarize breaking news and send it by SMS to mobile phones. I wish to create a text summarizer that provides condensed versions of original documents. My focus is on blogs, because people are increasingly using this mode of communication to express their opinions on a variety of topics. Consequently, it will be very useful for a reader to be able to use a concise summary, tailored to his or her own interests, to quickly browse through volumes of opinions relevant to any number of topics. Although many summarization methods exist, my approach employs the Lanczos algorithm to compute eigenvalues and eigenvectors of a large sparse matrix, and SVD (Singular Value Decomposition) as a means of identifying latent topics hidden in contexts. The next phase of the process takes a high-dimensional set of data and reduces it to a lower-dimensional set. This procedure makes it possible to identify the best approximation of the original text. Since SQL makes it possible to analyze data sets and take advantage of the parallel processing available today in most database management systems, SQL is employed in my project. The use of SQL without external math libraries, however, adds to the challenge of computing the SVD and the Lanczos algorithm.
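The latent-topic idea in this abstract can be illustrated with a small sketch. It is not the author's implementation: it runs in plain Python rather than SQL, and it swaps the Lanczos algorithm for simple power iteration on A^T A, which only recovers the single dominant right singular vector of the term-by-sentence matrix A. The function names (`term_sentence_matrix`, `dominant_right_singular_vector`, `summarize`) and the raw-count weighting are assumptions made for illustration.

```python
import math
import re


def term_sentence_matrix(sentences):
    """Rows are terms, columns are sentences; entries are raw term counts."""
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({w for words in tokenized for w in words})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = [[0.0] * len(sentences) for _ in vocab]
    for j, words in enumerate(tokenized):
        for w in words:
            matrix[index[w]][j] += 1.0
    return matrix


def dominant_right_singular_vector(matrix, iterations=200):
    """Power iteration on A^T A (a stand-in for Lanczos, top vector only)."""
    n = len(matrix[0])
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        # u = A v, then w = A^T u, so w = (A^T A) v
        u = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
        w = [sum(matrix[i][j] * u[i] for i in range(len(matrix)))
             for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def summarize(sentences, k=1):
    """Keep the k sentences that load most heavily on the dominant topic."""
    if not sentences:
        return []
    matrix = term_sentence_matrix(sentences)
    if not matrix:  # no alphabetic tokens at all
        return sentences[:k]
    v = dominant_right_singular_vector(matrix)
    ranked = sorted(range(len(sentences)), key=lambda j: abs(v[j]),
                    reverse=True)
    # Return the selected sentences in their original order.
    return [sentences[j] for j in sorted(ranked[:k])]
```

Calling `summarize(sentences, k=2)` on a list of sentence strings returns the two sentences with the largest weight on the dominant latent topic. A fuller treatment would keep several singular vectors, which is where a Lanczos-style method becomes worthwhile for large sparse matrices.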

SJSU ScholarWorks

DataSpread: Unifying Databases and Spreadsheets.

Author: Aditya Parameswaran
Bofan Sun
Ding Zhang
Kevin Chang
Mangesh Bendre
Shy-yauer Lin
Xinyan Zhou
Publication venue: eScholarship, University of California
Publication date: 01/08/2015

Spreadsheet software is often the tool of choice for ad-hoc tabular data management, processing, and visualization, especially on tiny data sets. On the other hand, relational database systems offer significant power, expressivity, and efficiency over spreadsheet software for data management, while lacking in ease of use and ad-hoc analysis capabilities. We demonstrate DataSpread, a data exploration tool that holistically unifies databases and spreadsheets. It continues to offer a Microsoft Excel-based spreadsheet front-end, while in parallel managing all the data in a back-end database, specifically PostgreSQL. DataSpread retains all the advantages of spreadsheets, including ease of use, ad-hoc analysis and visualization capabilities, and a schema-free nature, while also adding the advantages of traditional relational databases, such as scalability and the ability to use arbitrary SQL to import, filter, or join external or internal tables and have the results appear in the spreadsheet. DataSpread needs to reason about and reconcile differences in the notions of schema, addressing of cells and tuples, and the current pane (which exists in spreadsheets but not in traditional databases), and support data modifications at both the front-end and the back-end. Our demonstration will center on our first, early prototype of DataSpread, and will give attendees a sense of the enormous data exploration capabilities offered by unifying spreadsheets and databases.
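The step of running arbitrary SQL and having the results appear in the spreadsheet involves mapping tuples onto cell addresses. The sketch below shows one way that mapping could look. It is not DataSpread's code: it uses an in-memory SQLite database as a stand-in for PostgreSQL, and the helpers `column_letters` and `query_to_cells`, the `sales` table, and its rows are all hypothetical.

```python
import sqlite3


def column_letters(index):
    """Convert a 0-based column index to spreadsheet letters (0 -> A, 26 -> AA)."""
    letters = ""
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def query_to_cells(conn, sql, top_row=1, left_col=0):
    """Run a query and lay the result out as {cell address: value}.

    The first row holds the column names; each tuple fills one row below it.
    """
    cur = conn.execute(sql)
    headers = [d[0] for d in cur.description]
    cells = {}
    for c, name in enumerate(headers):
        cells[f"{column_letters(left_col + c)}{top_row}"] = name
    for r, row in enumerate(cur.fetchall(), start=1):
        for c, value in enumerate(row):
            cells[f"{column_letters(left_col + c)}{top_row + r}"] = value
    return cells


# Hypothetical data standing in for a back-end table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("west", 25), ("east", 5)],
)
cells = query_to_cells(
    conn,
    "SELECT region, SUM(amount) AS total FROM sales "
    "GROUP BY region ORDER BY region",
)
```

Here `cells` maps addresses such as `A1` and `B2` to header names and aggregated values. The reverse direction, propagating cell edits back to tuples, is the harder reconciliation problem the abstract describes and is not covered by this sketch.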

CiteSeerX

PubMed Central

eScholarship - University of California

A method of automatic external optimization of SQL queries under uncertainty about the physical and logical structure of the database

Author: Костенко П. П.
Publication venue: 'National Aviation University'
Publication date: 23/05/2012

The paper examines factors that affect data access speed in information systems. It presents a method of automatic external SQL query optimization based on a local model of the controlled process, which optimizes SQL queries independently of the database management system in use and its settings. A structural and functional diagram of the adaptive system for external SQL query optimization is also presented.

Scientific Journals of the National Aviation University (Наукові журнали Національного Авіаційного Університету)

A data management system for structural genomics

Author: Cygler Miroslaw
O'Toole Nicholas
Raymond Stéphane
Publication venue: BioMed Central
Publication date: 01/01/2004

BACKGROUND: Structural genomics (SG) projects aim to determine thousands of protein structures by developing high-throughput techniques for all steps of the experimental structure determination pipeline. Crucial to the success of such endeavours is the careful tracking and archiving of experimental and external data on protein targets. RESULTS: We have developed a sophisticated data management system for structural genomics. Central to the system is an Oracle-based, SQL-interfaced database. The database schema deals with all facets of the structure determination process, from target selection to data deposition. Users access the database via any web browser and enter experimental data through pre-defined web forms. Data can be displayed according to numerous criteria. A list of all current target proteins can be viewed, with links for each target to associated entries in external databases. To avoid unnecessary work on targets, our data management system uses BLAST to match protein sequences weekly against entries in the Protein Data Bank and against targets of other SG centers worldwide. CONCLUSION: Our system is a working, effective and user-friendly data management tool for structural genomics projects. In this report we present a detailed summary of the various capabilities of the system, using real target data as examples, and indicate our plans for future enhancements.

NRC Publications Archive

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship@McGill

An integrated information retrieval and document management system

Author: Alvarez J. Fernando
Chen James
Chen William
Cheung Lai-Mei
Clancy Susan
Coles L. Stephen
Wong Alexis

This paper describes the requirements and prototype development for an intelligent document management and information retrieval system that will be capable of handling millions of pages of text or other data. Technologies for scanning, Optical Character Recognition (OCR), magneto-optical storage, and multiplatform retrieval using Structured Query Language (SQL) will be discussed. The semantic ambiguity inherent in the English language is partly compensated for through the use of coefficients, or weighting factors, for partial synonyms. Such coefficients are used both for defining structured query trees for routine queries and for establishing long-term interest profiles that can be used on a regular basis to alert individual users to the presence of relevant documents that may have just arrived from an external source, such as a news wire service. Although this attempt at evidential reasoning is limited in comparison with the latest developments in AI Expert Systems technology, it has the advantage of being commercially available.
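The use of weighting coefficients for partial synonyms, and of interest profiles that trigger alerts, can be sketched roughly as follows. The abstract does not give the scoring formula, so the additive score, the weight values, and the names `score_document` and `alert` are assumptions for illustration only.

```python
def score_document(text, query_terms, synonym_weights):
    """Sum weighted occurrences of each query term and its partial synonyms.

    An exact match counts 1.0; a partial synonym counts its coefficient.
    """
    words = text.lower().split()
    score = 0.0
    for term in query_terms:
        expansions = {term: 1.0}
        expansions.update(synonym_weights.get(term, {}))
        for word, weight in expansions.items():
            score += weight * words.count(word)
    return score


def alert(documents, profile, synonym_weights, threshold):
    """Return the newly arrived documents that match a long-term interest profile."""
    return [
        doc for doc in documents
        if score_document(doc, profile, synonym_weights) >= threshold
    ]


# Hypothetical coefficients: 'automobile' is a closer synonym of 'car' than 'vehicle'.
synonym_weights = {"car": {"automobile": 0.9, "vehicle": 0.6}}
incoming = [
    "new automobile factory opens",
    "vehicle recall announced",
    "weather report for tuesday",
]
matches = alert(incoming, ["car"], synonym_weights, threshold=0.8)
```

With a threshold of 0.8, only documents containing the term itself or a close synonym pass; lowering the threshold admits weaker synonyms, which is the trade-off such coefficients are meant to expose.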

NASA Technical Reports Server

A generic persistence model for CLP systems (and two useful implementations)

Author: Cabeza Gras Daniel
Carro Liñares Manuel
Correas Fernandez Jesús
Gómez J. M.
Hermenegildo Manuel V.
Publication venue: Facultad de Informática (UPM)
Publication date: 01/08/2003

This paper describes a model of persistence in (C)LP languages and two different and practically very useful ways to implement this model in current systems. The fundamental idea is that persistence is a characteristic of certain dynamic predicates (i.e., those which encapsulate state). The main effect of declaring a predicate persistent is that the dynamic changes made to such predicates persist from one execution to the next one. After proposing a syntax for declaring persistent predicates, a simple, file-based implementation of the concept is presented and some examples shown. An additional implementation is presented which stores persistent predicates in an external database. The abstraction of the concept of persistence from its implementation allows developing applications which can store their persistent predicates alternatively in files or databases with only a few simple changes to a declaration stating the location and modality used for persistent storage. The paper presents the model, the implementation approach in both the cases of using files and relational databases, a number of optimizations of the process (using information obtained from static global analysis and goal clustering), and performance results from an implementation of these ideas.
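The separation between a persistent predicate and its storage backend can be illustrated outside of logic programming. The following is a loose Python analogy, not the paper's (C)LP syntax or implementation: the classes `FileStore`, `SqliteStore`, and `PersistentPredicate` are hypothetical, facts are stored as JSON, and SQLite stands in for the external relational database. Switching backends changes only the line that constructs the store, which mirrors the "few simple changes to a declaration" described above.

```python
import json
import os
import sqlite3
import tempfile


class FileStore:
    """Keeps all facts of one predicate in a JSON file."""

    def __init__(self, path):
        self.path = path

    def load(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [tuple(fact) for fact in json.load(f)]

    def save(self, facts):
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump([list(fact) for fact in facts], f)


class SqliteStore:
    """Keeps facts as rows of a relational table, keyed by predicate name."""

    def __init__(self, path, name):
        self.path, self.name = path, name

    def _connect(self):
        conn = sqlite3.connect(self.path)
        conn.execute("CREATE TABLE IF NOT EXISTS facts (pred TEXT, args TEXT)")
        return conn

    def load(self):
        conn = self._connect()
        try:
            rows = conn.execute(
                "SELECT args FROM facts WHERE pred = ? ORDER BY rowid",
                (self.name,),
            ).fetchall()
        finally:
            conn.close()
        return [tuple(json.loads(r[0])) for r in rows]

    def save(self, facts):
        conn = self._connect()
        try:
            with conn:  # commit the delete and inserts as one transaction
                conn.execute("DELETE FROM facts WHERE pred = ?", (self.name,))
                conn.executemany(
                    "INSERT INTO facts VALUES (?, ?)",
                    [(self.name, json.dumps(list(f))) for f in facts],
                )
        finally:
            conn.close()


class PersistentPredicate:
    """Dynamic facts whose updates survive from one execution to the next."""

    def __init__(self, store):
        self.store = store
        self.facts = store.load()

    def assertz(self, *args):
        self.facts.append(args)
        self.store.save(self.facts)

    def retract(self, *args):
        if args in self.facts:
            self.facts.remove(args)
            self.store.save(self.facts)

    def query(self):
        return list(self.facts)
```

For example, `PersistentPredicate(FileStore("visits.json"))` and `PersistentPredicate(SqliteStore("app.db", "visits"))` expose the same `assertz`, `retract`, and `query` operations. This sketch rewrites the whole fact set on every update, which is simple but would not scale; the paper's optimizations address that kind of cost in its own setting.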

Archivo Digital UPM

Vulnerability anti-patterns: a timeless way to capture poor software practices (Vulnerabilities)

Author: Coull Natalie
Ferguson Ian
Nafees Tayyaba
Sampson Adam
Publication date: 29/11/2018

There is a distinct communication gap between the software engineering and cybersecurity communities when it comes to addressing recurring security problems, known as vulnerabilities. Many vulnerabilities are caused by software errors that are created by software developers. Insecure software development practices are common due to a variety of factors, which include inefficiencies within existing knowledge transfer mechanisms based on vulnerability databases (VDBs), software developers perceiving security as an afterthought, and lack of consideration of security as part of the software development lifecycle (SDLC). The resulting communication gap also prevents developers and security experts from successfully sharing essential security knowledge. The cybersecurity community makes its expert knowledge available in forms including vulnerability databases such as CAPEC and CWE, and pattern catalogues such as Security Patterns, Attack Patterns, and Software Fault Patterns. However, these sources are not effective at providing software developers with an understanding of how malicious hackers can exploit vulnerabilities in the software systems they create. As developers are familiar with pattern-based approaches, this paper proposes the use of Vulnerability Anti-Patterns (VAP) to transfer usable vulnerability knowledge to developers, bridging the communication gap between security experts and software developers. The primary contribution of this paper is twofold: (1) it proposes a new pattern template, the Vulnerability Anti-Pattern, that uses anti-patterns rather than patterns to capture and communicate knowledge of existing vulnerabilities, and (2) it proposes a catalogue of Vulnerability Anti-Patterns based on the most commonly occurring vulnerabilities that software developers can use to learn how malicious hackers can exploit errors in software.

Abertay Research Portal