Ontologies and Information Extraction
This report argues that, even in the simplest cases, IE is an ontology-driven
process. It is not a mere text-filtering method based on simple pattern
matching and keywords, because the extracted pieces of text are interpreted
with respect to a predefined partial domain model. This report shows that,
depending on the nature and depth of the interpretation needed to extract
the information, more or less knowledge must be involved. The discussion is
illustrated mainly with examples from biology, a domain with critical needs
for content-based exploration of the scientific literature and one that has
become a major application domain for IE.
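The claim that even simple extraction is interpretation against a domain model can be sketched in a few lines. The ontology, lexicon, and relation names below are illustrative assumptions, not taken from the report: a pattern match only becomes an extracted fact if its arguments satisfy the relation's ontological signature.

```python
import re

# Hypothetical minimal domain model (an assumption for illustration).
ONTOLOGY = {"interacts_with": ("Protein", "Protein")}

# Toy lexicon mapping surface strings to ontology classes (also assumed).
LEXICON = {"RAD51": "Protein", "BRCA2": "Protein"}

def extract(sentence):
    """Pattern-match a sentence, then interpret the match against the ontology."""
    m = re.search(r"(\w+) interacts with (\w+)", sentence)
    if m is None:
        return None
    subj, obj = m.group(1), m.group(2)
    # Interpretation step: the textual match is accepted only if both
    # arguments fit the relation's type signature in the domain model.
    dom, rng = ONTOLOGY["interacts_with"]
    if LEXICON.get(subj) == dom and LEXICON.get(obj) == rng:
        return ("interacts_with", subj, obj)
    return None

print(extract("BRCA2 interacts with RAD51"))
```

The pattern alone would also match nonsense like "Paris interacts with RAD51"; it is the ontological check that turns filtering into extraction.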
A Survey on Data Integration in Data Warehouse
Data warehousing embraces the technology of integrating data from multiple distributed data sources and using it, in an annotated and aggregated form, to support business decision-making and enterprise management. Although many techniques have been revisited or newly developed in the context of data warehouses, such as view maintenance and OLAP, little attention has been paid to data mining techniques for supporting the most important and costly task of data integration in data warehouse design.
From logical forms to SPARQL query with GETARUNS
We present a system for Question Answering which computes a
prospective answer from Logical Forms produced by a full-fledged NLP system for
text understanding, and then maps the result onto schemata in SPARQL to be
used for accessing the Semantic Web. As an intermediate step, whenever
there are complex concepts to be mapped, the system looks for a corresponding
amalgam in YAGO classes. It is precisely the internal structure of the Logical
Form that enables us to produce a suitable and meaningful context for concept
disambiguation. Logical Forms are the final output of a complex system for text
understanding - GETARUNS - which can deal with different levels of syntactic
and semantic ambiguity in the generation of a final structure, by accessing
computational lexica equipped with subcategorization frames and appropriate
selectional restrictions applied to the attachment of complements and adjuncts.
The system also resolves pronominal binding and instantiates implicit
arguments, if needed, in order to complete the required Predicate Argument
Structure licensed by the semantic component.
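The general idea of mapping a predicate-argument structure onto a SPARQL pattern can be illustrated with a minimal sketch. The predicate URI, the `?`-prefixed variable convention, and the single-triple mapping below are assumptions for illustration, not GETARUNS' actual schemata.

```python
def lf_to_sparql(pred, args):
    """Map a one-predicate logical form onto a SPARQL SELECT pattern.

    Arguments beginning with '?' are unknowns and become the variables
    the query retrieves; the rest are already-grounded resources.
    """
    variables = [a for a in args if a.startswith("?")]
    triple = f"{args[0]} <{pred}> {args[1]} ."
    return f"SELECT {' '.join(variables)} WHERE {{ {triple} }}"

# "Who directed Vertigo?" as the logical form directed(?x, Vertigo):
query = lf_to_sparql("http://example.org/directed",
                     ["?x", "<http://example.org/Vertigo>"])
print(query)
```

A real mapping must also handle n-ary predicates, quantifiers, and the YAGO amalgam step described above; this sketch only shows how the logical form's argument slots determine the query's variables.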
TEXT MINING AND TEMPORAL TREND DETECTION ON THE INTERNET FOR TECHNOLOGY ASSESSMENT: MODEL AND TOOL
In today's world, organizations conduct technology assessment (TAS) prior to making decisions about investments in existing, emerging, and hot technologies, to avoid costly mistakes and survive in the hyper-competitive business environment. Relying on web search engines to look for information relevant to TAS processes, decision makers face abundant unstructured information that limits their ability to assess technologies within a reasonable time frame. Thus the following question arises: how can valuable TAS knowledge be extracted from a diverse corpus of textual data on the web? To address this question, this paper presents a web-based model and tool for knowledge mapping. The proposed knowledge maps are constructed on the basis of a novel method of co-word analysis, based on webometric web counts, and a temporal trend detection algorithm which employs the vector space model (VSM). The approach is demonstrated and validated for a spectrum of information technologies. Results show that the research model assessments are highly correlated with subjective expert (n=136) assessments (r > 0.91), with predictive validity above 85%. Thus, it seems safe to assume that this work can be generalized to other domains. The model's contribution is underscored by the current growing attention to the big-data phenomenon.
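The vector space model mentioned above compares term-count vectors by cosine similarity. The yearly co-word counts below are invented toy data, and the dictionary-based representation is an assumption; the sketch only shows the VSM comparison step, not the paper's full trend-detection algorithm.

```python
import math

def cosine(u, v):
    """Cosine similarity between two term-frequency vectors (dicts)."""
    dot = sum(u[t] * v.get(t, 0) for t in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy co-word count vectors for a technology term in two years (assumed data).
year_2010 = {"cloud": 3, "virtualization": 5}
year_2012 = {"cloud": 9, "virtualization": 4, "saas": 2}

# A high similarity suggests a stable co-word profile over time; a drop
# between consecutive years can signal a shift in how the term is used.
print(round(cosine(year_2010, year_2012), 3))
```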
Information extraction as a basis for knowledge discovery in unstructured data
Knowledge Discovery in Text (KDT) methods have been applied to a wide variety of domains, from conference articles to medical prescriptions. KDT is the process of finding interesting or useful implicit patterns and information in a body of unstructured textual information [LOH 97]. This process combines many Information Extraction, Information Retrieval, Natural Language Processing, and Document Summarization techniques with Data Mining (DM) methods. Structured data, stored in most Database Management Systems, are easier to handle computationally, because formal languages such as SQL and QBE allow them to be manipulated and queried concisely and precisely [LOH 97]. Unstructured data, on the other hand, require computational mechanisms different from those traditionally used, so that they can be collected, stored, manipulated, and queried. To apply traditional DM methods to texts, some structure must be imposed on the data [DIX 97]. That is, someone must define the structure of these data, collect them, and store them in a conventional database. However, this process requires automated support, since it is difficult, tedious, and error-prone when done by people. In this sense, Knowledge Discovery in Texts is an area closely related to Information Extraction, as well as to Information Retrieval, and one can indeed consider that KDT systems are built from components that perform these tasks [FEL 99]
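The step of imposing structure on free text before applying conventional DM tools can be sketched briefly. The field names, the prescription-line format, and the extraction pattern below are assumptions chosen for illustration (echoing the medical-prescription domain the abstract mentions), not a method from the cited works.

```python
import re

def to_record(prescription_text):
    """Extract a (drug, dose_mg) record from a toy prescription line.

    Returns None when the line does not fit the assumed format, mirroring
    the fact that real texts only partially match any imposed schema.
    """
    m = re.search(r"(?P<drug>[A-Za-z]+)\s+(?P<dose>\d+)\s*mg", prescription_text)
    if m is None:
        return None
    return {"drug": m.group("drug"), "dose_mg": int(m.group("dose"))}

# Once text is mapped to uniform records, conventional DM tools can consume it.
rows = [to_record(t) for t in ["Amoxicillin 500 mg twice daily",
                               "Ibuprofen 200 mg as needed"]]
print(rows)
```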
Content Analysis: History of Development and World Experience
The monograph is devoted to the development of one of the most widespread
methods of mass-communication analysis, content analysis. It reviews the stages
in the development of content analysis, characterizes its application at each
stage, and describes the particulars of the methodology and directions for its
improvement. Special attention is paid to computer-assisted content analysis,
which is gradually transforming content analysis from a scientific method into
a modern technology that finds widespread mass application. One of the
technologies based on content analysis is Text Mining; its capabilities and
applications are also discussed in the work.
The study may be useful to lecturers, researchers, politicians, graduate
students, students, and anyone interested in the problems and methods of text
analysis