15,418 research outputs found
Mining Social Media to Extract Structured Knowledge through Semantic Roles
Semantics is a well-kept secret in texts, accessible only to humans. Artificial Intelligence struggles to enrich machines with human-like features, therefore accessing this treasure and sharing it with computers is one of the main challenges that the computational linguistics domain faces nowadays. In order to teach computers to understand humans, language models need to be specified and created from human knowledge. While still far from completely decoding hidden messages in political speeches, computer scientists and linguists have joined efforts in making the language easier to be understood by machines. This paper aims to introduce the VoxPopuli platform, an instrument to collect user generated content, to analyze it and to generate a map of semantically-related concepts by capturing crowd intelligence
Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims at providing a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provided a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
at the Social Web level. At the Enterprise level, Web Data Extraction
techniques emerge as a key tool to perform data analysis in Business and
Competitive Intelligence systems as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques allow
to gather a large amount of structured data continuously generated and
disseminated by Web 2.0, Social Media and Online Social Network users and this
offers unprecedented opportunities to analyze human behavior at a very large
scale. We discuss also the potential of cross-fertilization, i.e., on the
possibility of re-using Web Data Extraction techniques originally designed to
work in a given domain, in other domains.Comment: Knowledge-based System
Text Analytics: the convergence of Big Data and Artificial Intelligence
The analysis of the text content in emails, blogs,
tweets, forums and other forms of textual communication
constitutes what we call text analytics. Text analytics is applicable
to most industries: it can help analyze millions of emails; you can
analyze customers’ comments and questions in forums; you can
perform sentiment analysis using text analytics by measuring
positive or negative perceptions of a company, brand, or product.
Text Analytics has also been called text mining, and is a subcategory
of the Natural Language Processing (NLP) field, which is one of the
founding branches of Artificial Intelligence, back in the 1950s, when
an interest in understanding text originally developed. Currently
Text Analytics is often considered as the next step in Big Data
analysis. Text Analytics has a number of subdivisions: Information
Extraction, Named Entity Recognition, Semantic Web annotated
domain’s representation, and many more. Several techniques are
currently used and some of them have gained a lot of attention,
such as Machine Learning, to show a semisupervised enhancement
of systems, but they also present a number of limitations which
make them not always the only or the best choice. We conclude
with current and near future applications of Text Analytics
Discovering Beaten Paths in Collaborative Ontology-Engineering Projects using Markov Chains
Biomedical taxonomies, thesauri and ontologies in the form of the
International Classification of Diseases (ICD) as a taxonomy or the National
Cancer Institute Thesaurus as an OWL-based ontology, play a critical role in
acquiring, representing and processing information about human health. With
increasing adoption and relevance, biomedical ontologies have also
significantly increased in size. For example, the 11th revision of the ICD,
which is currently under active development by the WHO contains nearly 50,000
classes representing a vast variety of different diseases and causes of death.
This evolution in terms of size was accompanied by an evolution in the way
ontologies are engineered. Because no single individual has the expertise to
develop such large-scale ontologies, ontology-engineering projects have evolved
from small-scale efforts involving just a few domain experts to large-scale
projects that require effective collaboration between dozens or even hundreds
of experts, practitioners and other stakeholders. Understanding how these
stakeholders collaborate will enable us to improve editing environments that
support such collaborations. We uncover how large ontology-engineering
projects, such as the ICD in its 11th revision, unfold by analyzing usage logs
of five different biomedical ontology-engineering projects of varying sizes and
scopes using Markov chains. We discover intriguing interaction patterns (e.g.,
which properties users subsequently change) that suggest that large
collaborative ontology-engineering projects are governed by a few general
principles that determine and drive development. From our analysis, we identify
commonalities and differences between different projects that have implications
for project managers, ontology editors, developers and contributors working on
collaborative ontology-engineering projects and tools in the biomedical domain.Comment: Published in the Journal of Biomedical Informatic
- …