8 research outputs found
Data analytics for data variety
Through the Internet, managers of organizations have gained access to massive amounts of data,
and these data come in different formats, namely structured, semi-structured and
unstructured. This variety of data is generated mainly by social networks, but also by the
Internet of Things, machines and sensors, among others. While well-studied, mature and
validated techniques exist for structured data, the same is not yet true for semi-structured
and unstructured data. This poster presents a set of data analysis techniques for
semi-structured and unstructured data, using data analytics research conferences as the main
bibliography.
Exploring data analytics of data variety
The Internet gives managers of organizations access to large amounts of data, and these data come in different formats (data variety), namely structured, semi-structured and unstructured. On the Internet, this data variety derives partly from social networks, but also from machines, which can share information among themselves or with people. The objective of this paper is to understand how to extract information by analysing data that exhibit such variety. An experiment was carried out on a dataset with two distinct data types: images of cars and comments on cars. Data analysis techniques were applied to the comments, namely Natural Language Processing to identify patterns, and Sentiment and Emotion Analysis. An image recognition technique was used to associate each car model with a category. Next, OLAP cubes were created and visualized through dashboards. The paper concludes that it is possible to extract a set of relevant information, such as which cars people like more or less.
COMPETE: POCI-01-0145-FEDER-007043 and FCT - Fundação para a Ciência e Tecnologia within the Project Scope: UID/CEC/00319/201
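The pipeline this abstract describes (score comments, then aggregate per car model and category) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tiny word lexicon, the `sentiment_score` and `roll_up` helpers, and the sample records are hypothetical and are not the paper's method or data. The roll-up stands in for a single slice of an OLAP cube.

```python
from collections import defaultdict

# Hypothetical toy lexicon (assumption); a real study would use a proper NLP model.
POSITIVE = {"love", "great", "beautiful", "reliable"}
NEGATIVE = {"hate", "ugly", "noisy", "unreliable"}


def sentiment_score(comment: str) -> int:
    """Return the number of positive words minus the number of negative words."""
    words = [w.strip(".,!?").lower() for w in comment.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def roll_up(records):
    """Aggregate comment count and total sentiment per (category, model) cell."""
    cube = defaultdict(lambda: {"comments": 0, "score": 0})
    for category, model, comment in records:
        cell = cube[(category, model)]
        cell["comments"] += 1
        cell["score"] += sentiment_score(comment)
    return dict(cube)


# Hypothetical records: (category from image recognition, car model, comment text).
records = [
    ("SUV", "Model A", "I love it, great and reliable!"),
    ("SUV", "Model A", "A bit noisy."),
    ("Hatchback", "Model B", "Ugly and unreliable, I hate it."),
]
cube = roll_up(records)
for (category, model), cell in sorted(cube.items()):
    print(category, model, cell)
```

Keying the cells by `(category, model)` keeps the two dimensions separate, so the same structure could be summed over either one to drive a dashboard view.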
A Topic Recommender for Journalists
The way in which people acquire information on events and form their own
opinion on them has changed dramatically with the advent of social media. For many
readers, the news gathered from online sources becomes an opportunity to share points
of view and information within micro-blogging platforms such as Twitter, mainly
aimed at satisfying their communication needs. Furthermore, the need to deepen the
aspects related to news stimulates a demand for additional information which is often
met through online encyclopedias, such as Wikipedia. This behaviour has also
influenced the way in which journalists write their articles, requiring a careful assessment
of what actually interests the readers. The goal of this paper is to present
a recommender system, What to Write and Why, capable of suggesting to a journalist,
for a given event, the aspects still uncovered in news articles on which the
readers focus their interest. The basic idea is to characterize an event according to
the echo it receives in online news sources and associate it with the corresponding
readers’ communicative and informative patterns, detected through the analysis of
Twitter and Wikipedia, respectively. Our methodology temporally aligns the results
of this analysis and recommends the concepts that emerge as topics of interest from
Twitter and Wikipedia, either not covered or poorly covered in the published news
articles.
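The core recommendation idea, surfacing concepts that readers focus on but that the news covers poorly or not at all, can be sketched as follows. This is a simplified illustration under stated assumptions: the `recommend_topics` function, the per-concept scores in [0, 1], and the coverage threshold are all hypothetical, and the temporal alignment step mentioned in the abstract is left out.

```python
def recommend_topics(news_coverage, twitter_interest, wikipedia_interest,
                     coverage_threshold=0.2, top_k=3):
    """Rank concepts with reader interest whose news coverage is below a threshold.

    All inputs map a concept name to a score in [0, 1] (an assumption of this sketch).
    """
    candidates = set(twitter_interest) | set(wikipedia_interest)
    scored = []
    for concept in candidates:
        coverage = news_coverage.get(concept, 0.0)
        if coverage >= coverage_threshold:
            continue  # already well covered by the published articles
        interest = (twitter_interest.get(concept, 0.0)
                    + wikipedia_interest.get(concept, 0.0))
        # Weight reader interest by how much of the concept is still uncovered.
        scored.append((interest * (1.0 - coverage), concept))
    # Highest score first; ties are broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [concept for _, concept in scored[:top_k]]


# Hypothetical scores for a single event.
news = {"election result": 0.9, "turnout": 0.1}
twitter = {"election result": 0.8, "turnout": 0.6, "exit polls": 0.5}
wikipedia = {"electoral system": 0.7, "turnout": 0.2}

suggestions = recommend_topics(news, twitter, wikipedia)
print(suggestions)
```

Here "election result" is dropped because the news already covers it well, while the remaining concepts are ordered by uncovered reader interest.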
Layered Graph Embedding for Entity Recommendation using Wikipedia in the Yahoo! Knowledge Graph
In this paper, we describe an embedding-based entity recommendation framework
for Wikipedia that organizes Wikipedia into a collection of graphs layered on
top of each other, learns complementary entity representations from their
topology and content, and combines them with a lightweight learning-to-rank
approach to recommend related entities on Wikipedia. Through offline and online
evaluations, we show that the resulting embeddings and recommendations perform
well in terms of quality and user engagement. Balancing simplicity and quality,
this framework provides default entity recommendations for English and other
languages in the Yahoo! Knowledge Graph, of which Wikipedia is a core subset.
Comment: 8 pages, 4 figures, 8 tables. To appear in Wiki Workshop 2020,
Companion Proceedings of the Web Conference 2020 (WWW 20 Companion), Taipei,
Taiwan
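As a rough illustration of combining entity representations learned from several graph layers, the sketch below scores candidates by a weighted sum of per-layer cosine similarities. It rests on assumptions: the fixed `weights` stand in for the learning-to-rank model, which the abstract does not specify, and the layer names, vectors, and `recommend` helper are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(query, layers, weights, top_k=2):
    """Rank entities by a weighted sum of their per-layer similarity to the query."""
    scores = {}
    for name, embeddings in layers.items():
        query_vec = embeddings[query]
        for entity, vec in embeddings.items():
            if entity == query:
                continue
            scores[entity] = scores.get(entity, 0.0) + weights[name] * cosine(query_vec, vec)
    # Highest combined score first; ties are broken alphabetically.
    return sorted(scores, key=lambda e: (-scores[e], e))[:top_k]


# Hypothetical two-dimensional embeddings for two layers (topology and content).
layers = {
    "topology": {"Paris": [1.0, 0.0], "France": [0.9, 0.1], "Tokyo": [0.0, 1.0]},
    "content": {"Paris": [1.0, 1.0], "France": [1.0, 0.8], "Tokyo": [1.0, -1.0]},
}
weights = {"topology": 0.6, "content": 0.4}  # assumed, not learned

related = recommend("Paris", layers, weights)
print(related)
```

A learned ranker would replace the hand-set weights with parameters fitted to engagement or relevance labels; the linear combination is only the simplest possible stand-in.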
Entity-Oriented Search
This open access book covers all facets of entity-oriented search—where “search” can be interpreted in the broadest sense of information access—from a unified point of view, and provides a coherent and comprehensive overview of the state of the art. It represents the first synthesis of research in this broad and rapidly developing area. Selected topics are discussed in-depth, the goal being to establish fundamental techniques and methods as a basis for future research and development. Additional topics are treated at a survey level only, containing numerous pointers to the relevant literature. A roadmap for future research, based on open issues and challenges identified along the way, rounds out the book. The book is divided into three main parts, sandwiched between introductory and concluding chapters. The first two chapters introduce readers to the basic concepts, provide an overview of entity-oriented search tasks, and present the various types and sources of data that will be used throughout the book. Part I deals with the core task of entity ranking: given a textual query, possibly enriched with additional elements or structural hints, return a ranked list of entities. This core task is examined in a number of different variants, using both structured and unstructured data collections, and numerous query formulations. In turn, Part II is devoted to the role of entities in bridging unstructured and structured data. Part III explores how entities can enable search engines to understand the concepts, meaning, and intent behind the query that the user enters into the search box, and how they can provide rich and focused responses (as opposed to merely a list of documents)—a process known as semantic search. The final chapter concludes the book by discussing the limitations of current approaches, and suggesting directions for future research. Researchers and graduate students are the primary target audience of this book. 
A general background in information retrieval is sufficient to follow the material, including an understanding of basic probability and statistics concepts as well as a basic knowledge of machine learning concepts and supervised learning algorithms.
Semantic Annotation and Search: Bridging the Gap between Text, Knowledge and Language
In recent years, the ever-increasing quantities of entities in large knowledge bases on the Web, such as DBpedia, Freebase and YAGO, pose new challenges but at the same time open up new opportunities for intelligent information access. These knowledge bases (KBs) have become valuable resources in many research areas, such as natural language processing (NLP) and information retrieval (IR). Recently, almost every major commercial Web search engine has incorporated entities into their search process, including Google’s Knowledge Graph, Yahoo!’s Web of Objects and Microsoft’s Satori Graph/Bing Snapshots. The goal is to bridge the semantic gap between natural language text and formalized knowledge.
Within the context of globalization, multilingual and cross-lingual access to information has emerged as an issue of major interest. Nowadays, more and more people from different countries are connecting to the Internet, in particular the Web, and many users can understand more than one language. While the diversity of languages on the Web has been growing, for most people there is still very little content in their native language. As a consequence of the ability to understand more than one language, users are also interested in Web content in other languages than their mother tongue. There is an impending need for technologies that can help in overcoming the language barrier for multilingual and cross-lingual information access. In this thesis, we face the overall research question of how to allow for semantic-aware and cross-lingual processing of Web documents and user queries by leveraging knowledge bases.
To address this complex problem, we provide the following solutions: (1) semantic annotation for addressing the semantic gap between Web documents and knowledge; (2) semantic search for coping with the semantic gap between keyword queries and knowledge; (3) the exploitation of cross-lingual semantics for overcoming the language barrier between natural language expressions (i.e., keyword queries and Web documents) and knowledge, enabling cross-lingual semantic annotation and search. We evaluated these solutions, and the results showed advances beyond the state of the art. In addition, we implemented a framework for cross-lingual semantic annotation and search, which has been widely used for cross-lingual processing of media content in the context of our research projects.