1,884 research outputs found
An event detection approach based on Twitter hashtags
Twitter is one of the most popular microblogging services in the world. The large volume of information it carries has made Twitter an important channel for people to learn about and share news. Hashtags are a popular Twitter feature. They can be treated as human-labeled information and help readers identify the topic of a tweet. Many researchers have proposed event-detection approaches that monitor Twitter data and determine whether special events, such as accidents, extreme weather, earthquakes, or crimes, are happening. Although many approaches use hashtags as one of their features, few have explicitly examined how effective hashtags are for event detection. In this study, we proposed an event detection approach that utilizes hashtags in tweets. We adopted the feature extraction used in STREAMCUBE (Feng et al., 2015) and applied K-means clustering (Lloyd, 1982) to it. The experiments were conducted on 20,514 tweets with 8,616 hashtags collected between November 13, 2015 and November 17, 2015 on the general topic of the Paris attacks. A randomly sampled subset of 200 tweets was also manually labeled by a human subject to verify the approach. Based on the collected tweets, we demonstrated that the K-means approach could perform better than STREAMCUBE in the clustering results. We also discussed how to set the value of K for the K-means approach to obtain better clustering performance.
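The clustering step described in this abstract can be illustrated with a minimal sketch of Lloyd's K-means algorithm applied to binary bag-of-hashtags vectors. This is an illustration only: the abstract does not describe the STREAMCUBE feature extraction in detail, so the `hashtag_vector` representation, the toy tweets, and all function names here are assumptions. Comparing inertia (within-cluster sum of squared distances) across values of K is shown as one common heuristic for choosing K, and is not necessarily the procedure the authors used.

```python
import random


def hashtag_vector(tweet, vocabulary):
    """Binary bag-of-hashtags vector for one tweet (assumed representation)."""
    tags = {tok.lower() for tok in tweet.split() if tok.startswith("#")}
    return [1.0 if tag in tags else 0.0 for tag in vocabulary]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids


def inertia(points, labels, centroids):
    """Within-cluster sum of squared distances (lower means tighter clusters)."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


# Hypothetical toy tweets, not taken from the study's dataset.
tweets = [
    "Thoughts with everyone #paris #prayforparis",
    "#prayforparis #paris stay safe",
    "Match tonight #football #league",
    "#football #league what a goal",
]
vocabulary = sorted(
    {tok.lower() for t in tweets for tok in t.split() if tok.startswith("#")}
)
points = [hashtag_vector(t, vocabulary) for t in tweets]

labels_k2, centroids_k2 = kmeans(points, k=2)
labels_k1, centroids_k1 = kmeans(points, k=1)
print("labels for K=2:", labels_k2)
print("inertia K=1:", inertia(points, labels_k1, centroids_k1))
print("inertia K=2:", inertia(points, labels_k2, centroids_k2))
```

On this toy input the two Paris tweets and the two football tweets end up in separate clusters, and inertia drops when K goes from 1 to 2. On real data, inertia always decreases as K grows, so it is usually read for the point where further increases stop helping much.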
The Impact Factor Fetishism
"One of the most popular indicators is the Impact Factor. This paper examines the coming into being of this highly influential figure. It is the offspring of Eugene Garfield’s experimentation with the huge amounts of data available at his Institute for Scientific Information and the result of a number of attempts to find appropriate measurements for the success ('impact') of articles and journals. The completely inductive procedure was initially adjusted by examining the data thoughtfully and by consulting with experts from different scientific disciplines. Later, its calculation modes were imposed on other disciplines without further consideration. The paper demonstrates in detail the inopportune consequences of this, in particular for sociology. Neither the definition of disciplines, nor the selection of journals for the Web of Science/Social Science Citation Index follows any comprehensible rationale. The procedures for calculating the impact factor are inappropriate. Despite its obvious unsuitability, the impact factor is used by editors of sociological journals for marketing and impression management purposes. Fetishism!" (author's abstract
DATASET2050 D2.1 - Data requirements and acquisition
The purpose of this document, Deliverable 2.1, is to describe the sources of data required by the H2020 coordination and support action DATASET2050. Data requirements have been categorised into seven broad groups to support WP3 and WP4: demographic; passenger demand; passenger type; door-to-kerb; kerb-to-gate; airside capacity; and competing services. The current scenario is well supported by existing datasets; however, the two future scenarios require modelled data.
Improving data preparation for the application of process mining
Immersed in what is already known as the fourth industrial revolution, automation and data exchange are taking on a particularly relevant role in complex environments, such as industrial manufacturing environments or logistics. This digitisation and transition to the Industry 4.0 paradigm is causing experts to start analysing business processes from other perspectives. Consequently, where management and business intelligence used to dominate, process mining appears as a link, trying to build a bridge between both disciplines to unite and improve them. This new perspective on process analysis helps to improve strategic decision making and competitive capabilities. Process mining brings together data and process perspectives in a single discipline that covers the entire spectrum of process management. Through process mining, and based on observations of their actual operations, organisations can understand the state of their operations, detect deviations, and improve their performance based on what they observe. In this way, process mining is an ally, occupying a large part of current academic and industrial research.
However, although this discipline is receiving more and more attention, it presents severe application problems when it is implemented in real environments. The variety of input data in terms of form, content, semantics, and levels of abstraction makes the execution of process mining tasks in industry an iterative, tedious, and manual process, requiring multidisciplinary experts with extensive knowledge of the domain, process management, and data processing. Currently, although there are numerous academic proposals, there are no industrial solutions capable of automating these tasks. For this reason, in this thesis by compendium we address the problem of improving business processes in complex environments through a study of the state of the art and a set of proposals that improve relevant aspects of the process life cycle: the creation of logs, log preparation, process quality assessment, and the improvement of business processes.
Firstly, for this thesis, a systematic study of the literature was carried out in order to gain an in-depth knowledge of the state of the art in this field, as well as the different challenges faced by this discipline. This in-depth analysis has allowed us to detect a number of challenges that have not been addressed or have received insufficient attention, of which three have been selected and presented as the objectives of this thesis. The first challenge is related to assessing the quality of the input data, known as event logs, since the need to apply techniques for improving an event log must be based on the quality of the initial data. This thesis therefore presents a methodology and a set of metrics that support the expert in selecting which technique to apply to the data according to the quality estimation at each moment, another challenge obtained as a result of our analysis of the literature. Likewise, the use of a set of metrics to evaluate the quality of the resulting process models is also proposed, with the aim of assessing whether improvement in the quality of the input data has a direct impact on the final results.
The second challenge identified is the need to improve the input data used in the analysis of business processes. As in any data-driven discipline, the quality of the results strongly depends on the quality of the input data, so the second challenge to be addressed is the improvement of the preparation of event logs. The contribution in this area is the application of natural language processing techniques to relabel activities from textual descriptions of process activities, as well as the application of clustering techniques to help simplify the results, generating more understandable models from a human point of view.
Finally, the third challenge detected is related to process optimisation, so we contribute an approach for optimising the resources associated with business processes, which, through the inclusion of decision-making in the creation of flexible processes, enables significant cost reductions. Furthermore, all the proposals made in this thesis are designed and validated in collaboration with experts from different fields of industry and have been evaluated through real case studies in public and private projects in collaboration with the aeronautical industry and the logistics sector.
Human capital development : Case Study on BENIN Health & Education
This research is on human capital building in Benin. Recent literature indicates that the expansion of education and health care contributes effectively to output growth and improved living standards, as the main support for building up a stock of human capital. This study reviews the cases of early childhood enrolment, access to health care and nutrition, and the return on education investment. Increased access to basic education, health care and nutrition can redistribute income faster than an increase in household income alone and might help reduce income inequality. The study also indicates that income increases with the level of education. A further focus of this study is to analyze the distribution of education and health care in Benin. Lower enrolments and lower access to health care and nutrition are marked by regional inequalities and by urban, rural, gender and income biases. The distributional analysis indicates that primary and secondary school enrolment, use of health facilities and the consumption of balanced food are concentrated in the hands of rich households located in urban areas. The study concludes with recommendations.
User Interfaces to the Web of Data based on Natural Language Generation
We explore how Virtual Research Environments based on Semantic Web technologies support research interactions with RDF data in various stages of corpus-based analysis, analyze the Web of Data in terms of human readability, derive labels from variables in SPARQL queries, apply Natural Language Generation to improve user interfaces to the Web of Data by verbalizing SPARQL queries and RDF graphs, and present a method to automatically induce RDF graph verbalization templates via distant supervision.