84,669 research outputs found
Web Content Mining for Information on Information Scientists
This paper presents a search system for information on scientists which was implemented prototypically for the area of information science, employing Web Content Mining techniques. The sources that are used in the implemented approach are online publication services and personal homepages of scientists. The system contains wrappers for querying the publication services and information extraction from their result pages, as well as methods for information extraction from homepages, which are based on heuristics concerning structure and composition of the pages. Moreover a specialised search technique for searching for personal homepages of information scientists was developed
Mining Behavior of Citizen Sensor Communities to Improve Cooperation with Organizational Actors
Web 2.0 (social media) provides a natural platform for dynamic emergence of citizen (as) sensor communities, where the citizens generate content for sharing information and engaging in discussions. Such a citizen sensor community (CSC) has stated or implied goals that are helpful in the work of formal organizations, such as an emergency management unit, for prioritizing their response needs. This research addresses questions related to design of a cooperative system of organizations and citizens in CSC. Prior research by social scientists in a limited offline and online environment has provided a foundation for research on cooperative behavior challenges, including \u27articulation\u27 and \u27awareness\u27, but Web 2.0 supported CSC offers new challenges as well as opportunities. A CSC presents information overload for the organizational actors, especially in finding reliable information providers (for awareness), and finding actionable information from the data generated by citizens (for articulation). Also, we note three data level challenges: ambiguity in interpreting unconstrained natural language text, sparsity of user behaviors, and diversity of user demographics. Interdisciplinary research involving social and computer sciences is essential to address these socio-technical issues. I present a novel web information-processing framework, called the Identify-Match- Engage (IME) framework. IME allows operationalizing computation in design problems of awareness and articulation of the cooperative system between citizens and organizations, by addressing data problems of group engagement modeling and intent mining. The IME framework includes: a.) Identification of cooperation-assistive intent (seeking-offering) from short, unstructured messages using a classification model with declarative, social and contrast pattern knowledge, b.) Facilitation of coordination modeling using bipartite matching of complementary intent (seeking-offering), and c.) Identification of user groups to prioritize for engagement by defining a content-driven measure of \u27group discussion divergence\u27. The use of prior knowledge and interplay of features of users, content, and network structures efficiently captures context for computing cooperation-assistive behavior (intent and engagement) from unstructured social data in the online socio-technical systems. Our evaluation of a use-case of the crisis response domain shows improvement in performance for both intent classification and group engagement prioritization. Real world applications of this work include use of the engagement interface tool during various recent crises including the 2014 Jammu and Kashmir floods, and intent classification as a service integrated by the crisis mapping pioneer Ushahidi\u27s CrisisNET project for broader impact
From Social Data Mining to Forecasting Socio-Economic Crisis
Socio-economic data mining has a great potential in terms of gaining a better
understanding of problems that our economy and society are facing, such as
financial instability, shortages of resources, or conflicts. Without
large-scale data mining, progress in these areas seems hard or impossible.
Therefore, a suitable, distributed data mining infrastructure and research
centers should be built in Europe. It also appears appropriate to build a
network of Crisis Observatories. They can be imagined as laboratories devoted
to the gathering and processing of enormous volumes of data on both natural
systems such as the Earth and its ecosystem, as well as on human
techno-socio-economic systems, so as to gain early warnings of impending
events. Reality mining provides the chance to adapt more quickly and more
accurately to changing situations. Further opportunities arise by individually
customized services, which however should be provided in a privacy-respecting
way. This requires the development of novel ICT (such as a self- organizing
Web), but most likely new legal regulations and suitable institutions as well.
As long as such regulations are lacking on a world-wide scale, it is in the
public interest that scientists explore what can be done with the huge data
available. Big data do have the potential to change or even threaten democratic
societies. The same applies to sudden and large-scale failures of ICT systems.
Therefore, dealing with data must be done with a large degree of responsibility
and care. Self-interests of individuals, companies or institutions have limits,
where the public interest is affected, and public interest is not a sufficient
justification to violate human rights of individuals. Privacy is a high good,
as confidentiality is, and damaging it would have serious side effects for
society.Comment: 65 pages, 1 figure, Visioneer White Paper, see
http://www.visioneer.ethz.c
The Hidden Web, XML and Semantic Web: A Scientific Data Management Perspective
The World Wide Web no longer consists just of HTML pages. Our work sheds
light on a number of trends on the Internet that go beyond simple Web pages.
The hidden Web provides a wealth of data in semi-structured form, accessible
through Web forms and Web services. These services, as well as numerous other
applications on the Web, commonly use XML, the eXtensible Markup Language. XML
has become the lingua franca of the Internet that allows customized markups to
be defined for specific domains. On top of XML, the Semantic Web grows as a
common structured data source. In this work, we first explain each of these
developments in detail. Using real-world examples from scientific domains of
great interest today, we then demonstrate how these new developments can assist
the managing, harvesting, and organization of data on the Web. On the way, we
also illustrate the current research avenues in these domains. We believe that
this effort would help bridge multiple database tracks, thereby attracting
researchers with a view to extend database technology.Comment: EDBT - Tutorial (2011
- …