Defining data literacy: An empirical study of data literacy dimensions
Data literacy has become a core component of higher education because it encompasses the range of data skills and knowledge needed to deal with data, which are critical in our social and work lives with the advent of big data. Multiple perspectives on defining data literacy have emerged from multiple disciplines, including information science, computer science, business, and education. Along with this, there have been efforts to develop a data literacy competency model to enhance our understanding of the skills that data literacy requires. But each model has a different focus, context, and target audience. For instance, some efforts address the data literacy needs of citizens in today’s society because they see data literacy as a life skill, whereas others define data literacy as one of the essential skills required to perform tasks in a specific career.
Although the importance of data literacy is increasingly recognized, there is no consensus about its definition. Further, the constituent dimensions of data literacy remain disputed. As such, this presentation will illustrate the preliminary results of a bibliometric analysis of the data literacy literature from the past ten years. Through citation analysis and topic analysis, this study aims to identify the central dimensions of data literacy and develop an integrated model for data literacy.
Enhancing Decision-making in Smart and Connected Communities with Digital Traces
The ubiquitous use of information and communication technologies (ICTs) enables the generation of digital traces associated with human behaviors at unprecedented breadth, depth, and scale. Large-scale digital traces provide the potential to understand population behaviors automatically, including the characterization of how individuals interact with the physical environment. As a result, the use of digital traces generated by humans might mitigate some of the challenges associated with using surveys to understand human behaviors, such as the high cost of collecting information, the lack of high-quality real-time information, and the difficulty of capturing behavior-level information. In this dissertation, I study how to extract information from digital traces to characterize human behavior in the built environment, and how to use such information to enhance decision-making processes in the area of Smart and Connected Communities.
Specifically, I present three case studies that aim to use data-driven methods for decision-making in Smart and Connected Communities. First, I discuss data-driven methods for socioeconomic development, with a focus on the inference of socioeconomic maps from cell phone data. Second, I present data-driven methods for emergency preparedness and response, with a focus on understanding user needs in different communities with geotagged social media data. Third, I describe data-driven methods for migration studies, focusing on characterizing the post-migration behaviors of internal migrants with cell phone data. In these case studies, I present data-driven frameworks that integrate innovative behavior modeling approaches to help solve decision-making questions using digital traces. The explored methods enhance our understanding of how to model and explain population behavior patterns in different physical and socioeconomic contexts. The methods also have practical significance in terms of how decision-making can become cost-effective and efficient with the help of data-driven methods.
Revealing the Disciplinary Landscape of Data Science Journals
The discipline, field, and practice of data science emerged to its current prominence in the past several decades. New disciplines, fields, and practices often involve definitional and scope challenges. This seems to be the case with data science. The research presented in this poster is part of a broader investigation into the disciplinary or interdisciplinary characteristics of data science. This work-in-progress poster reports the results of analyses of data science journals in different subject areas to answer several questions including:
• What is the population of journals that focus on topics of data science?
• What disciplinary landscape of data science is revealed in the aims and scope statements of these journals?
The unit of analysis in this research is at the journal level. Both quantitative and qualitative approaches were used in the analysis of the aim and scope statements. The quantitative approach used computational methods (e.g., Part-of-Speech Tagging, Word Embedding) to identify keywords representing characteristics of the journal. The qualitative approach used conceptual content analysis to reveal different patterns in terms of research types and the scope of research of the journals.
Data science research and education are part of many library and information science degree programs. The results of this research have the following benefits:
• Researchers can understand disciplinary and research types published in the journals when selecting a venue for submitting papers.
• Educators and students can identify appropriate journal resources to support learning.
• Librarians can use the results to assess collection development decisions regarding data science journals.
Defining Data Literacy: An Empirical Study of Data Literacy Dimension
Poster on an analysis of publications from 2002-2021 on data literacy to identify relevant topics and trends. This is part of the preliminary work done to support a proposal for an Institute of Museum and Library Services (IMLS) grant. It was presented at the 2021 Association for Library and Information Science Education (ALISE) Annual Conference, held virtually September 20-24, 2021.
Hate Speech and Counter Speech Detection: Conversational Context Does Matter
Hate speech is plaguing cyberspace along with user-generated content. This paper investigates the role of conversational context in the annotation and detection of online hate and counter speech, where context is defined as the preceding comment in a conversation thread. We created a context-aware dataset for a 3-way classification task on Reddit comments: hate speech, counter speech, or neutral. Our analyses indicate that context is critical to identifying hate and counter speech: human judgments change for most comments depending on whether we show annotators the context. A linguistic analysis draws insights into the language people use to express hate and counter speech. Experimental results show that neural networks obtain significantly better results if context is taken into account. We also present qualitative error analyses shedding light on (a) when and why context is beneficial and (b) the remaining errors made by our best model when context is taken into account.
Comment: Accepted by NAACL 202
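As a rough illustration of "context is defined as the preceding comment in a conversation thread", the sketch below pairs each comment with its parent before it would be handed to a classifier. The `Comment` record, the `with_context` helper, and the ` [SEP] ` separator are all hypothetical names chosen for this example; the abstract does not say how the authors format their model input.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Comment:
    """Hypothetical record for one comment in a thread."""
    id: str
    parent_id: Optional[str]  # None for a top-level comment
    text: str


def with_context(comments: List[Comment], sep: str = " [SEP] ") -> Dict[str, str]:
    """Map each comment id to its input text, prefixed by the parent comment if any.

    The separator is an assumption for illustration, not the paper's format.
    """
    by_id = {c.id: c for c in comments}
    inputs = {}
    for c in comments:
        parent = by_id.get(c.parent_id) if c.parent_id is not None else None
        # Without a known parent, the comment is used on its own.
        inputs[c.id] = f"{parent.text}{sep}{c.text}" if parent else c.text
    return inputs


thread = [
    Comment("c1", None, "I think the new policy is unfair."),
    Comment("c2", "c1", "People like you always say that."),
]
inputs = with_context(thread)
```

Here `inputs["c2"]` carries the parent comment in front of the reply, so a 3-way classifier (hate speech, counter speech, or neutral) would see both, while `inputs["c1"]` is unchanged because it has no preceding comment.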
Taxonomy of the genus Metolinus Cameron (Coleoptera, Staphylinidae, Staphylininae, Xantholinini) from China with description of three new species
This paper studies the taxonomy of the genus Metolinus Cameron, 1920 (Coleoptera: Staphylinidae, Staphylininae, Xantholinini) from China and describes three new species: Metolinus xizangensis sp. n. from Xizang (Tibet), M. emarginatus sp. n. from Sichuan, and M. binarius sp. n. from Yunnan. The Chinese fauna of the genus is thus increased to 8 species in total. A key to the eight Chinese species is provided. Female genital segments and other important morphological characters are illustrated in line drawings for the new species as well as for M. shanicus Bordoni, 2002 and M. gardneri (Cameron, 1945). The text also provides color plates with habitus photographs and a map showing the species’ geographical distribution pattern. The type specimens of the new species are deposited in the Institute of Zoology, Chinese Academy of Sciences (IZ-CAS).
Two new species of Xanthophius Motschulsky (Coleoptera: Staphylinidae, Staphylininae, Xantholinini) from China with notes on X. filum (Kraatz)
Zhou, Yu-Lingzi, Zhou, Hong-Zhang (2013): Two new species of Xanthophius Motschulsky (Coleoptera: Staphylinidae, Staphylininae, Xantholinini) from China with notes on X. filum (Kraatz). Zootaxa 3626 (3): 363-380, DOI: 10.11646/zootaxa.3626.3.
Topic Models to Infer Socio-Economic Maps
Socio-economic maps contain important information regarding the population of a country. Computing these maps is critical given that policy makers often make important decisions based upon such information. However, the compilation of socio-economic maps requires extensive resources and becomes highly expensive. On the other hand, the ubiquitous presence of cell phones is generating large amounts of spatio-temporal data that can reveal human behavioral traits related to specific socio-economic characteristics. Traditional inference approaches have taken advantage of these datasets to infer regional socio-economic characteristics. In this paper, we propose a novel approach whereby topic models are used to infer socio-economic levels from large-scale spatio-temporal data. Instead of using a pre-determined set of features, we use Latent Dirichlet Allocation (LDA) to extract latent recurring patterns of co-occurring behaviors across regions, which are then used in the prediction of socio-economic levels. We show that our approach improves state-of-the-art prediction results by 9%.
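To make the LDA step more concrete, here is a minimal sketch of collapsed Gibbs sampling for LDA using only the Python standard library. It assumes each region is treated as a "document" and each discretized behavior as a "word"; the behavior tokens, the `lda_gibbs` function, and the hyperparameter values are invented for illustration and are not taken from the paper, which does not describe its implementation here.

```python
import random
from typing import List


def lda_gibbs(docs: List[List[str]], n_topics: int, n_iters: int = 200,
              alpha: float = 0.1, beta: float = 0.01, seed: int = 0) -> List[List[float]]:
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens of word v in topic k
    topic_total = [0] * n_topics                           # tokens in topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed topic proportions per document (each row sums to 1).
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]


# Hypothetical behavior tokens per region, invented for illustration.
regions = [
    ["night_calls", "short_trips", "night_calls", "few_contacts"],
    ["day_calls", "long_trips", "many_contacts", "day_calls"],
    ["night_calls", "few_contacts", "short_trips", "short_trips"],
    ["long_trips", "many_contacts", "day_calls", "long_trips"],
]
theta = lda_gibbs(regions, n_topics=2)
```

Each row of `theta` is one region's mixture over latent behavior patterns. In a pipeline like the one the abstract describes, these proportions would serve as input features to a separate model that predicts socio-economic levels; that downstream step is not shown.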