It's Public Knowledge: The National Digital Archive of Datasets
This article describes the history and development of the National Digital
Archive of Datasets, a service run by the University of London Computer Centre for
the National Archives of England. It discusses the project in light of the context in
which it emerged in the 1990s, its departure in approach from traditional data archives,
and the range of archival functions. Finally, it offers reflections on the project as a whole.
Harvesting Entities from the Web Using Unique Identifiers -- IBEX
In this paper we study the prevalence of unique entity identifiers on the
Web. These are, e.g., ISBNs (for books), GTINs (for commercial products), DOIs
(for documents), email addresses, and others. We show how these identifiers can
be harvested systematically from Web pages, and how they can be associated with
human-readable names for the entities at large scale.
Starting with a simple extraction of identifiers and names from Web pages, we
show how we can use the properties of unique identifiers to filter out noise
and clean up the extraction result on the entire corpus. The end result is a
database of millions of uniquely identified entities of different types, with
an accuracy of 73--96% and a very high coverage compared to existing knowledge
bases. We use this database to compute novel statistics on the presence of
products, people, and other entities on the Web.
Comment: 30 pages, 5 figures, 9 tables. Complete technical report for A.
Talaika, J. A. Biega, A. Amarilli, and F. M. Suchanek. IBEX: Harvesting
Entities from the Web Using Unique Identifiers. WebDB workshop, 201
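The noise-filtering idea in the abstract above can be illustrated with a minimal sketch. It assumes that one of the exploitable "properties of unique identifiers" is the check digit carried by 13-digit GTINs and ISBN-13s. The abstract does not say which properties IBEX actually uses, and the function names `gtin13_is_valid` and `harvest` are hypothetical, not taken from the paper.

```python
import re

# Candidate pattern: 13 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12}\d\b")


def gtin13_is_valid(code: str) -> bool:
    """Return True if a 13-digit string carries a correct mod-10 check digit."""
    if not re.fullmatch(r"\d{13}", code):
        return False
    digits = [int(c) for c in code]
    # Weights alternate 1, 3, 1, 3, ... over the first 12 digits.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]


def harvest(text: str) -> list[str]:
    """Extract 13-digit candidates from text and keep only checksum-valid ones."""
    found = []
    for match in CANDIDATE.finditer(text):
        normalized = re.sub(r"[ -]", "", match.group())
        if gtin13_is_valid(normalized):
            found.append(normalized)
    return found


if __name__ == "__main__":
    page = "ISBN 978-0-306-40615-7, order no. 1234567890123"
    print(harvest(page))
```

In this sketch the arbitrary 13-digit order number is discarded because its check digit does not match, while the ISBN is kept. A checksum filter of this kind only removes part of the noise, since a random 13-digit string passes about one time in ten.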
Grids and the Virtual Observatory
We consider several projects from astronomy that benefit from the Grid paradigm and
associated technology, many of which involve either massive datasets or the federation
of multiple datasets. We cover image computation (mosaicking, multi-wavelength
images, and synoptic surveys); database computation (representation through XML,
data mining, and visualization); and semantic interoperability (publishing, ontologies,
directories, and service descriptions).
Exploring the Nature of the Co-emergence of Students’ Representational Fluency and Functional Thinking
In this dissertation, I explore ways to support secondary school students’ meaningful understanding of quadratic functions. Specifically, I investigate how students co-developed representational fluency (RF) and functional thinking (FT) as they gained a meaningful understanding of quadratic functions. I also characterize students’ co-emergence of RF and FT on each representation (e.g., a graph, a symbolic equation, and a table) and across multiple representations. To accomplish these goals, I employed a design research methodology: a teaching experiment with eight Turkish-American secondary school students in an after-school context at a Turkish Community Center. I constructed the design principles and design elements for the study by networking two distinct domains of literature, representations and quantitative reasoning, to support students’ meaningful learning. I conducted ongoing and retrospective analyses on the enhanced transcriptions of small- and whole-group interactions. The analyses revealed a learning-ecology framework that supported secondary school students’ meaningful understanding of quadratic functions. The learning-ecology framework consisted of three components: enacted task characteristics, teacher pedagogical moves, and socio-mathematical norms. Furthermore, the findings showed that students employed two types of reasoning when they created and connected representations of quantities and the relationships between them: static thinking and lateral thinking. Static thinking is recalling a learned fact to represent a quantitative relationship with no attention to how quantities covary on a representation, while lateral thinking is a creative way of thinking wherein students conceive of concrete representations of functions as an emergent quantitative relationship.
The findings also showed that students’ co-emergence of RF and FT can be operationalized into four levels, from less sophisticated to more sophisticated reasoning. Level 0 is a disconnection, level 1 is a partial connection, level 2 is a connection, and level 3 is a flexible connection between students’ RF and FT. The dissertation informs teachers and the mathematics education community by (a) reporting and verifying the learning-ecology framework that supported students’ meaningful understanding of quadratic functions; and (b) characterizing students’ co-emergence of RF and FT within and across multiple representations.
Inferring Tabular Analysis Metadata by Infusing Distribution and Knowledge Information
Many data analysis tasks heavily rely on a deep understanding of tables
(multi-dimensional data). Across the tasks, there exist commonly used metadata
attributes of table fields / columns. In this paper, we identify four such
analysis metadata: measure/dimension dichotomy, common field roles, semantic
field type, and default aggregation function. Inferring these metadata faces
challenges of insufficient supervision signals, utilizing existing knowledge,
and understanding distribution. To infer these metadata for a raw table, we
propose our multi-tasking Metadata model which fuses field distribution and
knowledge graph information into pre-trained tabular models. For model training
and evaluation, we collect a large corpus (~582k tables from private
spreadsheet and public tabular datasets) of analysis metadata by using diverse
smart supervisions from downstream tasks. Our best model has accuracy = 98%,
hit rate at top-1 > 67%, accuracy > 80%, and accuracy = 88% for the four
analysis metadata inference tasks, respectively. It outperforms a series of
baselines that are based on rules, traditional machine learning methods, and
pre-trained tabular models. Analysis metadata models are deployed in a popular
data analysis product, helping downstream intelligent features such as insights
mining, chart / pivot table recommendation, and natural language QA...
Comment: 13 pages, 7 figures, 9 tables