Visualizing Social Science Research in an Institutional Repository
Using text mining and visualization techniques to identify the topical coverage of text corpora is increasingly common in a number of disciplines. When these approaches are applied to the titles and abstracts of articles published in an academic journal, they yield insight into the evolution of scholarly content in the journal. Similarly, text mining and visualization can reveal the topical coverage of items archived in an institutional repository. This poster will present initial results from mining the text and visualizing the abstracts of social science research in one university’s institutional repository. Generating a topic map visually demonstrates how research in a repository clusters around specific domains in the social sciences. These topic maps are potentially useful to librarians and researchers seeking to learn more about the topical coverage of items in their repository and to determine whether that research reflects the institution's scholarly output more broadly.
What Lies Beneath: Treatment of Canvas-backed Pennsylvania Coal Mining Maps for Digitization
An ongoing program to preserve approximately seven hundred oversized, canvas-backed coal mining maps from the CONSOL Energy Mining Map Collection was initiated by the University of Pittsburgh (Pitt) in 2007, supported by funding from the United States Department of the Interior Office of Surface Mining and Reclamation (OSM) and the Pennsylvania Department of Environmental Protection (PA-DEP). The main goal of this project is to stabilize and clean the mining maps for digitization at the OSM National Mine Map Repository (NMMR) located in Pittsburgh, Pennsylvania. The digitized data of the underground mines will be incorporated into Geographical Information Systems used for mine safety, land reclamation, current mining operations, and new development.
Ontology models research and development for data mining repository
Currently, there are many data mining methods, as well as a large number of data sets stored in different repositories. A significant problem is that the repositories do not contain the analysis methods themselves; moreover, data sets are not linked to the specific methods appropriate for them. In this work we present the system structure and the ontology models for our data mining repository.
Extracting and re-using research data from chemistry e-theses: the SPECTRa-T project
Scientific e-theses are data-rich resources, but much of the information they contain is not readily accessible. For chemistry, the SPECTRa-T project has addressed this problem by developing data-mining techniques to extract experimental data, creating RDF (Resource Description Framework) triples for exposure to sophisticated Semantic Web searches.
We used OSCAR3, an Open Source chemistry text-mining tool, to parse and extract data from theses in PDF, and from theses in Office Open XML document format.
Theses in PDF suffered data corruption and a loss of formatting that prevented the identification of chemical objects. Theses in .docx yielded semantically rich SciXML that enabled the additional extraction of associated data. Chemical objects were placed in a data repository, and RDF triples deposited in a triplestore.
Data-mining from chemistry e-theses is both desirable and feasible, but the use of PDF, the de facto format standard for deposit in most repositories, prevents the optimal extraction of data for semantic querying. To facilitate such extraction, we recommend that universities also require deposition of chemistry e-theses in an XML document format. Further work is required to clarify the complex IPR issues and ensure that they do not become an unwarranted barrier to data extraction and re-use.
