10 research outputs found

Something borrowed: sequence alignment and the identification of similar passages in large text collections

Author: Horton Russell
Olsen Mark
Roe Glenn
Publication venue: Société canadienne des humanités numériques
Publication date: 08/12/2015

The following article describes a simple technique to identify lexically similar passages in large collections of text using sequence alignment algorithms. Primarily used in the field of bioinformatics to identify similar segments of DNA in genome research, sequence alignment has also been employed in many other domains, from plagiarism detection to image processing. While we have applied this approach to a wide variety of diverse text collections, we will focus our discussion here on the identification of similar passages in the famous 18th-century Encyclopédie of Denis Diderot and Jean d'Alembert. Reference works, such as encyclopedias and dictionaries, are generally expected to "reuse" or "borrow" passages from many sources and Diderot and d'Alembert's Encyclopédie was no exception. Drawn from an immense variety of source material, both French and non-French, many, if not most, of the borrowings that occur in the Encyclopédie are not sufficiently identified (according to our standards of modern citation), or are only partially acknowledged in passing. The systematic identification of recycled passages can thus offer us a clear indication of the sources the philosophes were exploiting as well as the extent to which the intertextual relations that accompanied its composition and subsequent reception can be explored. In the end, we hope this approach to "Encyclopedic intertextuality" using sequence alignment can broaden the discussion concerning the relationship of Enlightenment thought to previous intellectual traditions as well as its reuse in the centuries that followed.

The Australian National University

Supervised semantic relation mining from linguistically noisy text documents

Author: Basili R
Giannone C
Moschitti A
Naggar P
Publication venue: Springer Verlag
Publication date: 16/11/2010

Crossref

ART

Knowledge-based Content Linking for Online Textbooks

Author: Daqing He
Peter Brusilovsky
Rui Meng
Shuguang Han
Yun Huang
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2016

Although the volume of online educational resources has dramatically increased in recent years, many of these resources are isolated and distributed in diverse websites and databases. This hinders the discovery and overall usage of online educational resources. By using linking between related subsections of online textbooks as a testbed, this paper explores multiple knowledge-based content linking algorithms for connecting online educational resources. We focus on examining semantic-based methods for identifying important knowledge components in textbooks and their usefulness in linking book subsections. To overcome the data sparsity in representing textbook content, we evaluated the utility of external corpora, such as more textbooks or other online educational resources in the same domain. Our results show that semantic modeling can be integrated with a term-based approach for additional performance improvement, and that using extra textbooks significantly benefits semantic modeling. Similar results were obtained when we applied the same approach to other domains.

D-Scholarship@Pitt

Cyberinfrastructure for Classical Philology

Author: Crane G
Seales B
Terras M
Publication venue: Alliance of Digital Humanities Organisations
Publication date: 01/01/2009

No humanists have moved more aggressively in the digital world than students of the Greco-Roman world, but the first generation of digital classics has seen relatively superficial methods to address the problems of print culture. We are now beginning to see new intellectual practices for which new terms, eWissenschaft and eClassics, and a new cyberinfrastructure are emerging.

Crossref

UCL Discovery

Efficient partial-duplicate detection based on sequence matching

Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2010

Crossref

Finding and exploring memes in social media

Author: Hohyon Ryu
Matthew Lease
Nicholas Woodward
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2012

Critical literacy challenges us to question how what we read has been shaped by external context, especially when information comes from less established sources. While cross-checking multiple sources provides a foundation for critical literacy, trying to keep pace with the constant deluge of new online information is a daunting proposition, especially for casual readers. To help address this challenge, we propose a new form of technological assistance which automatically discovers and displays underlying memes: ideas embodied by similar phrases which are found in multiple sources. Once detected, these underlying memes are revealed to users via generated hypertext, allowing memes to be explored in context. Given the massive volume of online information today, we propose a highly-scalable system architecture based on MapReduce, extending work by Kolak and Schilit [11]. To validate our approach, we report on using our system to process and browse a 1.5 TB collection of crawled social media. Our contributions include a novel technological approach to support critical literacy and a highly-scalable system architecture for meme discovery optimized for Hadoop [25]. Our source code and Meme Browser are both available online.

CiteSeerX

Crossref

The Digital Classicist 2013

Publication venue: School of Advanced Study
Publication date: 31/12/2019

This edited volume collects peer-reviewed papers that initially emanated from presentations at Digital Classicist seminars and conference panels. This wide-ranging volume showcases exemplary applications of digital scholarship to the ancient world and critically examines the many challenges and opportunities afforded by such research. The chapters included here demonstrate innovative approaches that drive forward the research interests of both humanists and technologists while showing that rigorous scholarship is as central to digital research as it is to mainstream classical studies. As with the earlier Digital Classicist publications, our aim is not to give a broad overview of the field of digital classics; rather, we present here a snapshot of some of the varied research of our members in order to engage with and contribute to the development of scholarship both in the fields of classical antiquity and Digital Humanities more broadly.

SAS-SPACE

The Digital Classicist 2013

Publication venue: School of Advanced Study
Publication date: 10/02/2021

Directory of Open Access Books (DOAB)

Recommended from our members

Linking Textual Resources to Support Information Discovery

Author: Knoth Petr
Publication date: 14/05/2015

A vast amount of information is today stored in the form of textual documents, many of which are available online. These documents come from different sources and are of different types. They include newspaper articles, books, corporate reports, encyclopedia entries and research papers. At a semantic level, these documents contain knowledge, which was created by explicitly connecting information and expressing it in the form of a natural language. However, a significant amount of knowledge is not explicitly stated in a single document, yet can be derived or discovered by researching, i.e. accessing, comparing, contrasting and analysing, information from multiple documents. Carrying out this work using traditional search interfaces is tedious due to information overload and the difficulty of formulating queries that would help us to discover information we are not aware of. In order to support this exploratory process, we need to be able to effectively navigate between related pieces of information across documents. While information can be connected using manually curated cross-document links, this approach not only does not scale, but also cannot systematically assist us in the discovery of sometimes non-obvious (hidden) relationships. Consequently, there is a need for automatic approaches to link discovery. This work studies how people link content, investigates the properties of different link types, presents new methods for automatic link discovery and designs a system in which link discovery is applied to a collection of millions of documents to improve access to public knowledge.

Open Research Online (The Open University)

Generating links by mining quotations

Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2008

Crossref