Isaac Newton Corpus
This project combines the expertise of the British National Archives, King's College, Cambridge, the Cambridge University Library, the École des chartes (Sorbonne), the Huntington Library, the Science History Institute, Peterhouse College, Cambridge, and Indiana University in an innovative attempt to devise new methods of digital watermark capture and analysis by means of machine learning, using Isaac Newton's extensive manuscript corpus as a test platform.
This project will investigate two research areas with general application in digital humanities scholarship, using the dispersed manuscript corpus of Isaac Newton as a test case. The immediate purpose of the test case is to use the AI-assisted identification and classification of watermarks in Newton material to help organise and date manuscripts, but the project also has much wider significance. The project’s first stage will be a methodological investigation of techniques for producing images of watermarks that are suitable for automated analysis, using both new photography and the potential latent in existing images. During the second stage, we will develop computer vision methods to systematically cluster and match the assembled corpus of watermark images across manuscripts and collections. Methods developed through this project will be transferable to watermark collections beyond Newton’s corpus, creating a methodology for scholars seeking to analyse, date, and organise historical collections via watermark matching, and for conservators seeking to establish standardised surveying and documentation methods while imaging and digitising watermarked documents. The third stage of the project will disseminate our results.
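As a purely illustrative sketch of what "cluster and match" could mean in the second stage, the Python below groups watermark images whose feature vectors are highly similar. It is not the project's method: the image identifiers, the three-number feature vectors, and the 0.9 threshold are all hypothetical, and in practice the features would come from a trained computer vision model rather than being written by hand.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_watermarks(features, threshold=0.9):
    """Group image ids linked by pairwise similarity >= threshold.

    `features` maps a (hypothetical) image id to its feature vector.
    Links are merged with a small union-find structure, so two images
    end up together if a chain of similar images connects them.
    """
    parent = {name: name for name in features}

    def find(name):
        # Follow parent pointers to the cluster representative,
        # shortening the path along the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(features, 2):
        if cosine_similarity(features[a], features[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in features:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical feature vectors for three watermark images.
features = {
    "ms_a_f1": [0.90, 0.10, 0.00],
    "ms_b_f7": [0.88, 0.12, 0.01],
    "ms_c_f3": [0.00, 0.20, 0.95],
}
clusters = cluster_watermarks(features)
```

Here the first two images fall into one cluster and the third stands alone. Linking through chains of similar images is one possible design choice; it tolerates gradual deformation of a mould over time, at the cost of occasionally merging distinct marks if the threshold is set too low.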
Since the groundbreaking early twentieth-century research of Charles Moïse Briquet, watermarks have played a central part in the dating of otherwise undated manuscripts. Briquet’s monumental 1907 catalogue, Les filigranes, made it possible, in principle, to date (and to some extent localise) pre-1600 watermarks found in manuscripts by reference to the exemplars it reproduces. While this catalogue and others have been digitised thanks to the Bernstein consortium (https://memoryofpaper.eu/), advances in research and technology have revealed the limitations of the traditional approach, which requires time-consuming procedures and some degree of expertise to identify each individual watermark. It is very difficult to find exact matches between watermarks in situ and those reproduced in any catalogue, first because the catalogues are far from comprehensive and, second, because each watermark is produced in two “twin” versions, never perfectly identical, and suffers deformation over time as a result of repeated use in the paper manufacturing process. By developing and enhancing new approaches and techniques for the acquisition and analysis of watermarks, we hope to address these basic problems and thereby benefit all who must rely upon paper documents for chronological evidence.
Computer vision (CV) has made significant progress in recent years thanks to machine learning and artificial intelligence. This project will build on cutting-edge work already undertaken by the École nationale des chartes and its partners (notably the computer scientists at École des Ponts ParisTech) on the problem of matching images, specifically of watermarks, across formats (photographs and tracings). In creating a corpus of images to train and develop the open source software created by the École nationale des chartes, we will build on recent work by The National Archives (TNA) that uses comparatively affordable equipment and techniques to produce images of watermarks that are highly suitable for machine analysis. The project will develop and apply both of these approaches in an attempt to enhance the CV software so that it can unlock the latent information held in thousands of existing images shot in reflected light, which institutions have already digitised and made accessible through IIIF.
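To make the reference to IIIF concrete, the following Python sketch builds a request URL following the IIIF Image API pattern of identifier, region, size, rotation, quality, and format. The server address and image identifier are hypothetical placeholders, not endpoints belonging to any partner institution, and the helper function name is our own.

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL.

    The path follows the pattern
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}.
    Slashes inside the identifier are percent-encoded, as the API requires.
    """
    encoded_id = quote(identifier, safe="")
    return (f"{base_url.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


# Hypothetical server and identifier: request the central half of a page
# in greyscale, where a watermark might be expected to sit.
url = iiif_image_url(
    "https://iiif.example.org/image",
    "newton/ms-0001",
    region="pct:25,25,50,50",
    quality="gray",
)
```

Because the region and quality are expressed in the URL itself, the same existing digitised image can be sampled in many ways without new photography, which is what makes such collections attractive as input for automated analysis.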
Informing our research will be the extensive notes taken by Alan Shapiro over years of working with Newtonian watermarks; these notes formed the basis for Shapiro’s groundbreaking 1992 article “Beyond the Dating Game: Watermark Clusters and the Composition of Newton’s Opticks.” The dataset provided by Shapiro’s work will inform the selection of images chosen to train the CV algorithm, whose application will then be scaled up across a broader sample of the Newton manuscript corpus.