Strategies for the Development of Databases

Roberto de Andrade Martins
Group of History and Theory of Science, Unicamp, Brazil

    The Division of History of Science of the International Union of History and Philosophy of Science is currently attempting to produce new research tools for historians of science. This document presents a general overview of databases on history of science, together with suggestions for improving existing databases and for creating new ones.
  1. Databases on secondary sources
  2. Databases on primary sources
     Library catalogues
     Scientific databases
     General (non scientific) databases
     National history of science databases
     Cartography databases
     Scientific instruments databases
     Scientific archives
     Manuscript databases
1. Scientific papers database

Preamble: Sources for research in the history of science

    Historians of science use several different types of documents, which can be divided into primary, secondary and tertiary sources of information.

    The main primary sources used by historians of science are:

1a Primary sources: Textual material
1.1 Published works (books, theses, articles, pamphlets, letters, etc.) produced by the scientists who are the subject of research
1.2 Unpublished writings (manuscripts of scientific works, laboratory notebooks, diaries, letters, projects, autobiographies, etc.) produced by the scientists who are the subject of research
1.3 Interviews (on tape, film or paper records) with the scientists who are the subject of research
1.4 Other relevant textual material produced during the period studied by the historian

1b Primary sources: Non-textual material
1.5 Scientific instruments produced during the period studied by the historian
1.6 Iconographic material (maps, photographs, drawings, graphs, etc.) produced by scientists
1.7 Collections of objects studied by the scientists

    Besides the primary sources, historians of science use several secondary sources, such as:

2 Secondary sources
2.1 Historiographic production (books, theses and papers on history of science)
2.2 Biographies
2.3 Contextual sources (containing information about the historical and social context of science)
2.4 Works on related fields: sociology of science, psychology of science, philosophy of science, etc.

    Translations and reproductions of primary sources are sometimes classified as primary sources and sometimes as secondary sources. Perhaps a new name should be given to those documents (for instance: quasi primary sources, or derived primary sources).

    To find both primary and secondary sources, historians use tertiary sources of information:

3 Tertiary sources
3.1 Bibliographies, library catalogues, databases and other similar sources of information

    Of course, tertiary sources may include both primary and secondary sources, or they may be specific, including only primary or only secondary sources.

    In this document we are going to discuss only the creation and improvement of digital tertiary sources of information. Therefore, the discussion of digital collections of quasi primary sources (e-book collections) is outside the scope of this document.

    It is necessary to stress that, in all cases, the utility of a database depends on both its content and its search capabilities. A huge database, containing information about millions of books, could be of little use to researchers if its search options are very limited.

Available resources

1. Databases on secondary sources

    The best current general bibliography on secondary sources of history of science is the History of Science, Technology, and Medicine Database:

    This database, originally created by the History of Science Society (HSS), using the Internet resources of the Research Libraries Group (RLG), includes information provided by:

    The joint History of Science, Technology, and Medicine Database is growing at a rate of about 18,000 references per year (2002).

    Historians of science are eagerly waiting for all the old paper bibliographies on history of science, medicine and technology to be digitized and integrated into the History of Science, Technology, and Medicine Database.

    Access to this database requires a username and password. The database can be freely used by members of the co-operating societies, or by researchers working at institutions that subscribe to the databases of the Research Libraries Group (RLG).

    There are also national bibliographies of secondary sources on the history of science, developed in Spain, Australia, Brazil and other countries, that are not included in the History of Science, Technology, and Medicine Database:

    There are also a few thematic databases, such as HistLine, the history of medicine database of the National Library of Medicine.

    Those bibliographies of secondary sources are invaluable tools for the historian of science, but of course they are not complete. The main limitations are:

    It is possible to improve the coverage of those databases, both through an effort of their editors, and by the parallel development of thematic and national bibliographies on history of science. All those databases should be integrated into a single one, and it is desirable to allow all researchers to have free access to the complete database.

    The old paper bibliographies of history of science fulfilled a function that modern databases do not. Consider, for instance, some well-known thematic bibliographies, such as:

* BRUSH, Stephen G. & BELLONI, Lanfranco. The history of modern physics: An international bibliography. New York: Garland, 1983.
* OVERMIER, Judith A. The history of biology: A selected, annotated bibliography. New York: Garland, 1989.

    Those bibliographies contained a careful selection of the most important secondary sources, with comments on their scope, worth and limitations. They did not attempt to be "complete", but the structure and selection of the information they provided was invaluable, especially for beginning researchers. Electronic (and updated) versions of those works should be available, because they cannot be replaced by the all-inclusive databases.

2. Databases on primary sources

    There is a very useful listing of over 5000 websites describing holdings of manuscripts, archives, rare books, historical photographs, and other primary sources for the research scholar, that could be the starting point for further research:

    Historians of science would like to be able to find information about primary sources, using databases encompassing all kinds of documents, covering all time periods, all nations, and all subjects. Ideally, one should be able to search by

    Truncated terms should be allowed, of course. Any combination of those criteria (using boolean operators) should also be allowed. For instance, without entering any author or title data, a historian should be able to find books published in France, in Latin, in the decade of 1780, on astronomy or astrology (or containing the truncated keyword astro*).
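    The kind of query described above can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory record format (the `Record` fields and the sample titles are invented for illustration, not taken from any real catalogue) and shows how criteria such as place, language, decade and truncated keywords can be combined freely with AND, OR and NOT.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Record:
    """Hypothetical minimal bibliographic record; field names are illustrative."""
    title: str
    country: str
    language: str
    year: int
    subjects: list = field(default_factory=list)


Predicate = Callable[[Record], bool]


def keyword(term: str) -> Predicate:
    """Match a subject keyword; a trailing '*' makes it a truncated term."""
    stem = term.rstrip("*")
    suffix = r"\w*" if term.endswith("*") else ""
    pattern = re.compile(re.escape(stem) + suffix, re.IGNORECASE)
    return lambda r: any(pattern.fullmatch(s) for s in r.subjects)


def equals(attr: str, value: str) -> Predicate:
    """Case-insensitive match on a text field such as country or language."""
    return lambda r: getattr(r, attr).casefold() == value.casefold()


def decade(start: int) -> Predicate:
    """Match records published in [start, start + 10)."""
    return lambda r: start <= r.year < start + 10


# Boolean operators that combine any predicates.
def AND(*preds: Predicate) -> Predicate:
    return lambda r: all(p(r) for p in preds)


def OR(*preds: Predicate) -> Predicate:
    return lambda r: any(p(r) for p in preds)


def NOT(pred: Predicate) -> Predicate:
    return lambda r: not pred(r)


def search(records: Iterable[Record], query: Predicate) -> list:
    return [r for r in records if query(r)]


# Invented sample records, for illustration only.
records = [
    Record("Tabulae astronomicae", "France", "Latin", 1784, ["astronomy"]),
    Record("De astrologia", "France", "Latin", 1791, ["astrology"]),
    Record("Mémoire sur les comètes", "France", "French", 1786, ["astronomy"]),
    Record("De astrologia judiciaria", "France", "Latin", 1788, ["Astrology"]),
    Record("Elementa chemiae", "France", "Latin", 1782, ["chemistry"]),
]

# No author or title given: France AND Latin AND 1780s AND (astronomy OR astrology OR astro*).
query = AND(
    equals("country", "France"),
    equals("language", "Latin"),
    decade(1780),
    OR(keyword("astronomy"), keyword("astrology"), keyword("astro*")),
)
matches = search(records, query)
print([r.title for r in matches])  # prints ['Tabulae astronomicae', 'De astrologia judiciaria']
```

    Representing each criterion as a small predicate is only one possible design; its advantage is that any combination of criteria can be built without the database having to anticipate it in advance.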

    Unfortunately, this ideal international database does not exist. Nowadays, a historian of science can find relevant information on primary sources only through a few instruments that, in general, do not allow all the kinds of search described above.

Library catalogues

    Nowadays, there are many library catalogues online. There are also several databases covering many different libraries in a single country, or world-wide. The largest one seems to be the OCLC WorldCat, which houses over 48 million bibliographic records (April 2003). Unfortunately, this is a commercial product that is not available to all researchers.

    Of course, library catalogues can be used to search for books (and, sometimes, manuscripts, maps and other materials) and periodicals, but not single articles belonging to periodicals.

    In principle, if one could search many libraries world-wide at once, one would be able to find most primary sources in book format. However, most library catalogues do not allow researchers to look for books by criteria such as the ones pointed out above. The usual search options are author, title and subject; however, some library databases do not allow broad subject searches (such as "medicine"), truncated terms and boolean operators are sometimes not supported, and searching for books by language, place and year of publication is never possible.

    It seems possible to develop a search tool (similar to the BookWhere software) that could query many different online library catalogues simultaneously (using the Z39.50 protocol) and allow the use of more sophisticated search strategies. If this tool were developed and freely distributed to researchers, it would be an invaluable resource for historians of science.
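    The core of such a tool can be sketched in Python under strong simplifying assumptions. The sketch does not speak Z39.50: each catalogue is replaced by a hypothetical stand-in (`make_stub_catalogue`) that, like many real catalogues, supports only a simple subject search. The sketch queries all catalogues in parallel, applies finer criteria (here, language) on the client side, and merges duplicate records found in several libraries, noting which libraries hold each item. All names and records are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A catalogue is modelled as a function from a subject query to records (dicts).
Catalogue = Callable[[str], List[dict]]


def make_stub_catalogue(records: List[dict]) -> Catalogue:
    """Hypothetical stand-in for a remote catalogue with subject search only.
    A real tool would send the query over Z39.50 instead."""
    def query(subject: str) -> List[dict]:
        return [r for r in records if subject.lower() in r["subject"].lower()]
    return query


def normalise(title: str) -> str:
    """Crude duplicate key: lower-case and collapse whitespace."""
    return " ".join(title.lower().split())


def federated_search(catalogues: Dict[str, Catalogue], subject: str,
                     post_filter: Callable[[dict], bool] = lambda r: True) -> List[dict]:
    # Query every catalogue in parallel with the simple search it supports.
    with ThreadPoolExecutor(max_workers=max(1, len(catalogues))) as pool:
        futures = {name: pool.submit(cat, subject) for name, cat in catalogues.items()}
        results = {name: f.result() for name, f in futures.items()}

    # Apply finer criteria locally, then merge duplicates by (title, year).
    merged: Dict[tuple, dict] = {}
    for name in sorted(results):
        for rec in results[name]:
            if not post_filter(rec):
                continue
            key = (normalise(rec["title"]), rec["year"])
            entry = merged.setdefault(key, {**rec, "held_by": []})
            entry["held_by"].append(name)
    return sorted(merged.values(), key=lambda r: (r["year"], normalise(r["title"])))


# Invented sample data for two hypothetical libraries.
lib_a = make_stub_catalogue([
    {"title": "Tabulae astronomicae", "year": 1784, "language": "Latin", "subject": "Astronomy"},
    {"title": "Mémoire sur les comètes", "year": 1786, "language": "French", "subject": "Astronomy"},
])
lib_b = make_stub_catalogue([
    {"title": "Tabulae  Astronomicae", "year": 1784, "language": "Latin", "subject": "Astronomy -- Tables"},
    {"title": "Elementa chemiae", "year": 1782, "language": "Latin", "subject": "Chemistry"},
])

hits = federated_search({"Library A": lib_a, "Library B": lib_b}, "astronomy",
                        post_filter=lambda r: r["language"] == "Latin")
for h in hits:
    print(h["title"], h["year"], h["held_by"])
```

    Filtering on the client side is one possible design choice: it lets the tool offer searches (by language, place or year) that the remote catalogues themselves do not support, at the cost of retrieving more records than strictly needed.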

Scientific databases

    For historians of science whose research subject is contemporary science, there are many bibliographic instruments, developed and used by scientists, that can be used to find contemporary published scientific works. Some of them are multidisciplinary (such as the Science Citation Index, together with the Social Sciences Citation Index and the Arts & Humanities Citation Index). Others are disciplinary (for instance, Mathematical Reviews, Chemical Abstracts, etc.). Those databases have, however, several limitations:

    Some freely accessible scientific databases are listed here:

    Scientists usually need information only about works published in the recent past. It is well known that the probability that a paper is cited in current scientific work decreases exponentially with its age (except in the case of the "classics"). For most working scientists, 30 years is the distant past. Hence, it is unlikely that scientific societies will develop retrospective databases covering the whole of the 20th century and earlier centuries. There are, however, some exceptions. Astronomers often need information about old observations, and for that reason they have developed a database including information about old astronomical works, the Astronomical Bibliography, produced by the Astronomisches Rechen-Institut at Heidelberg (ARIBIB):

    Besides recent bibliographic data, this database includes information from the two best old astronomical bibliographies:

    Another important project is the Jahrbuch Project: Electronic Research Archive for Mathematics (ERAM). This project created a database of mathematical works, drawing its 19th-century records from the oldest index of mathematical periodicals, the Jahrbuch über die Fortschritte der Mathematik (founded in 1868).

    If all scientific disciplines had the same need for old bibliographical information as astronomy, very valuable databases of printed primary sources from all centuries would soon appear on the Web, and this would greatly benefit historians of science. However, even in this ideal situation, there might be severe limitations concerning content (for instance, the scientific output of "developing" or "third world" countries) and search possibilities.

    There are also some commercial databases that include information about both recent and old articles, such as PCI: Periodicals Contents Index, by ProQuest/Chadwyck-Healey.
    PCI currently indexes the articles in over 4,000 periodicals in the humanities and social sciences, from their first issues to 1995. The focus is on 20th-century periodicals. However, 20th-century periodicals whose runs extend back into the 18th and 19th centuries are indexed from their earliest volumes (some as far back as 1770). Its scope is international.

    Another database including information about old periodicals (not necessarily scientific ones) is the Index to Early American Periodicals:
    The three "Indexes to Early American Periodicals" cover three separate time periods from 1741 to 1935 and are thought to include all known periodical publications that had their inception and ending during this time period.

    Another commercial project is Poole's Plus, or 19th Century Masterfile. This is a digital version (with some additional indexes) of W. F. Poole's Index to Periodical Literature, covering the period 1802-1906. It was not a scientific periodicals index, but a cultural and literary one. However, it does describe some relevant scientific periodicals. It indexes 12,241 volumes of 479 periodicals and contains more than 400,000 citations.

General (non scientific) databases

    There are some general bibliographic databases online, such as the Incunables Database, and the Spanish Printed Books database. They are useful, but have the same limitations described above in the case of other databases.

* Illustrated Incunabula Short Title Catalogue on CD-ROM:
    Published by Primary Source Microfilm, in association with the British Library. The CD-ROM contains bibliographical records combined with images of keypages, such as title-page, start of text, end of text, colophon. The bibliographical records are from the Incunabula Short-Title Catalogue, the database of the British Library. Over 4,000 editions of the 28,000 records are now illustrated by 20,000 images.

* English Short Title Catalogue Project:
    The 'English Short Title Catalogue' (ESTC) is an international project established at the British Library in 1977. Its aim is to create a machine-readable bibliography of books, serials, pamphlets and other ephemeral material printed in English-speaking countries from 1473 to 1800, based on the collections of over 1,600 institutions world-wide. There is no free access.

* Catálogo Colectivo del Patrimonio Bibliográfico Español:
    This is a Spanish national project. In January 2003, this database contained 560,000 records, describing books published from the 15th to the 20th century, and information about 1,250,000 copies of those books, in 600 libraries.

* North American Imprints Program (NAIP), American Antiquarian Society:
    The North American Imprints Program (NAIP) has as its long-term goal the creation of a highly detailed and sophisticated machine-readable catalog of all books, pamphlets, and broadsides printed through the year 1876 in what are now the United States and Canada. There are now 40,000 records describing 17th- and 18th-century imprints, recording the locations of more than 120,000 extant copies.

National history of science databases

    There is a database on Australian primary scientific sources (in physics only), up to 1945, that can be consulted online:

    A database on Portuguese and Brazilian primary scientific sources (all subjects), up to 1900, is being developed. Only a small sample is available online. The Lusodat project can be consulted here:

    It might be useful to develop joint projects in the case of countries having a common cultural basis, such as Spain and the Spanish-speaking American countries. See the Iberodata proposal:

Cartography databases

    There is no international database for old maps. The history of cartography is, of course, a well-developed discipline, and there are many useful sites on the Internet, but an international co-operative project is still a desideratum.

    A round-up of national and international digital projects concerning early mapping can be found here:

Scientific instruments databases

    There is an outstanding international database for scientific instruments: the Online Register of Scientific Instruments.

    The Online Register of Scientific Instruments is an international database of historic scientific instruments and related objects available via the Internet. It is developed and supported by the Museum of the History of Science in Oxford in association with the Scientific Instrument Commission of the International Union of the History and Philosophy of Science.

    Many institutions have already co-operated with this project, sending information about their scientific instruments. Several important science museums have not yet joined this initiative, but as the coverage of the Online Register of Scientific Instruments increases, it should become a remarkable instrument for historians of science.

    This database does not include technological and medical instruments, and there is no similar database in those fields.

    See also the Epact project: Scientific Instruments of Medieval and Renaissance Europe.

    One should also consult this site, where the issue of museum collection description is addressed:

Scientific archives

    In the United Kingdom, there is a wide-ranging project concerning the archives of contemporary scientists: the National Cataloguing Unit for the Archives of Contemporary Scientists (NCUACS), based at the University of Bath.

    The University of Bath also houses the Cooperation on Archives of Science in Europe project:

    The Center for the History of Physics, of the American Institute of Physics, develops the International Catalog of Sources for History of Physics & Allied Fields (ICOS). The catalog can be freely accessed:

    There are also several general archival projects (that is, not specifically scientific archives), such as Archon (British National Register of Archives) and EAN (European Archival Network):

Archon

EAN: European Archival Network

Manuscript databases

    Manuscript databases (in the sense of individual items, such as medieval manuscript books) have received a lot of attention in recent times. The catalogues of some very important manuscript collections are available online, such as the On-line Catalogues of Western Manuscripts of the Bodleian Library, Oxford, and the British Library Manuscripts Catalogue.

    In the United States of America, the Library of Congress developed a free-of-charge cooperative cataloging program, the National Union Catalog of Manuscript Collections (NUCMC). There are two versions that can be consulted online:

    MASTER is a European Union funded project to create a single on-line catalogue of medieval manuscripts in European libraries. The general description of the project and a reference manual for the Master document type definition can be found here:

    MALVINE, Manuscripts and Letters via Integrated Networks in Europe opens new and enhanced access to disparate holdings of modern manuscripts and letters, kept and catalogued in European libraries, archives, documentation centres and museums.

    Unfortunately, there are several competing projects on manuscript databases, instead of a co-operative international project. See, for instance:

    Committee on Cataloging: Description and Access. Report of the Task Force on the Review of the Draft: Descriptive Cataloging of Ancient, Medieval, Renaissance, and Early-Modern Manuscripts (AMREMM)

    Electronic Access to Medieval Manuscripts (EAMMS) project, and links to online catalogs of manuscript collections and image files:

    There are also some very useful digital instruments that are not available online, such as In Principio – An Incipit Index of Latin Texts – 815,000 Incipits on CD-ROM:

    The available manuscript databases cannot be accessed in a unified way, and their search facilities are usually poor by the standards of history of science research.


    The History of Science database was created using the information contained in the annual Isis Current Bibliographies.
   The Isis annual bibliographies have been published since 1913. The most recent bibliographies (since 1975) are now available in the History of Science Society (HSS) database. For the older period, it is possible to use the Cumulative bibliographies that have been published by the HSS:

