Strategies for the Development of Databases

History of Science, Medicine and Technology.
Bibliography of Primary Sources: Articles


    Periodical articles are among the primary sources used by historians of science. The first scientific journals appeared in the 17th century, but at that time they were less important than books as a means of communicating scientific results. The situation gradually changed, and in the 19th century scientific papers replaced books as the main form of presenting new scientific results.

    How can historians of science find relevant scientific articles for their research? For contemporary science, they can use several scientific databases to find articles published in the last few decades, but there are no databases covering older periodicals. A historian of science dealing with early 20th century science or any older period will have to use published bibliographies and indexes.

    Is it desirable to develop a database (or a set of databases) describing old scientific papers? Is it economically feasible?

    The output of scientific papers grows exponentially over time. The number of scientific papers published each year today far surpasses the whole production up to the end of the 19th century. It would be easy (and cheap) to develop a database of scientific papers published in the 17th and 18th centuries. It is much more difficult and expensive to include the papers published in the 19th century but, as shown below, this is not beyond the reach of the history of science community. However, a database encompassing the whole scientific production up to the period covered by existing scientific databases would be far too expensive a project to be undertaken by the history of science community.

    The following sections describe a possible strategy to develop a database of scientific papers published from the 17th to the 19th century.

Database of scientific periodicals

    As a preliminary step to the development of a database of scientific papers, it is necessary to produce a database on scientific journals. Journals should be described as a whole, both when they are individually important and when they contain relevant articles. Other serials are also included here (such as irregular publications, annual publications, etc.).

    The main bibliographies that can be used to produce the periodicals database are:

    Using this information, it is possible to produce the core database of scientific periodicals up to 1900. It will include the periodicals where most of the scientific papers of the 17th, 18th and 19th centuries were published. However, this database will be far from "complete".

    The core periodicals database can be complemented using information available from the main online library catalogues. Even after this is done, journals published outside the main European countries will not be fairly represented. During the 19th century, many short-lived scientific periodicals appeared all over the world. Most of them are not described in the above-cited bibliographies and cannot be easily found. Some of them will be available only in the national libraries of the respective countries (or sometimes in other, smaller libraries). An international effort will be needed to complement the core database of scientific journals, adding information from all countries.

Articles: 17th and 18th centuries

    Up to 1800, only about 100 scientific periodicals had appeared. Therefore, the creation of a database containing information about all papers published up to this time should not be a huge job.

    As a starting point, it would be possible to produce a digital version of Reuss' Repertorium, which describes articles published up to the end of the 18th century in the journals of scholarly societies. This database would be the initial core of the 17th and 18th centuries database, but it would be necessary to complement it in several ways:

    In this period, the number of scientific journals published outside Europe seems negligible, and it will be quite easy to complement the information of the core database, in order to include those periodicals.

Articles: 19th century

    From 1800 to 1900, the total number of scientific periodicals (including those that had already disappeared by 1900) increased from 100 to about 10,000, according to Derek de Solla Price. Other more conservative estimates reckon about 5,000 periodicals in 1900. This huge increase of periodicals was, of course, accompanied by an explosion of scientific papers.
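    To give a feeling for the growth rate implied by these figures, the following Python sketch computes the doubling time for each estimate. It assumes pure exponential growth between 1800 and 1900, which is only an approximation; the function name doubling_time is introduced here for illustration.

```python
import math

def doubling_time(start_count, end_count, years):
    """Years needed for the count to double, assuming exponential growth."""
    return years * math.log(2) / math.log(end_count / start_count)

# Figures quoted in the text: about 100 periodicals in 1800, and either
# 10,000 (Price) or 5,000 (more conservative estimates) by 1900.
price = doubling_time(100, 10_000, 100)
conservative = doubling_time(100, 5_000, 100)

print(f"Price estimate: doubling about every {price:.1f} years")
print(f"Conservative estimate: doubling about every {conservative:.1f} years")
```

    Under this assumption, the number of periodicals doubled roughly every 15 years (Price's figure) or every 18 years (the conservative figure).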

    The starting point for the production of a database of scientific papers published in the 19th century should be the Royal Society Catalogue of Scientific Papers. This work describes the content of a large number of scientific periodicals (about 3,000) published all over the world, and indexes about 800,000 papers. A digital version of this Catalogue would become the core database of scientific papers for the 19th century.

    One severe limitation of the Catalogue of Scientific Papers is the lack of a subject classification, except in the case of mathematics and physics. Therefore, to produce a useful database, it is necessary to ascribe a subject to each of the 800,000 papers, and this is not an easy job.

    The best approach seems to be to ascribe a subject to each paper according to the title of the paper, the journal where it was published (if it is a specialized journal) and the subject of other papers published by the same author. This work could be done by a team of people who are familiar with the several languages used in the Catalogue (English, French, German, Italian, etc.) and with the several scientific disciplines, so that they can detect the subject of each paper. Of course, this strategy may produce many mistakes, but it is better to have a tentative subject classification than no classification at all.
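    A first tentative pass could be automated before human revision. The Python sketch below is only an illustration of the three kinds of evidence mentioned above (title, specialized journal, other papers by the same author). The keyword stems, the journal table and the weights are hypothetical placeholders, not part of any existing tool; a real project would need much richer multilingual vocabularies prepared by subject specialists.

```python
import re
from collections import Counter

# Hypothetical word stems per subject (English, French, German).
TITLE_STEMS = {
    "astronomy": ["comet", "planet", "stern"],
    "chemistry": ["acid", "oxyd", "chem"],
    "botany": ["plant", "flor", "botan"],
}

# Hypothetical table of specialized journals and their subject.
JOURNAL_SUBJECTS = {
    "Astronomische Nachrichten": "astronomy",
}

def guess_subject(title, journal, author_subjects):
    """Return a tentative subject for a paper, or None if there is no evidence.

    author_subjects lists the subjects already ascribed to other papers
    by the same author. The weights (2, 3, 1) are arbitrary assumptions.
    """
    votes = Counter()
    words = re.findall(r"\w+", title.lower())
    for subject, stems in TITLE_STEMS.items():
        if any(word.startswith(stem) for word in words for stem in stems):
            votes[subject] += 2  # evidence from the title
    if journal in JOURNAL_SUBJECTS:
        votes[JOURNAL_SUBJECTS[journal]] += 3  # specialized journal
    if author_subjects:
        usual_subject = Counter(author_subjects).most_common(1)[0][0]
        votes[usual_subject] += 1  # weak evidence from the author's other papers
    if not votes:
        return None  # left for a human classifier
    return votes.most_common(1)[0][0]

print(guess_subject("Observations of the comet of 1811",
                    "Philosophical Transactions", []))
```

    Papers for which the function returns None, or for which the evidence is weak, would still have to be classified by the team described above.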

    The cost of production of this core database is expected to be about US$250,000.00.

    After the completion of this core database of 19th century scientific papers, it would be necessary to complement the database, because of several limitations:

    The first limitation is a serious one. The output of medical and technical papers during the 19th century was probably similar to (and perhaps larger than) the output of scientific papers. If there existed a bibliography similar to the Catalogue of Scientific Papers covering those medical and technical papers, producing a digital version of that bibliography would cost as much as (and perhaps more than) the core database. However, since such a bibliography does not exist, it would be very difficult to produce a suitable complement to the core database in a short time and at a reasonable cost.

    This limitation should not be used as an argument against the project, however. It is better to have a database of scientific papers without medicine and technology than nothing at all.

    The core database should be gradually complemented in two ways:

    Several relevant bibliographies published in the late 19th and early 20th century could contribute to the project. One good example is Houzeau and Lancaster's Bibliographie Générale de l'Astronomie. The authors used many bibliographic instruments that were already available (such as Lalande's Bibliographie astronomique) and also the Catalogue of Scientific Papers, and complemented those sources with the direct analysis of many periodicals. Only half of the articles included in Houzeau and Lancaster's book can be found in the Catalogue of Scientific Papers. This shows both that Houzeau and Lancaster did very careful work, and that the Catalogue is far from complete.
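    Once two bibliographies are available in digital form, a comparison of this kind could be partly automated. The following Python sketch is a simplified illustration: it builds a crude matching key from author, year and title, and reports which entries of a second bibliography are missing from the core database. The functions match_key and coverage and the sample entries are invented for this example; real records would need more careful matching (abbreviated titles, variant spellings of names, and so on).

```python
import re
import unicodedata

def normalize(text):
    """Strip accents, lowercase, and drop punctuation and spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]", "", ascii_only.lower())

def match_key(author, year, title):
    """Crude key for comparing entries from different bibliographies."""
    return (normalize(author), year, normalize(title)[:30])

def coverage(core_entries, other_entries):
    """Return (fraction of other_entries found in core, list of missing ones)."""
    core_keys = {match_key(*entry) for entry in core_entries}
    missing = [e for e in other_entries if match_key(*e) not in core_keys]
    if not other_entries:
        return 0.0, missing
    return 1 - len(missing) / len(other_entries), missing

# Invented sample entries: (author, year, title).
core = [("Dupont, A.", 1850, "Sur la comète de 1849")]
other = [
    ("DUPONT, A", 1850, "Sur la comete de 1849."),
    ("Muster, B.", 1861, "Ueber Sternschnuppen"),
]
fraction, missing = coverage(core, other)
print(fraction, missing)
```

    The entries reported as missing are the candidates to be added to the core database after checking.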

    Information about medicine can be found, for the early 19th century, in Callisen's monumental work:

    This lexicon provides international coverage of books and journal articles from approximately 1780 to 1840. The classification is by author, with citations listed chronologically. There is a recent reprint:

    In the case of journals that have not been indexed by the Catalogue, an international effort will be needed to complement the core database, adding information from periodicals published in all countries. The national committees should check which relevant journals were not included in the Catalogue. It will then be necessary to find complete sets of those periodicals, to examine them and to produce complete indexes of their articles.

Articles: 20th century

    As already noted, the number of scientific papers produced each year increases exponentially over time. If one were to include in the database the articles published from 1901 to 1920, this would double the size (and cost) of the project. It seems desirable to limit the project to the 17th, 18th and 19th centuries, leaving the 20th century to future projects.
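    The effect of adding twenty more years can be checked with a small calculation. The Python sketch below assumes pure exponential growth, so that the cumulative number of papers grows by a factor 2^(years / doubling time), and assumes that cost is proportional to the number of papers. The doubling times of 15 and 20 years are illustrative assumptions, not measured values.

```python
def growth_factor(extra_years, doubling_time):
    """Ratio of cumulative output after extra_years to the output before,
    assuming pure exponential growth."""
    return 2 ** (extra_years / doubling_time)

CORE_COST_USD = 250_000  # cost of the 19th century core database quoted in the text

for assumed_doubling in (15, 20):  # assumed doubling times, in years
    factor = growth_factor(20, assumed_doubling)
    total_cost = CORE_COST_USD * factor  # assumes cost proportional to size
    print(f"doubling time {assumed_doubling} years: "
          f"size x{factor:.2f}, cost about US${total_cost:,.0f}")
```

    A doubling time of 20 years gives exactly a factor of 2; a shorter doubling time would make the extension to 1920 even more expensive.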

Scientific articles and journals: what should be included?

    All scientific journals (including technology and medicine) of the 17th, 18th and 19th centuries, published in any country, in any language, should be included in the database. However, what counts as a "scientific journal" is somewhat arbitrary. Should cultural periodicals be included? Maybe. There were many illustrated periodicals, in the late 19th century, that included scientific news among their subjects. Historians of science who study the diffusion of scientific ideas among the public would like to have information about those periodicals. Should newspapers be included? Maybe, both for the reason presented in the case of cultural periodicals, and because there were sometimes fierce discussions of scientific issues in the pages of newspapers. However, if one attempts to include all periodicals in the database, the project will become unfeasible. A practical compromise could be adopted: journals exclusively dedicated to the sciences (including technology and medicine), even in the case of popular science, should be included; journals that do not address scientific issues should not be included; borderline cases (periodicals where some scientific papers can be found) can be included if someone thinks they should be included and takes on the task of describing and indexing them.
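    The practical compromise proposed above can be stated as a simple rule. The Python sketch below only restates that rule; the names Scope and should_include are introduced for illustration, and deciding to which category a given periodical belongs remains a matter of human judgment.

```python
from enum import Enum

class Scope(Enum):
    EXCLUSIVELY_SCIENTIFIC = "exclusively dedicated to the sciences"
    BORDERLINE = "some scientific papers can be found"
    NON_SCIENTIFIC = "does not address scientific issues"

def should_include(scope, has_volunteer_indexer=False):
    """Apply the compromise: always include scientific journals (popular
    science too), never include non-scientific ones, and include borderline
    cases only if someone takes on the task of describing and indexing them."""
    if scope is Scope.EXCLUSIVELY_SCIENTIFIC:
        return True
    if scope is Scope.BORDERLINE:
        return has_volunteer_indexer
    return False

print(should_include(Scope.BORDERLINE, has_volunteer_indexer=True))
```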

    There is, however, another question: what counts as "scientific"? Should journals and papers on astrology and other "pseudo-sciences" be included? Should we include the social sciences (history, anthropology, psychology, sociology, economics, statistics, philology, philosophy, ...)? I think that the answer should certainly be "yes" when the periodical or paper presents research results, independently of the field, the novelty, the methodology and the results. On the other hand, political speeches on the economic situation of a country, and other documents that present mere opinions about economics, politics, etc., should not be included. Mere matter-of-fact descriptions of the historical or political situation should not be included either: they are a relevant source of information for historians, but they are not the result of historical analysis. Historiography should be included; the whole mass of non-historiographic documents that might interest historians should not be included, otherwise the whole content of all newspapers would become relevant.

    Another way of answering this question is the following: anything pertaining to the subject studied by historians of science (including medicine and technology) should be included. Political history is not history of science, and therefore the documents that are relevant for the research of a political historian should not be regarded as relevant for historians of science (unless there is some other reason to include them).

Database structure: Periodicals
Database structure: Articles


Roberto de Andrade Martins
Group of History, Theory of Science and Teaching
Document version 1.1, 23 April 2003