History of Science, Medicine and Technology.
Bibliography of Primary Sources
Auxiliary databases
Introduction
Many bibliographical databases contain codes and other specific information that can appear in many different entries, and that should be treated as independent auxiliary units. In an actual implementation, these units may be truly independent databases linked to the main databases, or they may be tables within the main databases. Implementation details will not be discussed here; only the main concept will be presented.
Authorities database
Library catalogues usually rely on authority databases to produce uniform personal names in their entries. However, the auxiliary database proposed here is somewhat more complex, and more useful for historians of science.
We suggest that the database should contain the following fields:
1. Personal code
A unique code identifying a single
person. It may be a simple numeric code, or a mnemonic code. One possibility
is a code using the first letter of the first name, the following consonant
of the first name, the first letter of the family name, the following consonant
of the family name and, whenever necessary, a number. "Roberto Martins"
would have a code such as rbmr3, for instance. This code will be used in
the main databases, to provide a link to the auxiliary database.
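The mnemonic rule just described can be sketched in Python. This is a minimal, hypothetical sketch: it assumes that the first word of a name is the first name and the last word is the family name, treats "y" as a vowel, strips accents before choosing letters, and appends the smallest unused number only when the four-letter code is already taken.

```python
import unicodedata

VOWELS = set("aeiouy")  # assumption: "y" is treated as a vowel


def _letters(word: str) -> str:
    """Lower-case ASCII letters of a word, accents removed ("José" -> "jose")."""
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(c for c in decomposed if c.isascii() and c.isalpha()).lower()


def _pair(word: str) -> str:
    """First letter plus the next consonant after it ("" if none follows)."""
    letters = _letters(word)
    if not letters:
        return ""
    nxt = next((c for c in letters[1:] if c not in VOWELS), "")
    return letters[0] + nxt


def personal_code(full_name: str, existing: set[str]) -> str:
    """Hypothetical helper: build the mnemonic code, adding a number on collision."""
    parts = full_name.split()
    base = _pair(parts[0]) + _pair(parts[-1])
    if base not in existing:
        return base
    n = 1
    while f"{base}{n}" in existing:
        n += 1
    return f"{base}{n}"


print(personal_code("Roberto Martins", set()))                       # rbmr
print(personal_code("Roberto Martins", {"rbmr", "rbmr1", "rbmr2"}))  # rbmr3
```

The numbering rule (smallest unused integer) is only one possible reading of "whenever necessary, a number".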
2. Name(s) of the person
Yes, a single person can have several
names. This may occur in several circumstances:
* personal name and title (for instance: William Thomson
= Lord Kelvin)
* literary name, pseudonym
* nickname, short name
* personal name in different languages (for instance:
Descartes and Cartesius)
* old and new forms of the same name (for instance: José is the current Portuguese form corresponding to Joseph)
* variant orthographies
* names originally written in non-European languages
My suggestion is that all known
forms of a person's name should be entered in the authorities database.
Subfields should identify the different cases described above. The "standard
name" used by librarians should also be added.
In the main bibliographic databases, my suggestion is that each personal name should be entered exactly as it appears in the corresponding publication (on the title page, by default).
3. Date of birth and death
If the full date (day, month and year) is known, it should be entered in the database. Otherwise, the years should be given. In many cases even the years are unknown or doubtful, but it is still possible to enter the decade or century.
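One way to store dates of varying precision is to record a year together with a precision flag and a "doubtful" marker, so that searches by period can still match partially known dates. The sketch below is hypothetical (the names HistDate and Precision are illustrative), and it assumes that a decade or century is stored by its first year, for example 1600 for 1600-1699.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Precision(Enum):
    DAY = "day"
    YEAR = "year"
    DECADE = "decade"
    CENTURY = "century"


@dataclass(frozen=True)
class HistDate:
    year: int                    # for DECADE/CENTURY: first year of the period
    precision: Precision
    month: Optional[int] = None
    day: Optional[int] = None
    doubtful: bool = False       # the date is uncertain

    def __post_init__(self) -> None:
        if self.precision is Precision.DAY and (self.month is None or self.day is None):
            raise ValueError("day precision needs month and day")

    def year_range(self) -> tuple[int, int]:
        """Earliest and latest year compatible with this date."""
        span = {Precision.DECADE: 9, Precision.CENTURY: 99}.get(self.precision, 0)
        return (self.year, self.year + span)

    def could_fall_within(self, first: int, last: int) -> bool:
        """True if the date may lie in the interval first..last (inclusive)."""
        lo, hi = self.year_range()
        return lo <= last and first <= hi

    def label(self) -> str:
        if self.precision is Precision.DAY:
            text = f"{self.year:04d}-{self.month:02d}-{self.day:02d}"
        elif self.precision is Precision.DECADE:
            text = f"{self.year}s"
        elif self.precision is Precision.CENTURY:
            text = f"{self.year}-{self.year + 99}"
        else:
            text = str(self.year)
        return text + ("?" if self.doubtful else "")


print(HistDate(1596, Precision.DAY, 3, 31).label())                     # 1596-03-31
print(HistDate(1620, Precision.DECADE, doubtful=True).label())          # 1620s?
print(HistDate(1600, Precision.CENTURY).could_fall_within(1650, 1660))  # True
```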
4. Occupation(s) or profession(s) of the person
This is useful information for historians.
5. Cities or countries associated with the person
City and/or country and/or region where the person was born and/or died and/or produced his or her works. This field will use codes from the geographical database.
6. Sources of information about the person
It is necessary to identify the work from which the biographical information was obtained, and to provide a specific reference (volume and page, or sometimes a reference item number). Each source of information will be identified by a code and described in the sources of information auxiliary database. Of course, it is not necessary to produce a detailed bibliography on each person.
Historians should be able to search this database to find relevant authors, and then use the author entries to search the main bibliographical databases. Also, when searching for specific documents, it should be possible to use any variant of the author's name.
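A minimal sketch of the authorities database and of the variant-name search described above. Everything here is hypothetical: the record layout, the sub-field labels, the personal codes "wlth" and "rnds", and the sample main-database entry are illustrative, and name matching is assumed to ignore case, accents and extra spaces.

```python
import unicodedata
from dataclasses import dataclass, field


def normalize(name: str) -> str:
    """Case-fold, drop accents and collapse spaces, so "René" matches "rene"."""
    decomposed = unicodedata.normalize("NFKD", name)
    bare = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(bare.casefold().split())


@dataclass
class NameForm:
    name: str
    kind: str  # hypothetical sub-field values: "personal", "title", "latin", ...


@dataclass
class Person:
    code: str                                             # personal code
    names: list[NameForm]                                 # all known name forms
    occupations: list[str] = field(default_factory=list)
    place_codes: list[str] = field(default_factory=list)  # geographical database codes
    sources: list[tuple[str, str]] = field(default_factory=list)  # (source code, vol./page)


class Authorities:
    def __init__(self, people: list[Person]) -> None:
        self.by_code = {p.code: p for p in people}
        self._index: dict[str, set[str]] = {}
        for p in people:
            for form in p.names:
                self._index.setdefault(normalize(form.name), set()).add(p.code)

    def lookup(self, any_name: str) -> set[str]:
        """Personal codes of everyone known under this name form."""
        return set(self._index.get(normalize(any_name), set()))


def find_documents(auth: Authorities, entries: list[dict], any_name: str) -> list[dict]:
    """Main-database entries whose personal code matches any form of the name."""
    codes = auth.lookup(any_name)
    return [e for e in entries if e["person"] in codes]


auth = Authorities([
    Person("wlth", [NameForm("William Thomson", "personal"), NameForm("Lord Kelvin", "title")]),
    Person("rnds", [NameForm("René Descartes", "vernacular"), NameForm("Cartesius", "latin")]),
])
# Hypothetical main-database entry: the name as printed, plus the link code.
entries = [{"title": "Example document A", "printed_name": "Cartesius", "person": "rnds"}]

print(auth.lookup("lord  KELVIN"))  # {'wlth'}
print([e["title"] for e in find_documents(auth, entries, "René Descartes")])
```

Because the main entries keep the name exactly as printed and store only the personal code as a link, a search under any name form reaches the same documents.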
Geographical database
Library databases usually assign codes to countries and other geographical regions. Our proposal is somewhat different: the aim of this auxiliary geographical database is to allow geographical searches by country and region, and by any variant of a city's name.
We suggest that the database should contain the following fields:
1. Place code
A code identifying the city, state,
province, country, region, continent, etc.
There are standard library codes (sometimes
national ones) for many of these.
2. Level code
A code identifying the type of geographical information (city, state, province, country, region, continent, etc.).
3. Names of the place
A geographical place can have several names, especially in the following circumstances:
* places that have different names in different periods
* geographical name in different languages (for instance:
Lutetia = Paris)
* old and new form of the same name
* variant orthographies
* names originally written in non-European languages
The "standard name" used by librarians
should also be added.
For the most famous cities, and for country names, it is common to have different "translations" of the name. This auxiliary database should contain the geographical names in all languages that can be used when the main databases are searched. For instance: if the main databases can be searched in English, French, and Portuguese, there will be several translations of the name of the Italian city "Firenze": Florence (English), Florence (French), Florença (Portuguese). A sub-field should identify the language of each translation, using the codes of the language database.
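A possible in-memory sketch of this multilingual name field: every name, in any language or period, resolves to the same place code, and the interface can display the name in the user's language. The place codes "firenze" and "paris", the sub-field labels and the class names are hypothetical; MARC-style language codes (ita, eng, fre, por, lat) are assumed as the codes of the language database.

```python
import unicodedata
from dataclasses import dataclass


def fold(text: str) -> str:
    """Case- and accent-insensitive key ("Florença" -> "florenca")."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


@dataclass
class PlaceName:
    name: str
    lang: str  # code from the language database (MARC-style codes assumed)
    kind: str  # hypothetical sub-field: "local", "translation", "historical", ...


@dataclass
class Place:
    code: str
    level: str  # level code: "city", "province", "country", ...
    names: list[PlaceName]

    def name_in(self, lang: str) -> str:
        """Name to display in an interface language; falls back to the first name."""
        return next((n.name for n in self.names if n.lang == lang), self.names[0].name)


class Gazetteer:
    def __init__(self, places: list[Place]) -> None:
        self.places = {p.code: p for p in places}
        self._index: dict[str, set[str]] = {}
        for p in places:
            for n in p.names:
                self._index.setdefault(fold(n.name), set()).add(p.code)

    def find(self, any_name: str) -> set[str]:
        """Codes of all places known under this name in any language or period."""
        return set(self._index.get(fold(any_name), set()))


gaz = Gazetteer([
    Place("firenze", "city", [
        PlaceName("Firenze", "ita", "local"),
        PlaceName("Florence", "eng", "translation"),
        PlaceName("Florence", "fre", "translation"),
        PlaceName("Florença", "por", "translation"),
        PlaceName("Florentia", "lat", "historical"),
    ]),
    Place("paris", "city", [
        PlaceName("Paris", "fre", "local"),
        PlaceName("Lutetia", "lat", "historical"),
    ]),
])

print(gaz.find("florenca"))                  # {'firenze'}
print(gaz.places["firenze"].name_in("por"))  # Florença
```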
4. Upper level connections
In the case of cities, the code of the state or province to which they belong (or of the country, if it has no internal divisions); in the case of states and provinces, the countries to which they belong; etc.
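Following the upper-level links lets a search by country or region also retrieve every place inside it. A minimal sketch, assuming a simple table from each place code to the code of the unit that contains it; all codes are hypothetical, and changes of political boundaries over time are ignored here.

```python
from typing import Optional

# Hypothetical place codes; each entry points to the unit that contains it.
# (The intermediate level is omitted for Paris to keep the example short.)
PARENT: dict[str, Optional[str]] = {
    "europe": None,
    "italy": "europe",
    "tuscany": "italy",
    "firenze": "tuscany",
    "pisa": "tuscany",
    "france": "europe",
    "paris": "france",
}


def ancestors(code: str) -> list[str]:
    """Upper-level chain of a place, nearest unit first."""
    chain: list[str] = []
    parent = PARENT.get(code)
    while parent is not None:
        chain.append(parent)
        parent = PARENT.get(parent)
    return chain


def places_within(region: str) -> set[str]:
    """Every place whose upper-level chain passes through `region`."""
    return {code for code in PARENT if region in ancestors(code)}


print(ancestors("firenze"))            # ['tuscany', 'italy', 'europe']
print(sorted(places_within("italy")))  # ['firenze', 'pisa', 'tuscany']
```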
CAVEAT: Due to political changes, a city may belong to one country during some period and to a different country during another. Also, countries appear and disappear over time. It would be possible to circumvent this problem by specifying the period during which a city belonged to a given country, etc. (but this would be very complicated).
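The period-based workaround mentioned in the caveat could be sketched by turning each upper-level link into a dated record with an optional first and last year. This is a hypothetical sketch: the codes and years are placeholders, not historical data, and it does not handle the harder cases (disputed territories, exact dates of transfer).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Membership:
    child: str            # e.g. a city code
    parent: str           # e.g. a country code
    start: Optional[int]  # first year of membership (None = open or unknown)
    end: Optional[int]    # last year of membership (None = open or still valid)


def parents_in(links: list[Membership], child: str, year: int) -> list[str]:
    """Units containing `child` in `year`; more than one may match in a transfer year."""
    return [
        m.parent
        for m in links
        if m.child == child
        and (m.start is None or m.start <= year)
        and (m.end is None or year <= m.end)
    ]


# Hypothetical city and countries, with illustrative years only.
LINKS = [
    Membership("city_a", "country_x", None, 1870),
    Membership("city_a", "country_y", 1871, 1918),
    Membership("city_a", "country_x", 1919, None),
]

print(parents_in(LINKS, "city_a", 1900))  # ['country_y']
print(parents_in(LINKS, "city_a", 1950))  # ['country_x']
```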
Languages database
Librarians use a standard set of codes to represent different languages. We suggest that the same set of codes should be used, together with the names of each language in several languages:
1. Language code
Standard (Library of Congress, or
any other) code identifying a language.
2. Language names
The full name of each language, in all languages that can be used when the main databases are searched. For instance: if the main databases can be searched in English, French, and Portuguese, there will be several translations of the language name "French": French, Français, Francês. A sub-field should identify the language of each translation.
When searching the main databases for specific documents, it should be possible to use any variant of the language's name.
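A small sketch of how a search form could accept a language name typed in any interface language and translate it into a single code before querying the main databases. The table and function names are hypothetical, and MARC-style codes (fre, eng, por) are assumed as the standard codes.

```python
import unicodedata
from typing import Optional

# Language code -> {interface-language code: name}. MARC-style codes are assumed.
LANGUAGES: dict[str, dict[str, str]] = {
    "fre": {"eng": "French", "fre": "Français", "por": "Francês"},
    "eng": {"eng": "English", "fre": "Anglais", "por": "Inglês"},
    "por": {"eng": "Portuguese", "fre": "Portugais", "por": "Português"},
}


def fold(text: str) -> str:
    """Case- and accent-insensitive key ("Francês" -> "frances")."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


# Reverse index: any folded language name -> language code.
_BY_NAME = {fold(name): code for code, names in LANGUAGES.items() for name in names.values()}


def language_code(any_name: str) -> Optional[str]:
    """Resolve a language name typed in any interface language to its code."""
    return _BY_NAME.get(fold(any_name))


def language_name(code: str, interface_lang: str) -> str:
    """Name of a language as shown in a given interface language."""
    return LANGUAGES[code][interface_lang]


print(language_code("francais"))    # fre
print(language_name("eng", "por"))  # Inglês
```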
Subjects database
Library databases usually classify books using a standard set of subject entries, sometimes with a numeric code (for instance, the Dewey Decimal Classification, the Universal Decimal Classification, or the Library of Congress Classification). It is very useful to adopt one of these standard classification schemes, together with its codes, because this makes it easier to control the subject entries, and because the standard subjects have already been translated into several languages. Therefore, once the code is known, the equivalent subject can be found in several languages. Conversely, if someone searches the database using subjects described in any of the available languages, the subject codes will allow him/her to find the relevant entries. Besides that, because of the hierarchical structure of these classification schemes, it is possible to start a search with very general subjects (for instance, medicine) and then use increasingly specific ones.
The structure of this database is very simple:
1. Subject code
Standard (DDC, UDC, Library of Congress,
or any other) code identifying the subject.
2. Subject, in several languages
The full subject, in all languages
that can be used when the main databases are searched. A sub-field should
identify the language of each translation.
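The general-to-specific search can be sketched with Dewey numbers, assuming as a simplification that the hierarchy can be read from the notation itself: 600 contains 610, which contains 616 and 616.9. The DDC follows this notational hierarchy in most but not all cases, so a real system would rather store explicit broader/narrower links. The captions are shown for illustration only, and the entries and helper names are hypothetical.

```python
from typing import Optional

# Illustrative DDC numbers with English captions (other languages would be extra keys).
SUBJECTS: dict[str, dict[str, str]] = {
    "500": {"eng": "Science"},
    "600": {"eng": "Technology"},
    "610": {"eng": "Medicine & health"},
    "616": {"eng": "Diseases"},
}


def code_for(label: str) -> Optional[str]:
    """Subject code for a caption given in any stored language."""
    for code, names in SUBJECTS.items():
        if label.casefold() in {n.casefold() for n in names.values()}:
            return code
    return None


def _notation(ddc: str) -> str:
    """Digits that carry the hierarchy: "600" -> "6", "610" -> "61", "616.9" -> "6169"."""
    base, _, decimals = ddc.partition(".")
    if decimals:
        return base + decimals
    return base.rstrip("0") or "0"


def is_within(code: str, broader: str) -> bool:
    """True if `code` is `broader` itself or one of its subdivisions."""
    return _notation(code).startswith(_notation(broader))


def narrow_search(entries: list[dict], broader: str) -> list[dict]:
    return [e for e in entries if is_within(e["subject"], broader)]


# Hypothetical main-database entries carrying a subject code.
entries = [
    {"title": "Example A", "subject": "616.9"},
    {"title": "Example B", "subject": "530"},
    {"title": "Example C", "subject": "610"},
]
print([e["title"] for e in narrow_search(entries, "600")])  # ['Example A', 'Example C']
print([e["title"] for e in narrow_search(entries, "616")])  # ['Example A']
```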
Libraries and archives database
The bibliographical databases will contain information about documents that can be found in many different libraries and archives all over the world. Each repository should be identified by a code in the bibliographic entries, but an auxiliary database (or table) is needed to hold the full description of each repository.
The structure of this database is also very simple:
1. Library or archive code
We suggest a mnemonic code, built
from the initial letters of the library or archive name. The British Library
will be identified as BL, the Library of Congress as LC, the Bibliothèque
Nationale de France as BNF, and so on. When necessary, a number can be
added to the initials, to distinguish similar codes.
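The initial-letter rule can be sketched as follows. This is a hypothetical helper: the list of skipped function words ("of", "de", and so on) is an assumption, accents are dropped from initials, and the first code keeps its bare initials while later collisions receive 2, 3, and so on.

```python
import unicodedata

# Assumption: short function words that contribute no initial.
SKIP = {"a", "an", "the", "of", "and", "de", "des", "du", "la", "le", "et", "di", "della", "da", "do"}


def _initial(word: str) -> str:
    """First letter of a word, upper-cased and without accents ("École" -> "E")."""
    for c in unicodedata.normalize("NFKD", word):
        if c.isascii() and c.isalpha():
            return c.upper()
    return ""


def repository_code(name: str, existing: set[str]) -> str:
    """Initials of the significant words; a number is appended if the code is taken."""
    code = "".join(_initial(w) for w in name.split() if w.casefold() not in SKIP)
    if code not in existing:
        return code
    n = 2
    while f"{code}{n}" in existing:
        n += 1
    return f"{code}{n}"


codes: set[str] = set()
for name in ["British Library", "Library of Congress",
             "Bibliothèque Nationale de France", "Bodleian Library"]:
    new = repository_code(name, codes)
    codes.add(new)
    print(name, "->", new)
# British Library -> BL, Library of Congress -> LC,
# Bibliothèque Nationale de France -> BNF, Bodleian Library -> BL2
```

The same helper would also produce the mnemonic proposed later for sources of information: "Catalogue of Scientific and Technical Periodicals" gives CSTP.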
2. Library or archive name, in several languages
The full name of the repository, in
all languages that can be used when the main databases are searched. A
sub-field should identify the language of each translation. Besides that, it is useful to add variant names when a library or archive had different names at different times.
3. City code of the library or archive
The geographical code (from the geographical
database) identifying the place where the repository is situated.
4. Library or archive address
The full address of the library or
archive. Street addresses are usually long-lived. Telephone numbers, e-mail addresses, fax numbers and similar information change much more often, and keeping the database up to date is difficult if this information is included.
5. Link to the library or archive
The locator (URL) of the Internet site of the library or archive should also be included, when known, even if it is likely to change in the future.
Sources of information database
This is an auxiliary bibliographical database describing the secondary and tertiary printed sources used to obtain the information entered in the main databases. Suppose, for instance, that Bolton's Catalogue of Scientific and Technical Periodicals is one of the sources of information used while building the periodicals database.
The structure of this database will be similar to that of the main books and articles databases, since it should contain the bibliographical description of the source. However, it is necessary to introduce one additional field:
1. Source of information code
We suggest a mnemonic code, built
from the initial letters of the work title. For instance, Bolton's book
could have the code CSTP.
This database need not contain all fields used in the main database. A very simple bibliographic description will be enough for the sources of information.
Special problems: merging auxiliary databases
When a set of databases is produced at a single institution, coherence and compatibility are easy to achieve. However, in an international project, several problems can arise when databases of different origins are merged. Some of them may have used different codes for the same information, or the same code with different meanings.
In the best of all possible worlds, all countries would use the same codes; but what can be done if they do not?
Let us suppose that in country XY, the code STP was used to represent Bolton's Catalogue of Scientific and Technical Periodicals. In country WZ, the code STP was used to represent Kronick's Scientific and Technical Periodicals, and the code CSTP was used to represent Bolton's Catalogue of Scientific and Technical Periodicals.
The problem can be solved if all local codes of each country (or each project) are tagged with an additional piece of information identifying the country (or project) from which the information came. For instance, suppose that we identify the whole database coming from country XY by the code XYDB, and the database coming from country WZ by the code WZDB. Now, all codes used in XYDB should be renamed as XYDB+code, and all codes coming from WZDB should be renamed as WZDB+code. Thus, the code that was identical in both databases (STP) will become XYDB+STP in the first one, and WZDB+STP in the second one.
Now, each entry in the main databases will point to the correct source
of information, and no conflict will arise. Of course, the same source
of information will have several different codes, but this will not create
any conflict or misunderstanding.
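The renaming step can be sketched in Python. The record layout, the helper names and the sample entries ("Periodical 1", "Periodical 2") are hypothetical; the source codes and descriptions follow the XY/WZ example above. The key point is that the auxiliary codes and the links stored in the main entries are prefixed in the same pass, so every link keeps pointing at the right source.

```python
def namespace(db_code: str, code: str) -> str:
    """Prefix a local code with the code of the database it came from."""
    return f"{db_code}+{code}"


def merge(databases: dict[str, dict]) -> dict:
    """Merge several national databases without code collisions.

    Each database is {"sources": {code: description}, "entries": [{..., "source": code}]}
    (a hypothetical layout).
    """
    sources: dict[str, str] = {}
    entries: list[dict] = []
    for db_code, db in databases.items():
        for code, description in db["sources"].items():
            sources[namespace(db_code, code)] = description
        for entry in db["entries"]:
            # Rewrite the link so it still points at the renamed source.
            entries.append({**entry, "source": namespace(db_code, entry["source"])})
    return {"sources": sources, "entries": entries}


xy = {"sources": {"STP": "Bolton, Catalogue of Scientific and Technical Periodicals"},
      "entries": [{"title": "Periodical 1", "source": "STP"}]}
wz = {"sources": {"STP": "Kronick, Scientific and Technical Periodicals",
                  "CSTP": "Bolton, Catalogue of Scientific and Technical Periodicals"},
      "entries": [{"title": "Periodical 2", "source": "STP"}]}

merged = merge({"XYDB": xy, "WZDB": wz})
for e in merged["entries"]:
    print(e["title"], "->", merged["sources"][e["source"]])
# Periodical 1 -> Bolton, Catalogue of Scientific and Technical Periodicals
# Periodical 2 -> Kronick, Scientific and Technical Periodicals
```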