The works published by Classiques Garnier Digital satisfy the three criteria of «numerical philology» defined in the 1990s by Claude Blum, professor at the Paris Sorbonne University.
1 - The notion of Corpus. The edition of a text in electronic form assumes its full significance only if it is considered as part of a coherent set aiming to be complete (the complete works of a given author, of a period in time, of a literary genre, etc.).
The electronic edition of a work will be constituted by the set of all the original editions of that work published during the author’s lifetime (the electronic edition of Montaigne’s Essays, for example, will thus include all the editions published during the author’s lifetime: 1580, 1582, 1587, 1588 and the edition of the manuscript prepared by Montaigne shortly before his death, the so-called «Bordeaux copy»). If, for reasons of physical impossibility, completeness is unattainable and a selection has to be made among editions published during the author’s lifetime, then we will choose the last edition considered to represent a state of the text corresponding to the author’s intention. In the absence of an edition in the author’s lifetime, we will choose the first posthumous edition. This basic principle may be adapted by the Publishing committee, which may choose, for scientific reasons, an edition other than the last in the author’s lifetime or the first posthumous edition. It is always deemed better to edit the complete set of editions published during the author’s lifetime and/or posthumous editions, as has been done for the works of Montaigne.
2 - Exact, identical reproduction of works, in all their facets. The electronic editor will refrain from any judgement, selection or destruction of data provided by the original work. All signs are regarded as being integral parts of the work and as being significant within that context. The electronic edition, being an edition of our historical heritage, will be constituted according to the rules of «diplomatic editing», with scrupulous respect for the original.
In «numerical philology», the original pagination, for example, is respected, a page of the reference edition corresponding to a screen-page; all the pages, including blank pages, are maintained and digitised. The same goes for all the data on the page: collating zones, type-face (large and small capitals, for example), written forms and spelling (typos will be corrected in order to facilitate electronic searches, but will be indicated in critical notes). Keeping the original pagination is not a trivial choice, since it implies heavy tagging, i.e. a record of each page and not only of the parts and chapters of the work, as in most electronic editions (which therefore exclude any research based on the original pagination, the basis of all research in the academic field).
3 - The electronic edition in text-mode will be associated, by means of a hypertextual link, page by page, to the facsimile edition of the original text (image mode) so that, by a click, the reader can refer back to the visual aspect of the original (this method is now being extended to all our databases).
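As a purely illustrative sketch, criteria 2 and 3 can be pictured as one record per original page, blank pages included, each carrying a link to its facsimile image. The field names, the image identifiers and the helper facsimile_for are hypothetical and are not taken from the publisher's DTD or application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PageRecord:
    edition: str     # reference edition, identified here by its year
    page_label: str  # pagination as printed in the original
    text: str        # transcribed text; empty string for a blank page
    facsimile: str   # hypothetical identifier of the page image


def facsimile_for(pages, edition: str, page_label: str) -> Optional[str]:
    """Return the image reference linked to a given text page, if any."""
    for page in pages:
        if page.edition == edition and page.page_label == page_label:
            return page.facsimile
    return None


# Invented sample data: the second page is blank but still recorded.
pages = [
    PageRecord("1580", "1", "sample text of page 1", "img/1580/0001"),
    PageRecord("1580", "2", "", "img/1580/0002"),
]
```

Because every page has its own record, a search result can be cited by original page number and opened in image mode, which chapter-level tagging alone would not allow.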
Stages of production of a database
1) Constitution of the bibliography by the best specialists in the subject according to the above principles:
- recourse only to original editions;
- the last edition in the author’s lifetime is chosen unless scientific reasons justify another choice;
- in the absence of editions during the author’s lifetime, the first recognised posthumous edition will be chosen.
2) Collection of the original documents indicated in the bibliography, in libraries and private collections world-wide, by digitisation, microfilm or photocopy, respecting royalties.
3) Technical and scientific analysis of samples chosen by teams of specialists.
4) Manual pre-tagging of documents. Definition and highlighting of the structure of the documents and indication of possible problems of reproduction (coding of unusual signs, figures, etc.) by means of coloured overlining and commentary. This work is done manually on paper.
5) Composition of the DTD (Document Type Definition).
6) Keyboarding of the sample texts according to the DTD by specialised teams.
Depending on its difficulty, the text is established by double or triple comparative keyboarding, which keeps errors to a maximum of 1 in 10,000 signs. Scanning is out of the question. Scanning is more or less unreliable for any text and is impossible for pre-19th century texts. Moreover, scanning does not allow the tagging of structural elements and thus excludes complex search-fields. At best, a scanned text can be referenced no more finely than the «chapter».
Scanning, which is an automatic process, takes no account of structural elements and so excludes their manual pre-tagging and their inclusion in the DTD when going into text-mode. Search possibilities are restricted to searching for words. Editors or producers (such as Google, for example) who use scanning techniques (pure scanning or scanning with hidden text) announce a reliability rate of 96%-99% in relation to the original.
Attention should be paid to the meaning of such figures, which may create an illusion of reliability. On an ordinary page of 2,500 signs, such a rate means between 25 and 100 errors. For a dictionary (6,000 signs per page), it means 60-240 errors per page. In other words, scanning with hidden text rules out any reliable textual search and gives results which are not significant. For a database search to be significant, there must be no more than 1 error per 10,000 signs in the humanities, and 1 error in 100,000 signs in legal texts (it is easy to imagine the consequences of faulty texts in searches for jurisprudence or for the history of sentences rendered in courts of justice). To attain such a degree of reliability, there is no alternative to double or triple manual comparative keyboarding of the same text.
7) The basic principle of DTD and keyboarding is the foundation of numerical philology: our edition is exactly identical to the original; it is a «diplomatic edition» of the original.
8) Checking of the text and of its conformity to the DTD with computer tools (parsers) and manual checks.
9) Elaboration of a prototype according to the Classiques Garnier Numérique application (application = multiplatform Babel motor + interface).
10) Computer integration of the sample, then tests and, where needed, adaptation of the DTD and of the developments of the Babel motor.
11) Keyboarding of the whole of the corpus.
12) Quality check of the whole text with computer tools (parsers) and manual checks.
13) Integration of the keyboarded text into the Classiques Garnier Numérique application.
14) Quality tests; correction of texts or of the application (motor and interface) until the tests are entirely positive.
15) The production goes on-line.
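The figures quoted under stage 6 can be checked with a short Python sketch, which also illustrates the idea behind comparative keyboarding: two independent keyings of the same passage are compared, and every disagreement is flagged for a human to resolve against the original. The function names are illustrative, and the use of the standard difflib module is an assumption made for this example; the actual comparison procedure is not described here.

```python
from difflib import SequenceMatcher


def errors_per_page(signs_per_page: int, reliability: float) -> int:
    """Expected number of wrong signs on a page at a given reliability rate."""
    return round(signs_per_page * (1.0 - reliability))


def disagreements(first_keying: str, second_keying: str):
    """List the places where two independent keyings of a text differ.

    Each item is (position in first keying, first reading, second reading).
    An empty list means the two keyings agree sign for sign.
    """
    matcher = SequenceMatcher(None, first_keying, second_keying, autojunk=False)
    return [
        (i1, first_keying[i1:i2], second_keying[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

For a page of 2,500 signs, errors_per_page(2500, 0.96) gives 100 and errors_per_page(2500, 0.99) gives 25, the range quoted above; at the target of 1 error per 10,000 signs the same page would contain, on average, a quarter of an error. Two keyings are unlikely to contain the same mistake at the same place, which is why comparing them catches most errors.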
Search-functions - Principles
The tagging of our databases allows specific searches on all the different parts of a given corpus of texts (pre-texts, texts, post-texts; parts, chapters, etc.), while operators allow the user to refine his/her search in full-text (search for structures, typographical searches, etc.).
All search-fields can be combined thanks to dynamic indices which produce results in real time (one can thus search in which prefaces of works published at a given date in such and such place a given author or theme is mentioned in an epigraph or a quotation).
For each search-field, the user has at his/her disposal, fully spelt out, the complete index corresponding to that field. The entire content of that field can thus be grasped and judgement can be made as to the reliability of the texts proposed (since an index immediately highlights the errors in the texts represented). It should be noted that this instrument is extremely rare in electronic editions.
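The combination of search-fields and the fully listed index can be pictured with a minimal Python sketch. It assumes one small inverted index per field and intersects them at query time; the sample units, field names and functions are invented for illustration and do not describe the Babel motor itself.

```python
from collections import defaultdict

# Invented sample data: one entry per tagged unit of text.
UNITS = [
    {"id": 1, "section": "preface", "year": 1580, "place": "Lyon",
     "text": "au lecteur de bonne foy"},
    {"id": 2, "section": "chapter", "year": 1580, "place": "Lyon",
     "text": "de la tristesse"},
    {"id": 3, "section": "preface", "year": 1588, "place": "Paris",
     "text": "au lecteur de bonne foi"},
]
FIELDS = ("section", "year", "place", "text")


def build_index(units, field):
    """Map every value of a field to the set of unit ids that carry it."""
    index = defaultdict(set)
    for unit in units:
        # The text field is indexed word by word, the others as single values.
        values = unit[field].split() if field == "text" else [unit[field]]
        for value in values:
            index[value].add(unit["id"])
    return index


INDICES = {field: build_index(UNITS, field) for field in FIELDS}


def search(**criteria):
    """Combine fields by intersecting the id sets of each requested value."""
    hits = {unit["id"] for unit in UNITS}
    for field, value in criteria.items():
        hits &= INDICES[field].get(value, set())
    return sorted(hits)


def full_index(field):
    """Spell out a field's complete index as (entry, frequency) pairs."""
    return sorted((str(value), len(ids)) for value, ids in INDICES[field].items())
```

Here search(section="preface", text="lecteur") returns units 1 and 3, and adding year=1588 narrows the result to unit 3. Listing full_index("text") shows «foi» and «foy» as separate entries with one occurrence each, which is the kind of detail, whether a genuine variant or a keying error, that a fully displayed index brings to the reader's attention.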