Rolf Brugger
October 1994
This article presents a generator for SGML marked-up ASCII text. It has been integrated into the OSCAR-II prototype (hereafter referred to as the document recognition module), which is part of a project for structural optical recognition of printed documents. The module has been used in the HIPOCAMPE project [5], whose goal was the realization of a prototype of an interactive computer-assisted instruction system.
The document recognition module plays an important role in the HIPOCAMPE system. Its task is to recognize two kinds of information from a printed and scanned document. It recognizes the logical structure by using segmentation and character and font recognition. Merging together these two intermediate results leads to the specific logical structure of a document which is the result used by the following modules.
This section briefly explains how the document recognition module can be decomposed into submodules, as shown in Figure 1. The root box symbolizes the generic logical document structure. It describes in a generic way the logical structure of the documents that the system should potentially be able to recognize. Here, the generic logical structure is formalized in a document type definition (DTD) according to the SGML standard. It describes, for example, the set of all possible chapters of a textbook.
The given DTD must then be translated by hand into another formalism called the document description. The document description was considered the more appropriate generic formalism for the OSCAR-II recognition approach. In Section 3.2 we explain how a DTD can be translated correctly into a document description.
The document description is a human-readable and editable ASCII file. In order to be processable by the analyzer, the document description has to be transformed into a finite state automaton in a machine-readable binary form.
The main data stream that leads to the analyzer is the actual document image to be processed. A printed document is scanned and transformed to a digital image format. Three different processes extract OCR, OFR and segmentation information from the image. The OCR (optical character recognition) process recognizes the pure textual information consisting of letters, digits and special characters. The OFR (optical font recognition) process recognizes font information of the text like font family, font size and font style. Finally, the segmentation process recognizes the position in the document and the shape of the envelope of entire text blocks, text lines, figures etc.
These intermediate results are then passed to the analyzer, which is parameterized by a document description represented as an automaton. The analyzer, in brief, is an error-tolerant parser that scans the OCR data while simultaneously taking into account the OFR and segmentation data. This matching of the actual document to a document description results in the specific logical structure of the actual document. Typically, the specific structure is represented as a tree where the nodes correspond to nonterminal symbols of the document description and the leaves correspond to the actual characters of the recognized text.
The specific document structure can then be postprocessed in two different ways: either its tree structure is visualized on the screen, or it is transformed into an ASCII text document enriched by SGML tags. As the SGML tags have to correspond to the DTD, a relation between the document description and the SGML entities has to be established. This relation is defined in the translation table, which parameterizes the specific-structure-to-SGML translator. The translation table has to be created by hand. The relations between DTD, document description and translation table, and the SGML generator algorithm, are documented in detail in Section 3, and the editing of a translation table in Section 4.
As explained in the previous section, the document recognition module (more precisely, the analyzer) is parameterized by a generic description of the document class to be recognized. On the one hand, it defines the entities that may exist in a document and how they can be arranged logically. On the other hand, it defines the physical appearance of these entities or, in other words, how they are typeset in a printed document. These two aspects of a document class description are called the generic logical structure and the generic physical structure.
The formalism used to describe the generic logical structure was proposed by Ingold [3]. It is a grammar expressed in an EBNF-like notation (refer to Figure 2). A nonterminal symbol can be composed of other symbols using lists (s t), iterations ({s}), alternatives (s | t) or options ([s]). Parentheses can be used to group subexpressions in an expression that would otherwise be ambiguous.
Hipsix: DOC => Chapter;
Chapter: PRT => ChapTitle {SectOne};
ChapTitle: FRG => LevelOneNum {Word};
LevelOneNum: STR => FDigit Period;
Word: STR => {Letter};
SectOne: PRT => SectOneTitle SectOneCont;
SectOneTitle: FRG => LevelTwoNum {Word};
SectOneCont: PRT => MainText | {SectTwo};
The generic physical structure is described by attributes associated to the symbols of the above grammar. As can be seen in Figure 3 the attributes specify typographical parameters like alignment, font, line height, line distance, margins etc.
ChapTitle.zone = main;
ChapTitle.alignment = (Allowed, Leftadjusted, [ -3 pt, 3 pt], [-3 pt, 3 pt], [-3 pt, 3 pt]);
ChapTitle.lineHeight = [18pt, 18pt];
ChapTitle.spaceBefore = (Obligatory, [0 pt, 100pt]);
ChapTitle.interSpace = (Forbidden, [5 pt, 25 pt]);
ChapTitle.font = (Times, 18pt, Bold, Roman);
SectOne.zone = main;
SectOne.alignment = (Allowed, Justified, [-3 pt, 3 pt], [-3 pt, 3 pt], [-3 pt, 3 pt]);
SectOne.lineHeight = [10 pt, 10pt];
SectOne.spaceBefore = (Allowed, [0 pt, 50pt]);
SectOne.interSpace = (Allowed, [0 pt, 4 pt]);
SectOne.font = (Times, 10pt, Normal, Roman);
Because of the EBNF-like notation that has been adopted, the formalism could in principle describe context-free languages. However, by limiting the document description to non-recursive production rules, it can also be formalized by regular expressions. That the document description covers only regular languages is an important property, because it guarantees that an automaton can be created for the parsing process.
The Standard Generalized Markup Language (SGML) is a widely adopted document description standard [4]. In the SGML approach, the ASCII text of a specific document is logically structured by ASCII markups. The markups themselves and how they can be arranged are defined in the document type definition (DTD), which corresponds to the generic document structure of a document class. Figure 4 shows an example of the DTD for a very simple document class and an instance of it.
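The following Python sketch illustrates why non-recursive rules can be formalized by regular expressions: every symbol reference is expanded in place until only character sets remain. The tuple encoding of the rules, the function name to_regex and the reading of an iteration {s} as "one or more" are assumptions made for this illustration; they are not part of the OSCAR-II implementation, and physical attributes such as separators between words are ignored.

```python
import re
import string

# Hypothetical rule encoding, for illustration only:
#   ("sym", name)    reference to another symbol
#   ("chars", s)     terminal: any single character of s
#   ("seq", e1, ...) list, ("alt", e1, ...) alternatives,
#   ("iter", e)      iteration (assumed: one or more), ("opt", e) option

def to_regex(expr, rules, active=()):
    """Expand an expression into a regular expression string.

    'active' holds the symbols on the current expansion path, so a rule
    that refers to itself (directly or indirectly) is rejected.
    """
    kind = expr[0]
    if kind == "chars":
        return "[" + re.escape(expr[1]) + "]"
    if kind == "sym":
        name = expr[1]
        if name in active:
            raise ValueError("recursive production rule: " + name)
        return to_regex(rules[name], rules, active + (name,))
    parts = [to_regex(e, rules, active) for e in expr[1:]]
    if kind == "seq":
        return "".join(parts)
    if kind == "alt":
        return "(?:" + "|".join(parts) + ")"
    if kind == "iter":
        return "(?:" + parts[0] + ")+"
    if kind == "opt":
        return "(?:" + parts[0] + ")?"
    raise ValueError("unknown expression kind: " + kind)

# A few rules modelled on the document description of Figure 2.
RULES = {
    "FDigit": ("chars", string.digits),
    "Period": ("chars", "."),
    "Letter": ("chars", string.ascii_letters),
    "LevelOneNum": ("seq", ("sym", "FDigit"), ("sym", "Period")),
    "Word": ("iter", ("sym", "Letter")),
}

pattern = to_regex(("sym", "LevelOneNum"), RULES)
print(bool(re.fullmatch(pattern, "6.")))
```

A recursive rule makes the expansion fail with a ValueError instead of looping forever, which mirrors the restriction stated above.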
SGML defines three types of markups: elements, entities and attributes.
In Section 2.1.1 we already pointed out that the document recognition module is based on a proprietary document description. Therefore, the DTD has to be translated into the document description format. This translation would be very difficult to automate because the two formalisms differ in expressiveness, so we decided to do the translation by hand. This has two consequences: first, only a subset of the SGML expressiveness can be used for the DTD, and second, all information that is used by the document recognition module but not defined in the DTD must be added by the user.
The document description is then passed to the recognition module. Its output is a tree structure where the nodes correspond to the nonterminal symbols of the document description and where the leaves contain the actual text information. In order to be able to transform this structure to an SGML text the node labels (which are identical to the nonterminal symbol's identifier) have to be translated to the respective SGML identifiers. This is what the translation table is used for: it lists all document description identifiers that can appear in a specific document and associates to each of them the corresponding SGML identifier (Figure 5).
In order to make the SGML text generator and the document recognizer work correctly several restrictions on the DTD and document description have to be respected:
In this section we will explain in detail the algorithm of the SGML text generator. It consists of two parts: first, the SGML text is created and second, some simple formatting operations are applied on it, leading to the marked up ASCII file.
SGML formatted documents contain pure text that is structured by three types of markups -- elements, entities and attributes. As the document recognizer cannot generate attributed logical elements it is not possible to generate SGML attributes. Thus the SGML text generator must be able to create elements, entities and the characters of the text. The main principle is to traverse the document tree in a depth first manner. Whenever a node is encountered the appropriate character or SGML tag is created (Figure 6).
For an element, an opening tag is generated on node entry (e.g. <paragraph>) and a closing tag on node exit (e.g. </paragraph>). When a node is encountered, its corresponding SGML tag is written directly to the result file.
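The traversal described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Ada implementation: the Node class, the emit function and the (name, is_entity) table layout are hypothetical. We further assume that a node without a translation entry produces no tag while its children are still visited, and that an entity reference replaces the whole subtree of its node. Line-break formatting and word spacing are left out.

```python
import io
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                                     # document description identifier
    children: list = field(default_factory=list)
    char: str = ""                                 # set only on leaves

def emit(node, table, out):
    """Write the SGML text for 'node' to 'out', depth first."""
    if node.char:                    # leaf: the recognized character itself
        out.write(node.char)
        return
    entry = table.get(node.label)    # None if the label is not translated
    if entry is not None:
        name, is_entity = entry
        if is_entity:                # entities are written on node entry only
            out.write("&" + name)
            return                   # assumption: the subtree is not written
        out.write("<" + name + ">")  # node entry: opening tag
    for child in node.children:
        emit(child, table, out)
    if entry is not None:
        out.write("</" + name + ">") # node exit: closing tag

# Hypothetical excerpt of a translation table: label -> (name, is_entity)
TABLE = {
    "ChapTitle": ("chapti", False),
    "LevelOneNum": ("chapnu", False),
    "ExpeTitleKey": ("kwdexp", True),
}

tree = Node("ChapTitle", [
    Node("LevelOneNum", [Node("FDigit", char="6"), Node("Period", char=".")]),
    Node("Word", [Node("Letter", char="T"), Node("Letter", char="h")]),
])

out = io.StringIO()
emit(tree, TABLE, out)
print(out.getvalue())   # <chapti><chapnu>6.</chapnu>Th</chapti>
```

Writing to a stream rather than building a string follows the remark that tags are written directly to the result file as nodes are encountered.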
The following formatting operations are applied while the SGML text generation proceeds:
The translation table is an ASCII file that can be edited with standard text editors. It is interpreted line by line and provides four line types: comments, empty lines, element translations and entity translations.
A formatting instruction is one of the keywords bs, as, be or ae. They are used to insert line breaks in the resulting SGML text before an opening tag (bs), after an opening tag (as), before a closing tag (be) or after a closing tag (ae). As SGML entities are only generated at a node entry, the instructions be and ae would be ignored for them. Formatting instructions may be listed in any order.
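As a minimal sketch of how the four formatting instructions described above could act on the generated text, the two hypothetical Python helpers below build the markup written at node entry and at node exit. The function names and the representation of the instructions as a set of keywords are assumptions for illustration. The sketch does not merge consecutive line breaks, which a real generator may well do.

```python
def start_markup(name, instr, is_entity=False):
    """Markup written on node entry, with optional line breaks (bs, as)."""
    text = "&" + name if is_entity else "<" + name + ">"
    before = "\n" if "bs" in instr else ""
    after = "\n" if "as" in instr else ""
    return before + text + after

def end_markup(name, instr, is_entity=False):
    """Markup written on node exit, with optional line breaks (be, ae)."""
    if is_entity:
        return ""    # entities have no closing part, so be and ae are ignored
    before = "\n" if "be" in instr else ""
    after = "\n" if "ae" in instr else ""
    return before + "</" + name + ">" + after

# 'chapti bs ae' puts the whole title element on a line of its own.
print(repr(start_markup("chapti", {"bs", "ae"})))
print(repr(end_markup("chapti", {"bs", "ae"})))
```

Because the instructions are kept in a set, the order in which they are listed in a translation line does not matter.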
Note that all identifiers are interpreted in a case-sensitive way. The complete syntax of the translation table is listed in Figure 7. An example can be found in Appendix A.4.
TransTab ::= {TransLine <CR>}
TransLine ::= Comment | Empty | ElementTrans | EntityTrans
Comment ::= '#'{Char}
Empty ::= {<TAB>| }
ElementTrans ::= Ident Ident FormInstr
EntityTrans ::= Ident '&'Ident FormInstr
FormInstr ::= {'bs' | 'as' | 'be' | 'ae'}
Ident ::= Letter {Letter | Digit}
Letter ::= {'a' | .. | 'z' | 'A' | .. | 'Z'}
Digit ::= {'0' | .. | '9'}
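A reader for this syntax can be sketched in a few lines of Python. The sketch below is an illustration only and is unrelated to the read_transl_params procedure of the Ada package: the function name, the returned tuple layout and the error messages are assumptions. It is slightly more lenient than the grammar in that it accepts white space before a comment character.

```python
import re

IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]*")   # Ident ::= Letter {Letter | Digit}
FORMAT_KEYWORDS = {"bs", "as", "be", "ae"}

def parse_translation_table(text):
    """Map each document description identifier to
    (SGML identifier, is_entity, set of formatting instructions)."""
    table = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue                              # Empty or Comment line
        if len(fields) < 2:
            raise ValueError(f"line {lineno}: two identifiers expected")
        ld_ident, target, *instr = fields
        is_entity = target.startswith("&")        # EntityTrans uses '&'Ident
        sgml_ident = target[1:] if is_entity else target
        if not (IDENT.fullmatch(ld_ident) and IDENT.fullmatch(sgml_ident)):
            raise ValueError(f"line {lineno}: malformed identifier")
        unknown = [word for word in instr if word not in FORMAT_KEYWORDS]
        if unknown:
            raise ValueError(f"line {lineno}: unknown instruction {unknown[0]}")
        table[ld_ident] = (sgml_ident, is_entity, frozenset(instr))
    return table

SAMPLE = """\
# ld-tag sgml-tag formatting instructions
Chapter chap bs as be ae
ChapTitle chapti bs ae

ExpeTitleKey &kwdexp
"""

table = parse_translation_table(SAMPLE)
print(table["ChapTitle"])
```

Identifiers are compared exactly as written, so the lookup is case sensitive.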
The first step is to list all document description identifiers (all nonterminal symbols of the document description). Then each document description identifier is associated with the semantically corresponding SGML element or entity. Several document description identifiers may be associated with one SGML identifier, but not vice versa. This is the reason why a document description identifier must not appear more than once in a translation line. If a document description identifier need not be translated, the corresponding translation line can simply be omitted.
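The many-to-one rule can be checked mechanically before a table is used. The following Python sketch is a hypothetical helper, not part of the described system; it assumes the table is available as a list of (document description identifier, SGML identifier) pairs and reports every description identifier that is translated more than once.

```python
from collections import Counter

def duplicated_identifiers(pairs):
    """Return the document description identifiers that occur in more
    than one (description identifier, SGML identifier) pair."""
    counts = Counter(ld_ident for ld_ident, _ in pairs)
    return sorted(ident for ident, n in counts.items() if n > 1)

# Several description identifiers sharing one SGML identifier is allowed.
pairs = [("ParaSta", "parabeg"), ("ParaSpe", "parabeg"), ("RefOne", "ref")]
print(duplicated_identifiers(pairs))

# One description identifier with two translations is not.
print(duplicated_identifiers(pairs + [("RefOne", "quotat")]))
```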
For an example refer to the translation table in Appendix A.4 which corresponds to the DTD in Appendix A.2 and the document description in Appendix A.3.
<!DOCTYPE Book
[<!ENTITY title "" -- -->
<!ENTITY digit "" --0,1,2,3,4,5,6,7,8,9 -->
<!ENTITY period "" -- -->
<!ENTITY word "" -- -->
<!ENTITY chemelmt "" --example: C4H12O6; NaCl,4(H2O); Na(+); Cl(-); Cu(2+)-->
<!ENTITY chemform "" -- 2(NaCl) + 8(H2O) --> 2(NaCl,4(H2O)) -->
<!ENTITY mathequ "" -- bitmap -->
<!ENTITY chapnum "" -- (%digit)+, %period, (%digit)+ -->
<!ENTITY superscript "" -- reference to a footnote -->
<!ENTITY bigperiod "" -- . -->
<!ENTITY openquot "" -- `` -->
<!ENTITY punct "" -- any other punctuation -->
<!ENTITY closquot "" -- '' -->
<!ENTITY kwdexp "Experience" -- -->
<!ENTITY bitmap "" -- -->
<!ENTITY kwdtabl "Table" -- -->
<!ENTITY kwdfig "Fig." -- -->
<!ELEMENT chap - - (chapti, (maintext)?, sect+) >
<!ELEMENT chapti - - (chapnu, %title) >
<!ELEMENT chapnu - - ((%digit)+, %period) >
<!ELEMENT maintext - - ((parabeg | parafol | exp | tabl | fig | topic)+) >
<!ELEMENT parabeg - - ((%word | %punct | ref | %chemelmt | list | %chemform | >
<!ELEMENT ref - - (%chapnum | %superscript) >
<!ELEMENT list - - (%bigperiod, listitem) >
<!ELEMENT listitem - - ((%word | %punct | ref | %chemelmt | %chemform | >
<!ELEMENT keyexpr - - ((%word)+) >
<!ELEMENT keysent - - ((%word | %punct | ref | %chemelmt | %mathequ)+) >
<!ELEMENT quotat - - (%openquot, (%word | %punct)+, %closquot) >
<!ELEMENT parafol - - ((%word | %punct | ref | %chemelmt | list | %chemform |
%mathequ | keyexpr | keysent | quotation)+) >
<!ELEMENT exp - - (expti, artwork?, expdesc) >
<!ELEMENT expti - - (%kwdexp, expnu, %title) >
<!ELEMENT expnu - - ((%digit)+, %period, (%digit)+) >
<!ELEMENT artwork - - (%bitmap) >
<!ELEMENT expdesc - - ((parabeg, parafol)+) >
<!ELEMENT tabl - - (tablti, tablcont) >
<!ELEMENT tablti - - (%kwdtabl, tablnu, %title) >
<!ELEMENT tablenu - - ((%digit)+, %period, (%digit)+) >
<!ELEMENT tablcont - - (%bitmap) >
<!ELEMENT fig - - (artwork, figti) >
<!ELEMENT figti - - (%kwdfig, fignu, %title) >
<!ELEMENT fignu - - ((%digit)+, %period, (%digit)+) >
<!ELEMENT topic - - (topicti, (maintext)+) >
<!ELEMENT topicti - - (%title) >
<!ELEMENT sect - - (sectti, (maintext)*, susect*) >
<!ELEMENT sectti - - (sectnu, %title) >
<!ELEMENT sectnu - - ((%digit)+, %period, (%digit)+) >
<!ELEMENT susect - - (susectti, (maintext)*) >
<!ELEMENT susectti - - (susectnu, %title) >
<!ELEMENT susectnu - - ((%digit)+, %period, (%digit)+, %period, (%digit)+) >
]>
Hipsix: DOC => Chapter;
Chapter: PRT => ChapTitle {SectOne};
ChapTitle.zone = main;
ChapTitle.alignment = (Allowed, Leftadjusted, [ -3 pt, 3 pt],
[-3 pt, 3 pt], [-3 pt, 3 pt]);
ChapTitle.lineHeight = [18pt, 18pt];
ChapTitle.spaceBefore = (Obligatory, [0 pt, 100pt]);
ChapTitle.interSpace = (Forbidden, [5 pt, 25 pt]);
ChapTitle.font = (Times, 18pt, Bold, Roman);
SectOne.zone = main;
SectOne.alignment = (Allowed, Justified, [-3 pt, 3 pt],
[-3 pt, 3 pt], [-3 pt, 3 pt]);
SectOne.lineHeight = [10 pt, 10pt];
SectOne.spaceBefore = (Allowed, [0 pt, 50pt]);
SectOne.interSpace = (Allowed, [0 pt, 4 pt]);
SectOne.font = (Times, 10pt, Normal, Roman);
ChapTitle: FRG => LevelOneNum {Word};
ChapTitle.separBefore = (Allowed, [7 pt, 30 pt]);
LevelOneNum: STR => FDigit Period;
FDigit.cand = {"0" | "1"| "2"| "3"| "4"| "5"| "6"| "7"| "8"| "9"};
FDigit.font = (@, @, @, @);
FDigit.separBefore = (@, [@, @]);
Period.cand = {"."};
Period.font = (@, @, @, @);
Period.separBefore = (Forbidden, [0 pt, 2 pt]);
RefOne: STR => FDigit Period;
Word: STR => {Letter};
Letter.cand = {"A"|"B"|"C"|"D"|"E"|"F"|"G"|"H"|"I"|"J"|"K"|"L"|
"M"|"N"|"O"|"P"|"Q"|"R"|"S"|"T"|"U"|"V"|"W"|"X"|"Y"|"Z"|
"a"|"b"|"c"|"d"|"e"|"f"|"g"|"h"|"i"|"j"|"k"|"l"|
"m"|"n"|"o"|"p"|"q"|"r"|"s"|"t"|"u"|"v"|"w"|"x"|"y"|"z"|
"a^"|"e^"|"i^"|"o^"|"u^"|"e/"|"e:"|"i:"|"a\"|"e\"|"u\"|"c,"};
Letter.font = (@, @, @, @);
Letter.separBefore = <FST: (@, [@, @]),
NXT: (Forbidden, [0 pt, 2 pt])>;
GWord: STR => {GLetter};
GLetter.cand = {"A"|"B"|"C"|"D"|"E"|"F"|"G"|"H"|"I"|"J"|"K"|"L"|
"M"|"N"|"O"|"P"|"Q"|"R"|"S"|"T"|"U"|"V"|"W"|"X"|"Y"|"Z"|
"a"|"b"|"c"|"d"|"e"|"f"|"g"|"h"|"i"|"j"|"k"|"l"|
"m"|"n"|"o"|"p"|"q"|"r"|"s"|"t"|"u"|"v"|"w"|"x"|"y"|"z"|
"a^"|"e^"|"i^"|"o^"|"u^"|"e/"|"e:"|"i:"|"a\"|"e\"|"u\"|"c,"};
GLetter.font = (@, @, @, @);
GLetter.separBefore = <FST: (Forbidden, [0 pt, 2 pt]),
NXT: (Forbidden, [0 pt, 2 pt])>;
ComWord: STR => {Letter} ((Connection {GLetter})|
(Break BLetter {GLetter}));
Connection.cand = {"'"|"/"};
Connection.font = (@, @, @, @);
Connection.separBefore = (Forbidden, [0pt, 2pt]);
Break.cand = {"-"};
Break.font = (@, @, @, @);
Break.separBefore = (Forbidden, [0pt, 2pt]);
BLetter.cand = {"A"|"B"|"C"|"D"|"E"|"F"|"G"|"H"|"I"|"J"|"K"|"L"|
"M"|"N"|"O"|"P"|"Q"|"R"|"S"|"T"|"U"|"V"|"W"|"X"|"Y"|"Z"|
"a"|"b"|"c"|"d"|"e"|"f"|"g"|"h"|"i"|"j"|"k"|"l"|
"m"|"n"|"o"|"p"|"q"|"r"|"s"|"t"|"u"|"v"|"w"|"x"|"y"|"z"|
"a^"|"e^"|"i^"|"o^"|"u^"|"e/"|"e:"|"i:"|"a\"|"e\"|"u\"|"c,"};
BLetter.font = (@, @, @, @);
BLetter.separBefore = (Obligatory, [1 pt, 30 pt]);
MainText: PRT => {(ParaSta [{(Experience | NonText| Table |
ItemFST | ItemFol) [ParaSpe]}]) |
(NonText ItemFol) | TopicTitle};
Experience.alignment = (Allowed, Justified, [11 pt, 17 pt],
[11 pt, 17 pt], [-3 pt, 3 pt]);
Experience.font = (Times, 8 pt, Normal, Roman);
ParaSta.zone = @;
ParaSta.alignment = (@, @, [@, @], [@, @], [11 pt, 17 pt]);
ParaSta.lineHeight = [10pt, 10pt];
ParaSta.spaceBefore = (Allowed, [1 pt, 50pt]);
ParaSta.interSpace = (Allowed, [1 pt, 4 pt]);
ParaSta.font = (Times, 10pt, Normal, Roman);
ParaSpe.zone = @;
ParaSpe.alignment = (@, @, [@, @], [@, @], [@, @]);
ParaSpe.lineHeight = [10pt, 10pt];
ParaSpe.spaceBefore = (Allowed, [1 pt, 50 pt]);
ParaSpe.interSpace = (Allowed, [1 pt, 4 pt]);
ParaSpe.font = (Times, 10pt, Normal, Roman);
ItemFST.zone = main;
ItemFST.alignment = (Allowed, Justified, [22 pt, 28 pt],
[-3 pt, 3 pt], [-6pt, -12pt]);
ItemFST.lineHeight = [10 pt, 10 pt];
ItemFST.spaceBefore = (Allowed, [2 pt, 20pt]);
ItemFST.interSpace = (Allowed, [1 pt, 4 pt]);
ItemFST.font = (Times, 10 pt, Normal, Roman);
ItemFol.zone = main;
ItemFol.alignment = (Allowed, Justified, [22 pt, 28 pt],
[-3 pt, 3 pt], [-3 pt, 3 pt]);
ItemFol.lineHeight = [10 pt, 10 pt];
ItemFol.spaceBefore = (Allowed, [20 pt, 50pt]);
ItemFol.interSpace = (Allowed, [1 pt, 4 pt]);
ItemFol.font = (Times, 10 pt, Normal, Roman);
NonText.zone = @;
NonText.alignment = (Obligatory, @, [100 pt, 100 pt],
[100 pt, 100 pt], [100 pt, 100 pt]);
NonText.lineHeight = [1000 pt, 1000 pt];
NonText.spaceBefore = (Obligatory, [100 pt, 100 pt]);
NonText.interSpace = (Obligatory, [100 pt, 100 pt]);
NonText.font = (Times, 10pt, Normal, Roman);
NonText: FRG => {Letter} ;
NonText.separBefore = (Obligatory, [100 pt, 100pt]);
ParaSta: FRG => ParaData ;
ParaSta.separBefore = (Allowed, [3pt, 20pt]);
ParaSpe: FRG => ParaData;
ParaSpe.separBefore = (Allowed, [3pt, 20pt]);
ParaData: STR => { Word | ComWord | Number | ComNumber | RefOne |
RefTwo | RefThree | Punction | KeyExpr | KeySent |
MathOpe | MixWord | Phrase };
Punction.cand = {","|"."|"?"|":"|";"};
Punction.font = (@, @, @, @);
Punction.separBefore = <FST: (Forbidden, [0pt, 2pt]),
NXT: (Forbidden, [0pt, 2pt])>;
MathOpe.cand = {"+"|"-"|"*"|"/"|"="};
MathOpe.font = (@, @, @, @);
MathOpe.separBefore = <FST: (Forbidden, [2 pt, 15 pt]),
NXT: (Forbidden, [2 pt, 15 pt])>;
Number: STR => {Digit};
Digit.cand = {"0" | "1"| "2"| "3"| "4"| "5"| "6"| "7"| "8"| "9"};
Digit.font = (@, @, @, @);
Digit.separBefore = <FST: (@, [@, @]),
NXT: (Forbidden, [0 pt, 2 pt])>;
GNumber: STR => {GDigit};
GDigit.cand = {"0" | "1"| "2"| "3"| "4"| "5"| "6"| "7"| "8"| "9"};
GDigit.font = (@, @, @, @);
GDigit.separBefore = <FST: (Forbidden, [0 pt, 2 pt]),
NXT: (Forbidden, [0 pt, 2 pt])>;
ComNumber: STR => {Digit} Comma {GDigit};
Comma.cand = {","};
Comma.font = (@, @, @, @);
Comma.separBefore = (Forbidden, [0 pt, 2 pt]);
LevelTwoNum: STR => FDigit Period GDigit;
RefTwo: STR => FDigit Period GDigit;
LevelThreeNum: STR => FDigit Period GDigit Period GDigit;
RefThree: STR => FDigit Period GDigit Period GDigit;
KeyExpr: STR => {Word};
KeyExpr.font = (@, @, Bold , Italic);
KeySent: STR => {Word};
KeySent.font = (@, @, Normal , Italic);
MixWord: STR => {Letter} (LeftPar|GDigit)
[{GLetter|LeftPar|GDigit|RightPar|Plus}];
LeftPar.cand = {"("};
LeftPar.font = (@, @, @, @);
LeftPar.separBefore = <FST: (Forbidden, [0pt, 2pt]),
NXT: (Forbidden, [0pt, 2pt])>;
RightPar.cand = {")"};
RightPar.font = (@, @, @, @);
RightPar.separBefore = <FST: (Forbidden, [0pt, 2pt]),
NXT: (Forbidden, [0pt, 2pt])>;
Plus.cand = {"+"};
Plus.font = (@, @, @, @);
Plus.separBefore = <FST: (Forbidden, [0pt, 2pt]),
NXT: (Forbidden, [0pt, 2pt])>;
Phrase: STR => PhraseBegin {Word | Punction | Number | MixWord}
PhraseEnd;
PhraseBegin.cand = {"("|"-"};
PhraseBegin.font = (@, @, @, @);
PhraseBegin.separBefore = (@, [@, @]);
\ Allowed, 2, 25 \
PhraseEnd.cand = {")"|"-"};
PhraseEnd.font = (@, @, @, @);
PhraseEnd.separBefore = (Forbidden, [0pt, 2pt]);
Experience: PRT => ExpeTitle [NonText] ExpeDesc;
ExpeTitle.zone = @;
ExpeTitle.alignment = (@, Centered, [11 pt, 17 pt],
[11 pt, 17 pt], [-3 pt, 3 pt]);
ExpeTitle.lineHeight = [8 pt, 8 pt];
ExpeTitle.spaceBefore = (Allowed, [5 pt, 50pt]);
ExpeTitle.interSpace = (Forbidden, [0 pt, 3 pt]);
ExpeTitle.font = (Times, 8pt, Normal, Roman);
ExpeTitle.separBefore = (Allowed, [2pt, 15 pt]);
ExpeTitle: FRG => ExpeTitleKey ExpeTitleNum ExpeTitleText;
ExpeTitleKey.font = (Times, 8 pt, Bold, Roman);
ExpeTitleNum.font = (Times, 8 pt, Bold, Roman);
ExpeTitleKey: STR => CharE Charx Charp {CharEArie}
Charn Charc Chare;
CharE.cand = {"E"};
CharE.font = (@, @, @, @);
CharE.separBefore = (Obligatory, [@, @]);
Charx.cand = {"x"};
Charx.font = (@, @, @, @);
Charx.separBefore = (Forbidden, [0pt, 1pt]);
Charp.cand = {"p"};
Charp.font = (@, @, @, @);
Charp.separBefore = (Forbidden, [0pt, 1pt]);
CharEArie.cand = { "e/"|"r"|"i"|"e"};
CharEArie.font = (@, @, @, @);
CharEArie.separBefore = (Forbidden, [0pt, 1pt]);
Charn.cand = {"n"};
Charn.font = (@, @, @, @);
Charn.separBefore = (Forbidden, [0pt, 1pt]);
Charc.cand = {"c"};
Charc.font = (@, @, @, @);
Charc.separBefore = (Forbidden, [0pt, 1pt]);
Chare.cand = {"e"};
Chare.font = (@, @, @, @);
Chare.separBefore = (Forbidden, [0pt, 1pt]);
ExpeTitleNum: STR => FDigit Period GDigit;
ExpeTitleNum.separBefore = (@, [@, @]);
TableTitleNum: STR => FDigit Period GDigit;
TableTitleNum.separBefore = (@, [@, @]);
ExpeTitleText: STR => {Word | Punction | MixWord | ComWord};
ExpeDesc: PRT => ExpeParaSta [{{ExpeParaSta} | {(NonText
[ExpeParaSpe])} | {ExpeItem}}];
ExpeParaSta.zone = @;
ExpeParaSta.alignment = (@, @, [11 pt, 17 pt], [11 pt, 17 pt],
[11 pt, 17 pt]);
ExpeParaSta.lineHeight = [8 pt, 8 pt];
ExpeParaSta.spaceBefore = (Allowed, [1 pt, 300pt]);
ExpeParaSta.interSpace = (Allowed, [0 pt, 3 pt]);
ExpeParaSta.font = (Times, 8pt, Normal, Roman);
ExpeParaSpe.zone = @;
ExpeParaSpe.alignment = (@, @, [11 pt, 17 pt], [11 pt, 17 pt],
[-3 pt, 3 pt]);
ExpeParaSpe.lineHeight = [8 pt, 8 pt];
ExpeParaSpe.spaceBefore = (Allowed, [1 pt, 300pt]);
ExpeParaSpe.interSpace = (Allowed, [0 pt, 3 pt]);
ExpeParaSpe.font = (Times, 8pt, Normal, Roman);
ExpeItem.zone = @;
ExpeItem.alignment = (@, @, [39 pt, 45 pt], [11 pt, 17 pt],
[-11pt, -17pt]);
ExpeItem.lineHeight = [8 pt, 8 pt];
ExpeItem.spaceBefore = (Allowed, [0 pt, 10pt]);
ExpeItem.interSpace = (Allowed, [0 pt, 3 pt]);
ExpeItem.font = (Times, 8pt, Normal, Roman);
ExpeParaSta: FRG => ParaData;
ExpeParaSta.separBefore = (Allowed, [2pt, 15pt]);
ExpeParaSpe: FRG => ParaData;
ExpeParaSpe.separBefore = (Allowed, [2pt, 15pt]);
ExpeItem: FRG => BigPeriod ParaData;
ExpeItem.separBefore = (Allowed, [2pt, 15pt]);
BigPeriod.cand = {"."};
BigPeriod.font = (@, @, Bold, @);
BigPeriod.separBefore = (Obligatory, [2pt, 8pt]);
Table: PRT => TableTitle NonText;
TableTitle.zone = @;
TableTitle.alignment = (Allowed, Centered, [11 pt, 17 pt],
[11 pt, 17 pt], [-3 pt, 3 pt]);
TableTitle.lineHeight = [8 pt, 8 pt];
TableTitle.spaceBefore = (Allowed, [10 pt, 30pt]);
TableTitle.interSpace = (Forbidden, [0 pt, 3 pt]);
TableTitle.font = (Times, 8 pt, Normal, Roman);
TableTitle.separBefore = (Allowed, [3pt, 15 pt]);
TableTitle: FRG => TableTitleKey TableTitleNum TableTitleText;
TableTitleKey.font = (Times, 8 pt, Bold, Roman);
TableTitleKey: STR => CharT Chara {Charble} Chara Charu
TableTitleNum;
CharT.cand = {"T"};
CharT.font = (@, @, @, @);
CharT.separBefore = (Obligatory, [@, @]);
Chara.cand = {"a"};
Chara.font = (@, @, @, @);
Chara.separBefore = (Forbidden, [0pt, 2pt]);
Charble.cand = {"b"|"l"|"e"};
Charble.font = (@, @, @, @);
Charble.separBefore = (Forbidden, [0pt, 2pt]);
Charu.cand = {"u"};
Charu.font = (@, @, @, @);
Charu.separBefore = (Forbidden, [0pt, 2pt]);
TableTitleText: STR => {Word | Punction};
ItemFST: FRG => BigPeriod ParaData;
ItemFST.separBefore = (Allowed, [3pt, 20 pt]);
ItemFol: FRG => ParaData;
ItemFol.separBefore = (Allowed, [3pt, 20 pt]);
Topic: PRT => TopicTitle ParaSta [ParaSpe];
TopicTitle.zone = main;
TopicTitle.alignment = (Allowed, Leftadjusted, [-3 pt, 3 pt],
[-3 pt, 3 pt], [-3 pt, 3 pt]);
TopicTitle.lineHeight = [10 pt, 10 pt];
TopicTitle.spaceBefore = (Allowed, [10 pt, 25pt]);
TopicTitle.interSpace = (Forbidden, [1 pt, 6 pt]);
TopicTitle.font = (Times, 10pt, Bold, Roman);
TopicTitle: FRG => { Word };
TopicTitle.separBefore = (Allowed, [5pt, 20 pt]);
SectOne: PRT => SectOneTitle SectOneCont;
SectOneTitle.zone = main;
SectOneTitle.alignment = (Allowed, Leftadjusted, [-3 pt, 3 pt],
[-3 pt, 3 pt], [-3 pt, 3 pt]);
SectOneTitle.lineHeight = [12 pt, 12 pt];
SectOneTitle.spaceBefore = (Allowed, [15 pt, 40pt]);
SectOneTitle.interSpace = (Forbidden, [4 pt, 10 pt]);
SectOneTitle.font = (Times, 12pt, Bold, Roman);
SectOneTitle: FRG => LevelTwoNum {Word};
SectOneTitle.separBefore = (Allowed, [6pt, 25 pt]);
SectOneCont: PRT => MainText | {SectTwo};
SectTwo: PRT => SectTwoTitle ( MainText| {Topic});
SectTwoTitle.zone = main;
SectTwoTitle.alignment = (Allowed, Leftadjusted, [-3 pt, 3 pt],
[-3 pt, 3 pt], [-3 pt, 3 pt]);
SectTwoTitle.lineHeight = [10 pt, 10 pt];
SectTwoTitle.spaceBefore = (Allowed, [10 pt, 25pt]);
SectTwoTitle.interSpace = (Forbidden, [1 pt, 6 pt]);
SectTwoTitle.font = (Times, 10pt, Bold, Roman);
SectTwoTitle: FRG => LevelThreeNum {Word};
SectTwoTitle.separBefore = (Allowed, [5pt, 20 pt]);
# This is a test translation table
# Translation from logi-docu-tags to sgml-tags.
#
# ld-tag        sgml-tag    formatting instructions
#--------------------------------------------------------
Chapter         chap        bs as be ae
ChapTitle       chapti      bs ae
LevelOneNum     chapnu
SectOne         sect        bs as be ae
SectOneTitle    sectti      bs ae
LevelTwoNum     sectnu
MainText        maintext
ParaSta         parabeg     bs as be ae
ParaSpe         parabeg     bs as be ae
RefOne          ref
RefTwo          ref
RefThree        ref
Experience      exp         bs as be ae
ExpeTitle       expti       bs ae
ExpeTitleKey    &kwdexp
ExpeTitleNum    expnu
NonText         artwork
ExpeDesc        expdesc     bs as be ae
ExpeParaSta     parabeg     bs as be ae
ExpeParaSpe     parabeg     bs as be ae
SectTwo         susect      bs as be ae
SectTwoTitle    susectti    bs ae
LevelThreeNum   susectnu
Table           tabl        bs as be ae
TableTitle      tablti      bs ae
TableTitleKey   &kwdtabl
TableTitleNum   tablnum
#? list
ItemFST         listitem    bs ae
ItemFol         listitem    bs ae
Topic           topic       bs as be ae
TopicTitle      topicti     bs ae
KeyExpr         keyexpr
KeySent         keysent
Phrase          quotat
PhraseBegin     &openquot
PhraseEnd       &closquot
The ~ characters correspond to unrecognized characters.
<chap>
<chapti> <chapnu>6.</chapnu> Thermodynamique</chapti>
<sect>
<sectti> <sectnu>6.1</sectnu> Introducti~n</sectti>
<maintext>
<parabeg>
Les nombreu~ exemples de r act~ons vues jusqu'ici dans ce~ ouvrage on~ mon~r que l'on peu~ ais men~ e~ u~~lemen~ d cr~re ce qu~ se d roule lors d'une r ac~~on c~m~que au moyen d'une qua~on. Il es~ cependan~ un ph nom ne qui n'es~ pas d cri~ par les qua~ons ~elles que nous les avons cri~es, c'es~ le d gage-men~ ou l'absorp~~on d' nerg~e. Les e~p r~ences <ref>6.1</ref> e~ <ref>6.2</ref> d mon~ren~ que les r ac~~ons c~~m~ques son~ le plus souven~ accompagn es de ph nom nes ~er-~ques.
</parabeg>
<exp>
<expti> &kwdexp <expnu>6.1</expnu> R ~ct~on ~u cu~vre ~vec l'~c~de ~~~que, d~g~~ement de ch~leur.</expti>
<expdesc>
<parabeg>
En f~i~~nt couler d~ l'~c~d~ ~~~qu~ conc~n~ ~u~ d~~ tou~nur~s d~ cu~vr~, on con~t~t~ qu'~l ~~ d~roul~ un~ v~o~~nt~ r~~ct~on: ~e cu~vre ~~ d~s~out ~n donn~nt un~ so~u~on v~~~, ~~ s~ d g~g~ un g~z brun. D'~u~e p~r~, l~ ~~rmom~~~ p~~c~ ~~ns ~e b~~~on ~d~que une bru~que ~u~en~-t~on de te~p~~~tu~e.
</parabeg>
<parabeg>
L~ r ~ct~on est ~epr ~ent e p~ l' qu~~on:
</parabeg>
</expdesc>
</exp>
</maintext>
</sect>
</chap>
--------------------------------------------------------------------------------
--- FRIBOURG UNIVERSITY ---
--- COMPUTER SCIENCE LABORATORY ---
--- Chemin du Musee 3, CH-1700 FRIBOURG, SWITZERLAND ---
--------------------------------------------------------------------------------
--+ TITLE: sgml_output
--+ SUPPORT: Rolf Brugger
--+ CREATION: August 1994
--+ VERSION: of 30.9.94
--------------------------------------------------------------------------------
WITH logical_document_manager; USE logical_document_manager;
WITH long_string; USE long_string;
WITH TABLE_OF_STATIC_KEYS_AND_STATIC_VALUES_G;
PACKAGE sgml_output IS
TYPE tab_entry_type IS RECORD
tagname: v_string; -- sgml: element's generic identifier
nl_bs: BOOLEAN:=FALSE; -- print newline before start-Tag
nl_as: BOOLEAN:=FALSE; -- print newline after start-Tag
nl_be: BOOLEAN:=FALSE; -- print newline before end-Tag
nl_ae: BOOLEAN:=FALSE; -- print newline after end-Tag
END RECORD;
PACKAGE table IS NEW TABLE_OF_STATIC_KEYS_AND_STATIC_VALUES_G(
KEY_TYPE => v_string,
LESS => "<",
EQUALS => "=",
VALUE_TYPE => tab_entry_type);
TYPE transl_table_type IS NEW table.table_type;
--------------------------------------------------------------------------------
PROCEDURE read_transl_params(paramfile:IN STRING;
transl_table: IN OUT transl_table_type);
-- Reads the translation table from the file 'paramfile' and transfers the
-- data to 'transl_table'.
-- The translation table is used to translate logical document tags
-- (-identifiers) to SGML-identifiers.
-- If the file 'paramfile' doesn't exist, an error message will be printed to
-- standard output.
PROCEDURE write_sgml(doc: IN logical_entity_type;
transl_table: IN transl_table_type;
output_file: IN STRING);
-- Traverses the tree 'doc' in a depth first manner.
-- The nodes of the tree are interpreted and translated to an sgml-standard
-- output, that will be written to the file 'output_file'.
-- According to the contents of the translation table 'transl_table' the
-- node-tags are translated to sgml-tags.
--------------------------------------------------------------------------------
END sgml_output;