Technical documentation
Main ant file: "hdoc_to_dokiel.ant"
This file is the entry point of the converter. Here's how it works:
First, the script deletes any temporary files left over from a previous run, in case that run failed.
Second, it creates and copies the files the conversion needs, such as the required .wspmeta file, and unzips the hdoc file.
Then comes the most interesting part of the script:
It runs the XSLT stylesheet find_content.xsl, which locates the content.xml of the hdoc file from the path given in the container.xml file and generates a second Ant file.
The same stylesheet also generates another Ant script that copies all the resource files of the hdoc file.
Next, it runs the two generated scripts: the first runs the main XSLT stylesheet, and the second copies the media files.
These scripts are named temp.xml and find_media.ant.
Finally, it zips the archive and deletes all the temporary files.
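The sequence above can be sketched as a single Ant target. This is only an illustrative outline, not the actual content of hdoc_to_dokiel.ant: every property value, directory name and output file name is a hypothetical placeholder, and the location META-INF/container.xml inside the unzipped hdoc is an assumption.

```xml
<project name="hdoc_to_dokiel_sketch" default="convert" basedir=".">
  <!-- All values below are hypothetical placeholders. -->
  <property name="input" value="input/sample.hdoc"/>
  <property name="tmp" value="tmp"/>
  <property name="result" value="result"/>
  <property name="saxon.jar" value="lib/saxon.jar"/>

  <target name="convert">
    <!-- 1. Remove leftovers from a previous (possibly failed) run. -->
    <delete dir="${tmp}" quiet="true"/>
    <mkdir dir="${tmp}/hdoc"/>
    <mkdir dir="${tmp}/scar"/>
    <mkdir dir="${result}"/>

    <!-- 2. Copy the needed files and unpack the hdoc archive. -->
    <copy file="templates/.wspmeta" todir="${tmp}/scar"/>
    <unzip src="${input}" dest="${tmp}/hdoc"/>

    <!-- 3. Generate the intermediate Ant script with an XSLT 2.0 engine (Saxon). -->
    <xslt in="${tmp}/hdoc/META-INF/container.xml"
          out="${tmp}/temp.xml"
          style="xsl/find_content.xsl"
          classpath="${saxon.jar}">
      <factory name="net.sf.saxon.TransformerFactoryImpl"/>
    </xslt>

    <!-- 4. Run the generated scripts: main transformation, then media copy. -->
    <ant antfile="${tmp}/temp.xml"/>
    <ant antfile="${tmp}/find_media.ant"/>

    <!-- 5. Zip the result and remove the temporary files. -->
    <zip destfile="${result}/output.scar" basedir="${tmp}/scar"/>
    <delete dir="${tmp}"/>
  </target>
</project>
```

The nested factory element is what makes the Ant xslt task use Saxon instead of the default XSLT 1.0 processor.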
Find the content of the hdoc file: "find_content.xsl"
This stylesheet, which generates an Ant file, has two aims:
First, it finds the correct path to the content of the hdoc file and prepares the execution of the main XSLT stylesheet, hdocToDokiel.xsl.
Then it runs find_media.xsl, which creates an Ant file that copies all the media resources of the hdoc file.
Both transformations require XSLT 2.0 because they use the <xsl:result-document> instruction and the tokenize function, as described below.
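As a rough illustration of the idea, a stylesheet applied to container.xml can read the content path and write an Ant project as its output. The sketch below is an assumption about how this could look, not a copy of find_content.xsl: the rootfile element with a full-path attribute, the directory layout and the output file names are all hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- Assumption: container.xml holds a rootfile element whose full-path
         attribute points to the content file. *:rootfile matches the
         element in any namespace (XPath 2.0 wildcard). -->
    <xsl:variable name="content" select="(//*:rootfile/@full-path)[1]"/>

    <!-- The literal elements below become the generated Ant file. -->
    <project name="generated" default="main">
      <target name="main">
        <!-- Run the main stylesheet on the located content file. -->
        <xslt in="tmp/hdoc/{$content}" out="tmp/scar/main.xml"
              style="xsl/hdocToDokiel.xsl">
          <factory name="net.sf.saxon.TransformerFactoryImpl"/>
        </xslt>
        <!-- Build the Ant script that will copy the media files. -->
        <xslt in="tmp/hdoc/{$content}" out="tmp/find_media.ant"
              style="xsl/find_media.xsl">
          <factory name="net.sf.saxon.TransformerFactoryImpl"/>
        </xslt>
      </target>
    </project>
  </xsl:template>
</xsl:stylesheet>
```

Note that curly braces in literal attributes are evaluated by XSLT, so an Ant property reference such as ${tmp} would have to be written ${{tmp}} in the stylesheet to survive into the generated file.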
Copy the media files: "find_media.xsl"
This stylesheet creates an Ant script that copies the media files, such as pictures, into the scar folder.
The most interesting expression in this script is tokenize(@src,'/')[last()]. Whenever the script finds a media file, such as a picture, it reads the path held in its "src" attribute and keeps only the last segment of that path (the name of the file). It then copies this file to the root of the & folder of the resulting archive. Regardless of the format of the path (absolute or relative), it always matches the right file and copies it to the right folder.
tokenize is an XSLT 2.0 function, so we force Ant to use the Saxon engine.
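For example, tokenize('images/diagrams/flow.png', '/')[last()] returns flow.png, and a path without any slash is returned unchanged. A minimal sketch of a stylesheet built around this expression might look like the following; the tmp/hdoc and tmp/scar directories are hypothetical, and selecting every element that carries a src attribute is an assumption about how media are detected.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- The literal elements below become the generated Ant script. -->
    <project name="copy_media" default="copy">
      <target name="copy">
        <!-- One copy task per element that references a media file. -->
        <xsl:for-each select="//*[@src]">
          <!-- Source: the path as written in the hdoc content.
               Destination: only the file name, placed at the root of the
               "&" folder (&amp; is the XML escape for "&"). -->
          <copy file="tmp/hdoc/{@src}"
                tofile="tmp/scar/&amp;/{tokenize(@src, '/')[last()]}"/>
        </xsl:for-each>
      </target>
    </project>
  </xsl:template>
</xsl:stylesheet>
```

Because only the last path segment is kept, two media files with the same name in different source folders would end up at the same destination in this sketch.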
The main XSLT: "hdocToDokiel.xsl"
This stylesheet uses an XSLT 2.0 instruction, <xsl:result-document>, which creates a new document with a given name each time Dokiel needs an item to be externalized. Unlike the other Scenari formats, Dokiel externalizes almost all of its items.
This instruction makes the whole converter lighter and the conversion faster. On the other hand, I spent too much time configuring the Saxon engine...
To obtain a unique identifier for every section that needs to be externalized, I also use the XPath function count on the preceding and ancestor nodes.
I also reuse the tokenize function to retrieve the correct path of the media files.
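The two techniques can be combined as in the sketch below, which is an illustration under assumptions rather than an excerpt of hdocToDokiel.xsl: the output element names (part, title), the file name pattern and the header/h1 title lookup are hypothetical. Counting the section ancestors and the sections that precede the current one gives its position in document order, so each section gets a distinct number.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- *:section matches section elements in any namespace. -->
  <xsl:template match="*:section">
    <!-- Every section that starts before this one is either an ancestor
         or a preceding node, so the sum is unique for each section. -->
    <xsl:variable name="id"
        select="count(ancestor::*:section) + count(preceding::*:section) + 1"/>

    <!-- Write this section to its own file (hypothetical naming scheme). -->
    <xsl:result-document href="section_{$id}.xml" method="xml" indent="yes">
      <part>
        <title><xsl:value-of select="(*:header/*:h1)[1]"/></title>
      </part>
    </xsl:result-document>

    <!-- Nested sections are externalized in the same way. -->
    <xsl:apply-templates select="*:section"/>
  </xsl:template>

  <!-- Keep stray text out of the primary output. -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

With a section A that contains B and C, followed by a sibling section D, the identifiers are 1 for A, 2 for B, 3 for C and 4 for D.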