Chapter 2 Configure MorganaXProc-III

1 Setting the configuration for MorganaXProc-III

Using the command line interface, you can also configure internal features of MorganaXProc-III. These controls can appear anywhere on the command line after the specification of the pipeline to run.

1.1 Setting the catalog resolver

MorganaXProc-III uses XMLResolver (developed by Norm Tovey-Walsh) as its catalog system to resolve resources used in your pipeline. Use -catalogs=uri-or-path to set the initial XML catalogs as a semicolon-separated list of URIs or paths. A relative path or URI is resolved against the current working directory.
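The handling of a -catalogs value described above can be modelled roughly as follows. This is only a sketch, not MorganaXProc-III's actual implementation: the function name resolve_catalogs, the POSIX-style working directory, and the rule "anything with a URI scheme is kept unchanged" are assumptions made for illustration.

```python
import posixpath
from urllib.parse import quote, urlsplit


def resolve_catalogs(value, cwd):
    """Split a -catalogs value and turn every entry into an absolute URI.

    `cwd` stands in for the current working directory (POSIX path assumed).
    """
    resolved = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty entries such as a trailing semicolon
        if urlsplit(entry).scheme:
            # Assumption: an entry with a scheme is already an absolute URI.
            resolved.append(entry)
        else:
            # A path without a scheme is resolved against the working directory.
            path = posixpath.normpath(posixpath.join(cwd, entry))
            resolved.append("file://" + quote(path))
    return resolved


for uri in resolve_catalogs(
    "catalogs/main.xml;https://example.org/catalog.xml", "/home/me/project"
):
    print(uri)
```

The relative entry ends up below the working directory, while the https entry is passed through untouched.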

1.2 Selecting the XSLTConnector

MorganaXProc-III lets you choose which XSLT processor is used in p:xslt. For each supported XSLT processor, MorganaXProc-III provides a connector, which has to be registered before it can be used. Only one connector can be registered for a specific pipeline run. To register a connector, use -xslt-connector=java-classname-or-shortcut. Currently the following XSLT processors are supported:

processor | java classname | shortcut
Saxon 9.9 | com.xml_project.morganaxproc3.saxon99connector.Saxon99XSLTConnector (recommended for use with Saxon 9.9.1.8) | saxon99
Saxon 10 | com.xml_project.morganaxproc3.saxon10connector.Saxon10XSLTConnector (recommended for use with Saxon 10.8) | saxon10
Saxon 11 (please see note below) | com.xml_project.morganaxproc3.saxon11connector.Saxon11XSLTConnector (recommended for use with Saxon 11.4) | saxon11
Saxon 12, 12.1 and 12.2 (please see note below) | com.xml_project.morganaxproc3.saxon12connector.Saxon12XSLTConnector (recommended for use with Saxon 12.2) | saxon12
Saxon 12.3 or later (please see note below) | com.xml_project.morganaxproc3.saxon12_3connector.Saxon12_3XSLTConnector (recommended for use with Saxon 12.3 or later) | saxon12-3
Any JAXP compliant processor | see the explanation following this table | jaxp

Any processor extending javax.xml.transform.TransformerFactory can be used for XSLT 1.0 transformations with p:xslt. The shortcut "jaxp" on its own loads the default TransformerFactory of your Java system. If you want a specific implementation or extension, use jaxp:class-name, where class-name has to be the fully qualified Java name of the extending class. Examples are:

  • org.apache.xalan.processor.TransformerFactoryImpl

  • net.sf.saxon.TransformerFactoryImpl

Of course, you have to make sure that the relevant Java classes are on Java's classpath.
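To make the three accepted forms of the -xslt-connector value concrete (shortcut, Java classname, and jaxp or jaxp:class-name), here is a small model of how such a value could be interpreted. The shortcuts and class names are the ones listed above; the function describe_xslt_connector and its return format are hypothetical and do not reflect MorganaXProc-III's internal code.

```python
# Shortcuts and class names copied from the list of supported processors.
XSLT_CONNECTORS = {
    "saxon99": "com.xml_project.morganaxproc3.saxon99connector.Saxon99XSLTConnector",
    "saxon10": "com.xml_project.morganaxproc3.saxon10connector.Saxon10XSLTConnector",
    "saxon11": "com.xml_project.morganaxproc3.saxon11connector.Saxon11XSLTConnector",
    "saxon12": "com.xml_project.morganaxproc3.saxon12connector.Saxon12XSLTConnector",
    "saxon12-3": "com.xml_project.morganaxproc3.saxon12_3connector.Saxon12_3XSLTConnector",
}


def describe_xslt_connector(value):
    """Return (kind, class_name) for a -xslt-connector value (illustrative only)."""
    if value == "jaxp":
        # Plain "jaxp": the default TransformerFactory of the Java system.
        return ("jaxp", None)
    if value.startswith("jaxp:"):
        class_name = value[len("jaxp:"):]
        if not class_name:
            raise ValueError("jaxp: must be followed by a class name")
        return ("jaxp", class_name)
    # A known shortcut maps to its connector class; anything else is
    # assumed to be a fully qualified connector class name already.
    return ("connector", XSLT_CONNECTORS.get(value, value))


print(describe_xslt_connector("saxon12-3"))
print(describe_xslt_connector("jaxp:net.sf.saxon.TransformerFactoryImpl"))
```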

Saxon 10 is currently the default, so it is used if you do not set an XSLTConnector. Please keep in mind that the requested XSLT processor has to be on the Java classpath too. The easiest way is to drop the JAR file into the folder “MorganaXProc-IIIse_lib” or “MorganaXProc-IIIee_lib”, respectively. As the classes in different versions of an XSLT processor typically overlap, having several versions of the same processor on the classpath is risky in general. MorganaXProc-III will use the first Saxon version found on the classpath.

A further restriction on configuring the XSLTConnector is that MorganaXProc-III always uses the same instance of a Saxon processor for p:xslt and p:xquery. Therefore it is not possible to use Saxon 9.9 for XSLT processing and Saxon 10 for XQuery processing, or the other way around.

Note on using Saxon 11 and later in p:xslt and p:xquery:

The W3C specification requires that if document-uri(D) = U, then doc(U) is D. A consequence of this rule is that two different documents cannot have the same document URI.

Saxon 11 enforces this rule strictly, which has consequences for p:xslt and p:xquery steps in an XProc 3.0 pipeline that runs on an underlying Saxon 11 processor. XProc 3.0 allows two different documents to have the same URI. It even forces different documents to share a URI, because all inline documents (in general) get their base URI from the base URI of the surrounding pipeline. Using such documents as initial match selection, global-context-item, and/or default collection forces Saxon to raise an error.

Norm Tovey-Walsh invented a mechanism to work around this problem by making document URIs unique: each document supplied to Saxon 11 (either via p:xslt or via p:xquery) gets a unique ID by adding a special query parameter to the document's base URI. MorganaXProc-III follows this strategy: the document URI is made unique (where necessary) by adding a query parameter named “xproc_unique” with an increasing integer value.

This seems to be a good solution for now. The query parameter may be added in places where it is not necessary, but time will tell. One place where it is necessary, due to Saxon 11's internal mechanism, is when the same document is provided twice as part of the default collection: Saxon 11 raises an error here, so a query parameter has to be added. This is probably not the best solution, but it works for now, occurs only in a special situation, and does not interfere much with normal operation.
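The “xproc_unique” strategy described in this note can be illustrated with a short sketch. Only the parameter name and the increasing integer value are taken from the text; the class UriUniquifier, the rule "add the parameter when a URI has been supplied before", and the exact shape of the resulting URI are assumptions for illustration and may differ from what MorganaXProc-III actually produces.

```python
from urllib.parse import urlsplit, urlunsplit


class UriUniquifier:
    """Hands out document URIs that never repeat (illustrative model)."""

    def __init__(self):
        self._seen = set()
        self._next_id = 1  # source of the increasing integer value

    def unique(self, uri):
        if uri not in self._seen:
            # First use of this URI: no parameter needed.
            self._seen.add(uri)
            return uri
        parts = urlsplit(uri)
        extra = f"xproc_unique={self._next_id}"
        self._next_id += 1
        # Append to an existing query string, or start a new one.
        query = f"{parts.query}&{extra}" if parts.query else extra
        result = urlunsplit(parts._replace(query=query))
        self._seen.add(result)
        return result


uniquifier = UriUniquifier()
for _ in range(3):
    print(uniquifier.unique("file:///work/pipeline.xpl"))
```

Supplying the same base URI three times yields the plain URI once and two variants that differ only in the value of the query parameter.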

1.3 Selecting the XQueryConnector

With -xquery-connector=java-classname-or-shortcut you can select the XQuery processor to be used in p:xquery, in the same way as an XSLTConnector is selected for p:xslt. Currently the following connectors are supported:

processor | java classname | shortcut
Saxon 9.9 | com.xml_project.morganaxproc3.saxon99connector.Saxon99XQueryConnector (recommended for use with Saxon 9.9.1.8) | saxon99
Saxon 10 | com.xml_project.morganaxproc3.saxon10connector.Saxon10XQueryConnector (recommended for use with Saxon 10.8) | saxon10
Saxon 11 | com.xml_project.morganaxproc3.saxon11connector.Saxon11XQueryConnector (tested with Saxon 11.4) | saxon11
Saxon 12, 12.1 and 12.2 | com.xml_project.morganaxproc3.saxon12connector.Saxon12XQueryConnector (tested with Saxon 12.2) | saxon12
Saxon 12.3 or later | com.xml_project.morganaxproc3.saxon12_3connector.Saxon12_3XQueryConnector (tested with Saxon 12.3 and 12.4) | saxon12-3

Please keep in mind that Saxon 11 and later support only XQuery 3.1. Therefore, using p:xquery with version="1.0" or version="3.0" will raise an error (XC0009: unsupported XQuery version). If you do not specify an explicit version, "3.1" will be used for Saxon 11 and Saxon 12. Please also see the note on using Saxon 11 and later in section 1.2.

Saxon 10 is currently the default. Please see the hints on Saxon as XSLTConnector for further configuration details.

1.4 Loading configuration for XSLT and XQuery processors

Some XSLT and XQuery processors accept configuration files to control their settings. With -xslt-config=uri-or-path and -xquery-config=uri-or-path you can provide such files. MorganaXProc-III does nothing with these files except pass them through to the XSLT or XQuery processor when it is instantiated. The nature of a file and the semantics of its content are therefore completely determined by the receiving product. Please see its documentation.

As stated above, MorganaXProc-III uses only one processor instance if you choose to do both p:xslt and p:xquery with Saxon. If you do so, please make sure to give the same configuration file for -xslt-config and -xquery-config. As only one Saxon instance is created, only one of the two files is used, and it is not predictable whether the instance is created by a p:xslt or a p:xquery step.

1.5 Selecting the Schematron processor

Using -schematron-connector=java-classname-or-shortcut you can select the Schematron processor used in p:validate-with-schematron. Currently the following Schematron processors are supported:

processor | java classname | shortcut
SchXSLT | com.xml_project.morganaxproc3.validation.support.SchXSLTAdapterForSchematron | schxslt
Skeleton XSLT implementation | com.xml_project.morganaxproc3.validation.support.ISOSkeletonAdapterForSchematron | skeleton

The SchXSLT connector is currently the default. Please note that MorganaXProc-III only provides connectors to the two implementations, not the implementations themselves. If you want to use p:validate-with-schematron in your pipeline, you have to download an implementation yourself and make its path known to MorganaXProc-III. For details please see the instructions in section 3.

1.6 Selecting the XML Schema validator

MorganaXProc-III comes with out-of-the-box support for XML Schema 1.0 validation using the Xerces implementation supplied with Java. However, this validator does not know anything about XML Schema 1.1, so if the attribute version on p:validate-with-xml-schema is set to 1.1, you will get an error message saying that the connector is not capable of this type of validation.

MorganaXProc-III offers support for validation with XML Schema 1.1 using either Xerces (xerces-2_12_1-xml-schema-1.1) or Saxon-EE. If you have already installed Saxon-EE for p:xslt or p:xquery, MorganaXProc-III will automatically select it as soon as validation with XML Schema 1.1 is invoked. If you want to use Xerces, download the package from the Xerces website and make it available on MorganaXProc-III's classpath. The JAR files needed are xercesImpl.jar, org.eclipse.wst.xml.xpath2.processor_1.2.0.jar, and cupv10k-runtime.jar.

Caveat:

  • If you use Xerces and Saxon-EE with MorganaXProc-III, the first implementation to appear on the classpath will be used. To control explicitly which validation processor is used, you can supply a command line switch or a configuration file element “schemafactory-impl”. The supplied value must be the fully qualified name of a factory class that provides an implementation of javax.xml.validation.SchemaFactory. For Xerces this is org.apache.xerces.jaxp.validation.XMLSchemaFactory (for Schema 1.0) or org.apache.xerces.jaxp.validation.XMLSchema11Factory (for Schema 1.1); for Saxon-EE the class is named com.saxonica.ee.jaxp.SchemaFactoryImpl. Alternatively, you can supply “Xerces” or “Saxon” as shortcuts. Of course, you can also supply the fully qualified class name of any other class implementing the named interface. Please make sure that the relevant JAR files can be found on the classpath.

  • Due to the internal mechanism of Saxon-EE, the validator will try to resolve additional schemas even if you set p:validate-with-xml-schema's options use-location-hints and/or try-namespaces to false.

Saxon-EE has some features to control aspects of Schema validation. For instance, you can use http://saxon.sf.net/feature/strip-whitespace to control whitespace stripping of the document to be validated. MorganaXProc-III supports these features via the option parameters of p:validate-with-xml-schema. The key has to be in Saxon's feature namespace and its local name is the feature's name, e.g. Q{http://saxon.sf.net/feature}strip-whitespace. These key-value pairs are ignored if Schema validation is performed by another validator.

1.7 Selecting the processor for Invisible XML

Starting with release 1.3, MorganaXProc-III supports Invisible XML processing with XProc's p:ixml. The step is implemented as specified, using either the NineML tools developed by Norm Tovey-Walsh or Markup Blitz developed by Gunther Rademacher. There is no default selection, so you have to set your preferred processor for IXML explicitly.

  • To use NineML as your IXML processor, use -ixml-connector=com.xml_project.morganaxproc3.ninemlConnector.NineMLConnector on the command line or the respective <ixml-connector> in your configuration file. Additionally, you need CoffeeGrinder and CoffeeFilter on the classpath. The connector for NineML supports Java 8 and later.

  • If you use Java 11 or later, you can use Markup Blitz by setting -ixml-connector=com.xml_project.morganaxproc3.markupblitzConnector.MarkupBlitzConnector. Additionally make sure that markup-blitz-xxx.jar is on your classpath.

1.8 (Re-)directing messages from steps and stylesheets

Using the common attribute [p:]message, pipeline authors can define messages to be printed on some output channel. The same is true for xsl:message within an XSLT stylesheet. Using -messages on the CLI (or a message element in the configuration document), users can choose whether and where these messages are printed. If the value of messages is off, no messages are printed. The values std:out and std:err tell MorganaXProc-III to print the messages to the standard output stream or the standard error stream, respectively. Any other value is interpreted as a file path or URI of a resource to which the messages are written. A relative path or URI is resolved against the current working directory (for the CLI option) or the base URI of the configuration document. Any existing resource with the given name is overwritten without warning.
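The decision logic for the messages value can be summarised in a few lines. The sketch below is a model of the rules stated above, not MorganaXProc-III's code: the function message_target, its return tuples, and the POSIX-style base directory are hypothetical, and only the CLI case (resolution against the current working directory) is covered.

```python
import posixpath
from urllib.parse import urlsplit


def message_target(value, cwd):
    """Classify a -messages value as (kind, destination)."""
    if value == "off":
        return ("discard", None)      # no messages are printed
    if value == "std:out":
        return ("stream", "stdout")   # standard output stream
    if value == "std:err":
        return ("stream", "stderr")   # standard error stream
    if urlsplit(value).scheme:
        # Assumption: a value with a scheme is an absolute URI.
        return ("resource", value)
    # Any other value is a file path, resolved against the working directory.
    return ("resource", posixpath.normpath(posixpath.join(cwd, value)))


for value in ("off", "std:err", "logs/run.txt", "file:///tmp/messages.txt"):
    print(value, "->", message_target(value, "/home/me/project"))
```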

2 Setting the configuration file on the command line

In addition to the explicit settings on the command line, MorganaXProc-III also allows you to bundle specific configuration settings in a file. This is done by using -config=uri-or-path as the first element on the command line. If the given URI or path is relative, it is resolved against the current working directory. You can find an example of a configuration file in your distribution folder. All switches and configuration controls can be used in the configuration document. The local names of the elements are the same as the names on the command line without the leading “-”.

As the configuration file has to be the first element in the command line settings, later settings for the same switch or control will override the settings in the configuration document.
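The override rule can be pictured as a simple merge in which command line values win. The following sketch assumes that the settings of the configuration document have already been read into a dictionary keyed by element name; the function effective_settings is hypothetical and handles only arguments of the form -name=value.

```python
def effective_settings(config_settings, cli_args):
    """Merge configuration-file settings with later command line settings."""
    settings = dict(config_settings)  # start from the configuration document
    for arg in cli_args:
        if not (arg.startswith("-") and "=" in arg):
            continue  # not a -name=value control, ignored in this sketch
        name, value = arg[1:].split("=", 1)
        if name == "config":
            continue  # the configuration file itself is not a setting
        settings[name] = value  # a later command line value overrides the file
    return settings


from_file = {"xslt-connector": "saxon10", "messages": "off"}
args = ["-config=config.xml", "-xslt-connector=saxon12-3"]
print(effective_settings(from_file, args))
```

Here the connector chosen on the command line replaces the one from the file, while the messages setting from the file stays in effect.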

3 Configuration of XSLT based Schematron implementations

MorganaXProc-III supports XSLT based Schematron implementations such as SchXSLT and the Skeleton XSLT implementation. These implementations are based on a set of XSLT stylesheets needed to perform the actual validation. You can place the respective files anywhere on your file system, but please make sure you do not change the file names. For MorganaXProc-III to find the relevant files on your file system, you have to register the relevant folders with the processor. To do so, the configuration file loaded by MorganaXProc-III has to contain one or more of the following elements:

  • mox:path_to_SchXSLT_1: Path to the folder containing the SchXSLT files for XSLT 1.0, e.g. schxslt-1.4.5-sources/xslt/1.0.

  • mox:path_to_SchXSLT_2: Path to the folder containing the SchXSLT files for XSLT 2.0, e.g. schxslt-1.4.5-sources/xslt/2.0.

  • mox:path_to_iso_skeleton_schematron_1: Path to the folder with the Skeleton files for XSLT 1.0, e.g. ISO_SKELETON_SCHEMATRON_1.

  • mox:path_to_iso_skeleton_schematron_2: Path to the folder with the Skeleton files for XSLT 2.0, e.g. ISO_SKELETON_SCHEMATRON_2.

If a provided path is relative, it is made absolute against the URI of the configuration file the respective element is contained in.
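Resolution against the URI of the configuration file means that a relative path is interpreted relative to the folder containing that file, not relative to the current working directory. A minimal illustration using standard URI resolution follows; the configuration file location is made up for the example, and the helper name resolve_against_config is hypothetical.

```python
from urllib.parse import urljoin


def resolve_against_config(config_uri, path):
    """Make `path` absolute against the URI of the configuration file."""
    return urljoin(config_uri, path)


# Hypothetical location of the configuration file.
config_uri = "file:///home/me/morgana/config.xml"

# A relative path ends up next to the configuration file.
print(resolve_against_config(config_uri, "schxslt-1.4.5-sources/xslt/2.0"))

# An absolute path ignores the folder of the configuration file.
print(resolve_against_config(config_uri, "/opt/schxslt/xslt/2.0"))
```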

4 Adding media type mappings

When loading documents, MorganaXProc-III needs to determine whether the file contains an XML document, an HTML document, a JSON document, a text document, or a binary document. MorganaXProc-III defines a number of mappings from file extensions to media types by default. In your XProc 3.0 pipeline you can use attribute content-type on p:load and p:document to explicitly define the media type of the related document.

MorganaXProc-III defines an additional way to identify the media type via file extensions using the configuration file. If the configuration document contains an element mediatype-mapping in MorganaXProc-III's namespace, all elements map (in the same namespace) will be considered. They need to have non-empty attributes file-extension and media-type. The attribute media-type must contain a valid media type. If all these criteria are met and the file extension is not already bound to another media type, files with that extension will subsequently be loaded with the given media type. See the file config.xml in your MorganaXProc-III distribution for an example.
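The acceptance rules for a map element (both attributes non-empty, a valid media type, extension not yet bound) can be expressed compactly. In the sketch below, each map element is represented as a dictionary of its attributes; the function add_mappings, the simplified media type pattern, and the sample default binding for "xml" are assumptions for illustration only.

```python
import re

# Simplified "type/subtype" check; the full media type grammar is richer.
_TOKEN = r"[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]*"
_MEDIA_TYPE = re.compile(rf"^{_TOKEN}/{_TOKEN}$")


def add_mappings(known, maps):
    """Add mappings from configuration `maps` to the `known` bindings."""
    for entry in maps:
        extension = entry.get("file-extension", "")
        media_type = entry.get("media-type", "")
        if not extension or not _MEDIA_TYPE.match(media_type):
            continue  # empty attribute or invalid media type: skipped
        # Only bind the extension if it has no media type yet.
        known.setdefault(extension, media_type)
    return known


defaults = {"xml": "application/xml"}  # assumed example of a default binding
configured = [
    {"file-extension": "dita", "media-type": "application/dita+xml"},
    {"file-extension": "xml", "media-type": "text/plain"},   # already bound
    {"file-extension": "foo", "media-type": "not a type"},   # invalid type
]
print(add_mappings(defaults, configured))
```

Only the new extension is added; the existing binding for "xml" is left untouched.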

Please keep in mind that running MorganaXProc-III with a different configuration file might change a pipeline's behaviour dramatically, because a file might be recognized with a different media type.