MorganaXProc-IIIse User's manual (draft). Rel 0.9.4.5-beta and later.

Achim Berndzen

<xml-project />

Table of Contents

1. Using the command line interface (CLI)
Introduction
Running a pipeline from the command line
Binding an input port
Specifying target for an output port
Binding values to options
Setting static options
Setting the configuration for MorganaXProc-IIIse
Setting the catalog resolver
Selecting the XSLTConnector
Selecting the XQueryConnector
Loading configuration for XSLT- and XQuery processors
Selecting the Schematron processor
(Re-)directing messages from steps and stylesheets
Switches
2. Using a configuration file
Setting the configuration file on the command line
Configuration of XSLT based schematron implementations

Chapter 1. Using the command line interface (CLI)

Introduction

The most basic way to run MorganaXProc-IIIse with the command line interface is this: Change to the folder containing file MorganaXProc-IIIse.jar and then start the program using java -jar.

Here is an example supposing you stored the JAR-file in folder /users/me/MorganaXProc:

cd /users/me/MorganaXProc
java -jar -javaagent:MorganaXProc-IIIse_lib/quasar-core-0.7.9.jar MorganaXProc-IIIse.jar pipeline.xpl
=================================
MorganaXProc-IIIse 0.9.4.5-beta
Copyright 2011-2020 by <xml-project /> Achim Berndzen
=================================

Hello world. This is an XProc 3.0 pipeline running.

As you can see, MorganaXProc-IIIse needs to be run with a javaagent option to do some instrumentation. As a pipeline author you do not have to worry about this. The only thing you have to remember is that the javaagent is required and MorganaXProc-IIIse will not run without it. The rest of the command line is pretty straightforward: We name the JAR-file to be run and then name the XProc pipeline we want to run.

As this way of calling MorganaXProc-IIIse from the command line is very long and a typing error will ruin everything, there is a better way to start XProc pipelines from the command line. If you look into the folder containing the JAR-file, you will find two files named “Morgana.bat” (for Windows) and “Morgana.sh” (for macOS and other Unix-based operating systems). Using these script files to start MorganaXProc-IIIse requires significantly less typing on the command line:

cd /users/me/MorganaXProc
sh Morgana.sh pipeline.xpl
=================================
MorganaXProc-IIIse 0.9.4.5-beta
Copyright 2011-2020 by <xml-project /> Achim Berndzen
=================================

Hello world. This is an XProc 3.0 pipeline running.

This was easy, wasn't it?

Running a pipeline from the command line

Running an XProc pipeline can be quite complex, because you do not only want to tell MorganaXProc-IIIse the pipeline's place in the file system. You may also want to control which documents appear on the different input ports of your pipeline and where the documents appearing on the output ports should be stored. And then there are options and static options in your pipeline you might wish to give values for. Let us see how to do this and thereby understand in more detail how MorganaXProc-IIIse interprets the tokens appearing on the command line.

As we have already seen, you specify the XProc pipeline you want to run by simply writing its name as the first (or, as we shall see later, the second) token. The token may take the form of either a URI (like file:///users/me/MorganaXProc/pipeline.xpl) or a file path (in the Windows form if you are on Windows, or in the UNIX form if your operating system is Unix-like). If the pipeline's path is relative, it will be resolved against the current working directory.

After the pipeline name you can specify the bindings for input ports, output ports, options and static options in any order.

Binding an input port

To specify a binding for an input port you have to use -input:port-name=uri-or-path. For example you could type “-input:source=doc.xml” to bind the document in file “doc.xml” in your current working directory to a pipeline's input port called “source”. Here again, like with the XProc pipeline, you might either give a URI or a path in your file system. If it is relative, it will be resolved against the current working directory. If you want to bind more than one document to a (sequence) port, just add more -input settings on the command line. The documents appear on the sequence port in the same order as their -input settings appear on the command line.
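As a minimal sketch of binding two documents to a sequence port, the following assumes a pipeline with an input port named “source” and two hypothetical files chapter1.xml and chapter2.xml in the current working directory. The morgana function is not part of the distribution; it is only a dry-run stand-in that prints the command line instead of starting the processor.

```shell
# Dry-run stand-in (an assumption for illustration): prints the command line.
# For a real run, type the printed line in the folder containing Morgana.sh.
morgana() { echo "sh Morgana.sh $*"; }

# chapter1.xml is given first, so it appears on port "source" before chapter2.xml.
morgana pipeline.xpl -input:source=chapter1.xml -input:source=chapter2.xml
```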

Specifying target for an output port

To specify where the documents appearing on an output port should be stored, use -output:port-name=uri-or-path. The mechanism for resolving a relative URI or path is the same as for the pipeline or -input: The current working directory is used.

If you do not specify a -output for an output port declared in your XProc pipeline, the document(s) appearing on this port will be written to your command line shell. There are three tricky things about -output worth remembering:

  1. If more than one document appears on an output port for which a -output setting is given, all documents are written to the specified resource in the order they appear on the port. As a consequence, the resource content will not be well-formed.

  2. Any existing content in a resource named with -output will be overwritten without warning, so please be careful.

  3. MorganaXProc-IIIse will not create any non-existing folder/directory named in the path. If you specify a path or URI with one or more non-existing folders, an error will be raised. See below about -cp to see how to change this behaviour.
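The following sketch combines an input binding with an output target. The port name “result” and the path out/result.xml are hypothetical. The -cp switch is added because the folder “out” is assumed not to exist yet. As before, morgana is only a dry-run stand-in that prints the command line.

```shell
# Dry-run stand-in (an assumption for illustration): prints the command line.
morgana() { echo "sh Morgana.sh $*"; }

# Store the document from port "result" in out/result.xml.
# Without -cp a missing folder "out" would raise an error.
morgana pipeline.xpl -input:source=doc.xml -output:result=out/result.xml -cp
```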

Binding values to options

In addition to input and output ports, an XProc pipeline may also declare options you might want to set using the command line interface. To do this use -option:option-name=string-or-XPathExpression. Here are some examples:

-option:opt=value
-option:Q{http://some-namespace}opt=5+3
-option:map=map{'key':'value'}
-option:date=2020-03-28T13:53:00
-option:name=Q{http://some-namespace}name
-option:numbers='(1, 1+1, 2+1)'
-option:numbers=(1,1+1,2+1,2+2)

As you can see from the examples, the option's name can either be a simple string (if the name of the option to be set is in no namespace) or be given using the Q{}-notation to name an option in a namespace. How the string after “=” is interpreted depends on the option's type:

  • If the option in the pipeline is declared without a type, the string following “=” is passed as the argument to xs:untypedAtomic().

  • If the option's type is an atomic type (optionally with occurrence indicator “?”) the processor will try to cast the given string to an instance of the option's type.

  • If the option's type is xs:QName or xs:QName? you can use the Q{}-notation.

  • In all other cases the processor will try to interpret the string as an XPath expression, evaluate it and cast the resulting sequence to an instance of the option's type.

Caveat: Please remember that the command line processor splits up the command line into tokens using “ ” (space) as a separator. As a consequence you cannot have a “ ” in the string specifying the option's value. If you want to use a “ ” you have to put quotes around the whole string. See the last but one example above and compare it to the last one.
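The effect of quoting can be checked without running a pipeline at all. The sketch below assumes a POSIX shell. count_tokens is a helper written only for this illustration, and “title” is a hypothetical option name. On such shells, quoting the whole setting also protects characters that are special to the shell itself, such as parentheses and single quotes.

```shell
# Helper for illustration only: reports how many tokens the shell hands over.
count_tokens() { echo "$#"; }

# Without quotes the space splits the setting into two tokens.
unquoted=$(count_tokens -option:title=Hello world)

# With quotes around the whole setting it stays one token.
quoted=$(count_tokens '-option:title=Hello world')

echo "unquoted: $unquoted, quoted: $quoted"

# Double quotes around the token keep the inner single quotes intact.
printf '%s\n' "-option:map=map{'key':'value'}"
```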

One last thing: If you want to load a document (XML, text, or JSON) from a file, use the corresponding XPath function.
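As a hedged sketch, the standard XPath 3.1 functions doc(), unparsed-text() and json-doc() can be used for XML, text and JSON respectively. The example assumes a hypothetical option “config” whose declared type causes the value to be evaluated as an XPath expression, and a hypothetical file settings.xml. An absolute URI is used because the resolution of relative URIs inside the expression is not described here. morgana is only a dry-run stand-in that prints the command line.

```shell
# Dry-run stand-in (an assumption for illustration): prints the command line.
morgana() { echo "sh Morgana.sh $*"; }

# The double quotes keep the single-quoted URI inside doc() intact.
morgana pipeline.xpl "-option:config=doc('file:///users/me/MorganaXProc/settings.xml')"
```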

Setting static options

In XProc 3.0 a pipeline may declare static options which are supplied to the pipeline's static analysis. As static options are a special kind of option, MorganaXProc-IIIse uses the same syntax to parse the command line. The only difference is that you have to use “-static” instead of “-option”.

Unlike ordinary or dynamic options, static options can also be set from a document. This can be handy if you want to run your pipeline with different sets of basic configurations, say one for debugging and one for production. To load a set of values for your pipeline's static options, use -statics=uri-or-path. If the URI or path is relative, it will be resolved against the current working directory. Please see the file “static-defs.xml” in your distribution for an example. You will notice that configuration is cascading, because one configuration file can hold a reference to another configuration file to be loaded.

MorganaXProc-IIIse will process -static and -statics from left to right so the later settings will override earlier settings for a static option with the same name.
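The left-to-right rule can be sketched as follows. The static option name “debug” is hypothetical, and static-defs.xml is assumed to sit in the current working directory. morgana is only a dry-run stand-in that prints the command line.

```shell
# Dry-run stand-in (an assumption for illustration): prints the command line.
morgana() { echo "sh Morgana.sh $*"; }

# Values are loaded from static-defs.xml first. The later -static setting
# then overrides any value for "debug" that the file may contain.
morgana pipeline.xpl -statics=static-defs.xml -static:debug=true
```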

Setting the configuration for MorganaXProc-IIIse

Using the command line interface you can also set specific configurations for internal features of MorganaXProc-IIIse. These controls can appear anywhere on the command line after the specification of the pipeline to run.

Setting the catalog resolver

MorganaXProc-IIIse uses XMLResolver (developed by Norm Tovey-Walsh) as its catalog system to resolve resources used in your pipeline. Use -catalogs=uri-or-path to set the initial XML catalogs as a semicolon-separated list of URIs or paths. If a path or a URI is relative, it is resolved against the current working directory.
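On POSIX shells an unquoted semicolon ends the command, so a catalog list with more than one entry has to be quoted there. The sketch below uses two hypothetical catalog files. count_tokens is a helper written only for this illustration; it shows that the quoted list is handed over as a single token.

```shell
# Helper for illustration only: reports how many tokens the shell hands over.
count_tokens() { echo "$#"; }

# Two tokens: the pipeline name and the complete, quoted catalog list.
count_tokens pipeline.xpl '-catalogs=catalogs/main.xml;catalogs/project.xml'
```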

Selecting the XSLTConnector

MorganaXProc-IIIse implements a flexible way letting you choose which XSLT processor is used in p:xslt. For each supported XSLT processor MorganaXProc-IIIse provides a connector which needs to be registered to be used. Only one connector can be registered for a specific pipeline run. To register a connector use -xslt-connector=java-classname-or-shortcut. Currently the following XSLT processors are supported:

processor | java classname | shortcut
Saxon 9.9 | com.xml_project.morganaxproc3.saxon99connector.Saxon99XSLTConnector (recommended to be used with Saxon 9.9.1.7) | saxon99
Saxon 10 | com.xml_project.morganaxproc3.saxon10connector.Saxon10XSLTConnector (tested with Saxon 10.0 and Saxon 10.1) | saxon10
Any JAXP compliant processor | see the note below | jaxp

Note on JAXP: Any processor extending javax.xml.transform.TransformerFactory can be used for XSLT 1.0 transformations with p:xslt. Just using the "jaxp" shortcut will cause the default TransformerFactory of your Java system to be loaded. If you want a special implementation/extension, you can use jaxp:class-name, where class-name has to be the full Java name of the extending class. Examples are:

  • org.apache.xalan.processor.TransformerFactoryImpl

  • net.sf.saxon.TransformerFactoryImpl

Of course you have to make sure that the relevant Java classes are on Java's classpath.

Currently Saxon 9.9 is used as the default setting, so if you do not set an XSLTConnector, this version will be used. Please mind that you have to put the requested XSLT processor on the Java classpath too. The easiest way is to just drop the JAR-file into folder “MorganaXProc-IIIse_lib”. As the classes in the different versions of an XSLT processor typically overlap, it is in general a risky strategy to have different versions of the same processor on the classpath. The current search order for a Saxon implementation is saxon9ee.jar, saxon9pe.jar, saxon9he.jar, saxon-ee-10.0.jar, saxon-pe-10.0.jar, saxon-he-10.0.jar, saxon-ee-10.1.jar, saxon-pe-10.1.jar and saxon-he-10.1.jar. MorganaXProc-IIIse will use the first Saxon version found by this search order.

Another restriction in configuring the XSLTConnector is that MorganaXProc-IIIse will always use the same instance of a Saxon processor if it is used for p:xslt and p:xquery. Therefore it is not possible to use Saxon 9.9 for XSLT processing and Saxon 10 for XQuery processing or the other way around.

Selecting the XQueryConnector

With -xquery-connector=java-classname-or-shortcut you can select the XQuery processor to be used in p:xquery in the same way as an XSLTConnector is selected for p:xslt. Currently the following connectors are supported:

processor | java classname | shortcut
Saxon 9.9 | com.xml_project.morganaxproc3.saxon99connector.Saxon99XSLTConnector (recommended to be used with Saxon 9.9.1.7) | saxon99
Saxon 10 | com.xml_project.morganaxproc3.saxon10connector.Saxon10XSLTConnector (tested with Saxon 10.0 and with Saxon 10.1) | saxon10

Currently Saxon 9.9 is used as the default. Please see the hints on Saxon as XSLTConnector for further configuration details.
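A minimal sketch of switching both connectors to Saxon 10, using the shortcuts from the tables above. It assumes that a Saxon 10 JAR-file has been placed on the classpath and that no Saxon 9.9 JAR-file precedes it in the search order. morgana is only a dry-run stand-in that prints the command line.

```shell
# Dry-run stand-in (an assumption for illustration): prints the command line.
morgana() { echo "sh Morgana.sh $*"; }

# Both connectors name the same Saxon version, because p:xslt and p:xquery
# share a single Saxon instance.
morgana pipeline.xpl -xslt-connector=saxon10 -xquery-connector=saxon10
```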

Loading configuration for XSLT- and XQuery processors

Some XSLT- and XQuery processors accept configuration files to control their settings. With -xslt-config=uri-or-path and -xquery-config=uri-or-path you can provide such files. MorganaXProc-IIIse does not do anything with these files but just passes them through to the XSLT- or XQuery processors when they are instantiated. The file's nature and the semantics of its content are therefore completely determined by the receiving product. Please see its documentation.

As stated above, MorganaXProc-IIIse will use only one processor instance if you choose to do p:xslt and p:xquery with Saxon. If you do so, please make sure to give the same configuration file for -xslt-config and -xquery-config. As only one Saxon instance is created, only one of the two files will be used, and it is not predictable whether the instance will be created by a p:xslt or a p:xquery.

Selecting the Schematron processor

Using -schematron-connector=java-classname-or-shortcut you can select the Schematron processor used in p:validate-with-schematron. Currently the following Schematron processors are supported:

processor | java classname | shortcut
SchXSLT | com.xml_project.morganaxproc3.validation.support.SchXSLTAdapterForSchematron | schxslt
Skeleton XSLT implementation | com.xml_project.morganaxproc3.validation.support.ISOSkeletonAdapterForSchematron | skeleton

Currently the SchXSLT connector is the default. Please note that MorganaXProc-III only provides connectors to the two implementations, but not the implementations themselves. If you want to use p:validate-with-schematron in your pipeline, you have to download an implementation yourself and make its path known to MorganaXProc-III. For details please see “Configuration of XSLT based schematron implementations” in Chapter 2.

(Re-)directing messages from steps and stylesheets

Using the common attribute [p:]message, pipeline authors can define messages to be printed on some output channel. The same is true for xsl:message within XSLT stylesheets. Using -messages on the CLI (or a message element in the configuration document) users can choose whether and where these messages are printed. If the value of messages is off, no messages will be printed. The values std:out or std:err tell MorganaXProc-III to print the messages to the standard output stream or the standard error stream. Any other value will be interpreted as a file path or a URI of a resource to which the messages should be written. If the path/URI is relative it will be resolved against the current working directory (for the CLI option) or the base URI of the configuration document. Any existing resource with the given name is overwritten without warning.
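The three kinds of values can be sketched as follows. The file path logs/messages.txt is hypothetical. morgana is only a dry-run stand-in that prints the command line.

```shell
# Dry-run stand-in (an assumption for illustration): prints the command line.
morgana() { echo "sh Morgana.sh $*"; }

# Print messages to the standard error stream.
morgana pipeline.xpl -messages=std:err

# Write messages to a file; an existing file is overwritten without warning.
morgana pipeline.xpl -messages=logs/messages.txt

# Suppress all messages.
morgana pipeline.xpl -messages=off
```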

Switches

Switches on the command line can be used to control the running behaviour of MorganaXProc-IIIse. They can appear anywhere on the command line after the specification of the pipeline to run. Currently the following switches are available:

-cp

As said above, MorganaXProc-IIIse usually does not create any folder necessary to store documents appearing on an output port to a specific location in your file system. This behaviour can be overridden by using switch -cp (read: “create paths”).

-silent

If you use -silent on the command line, no additional information will be written to the console while running the pipeline. This might be handy if you want to pipe the console output to a file containing just the document(s) appearing on an output port.

-dump-binary

In the standard configuration MorganaXProc-III does not output binary documents on standard output. Information about the document is printed instead. To enable output of binary documents on standard output use -dump-binary.

-split-sequence

Binding an output port to a path or URI on the CLI means that the content of the respective port is written to a file. This can be inconvenient if a sequence appears on the port. By using -split-sequence you can advise the CLI to create a new file for each document appearing on a sequence port. The CLI will take the path or URI given with -output:port= as a blueprint to create the effective file name: If the path contains a ., -x is inserted before the dot. (x here stands for the number of the respective document in the sequence, counting from 1). If there is no dot, -x is simply attached to the path.
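The naming rule can be sketched as a small shell function. This is an illustration of the rule as described, not the processor's own code. It assumes that “the dot” means the last dot in the path, and the function name split_name is made up for this example.

```shell
# split_name PATH N: sketch of the file name used for the N-th document.
# Assumption: the number is inserted before the LAST dot in the path.
split_name() {
  case "$1" in
    *.*) echo "${1%.*}-$2.${1##*.}" ;;  # has a dot: insert -N before it
    *)   echo "$1-$2" ;;                # no dot: append -N
  esac
}

split_name out/result.xml 1   # prints out/result-1.xml
split_name out/result 2       # prints out/result-2
```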

-mopl

Using this switch will output the MoPL pipeline generated by the XProc 3.0 compiler on the console. This is currently a purely debugging feature and might not be of public interest.

-graph

If you use this switch on the command line, you get a textual representation of the execution graph created by the XProc 3.0 compiler. This might be helpful if your pipeline has an unexpected behaviour. For every step's input and output port it is listed how they are connected. Of course the current format is not the last word on this. Some graphical representation might surely be more useful.

-debug

Turns on debugging information for running the compiled XProc 3.0 pipeline in MoPL. It displays every message sent in the running MoPL pipeline together with its sender and its receiver. This is currently only used for internal debugging purposes of MorganaXProc-IIIse and might not be useful to the public.

-no-run

Tells the XProc processor just to perform a static analysis of the supplied pipeline, but not to run it. This switch can be handy if you just want to check whether your pipeline is correct, but not start a long execution process.

-indent-errors

This switch will turn on indentation for dynamic errors printed on the console.

Chapter 2. Using a configuration file

Setting the configuration file on the command line

In addition to the explicit settings on the command line, MorganaXProc-IIIse also allows you to bundle specific configuration settings in a file. This is done by using -config=uri-or-path as the first element on the command line. If the given URI or path is relative, it is resolved against the current working directory. You can find an example of a configuration file in your distribution folder. All switches and configuration controls can be used in the configuration document. The local names of the elements are the same as on the command line without the leading “-”.

As the configuration file has to be the first element in the command line settings, later settings for the same switch or control will override the settings in the configuration document.
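A minimal sketch of the token order when a configuration file is used. The file name config.xml is hypothetical. morgana is only a dry-run stand-in that prints the command line.

```shell
# Dry-run stand-in (an assumption for illustration): prints the command line.
morgana() { echo "sh Morgana.sh $*"; }

# -config comes first and the pipeline second. The later -messages setting
# overrides a corresponding setting in config.xml, if there is one.
morgana -config=config.xml pipeline.xpl -messages=off
```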

Configuration of XSLT based schematron implementations

MorganaXProc-III supports XSLT based schematron implementations like SchXSLT and the Skeleton XSLT implementation. These implementations are based on a set of XSLT stylesheets needed to perform the actual validation. You can place the respective files anywhere on your file system, but please make sure you do not change the file names. In order for MorganaXProc-IIIse to find the relevant files on your file system, you have to register the relevant folders with the processor. In order to do so, the configuration file loaded by MorganaXProc-IIIse has to contain one or more of the following elements:

  • mox:path_to_SchXSLT_1: Path to the folder containing the SchXSLT files for XSLT 1.0, e.g. schxslt-1.4.5-sources/xslt/1.0.

  • mox:path_to_SchXSLT_2: Path to the folder containing the SchXSLT files for XSLT 2.0, e.g. schxslt-1.4.5-sources/xslt/2.0.

  • mox:path_to_iso_skeleton_schematron_1: Path to the folder with the Skeleton files for XSLT 1.0, e.g. ISO_SKELETON_SCHEMATRON_1.

  • mox:path_to_iso_skeleton_schematron_2: Path to the folder with the Skeleton files for XSLT 2.0, e.g. ISO_SKELETON_SCHEMATRON_2.

If a provided path is relative, it is made absolute against the URI of the configuration file the respective element is contained in.