MorganaXProc-IIIse User's manual (draft)

Achim Berndzen

<xml-project />

Table of Contents

1. Using the command line interface (CLI)
Introduction
Running a pipeline from the command line
Binding an input port
Specifying target for an output port
Binding values to options
Setting static options
Setting the configuration for MorganaXProc-IIIse
Setting the catalog resolver
Selecting the XSLTConnector
Selecting the XQueryConnector
Loading configuration for XSLT- and XQuery processors
Switches
2. Using a configuration file
Setting the configuration file on the command line

Chapter 1. Using the command line interface (CLI)

Introduction

The most basic way to run MorganaXProc-IIIse with the command line interface is this: Change to the folder containing file MorganaXProc-IIIse.jar and then start the program using java -jar.

Here is an example supposing you stored the JAR-file in folder /users/me/MorganaXProc:

cd /users/me/MorganaXProc
java -jar -javaagent:MorganaXProc-IIIse_lib/quasar-core-0.7.9.jar MorganaXProc-IIIse.jar pipeline.xpl
=================================
MorganaXProc-IIIse 0.9.1.8-beta
Copyright 2011-2020 by <xml-project /> Achim Berndzen
=================================

Hello world. This is an XProc 3.0 pipeline running.

As you can see, MorganaXProc-IIIse needs to be run with a javaagent option to do some instrumentation. As a pipeline author you do not have to worry about this. The only thing you have to remember is that the javaagent is required and MorganaXProc-IIIse will not run without it. The rest of the command line is pretty straightforward: we name the JAR-file to be run and then the XProc pipeline we want to run.

As this way of calling MorganaXProc-IIIse from the command line is very long and a single typing error will ruin everything, there is a better way to start XProc pipelines from the command line. If you look into the folder containing the JAR-file, you will find two files named “Morgana.bat” (for Windows) and “Morgana.sh” (for macOS and other UNIX-based operating systems). Using these batch files to start MorganaXProc-IIIse requires significantly less typing on the command line:

cd /users/me/MorganaXProc
sh Morgana.sh pipeline.xpl
=================================
MorganaXProc-IIIse 0.9.1.8-beta
Copyright 2011-2020 by <xml-project /> Achim Berndzen
=================================

Hello world. This is an XProc 3.0 pipeline running.

This was easy, wasn't it?

Running a pipeline from the command line

Running an XProc pipeline can be quite complex, because telling MorganaXProc-IIIse where the pipeline is located in the file system is often not enough. You may also want to control which documents appear on the different input ports of your pipeline and where the documents appearing on the output ports should be stored. And then there are options and static options in your pipeline that you might wish to supply values for. Let us see how to do this and thereby understand in more detail how MorganaXProc-IIIse interprets the tokens appearing on the command line.

As we have already seen, you specify the XProc pipeline you want to run by simply writing its name as the first (or, as we shall see later, the second) token. The token may take the form of either a URI (like file:///users/me/MorganaXProc/pipeline.xpl) or a file path (in the Windows form if you are on Windows, or in the UNIX form if your operating system is Unix-like). If the pipeline's path is relative, it will be resolved against the current working directory.

After the pipeline name you can specify the bindings for input ports, output ports, options and static options in any order.

Binding an input port

To specify a binding for an input port, use -input:port-name=uri-or-path. For example, you could type “-input:source=doc.xml” to bind the document in file “doc.xml” in your current working directory to a pipeline's input port called “source”. As with the XProc pipeline itself, you may give either a URI or a path in your file system. If it is relative, it will be resolved against the current working directory. If you want to bind more than one document to a (sequence) port, just add more -input settings on the command line. The documents appear on the sequence port in the same order as the -input settings appear on the command line.
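As a small sketch of binding a sequence port, the following lines assemble a command line that binds two documents to a port named “source”. The file names a.xml and b.xml are placeholders, the Morgana.sh launcher from the introduction is assumed, and the command is only printed (a dry run), not executed.

```shell
# Dry run: build the tokens for MorganaXProc-IIIse and print the command.
# a.xml and b.xml are placeholder names; they reach the port in this order.
set -- pipeline.xpl -input:source=a.xml -input:source=b.xml
input_cmd="sh Morgana.sh $*"
echo "$input_cmd"
```

To actually run the pipeline, replace the last two lines with: sh Morgana.sh "$@"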

Specifying target for an output port

To specify where the documents appearing on an output port should be stored, use -output:port-name=uri-or-path. The mechanism for resolving a relative URI or path is the same as for the pipeline or -input: The current working directory is used.

If you do not specify a -output for an output port declared in your XProc pipeline, the document(s) appearing on this port will be written to your command line shell. There are three tricky things about -output worth remembering:

  1. If more than one document appears on an output port for which a -output setting is given, all documents are written to the specified resource in the order they appear on the port. As a consequence, the resource content will not be well-formed.

  2. Any existing content in a resource named with -output will be overwritten without warning, so please be careful.

  3. MorganaXProc-IIIse will not create any non-existing folder/directory named in the path. If you specify a path or URI with one or more non-existing folders, an error will be raised. See below about -cp to see how to change this behaviour.
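The third point can be illustrated with a short dry run. The sketch below assumes a pipeline with an output port named “result” (a hypothetical name) and a target folder “out” that may not exist yet; the command is printed, not executed.

```shell
# Dry run: result.xml goes into the folder "out", which may not exist yet.
# -cp asks MorganaXProc-IIIse to create missing folders in the path.
set -- pipeline.xpl -input:source=doc.xml -output:result=out/result.xml -cp
output_cmd="sh Morgana.sh $*"
echo "$output_cmd"
```

Without -cp on the command line, the same call would raise an error if “out” is missing.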

Binding values to options

In addition to input and output ports, an XProc pipeline may also declare options you might want to set using the command line interface. To do this, use -option:option-name=string-or-XPathExpression. Here are some examples:

-option:opt=value
-option:Q{http://some-namespace}opt=5+3
-option:map=map{'key':'value'}
-option:date=2020-03-28T13:53:00
-option:name=Q{http://some-namespace}name
-option:numbers='(1, 1+1, 2+1)'
-option:numbers=(1,1+1,2+1,2+2)

As you can see from the examples, the option's name can either be a simple string (if the name of the option to be set is in no namespace) or be given using the Q{}-notation to name an option in a namespace. How the string after “=” is interpreted depends on the option's type:

  • If the option in the pipeline is declared without a type, the string following “=” is passed as the argument to xs:untypedAtomic().

  • If the option's type is an atomic type (optionally with occurrence indicator “?”) the processor will try to cast the given string to an instance of the option's type.

  • If the option's type is xs:QName or xs:QName? you can use the Q{}-notation.

  • In all other cases the processor will try to interpret the string as an XPath expression, evaluate it and cast the resulting sequence to an instance of the option's type.

Caveat: Please remember that the command line processor splits up the command line into tokens using “ ” (space) as a separator. As a consequence you cannot have a “ ” in the string specifying the option's value. If you want to use a “ ” you have to put quotes around the whole string. See the second-to-last example above and compare it to the last one.
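The effect of quoting can be checked in a Unix shell without running a pipeline. The sketch below uses a small helper function, count_tokens (introduced here only for illustration), that reports how many tokens the shell would hand to a program; the option name “title” is hypothetical as well. Note that in Unix shells parentheses are special characters too, so quoting the whole value is the safest choice there.

```shell
# Helper for illustration: print the number of tokens it receives.
count_tokens() { echo "$#"; }

# Quoted value containing spaces: arrives as one single token.
quoted=$(count_tokens -option:numbers='(1, 1+1, 2+1)')
# Value without spaces (quoted to protect the parentheses): one token.
nospace=$(count_tokens '-option:numbers=(1,1+1,2+1,2+2)')
# Unquoted value containing a space: split into two tokens.
split=$(count_tokens -option:title=Hello World)

echo "quoted: $quoted, no spaces: $nospace, unquoted with space: $split"
```

The last case shows the problem described above: the option would receive only “Hello”, and “World” would be read as a separate token.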

One last thing: If you want to load a document (XML, text, or JSON) from a file, use the corresponding XPath function in the option's value, for example doc(), unparsed-text(), or json-doc().

Setting static options

In XProc 3.0 a pipeline may declare static options, which are supplied to the pipeline's static analysis. As static options are a special kind of option, MorganaXProc-IIIse uses the same syntax to parse the command line. The only difference is that you have to use “-static” instead of “-option”.

Unlike ordinary or dynamic options, static options can also be set from a document. This can be handy if you want to run your pipeline with different sets of basic configurations, say one for debugging and one for production. To load a set of values for your pipeline's static options, use -statics=uri-or-path. If the URI or path is relative, it will be resolved against the current working directory. Please see the file “static-defs.xml” in your distribution for an example. You will notice that the configuration is cascading, because one configuration file can hold a reference to another configuration file to be loaded.

MorganaXProc-IIIse will process -static and -statics from left to right so the later settings will override earlier settings for a static option with the same name.
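A minimal sketch of this left-to-right rule, assuming that the loaded document sets a static option named “debug” (a hypothetical name) which the command line then overrides. The command is only printed, not executed.

```shell
# Dry run: values from static-defs.xml are loaded first; the later -static
# token then overrides the value of the (hypothetical) static option "debug".
set -- pipeline.xpl -statics=static-defs.xml -static:debug=true
static_cmd="sh Morgana.sh $*"
echo "$static_cmd"
```

Swapping the two settings would let the value from the document win instead.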

Setting the configuration for MorganaXProc-IIIse

Using the command line interface you can also set specific configurations for internal features of MorganaXProc-IIIse. These controls can appear anywhere on the command line after the specification of the pipeline to run.

Setting the catalog resolver

MorganaXProc-IIIse uses XMLResolver (developed by Norm Tovey-Walsh) as its catalog system to resolve resources used in your pipeline. Use -catalogs=uri-or-path to set the initial XML catalogs as a semicolon-separated list of URIs or paths. If a path or a URI is relative, it is resolved against the current working directory.
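One thing to watch out for in a Unix shell: the semicolon is also the shell's command separator, so a list of catalogs has to be quoted. The sketch below only inspects the tokens the shell produces; catalog.xml and lib/catalog.xml are placeholder names.

```shell
# The quotes keep the shell from treating ";" as a command separator.
# catalog.xml and lib/catalog.xml are placeholder catalog files.
set -- pipeline.xpl '-catalogs=catalog.xml;lib/catalog.xml'
token_count=$#
catalog_token=$2
echo "$token_count tokens; catalog setting: $catalog_token"
```

Without the quotes, the shell would end the command after “catalog.xml” and try to run “lib/catalog.xml” as a second command.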

Selecting the XSLTConnector

MorganaXProc-IIIse implements a flexible way to choose which XSLT processor is used in p:xslt. For each supported XSLT processor, MorganaXProc-IIIse provides a connector which needs to be registered to be used. Only one connector can be registered for a specific pipeline run. To register a connector, use -xslt-connector=java-classname-or-shortcut. Currently the following XSLT processors are supported:

Processor  Java classname                                                       Shortcut
Saxon 9.9  com.xml_project.morganaxproc3.saxon99connector.Saxon99XSLTConnector  Saxon99
Saxon 10   com.xml_project.morganaxproc3.saxon10connector.Saxon10XSLTConnector  Saxon10

Currently Saxon 9.9 is used as the default setting, so if you do not set an XSLTConnector, this version will be used. Please keep in mind that you have to put the requested XSLT processor on the Java classpath, too. The easiest way is to just drop the JAR-file into folder “MorganaXProc-IIIse_lib”. As the classes in the different versions of an XSLT processor typically overlap, it is in general a risky strategy to have different versions of the same processor on the classpath. The current search order for a Saxon implementation is Saxon9EE, Saxon9PE, Saxon9HE, Saxon10EE, Saxon10PE and Saxon10HE. MorganaXProc-IIIse will use the first Saxon version found by this search order.

Another problem in setting the configuration for XSLTConnector is that MorganaXProc-IIIse will always use the same instance of a Saxon processor if it is used for both p:xslt and p:xquery. Therefore it is not possible to use Saxon 9.9 for XSLT processing and Saxon 10 for XQuery processing or the other way around.

Selecting the XQueryConnector

With -xquery-connector=java-classname-or-shortcut you can select the XQuery processor to be used in p:xquery in the same way as an XSLTConnector is selected for p:xslt. Currently the following connectors are supported:

Processor  Java classname                                                          Shortcut
Saxon 9.9  com.xml_project.morganaxproc3.saxon99connector.Saxon99XQueryExpression  Saxon99
Saxon 10   com.xml_project.morganaxproc3.saxon10connector.Saxon10XQueryConnector   Saxon10

Currently Saxon 9.9 is used as the default. Please see the hints on Saxon as XSLTConnector for further configuration details.
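As a sketch, this is how both connectors could be switched to Saxon 10 using the shortcuts from the tables. It assumes that a Saxon 10 JAR-file has been put on the classpath; the command is only printed, not executed.

```shell
# Dry run: select Saxon 10 for both p:xslt and p:xquery via the shortcuts.
# The Saxon 10 JAR-file must be on the classpath for this to work.
set -- pipeline.xpl -xslt-connector=Saxon10 -xquery-connector=Saxon10
connector_cmd="sh Morgana.sh $*"
echo "$connector_cmd"
```

Both settings name the same Saxon version because a single Saxon instance serves p:xslt and p:xquery.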

Loading configuration for XSLT- and XQuery processors

Some XSLT- and XQuery processors accept configuration files to control their settings. With -xslt-config=uri-or-path and -xquery-config=uri-or-path you can provide such files. MorganaXProc-IIIse does not do anything with these files but just passes them through to the XSLT- or XQuery processors when they are instantiated. The file's nature and the semantics of its content are therefore completely determined by the receiving product. Please see its documentation.

As stated above, MorganaXProc-IIIse will use only one processor instance if you choose to do both p:xslt and p:xquery with Saxon. If you do so, please make sure to give the same configuration file for -xslt-config and -xquery-config. As only one Saxon instance is created, only one of the two files will be used, and it is not predictable whether the instance will be created by a p:xslt or a p:xquery.
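One simple way to keep both settings identical is to store the file name in a shell variable and use it twice. In this sketch the name saxon-config.xml is a placeholder, and the command is only printed, not executed.

```shell
# Keep the file name in one place so both settings always agree.
# saxon-config.xml is a placeholder for your Saxon configuration file.
saxon_config=saxon-config.xml
set -- pipeline.xpl "-xslt-config=$saxon_config" "-xquery-config=$saxon_config"
config_cmd="sh Morgana.sh $*"
echo "$config_cmd"
```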

Switches

Switches on the command line can be used to control the running behaviour of MorganaXProc-IIIse. They can appear anywhere on the command line after the specification of the pipeline to run. Currently the following switches are available:

-cp

As said above, MorganaXProc-IIIse usually does not create any folder necessary to store documents appearing on an output port to a specific location in your file system. This behaviour can be overridden by using switch -cp (read: “create paths”).

-silent

If you use -silent on the command line, no additional information will be written to the console while running the pipeline. This might be handy if you want to pipe the console output to a file containing just the document(s) appearing on an output port.
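A minimal sketch of this use, assuming a pipeline without any -output setting, so that its result is written to the console. The file name result.xml is a placeholder, and the call is only made if the Morgana.sh launcher is found in the current folder.

```shell
# Run only if the launcher script is present in the current folder.
if [ -f Morgana.sh ]; then
  # Without -output, documents go to the console; -silent suppresses
  # everything else, so result.xml (a placeholder name) holds only them.
  sh Morgana.sh pipeline.xpl -silent > result.xml
  silent_status=ran
else
  silent_status=skipped
  echo "Morgana.sh not found; change to the distribution folder first."
fi
```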

-mopl

Using this switch will output the MoPL pipeline generated by the XProc 3.0 compiler on the console. This is currently purely a debugging feature and might not be of public interest.

-graph

If you use this switch on the command line, you get a textual representation of the execution graph created by the XProc 3.0 compiler. This might be helpful if your pipeline behaves unexpectedly. For every step's input and output ports it is listed how they are connected. Of course the current format is not the last word on this. Some graphical representation would surely be more useful.

-debug

Turns on debugging information for running the compiled XProc 3.0 pipeline in MoPL. It displays every message sent in the running MoPL pipeline together with its sender and its receiver. This is currently only used for internal debugging purposes of MorganaXProc-IIIse and might not be useful to the public.

Chapter 2. Using a configuration file

Setting the configuration file on the command line

In addition to the explicit settings on the command line, MorganaXProc-IIIse also allows you to bundle specific configuration settings in a file. This is done by using -config=uri-or-path as the first element on the command line. If the given URI or path is relative, it is resolved against the current working directory. You can find an example of a configuration file in your distribution folder. All switches and configuration controls can be used in the configuration document. The local names of the elements are the same as on the command line without the leading “-”.

As the configuration file has to be the first element on the command line, later settings for the same switch or control will override the settings in the configuration document.
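The following dry run sketches this order of tokens. It assumes a configuration file named config.xml (a placeholder) that contains a setting for the XSLTConnector, which the command line then overrides. The command is only printed, not executed.

```shell
# Dry run: -config comes first, the pipeline second, overrides after that.
# config.xml is a placeholder; -xslt-connector=Saxon10 overrides any
# XSLTConnector setting made in that file.
set -- -config=config.xml pipeline.xpl -xslt-connector=Saxon10
override_cmd="sh Morgana.sh $*"
echo "$override_cmd"
```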