Enhancing MorganaXProc With Third Party Software

MorganaXProc runs most XProc pipelines out of the box with the standard distribution. However, some steps (or some features of a step) can only be processed if you install additional software, all of which is available under free software licences. For some other steps you are free to choose whether to use the software of the standard distribution or to enhance MorganaXProc's abilities with alternative software packages.

This document gives an overview of which XProc steps need additional software and for which steps you can choose between different third-party software projects.

Two ways to enhance MorganaXProc's functionality:

MorganaXProc offers two ways to enhance its functionality via third-party software: you can either download the additional software mentioned below and add the JAR files to CLASSPATH or, more conveniently, drop the JAR files in a special place where MorganaXProc will automatically look for them. Both ways have their pros and cons, so use whichever you prefer. If you decide to use CLASSPATH, please have a look at the file "runMorganaTests.sh" in your downloaded distribution of MorganaXProc.zip for a detailed example.

p:unescape-markup

The step <p:unescape-markup> is used to parse a string as an XML document and return the resulting document on the output port. By default an XProc implementation is supposed to treat the supplied string as content type "application/xml", which means the string's content is taken to be well-formed XML. This will not always be the case: for example, the document on the input port of <p:unescape-markup> might come from a <p:http-request> delivering an HTML page.

To cover such cases, MorganaXProc supports the use of a special parser for content-type="text/html", as suggested in XProc: An XML Pipeline Language. You are free to choose any parser you like for "text/html" as long as this parser implements org.xml.sax.XMLReader.
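
As an illustration, a pipeline relying on such a parser might look like the following sketch. It uses only the standard content-type option of <p:unescape-markup> from the XProc 1.0 step library. The wrapper element and the sample HTML string are made up for this example, and the exact shape of the result depends on the parser you install.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:output port="result"/>

  <!-- Parse the string content of <wrapper> as HTML instead of XML. -->
  <p:unescape-markup content-type="text/html">
    <p:input port="source">
      <p:inline>
        <!-- Hypothetical input: escaped HTML that is not well-formed XML. -->
        <wrapper>&lt;p>First paragraph&lt;br>Second line</wrapper>
      </p:inline>
    </p:input>
  </p:unescape-markup>
</p:declare-step>
```

Without a configured parser for "text/html", a pipeline like this one cannot be expected to succeed, because the sample string is not well-formed XML.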

To install a parser/reader for "text/html" the following steps have to be taken:

  1. Download the software you would like to use.
  2. Find the .jar file containing the implementation of org.xml.sax.XMLReader.
  3. Put it (and all other files required to run the .jar file) into folder "Extensions" in your MorganaXProc folder.
  4. Figure out the name of the Java class that implements org.xml.sax.XMLReader.
  5. Write the name of this class into the value attribute of element HTMLParserClass in a configuration document.

This may sound complicated, but it is not once you have found the relevant facts in the software's documentation. For example, if you choose TagSoup as your parser for "text/html", you just have to download the file "tagsoup-xxx.jar" and put it into the folder "Extensions". The value you have to give to the HTMLParserClass element is "org.ccil.cowan.tagsoup.Parser". That is all. The next time you run MorganaXProc with the modified configuration document, <p:unescape-markup> will work with content-type "text/html" and give you a well-formed version of the HTML string on the result port.
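
The configuration entry for TagSoup described above can be sketched as follows. Only the element name, the value attribute and the class name are taken from this document. The rest of the configuration document (root element, other settings) is deliberately left out rather than guessed.

```xml
<!-- Fragment of a MorganaXProc configuration document (enclosing elements omitted). -->
<HTMLParserClass value="org.ccil.cowan.tagsoup.Parser"/>
```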

As an alternative to TagSoup, MorganaXProc was tested with Validator.nu HTML Parser (1.4.1). To use it, put the file "htmlparser-xxx.jar" into the folder "Extensions" and enter "nu.validator.htmlparser.sax.HtmlParser" as the value for HTMLParserClass.

p:validate-with-relax-ng

<p:validate-with-relax-ng> uses a RELAX NG document on port "schema" to validate the document on port "source". With MorganaXProc's standard distribution alone you will not be able to run XProc pipelines containing a <p:validate-with-relax-ng> step. In order to use this kind of document validation (and namespace-based validation with pxp:nvdl as well) you will have to install Jing, a RELAX NG validator in Java (Copyright © 2001, 2002, 2003, 2008 Thai Open Source Software Center Ltd).

Installation:

  1. Download the software from the Jing homepage on GitHub.
  2. Unzip the downloaded file and put the resulting folder into the folder "Extensions" in your MorganaXProc folder.
  3. Rename the folder to "jing".
  4. Done! No further operations are required. XProc pipelines with <p:validate-with-relax-ng> will now perform as expected.
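
To check the installation, a small self-contained pipeline such as the following sketch may be useful. It uses the standard ports "source" and "schema" and the assert-valid option from the XProc 1.0 step library. The sample document and the one-element schema are invented for this example.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:output port="result"/>

  <!-- With assert-valid="true" the step fails if the source is invalid. -->
  <p:validate-with-relax-ng assert-valid="true">
    <p:input port="source">
      <p:inline><greeting>Hello</greeting></p:inline>
    </p:input>
    <p:input port="schema">
      <p:inline>
        <!-- Hypothetical schema: a single element with text content. -->
        <element name="greeting" xmlns="http://relaxng.org/ns/structure/1.0">
          <text/>
        </element>
      </p:inline>
    </p:input>
  </p:validate-with-relax-ng>
</p:declare-step>
```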

p:validate-with-schematron

MorganaXProc's standard distribution can run XProc pipelines containing <p:validate-with-schematron> steps once you have installed the required files from the Schematron project in the folder "Extensions". Here is how to do it:

  1. Download "iso-schematron-xslt1.zip" or "iso-schematron-xslt2.zip" (supported since MorganaXProc 0.95-10) from the Schematron project download page.
  2. Unzip the downloaded file and put the resulting folder ("iso-schematron-xslt1" or "iso-schematron-xslt2") into the folder "Extensions" of your MorganaXProc folder.
  3. Now MorganaXProc will run every XProc pipeline with steps <p:validate-with-schematron> as expected.

Please note that you need to set an XSLT processor for XSLT 2.0 (or higher) as the value of XSLTConnector in order to use "iso-schematron-xslt2". See the instructions for p:xslt.
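
A minimal pipeline for trying out the installation might look like the sketch below. The ports and the assert-valid option are those of the standard XProc 1.0 step. The sample document and the single Schematron rule are made up for illustration. The parameters port is bound to an empty sequence because this pipeline declares no parameter input of its own.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:output port="result"/>

  <p:validate-with-schematron assert-valid="true">
    <p:input port="source">
      <p:inline><doc><title>Sample</title></doc></p:inline>
    </p:input>
    <p:input port="schema">
      <p:inline>
        <!-- Hypothetical rule: every doc element needs a title child. -->
        <sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
          <sch:pattern>
            <sch:rule context="doc">
              <sch:assert test="title">A doc element needs a title.</sch:assert>
            </sch:rule>
          </sch:pattern>
        </sch:schema>
      </p:inline>
    </p:input>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:validate-with-schematron>
</p:declare-step>
```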

p:xsl-formatter

Since release 1.0.10 MorganaXProc provides a generic interface for FO-processors named "FOConnector". By implementing this interface you can use any FO-processor with MorganaXProc's implementation of p:xsl-formatter.

The standard distribution of MorganaXProc comes with two implementations of "FOConnector" for Apache™ FOP (Formatting Objects Processor): FOP11Connector and FOP22Connector are needed to use p:xsl-formatter with version 1.1 and version 2.2 of Apache™ FOP, respectively. You will find "FOPxxConnector.jar" in the folder "Extensions" of the standard distribution. If you only want to use one version of Apache™ FOP, you can remove the other connector from the folder.

To use <p:xsl-formatter> in your XProc pipelines the following installation steps are required:

  1. Download the binaries of FOP (version 1.1 or version 2.2) from the FOP distribution mirror.
  2. Unzip the downloaded archive.
  3. Create a new folder named "fop-1.1" or "fop-2.2" in folder "Extensions" in your MorganaXProc folder.
  4. Copy file "fop.jar" from the "build" folder of the FOP distribution into the folder created in step 3.
  5. Copy folder "lib" of the FOP distribution into "Extensions".
  6. Set MorganaXProc's configuration property "FOConnector" to "FOP11Connector" or "FOP22Connector". See documentation on configuration properties for details.
  7. That's all. Now you can use <p:xsl-formatter> in your XProc pipelines with MorganaXProc.
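
Once the connector is configured, a pipeline along the lines of the following sketch could be used as a first test. It relies only on the standard href and content-type options of <p:xsl-formatter>. The output file name "hello.pdf" and the tiny XSL-FO document are assumptions made for this example.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">

  <!-- href names the file to write; "hello.pdf" is a hypothetical target. -->
  <p:xsl-formatter href="hello.pdf" content-type="application/pdf">
    <p:input port="source">
      <p:inline>
        <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
          <fo:layout-master-set>
            <fo:simple-page-master master-name="page"
                                   page-height="297mm" page-width="210mm"
                                   margin="20mm">
              <fo:region-body/>
            </fo:simple-page-master>
          </fo:layout-master-set>
          <fo:page-sequence master-reference="page">
            <fo:flow flow-name="xsl-region-body">
              <fo:block>Hello from p:xsl-formatter</fo:block>
            </fo:flow>
          </fo:page-sequence>
        </fo:root>
      </p:inline>
    </p:input>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xsl-formatter>
</p:declare-step>
```

The pipeline declares no output port because the "result" port of <p:xsl-formatter> is not a primary port. The formatted document itself is written to the location given in href.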

p:xslt

MorganaXProc comes with ready-to-run support for <p:xslt> using Java's built-in standard mechanism to obtain an XSLT processor. So whatever XSLT processor runs with your Java environment will also run out of the box with MorganaXProc. See the Java API documentation on how to select which XSLT processor is used.

This approach is easy to use and works well. However, Java's standard mechanism has limitations that result from the interface it implements: the interface was essentially designed for XSLT 1.0 transformations, so not all XSLT-related features of XProc can be implemented with it. The following features of <p:xslt> will not work:

  • you cannot set an initial mode for the stylesheet.
  • you cannot set an initial template for the transformation.
  • you cannot set the output's base URI.
  • you will not have a default collection even if using XSLT 2.0.
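
In pipeline terms, the first three restrictions concern the standard <p:xslt> options initial-mode, template-name and output-base-uri. The following sketch shows a step that depends on two of them and therefore should not be expected to work with the default mechanism. The stylesheet, the mode name "summary" and the base URI are invented for this example.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:output port="result"/>

  <!-- initial-mode and output-base-uri need an alternative XSLT connector. -->
  <p:xslt initial-mode="summary"
          output-base-uri="http://example.org/out/summary.xml">
    <p:input port="source">
      <p:inline><doc><item>a</item><item>b</item></doc></p:inline>
    </p:input>
    <p:input port="stylesheet">
      <p:inline>
        <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                        version="2.0">
          <!-- Only reached when processing starts in mode "summary". -->
          <xsl:template match="/doc" mode="summary">
            <summary><xsl:value-of select="count(item)"/></summary>
          </xsl:template>
        </xsl:stylesheet>
      </p:inline>
    </p:input>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xslt>
</p:declare-step>
```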

If you do not need any of these features in your pipelines, MorganaXProc's standard mechanism will work fine for you. For those who cannot live (or work) with these constraints, MorganaXProc offers a solution involving third-party software: either Xalan-Java by The Apache XML Project or Saxon-HE, the Home Edition of the Saxon XSLT and XQuery processor developed by Saxonica.

Here is what to do, if you want to use Xalan-Java as XSLT processor:

  1. Download Xalan-J binaries from the Xalan download mirrors.
  2. Unzip the downloaded archive.
  3. Rename the resulting folder to "Xalan" and put it into folder "Extensions" of your MorganaXProc folder.
  4. Write "XalanXSLTConnector" into the value attribute of element XSLTConnector in your configuration document to enable XSLT processing with Xalan-J.

If you choose to use Saxon-HE as your XSLT processor for <p:xslt>, the procedure is very similar:

  1. Download Saxon-HE 9.6.x or 9.7.x from its download page.
  2. Unzip the downloaded archive.
  3. Rename the resulting folder to "Saxon" (for Saxon 9.6.x) or "Saxon97" (for Saxon 9.7.x) and put it into the folder "Extensions" of your MorganaXProc folder.
  4. Write "SaxonXSLTConnector" or "Saxon96XSLTConnector" into the value attribute of element XSLTConnector in your configuration document to enable XSLT processing with Saxon-HE 9.6.x. To use Saxon-HE 9.7.x you have to select "Saxon97XSLTConnector".
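
The corresponding configuration entry can be sketched as follows, here for Saxon-HE 9.7.x. The element name, the value attribute and the connector names come from the steps above. The enclosing structure of the configuration document is not shown because it is not described here.

```xml
<!-- Fragment of a MorganaXProc configuration document (enclosing elements omitted). -->
<!-- Other values named above: "XalanXSLTConnector", "SaxonXSLTConnector", "Saxon96XSLTConnector". -->
<XSLTConnector value="Saxon97XSLTConnector"/>
```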

Note: If you have purchased Saxon PE or Saxon EE you just have to create a folder called "Saxon" (for version 9.6.x) or "Saxon97" (for version 9.7.x) in the folder "Extensions" of your MorganaXProc folder and copy the required JAR-files and the Licence file into it.

p:xquery

MorganaXProc comes with ready-to-run support for <p:xquery>, which supports most but not all features required of an XQuery processor. Since version 0.95-4 there is also an implementation of the optional feature "module import". MorganaXProc's own XQuery processor is always used when you import XQuery functions into XProc's XPath context.
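
For orientation, a simple <p:xquery> call might look like the following sketch. It follows the standard XProc 1.0 conventions, with the query text wrapped in a c:query element on the "query" port. The sample document and the query are made up, and whether a given query runs depends on the features of the XQuery processor in use.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                version="1.0">
  <p:output port="result" sequence="true"/>

  <p:xquery>
    <p:input port="source">
      <p:inline><items><item>a</item><item>b</item></items></p:inline>
    </p:input>
    <p:input port="query">
      <p:inline>
        <!-- The query is plain text, so its markup is escaped. -->
        <c:query>&lt;count>{ count(/items/item) }&lt;/count></c:query>
      </p:inline>
    </p:input>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xquery>
</p:declare-step>
```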

Additionally MorganaXProc offers the possibility to use a variety of free software XQuery processors as an alternative. To choose an alternative, you have to put the connector's name into the value attribute of element XQueryConnector in your configuration document.

  • Of course you can use Saxon-HE not only as an XSLT processor but also in <p:xquery>. The connector's name to use for Saxon-HE is "SaxonXQueryConnector", "Saxon96XQueryConnector" or "Saxon97XQueryConnector".
  • Another option is MXQuery, a lightweight, full-featured XQuery engine. To use it, create a folder named "MXQuery" within the folder "Extensions" in your MorganaXProc folder, download MXQuery and put "mxquery.jar" into the folder "Extensions/MXQuery". Finally, set the connector's name to "MXQueryConnector". Please note that this connector does not use MorganaXProc's filesystem, so there is no security control for the documents accessed in a query.
  • MorganaXProc also has built-in support for XQuery processing with BaseX, the XML database. The file "BaseX.jar" needs to be placed in the folder "Extensions/BaseX". The connector's name for the configuration document is "BaseXConnector". There are two caveats for this XQueryConnector:
    1. This connector does not use MorganaXProc's filesystem, so there is no security check for the documents accessed in the query.
    2. There is no support for the default collection, so using "collection()" in your query will result in an XQuery error.
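
Selecting one of these connectors can be sketched as the following configuration entry, here for BaseX. The element name, the value attribute and the connector names are those given above. The surrounding configuration document is omitted because its structure is not described here.

```xml
<!-- Fragment of a MorganaXProc configuration document (enclosing elements omitted). -->
<!-- Other values named above: "MXQueryConnector", "SaxonXQueryConnector",
     "Saxon96XQueryConnector", "Saxon97XQueryConnector". -->
<XQueryConnector value="BaseXConnector"/>
```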

If the chosen XQuery processor supports versions other than XQuery (1.0), you can use the extension attribute "mox:version" on <p:xquery/> to select one of these versions.

pxp:nvdl

Since MorganaXProc 0.95-9 the proposed extension step <pxp:nvdl/> is available to perform an NVDL (Namespace-based Validation Dispatching Language) validation of documents.

As with p:validate-with-relax-ng, MorganaXProc needs Jing, a RELAX NG validator in Java (Copyright © 2001, 2002, 2003, 2008 Thai Open Source Software Center Ltd), to perform namespace-based validation of documents. Please see the documentation on p:validate-with-relax-ng for further information.

pxp:compress and pxp:uncompress

The proposed extension steps <pxp:compress/> and <pxp:uncompress/> (supported since MorganaXProc 0.95-9) are used to store compressed data and to expand compressed data. With option "compression-method" pipeline authors are able to control the method used to compress/uncompress the data. MorganaXProc uses "gzip" as default method and this is also the only compression method supported directly.

If you need support for other compression methods, MorganaXProc is prepared to work with Apache Commons Compress™. Using this library, the additional compression methods "bzip2" and "deflate" are available for compressing and uncompressing data. If you install the optional library XZ for Java together with Apache Commons Compress™, the compression method "xz" can be used in <pxp:compress/> and <pxp:uncompress/>. Additionally there is read-only support (<pxp:uncompress/>) for the methods "LZMA" and "Z". (See the documentation of Apache Commons Compress™ for more details.)

Here is what to do to use additional compression methods:

  1. Download Apache Commons Compress™ and unzip the downloaded archive.
  2. Rename JAR-file "commons-compress-X.XX.jar" to "commons-compress.jar".
  3. Move the renamed file to the folder "Extensions" of your MorganaXProc folder.
  4. Only if you need support for compression-method "xz":
    1. Download XZ for Java.
    2. Rename JAR-file "xz-X.X.jar" to "xz.jar".
    3. Move the renamed file to the folder "Extensions" of your MorganaXProc folder.