
Document contents

You can supply custom XSLT files to transform the document contents and metadata.

Only for XML

We try to detect whether a document is actually XML before applying the transformation. Documents in other formats are displayed on the page as unformatted text.
Metadata transformation is always available, regardless of the document format.

File location
```
/etc/projectConfigs/          <--- The location set in the corporaInterfaceDataDir setting
├─ corpus-1/                  <--- Exact name/ID of the corpus as in BlackLab
|  ├─ search.xml
|  ├─ help.inc
|  ├─ about.inc
|  ├─ article.xsl
|  ├─ meta.xsl
|  └─ static/
|     ├─ locales/             <--- Language files for the interface, specific to this corpus
|     |  ├─ en-us.json
|     |  └─ ...
|     └─ ...                  <--- Anything else you want to make available client-side
├─ corpus-2/
├─ corpus-3/
├─ ...
├─ default/                   <--- Fallbacks / Defaults go here
|  ├─ search.xml
|  └─ ...
└─ ...
```
[Figure: Example customized document]

There are two files that are used to transform the document contents and metadata respectively:

  • article.xsl for the document contents
  • meta.xsl for the document metadata

Default behavior

Normally you don't have to write your own article.xsl file.
BlackLab can generate a default transformation based on the corpus's .blf.yaml format config. If that fails (for example, because the .blf.yaml file no longer exists), we fall back to outputting all text() nodes.

For metadata, we provide a built-in meta.xsl that outputs a table with all of the document's metadata.

Document Contents (article.xsl)

  1. First, create an article.xsl file in the corpus's customization directory (see above).
  2. When a user opens a document, its contents are run through the XSLT processor (currently Saxon 10) before being shown to the user.
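As a starting point, a minimal article.xsl might look like the sketch below. The element names `head` and `p` are placeholders (assumed to be in no namespace), so substitute the elements your documents actually use; the `xsl:output method="html"` declaration is an assumption that the result is meant to be embedded in the page as HTML. This is not the frontend's built-in stylesheet.

```xsl
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical minimal article.xsl; a sketch, not the built-in one.
     Assumes source elements named "head" and "p" in no namespace. -->
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Serialize the result as HTML rather than XML. -->
    <xsl:output method="html" />

    <!-- Render <head> elements as HTML headings. -->
    <xsl:template match="head">
        <h2><xsl:apply-templates /></h2>
    </xsl:template>

    <!-- Render <p> elements as HTML paragraphs. -->
    <xsl:template match="p">
        <p><xsl:apply-templates /></p>
    </xsl:template>

    <!-- Any other element: drop the tag but keep processing its children,
         so their text still reaches the output (same as XSLT's built-in rule). -->
    <xsl:template match="*">
        <xsl:apply-templates />
    </xsl:template>
</xsl:stylesheet>
```

Because XSLT's built-in rules copy text nodes through, elements without a template of their own still contribute their text, which mirrors the text() fallback described above.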

Varying document formats in the same corpus

All documents in a corpus are transformed using the same article.xsl file.
So if your corpus contains documents with wildly different XML content, your article.xsl file will have to detect and handle this appropriately.
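One way to handle this is to dispatch on the root element. The sketch below assumes a corpus mixing TEI P5 documents (namespace `http://www.tei-c.org/ns/1.0`) and FoLiA documents (namespace `http://ilk.uvt.nl/folia`). The mode names `tei` and `folia` and the per-format rules are illustrative only, not taken from any built-in stylesheet.

```xsl
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical article.xsl for a corpus mixing TEI and FoLiA documents. -->
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns:folia="http://ilk.uvt.nl/folia"
    exclude-result-prefixes="tei folia">

    <xsl:output method="html" />

    <!-- Detect the format from the root element and switch to a
         format-specific mode. -->
    <xsl:template match="/">
        <xsl:choose>
            <!-- TEI: render only <text>, skipping the teiHeader. -->
            <xsl:when test="tei:TEI">
                <xsl:apply-templates select="tei:TEI/tei:text" mode="tei" />
            </xsl:when>
            <!-- FoLiA: render only <text>, skipping <metadata>. -->
            <xsl:when test="folia:FoLiA">
                <xsl:apply-templates select="folia:FoLiA/folia:text" mode="folia" />
            </xsl:when>
            <!-- Unrecognized format: output the plain text content. -->
            <xsl:otherwise>
                <xsl:value-of select="." />
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

    <!-- TEI paragraphs become HTML paragraphs; other TEI elements fall
         through to the mode's built-in rules, which copy their text. -->
    <xsl:template match="tei:p" mode="tei">
        <p><xsl:apply-templates mode="tei" /></p>
    </xsl:template>

    <!-- FoLiA sentences become paragraphs built from the first <t> child
         of each word token, separated by spaces. -->
    <xsl:template match="folia:s" mode="folia">
        <p><xsl:value-of select="folia:w/folia:t[1]" separator=" " /></p>
    </xsl:template>

    <!-- Suppress <t> elements outside sentences (e.g. paragraph-level
         text), which would otherwise duplicate the sentence text. -->
    <xsl:template match="folia:t" mode="folia" />
</xsl:stylesheet>
```

Using separate modes keeps the TEI and FoLiA rules from interfering with each other, even when both formats use similarly named elements.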

Splitting up your XSLT

It's possible to import other XSLT files into your article.xsl file.
You can put these dependencies anywhere, as long as they are within the corporaInterfaceDataDir directory. For example, to import a shared library XSLT file from a _shared directory:

```
/etc/projectConfigs/          <--- the location set in the corporaInterfaceDataDir setting
├─ corpus-1/
|  └─ article.xsl
└─ _shared/
   ├─ tei.xsl
   └─ folia.xsl
```
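```xsl
<!-- article.xsl: xsl:import must precede all other top-level elements -->
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:import href="../_shared/tei.xsl" />
</xsl:stylesheet>
```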
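Templates in the importing stylesheet have higher import precedence than those in the stylesheets it imports, so a shared file can hold the common rules while each corpus overrides only what differs. The sketch below assumes that `_shared/tei.xsl` handles TEI elements in the default mode; the `tei:head` rule is purely illustrative.

```xsl
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical corpus-1/article.xsl: reuse the rules in the shared
     tei.xsl and override only how TEI <head> elements are rendered. -->
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

    <!-- Imports must precede all other top-level elements. The href is
         resolved relative to this file: /etc/projectConfigs/_shared/tei.xsl -->
    <xsl:import href="../_shared/tei.xsl" />

    <!-- Rules in the importing file have higher import precedence than
         rules in tei.xsl, so this template wins for tei:head even if
         tei.xsl also defines one. -->
    <xsl:template match="tei:head">
        <h3><xsl:apply-templates /></h3>
    </xsl:template>
</xsl:stylesheet>
```

If you want to wrap the shared output rather than replace it, calling `<xsl:apply-imports />` inside the overriding template applies the imported rules to the same node.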

Document Metadata (meta.xsl)

The meta.xsl file is used to transform the metadata of documents in your corpus. The output is shown under the metadata tab on the document's page; customize this file to change how the metadata is displayed. Note that the input is the metadata BlackLab stores about the document, not metadata read from the document contents.
The built-in meta.xsl file simply generates a table of all metadata fields and their values, which is usually sufficient.
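If the built-in table is not enough, a custom meta.xsl could start from something like the sketch below. It deliberately assumes nothing about BlackLab's metadata schema beyond the input being XML: every element without child elements becomes one table row. Check BlackLab's document metadata schema for the actual element names and narrow the `select` expression to the fields you want.

```xsl
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical meta.xsl sketch. It assumes only that the metadata input
     is XML: every element without child elements becomes one table row
     (element name, text value). -->
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="html" />

    <xsl:template match="/">
        <table>
            <!-- Leaf elements only, in document order. -->
            <xsl:for-each select="//*[not(*)]">
                <tr>
                    <th><xsl:value-of select="local-name()" /></th>
                    <td><xsl:value-of select="." /></td>
                </tr>
            </xsl:for-each>
        </table>
    </xsl:template>
</xsl:stylesheet>
```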

โ„น๏ธ BlackLab's document metadata schema

User corpora, priority, fallbacks

As with all other corpus files, the default directory also serves as a fallback location for the article.xsl and meta.xsl files.
In addition, the article.xsl file name can be suffixed with a format specifier, e.g. article_tei.xsl or article_folia.xsl.

The suffix must exactly match the ID of the format used to create the user corpus, which is the name of the .blf.yaml file without the .blf.yaml extension. This lets you provide extra built-ins for data formats you expect your users to upload.

| .blf.yaml file | Format name | article.xsl file |
|---|---|---|
| tei.blf.yaml | tei | article_tei.xsl |
| folia.blf.yaml | folia | article_folia.xsl |
| myformat.blf.yaml | myformat | article_myformat.xsl |
| - (any) | - (any) | article.xsl |

Priority of XSLT files

The first match will be chosen:

  1. The article_<format>.xsl file in the corpus's own directory (e.g. /etc/projectConfigs/corpus-1/article_tei.xsl).
  2. The article_<format>.xsl file in the default directory (e.g. /etc/projectConfigs/default/article_tei.xsl).
  3. The article.xsl file in the corpus's own directory (e.g. /etc/projectConfigs/corpus-1/article.xsl).
  4. The article.xsl file in the default directory (e.g. /etc/projectConfigs/default/article.xsl).
  5. The BlackLab-generated file (if available).
  6. The built-in article_<format>.xsl.
  7. The built-in article.xsl.

Apache license 2.0