TBX Dialects

Dialects are divided into two categories: public and private.

Public dialects are endorsed and promoted on this website for general use, and they are the recommended dialects for terminology exchange. They are created from public modules. Maintenance and support of these dialects are approved by the TBX Council. Tools and utilities hosted on TBXinfo.net are developed to work best with public dialects. The public dialects build on one another in the order listed below: TBX-Min extends TBX-Core, and TBX-Basic extends TBX-Min.

Private dialects are not actively endorsed or promoted by TBXinfo.net; however, a list of known private dialects is provided as a service (see the Private Dialects section below). The burden of maintenance falls on the creator of the dialect, so TBXinfo.net cannot guarantee the stability or reliability of private dialects. If a private dialect becomes widely adopted, it is possible for it to become a public dialect. To be highly successful and useful, a private dialect should ideally be a superset of one of the public dialects, such as TBX-Basic.

There are two XML styles that can be used to represent terminological data: DCA (data category as attribute) and DCT (data category as tag). See the DCA vs. DCT page for more information.
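
As a rough illustration of the difference, the same definition might be written as follows in each style. This is only a sketch: the "basic" prefix and its namespace URI are assumptions made for the example, the sample text is invented, and the two fragments would belong to separate files, since a document declares a single style.

```xml
<!-- Fragment 1, DCA style: the data category is named in the type attribute -->
<langSec xmlns="urn:iso:std:iso:30042:ed-2" xml:lang="en">
    <descrip type="definition">A device that converts a paper document into a digital image.</descrip>
    <termSec>
        <term>scanner</term>
    </termSec>
</langSec>

<!-- Fragment 2, DCT style: the data category is the element name itself.
     Assumption: the "basic" prefix and namespace URI are illustrative only. -->
<langSec xmlns="urn:iso:std:iso:30042:ed-2"
         xmlns:basic="http://www.tbxinfo.net/ns/basic" xml:lang="en">
    <basic:definition>A device that converts a paper document into a digital image.</basic:definition>
    <termSec>
        <term>scanner</term>
    </termSec>
</langSec>
```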

Public Dialects

TBX-Core

This section is intended for users, implementors, and creators: those who intend to learn about the core of TBX or create new dialects using TBX-Core as a reference or starting point. All dialects of TBX must comply with the core.

TBX-Core can be thought of as a dialect of TBX which uses only the Core module. Unlike other dialects, it looks the same in both "DCA" and "DCT" styles. However, like the DCA styles of the other dialects, TBX-Core uses an integrated RNG schema: a modified version of the Core module RNG which disallows any data categories that are not explicitly defined in the Core module.

For the most up-to-date schemas and definitions, see the TBX-Core GitHub page (https://ltac-global.github.io/TBX-Core_dialect).
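
For orientation, a very small TBX-Core instance might look like the sketch below. The element names follow the core structure as commonly published, and the header text and term are invented for illustration; the schemas linked above remain the authority on what is required.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: verify against the current TBX-Core schema -->
<tbx type="TBX-Core" style="dca" xml:lang="en"
     xmlns="urn:iso:std:iso:30042:ed-2">
    <tbxHeader>
        <fileDesc>
            <sourceDesc>
                <p>Invented sample data for illustration.</p>
            </sourceDesc>
        </fileDesc>
    </tbxHeader>
    <text>
        <body>
            <!-- One concept with a single English term and no data categories -->
            <conceptEntry id="c1">
                <langSec xml:lang="en">
                    <termSec>
                        <term>scanner</term>
                    </termSec>
                </langSec>
            </conceptEntry>
        </body>
    </text>
</tbx>
```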

TBX-Min

This section is intended for Users/Implementors/Creators: those who intend to learn about the TBX-Min dialect, validate TBX-Min document instances, or create new dialects using TBX-Min as a reference or starting point.

TBX-Min is a dialect of TBX designed for simple and straightforward storage of bilingual or monolingual glossaries. The primary use case for TBX-Min is to transmit a glossary to a translator, but TBX-Min can also be used by translators to submit glossaries they have created.

For the most up-to-date schemas and definitions, see the TBX-Min GitHub page (https://ltac-global.github.io/TBX-Min_dialect).
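
The bilingual glossary use case can be pictured as one concept entry holding a language section per language. The fragment below is a hedged sketch in DCA style: the terms are invented, and the assumption that partOfSpeech is available as a termNote data category in TBX-Min should be checked against the current schema.

```xml
<!-- Illustrative fragment: one concept with an English and a German term -->
<conceptEntry xmlns="urn:iso:std:iso:30042:ed-2" id="c1">
    <langSec xml:lang="en">
        <termSec>
            <term>scanner</term>
            <!-- Assumption: partOfSpeech is permitted by the dialect's schema -->
            <termNote type="partOfSpeech">noun</termNote>
        </termSec>
    </langSec>
    <langSec xml:lang="de">
        <termSec>
            <term>Scanner</term>
            <termNote type="partOfSpeech">noun</termNote>
        </termSec>
    </langSec>
</conceptEntry>
```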

TBX-Basic

This section is intended for Users/Implementors/Creators: those who intend to learn about the TBX-Basic dialect, validate TBX-Basic document instances, or create new dialects using TBX-Basic as a reference or starting point.

TBX-Basic is the primary dialect for terminology exchange. It is designed to store large quantities of glossary data efficiently in a straightforward XML format, and it can handle monolingual, bilingual, or multilingual glossaries.

For the most up-to-date schemas and definitions, see the TBX-Basic GitHub page (https://ltac-global.github.io/TBX-Basic_dialect).

Creating a New Dialect

This section is intended for Creators of TBX dialects.

Notice: Information on this website about dialect creation is not intended to be stand-alone. It is assumed that a dialect creator has access to a legally purchased copy of ISO 30042/DIS:2018.

  1. From the existing data category modules listed at tbxinfo.net/tbx-modules, select the modules which contain the data categories needed for the new dialect.
    • If one or more required data categories are not covered by any combination of existing modules, new modules can be created following the appropriate module creation guidelines.
  2. The dialect is defined as the combination of the core and the selected modules.
  3. The dialect must be given a name according to the naming convention: TBX-[UniqueName]
    • Examples:
      • TBX-Micro
      • TBX-Basic
      • TBX-Min
    • Invalid Examples:
      • Micro-TBX
      • Micro
      • TBX – Micro Terminologies
  4. The RNG/XSD validation schemas for the dialect must have the TBX core as the foundation.
  5. A Schematron may be necessary for co-constraint validation.
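
The naming convention in step 3 can also be checked mechanically. The Schematron sketch below is one possible way to do so, not part of the official tooling: the pattern id is hypothetical, and the regular expression assumes that UniqueName is limited to ASCII letters and digits, which the convention itself does not state.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <ns uri="urn:iso:std:iso:30042:ed-2" prefix="tbx"/>

    <!-- Hypothetical pattern: checks the dialect name declared on the root element -->
    <pattern id="dialectNameConvention">
        <rule context="/tbx:tbx">
            <!-- Assumption: UniqueName consists of ASCII letters and digits only.
                 Accepts TBX-Micro, TBX-Basic, TBX-Min; rejects Micro-TBX and Micro. -->
            <assert test="matches(@type, '^TBX-[A-Za-z0-9]+$')">The dialect name must follow the convention TBX-[UniqueName], for example TBX-Micro.</assert>
        </rule>
    </pattern>
</schema>
```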

 

Validating Dialects

This section is intended for Implementors/Creators: those who intend to validate TBX document instances, or create validation schemas for a TBX dialect.

For DCA dialects, the recommended validation method is to create an integrated schema from the TBX Core RNG. The TBX Core RNG provides extension points that make it easy to declare the available data categories. For each classification element (<descrip>, <admin>, <termNote>, etc.), the data categories allowed by the modules which make up the dialect must be listed as the permitted values of that element's @type attribute. The definitions for these @type attributes are found at the end of the TBX Core RNG, in the section introduced by the following comment:

<!-- Extension Points for DCA type values. If co-constraint validation is required, such constraints must be supplied via a Schematron file -->

For example, the section for the @type attribute of <descrip> looks like this:

<define name="descrip.types">
    <choice>
        <documentation xmlns="http://relaxng.org/ns/compatibility/annotations/1.0">
        <p>For an integrated schema for use in DCA,
        replace <empty/> with the allowed values of the "type" attribute.</p>
        </documentation>
        <text/>
    </choice>
</define>

To define the allowed <descrip> classified data categories, the <text/> placeholder (referred to as <empty/> in the schema's embedded documentation note) must be replaced by <value> elements which name the permitted data categories:

<define name="descrip.types">
    <choice>
        <documentation xmlns="http://relaxng.org/ns/compatibility/annotations/1.0">
        <p>For an integrated schema for use in DCA,
        replace <empty/> with the allowed values of the "type" attribute.</p>
        </documentation>
        <value>subjectField</value>
        <value>definition</value>
    </choice>
</define>
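
With the definition above in place, the effect on document instances can be sketched as follows. The fragments are illustrative and assume that the @type attribute of <descrip> is validated against descrip.types, as described earlier; the element content is invented.

```xml
<!-- Accepted by the grammar: both values are listed in descrip.types -->
<descrip xmlns="urn:iso:std:iso:30042:ed-2" type="subjectField">finance</descrip>
<descrip xmlns="urn:iso:std:iso:30042:ed-2" type="definition">A device that converts a paper document into a digital image.</descrip>

<!-- Rejected by the grammar: "context" is not among the listed values -->
<descrip xmlns="urn:iso:std:iso:30042:ed-2" type="context">Place the page face down on the scanner.</descrip>
```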

To ensure that rogue data categories do not pass validation for your specific dialect, change the <text/> element to <empty/> for classification elements which have no permitted data categories in this dialect:

<define name="descripNote.types">
    <choice>
        <documentation xmlns="http://relaxng.org/ns/compatibility/annotations/1.0">
        <p>For an integrated schema for use in DCA,
        replace <empty/> with the allowed values of the "type" attribute.</p>
        </documentation>
        <empty/>
    </choice>
</define>
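
As a hedged illustration of what this change does and does not catch: in RELAX NG an empty pattern matches an empty attribute value, so a grammar edited this way should reject any named data category on <descripNote> but may still let a blank @type through. The fragments below sketch that behaviour with invented content; the blank case is the one the Schematron rule on empty @type values, shown further down, is meant to close.

```xml
<!-- Rejected by the grammar: no data category is permitted for descripNote -->
<descripNote xmlns="urn:iso:std:iso:30042:ed-2" type="contextType">definingContext</descripNote>

<!-- Likely accepted by the grammar alone (blank value matches <empty/>),
     so it has to be caught by a Schematron assertion such as @type != '' -->
<descripNote xmlns="urn:iso:std:iso:30042:ed-2" type="">definingContext</descripNote>
```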

If the data categories included in the dialect have restricted picklist values (e.g., "subjectField" must contain "manufacturing" or "finance"), these restrictions can be defined in a Schematron file. The Schematron file can also be used to enforce the name of the dialect and the style of the dialect (DCA/DCT):

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2"
    xmlns:sqf="http://www.schematron-quickfix.com/validator/process">
    <ns uri="urn:iso:std:iso:30042:ed-2" prefix="tbx" />

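    <pattern id="coreEnforcement">
        <rule context="tbx:termNote">
            <assert test="parent::tbx:termSec or parent::tbx:termNoteGrp/parent::tbx:termSec">Any termNote is only allowed at the termSec level.</assert>
        </rule>
    </pattern>

    <!-- Kept in its own pattern: within a single pattern an element is tested only against
         the first rule whose context matches it, so termNote would otherwise skip this check. -->
    <pattern id="typeDeclared">
        <rule context="tbx:*[@type]">
            <assert test="@type != ''">Data category must be declared.  If no permitted data categories are listed in the grammar schema, blank values are also not allowed.</assert>
        </rule>
    </pattern>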

    <pattern id="dialectEnforcement">
        <rule context="tbx:tbx">
            <assert test="attribute::type='TBX-Micro'">The name of this dialect should be TBX-Micro</assert>
            <assert test="attribute::style='dca'">The style of this dialect should be declared as 'dca'</assert>
        </rule>
    </pattern>
    
    <!-- Fictional Module Rules -->
    <pattern id="module.fictional.descrip">
        <rule context="tbx:descrip[@type='definition']">
            <assert test="parent::tbx:conceptEntry or parent::tbx:langSec or parent::tbx:descripGrp/parent::tbx:conceptEntry or parent::tbx:descripGrp/parent::tbx:langSec">Definition must appear at the conceptEntry or langSec levels.</assert>
        </rule>
        <rule context="tbx:descrip[@type='subjectField']" >
            <assert test="parent::tbx:conceptEntry or parent::tbx:descripGrp/parent::tbx:conceptEntry">Subject Field can only appear at conceptEntry level.</assert>
            <assert test=".='manufacturing' or .='finance'">The subjectField must be either 'manufacturing' or 'finance'.</assert>
        </rule>
    </pattern>
    <pattern id="module.fictional.termNote">
        <rule context="tbx:termNote[@type='termType']">
            <assert test=".='fullForm' or .='abbreviatedForm'">The termType must be either 'fullForm' or 'abbreviatedForm'</assert>
        </rule>
    </pattern>
</schema>

Using Oxygen XML Editor (or a similar program), it is then possible to validate a TBX instance against both the appropriate RNG schema (modified to define the data categories) and the Schematron file.
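
One common way to tie an instance to both schemas is the xml-model processing instruction, which Oxygen and similar editors can read. The sketch below is an assumption-laden example rather than an official sample: the file names TBX-Micro.rng and TBX-Micro.sch are hypothetical, the content is invented, and the data categories are those of the fictional TBX-Micro dialect used in the Schematron example above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical file names: point these at your own integrated RNG and Schematron -->
<?xml-model href="TBX-Micro.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="TBX-Micro.sch" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<tbx type="TBX-Micro" style="dca" xml:lang="en"
     xmlns="urn:iso:std:iso:30042:ed-2">
    <tbxHeader>
        <fileDesc>
            <sourceDesc>
                <p>Invented sample data for illustration.</p>
            </sourceDesc>
        </fileDesc>
    </tbxHeader>
    <text>
        <body>
            <conceptEntry id="c1">
                <!-- subjectField: conceptEntry level, picklist value -->
                <descrip type="subjectField">manufacturing</descrip>
                <langSec xml:lang="en">
                    <!-- definition: allowed at conceptEntry or langSec level -->
                    <descrip type="definition">A device that converts a paper document into a digital image.</descrip>
                    <termSec>
                        <term>scanner</term>
                        <!-- termType: termSec level, picklist value -->
                        <termNote type="termType">fullForm</termNote>
                    </termSec>
                </langSec>
            </conceptEntry>
        </body>
    </text>
</tbx>
```

Changing the subjectField value to something outside the picklist, or the root @type to another name, should then be reported by the Schematron assertions rather than by the grammar.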

Private Dialects

For a list of private dialects, see the private dialects page.


Last updated: June 18, 2024 at 1:54 am

© 2021 LTAC Global, see About Us page for details on Licensing