Language resources management -- Multilingual information framework

This International Standard provides a generic platform for modelling and managing multilingual information in various domains: localization, translation, multimedia annotation, document management, digital library support, and information or business modelling applications. MLIF (multilingual information framework) provides a metamodel and a set of generic data categories [ISO 12620:2009] for various application domains. MLIF also provides strategies for the interoperability and/or linking of models including, but not limited to, XLIFF, TMX, smilText and ITS.

Gestion des ressources langagières -- Plateforme d'informations multilingues

Upravljanje z jezikovnimi viri - Ogrodje za večjezične informacije


General Information

Status: Published
Publication Date: 11-Jun-2013
Current Stage: 6060 - National Implementation/Publication (Adopted Project)
Start Date: 31-May-2013
Due Date: 05-Aug-2013
Completion Date: 12-Jun-2013


Standards Content (Sample)

SLOVENIAN STANDARD
SIST ISO 24616:2013
01-July-2013
Upravljanje z jezikovnimi viri - Ogrodje za večjezične informacije
Language resources management -- Multilingual information framework
Gestion des ressources langagières -- Plateforme d'informations multilingues
This Slovenian standard is identical to: ISO 24616:2012
ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
SIST ISO 24616:2013 en,fr,de
2003-01. Slovenski inštitut za standardizacijo (Slovenian Institute for Standardization). Reproduction of this standard, in whole or in part, is not permitted.


INTERNATIONAL STANDARD ISO 24616
First edition
2012-09-01

Language resources management —
Multilingual information framework
Gestion des ressources langagières — Plateforme d'informations multilingues




Reference number
ISO 24616:2012(E)
© ISO 2012


COPYRIGHT PROTECTED DOCUMENT


©  ISO 2012
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56  CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland

ii © ISO 2012 – All rights reserved

Contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 Specification principles
4.1 Key standard used in the specification: Unified Modeling Language (UML)
4.2 Metamodel and adornment
4.3 XML serialization
5 Metamodel specification
6 MLIF compliance
7 Metamodel adornment
7.1 Introduction
7.2 General principles concerning the use of W3C generic attributes
7.3 Recommended adornment for GI
7.4 Recommended adornment for GroupC
7.5 Recommended adornment for MultiC
7.6 Recommended and mandatory adornment for MonoC
7.7 Recommended adornment for SegC
7.8 Recommended adornment for HistoC
7.9 Recommended online annotation adornment
7.10 Recommended adornment for localization
7.11 Recommended adornment for internationalization
7.12 Recommended adornment for temporal synchronization
8 Relation with other standards
Annex A (informative) Example using MLIF for Computer-Assisted Translation (CAT)
Annex B (informative) Example: representing TMX data
Annex C (informative) Example of XLIFF data representation
Annex D (informative) Example: representing smilText data
Annex E (informative) Example of MLIF usage for subtitles (captioning)
Annex F (informative) Using MLIF for MAF data
Annex G (normative) Detailed specification
Bibliography

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies
(ISO member bodies). The work of preparing International Standards is normally carried out through ISO
technical committees. Each member body interested in a subject for which a technical committee has been
established has the right to be represented on that committee. International organizations, governmental and
non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the
International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of technical committees is to prepare International Standards. Draft International Standards
adopted by the technical committees are circulated to the member bodies for voting. Publication as an
International Standard requires approval by at least 75 % of the member bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO shall not be held responsible for identifying any or all such patent rights.
ISO 24616 was prepared by Technical Committee ISO/TC 37, Terminology and other language and content
resources, Subcommittee SC 4, Language resource management.

INTERNATIONAL STANDARD ISO 24616:2012(E)

Language resources management — Multilingual information framework
1 Scope
This International Standard provides a generic platform for modelling and managing multilingual information in
various domains: localization, translation, multimedia annotation, document management, digital library
support, and information or business modelling applications. MLIF (multilingual information framework)
provides a metamodel and a set of generic data categories [ISO 12620:2009] for various application domains.
MLIF also provides strategies for the interoperability and/or linking of models including, but not limited to,
XLIFF, TMX, smilText and ITS.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
ISO 12620:2009, Terminology and other language and content resources — Specification of data categories
and management of a Data Category Registry for language resources
ISO 8879, Information processing — Text and office systems — Standard Generalized Markup Language (SGML)
Extensible Markup Language (XML) 1.0 (Fifth Edition), T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler,
F. Yergeau (Editors), W3C Recommendation, 26 November 2008, http://www.w3.org/TR/xml
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply:
3.1
adornment
data category attached to a component of a metamodel
3.2
inline code
inline instructions inserted in a source document
Note to entry: Native code can, for instance, provide presentational instructions (e.g. HTML codes).
3.3
subtitle
textual versions of the dialog in films, television programs, video games, etc., usually displayed at the bottom
of the screen
3.4
working language
language in which linguistic sequences are expressed
4 Specification principles
4.1 Key standard used in the specification: Unified Modeling Language (UML)
The MLIF specification complies with the modelling principles of UML as defined by the Object Management
Group (OMG) [UML]. The specification uses the UML subset that is relevant for the purposes of MLIF.
4.2 Metamodel and adornment
In line with the Terminological Markup Framework (TMF) as defined in ISO 16642, MLIF defines a metamodel that
is adorned with data categories as defined in ISO 12620.
4.3 XML serialization
Associated with the metamodel and its adornment, MLIF proposes a representation in XML, called “XML
serialization”, in line with the Extensible Markup Language (XML) W3C Recommendation, which is a subset of
SGML as defined in ISO 8879.
5 Metamodel specification
The MLIF metamodel is specified in the UML object diagram in Figure 1.

Figure 1 — MLIF metamodel
The MLIF metamodel is defined by the following seven "core components". These components are listed as
follows, according to their XML serialization:
- (Multilingual Data Collection), which represents a collection of data containing global information
and several multilingual units;
- (Global Information), which represents technical and administrative information applying to the
entire multilingual data collection;
- (Grouping components), which represent a sub-collection of multilingual data that have a
common origin or purpose within a given project;
- (Multilingual Component), which groups together all variants of a given textual content;
- (Monolingual Component), which groups together information related to one language and is
part of a multilingual component (MultiC);
- (History Component), which traces modifications to the component to which it is anchored (i.e.
versioning);
- (Segmentation Component), which allows any level of segmentation for textual information,
possibly in a recursive manner.
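The containment relations described above can be mirrored in code. The following Python sketch is illustrative only and is not part of this International Standard: the class names reuse the component abbreviations from Clause 7 (the XML element names are not reproduced here), MultilingualDataCollection is a hypothetical name for the root component, and the nesting (GroupC holding MultiC units, MonoC holding recursive SegC segments, HistoC records on GI, MultiC and MonoC as stated in 7.8) is an assumption read from the descriptions above rather than from Figure 1.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class HistoC:
    """One traced change to the component it is anchored to."""
    action: str  # e.g. "creation", "modification", "validation"
    date: str    # e.g. "20090922T140653Z"


@dataclass
class SegC:
    """Segmentation component; children allow recursive segmentation."""
    text: str = ""
    children: List["SegC"] = field(default_factory=list)


@dataclass
class MonoC:
    """All information for one working language (xml:lang is mandatory)."""
    lang: str
    segments: List[SegC] = field(default_factory=list)
    history: List[HistoC] = field(default_factory=list)


@dataclass
class MultiC:
    """All language variants of one piece of textual content."""
    monos: List[MonoC] = field(default_factory=list)
    history: List[HistoC] = field(default_factory=list)

    def variant(self, lang: str) -> Optional[MonoC]:
        # First MonoC in the requested language, or None if not translated yet.
        return next((m for m in self.monos if m.lang == lang), None)


@dataclass
class GroupC:
    """Multilingual units sharing an origin or purpose within a project."""
    units: List[MultiC] = field(default_factory=list)


@dataclass
class GI:
    """Technical and administrative information for the whole collection."""
    properties: Dict[str, str] = field(default_factory=dict)
    history: List[HistoC] = field(default_factory=list)


@dataclass
class MultilingualDataCollection:  # hypothetical name for the root component
    gi: GI
    groups: List[GroupC] = field(default_factory=list)

    def languages(self) -> Set[str]:
        # Every working language used anywhere in the collection.
        return {m.lang for g in self.groups for u in g.units for m in u.monos}


if __name__ == "__main__":
    unit = MultiC([MonoC("en", [SegC("The meal is nice.")]),
                   MonoC("fr", [SegC("Le repas est bon.")])])
    doc = MultilingualDataCollection(GI({"creationtool": "demo"}), [GroupC([unit])])
    print(sorted(doc.languages()))  # ['en', 'fr']
    print(unit.variant("de"))       # None
```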
6 MLIF compliance
Any format compliant with this International Standard may use the MLIF metamodel in two possible ways:
- by fully implementing the MLIF metamodel starting at the level of ;
- by specifically embedding MLIF-compliant information within another model, by implementing one of the
lower-level MLIF elements, namely , or .
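The second route, embedding a lower-level component in another model, can be sketched with Python's standard xml.etree.ElementTree module. This is an illustration under stated assumptions: the tag names MultiC, MonoC and SegC and the host vocabulary (product, sku) are hypothetical placeholders, since the normative element names are those given in Annex G; only the use of xml:lang follows 7.2.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced name as the reserved xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_multic(variants: dict) -> ET.Element:
    """Build a stand-alone MultiC fragment from a {language: text} mapping.

    Tag names are hypothetical placeholders, not the normative MLIF names.
    """
    multic = ET.Element("MultiC")
    for lang, text in variants.items():
        mono = ET.SubElement(multic, "MonoC", {XML_LANG: lang})
        ET.SubElement(mono, "SegC").text = text
    return multic


# Embed the fragment in a host model that is not itself MLIF.
host = ET.Element("product")  # hypothetical host vocabulary
ET.SubElement(host, "sku").text = "A-42"
host.append(build_multic({"en": "Red shirt", "fr": "Chemise rouge"}))
print(ET.tostring(host, encoding="unicode"))
```

The host document keeps its own structure; only the embedded fragment needs to follow the MLIF rules.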
7 Metamodel adornment
7.1 Introduction
The MLIF XML serialization proposes a set of XML elements and XML attributes, which are described in the
following sections, where the characters “<” and “>” delimit the name of the element. Following the TEI
guidelines (http://www.tei-c.org), some attributes are specified by means of a class attribute, with the
convention that the name of the class attribute is prefixed by “att.” (e.g. “att.xlink”). The other XML attributes
are listed with the convention that two quotes delimit the name of the attribute (e.g. “xml:lang”). The
specifications in Annex G shall be applied.
7.2 General principles concerning the use of W3C generic attributes
The following W3C attributes are to be used by all MLIF-compliant applications:
- the attribute xml:lang shall be used in accordance with W3C recommendations to represent the working
language of any relevant element and, in particular, shall be used systematically for any implementation
of MonoC;
- the attribute xml:id shall be used in accordance with W3C recommendations to provide a unique identifier
to an element of the MLIF metamodel.
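The two rules above lend themselves to an automated check. The following Python sketch uses the standard xml.etree.ElementTree module, which exposes xml:lang and xml:id under the W3C XML namespace. The element names MultiC, MonoC and SegC in the sample are hypothetical placeholders, and the check covers only the rules stated in this subclause; it is not a full MLIF validator.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List

XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_LANG = f"{{{XML_NS}}}lang"  # how ElementTree names xml:lang
XML_ID = f"{{{XML_NS}}}id"      # how ElementTree names xml:id


def check_w3c_attributes(root: ET.Element, mono_tag: str = "MonoC") -> List[str]:
    """Return the problems found; an empty list means both rules hold.

    mono_tag is a hypothetical element name for the MonoC component.
    """
    problems = []
    for elem in root.iter(mono_tag):
        if not elem.get(XML_LANG):
            problems.append(f"{mono_tag} without xml:lang")
    ids = Counter(e.get(XML_ID) for e in root.iter() if e.get(XML_ID) is not None)
    problems += [f"duplicate xml:id {i!r}" for i, n in ids.items() if n > 1]
    return problems


SAMPLE = """<MultiC xml:id="u1">
  <MonoC xml:lang="en" xml:id="u1-en"><SegC>The meal is nice.</SegC></MonoC>
  <MonoC xml:id="u1-en"><SegC>Le repas est bon.</SegC></MonoC>
</MultiC>"""

if __name__ == "__main__":
    # ['MonoC without xml:lang', "duplicate xml:id 'u1-en'"]
    print(check_w3c_attributes(ET.fromstring(SAMPLE)))
```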
7.3 Recommended adornment for GI















7.4 Recommended adornment for GroupC

7.5 Recommended adornment for MultiC









7.6 Recommended and mandatory adornment for MonoC
- att.lang


- att.xlink
The language attribute is mandatory on MonoC. All other adornments are optional.
7.7 Recommended adornment for SegC







- att.linguistic
- att.xlink
7.8 Recommended adornment for HistoC
The HistoC component is a generic component that traces modifications made to the component to which it is
anchored (e.g. creation, modification and validation). In the MLIF metamodel, the HistoC component may be
anchored to the GI, MultiC or MonoC component. This makes it possible to record every evolution of, or
enhancement to, the component.
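Because HistoC only ever adds records to the component it is anchored to, it behaves like an append-only log. The following Python sketch is a hypothetical illustration, not the HistoC serialization (its four adornment elements are not reproduced here): Component, HistoEntry and their fields are assumed names, and only the set of anchorable components comes from the paragraph above. The example author and date reuse the values shown in Annex A.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

# Per the paragraph above, HistoC may be anchored to GI, MultiC or MonoC only.
ANCHORABLE = {"GI", "MultiC", "MonoC"}


@dataclass(frozen=True)
class HistoEntry:
    action: str     # e.g. "creation", "modification", "validation"
    author: str
    timestamp: str  # UTC, basic ISO 8601 format as in the TMX examples


@dataclass
class Component:
    kind: str  # "GI", "MultiC", "MonoC", ...
    history: List[HistoEntry] = field(default_factory=list)

    def record(self, action: str, author: str, when: datetime) -> HistoEntry:
        """Append an immutable history entry to this component."""
        if self.kind not in ANCHORABLE:
            raise ValueError(f"HistoC cannot be anchored to {self.kind}")
        stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        entry = HistoEntry(action, author, stamp)
        self.history.append(entry)
        return entry


if __name__ == "__main__":
    mono = Component("MonoC")
    mono.record("creation", "SEMMAR",
                datetime(2009, 9, 22, 14, 6, 53, tzinfo=timezone.utc))
    print(mono.history[0].timestamp)  # 20090922T140653Z
```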
HistoC may be adorned by four elements:




7.9 Recommended online annotation adornment
Multilingual text documents are often only one stage in a complex workflow that involves external document
sources in a wide variety of formats. From these, it is often necessary to keep inline markup indicating the
presentational features that have to be retained in a translated target document. To this end, MLIF-compliant
applications should use the following elements, in relation to the element, that map onto similar
subsets in TMX and XLIFF:





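A common way for tools to keep inline markup intact through translation is to replace each inline code with an opaque placeholder before the text reaches the translator, then restore the codes afterwards. The Python sketch below illustrates that general technique only; it does not reproduce the MLIF, TMX or XLIFF inline elements, and the {n} placeholder syntax and the simple tag pattern are assumptions.

```python
import re
from typing import Dict, Tuple

# Simple opening, closing or empty inline tags such as <b>, </b> or <br/>.
INLINE_TAG = re.compile(r"</?[A-Za-z][^<>]*>")


def protect(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace each inline tag with a numbered placeholder {1}, {2}, ..."""
    codes: Dict[str, str] = {}

    def to_placeholder(match: "re.Match[str]") -> str:
        key = f"{{{len(codes) + 1}}}"
        codes[key] = match.group(0)
        return key

    return INLINE_TAG.sub(to_placeholder, text), codes


def restore(text: str, codes: Dict[str, str]) -> str:
    """Put the original inline tags back into the translated text."""
    for key, tag in codes.items():
        if key not in text:
            raise ValueError(f"placeholder {key} was lost in translation")
        text = text.replace(key, tag)
    return text


if __name__ == "__main__":
    masked, codes = protect("The <b>meal</b> is nice.")
    print(masked)                                     # The {1}meal{2} is nice.
    print(restore("Le {1}repas{2} est bon.", codes))  # Le <b>repas</b> est bon.
```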
7.10 Recommended adornment for localization
All the following elements should be used to provide localization-related information:


7.11 Recommended adornment for internationalization

7.12 Recommended adornment for temporal synchronization
The following elements should be used when textual content has to be conveyed (in written or spoken form)
together with temporal constraints:



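For subtitles (see Annex E), the temporal constraints typically amount to each text segment having a start time and an end time. As a hypothetical sketch (the field names begin_ms and end_ms are assumptions and do not correspond to the elements listed in 7.12), a tool might check that segments do not overlap and look up the segment displayed at a given instant:

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TimedSeg:
    begin_ms: int  # hypothetical field names
    end_ms: int
    text: str


def check_timeline(segs: List[TimedSeg]) -> None:
    """Raise ValueError if a segment has no duration or overlaps its predecessor.

    Segments are expected in order of their start times.
    """
    for seg in segs:
        if seg.end_ms <= seg.begin_ms:
            raise ValueError(f"{seg.text!r} has no duration")
    for prev, cur in zip(segs, segs[1:]):
        if cur.begin_ms < prev.end_ms:
            raise ValueError(f"{cur.text!r} starts before {prev.text!r} ends")


def active_at(segs: List[TimedSeg], t_ms: int) -> Optional[str]:
    """Return the text on screen at t_ms, or None between segments."""
    i = bisect_right([s.begin_ms for s in segs], t_ms) - 1
    if i >= 0 and t_ms < segs[i].end_ms:
        return segs[i].text
    return None


if __name__ == "__main__":
    subs = [TimedSeg(0, 2000, "The meal is nice."),
            TimedSeg(2500, 4000, "The meals are nice.")]
    check_timeline(subs)
    print(active_at(subs, 1000))  # The meal is nice.
    print(active_at(subs, 2200))  # None
```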
8 Relation with other standards
As with the “Terminological Markup Framework” TMF [ISO 16642] in terminology, MLIF introduces a
metamodel that combines with selected data categories as a way of ensuring interoperability between several
multilingual applications and corpora. MLIF deals with multilingual corpora, multilingual fragments, and the
translation relations between them. In each domain where MLIF is applicable, a specific granularity may be
considered for segmentation and description. These last two processes may rely on MAF [ISO 24611], SynAF
[ISO 24615] and TMF for morphological description, syntactic annotation and terminological description
respectively.
MLIF supports the construction and interoperability of localization and translation memory resources,
and also deals with the description of a metamodel for multilingual content. MLIF does not propose a closed
list of description features. Rather, it provides a list of data categories that is much easier to update and
extend. This list represents a point of reference for multilingual information in the context of various application
scenarios.
Moreover, MLIF not only describes elementary linguistic segments (e.g. sentence, syntactic fragment, word
and part of speech), but may also be used to represent document structure (e.g. title, abstract, paragraph and
section). In addition, MLIF allows for external and internal links (annotations and references).
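Since SegC is recursive, the same structure can carry document-level units (section, title, paragraph) and linguistic units (sentence, word). A minimal Python sketch, assuming a hypothetical Seg class with a free-text kind label (not an MLIF element or data category name), shows how a tool could extract all segments of one granularity or rebuild the text of a subtree:

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Seg:
    """Hypothetical stand-in for SegC: a typed segment with optional children."""
    kind: str  # e.g. "section", "title", "paragraph", "sentence", "word"
    text: str = ""
    children: List["Seg"] = field(default_factory=list)


def iter_kind(seg: Seg, wanted: str) -> Iterator[Seg]:
    """Depth-first, document-order walk yielding every segment of one kind."""
    if seg.kind == wanted:
        yield seg
    for child in seg.children:
        yield from iter_kind(child, wanted)


def surface(seg: Seg) -> str:
    """Rebuild a segment's text from its leaves, separated by single spaces."""
    if not seg.children:
        return seg.text
    return " ".join(surface(child) for child in seg.children)


if __name__ == "__main__":
    words = [Seg("word", w) for w in ("The", "meal", "is", "nice")]
    doc = Seg("section", children=[
        Seg("title", "Example"),
        Seg("paragraph", children=[Seg("sentence", children=words)]),
    ])
    print([w.text for w in iter_kind(doc, "word")])  # ['The', 'meal', 'is', 'nice']
    print(surface(doc))                              # Example The meal is nice
```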
MLIF is designed to provide a common framework that facilitates interoperability with formats such as
TMX (LISA OSCAR) and XLIFF (OASIS). MLIF can be seen as a parent of these formats, since both of them
deal with multilingual data expressed in the form of segments or text units. Data in both formats can be
stored, manipulated and translated in a similar manner.
Examples of using MLIF are given in Annexes A to F.
Annex A
(informative)

Example using MLIF for Computer-Assisted Translation (CAT)
The main reason for including lemma, part-of-speech and morphological features is to allow CAT tools based on
translation memory to produce translations of new words and sentences that are not in the translation
database.
For example, given a translation memory that contains the English sentence "The meal is nice." and its
French translation "Le repas est bon.", current CAT tools such as SDL TRADOS Translator's Workbench 1)
are not able to provide the predicted translation for the sentence "The meals are nice.", even though the
lemmas of "The meal is nice." and "The meals are nice." match. This weakness is due to the fact that
these tools use only limited linguistic criteria during the translation process.
The data produced by TRADOS Translator's Workbench is as follows:

  creationtool="TRADOS Translator's Workbench for Windows"
  creationtoolversion="Edition 8 Build 863"
  segtype="sentence"
  o-tmf="TW4Win 2.0 Format"
  adminlang="EN-US"
  srclang="EN-GB"
  datatype="rtf"
  creationdate="20100528T144322Z"
  creationid="USER"/>

 
 
  The meal is nice.
 
 
  Le repas est bon.
 
 


To translate the sentence "The meals are nice.", an MLIF-compliant tool should implement the following
procedure:
Step-1 Represent in MLIF and add linguistic properties to all the words within the translation memory.
Step-2 Run a part-of-speech tagger on the sentence in order to obtain the right morphosyntactic word
categories.
Step-3 Translate the lemmas using an English-to-French bilingual lexicon.

1)
SDL TRADOS Translator's Workbench is an example of a suitable product available commercially. This information is
given for the convenience of users of this International Standard and does not constitute an endorsement by ISO of this
product.
Step-4 Consult a French lexicon of inflected forms in order to retrieve the correct inflected form using the
lemma and morphological features.
Step-5 Generate the translation of "The meals are nice." by substituting each English word with its French
inflected form as follows:
"The meals are nice." => "Les repas sont bons."
The XML data will include a feature structure declaration defining a tagset (e.g. for "nS"), with a word
segmentation and tagset defined in MAF:


 
 
 
 
 


 
  SEMMAR
  20090922T140653Z
 
  The meal is nice.
 
 
  Le repas est bon.
 
 
 
 
  The
      class="word"
   lemma="meal"
   pos="commonNoun"
   tag="#nS">meal
      class="word"
   lemma="be"
   pos="verb"
   tag="#mP #p1 #nS">is
  nice
  .
 
 
      class="word"
   lemma="le"
   pos="definiteArticle"
   tag="#gM #nS">Le
      class="word"
   lemma="repas"
   pos="commonNoun"
   tag="#gM #nS">repas
      class="word"
   lemma="être"
   pos="verb"
   tag="#mP #p1 #nS">est
      class="word"
   lemma="bon"
   pos="qualifierAdjective"
   tag="#gM #nS">bon
  .
 
 



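Steps 2 to 5 of the procedure described earlier in this Annex can be sketched in a few lines of Python. The sketch is a toy under explicit assumptions: the Step 2 tagger output is written by hand, the bilingual lexicon, the noun-gender table and the French inflection lexicon are hypothetical miniature stand-ins for real resources, agreement is reduced to gender and number, and the verb form ignores person. It is not the output format of any CAT tool or of MAF.

```python
from typing import List, Optional, Tuple

# Step 2 output, written by hand here instead of running a real tagger:
# (surface form, lemma, part of speech, number "S"/"P").
TAGGED: List[Tuple[str, str, str, str]] = [
    ("The", "the", "definiteArticle", "P"),
    ("meals", "meal", "commonNoun", "P"),
    ("are", "be", "verb", "P"),
    ("nice", "nice", "qualifierAdjective", "P"),
]

# Step 3 resource: toy English-to-French lemma lexicon (hypothetical data).
EN_FR = {"the": "le", "meal": "repas", "be": "être", "nice": "bon"}

# Grammatical gender of French nouns, needed for agreement (hypothetical data).
FR_GENDER = {"repas": "M"}

# Step 4 resource: toy French inflection lexicon keyed by (lemma, gender, number).
FR_FORMS = {
    ("le", "M", "S"): "le", ("le", "M", "P"): "les",
    ("repas", "M", "S"): "repas", ("repas", "M", "P"): "repas",
    ("être", None, "S"): "est", ("être", None, "P"): "sont",
    ("bon", "M", "S"): "bon", ("bon", "M", "P"): "bons",
}


def translate(tagged: List[Tuple[str, str, str, str]]) -> str:
    # In this toy sentence, the article and adjective agree with the single noun.
    noun_gender = next(FR_GENDER[EN_FR[lemma]]
                       for _, lemma, pos, _ in tagged if pos == "commonNoun")
    forms = []
    for _, lemma, pos, number in tagged:
        fr_lemma = EN_FR[lemma]                             # Step 3
        gender: Optional[str] = None if pos == "verb" else noun_gender
        forms.append(FR_FORMS[(fr_lemma, gender, number)])  # Step 4
    sentence = " ".join(forms) + "."                        # Step 5
    return sentence[0].upper() + sentence[1:]


if __name__ == "__main__":
    print(translate(TAGGED))  # Les repas sont bons.
```

Switching every number feature to "S" yields the sentence already stored in the translation memory, "Le repas est bon.", which is a quick sanity check on the lexicons.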
Annex B
(informative)

Example: representing TMX data
B.1 Introduction
TMX (Translation Memory eXchange) is the vendor-neutral open XML standard for the exchange of
Translation Memory (TM) data created by computer-aided translation (CAT) and localization tools. The
purpose of TMX is to allow easier exchange of translation memory data between tools and/or translation
vendors with little or no loss of critical data during the process. TMX, which has been in use since
1998, is a certifiable standard format. It was developed, and is maintained, by OSCAR (Open Standards for
Container/Content Allowing Re-use), a LISA Special Interest Group.
B.2 Mapping TMX to MLIF
TMX is nearly isomorphic to the MLIF metamodel. The core elements of the TMX macrostructure map to
MLIF as follows:
- maps onto the element;
- maps onto the element;
- is a container for the element and maps onto the element;
- maps onto the element;
- maps onto the element;
- maps onto the element;
- of type term maps onto the element of type term.
Further TMX elements and attributes map onto MLIF elements as follows:
- the "creationtool" attribute maps onto the element;
- the "creationdate" attribute maps onto the element;
- the "tuid" attribute maps onto the element within MultiC;
- the element does not map onto any specific element, as it represents a generic placeholder for
application-dependent data. Where applicable, a specific element is explicitly mapped onto MLIF
elements or onto a standardized ISO/TC 37 data category as available from ISOCat.
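The mapping above can be sketched as a small converter using Python's xml.etree.ElementTree module. The input uses the TMX 1.4 element names (tmx, header, body, tu, tuv, seg). The output tag names, and the assignment of body, tu, tuv and seg to GroupC, MultiC, MonoC and SegC, are assumptions made for illustration, because the MLIF element names in the list above are not reproduced here; inline markup inside seg is flattened to plain text for brevity. The header and tuid values in the sample are borrowed from B.3, and the segment pair from Annex A.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Assumed TMX 1.4 -> MLIF correspondence; the MLIF tag names are placeholders.
TAG_MAP = {"tmx": "MLIFCollection", "header": "GI", "body": "GroupC",
           "tu": "MultiC", "tuv": "MonoC", "seg": "SegC"}
GI_ATTRS = ("creationtool", "creationtoolversion", "creationdate")


def tmx_to_mlif(tmx: ET.Element) -> ET.Element:
    root = ET.Element(TAG_MAP["tmx"])
    gi = ET.SubElement(root, TAG_MAP["header"])
    header = tmx.find("header")
    for attr in GI_ATTRS:
        # Header attributes become child elements of GI (hypothetical names).
        if header is not None and header.get(attr) is not None:
            ET.SubElement(gi, attr).text = header.get(attr)
    group = ET.SubElement(root, TAG_MAP["body"])
    for tu in tmx.iterfind("body/tu"):
        multi = ET.SubElement(group, TAG_MAP["tu"])
        if tu.get("tuid") is not None:
            ET.SubElement(multi, "tuid").text = tu.get("tuid")
        for tuv in tu.iterfind("tuv"):
            mono = ET.SubElement(multi, TAG_MAP["tuv"],
                                 {XML_LANG: tuv.get(XML_LANG, "")})
            seg = tuv.find("seg")
            # itertext() drops inline markup; a real converter would map it.
            ET.SubElement(mono, TAG_MAP["seg"]).text = (
                "".join(seg.itertext()) if seg is not None else "")
    return root


SAMPLE = """<tmx version="1.4">
  <header creationtool="Heartsome TM Server" creationtoolversion="1.0.1"
          creationdate="20040731T164933Z" segtype="block" o-tmf="unknown"
          adminlang="en" srclang="*all*" datatype="xml"/>
  <body>
    <tu tuid="1091303313515">
      <tuv xml:lang="en"><seg>The meal is nice.</seg></tuv>
      <tuv xml:lang="fr"><seg>Le repas est bon.</seg></tuv>
    </tu>
  </body>
</tmx>"""

if __name__ == "__main__":
    print(ET.tostring(tmx_to_mlif(ET.fromstring(SAMPLE)), encoding="unicode"))
```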
B.3 Example of data
The following example, based on TMX version 1.4, focuses on the multilingual units of a TMX document and
does not translate all the details of the header.

  adminlang="en"
  creationdate="20040731T164933Z"
  creationtool="Heartsome TM Server"
  creationtoolversion="1.0.1"
  datatype="xml"
  o-tmf="unknown"
  segtype="block"
  srclang="*all*"/>

 
 
  Le processus de contrôle de
      qualité en dix étapes qu'il a créé il y a plus
     de 1300 ans est beaucoup plus complet et précis que ceux
     existant aujourd'hui.
 
 
  His 10-stage quality
      control process initiated more than 1300 years
     ago is far more thorough and exacting than any existing
     today.
 
 
  El proceso de control de
      calidad en diez pasos que inició hace más de
     1300 años es mucho más completo y preciso que los que
     existen en la actualidad.
 
 
  Il suo metodo di controllo di qualità in 10 fasi risale a più
     di 1300 anni fa ed è molto più accurato e preciso di
     qualsiasi metodo attuale.
 
 
  그가 1300여년 전 시작한 10단계 품질
      관리 방법은 현존하는 것보다 훨씬 더 철저하고 정확하다.
 
 


The corresponding representation in the default MLIF serialization is as follows:


 TMX
 1.4
 20040731T164933Z
 Heartsome TM Server
 1.0.1


 
  1091303313515
  20020930T004233Z
 
  Le processus de contrôle
      de qualité en dix étapes qu'il a créé il y a
     plus de 1300 ans est beaucoup plus complet et précis que
     ceux existant aujourd'hui.
 
 
  His 10-stage quality
      control process initiated more than 1300
     years ago is far more thorough and exacting than any
     existing today.
 
 


B.4 Example of TMX and MLIF interaction
Figure B.1 illustrates the interaction between TMX and MLIF. The process involves successive steps of
extraction, translation and merging. It begins with a TMX document containing linguistic content in
English (en) and German (de). The extraction process (1) generates a “Skeleton File” (2) containing all TM
formatting information, and an MLIF Document Linguistic Content (3) in which only the relevant linguistic
information is stored. As most translators (human beings or automatic software modules) work with
TMX-oriented software tools, an XSL style sheet makes it possible to transform an MLIF document into a TMX
document. This TMX file does not contain any formatting information. Once the translator has added the
appropriate Japanese (ja) translation, another XSL style sheet transforms the TMX document into an MLIF
document (4). Finally, the new MLIF document (containing the Japanese translation) is merged with the
“Skeleton File” to produce a new TMX formatted document (5).
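The extract, translate and merge cycle of Figure B.1 can be approximated in Python. In this sketch (an illustration, not the XSL style sheets referred to above), the skeleton is a copy of the TMX tree with every seg emptied, and the linguistic content is a plain dictionary keyed by (tuid, language) that stands in for the MLIF document. The function names and the sample are hypothetical, segments are assumed to carry no inline markup, and the "(ja)" string is a placeholder rather than a real translation.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
Content = Dict[Tuple[str, str], str]  # (tuid, language) -> segment text


def extract(tmx: ET.Element) -> Tuple[ET.Element, Content]:
    """Step (1): split a TMX tree into a skeleton (2) and linguistic content (3)."""
    skeleton = copy.deepcopy(tmx)
    content: Content = {}
    for tu in skeleton.iterfind("body/tu"):
        for tuv in tu.iterfind("tuv"):
            seg = tuv.find("seg")
            content[(tu.get("tuid"), tuv.get(XML_LANG))] = seg.text or ""
            seg.text = None  # the skeleton keeps structure, not text
    return skeleton, content


def merge(skeleton: ET.Element, content: Content) -> ET.Element:
    """Step (5): refill the skeleton, adding a tuv for each new language."""
    merged = copy.deepcopy(skeleton)
    for tu in merged.iterfind("body/tu"):
        existing = {tuv.get(XML_LANG): tuv for tuv in tu.iterfind("tuv")}
        for (tuid, lang), text in content.items():
            if tuid != tu.get("tuid"):
                continue
            tuv = existing.get(lang)
            if tuv is None:
                tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
                ET.SubElement(tuv, "seg")
            tuv.find("seg").text = text
    return merged


SAMPLE = """<tmx version="1.4"><header creationtool="demo" srclang="en"/>
<body><tu tuid="t1">
<tuv xml:lang="en"><seg>The meal is nice.</seg></tuv>
<tuv xml:lang="de"><seg>Das Essen ist gut.</seg></tuv>
</tu></body></tmx>"""

if __name__ == "__main__":
    skeleton, content = extract(ET.fromstring(SAMPLE))
    content[("t1", "ja")] = "(ja) The meal is nice."  # the translator's work goes here
    print(ET.tostring(merge(skeleton, content), encoding="unicode"))
```

Keeping the skeleton and the content apart means the translator's tool only ever sees text, while the merge step alone is responsible for restoring the original structure.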

Figure B.1 — TMX and MLIF interaction
Annex C
(informative)

Example of XLIFF data representation
C.1 Introduction
The purpose of XLIFF (XML Localisation Interchange File Format) is to define and promote the adoption of a
specification for the interchange of localizable software- and document-based objects and related metadata.
C.2 Ma
...

INTERNATIONAL STANDARD ISO 24616
First edition
2012-09-01


Language resources management – Multilingual information framework



Responsibility for preparing the Russian version lies with GOST R
(Russian Federation) in accordance with Article 18.1 of the ISO Statutes

Reference number
ISO 24616:2012(R)
© ISO 2012

Contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 Specification principles
4.1 Key standard used in the specification: Unified Modeling Language (UML)
4.2 Metamodel and adornment
4.3 XML serialization
5 Metamodel specification
6 MLIF compliance
7 Metamodel adornment
7.1 General remarks
7.2 General principles concerning the use of W3C generic attributes
7.3 Recommended adornment for the GI component
7.4 Recommended adornment for the GroupC component
7.5 Recommended adornment for the MultiC component
7.6 Recommended and mandatory adornment for the MonoC component
7.7 Recommended adornment for the SegC component
7.8 Recommended adornment for the HistoC component
7.9 Recommended adornment for online annotation
7.10 Recommended adornment for localization
7.11 Recommended adornment for internationalization
7.12 Recommended adornment for temporal synchronization
8 Relation with other standards
Annex A (informative) Example of using MLIF for computer-assisted translation
Annex B (informative) Example: representing TMX data
Annex C (informative) Example of data representation in XLIFF format
Annex D (informative) Example of smilText data representation
Annex E (informative) Example of using MLIF to create subtitles
Annex F (informative) Using MLIF for MAF data
Annex G (normative) Detailed specification
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies
(ISO member bodies). The work of preparing International Standards is normally carried out through ISO
technical committees. Each member body interested in a subject for which a technical committee has been
established has the right to be represented on that committee. International organizations, governmental and
non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the
International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of technical committees is to prepare International Standards. Draft International Standards
adopted by the technical committees are circulated to the member bodies for voting. Publication as an
International Standard requires approval by at least 75 % of the member bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO shall not be held responsible for identifying any or all such patent rights.
ISO 24616 was prepared by Technical Committee ISO/TC 37, Terminology and other language and content
resources, Subcommittee SC 4, Language resource management.

МЕЖДУНАРОДНЫЙ СТАНДАРТ ISO 24616:2012(R)

Управление языковыми ресурсами. Многоязыковая
информационная система
1 Область применения
Настоящий Международный стандарт обеспечивает универсальную платформу для моделирования
многоязыковой информации и управления ею в самых разных сферах: локализации, перевода,
мультимедийного аннотирования, организации документооборота, ведения цифровых библиотек и в
прикладных системах моделирования хозяйственной деятельности предприятий. Многоязыковая
информационная система MLIF (multilingual information framework) предоставляет соответствующую
высокоуровневую модель (метамодель) и множество универсальных категорий данных [согласно
ISO 12620:2009] для многочисленных прикладных областей. Она обеспечивает также необходимые
стратегии взаимодействия и/или связывания различных моделей, включая, в частности, широко
используемые модели XLIFF, TMX, smilText и ITS.
2 Нормативные ссылки
Перечисленные ниже ссылочные документы обязательны для применения данного документа. В
случае датированных ссылок действующим является только указанное издание. Применительно к
недатированным ссылочным документам применяются их самые последние издания (включая все
последующие изменения):
ISO 12620:2009; Терминология, другие языковые ресурсы и ресурсы содержания. Спецификация
категорий данных и ведение реестра категорий данных для языковых ресурсов
ISO 8879, Обработка информации. Текстовые и офисные системы. Стандартный обобщѐнный
язык разметки (SGML)
Extensible Markup Language. Fifth Edition, T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, F. Yergeau
Editors, W3C Recommendation, 26 November 2008, http://www.w3.org/TR/xml
3 Термины и определения
В рамках настоящего документа используются термины и определения, приведѐнные ниже:
3.1
стилистический орнамент
adornment
категория данных, приписываемая компоненту метамодели
3.2
внутритекстовый код
inline code
внутритекстовые команды, встроенные в исходный документ
Примечание к статье: на естественном языке могут записываться, в частности, команды
представления информации (например, коды HTML).
ISO 2012 – Все права сохраняются
1

---------------------- Page: 5 ----------------------
ISO 24616:2012(R)
3.3
субтитр
subtitle
текстовые эквиваленты диалогов в кинофильмах, телепрограммах, видеоиграх и т.п., обычно
отображаемые внизу экрана
3.4
рабочий язык
working language
язык, с помощью которого выражаются последовательности лингвистических единиц
4 Specification principles
4.1 Key standard used in the specification: Unified Modeling Language (UML)
The MLIF specification is based on the modelling principles of UML as defined by the Object Management Group (OMG). The specification uses the subset of UML elements that is relevant for the purposes of MLIF.
4.2 Metamodel and adornment
In line with the Terminological Markup Framework (TMF) as defined in ISO 16642, MLIF defines a metamodel that is adorned by data categories as defined in ISO 12620.
4.3 XML serialization
Together with the metamodel and its adornment, MLIF provides a representation of the information in XML, called “XML serialization”, in line with Extensible Markup Language (XML), which is a restricted form of SGML (ISO 8879).
5 Metamodel specification
The MLIF metamodel is described by the UML object diagram shown in Figure 1. It is defined by the following seven “core components”, listed below in the order of their XML serialization:
- (Multilingual Data Collection), which represents a collection of data containing global information and several multilingual units;
- (Global Information), which contains technical and administrative information applying to the entire multilingual data collection;
- (Grouping components), which represent a subset of multilingual data that have a common origin or purpose within a given project;
- (Multilingual Component), which groups together all variants of a given textual content;
- (Monolingual Component), which groups together information related to one and the same language and is part of a multilingual component (MultiC);


- (History Component), which traces modifications to the component to which it is anchored (i.e. versioning);
- (Segmentation Component), which allows any level of segmentation of textual information, possibly in a recursive manner.


Figure 1 – Schematic representation of the MLIF metamodel
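As a reading aid, the containment relations among the seven core components can be sketched as Python dataclasses. This is a hypothetical model, not the normative UML diagram or the serialization in Annex G: the class names simply reuse the component abbreviations, the field names are assumptions, and the multiplicities (one GI per collection, any number of GroupC, MultiC, MonoC and SegC) are illustrative guesses. HistoC is attached only to GI, MultiC and MonoC, as stated in 7.8.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical Python model of the seven MLIF core components.
# Class names follow the component abbreviations; fields are assumptions.


@dataclass
class HistoC:
    """History component: traces modifications of its anchor component."""
    events: List[str] = field(default_factory=list)


@dataclass
class SegC:
    """Segmentation component; may recursively contain further SegC."""
    text: str
    children: List[SegC] = field(default_factory=list)


@dataclass
class MonoC:
    """All information for one working language within a MultiC."""
    lang: str  # xml:lang is mandatory on MonoC
    segments: List[SegC] = field(default_factory=list)
    history: Optional[HistoC] = None


@dataclass
class MultiC:
    """Groups all language variants of one textual content."""
    variants: List[MonoC] = field(default_factory=list)
    history: Optional[HistoC] = None

    def languages(self) -> List[str]:
        return [mono.lang for mono in self.variants]


@dataclass
class GroupC:
    """Sub-collection of units sharing a common origin or purpose."""
    units: List[MultiC] = field(default_factory=list)


@dataclass
class GI:
    """Global technical and administrative information."""
    properties: Dict[str, str] = field(default_factory=dict)
    history: Optional[HistoC] = None


@dataclass
class MDC:
    """Multilingual Data Collection: root of the metamodel."""
    gi: GI
    groups: List[GroupC] = field(default_factory=list)

    def multilingual_units(self) -> List[MultiC]:
        return [unit for group in self.groups for unit in group.units]


unit = MultiC(variants=[
    MonoC(lang="en", segments=[SegC("The meal is nice.")]),
    MonoC(lang="fr", segments=[SegC("Le repas est bon.")]),
])
collection = MDC(gi=GI({"creationtool": "example"}), groups=[GroupC([unit])])
print(collection.multilingual_units()[0].languages())  # ['en', 'fr']
```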
6 MLIF compliance
Any format compliant with this International Standard may use the MLIF metamodel in two ways:
- by fully implementing the MLIF metamodel, starting at the level of ;
- by specifically embedding MLIF-compliant information within another model, by implementing one of the lower-level MLIF elements, namely , or .
7 Metamodel adornment
7.1 Introduction
The MLIF XML serialization proposes a set of XML elements and XML attributes, which are described in the following clauses of this International Standard, where the characters “<” and “>” delimit the name of an element. Following the TEI guidelines (http://www.tei-c.org), some attributes are specified by means of their class, in which case the name of the class attribute is prefixed by “att.” (e.g. “att.xlink”). Other XML attributes are listed with their names delimited by quotes (e.g. “xml:lang”). The specifications in Annex G shall be applied.
7.2 General principles concerning the use of W3C generic attributes
The following W3C attributes are to be used by all MLIF-compliant applications:
- the attribute xml:lang shall be used to represent the working language of any relevant element and, in particular, shall be used systematically for any implementation of MonoC;
- the attribute xml:id shall be used in accordance with W3C recommendations to provide a unique identifier to an element of the MLIF metamodel.
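A minimal check of these two rules could look as follows. Because the MLIF element names are missing from this copy of the text, the sketch assumes a hypothetical tag name `mono` for MonoC (and `root` for the enclosing element); the real names are given in Annex G. It relies only on Python's standard `xml.etree.ElementTree`, which reports `xml:lang` and `xml:id` under the fixed XML namespace URI `http://www.w3.org/XML/1998/namespace`.

```python
import xml.etree.ElementTree as ET
from typing import List

# Attributes with the reserved "xml" prefix are exposed by ElementTree
# under this namespace URI, e.g. "{...}lang" for xml:lang.
XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_LANG = f"{{{XML_NS}}}lang"
XML_ID = f"{{{XML_NS}}}id"

# Hypothetical tag name for MonoC; the real name is given in Annex G.
MONO_TAG = "mono"


def check_w3c_attributes(xml_text: str) -> List[str]:
    """Return the violations of the xml:lang / xml:id rules."""
    root = ET.fromstring(xml_text)
    problems: List[str] = []
    seen_ids = set()
    for elem in root.iter():
        # Rule 1: every MonoC must carry xml:lang.
        if elem.tag == MONO_TAG and not elem.get(XML_LANG):
            problems.append(f"<{elem.tag}> without xml:lang")
        # Rule 2: xml:id values must be unique within the document.
        ident = elem.get(XML_ID)
        if ident is not None:
            if ident in seen_ids:
                problems.append(f"duplicate xml:id {ident!r}")
            seen_ids.add(ident)
    return problems


sample = """<root>
  <mono xml:lang="en" xml:id="m1">The meal is nice.</mono>
  <mono xml:id="m1">Le repas est bon.</mono>
</root>"""
print(check_w3c_attributes(sample))
# ['<mono> without xml:lang', "duplicate xml:id 'm1'"]
```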
7.3 Recommended adornment for GI

















7.4 Recommended adornment for GroupC

7.5 Recommended adornment for MultiC









7.6 Recommended adornment for MonoC
- att.lang


- att.xlink
The language attribute is mandatory for MonoC; all other attributes are optional.
7.7 Recommended adornment for SegC







- att.linguistic
- att.xlink
7.8 Recommended adornment for HistoC
The HistoC component is a generic component that traces modifications made to the component to which it is anchored (e.g. creation, modification and validation). In the MLIF metamodel, the HistoC component may be anchored to the GI, MultiC or MonoC component, which makes it possible to record all evolutions of, or enhancements to, the component.
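The anchoring rule and the idea of recording every evolution of a component can be sketched as an append-only event log. The component kinds are plain strings, the event kinds are the three examples named in the text (creation, modification, validation), and the timestamp uses the compact UTC format seen in TMX dates; the class and method names are hypothetical and do not reproduce the four HistoC elements, whose names are missing from this copy.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple

# Component kinds to which HistoC may be anchored.
ANCHORABLE = {"GI", "MultiC", "MonoC"}
# Example event kinds named in the text.
EVENT_KINDS = {"creation", "modification", "validation"}


@dataclass
class HistoC:
    anchor_kind: str
    events: List[Tuple[str, str, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.anchor_kind not in ANCHORABLE:
            raise ValueError(f"HistoC cannot be anchored to {self.anchor_kind}")

    def record(self, kind: str, who: str) -> None:
        """Append an event; earlier entries are never overwritten."""
        if kind not in EVENT_KINDS:
            raise ValueError(f"unknown event kind {kind!r}")
        # Compact UTC timestamp, e.g. 20100528T144322Z.
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        self.events.append((kind, who, stamp))

    def kinds(self) -> List[str]:
        return [kind for kind, _, _ in self.events]


history = HistoC("MonoC")
history.record("creation", "translator")
history.record("validation", "reviewer")
print(history.kinds())  # ['creation', 'validation']
```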
HistoC may be adorned by four elements:




7.9 Recommended online annotation adornment
Multilingual text documents are often handled at only one stage of a complex workflow involving external document sources in a wide variety of formats. As a result, it is often necessary to keep inline markup indicating presentational features that have to be retained in the translated target document. To this end, MLIF-compliant applications should use the following elements, in relation to the element, which map onto similar subsets of elements in TMX and XLIFF:





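A common way to keep inline markup intact through translation, shown here as a hypothetical sketch rather than the mechanism defined by MLIF, TMX or XLIFF, is to replace each inline code with a numbered placeholder before translation and to restore the original codes afterwards. The sketch assumes that inline codes look like HTML/XML tags and that the placeholder syntax `{1}`, `{2}`, … does not otherwise occur in the text.

```python
import re
from typing import Dict, Tuple

# Assumption: inline codes are HTML/XML-like tags such as <b> or </b>.
INLINE_CODE = re.compile(r"<[^<>]+>")
PLACEHOLDER = re.compile(r"\{(\d+)\}")


def protect(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace inline codes with numbered placeholders {1}, {2}, ..."""
    codes: Dict[str, str] = {}

    def substitute(match: re.Match) -> str:
        key = str(len(codes) + 1)
        codes[key] = match.group(0)
        return "{" + key + "}"

    return INLINE_CODE.sub(substitute, text), codes


def restore(text: str, codes: Dict[str, str]) -> str:
    """Put the original inline codes back in place of the placeholders."""
    return PLACEHOLDER.sub(lambda m: codes[m.group(1)], text)


source = "The <b>meal</b> is nice."
masked, codes = protect(source)
print(masked)  # The {1}meal{2} is nice.
# The translator (human or machine) only ever sees the masked string.
translated = "Le {1}repas{2} est bon."
print(restore(translated, codes))  # Le <b>repas</b> est bon.
```

A placeholder dropped or renumbered by the translator makes `restore` raise `KeyError`, which is a convenient signal that the target segment needs review.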
7.10 Recommended adornment for localization
The following elements should be used to provide localization-related information:


7.11 Recommended adornment for internationalization



7.12 Recommended adornment for temporal synchronization
The following elements should be used when textual content has to be conveyed (in written or spoken form) together with accompanying constraints:



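For subtitles and similar time-aligned text, a basic operation is finding the segment that is active at a given moment. The sketch below assumes that each segment carries a begin and an end time in seconds and that segments do not overlap; it illustrates the idea only and does not model the elements of this clause (whose names are missing here) or of smilText.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TimedSegment:
    begin: float  # seconds from the start of the media
    end: float    # exclusive end time in seconds
    text: str


class Timeline:
    """Non-overlapping timed segments, kept sorted by begin time."""

    def __init__(self, segments: List[TimedSegment]) -> None:
        self.segments = sorted(segments, key=lambda s: s.begin)
        self.begins = [s.begin for s in self.segments]

    def active_at(self, t: float) -> Optional[str]:
        # Index of the last segment whose begin time is <= t.
        i = bisect.bisect_right(self.begins, t) - 1
        if i >= 0 and t < self.segments[i].end:
            return self.segments[i].text
        return None


subtitles = Timeline([
    TimedSegment(0.0, 2.5, "The meal is nice."),
    TimedSegment(3.0, 5.0, "The meals are nice."),
])
print(subtitles.active_at(1.0))  # The meal is nice.
print(subtitles.active_at(2.8))  # None
```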
8 Relation with other standards
As with the Terminological Markup Framework TMF [ISO 16642] in terminology, MLIF provides a metamodel that, combined with selected data categories, forms a sound basis for ensuring interoperability between several multilingual applications and corpora. MLIF deals with multilingual corpora, multilingual fragments, and the translation relations between them. In each domain where MLIF is applicable, a specific granularity may be chosen for segmentation and description. These two processes may rely on MAF [ISO 24611], SynAF [ISO 24615] and TMF for morphological description, syntactic annotation and terminological description, respectively.
MLIF supports the construction and interoperability of translation memory resources and localization processes, and also deals with the description of a metamodel for multilingual content. MLIF does not propose an exhaustive list of description features; instead, it provides a list of data categories that is much easier to update and extend. This list is a point of reference for processing multilingual information in the context of various application scenarios.
Moreover, MLIF not only describes elementary linguistic segments (e.g. sentence, syntactic fragment, word and part of speech), but may also be used to represent document structure (e.g. title, abstract, paragraph and section). In addition, MLIF allows for external and internal links (annotations and references).
MLIF is designed to provide a common framework that facilitates interoperability with formats such as TMX (LISA OSCAR) and XLIFF (OASIS). MLIF can be seen as a parent of these formats, since both of them deal with multilingual data expressed in the form of segments or text units. Both formats can be stored, manipulated and translated in a similar manner.
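The claim that TMX and XLIFF carry the same kind of multilingual unit can be illustrated by reading one unit from each format into a single neutral shape, a mapping from language code to text, which is roughly what a MultiC holding several MonoC components represents. The sketch handles only the minimal TMX 1.4 structure (`tu`, `tuv`, `seg`) and the XLIFF 1.2 structure (`file`, `trans-unit`, `source`, `target`), ignores inline markup, and uses hypothetical function names.

```python
import xml.etree.ElementTree as ET
from typing import Dict

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
XLIFF_NS = "{urn:oasis:names:tc:xliff:document:1.2}"


def from_tmx_tu(tu: ET.Element) -> Dict[str, str]:
    """TMX 1.4: each <tuv xml:lang=...> holds one <seg>."""
    return {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}


def from_xliff_unit(file_elem: ET.Element, unit: ET.Element) -> Dict[str, str]:
    """XLIFF 1.2: languages are declared on <file>, text sits in <source>/<target>."""
    result = {file_elem.get("source-language"): unit.findtext(XLIFF_NS + "source")}
    target = unit.findtext(XLIFF_NS + "target")
    if target is not None:
        result[file_elem.get("target-language")] = target
    return result


tmx = ET.fromstring(
    '<tu><tuv xml:lang="en"><seg>The meal is nice.</seg></tuv>'
    '<tuv xml:lang="fr"><seg>Le repas est bon.</seg></tuv></tu>'
)
xliff = ET.fromstring(
    '<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">'
    '<file source-language="en" target-language="fr" datatype="plaintext" original="x">'
    '<body><trans-unit id="1"><source>The meal is nice.</source>'
    '<target>Le repas est bon.</target></trans-unit></body></file></xliff>'
)
file_elem = xliff.find(XLIFF_NS + "file")
unit = file_elem.find(f"{XLIFF_NS}body/{XLIFF_NS}trans-unit")
print(from_tmx_tu(tmx) == from_xliff_unit(file_elem, unit))  # True
```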
Examples of using MLIF are given in Annexes A to F.
Annex A
(informative)

Example using MLIF for Computer-Assisted Translation (CAT)
The main purpose of structures such as lemma, part of speech and morphological features is to enable computer-assisted translation (CAT) tools based on translation memory to translate new words and sentences that are not contained in the translation database.
For example, a current translation memory system such as SDL TRADOS 1) that contains the English sentence "The meal is nice." and its French translation "Le repas est bon." is unable to provide the obvious translation of the sentence "The meals are nice.", even though the word lemmas of "The meal is nice." and "The meals are nice." match exactly. This weakness is due to the fact that such tools use too few linguistic criteria during the translation process.
In this case, the data produced by the TRADOS Translator's Workbench module is as follows:

  creationtool="TRADOS Translator's Workbench for Windows"
  creationtoolversion="Edition 8 Build 863"
  segtype="sentence"
  o-tmf="TW4Win 2.0 Format"
  adminlang="EN-US"
  srclang="EN-GB"
  datatype="rtf"
  creationdate="20100528T144322Z"
  creationid="USER"/>

 
 
  The meal is nice.
 
 
  Le repas est bon.
 
 


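The header values in this export are ordinary XML attributes, and TMX writes dates in the compact ISO 8601 form `YYYYMMDDThhmmssZ` (UTC). As a sketch, they can be read with the Python standard library. Since the element tags are missing from the listing above, the snippet rebuilds a minimal `header` element around the listed attribute values; that wrapper is an assumption.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Minimal <header> rebuilt around the attribute values of the TRADOS export.
header = ET.fromstring(
    '<header creationtool="TRADOS Translator\'s Workbench for Windows" '
    'creationtoolversion="Edition 8 Build 863" segtype="sentence" '
    'o-tmf="TW4Win 2.0 Format" adminlang="EN-US" srclang="EN-GB" '
    'datatype="rtf" creationdate="20100528T144322Z" creationid="USER"/>'
)


def parse_tmx_date(value: str) -> datetime:
    """Parse a TMX date (YYYYMMDDThhmmssZ, always UTC)."""
    return datetime.strptime(value, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)


created = parse_tmx_date(header.get("creationdate"))
print(header.get("srclang"), created.isoformat())
# EN-GB 2010-05-28T14:43:22+00:00
```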
To translate the sentence "The meals are nice.", an MLIF-compliant tool should implement the following procedure:
Step 1 Represent in MLIF all the words stored in the translation memory, adding their linguistic properties.

1)
SDL TRADOS Translator's Workbench is an example of a suitable product available commercially. This information is given for the convenience of users of this International Standard and does not constitute an endorsement by ISO of this product.


Step 2 Run a part-of-speech tagger on the sentence in order to obtain the correct morphosyntactic word categories.
Step 3 Translate the lemmas using an English-to-French bilingual lexicon.
Step 4 Consult a French lexicon of inflected forms in order to retrieve the correct inflected form from the lemma and the morphological features.
Step 5 Generate the translation of "The meals are nice." by substituting each English word with its French inflected form as follows:
"The meals are nice." => "Les repas sont bons."
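Steps 2 to 5 can be sketched with toy tables standing in for the tagger output, the bilingual lexicon and the French lexicon of inflected forms. Every entry is hand-written for this single sentence and is not taken from a real lexicon; the sketch also assumes that agreement has already been resolved, so that "nice" arrives tagged as plural. A real tool would derive gender and number agreement itself.

```python
from typing import Dict, List, Tuple

# Step 2 (assumed tagger output) for "The meals are nice.":
# (surface form, lemma, part of speech, number feature).
TAGGED: List[Tuple[str, str, str, str]] = [
    ("The", "the", "definiteArticle", "P"),
    ("meals", "meal", "commonNoun", "P"),
    ("are", "be", "verb", "P"),
    ("nice", "nice", "qualifierAdjective", "P"),  # agreement assumed resolved
    (".", ".", "punctuation", ""),
]

# Step 3: toy English-to-French bilingual lexicon of lemmas.
BILINGUAL: Dict[str, str] = {
    "the": "le", "meal": "repas", "be": "être", "nice": "bon", ".": ".",
}

# Step 4: toy French lexicon of inflected forms, keyed by (lemma, number).
INFLECTED: Dict[Tuple[str, str], str] = {
    ("le", "S"): "le", ("le", "P"): "les",
    ("repas", "S"): "repas", ("repas", "P"): "repas",
    ("être", "S"): "est", ("être", "P"): "sont",
    ("bon", "S"): "bon", ("bon", "P"): "bons",
}


def translate(tagged: List[Tuple[str, str, str, str]]) -> str:
    words = []
    for _surface, lemma, _pos, number in tagged:
        fr_lemma = BILINGUAL[lemma]                         # step 3
        form = INFLECTED.get((fr_lemma, number), fr_lemma)  # step 4
        words.append(form)
    # Step 5: substitute word by word and rebuild the sentence.
    sentence = " ".join(words).replace(" .", ".")
    return sentence[0].upper() + sentence[1:]


print(translate(TAGGED))  # Les repas sont bons.
```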
The XML data will include a feature structure declaration defining a tagset (e.g. for "nS"), and a word segmentation using the tagset defined in MAF:


 
 
 
 
 


 
  SEMMAR
  20090922T140653Z
 
  The meal is nice.
 
 
  Le repas est bon.
 
 
 
 
  The
      class="word"
   lemma="meal"
   pos="commonNoun"
   tag="#nS">meal
      class="word"
   lemma="be"
   pos="verb"
   tag="#mP #p1 #nS">is
  nice
  .
 
 
      class="word"
   lemma="le"
   pos="definiteArticle"
   tag="#gM #nS">Le
      class="word"
   lemma="repas"
   pos="commonNoun"
   tag="#gM #nS">repas
      class="word"
   lemma="être"
   pos="verb"
   tag="#mP #p1 #nS">est
      class="word"
   lemma="bon"
   pos="qualifierAdjective"
   tag="#gM #nS">bon
  .
 
 





Annex B
(informative)

Example: representing TMX data
B.1 Introduction
TMX (Translation Memory eXchange) is a vendor-neutral open XML standard for the exchange of translation memory (TM) data created by computer-aided translation and localization tools. The purpose of TMX is to simplify the exchange of data between tools and/or translation vendors with little or no loss of critical data during the translation process.
TMX, a certifiable standard format, has been on the software market since 1998. It was developed, and is maintained, by OSCAR (Open Standards for Container/Content Allowing Re-use), a working group of LISA (Localisation Industry Standards Association).
B.2 Mapping TMX to MLIF
The structure of TMX is nearly isomorphic to the MLIF metamodel. The TMX macro-structure maps to MLIF as follows:
- the  element maps onto the  component;
- the  element maps onto the  component;
- the  element, the container of the  element, maps onto the  component;
- the  element maps onto the  component;
- the  element maps onto the  component;
- the  element maps onto the  component;
- the  element, which indicates the term type, maps onto the  element for the term type.
Other TMX elements and attributes map onto MLIF elements as follows:
- the "creationtool" attribute maps onto the  element;
- the "creationdate" attribute maps onto the  element;
- the "tuid" attribute maps onto the  element within the MultiC component;
- the  element is not mapped onto any particular element, since it is a generic placeholder for application-specific data; when it is used, the specific element is explicitly mapped onto MLIF elements or onto the ISO/TC 37 standardized data categories available in the ISOCat registry.
B.3 Example data
The following example, based on TMX version 1.4, covers the multilingual units of a TMX document and does not reproduce all the details of the header.

  adminlang="en"
  creationdate="20040731T164933Z"
  creationtool="Heartsome TM Server"
  creationtoolversion="1.0.1"
  datatype="xml"
  o-tmf="unknown"
  segtype="block"
  srclang="*all*"/>

 
 
  Le processus de contrôle de
      qualité en dix étapes qu'il a créé il y a plus
     de 1300 ans est beaucoup plus complet et précis que ceux
     existant aujourd'hui.
 
 
  His 10-stage quality
      control process initiated more than 1300 years
     ago is far more thorough and exacting than any existing
     today.
 
 
  El proceso de control de
      calidad en diez pasos que inició hace más de
     1300 años es mucho más completo y preciso que los que
     existen en la actualidad.
 
 
  Il suo metodo di controllo di qualità in 10 fasi risale a più
     di 1300 anni fa ed è molto più accurato e preciso di
     qualsiasi metodo attuale.
 
 
  그가 1300여년 전 시작한 10단계 품질
      관리 방법은 현존하는 것보다 훨씬 더 철저하고 정확하다.
 
 


The corresponding standard MLIF representation is as follows:


 TMX
 1.4
 20040731T164933Z
 Heartsome TM Server
 1.0.1




 
  1091303313515
  20020930T004233Z
 
  Le processus de contrôle
      de qualité en dix étapes qu'il a créé il y a
     plus de 1300 ans est beaucoup plus complet et précis que
     ceux existant aujourd'hui.
 
 
  His 10-stage quality
      control process initiated more than 1300
     years ago is far more thorough and exacting than any

...

INTERNATIONAL STANDARD
ISO 24616
First edition
2012-09-01

Language resources management —
Multilingual information framework
Gestion des ressources langagières — Plateforme d'informations
multilingues




Reference number
ISO 24616:2012(E)
© ISO 2012

ISO 24616:2012(E)

COPYRIGHT PROTECTED DOCUMENT


©  ISO 2012
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56  CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland

Contents
Foreword
1  Scope
2  Normative references
3  Terms and definitions
4  Specification principles
4.1  Key standard used in the specification: Unified Modeling Language (UML)
4.2  Metamodel and adornment
4.3  XML serialization
5  Metamodel specification
6  MLIF compliance
7  Metamodel adornment
7.1  Introduction
7.2  General principles concerning the use of W3C generic attributes
7.3  Recommended adornment for GI
7.4  Recommended adornment for GroupC
7.5  Recommended adornment for MultiC
7.6  Recommended and mandatory adornment for MonoC
7.7  Recommended adornment for SegC
7.8  Recommended adornment for HistoC
7.9  Recommended online annotation adornment
7.10  Recommended adornment for localization
7.11  Recommended adornment for internationalization
7.12  Recommended adornment for temporal synchronization
8  Relation with other standards
Annex A (informative) Example using MLIF for Computer-Assisted Translation (CAT)
Annex B (informative) Example: representing TMX data
Annex C (informative) Example of XLIFF data representation
Annex D (informative) Example: representing smilText data
Annex E (informative) Example of MLIF usage for subtitles (captioning)
Annex F (informative) Using MLIF for MAF data
Annex G (normative) Detailed specification
Bibliography

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies
(ISO member bodies). The work of preparing International Standards is normally carried out through ISO
technical committees. Each member body interested in a subject for which a technical committee has been
established has the right to be represented on that committee. International organizations, governmental and
non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the
International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of technical committees is to prepare International Standards. Draft International Standards
adopted by the technical committees are circulated to the member bodies for voting. Publication as an
International Standard requires approval by at least 75 % of the member bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO shall not be held responsible for identifying any or all such patent rights.
ISO 24616 was prepared by Technical Committee ISO/TC 37, Terminology and other language and content
resources, Subcommittee SC 4, Language resource management.

INTERNATIONAL STANDARD ISO 24616:2012(E)

Language resources management — Multilingual information
framework
1 Scope
This International Standard provides a generic platform for modelling and managing multilingual information in
various domains: localization, translation, multimedia annotation, document management, digital library
support, and information or business modelling applications. MLIF (multilingual information framework)
provides a metamodel and a set of generic data categories [ISO 12620:2009] for various application domains.
MLIF also provides strategies for the interoperability and/or linking of models including, but not limited to,
XLIFF, TMX, smilText and ITS.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
ISO 12620:2009, Terminology and other language and content resources — Specification of data categories
and management of a Data Category Registry for language resources
ISO 8879, Information processing — Text and office systems — Standard Generalized Markup Language (SGML)
Extensible Markup Language. Fifth Edition, T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, F. Yergeau
Editors, W3C Recommendation, 26 November 2008, http://www.w3.org/TR/xml
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply:
3.1
adornment
data category attached to a component of a metamodel
3.2
inline code
inline instructions inserted in a source document
Note to entry: Native code can, for instance, provide presentational instructions (e.g. HTML codes).
3.3
subtitle
textual versions of the dialog in films, television programs, video games, etc., usually displayed at the bottom
of the screen
3.4
working language
language in which linguistic sequences are expressed
4 Specification principles
4.1 Key standard used in the specification: Unified Modeling Language (UML)
The MLIF specification complies with the modelling principles of UML as defined by the Object Management
Group (OMG) [UML]. The specification uses the UML subset that is relevant for the purposes of MLIF.
4.2 Metamodel and adornment
In line with Terminological Markup Framework (TMF) as defined in ISO 16642, MLIF defines a metamodel that
is adorned by data categories, as defined in ISO 12620.
4.3 XML serialization
Associated with the metamodel and its adornment, MLIF proposes a representation in XML called “XML
serialization”, in line with Extensible Markup Language (XML), which is a restricted form of SGML (ISO 8879).
5 Metamodel specification
The MLIF metamodel is specified in the UML object diagram in Figure 1.

Figure 1 — MLIF metamodel
The MLIF metamodel is defined by the following seven "core components", listed below in the order of their XML serialization:
- (Multilingual Data Collection), which represents a collection of data containing global information and several multilingual units;
- (Global Information), which represents technical and administrative information applying to the entire multilingual data collection;
- (Grouping components), which represents a sub-collection of multilingual data that have a common origin or purpose within a given project;
- (Multilingual Component), which groups together all variants of a given textual content;
- (Monolingual Component), which groups together information related to one language and is part of a multilingual component (MultiC);
- (History Component), which traces modifications to the component to which it is anchored (i.e. versioning);
- (Segmentation Component), which allows any level of segmentation for textual information, possibly in a recursive manner.
6 MLIF compliance
Any format compliant with this International Standard may use the MLIF metamodel in two possible ways:
- by fully implementing the MLIF metamodel starting at the level of ;
- by specifically embedding MLIF-compliant information within another model, by implementing one of the lower-level MLIF elements, namely , or .
7 Metamodel adornment
7.1 Introduction
The MLIF XML serialization proposes a set of XML elements and XML attributes, which are described in the
following sections, where the characters “<” and “>” delimit the name of the element. Following the TEI
guidelines (http://www.tei-c.org), some attributes are specified by means of a class attribute, with the
convention that the name of the class attribute is prefixed by “att.” (e.g. “att.xlink”). The other XML attributes
are listed with the convention that two quotes delimit the name of the attribute (e.g. “xml:lang”). The
specifications in Annex G shall be applied.
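The two naming conventions can be told apart mechanically. The helper below is hypothetical and only mirrors the conventions described in this clause: a name prefixed with `att.` denotes a TEI-style attribute class, while a quoted name denotes a single XML attribute. Both straight and typographic quotes are accepted because the text uses the latter.

```python
from typing import Tuple


def classify_adornment(spec: str) -> Tuple[str, str]:
    """Classify an adornment name following the conventions of 7.1.

    Returns ("class", name) for TEI-style attribute classes such as
    att.xlink, and ("attribute", name) for quoted XML attributes such
    as "xml:lang". Anything else is rejected.
    """
    spec = spec.strip()
    if spec.startswith("att."):
        return ("class", spec[len("att."):])
    # Straight quote or typographic opening quote (U+201C).
    if spec[:1] in ('"', "\u201c"):
        return ("attribute", spec.strip('"\u201c\u201d'))
    raise ValueError(f"not an adornment name: {spec!r}")


print(classify_adornment("att.xlink"))   # ('class', 'xlink')
print(classify_adornment('"xml:lang"'))  # ('attribute', 'xml:lang')
```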
7.2 General principles concerning the use of W3C generic attributes
The following W3C attributes are to be used by all MLIF-compliant applications:
- the attribute xml:lang shall be used in accordance with W3C recommendations to represent the working language of any relevant element and, in particular, shall be used systematically for any implementation of MonoC;
- the attribute xml:id shall be used in accordance with W3C recommendations to provide a unique identifier to an element of the MLIF metamodel.
7.3 Recommended adornment for GI















7.4 Recommended adornment for GroupC

7.5 Recommended adornment for MultiC









7.6 Recommended and mandatory adornment for MonoC
- att.lang


- att.xlink
The language attribute is mandatory on MonoC. All other adornments are optional.
7.7 Recommended adornment for SegC







- att.linguistic
- att.xlink
7.8 Recommended adornment for HistoC
The HistoC component is a generic component that traces modifications made to the component to which it is
anchored (e.g. creation, modification and validation). In the MLIF metamodel, the HistoC component may be
anchored to the GI, MultiC or MonoC component. This makes it possible to record all evolutions of, or
enhancements to, the component.
HistoC may be adorned by four elements:




7.9 Recommended online annotation adornment
Multilingual text documents are often handled at only one stage of a complex workflow that involves external document
sources in a wide variety of formats. It is therefore often necessary to keep the inline markup from these sources, since it
indicates presentational features that have to be retained in the translated target document. To this end, MLIF-compliant
applications should use the following elements, in relation to the element, that map onto similar
subsets in TMX and XLIFF:





7.10 Recommended adornment for localization
The following elements should be used to provide localization-related information:


7.11 Recommended adornment for internationalization

7.12 Recommended adornment for temporal synchronization
The following elements should be used when textual content has to be conveyed (in written or spoken form)
together with some constraints:



8 Relation with other standards
As with the “Terminological Markup Framework” TMF [ISO 16642] in terminology, MLIF introduces a
metamodel that combines with selected data categories as a way of ensuring interoperability between several
multilingual applications and corpora. MLIF deals with multilingual corpora, multilingual fragments, and the
translation relations between them. In each domain where MLIF is applicable, a specific granularity may be
considered for segmentation and description. These last two processes may rely on MAF [ISO 24611], SynAF
[ISO 24615] and TMF for morphological description, syntactic annotation and terminological description,
respectively.
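Because SegC may contain further SegC, a segmentation of any granularity forms a tree, and choosing a granularity amounts to selecting one level of that tree. The sketch below illustrates this recursion; the `Segment` class and the level names (`paragraph`, `sentence`, `word`) are assumptions, since MLIF leaves the granularity to each application domain.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """Hypothetical stand-in for SegC: a typed span with optional children."""
    level: str
    text: str
    children: List["Segment"] = field(default_factory=list)


def at_level(seg: Segment, level: str) -> List[str]:
    """Collect, depth first, the text of every segment of the given level."""
    if seg.level == level:
        return [seg.text]
    found: List[str] = []
    for child in seg.children:
        found.extend(at_level(child, level))
    return found


para = Segment("paragraph", "The meal is nice. It is hot.", [
    Segment("sentence", "The meal is nice.", [
        Segment("word", w) for w in ["The", "meal", "is", "nice", "."]
    ]),
    Segment("sentence", "It is hot.", [
        Segment("word", w) for w in ["It", "is", "hot", "."]
    ]),
])
print(at_level(para, "sentence"))   # ['The meal is nice.', 'It is hot.']
print(len(at_level(para, "word")))  # 9
```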
MLIF supports the construction and the interoperability of localization and translation memory resources,
and also deals with the description of a metamodel for multilingual content. MLIF does not propose a closed
list of description features. Rather, it provides a list of data categories that is much easier to update and
extend. This list represents a point of reference for multilingual information in the context of various application
scenarios.
Moreover, MLIF not only describes elementary linguistic segments (e.g. sentence, syntactic fragment, word
and part of speech), but may also be used to represent document structure (e.g. title, abstract, paragraph and
section). In addition, MLIF allows for external and internal links (annotations and references).
MLIF is designed to provide a common framework that facilitates interoperability with formats such as
TMX (LISA OSCAR) and XLIFF (OASIS). MLIF can be seen as a parent of these formats, since both of them
deal with multilingual data expressed in the form of segments or text units. Both can be stored, manipulated
and translated in a similar manner.
Examples of using MLIF are given in Annexes A to F.
Annex A
(informative)

Example using MLIF for Computer-Assisted Translation (CAT)
The main reason for including lemma, part-of-speech and morphological features is to allow CAT tools based on
translation memory to produce translations of new words and sentences that are not in the translation
database.
For example, given a translation memory that contains the English sentence "The meal is nice." and its
French translation "Le repas est bon.", current CAT tools such as SDL TRADOS Translator's Workbench 1)
are not able to provide the predicted translation for the sentence "The meals are nice." even though the word
lemmas of "The meal is nice." and "The meals are nice." match. This weakness is due to the fact that
these tools use limited linguistic criteria during the translation process.
The data produced by TRADOS Translator's Workbench is as follows:

  creationtool="TRADOS Translator's Workbench for Windows"
  creationtoolversion="Edition 8 Build 863"
  segtype="sentence"
  o-tmf="TW4Win 2.0 Format"
  adminlang="EN-US"
  srclang="EN-GB"
  datatype="rtf"
  creationdate="20100528T144322Z"
  creationid="USER"/>

 
 
  The meal is nice.
 
 
  Le repas est bon.
 
 


To translate the sentence "The meals are nice.", an MLIF-compliant tool should implement the following
procedure:
Step-1 Represent in MLIF and add linguistic properties to all the words within the translation memory.
Step-2 Run a part-of-speech tagger on the sentence in order to obtain the right morphosyntactic word
categories.
Step-3 Translate the lemmas using an English-to-French bilingual lexicon.

1)
SDL TRADOS Translator's Workbench is an example of a suitable product available commercially. This information is
given for the convenience of users of this International Standard and does not constitute an endorsement by ISO of this
product.
Step-4 Consult a French lexicon of inflected forms in order to retrieve the correct inflected form from the
lemma and its morphological features.
Step-5 Generate the translation of "The meals are nice." by replacing each English word with its French
inflected form, as follows:
"The meals are nice." => "Les repas sont bons."
The XML data includes a feature structure declaration that defines the tagset (e.g. the identifier "nS",
referenced as "#nS"), together with a word segmentation and tagset defined as in MAF:


 
 
 
 
 


 
  SEMMAR
  20090922T140653Z
 
  The meal is nice.
 
 
  Le repas est bon.
 
 
 
 
  The
      class="word"
   lemma="meal"
   pos="commonNoun"
   tag="#nS">meal
      class="word"
   lemma="be"
   pos="verb"
   tag="#mP #p1 #nS">is
  nice
  .
 
 
      class="word"
   lemma="le"
   pos="definiteArticle"
   tag="#gM #nS">Le
      class="word"
   lemma="repas"
   pos="commonNoun"
   tag="#gM #nS">repas
      class="word"
   lemma="être"
   pos="verb"
   tag="#mP #p1 #nS">est
      class="word"
   lemma="bon"
   pos="qualifierAdjective"
   tag="#gM #nS">bon
  .
 
 



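The five-step procedure, together with the French annotations above, can be sketched as a toy Python program. It is a minimal illustration under stated assumptions, not a prescribed implementation. The element names `segs` and `seg` are hypothetical, because the extracted example lost its tags; only the `class`, `lemma`, `pos` and `tag` attributes are copied from it. The lookup-table tagger and the dictionaries `EN_TAGGER`, `EN_FR` and `FR_FORMS` are stand-ins for a real part-of-speech tagger, bilingual lexicon and French inflection lexicon. Agreement is reduced to copying the English noun's number and the French noun's gender onto every word.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

# Step 1 (assumed done): the French side of the TM annotated in MLIF.
# Element names "segs"/"seg" are hypothetical; the attributes are from Annex A.
TM_FRENCH = """
<segs xml:lang="fr">
  <seg class="word" lemma="le" pos="definiteArticle" tag="#gM #nS">Le</seg>
  <seg class="word" lemma="repas" pos="commonNoun" tag="#gM #nS">repas</seg>
  <seg class="word" lemma="être" pos="verb" tag="#mP #p1 #nS">est</seg>
  <seg class="word" lemma="bon" pos="qualifierAdjective" tag="#gM #nS">bon</seg>
</segs>
"""


def noun_genders(xml_text: str) -> Dict[str, str]:
    """Collect the gender feature (e.g. "#gM") of every annotated common noun."""
    genders: Dict[str, str] = {}
    for seg in ET.fromstring(xml_text.strip()).iter("seg"):
        if seg.get("pos") != "commonNoun":
            continue
        for ref in seg.get("tag", "").split():
            if ref.startswith("#g"):  # gender feature reference
                genders[seg.get("lemma")] = ref[1:]
    return genders


# Step 2 stand-in: a lookup-table "tagger" (form -> lemma, pos, number).
EN_TAGGER: Dict[str, Tuple[str, str, Optional[str]]] = {
    "the": ("the", "definiteArticle", None),
    "meal": ("meal", "commonNoun", "nS"),
    "meals": ("meal", "commonNoun", "nP"),
    "is": ("be", "verb", "nS"),
    "are": ("be", "verb", "nP"),
    "nice": ("nice", "qualifierAdjective", None),
}
# Step 3: English-to-French lemma lexicon.
EN_FR = {"the": "le", "meal": "repas", "be": "être", "nice": "bon"}
# Step 4: French inflected forms keyed by (lemma, gender, number).
FR_FORMS = {
    ("le", "gM", "nS"): "le", ("le", "gM", "nP"): "les",
    ("repas", "gM", "nS"): "repas", ("repas", "gM", "nP"): "repas",
    ("être", None, "nS"): "est", ("être", None, "nP"): "sont",
    ("bon", "gM", "nS"): "bon", ("bon", "gM", "nP"): "bons",
}


def translate(sentence: str, fr_gender: Dict[str, str]) -> str:
    tagged = [EN_TAGGER[w.lower()] for w in sentence.rstrip(".").split()]  # Step 2
    # Toy agreement: every word takes the English noun's number and the
    # gender recorded for the French noun in the annotated TM.
    noun_lemma, _, number = next(t for t in tagged if t[1] == "commonNoun")
    gender = fr_gender[EN_FR[noun_lemma]]
    words = []
    for lemma, pos, _ in tagged:
        fr_lemma = EN_FR[lemma]  # Step 3
        key = (fr_lemma, None if pos == "verb" else gender, number)
        words.append(FR_FORMS[key])  # Step 4
    text = " ".join(words)  # Step 5
    return text[0].upper() + text[1:] + "."


genders = noun_genders(TM_FRENCH)
print(translate("The meals are nice.", genders))
```

With these toy resources, `translate("The meals are nice.", genders)` yields "Les repas sont bons.", and the original TM sentence is reproduced unchanged. A real tool would need a full tagger, complete lexicons and a proper agreement model.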
Annex B
(informative)

Example: representing TMX data
B.1 Introduction
TMX (Translation Memory eXchange) is the vendor-neutral open XML standard for the exchange of
Translation Memory (TM) data created by computer-aided translation (CAT) and localization tools. The
purpose of TMX is to allow easier exchange of translation memory data between tools and/or translation
vendors with little or no loss of critical data during the process. TMX, which has been on the market since
1998, is a certifiable standard format. It was developed and is maintained by OSCAR (Open Standards for
Container/Content Allowing Re-use), a LISA Special Interest Group.
B.2 Mapping TMX to MLIF
TMX is nearly isomorphic to the MLIF metamodel. The core elements of the TMX macro-structure map to
MLIF as follows:
 maps onto the element;

maps onto the element;
 is a container for the element and maps onto the element;
 maps onto the element;
 maps onto the element;
 maps onto the element;
 of type term maps onto the element of type term.
Further TMX elements and attributes map onto MLIF elements as follows:
 The "creationtool" attribute maps onto the element;
 The "creationdate" attribute maps onto the element;
 The "tuid" attribute maps onto the element within MultiC.
 The element does not map onto any specific element as it represents a generic placeholder for
application-dependent data. When applicable, a specific element is explicitly mapped onto MLIF
elements or onto a standardized ISO/TC 37 data category as available from ISOCat.
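The core of this mapping can be sketched in Python. This is a hedged illustration, not the MLIF serialization. The MLIF element names in the list above did not survive extraction, so the output uses plain dictionaries whose keys (`GI`, `MultiCs`, `monoCs`, `segC`, `creationTool` and so on) are hypothetical. Treating `<tuv>` as MonoC and `<seg>` as SegC is an assumption drawn from the metamodel in clause 5. The header values come from the example in B.3. The `tuid` value and the two segments are illustrative.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved xml: prefix to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Toy TMX 1.4 input (tuid and segments are illustrative only).
SAMPLE_TMX = """
<tmx version="1.4">
  <header creationtool="Heartsome TM Server" creationtoolversion="1.0.1"
          creationdate="20040731T164933Z" datatype="xml" segtype="block"
          adminlang="en" srclang="*all*" o-tmf="unknown"/>
  <body>
    <tu tuid="1091303313515">
      <tuv xml:lang="fr"><seg>Le repas est bon.</seg></tuv>
      <tuv xml:lang="en"><seg>The meal is nice.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def tmx_to_mlif(tmx_text: str) -> dict:
    """Map TMX onto MLIF components, represented here as plain dicts."""
    root = ET.fromstring(tmx_text.strip())
    header = root.find("header")
    gi = {  # <header> attributes -> global information (GI)
        "sourceFormat": "TMX",
        "sourceVersion": root.get("version"),
        "creationDate": header.get("creationdate"),
        "creationTool": header.get("creationtool"),
        "creationToolVersion": header.get("creationtoolversion"),
    }
    multi_cs = []
    for tu in root.iter("tu"):  # <tu> -> MultiC, "tuid" -> its identifier
        mono_cs = []
        for tuv in tu.findall("tuv"):  # <tuv> -> MonoC (one per language)
            seg = tuv.find("seg")  # <seg> -> SegC
            mono_cs.append({"lang": tuv.get(XML_LANG),
                            "segC": "".join(seg.itertext())})
        multi_cs.append({"id": tu.get("tuid"), "monoCs": mono_cs})
    return {"GI": gi, "MultiCs": multi_cs}


print(tmx_to_mlif(SAMPLE_TMX))
```

Elements such as `<prop>`, which the text treats as application-dependent, are deliberately ignored in this sketch.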
B.3 Example of data
The following example, based on TMX version 1.4, focuses on the multilingual units of a TMX document and
does not translate all the details of the header.

  adminlang="en"
  creationdate="20040731T164933Z"
  creationtool="Heartsome TM Server"
  creationtoolversion="1.0.1"
  datatype="xml"
  o-tmf="unknown"
  segtype="block"
  srclang="*all*"/>

 
 
  Le processus de contrôle de
      qualité en dix étapes qu'il a créé il y a plus
     de 1300 ans est beaucoup plus complet et précis que ceux
     existant aujourd'hui.
 
 
  His 10-stage quality
      control process initiated more than 1300 years
     ago is far more thorough and exacting than any existing
     today.
 
 
  El proceso de control de
      calidad en diez pasos que inició hace más de
     1300 años es mucho más completo y preciso que los que
     existen en la actualidad.
 
 
  Il suo metodo di controllo di qualità in 10 fasi risale a più
     di 1300 anni fa ed è molto più accurato e preciso di
     qualsiasi metodo attuale.
 
 
  그가 1300여년 전 시작한 10단계 품질
      관리 방법은 현존하는 것보다 훨씬 더 철저하고 정확하다.
 
 


The corresponding MLIF representation, using the default XML serialization, is as follows:


 TMX
 1.4
 20040731T164933Z
 Heartsome TM Server
 1.0.1


 
  1091303313515
  20020930T004233Z
 
  Le processus de contrôle
      de qualité en dix étapes qu'il a créé il y a
     plus de 1300 ans est beaucoup plus complet et précis que
     ceux existant aujourd'hui.
 
 
  His 10-stage quality
      control process initiated more than 1300
     years ago is far more thorough and exacting than any
     existing today.
 
 


B.4 Example of TMX and MLIF interaction
Figure B.1 illustrates the interaction between TMX and MLIF. The process involves successive steps of
extraction, translation and merging. It begins with a TMX document containing linguistic content in
English (en) and German (de). The extraction process (1) generates a “Skeleton File” (2), which contains all TM
formatting information, and an MLIF Document Linguistic Content (3), in which only the relevant linguistic
information is stored. As most translators (human beings or automatic software modules) work with
TMX-oriented software tools, an XSL stylesheet is used to transform the MLIF document into a TMX
document that does not contain any formatting information. Once the translator has added the
appropriate Japanese (ja) translation, another XSL stylesheet transforms this TMX document back into an MLIF
document (4). Finally, the new MLIF document (containing the Japanese translation) is merged with the
“Skeleton File” to produce a new TMX document (5).

Figure B.1 — TMX and MLIF interaction
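The extraction and merging steps of Figure B.1 can be sketched as follows. The sketch rests on simplifying assumptions. Segments are taken to be plain text without inline markup. The in-memory dictionary `content` stands in for the MLIF linguistic-content document. The two XSL conversions between MLIF and TMX are not modelled. The function names `extract` and `merge` are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# (index of the <tu>, language) -> segment text
Content = Dict[Tuple[int, Optional[str]], Optional[str]]


def extract(tmx_root: ET.Element) -> Tuple[ET.Element, Content]:
    """Steps (1)-(3): split a TMX tree into a skeleton and linguistic content."""
    skeleton = copy.deepcopy(tmx_root)
    content: Content = {}
    for i, tu in enumerate(skeleton.iter("tu")):
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            content[(i, tuv.get(XML_LANG))] = seg.text
            seg.text = None  # the skeleton keeps only structure and metadata
    return skeleton, content


def merge(skeleton: ET.Element, content: Content) -> ET.Element:
    """Step (5): put the (possibly extended) content back into the skeleton."""
    merged = copy.deepcopy(skeleton)
    for i, tu in enumerate(list(merged.iter("tu"))):
        by_lang = {tuv.get(XML_LANG): tuv for tuv in tu.findall("tuv")}
        for (j, lang), text in content.items():
            if j != i:
                continue
            tuv = by_lang.get(lang)
            if tuv is None:  # a new language, e.g. the Japanese translation
                tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
                ET.SubElement(tuv, "seg")
            tuv.find("seg").text = text
    return merged


tmx = ET.fromstring(
    '<tmx version="1.4"><body><tu>'
    '<tuv xml:lang="en"><seg>Hello</seg></tuv>'
    '<tuv xml:lang="de"><seg>Hallo</seg></tuv>'
    '</tu></body></tmx>'
)
skeleton, content = extract(tmx)
content[(0, "ja")] = "こんにちは"  # stands in for the translator's work before (4)
result = merge(skeleton, content)
print(ET.tostring(result, encoding="unicode"))
```

The skeleton keeps the `tmx`, `tu` and `tuv` structure with empty segments. As a result, merging only has to match units by position and variants by language.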
Annex C
(informative)

Example of XLIFF data representation
C.1 Introduction
The purpose of XLIFF (XML Localization Interchange File Format) is to define and promote the adoption of a
specification for the interchange of localizable software- and document-based objects and related metadata.
C.2 Mapping XLIFF to MLIF
XLIFF differs from the MLIF metamodel in that it draws a clear distinction between source and target language
for monolingual information. This is handled through the appropriate use of the data
category in together with the language declarations ( and ) in
.
The core elements of the XLIFF macro-structure map to MLIF as follows:
 maps onto the element;
maps onto the element;
 is a container for the element and maps onto the element;
 the element maps onto the element;
 maps onto the element;
maps onto the element and simultaneously sets the value of the
element to . The corresponding textual content is placed in a element;
 maps onto the element and simultaneously sets the value of the
element to . The corresponding textual content is placed in a element;
 maps onto the element and simultaneously sets the value of the
element to alternate.
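A minimal sketch of this mapping is given below. It assumes XLIFF 1.2 input and produces a MultiC-like dictionary for each `<trans-unit>`. Each `<source>`, `<target>` and `<alt-trans>` target becomes a MonoC-like entry that carries a role. The role values "source" and "target" are assumptions, because the values in the list above did not survive extraction. "alternate" is taken from the list. Dictionary keys and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE_XLIFF = """
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="menu.txt" datatype="plaintext"
        source-language="en" target-language="fr">
    <body>
      <trans-unit id="1">
        <source>The meal is nice.</source>
        <target>Le repas est bon.</target>
        <alt-trans><target>Le repas est délicieux.</target></alt-trans>
      </trans-unit>
    </body>
  </file>
</xliff>
"""


def text_of(el: ET.Element) -> str:
    return "".join(el.itertext())


def xliff_to_multics(xliff_text: str) -> List[Dict]:
    root = ET.fromstring(xliff_text.strip())
    multi_cs = []
    for file_el in root.findall("x:file", XLIFF_NS):
        # Language declarations live on <file> in XLIFF 1.2.
        src_lang = file_el.get("source-language")
        tgt_lang = file_el.get("target-language")
        for tu in file_el.findall(".//x:trans-unit", XLIFF_NS):  # -> MultiC
            mono_cs = []
            source = tu.find("x:source", XLIFF_NS)  # direct child only
            if source is not None:
                mono_cs.append({"role": "source", "lang": src_lang,
                                "text": text_of(source)})
            target = tu.find("x:target", XLIFF_NS)
            if target is not None:
                mono_cs.append({"role": "target", "lang": tgt_lang,
                                "text": text_of(target)})
            for alt in tu.findall("x:alt-trans", XLIFF_NS):
                alt_target = alt.find("x:target", XLIFF_NS)
                if alt_target is not None:
                    mono_cs.append({"role": "alternate",
                                    "lang": alt_target.get(XML_LANG, tgt_lang),
                                    "text": text_of(alt_target)})
            multi_cs.append({"id": tu.get("id"), "monoCs": mono_cs})
    return multi_cs


print(xliff_to_multics(SAMPLE_XLIFF))
```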
XLIFF further elements and attrib
...

SLOVENIAN STANDARD
SIST ISO 24616:2013
01-July-2013
Upravljanje z jezikovnimi viri - Ogrodje za večjezične informacije
Language resources management -- Multilingual information framework
Gestion des ressources langagières -- Plateforme d'informations multilingues
This Slovenian standard is identical to: ISO 24616:2012
ICS:
01.020 Terminology (principles and coordination)
SIST ISO 24616:2013 en,fr,de
2003-01. Slovenian Institute for Standardization. Reproduction of this standard, in whole or in part, is not permitted.


INTERNATIONAL STANDARD ISO 24616
First edition
2012-09-01

Language resources management —
Multilingual information framework
Gestion des ressources langagières — Plateforme d'informations
multilingues




Reference number
ISO 24616:2012(E)

© ISO 2012


COPYRIGHT PROTECTED DOCUMENT


©  ISO 2012
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56  CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland

Contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 Specification principles
4.1 Key standard used in the specification: Unified Modeling Language (UML)
4.2 Metamodel and adornment
4.3 XML serialization
5 Metamodel specification
6 MLIF compliance
7 Metamodel adornment
7.1 Introduction
7.2 General principles concerning the use of W3C generic attributes
7.3 Recommended adornment for GI
7.4 Recommended adornment for GroupC
7.5 Recommended adornment for MultiC
7.6 Recommended and mandatory adornment for MonoC
7.7 Recommended adornment for SegC
7.8 Recommended adornment for HistoC
7.9 Recommended online annotation adornment
7.10 Recommended adornment for localization
7.11 Recommended adornment for internationalization
7.12 Recommended adornment for temporal synchronization
8 Relation with other standards
Annex A (informative) Example using MLIF for Computer-Assisted Translation (CAT)
Annex B (informative) Example: representing TMX data
Annex C (informative) Example of XLIFF data representation
Annex D (informative) Example: representing smilText data
Annex E (informative) Example of MLIF usage for subtitles (captioning)
Annex F (informative) Using MLIF for MAF data
Annex G (normative) Detailed specification
Bibliography

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies
(ISO member bodies). The work of preparing International Standards is normally carried out through ISO
technical committees. Each member body interested in a subject for which a technical committee has been
established has the right to be represented on that committee. International organizations, governmental and
non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the
International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of technical committees is to prepare International Standards. Draft International Standards
adopted by the technical committees are circulated to the member bodies for voting. Publication as an
International Standard requires approval by at least 75 % of the member bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO shall not be held responsible for identifying any or all such patent rights.
ISO 24616 was prepared by Technical Committee ISO/TC 37, Terminology and other language and content
resources, Subcommittee SC 4, Language resource management.


Language resources management — Multilingual information framework
1 Scope
This International Standard provides a generic platform for modelling and managing multilingual information in
various domains: localization, translation, multimedia annotation, document management, digital library
support, and information or business modelling applications. MLIF (multilingual information framework)
provides a metamodel and a set of generic data categories [ISO 12620:2009] for various application domains.
MLIF also provides strategies for the interoperability and/or linking of models including, but not limited to,
XLIFF, TMX, smilText and ITS.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
ISO 12620:2009, Terminology and other language and content resources — Specification of data categories
and management of a Data Category Registry for language resources
ISO 8879, Information processing — Text and office systems — Standard Generalized Markup Language (SGML)
Extensible Markup Language (XML) 1.0 (Fifth Edition), T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler,
F. Yergeau, Editors, W3C Recommendation, 26 November 2008, http://www.w3.org/TR/xml
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply:
3.1
adornment
data category attached to a component of a metamodel
3.2
inline code
inline instructions inserted in a source document
Note to entry: Inline code can, for instance, provide presentational instructions (e.g. HTML codes).
3.3
subtitle
textual version of the dialogue in films, television programs, video games, etc., usually displayed at the bottom
of the screen
3.4
working language
language in which linguistic sequences are expressed
4 Specification principles
4.1 Key standard used in the specification: Unified Modeling Language (UML)
The MLIF specification complies with the modelling principles of UML as defined by the Object Management
Group (OMG) [UML]. The specification uses the UML subset that is relevant for the purposes of MLIF.
4.2 Metamodel and adornment
In line with the Terminological Markup Framework (TMF) defined in ISO 16642, MLIF defines a metamodel that
is adorned by data categories as defined in ISO 12620.
4.3 XML serialization
Associated with the metamodel and its adornment, MLIF proposes a representation in XML called “XML
serialization”, in line with the W3C Extensible Markup Language (XML), which is a subset of SGML as defined
in ISO 8879.
5 Metamodel specification
The MLIF metamodel is specified in the UML object diagram in Figure 1.

Figure 1 — MLIF metamodel
The MLIF metamodel is defined by the following seven "core components", listed here according to their
XML serialization:
 (Multilingual Data Collection), which represents a collection of data containing global information
and several multilingual units;
 (Global Information), which represents technical and administrative information applying to the
entire multilingual data collection;
 (Grouping Component), which represents a sub-collection of multilingual data that have a
common origin or purpose within a given project;
 (Multilingual Component), which groups together all variants of a given textual content;
 (Monolingual Component), which groups together information related to one language and is
part of a multilingual component (MultiC);
 (History Component), which traces modifications to the component to which it is anchored (i.e.
versioning);
 (Segmentation Component), which allows any level of segmentation for textual information,
possibly in a recursive manner.
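To sketch how the seven components nest, the Python dataclasses below mirror the descriptions above. Class and field names are hypothetical, and so are the two HistoC fields. Two points are modelling assumptions rather than the normative XML serialization: GroupC and MultiC may sit side by side in the collection, and HistoC may be anchored on GI, MultiC and MonoC (following 7.8). The mandatory working language on MonoC is enforced at construction time.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Union


@dataclass
class HistoC:
    """One traced modification (hypothetical fields)."""
    action: str  # e.g. "creation", "modification", "validation"
    date: str


@dataclass
class SegC:
    """Segmentation component; segments may nest recursively."""
    text: Optional[str] = None
    children: List["SegC"] = field(default_factory=list)


@dataclass
class MonoC:
    """All information for one working language of a multilingual unit."""
    lang: str  # serialized as xml:lang, which is mandatory on MonoC
    segs: List[SegC] = field(default_factory=list)
    history: List[HistoC] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.lang:
            raise ValueError("MonoC requires a working language (xml:lang)")


@dataclass
class MultiC:
    """Groups all language variants of a given textual content."""
    monos: List[MonoC] = field(default_factory=list)
    history: List[HistoC] = field(default_factory=list)


@dataclass
class GroupC:
    """Sub-collection of units sharing an origin or purpose."""
    units: List[MultiC] = field(default_factory=list)


@dataclass
class GI:
    """Technical and administrative information for the whole collection."""
    history: List[HistoC] = field(default_factory=list)


@dataclass
class MultilingualDataCollection:
    gi: GI
    content: List[Union[GroupC, MultiC]] = field(default_factory=list)


def all_units(coll: MultilingualDataCollection) -> Iterator[MultiC]:
    """Yield every MultiC, whether or not it sits inside a GroupC."""
    for item in coll.content:
        if isinstance(item, GroupC):
            yield from item.units
        else:
            yield item


coll = MultilingualDataCollection(
    gi=GI(history=[HistoC("creation", "20090922T140653Z")]),
    content=[
        GroupC([MultiC([MonoC("en", [SegC("The meal is nice.")]),
                        MonoC("fr", [SegC("Le repas est bon.")])])]),
        MultiC([MonoC("en", [SegC("Hello")])]),
    ],
)
print(sorted({m.lang for unit in all_units(coll) for m in unit.monos}))
```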
6 MLIF compliance
Any format compliant with this International Standard may use the MLIF metamodel in two possible ways:
 by fully implementing the MLIF metamodel starting at the level of ;
 by specifically embedding MLIF-compliant information within another model, by implementing one of the
lower level MLIF elements, namely , or .
7 Metamodel adornment
7.1 Introduction
The MLIF XML serialization proposes a set of XML elements and XML attributes, which are described in the
following sections, where the characters “<” and “>” delimit the name of the element. Following the TEI
guidelines (http://www.tei-c.org), some attributes are specified by means of a class attribute, with the
convention that the name of the class attribute is prefixed by “att.” (e.g. “att.xlink”). The other XML attributes
are listed with the convention that two quotes delimit the name of the attribute (e.g. “xml:lang”). The
specifications in Annex G shall be applied.
7.2 General principles concerning the use of W3C generic attributes
The following W3C attributes are to be used by all MLIF-compliant applications:
 the attribute xml:lang shall be used in accordance with W3C recommendations to represent the working
language of any relevant element and, in particular, shall be used systematically for any implementation
of MonoC;
 the attribute xml:id shall be used in accordance with W3C recommendations to provide a unique identifier
to an element of the MLIF metamodel.
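A minimal checker for these two rules might look like the following. It assumes, hypothetically, that MonoC is serialized as an element literally named `MonoC`; the `mono_tag` parameter lets a caller substitute the real element name. It only checks that `xml:lang` is present, not that its value is a well-formed language tag.

```python
import xml.etree.ElementTree as ET
from typing import List, Set

# ElementTree expands the reserved xml: prefix to this namespace.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def check_w3c_attributes(xml_text: str, mono_tag: str = "MonoC") -> List[str]:
    """Report MonoC elements without xml:lang and duplicated xml:id values."""
    root = ET.fromstring(xml_text.strip())
    problems: List[str] = []
    seen: Set[str] = set()
    for el in root.iter():
        if el.tag == mono_tag and not el.get(XML_NS + "lang"):
            problems.append(f"<{el.tag}> without xml:lang")
        ident = el.get(XML_NS + "id")
        if ident is not None:
            if ident in seen:  # xml:id values must be unique in a document
                problems.append(f"duplicate xml:id {ident!r}")
            seen.add(ident)
    return problems


DOC = """
<MultiC xml:id="u1">
  <MonoC xml:lang="en" xml:id="u1-en"><SegC>The meal is nice.</SegC></MonoC>
  <MonoC xml:id="u1-en"><SegC>Le repas est bon.</SegC></MonoC>
</MultiC>
"""
print(check_w3c_attributes(DOC))
```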
7.3 Recommended adornment for GI















7.4 Recommended adornment for GroupC

7.5 Recommended adornment for MultiC









7.6 Recommended and mandatory adornment for MonoC
 att.lang


 att.xlink
The language attribute is mandatory on MonoC. All other adornments are optional.
7.7 Recommended adornment for SegC







 att.linguistic
 att.xlink
7.8 Recommended adornment for HistoC
The HistoC component is a generic component that traces modifications made on the component to which it is
anchored (e.g. creation, modification and validation). In the MLIF metamodel, the HistoC component may be
anchored to the GI, MultiC or MonoC component. This makes it possible for all evolutions of, or
enhancements to, the component to be recorded.
HistoC may be adorned by four elements:




7.9 Recommended online annotation adornment
Multilingual text documents are often only one stage in a complex workflow that involves external document
sources in a wide variety of formats. When content is taken from these sources, it is often necessary to keep
the inline markup indicating presentational features that have to be retained in the translated target document.
To this end, MLIF-compliant applications should use the following elements, in relation to the element, that
map onto similar subsets in TMX and XLIFF:





7.10 Recommended adornment for localization
All the following elements should be used to provide localization-related information:


7.11 Recommended adornment for internationalization

7.12 Recommended adornment for temporal synchronization
The following elements should be used when textual content has to be conveyed (in written or spoken form)
together with some constraints:



8 Relation with other standards
As with the “Terminological Markup Framework” TMF [ISO 16642] in terminology, MLIF introduces a
metamodel that combines with selected data categories as a way of ensuring interoperability between several
multilingual applications and corpora. MLIF deals with multilingual corpora, multilingual fragments, and the
translation relations between them. In each domain where MLIF is applicable, a specific granularity may be
considered for segmentation and description. These last two processes may rely on MAF [ISO 24611], SynAF
[ISO 24615] and TMF for morphological description, syntactic annotation and terminological description
respectively.
MLIF supports the construction and interoperability of localization and translation memory resources,
and also deals with the description of a metamodel for multilingual content. MLIF does not propose a closed
list of description features. Rather, it provides a list of data categories that is much easier to update and
extend. This list represents a point of reference for multilingual information in the context of various application
scenarios.
Moreover, MLIF not only describes elementary linguistic segments (e.g. sentence, syntactic fragment, word
and part of speech), but may also be used to represent document structure (e.g. title, abstract, paragraph and
section). In addition, MLIF allows for external and internal links (annotations and references).
MLIF is designed to provide a common framework that facilitates interoperability with formats such as
TMX (LISA OSCAR) and XLIFF (OASIS). MLIF can be seen as a parent of these formats, since both of them
deal with multilingual data expressed in the form of segments or text units. Data in both formats can be stored,
manipulated and translated in a similar manner.
Examples of using MLIF are given in Annexes A to F.
