XeTeX

I was looking into how to typeset a Greek document in LaTeX using UTF-8, and I must admit it proved more complicated than expected.

The LaTeX Way

I started with my usual approach, i.e. loading inputenc with the utf8 option:

\documentclass[a4paper,10pt]{article}

\usepackage[british,greek]{babel}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}

\begin{document}
\selectlanguage{greek}
Κατάγομαι από την Ιρλανδία.
\end{document}

But I got an error message:

! Package inputenc Error: Unicode char \u8:Κ not set up for use with LaTeX.

See the inputenc package documentation for explanation.
Type  H   for immediate help.
 ...                                              

l.11 Κ
       ατάγομαι από την Ιρλανδία.
? 

According to the inputenc documentation, Greek characters are not set up by default: they have to be defined by the user.

Unfortunately the number of Unicode characters that in theory could be contained in a document is enormous. Thus even with today’s amount of computer memory it would be unrealistic to predefine all of them.

Characters can be defined using \DeclareUnicodeCharacter, which takes two arguments: the Unicode code point, written in hexadecimal, and the LaTeX code that should be typeset for that character.
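
As a minimal sketch of such a declaration, here is a document that maps one extra character by hand. The choice of U+2260 (the not-equal sign) is my own, picked only because it is easy to check; it is not taken from the inputenc documentation.

```latex
% Compile with pdflatex; the file must be saved as UTF-8.
\documentclass[a4paper,10pt]{article}

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}

% First argument: the code point in hexadecimal (U+2260, NOT EQUAL TO).
% Second argument: the LaTeX code typeset whenever that character is read.
\DeclareUnicodeCharacter{2260}{\ensuremath{\neq}}

\begin{document}
Greek ≠ Latin.
\end{document}
```

Covering the Greek alphabet this way would need one declaration per letter, accented forms included, so it is not something you would want to write by hand.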

I was then pointed to another (older?) package called ucs, which brings in a new inputenc option called utf8x:

\documentclass[a4paper,10pt]{article}

\usepackage[british,greek]{babel}
\usepackage{ucs}
\usepackage[utf8x]{inputenc}
\usepackage[T1]{fontenc}

\begin{document}
\selectlanguage{greek}
Κατάγομαι από την Ιρλανδία.
\end{document}

And this proved to be a hit.

However, given the author’s message (“Due to time restrictions, I am not able to maintain this package anymore”), I was not quite happy that I had found the “right” solution.

So I decided to look back at Ω, an extension of TeX using Unicode (and as I’m currently reading Yannis Haralambous’ book Fonts and Encodings, everything was converging back to it!).

Enter XeTeX

While I was reading about Ω, references to XeTeX kept popping up here and there, mentioning that it was a “recent Unicode capable TeX extension”, so it was definitely worth a look. And I wasn’t disappointed: XeTeX seems to be the next logical step after LaTeX. In particular, it supports:

  • Unicode,
  • font technologies such as AAT and OpenType (which makes selecting the font you want much easier),
  • PDF: it produces PDF out of the box. To produce xdv (XeTeX's extended DVI format) instead, use the -no-pdf option.

These characteristics really make Xe(La)TeX a “modern” LaTeX. Let’s have a look at what our example now looks like:

\documentclass[a4paper,10pt]{article}

\usepackage{xltxtra}

\setmainfont[Mapping=tex-text]{DejaVu Sans}

\begin{document}
Κατάγομαι από την Ιρλανδία.
\end{document}

You then compile the document with:

xelatex greek-sample.tex
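
Selecting fonts by name is not limited to the main font. The following is a small sketch, assuming the DejaVu family is installed on your system and loading fontspec directly; the choice of fonts is only an illustration, not something the packages require.

```latex
% Compile with xelatex; assumes the DejaVu font family is installed.
\documentclass[a4paper,10pt]{article}

\usepackage{fontspec}

% Each family is picked by its system font name.
\setmainfont[Mapping=tex-text]{DejaVu Serif}
\setsansfont[Mapping=tex-text]{DejaVu Sans}
\setmonofont{DejaVu Sans Mono}

\begin{document}
Κατάγομαι από την Ιρλανδία.

\textsf{Κατάγομαι από την Ιρλανδία.}

\texttt{Κατάγομαι από την Ιρλανδία.}
\end{document}
```

The usual \textsf and \texttt commands then switch between the three families.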

You may have noticed that I had to remove the babel lines from the XeLaTeX version. With babel loaded, I was getting the following warning on the first pass:

LaTeX Font Warning: Font shape `LGR/DejaVuSans(0)/m/n' undefined
(Font)              using `LGR/cmr/m/n' instead on input line 2.

and at the second pass:

! Corrupted NFSS tables.
\wrong@fontshape ...message {Corrupted NFSS tables}
                                                  \error@fontshape \else \let \f...
l.6 \select@language{greek}

According to the fontspec documentation:

The babel package is not really supported! Especially Vietnamese, Greek, and Hebrew at least might not work correctly, as far as I can tell.

No need to panic: there is a replacement package called polyglossia, which “aims to remain as compatible as possible with the fundamental features of Babel while being cleaner, light-weight, and modern.”

Our document now becomes:

\documentclass[a4paper,10pt]{article}

\usepackage{polyglossia}
\usepackage{xltxtra}
\setdefaultlanguage{greek}
\setmainfont[Mapping=tex-text]{DejaVu Sans}

\begin{document}
Κατάγομαι από την Ιρλανδία.
\end{document}

And here is the result:

Having struggled with NFSS in the past, I find that this really makes a user’s life so much easier.
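
The first babel attempt loaded both british and greek. A sketch of what the polyglossia equivalent might look like is given below, assuming again that DejaVu Sans is installed; the English sentence is my own translation, added only to show the language switch.

```latex
% Compile with xelatex; assumes the DejaVu Sans font is installed.
\documentclass[a4paper,10pt]{article}

\usepackage{polyglossia}
\usepackage{xltxtra}
\setdefaultlanguage{greek}
% Secondary language, with the British variant of English.
\setotherlanguage[variant=british]{english}
\setmainfont[Mapping=tex-text]{DejaVu Sans}

\begin{document}
Κατάγομαι από την Ιρλανδία.

% \textenglish switches language for a short passage only.
\textenglish{I come from Ireland.}
\end{document}
```

For longer passages, polyglossia also provides an environment form, \begin{english} ... \end{english}.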

 