XeLaTeX: Unicode font fallback for unsupported characters

Note: This post is 7 years old. Some information may no longer be correct or even relevant. Please, keep this in mind while reading.

I used to typeset documents with plain LaTeX only, and it works perfectly as long as you stick to a single script (e.g. only English or German). But as soon as you want to typeset Unicode text in multiple languages, you quickly run into trouble. Classic LaTeX was not designed for Unicode, and you need a lot of helper packages, documentation reading, and complicated configuration in your document to get it all right.

All I wanted was to typeset the following Unicode text. It contains regular Latin characters, Chinese characters, modern Greek, and polytonic (ancient) Greek.

Latin text. Chinese text: 紫薇北斗星  Modern greek: Διαμ πριμα εσθ ατ, κυο πχιλωσοπηια Ancient greek: Μῆνιν ἄειδε, θεά, Πηληϊάδεω Ἀχιλῆος. And regular latin text.

I thought it was a simple task. I thought: let’s just use XeLaTeX, which has out-of-the-box Unicode support. In the end, it was a simple task, but only after struggling to solve a particular problem. To show you the problem, I ran the following straightforward code through XeLaTeX…

\documentclass[]{book}

\usepackage{fontspec}

\begin{document}
Latin text. Chinese text: 紫薇北斗星 Modern greek: Διαμ πριμα εσθ ατ, κυο πχιλωσοπηια
Ancient greek: Μῆνιν ἄειδε, θεά, Πηληϊάδεω Ἀχιλῆος. And regular latin text.
\end{document}

… and the following PDF was produced:

XeLaTeX rendering Computer Modern font with unsupported unicode characters

It turns out that the missing Unicode characters are not XeLaTeX's fault. The problem is that the font in use does not implement all Unicode characters (with fontspec, XeLaTeX defaults to Latin Modern, a somewhat more encompassing descendant of Computer Modern). Implementing all of Unicode in a single font is a monumental task (the code space has about 1.1 million code points, although far fewer are actually assigned), and only a small handful of fonts even aim for very broad coverage. One of them is GNU FreeFont, which is already part of the Debian distribution and therefore available to XeLaTeX.
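To find out exactly which characters the current font is missing, TeX's \tracinglostchars parameter can help. The following is only a minimal sketch; the exact wording of the messages, and whether they also appear on the terminal, may depend on the version of your TeX distribution.

```latex
\documentclass{book}
\usepackage{fontspec}

% Report characters that are missing from the current font.
% A value of 1 writes "Missing character" lines to the .log file;
% in current engines a value of 2 should also show them on the terminal.
\tracinglostchars=2

\begin{document}
Latin text. Chinese text: 紫薇北斗星
\end{document}
```

After the run, search the .log file for "Missing character" to get one line per glyph that could not be typeset.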

So, I thought, let's just use a font with broad Unicode coverage. I selected the pretty Junicode font in my document:

\setmainfont{Junicode}

The result was:

XeLaTeX and Junicode font with Chinese and Greek characters

Now Greek worked, but there were still no Chinese characters. It turned out that even fonts with broad Unicode coverage do not implement every possible character, because it is a lot of work to produce high-quality glyphs with matching styles for so many characters.
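Whether a given font implements a particular character can also be tested from within the document, using the e-TeX primitive \iffontchar. This is a small sketch, not part of my original setup; the macro name \HasGlyph is a hypothetical helper introduced only for this illustration.

```latex
\documentclass{book}
\usepackage{fontspec}
\setmainfont{Junicode}

% \HasGlyph{<character>} is a hypothetical helper: it prints "yes" if the
% current font (\font) contains a glyph for the character, and "no" otherwise.
\newcommand\HasGlyph[1]{\iffontchar\font`#1 yes\else no\fi}

\begin{document}
Greek alpha: \HasGlyph{α}. Chinese character: \HasGlyph{紫}.
\end{document}
```

The answers apply to whichever font is active at the point where \HasGlyph is called, so the same test can be repeated after switching to a fallback font.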

So, how do regular web browsers or office applications do it? They use a mechanism called font fallback: when a particular character is not implemented in the chosen main font, another font which does implement it is silently used instead. XeLaTeX can do the same with a package called ucharclasses, which gives you full control over the fallback font selection. The ucharclasses documentation gives an example using the \fontspec font selection command. I decided to use the font IPAexMincho, which supports Chinese characters. So I added to my document:

\usepackage[CJK]{ucharclasses}
\setTransitionsForCJK{\fontspec{IPAexMincho}}{\fontspec{Junicode}}

… but when running XeLaTeX with this addition, ucharclasses somehow entered an endless loop with high CPU usage under the TeX Live 2014 distribution (part of Debian). It hung at the line:

(./ucharclass.aux) (/usr/share/texmf/tex/latex/tipa/t3cmr.fd)

Endless googling didn't bring up any useful hints. Something must have changed in the internals, and the ucharclasses documentation needs updating. In any event, it took me 4 hours to find a fix. The solution was to use a font selection mechanism other than \fontspec{}, because it no longer seems to be compatible with ucharclasses. Instead, I used fontspec's suggested \newfontfamily mechanism. Here is the final working code:

\documentclass[]{book}

\usepackage{fontspec}
\setmainfont{Junicode}
\newfontfamily\myregularfont{Junicode}
\newfontfamily\mychinesefont{IPAexMincho}

\usepackage[CJK]{ucharclasses}
\setTransitionsForCJK{\mychinesefont}{\myregularfont}

\begin{document}
Latin text. Chinese text: 紫薇北斗星  Modern greek: Διαμ πριμα εσθ ατ, κυο πχιλωσοπηια Ancient greek: Μῆνιν ἄειδε, θεά, Πηληϊάδεω Ἀχιλῆος. And regular latin text.
\end{document}

Here is the result: mixed Latin, Chinese, and Greek scripts set in two different fonts, Junicode and IPAexMincho:

XeLaTeX with unicode font fallbacks

Pretty!
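The same mechanism should extend to further scripts. The following sketch assumes that ucharclasses accepts the group name Greek as a package option and provides \setTransitionsForGreek, in the same way it does for CJK (please check the group names in the ucharclasses documentation). It also assumes that FreeSerif from GNU FreeFont is installed; I did not need this myself, because Junicode already covers Greek.

```latex
\documentclass[]{book}

\usepackage{fontspec}
\setmainfont{Junicode}
\newfontfamily\myregularfont{Junicode}
\newfontfamily\mychinesefont{IPAexMincho}
% Assumption: FreeSerif (GNU FreeFont) is installed and preferred for Greek.
\newfontfamily\mygreekfont{FreeSerif}

% Assumption: "Greek" is a valid group name, analogous to "CJK".
\usepackage[CJK,Greek]{ucharclasses}
% First argument: executed when entering the script;
% second argument: executed when leaving it again.
\setTransitionsForCJK{\mychinesefont}{\myregularfont}
\setTransitionsForGreek{\mygreekfont}{\myregularfont}

\begin{document}
Latin text. Chinese text: 紫薇北斗星 Modern greek: Διαμ πριμα εσθ ατ.
And regular latin text.
\end{document}
```

Each fallback font is declared once with \newfontfamily and then referenced by its command name in the transitions, which is the pattern that avoided the endless loop above.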


How to set up audio streaming (internet radio) in Linux

Note: This post is 8 years old. Some information may no longer be correct or even relevant. Please, keep this in mind while reading.

This tutorial will show you how you can go live with your own internet radio station in a few minutes.

Demystifying “streams”

There is a lot of information, disinformation and irrelevant information about this on the internet. When you listen to internet radio and inspect the network requests in your Google Chrome Developer Tools (yes, you should use Chrome anyway), you will discover that a ‘magickal’ stream is nothing more than a blatantly simple HTTP download of a regular file which never finishes. Yup, jaw-droppingly simple.

What do you need?

In order to broadcast audio (e.g. internet radio) over the internet, you need

  1. a remote streaming server with high bandwidth to which many clients can connect
  2. a local stream generator, which sends a single stream to the streaming server

The following tutorial shows how you can easily achieve this with free and open source tools which are part of the Debian (Ubuntu) distributions. It will take you 15 minutes to start your first rudimentary broadcast.

We will use Icecast2 as a streaming server, simply for the reason that it is part of the Debian distribution and that I got it to work immediately. As the local stream generator we will use darkice, for the same reasons.

Why not Windows? Well, the streaming server will usually run on a remote machine, and since the majority of remote servers run Linux distributions, you can use Icecast2 regardless of your desktop system. If you want to use a different stream generator on Windows, you can do so. This screencast shows you how it’s done.

Icecast2

Is Icecast a professional-grade solution? According to a blog,

Very much so. ICEcast is an industry standard platform used by thousands and thousands of radio stations all over the world. Its wide compatibility means people can listen with most players and operating systems.

Listeners will be able to connect to your MP3 stream from all over the world, with all the popular media players including Windows Media Player, iTunes, Winamp, Realplayer, XMMS, and many more media players besides.

Although incredibly simple, it can cope with even the heaviest demands and will not break under pressure. Its simplicity works to the broadcaster and listeners favor.

According to Wikipedia,

Version 2 [of Icecast] was started in 2001, a ground-up rewrite aimed at multi-format support (initially targeting Ogg Vorbis) and scalability.

A ground-up rewrite for scalability certainly sounds like good news! So, let’s dive in!

You would do the following steps on a server which is located at a large internet node with enough bandwidth to serve all your audience. To install, simply type

apt-get install icecast2

During the installation you will be asked if you want to configure Icecast2. Answer yes. You will be asked for the hostname; simply leave the default “localhost”. Next, you will be asked for source, relay and administration passwords. For testing, leave “hackme”. If you want to change the configuration at a later point, edit the configuration file /etc/icecast2/icecast.xml.

Next, you have to enable the Icecast2 server by setting ENABLE to true in the configuration file /etc/default/icecast2.

Now, start the server by typing

service icecast2 start

You can now access the web admin interface on port 8000 of your machine:

Icecast2 web-based admin interface

The log files are /var/log/icecast2/error.log and /var/log/icecast2/access.log. It is best to tail -f both files to observe what is going on.

Darkice

Darkice is a stream generator. It encodes audio into various formats (e.g. ogg, mp3, etc.) from various inputs (e.g. microphone jack, line-in jack, or the stereo mix of your operating system) and sends a single stream to our Icecast2 server, which in turn re-broadcasts it to all connected clients.

To install, simply type:

apt-get install darkice

By default it does not install a configuration file, but there is an example in the documentation. Copy it to the /etc directory:

cp /usr/share/doc/darkice/examples/darkice.cfg /etc

You will need to edit this file according to your needs. Here is an example that worked for me:

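# this section describes general aspects of the live streaming session

[general]
# duration of encoding in seconds, 0 means forever
duration = 0
# size of the internal buffer, in seconds
bufferSecs = 5
# reconnect to the server if the connection is lost
reconnect = yes

# this section describes the audio input that will be streamed

[input]
# "default" is the default ALSA device
device = default
sampleRate = 44100
bitsPerSample = 16
# 1 = mono, 2 = stereo
channel = 2

# this section describes a streaming connection to an Icecast2 server

[icecast2-0]
# average bit rate of 96 kbit/s, encoded as Ogg Vorbis
bitrateMode = abr
format = vorbis
bitrate = 96
# address, port and source password of the Icecast2 server
server = 192.168.0.250
port = 8000
password = hackme
# mount point of this stream on the server (used in the stream URL)
mountPoint = example1
name = DarkIce trial
description = This is only a trial
url = http://www.yourserver.com
genre = my own
public = yes
# also save the encoded stream to a local file
localDumpFile = dump.ogg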

Make sure that the password and the IP address match those of the Icecast2 server (which we installed earlier on the other machine). Also, remember the mountPoint of this stream. This is simply a label; in my case it is “example1”. Then simply run darkice as a normal user:

darkice

It is a console-only application and you will see some messages. This is what I get:

DarkIce 1.0 live audio streamer, http://code.google.com/p/darkice/
Copyright (c) 2000-2007, Tyrell Hungary, http://tyrell.hu/
Copyright (c) 2008-2010, Akos Maroy and Rafael Diniz
This is free software, and you are welcome to redistribute it 
under the terms of The GNU General Public License version 3 or
any later version.

Using config file: /etc/darkice.cfg
Using ALSA DSP input device: default
Could not set POSIX real-time scheduling, this may cause recording skips.
Try to run darkice as the super-user.

The note about real-time scheduling is just a warning; it works for me nevertheless. If you do get recording skips, it would be easy to run darkice as superuser.

Making a simple stream player

We will simply make a small web page with one <audio> element. That is enough to play streams. Create a file called streamtest.html with the following contents:

<html>
  <body>
    <audio controls>
      <source src="http://192.168.0.250:8000/example1" />
    </audio>
  </body>
</html>

Make sure that the IP address is that of the machine on which the Icecast2 server is running. Open this HTML file in a browser and click the play button. Now you should hear the same audio that the darkice client has as its input.

Changing the audio input for darkice

In case you don’t have the PulseAudio Volume Control installed, install it with

apt-get install pavucontrol

Then run it. As soon as you have darkice running, the “Recording” tab will show the text “ALSA plug-in [darkice]: ALSA Capture from …”. From the drop-down you can select the input source. The labels are a bit misleading. In my case “Monitor” means the stereo mix of the entire computer (e.g. all system sounds and all audio being played back). “Built-in Analog Stereo” means the microphone / line-in jack.

Pulse Audio volume control pavucontrol

For professional radio applications you of course would not use such a simple software mixer, but have an external hardware-based mixer to which all the microphones and the line-out of your computer are attached. Then you would connect the final output of the hardware mixer to your computer line-in and select “Built-in Analog Stereo” for darkice’s input.

Linux also has a more professional audio system called JACK, which can replace the standard PulseAudio sound server (we were using PulseAudio in the above tutorial, which is similar to what Windows uses). Both run on top of ALSA, the Linux kernel’s sound system.

Conclusion

Despite the tons of documentation and blog posts on the internet, it is surprisingly easy to set up your own simple internet radio station with zero investment, all thanks to the Open Source movement.

Citations within footnotes in LaTeX

Note: This post is 8 years old. Some information may no longer be correct or even relevant. Please, keep this in mind while reading.

Writing a tutorial on programming, I needed citations within footnotes. Luckily, the biblatex package added support for citations within footnotes in 2011 (see the first comment on the SourceForge page of biblatex). Apparently, this is not straightforward, since a low-level citation command has to be used to satisfy LaTeX. Anyway, this is now done automatically by the biblatex package, so I didn’t have to make any changes.

In my document, I only use the LaTeX command \autocite, whose behavior I can define in the document preamble. In my preamble I have:

\usepackage[backend=biber,autocite=footnote,style=authortitle-ibid]{biblatex}
\addbibresource{bibliography.en.bib}
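For comparison, here is a sketch of how the autocite option could be varied. According to the biblatex manual, each value maps \autocite to a different citation command; I have only used the footnote variant myself, so treat the commented-out lines as untested alternatives.

```latex
% Pick ONE of the following lines; they differ only in the autocite value.

% \autocite behaves like \footcite (the setting used in this post):
\usepackage[backend=biber,autocite=footnote,style=authortitle-ibid]{biblatex}

% \autocite behaves like \parencite (citation in parentheses in the text):
%\usepackage[backend=biber,autocite=inline,style=authortitle-ibid]{biblatex}

% \autocite behaves like \cite (plain citation in the text):
%\usepackage[backend=biber,autocite=plain,style=authortitle-ibid]{biblatex}
```

Because the document body only ever uses \autocite, switching between these variants requires no change outside the preamble.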

Now, with the following TeX code…

\chapter{My chapter}

This is a test\footnote{TeX is awesome \autocite[see][p. 120]{Knuth}. I agree.} for
a citation within a footnote. However, this is a citation\autocite[]{Ritchie75} in
normal text.

… you get the following output:

LaTeX citation within footnote on HTML and Kindle output

You will notice that footnote 2 comes from a citation within the normal text. Since I’ve specified autocite=footnote in the preamble, it is rendered as a footnote. However, the citation within footnote 1 is rendered in-line.
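To try this out without my bibliography file, the pieces above can be combined into one self-contained test document. This is only a sketch: the two bibliography entries contain placeholder data (they are not the real entries from my bibliography), and \jobname.bib is just a convenient file name for the generated file.

```latex
% Write a small bibliography file next to the document.
% The entry data below is placeholder text for illustration only.
\begin{filecontents}{\jobname.bib}
@book{Knuth,
  author = {Placeholder Author One},
  title  = {Placeholder Title One},
  year   = {2000},
}
@book{Ritchie75,
  author = {Placeholder Author Two},
  title  = {Placeholder Title Two},
  year   = {2000},
}
\end{filecontents}

\documentclass{book}

\usepackage[backend=biber,autocite=footnote,style=authortitle-ibid]{biblatex}
\addbibresource{\jobname.bib}

\begin{document}

\chapter{My chapter}

% \autocite inside a footnote: should be rendered in-line within footnote 1.
This is a test\footnote{TeX is awesome \autocite[see][p. 120]{Knuth}. I agree.} for
% \autocite in normal text: should be rendered as footnote 2.
a citation within a footnote. However, this is a citation\autocite[]{Ritchie75} in
normal text.

\printbibliography

\end{document}
```

Compile with xelatex (or pdflatex), then run biber on the same base name, then run the engine once more so that the citations are resolved.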