
Tiparilo: How to generate missing characters quickly (WIP)


I absolutely love Esperanto and love writing in and translating into the language. One issue one often encounters is that many fonts do not support the Esperanto characters - the famous ĉapelliteroj such as Ŝ, Ĉ or Ĥ. When I translated Mario Kart 8 Deluxe into Esperanto I had to create these letters manually, which wasn’t too hard, especially since the game used raster images for its letters. But often we are working with vector fonts - and while it is possible to add the missing letters by hand, this is very tedious, and some fonts are just very challenging to imitate because of their unique style!

But this problem isn’t limited to Esperanto. Most European languages use special letters: German has the famous umlauts such as Ü, Slovenian has Č, French has Ê and Portuguese has Ã. And let’s not even talk about languages written in completely different scripts, like Thai, Bulgarian or Mandarin. It’s not even limited to the translation community! Imagine you find a font online that you really like and would love to use in your projects, but alas! The font only includes English letters…

All of this leads to the question: Is it possible to automatically generate new letters in the style of the font, without having to modify anything manually? This is the question I sought to answer in my latest project: Tiparilo.

Introduction
#

The plan
#

The plan is to use a generative neural network trained on fonts which cover a great deal of the Unicode characters. The user could then load a .ttf or .otf file and select which missing characters to infer. The generative model would then be fitted to the new style, infer the vector shapes of the missing characters and add them to the .ttf or .otf file.

Raster vs Vector Graphics
#

First and foremost we have to differentiate between raster and vector graphics.

Raster images have a fixed resolution and each pixel has a fixed colour value - raster images appear pixelated when they are scaled up.

Vector images on the other hand have effectively unlimited resolution: they are stored as geometric descriptions, so they can be resized as much as wanted and will still appear crisp.

Comparison of vector and raster graphics

Old hardware often relied on bitmaps - a very simple form of raster image - for letters, but most modern computers use vector outlines, because each letter can then be scaled up without becoming pixelated. This matters because many generative networks work with raster images, while I want to generate new vector glyphs based on existing ones.

Most common typeface classifications
#

There are many typeface classifications, but the most basic ones are serifs (with small decorative strokes at the ends of letters), sans-serifs (without such strokes), monospaced fonts (in which every character occupies the same width, typically used in technical environments), scripts (handwriting-like fonts) and display or decorative typefaces (which have unique shapes but aren’t handwritten). These classifications are common across most scripts (including Cyrillic, Greek, Thai, Devanagari, Korean, Chinese and Japanese), though each script has its own unique features.

Basic type classifications

TTF vs OTF
#

Bitmap vs TrueType

The most common font formats are TTF (TrueType Font) and OTF (OpenType Font). TrueType is a font standard developed by Apple in the late 1980s, but it is now used across many operating systems. TrueType fonts store their characters as line segments and quadratic Bézier curves:

$$ p(t)= (1-t)^2p_0 + 2t(1-t)p_1 + t^2 p_2 $$

Image of a bézier curve
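To make the formula concrete, here is a throwaway sketch in Python that evaluates it directly, with control points given as (x, y) tuples:

def quad_bezier(p0, p1, p2, t):
    # p(t) = (1-t)^2 p0 + 2t(1-t) p1 + t^2 p2, evaluated per coordinate
    x = (1 - t)**2 * p0[0] + 2 * t * (1 - t) * p1[0] + t**2 * p2[0]
    y = (1 - t)**2 * p0[1] + 2 * t * (1 - t) * p1[1] + t**2 * p2[1]
    return (x, y)

# sampling t from 0 to 1 traces the curve from p0 to p2, bent towards p1
points = [quad_bezier((0, 0), (50, 100), (100, 0), t / 10) for t in range(11)]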

Complex glyphs are then just a combination of Bézier curves and straight line segments:

Image of a letter as a sum of vector components

TTFs store a lot more than outlines! They also contain each glyph’s position and size in a coordinate system, metadata, kerning and hinting information.

Kerning is pretty special: the distance between certain letter pairs is deliberately not the same - their relative distances are adjusted so that we perceive the spacing as equal.

Kerning demonstrated
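As a small illustration (a sketch, not part of Tiparilo): fontTools can read pair kerning from the legacy kern table. The file name below is a placeholder, and many modern fonts store kerning in the GPOS table instead, in which case no kern table is present.

from fontTools.ttLib import TTFont

font = TTFont('SomeFont.ttf')                      # placeholder font file
if 'kern' in font:
    pairs = font['kern'].kernTables[0].kernTable   # {(left, right): value}
    print(pairs.get(('A', 'V')))                   # negative values pull the pair closer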

Finally there is font hinting, which uses clever tricks to adapt letters to low-resolution displays by nudging their outlines onto the pixel grid. A related trick, subpixel rendering, sharpens edges further by lighting up the red and blue subpixels at the left and right edges of strokes - which is why magnified text sometimes shows red and blue fringes.

Hinting demonstrated

The OTF standard, on the other hand, was developed by Microsoft and Adobe as an extension of the TTF format. Among other things, fonts can now store their outlines as cubic Bézier curves:

$$ p(t)= (1-t)^3 p_0 + 3t(1-t)^2 p_1 + 3t^2(1-t) p_2 + t^3 p_3 $$

OTFs can also combine multiple letters into a single symbol (ligatures) and allow for alternate characters such as swashes, stylistic alternates and decorative forms.

UNICODE
#

Letters and symbols aren’t stored as letters and symbols in computer memory, but as strings of ones and zeros. Many encoding standards were used in the past, but today Unicode reigns supreme: it supports 168 scripts and 154 998 characters. UTF-8, the modern web standard, has a neat way of encoding symbols! The first symbols in the Unicode table (such as the basic Latin alphabet, the most common character set online and in the world) are represented with just one byte, while characters which appear later are represented with up to four bytes. The following table illustrates this quite well:

| Character | Unicode | UTF-8 encoding |
| --- | --- | --- |
| Letter y | U+0079 | 01111001 |
| Letter ä | U+00E4 | 11000011 10100100 |
| Treble clef 𝄞 | U+1D11E | 11110000 10011101 10000100 10011110 |
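Python can reproduce this table directly with the built-in str.encode - a tiny sketch:

# print each character's code point and its UTF-8 bytes as bit strings
for ch in ['y', 'ä', '𝄞']:
    bits = ' '.join(f'{b:08b}' for b in ch.encode('utf-8'))
    print(f'{ch}  U+{ord(ch):04X}  {bits}')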

But the most relevant part of this excursion into Unicode is the fact that Unicode is divided into 17 planes, each containing many blocks. The most relevant plane for us is the Basic Multilingual Plane (BMP), because it contains all modern scripts from Latin to Chinese - the other planes contain historic characters, emoji and so on.

The first block of the BMP contains the basic Latin characters; the second (Latin-1 Supplement) contains supplementary letters for French, Portuguese, Spanish and German. The following two blocks extend Latin even further - the Esperanto letter Ĥ lives in the Latin Extended-A block. Other blocks then cover Greek, Cyrillic, Arabic, Devanagari, Chinese, Japanese, Korean and much more. Finding fonts which cover as many BMP symbols as possible will be important, because these will be used to train the tool and teach it the most important patterns.
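fontTools makes such a coverage check straightforward. A minimal sketch (the file name is a placeholder):

from fontTools.ttLib import TTFont

def bmp_coverage(path):
    # the cmap maps Unicode code points to glyph names; keep only BMP entries
    font = TTFont(path)
    cmap = font.getBestCmap()
    font.close()
    return {cp for cp in cmap if cp <= 0xFFFF}

covered = bmp_coverage('SomeFont.ttf')
print(len(covered), 0x0124 in covered)   # 0x0124 is Ĥ in Latin Extended-A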

Generative Models
#

To build Tiparilo, a little research on generative models is needed. These are the big model families used in image generation.

GANs
#

Generative Adversarial Networks involve two separate networks, which compete against each other:

  • Generator (Model that is used to generate content)
  • Discriminator (Model that is used to classify examples as real or fake)

The generator is a deep neural network which turns random noise into images - it learns the patterns in a training set and then transforms the random noise into an image that follows those patterns.

The discriminator is a convolutional neural network whose purpose is to classify an image as real or fake. It is trained to detect fake samples ever more accurately.

Let \(J_G\) and \(J_D\) measure how well the generator and discriminator are performing, let \(x_i\) be a real data sample, \(G(z_i)\) an image generated from a noise input \(z_i\), and \(D(x)\) the probability the discriminator assigns to a sample being real. Then we get two equations:

$$ J_G = - \frac{1}{m}\sum_{i=1}^{m} \log{(D(G(z_i)))} $$ $$ J_D = - \frac{1}{m}\sum_{i=1}^{m} \log{(D(x_i))} - \frac{1}{m}\sum_{i=1}^{m} \log{(1-D(G(z_i)))}$$

These represent the loss functions of the generator and discriminator. During training the discriminator is fed a batch of real images and a batch of generated images. Based on the performance, the weights of both networks are adapted: each network updates its weights to minimise its own loss - equivalently, in the original minimax formulation the discriminator maximises the shared value function while the generator minimises it.

graph TD;
  A[random input] -->|Input| B[Generator];
  B -->|Generates| C[generated image];
  D[real data samples] -->|Input| E[Discriminator];
  C -->|Input| E;
  E -->|Evaluates| F[is it correct?];
  F -->|Yes| G[Update Discriminator Weights];
  F -->|No| H[Update Generator Weights];
  G --> E;
  H --> B;
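In code, a single training step could look roughly like this - a minimal sketch assuming PyTorch, where G, D, the two optimisers and real_batch are placeholders and D is assumed to end in a sigmoid:

import torch
import torch.nn.functional as F

def gan_step(G, D, opt_G, opt_D, real_batch, z_dim=100):
    m = real_batch.size(0)
    ones, zeros = torch.ones(m, 1), torch.zeros(m, 1)

    # discriminator step: minimise J_D, i.e. label reals 1 and fakes 0
    fake = G(torch.randn(m, z_dim)).detach()   # detach: freeze G while updating D
    loss_D = F.binary_cross_entropy(D(real_batch), ones) \
           + F.binary_cross_entropy(D(fake), zeros)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # generator step: minimise J_G, i.e. make D call fresh fakes real
    loss_G = F.binary_cross_entropy(D(G(torch.randn(m, z_dim))), ones)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()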

GANs produce high-quality, sharp images, but their training process is often unstable. There are many variants of GANs; the most relevant one in the context of this project is the conditional GAN, which uses an additional conditioning parameter to guide the generation process.

VAEs
#

Variational Autoencoders (VAEs) are a type of generative model that creates new data similar to the input it was trained on. They not only compress and reconstruct data like traditional autoencoders, but also learn a continuous probabilistic representation of the underlying features.

VAEs can be trained easily but the images they produce typically aren’t as sharp as those generated by GANs.

Comparison of images generated with VAEs and GANs

Diffusion models
#

Denoising Diffusion Probabilistic Models are also generative models; they take random noise as input and gradually shape it into an image.

The model learns by taking an image from the training dataset, adding a bit of noise at each step, and predicting the noise that was added.

At generation time one can then take a pure-noise image and let the model remove a bit of noise at each step until a sharp, realistic image emerges. There are likewise many variants of diffusion models, and they form the basis of DALL-E and Stable Diffusion - their drawbacks are that they are computationally very heavy, have long training times and struggle with glyphs.
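The forward (noising) half of that training loop fits in a few lines - a sketch assuming numpy, where alpha_bar is the cumulative value of the noise schedule at the chosen step:

import numpy as np

def noisy_sample(x0, alpha_bar):
    # mix the clean image x0 with Gaussian noise; the network is trained
    # to predict eps when given the noisy result xt
    eps = np.random.randn(*x0.shape)
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1 - alpha_bar) * eps
    return xt, eps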


Existing attempts and challenges
#

As far as I could tell from searching the internet, there is no openly accessible tool that automatically expands a font, so I may well be the first to work on this exact problem - I hope I can solve it well.

Måns Grebäck’s 🇸🇪 tool is the closest to what I’m aiming for. He used an AI to complete existing fonts and create entirely new ones as well, but as far as I can tell the tool is proprietary and works only on raster images - the vectorisation needs to be done by hand.

He used Stable Diffusion 1.5 and trained the model on the following categories: blackletter, boldscript, brushscript, handwriting, scriptwriting, serif, and tallsans, where each category contained 10 fonts that do not overlap stylistically with one another.

There are also two very interesting papers I would like to address. First up is GlyphGAN: Style-Consistent Font Generation Based on Generative Adversarial Networks by Hideaki Hayashi 🇯🇵 et al. The paper focuses on automating the creation of new fonts; one of its chief motivations is to quickly create hiragana, katakana and kanji glyphs. GlyphGAN takes a double input vector - a style vector and a character-class vector, which determines which character is generated. It is a deep convolutional conditional GAN that uses ReLUs for all layers except the output layers, which use sigmoids. The GAN was trained on raster images from 6561 fonts. The results were pretty impressive and the glyphs all retained the same style.

Finally there is JointFontGAN: Joint Geometry-Content GAN for Font Generation via Few-Shot Learning by Yankun Xi 🇺🇸 et al., a very interesting paper because it presents a model that can derive fonts from very few available samples. JointFontGAN uses an extended conditional GAN - effectively a model made of two separate GANs, one focusing on the general shapes and the other on the details of the glyphs. I really like its few-shot learning feature - it can learn the style of a font from a small set of glyphs and then infer the shapes of the others - however only as raster images. Two datasets were used to train the model, Capitals64 and SandunLK64, encompassing a whopping 20 000 fonts with all the basic Latin letters and punctuation. The results look very impressive!

There are also some unique challenges to address. One of them is that some letters don’t have consistent conventions. The best example is the letter Ĥ, specifically the lower-case ĥ, where it is not so obvious where to place the circumflex/ĉapelo:

Image of various ĥ-conventions

Thus I’m very curious how any generative tool can be made to create the famous Esperanto letter ĥ. For the curious I can really recommend this video:

Finally there’s the question of vectorisation. This project will seek to create a GAN that can complete missing glyphs in vector form from existing vector glyphs, but this could turn out to be too ambitious, in which case I will attempt to first generate raster images and then add a vectorisation step.

graph TD;
  A[Incomplete vector font] --> B[Tiparilo];
  B --> C[Completed vector font];

And if this Ansatz fails, I’d attempt:

graph TD;
  A[Incomplete vector font] --> B[Rasterised font];
  B --> C[Tiparilo];
  C --> D[Vectorisation];
  D --> E[Completed vector font];

Implementation
#

Fonttools
#

One of the first things to do is to see whether we can access and extract letters from font files. We need to install fontTools with pip (and also svgwrite, to write the output to SVGs). I’m going to use the Game Boy Boot font by Akihiro, which I also use for my website.

from fontTools.ttLib import TTFont
from fontTools.pens.svgPathPen import SVGPathPen
import svgwrite

path = 'Gbboot.ttf'
letter_a = 'a'
svg_path = 'a.svg'

def a_to_svg(path, letter, output_path):
    font = TTFont(path)
    glyphSet = font.getGlyphSet()   # glyphs indexed by glyph name
    pen = SVGPathPen(glyphSet)      # pen that records SVG path commands
    glyph = glyphSet[letter]
    glyph.draw(pen)                 # trace the glyph outline into the pen

    # write the recorded outline into an SVG file
    dwg = svgwrite.Drawing(output_path, size=('500px', '500px'))
    path_data = pen.getCommands()
    dwg.add(dwg.path(d=path_data, fill='black'))
    dwg.save()

    font.close()

a_to_svg(path, letter_a, svg_path)

Executing this code gives us… 9?

Image of the letter a

What if we want to extract the capital letter A?

Image of the letter A

Aha, the letters are upside down! This is because SVG’s y-axis points down while TrueType’s points up - but the choice of coordinate system is not important for this undertaking. What matters is that we can render the letters and check whether the tool being built works as it should!
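Still, for right-side-up output, a minimal fix is to flip the path with an SVG transform - a sketch that would replace the dwg.add(...) line inside a_to_svg above, reading the design-grid height (unitsPerEm) from the font’s head table:

upm = font['head'].unitsPerEm   # height of the font's design grid
dwg.add(dwg.path(d=path_data, fill='black',
                 transform=f'scale(1,-1) translate(0,-{upm})'))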

Setting up the cGAN
#

Sources
#
