A crowdsourced visual morphology for letters

Olivier Morin 

Pierre Déléage

The inventors of visual symbols managed to create shapes that are diverse and distinguishable, usually by combining a relatively small set of basic building blocks. In Latin minuscules, for instance, the arch in “n” is repeated in “h”, doubled in “m”, and so on. Written symbols are thus combinatorial symbols. Some scripts avail themselves of this combinatorial principle more than others. Some, like the Canadian Aboriginal Syllabics (where a typical phrase can read: “ᐁᐘᑺᒪ ᓃᔩᓇ”), use a small set of basic shapes, extensively recombined. In other scripts (e.g. the Vai script of Liberia, where a typical phrase can read: “ꕉꕐꕮ ꔔꗌ  ꖸ ꔰ E”), the building blocks are much more numerous, many of them occurring in only one or two letters. We want to find out to what extent scripts use this combinatorial principle, and whether they do so efficiently, creating letter shapes that are numerous and still easily distinguishable. For this we need a general morphology for letter shapes, similar to what phonologists have developed for the sounds of language. Yet no suitable dataset exists: descriptive typologies of letter shapes are limited to one or a few scripts, and seldom allow for comparison across vast cultural distances. We aim to fill this gap using a crowdsourcing browser app. It will invite large numbers of users to sort the letters of several writing systems unfamiliar to them, according to intuitive rules. A system of points will encourage participants to produce as many distinct sorting rules as they can. For each script, we will draw up an inventory of the rules most commonly proposed. Such a set of rules represents a classification of letters that is specific to a given script, but it still allows us to answer a range of comparative questions. First, it will allow us to quantify the regularity of various scripts: in some scripts, the variety of letter shapes will be captured by a smaller number of rules, while the letters in other scripts will defy a simple classification.
We will track this difference quantitatively and build models to predict it. Second, we seek to know whether letter shapes are as distinctive as they can be. In any script, the basic constituents of letters and their assembly rules define a space of possible shapes. It could be that all the far corners of this space are occupied (optimal exploration) or, on the contrary, that all letters are clustered in the same area (sub-optimal exploration). The sounds of spoken language are known to follow the first pattern; we will see whether this holds true for letter shapes.
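
To make the two comparative questions concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the project's actual method: each sorting rule is modelled as the set of letters it selects, the rule names and the four-letter toy inventory are hypothetical, regularity is approximated by a greedy count of the rules needed to tell all letters apart (an upper bound on the true minimum), and the spread of letters in shape space is approximated by their mean normalised Hamming distance.

```python
from itertools import combinations

def signatures(letters, rules):
    """Map each letter to a tuple of 0/1 flags, one flag per rule."""
    return {l: tuple(int(l in members) for members in rules.values())
            for l in letters}

def rules_needed(letters, rules):
    """Greedy count of rules needed to tell every pair of letters apart.

    Returns None if the rules cannot separate all letters. The greedy
    choice gives an upper bound, not necessarily the true minimum.
    """
    unresolved = set(combinations(sorted(letters), 2))
    chosen = []
    while unresolved:
        best, best_split = None, set()
        for name, members in rules.items():
            if name in chosen:
                continue
            # Pairs that this rule places on opposite sides.
            split = {(a, b) for a, b in unresolved
                     if (a in members) != (b in members)}
            if len(split) > len(best_split):
                best, best_split = name, split
        if best is None:  # no remaining rule separates anything
            return None
        chosen.append(best)
        unresolved -= best_split
    return len(chosen)

def mean_hamming(letters, rules):
    """Mean pairwise Hamming distance between letters, scaled to [0, 1]."""
    sigs = signatures(letters, rules)
    pairs = list(combinations(sorted(letters), 2))
    if not pairs or not rules:
        return 0.0
    total = sum(sum(x != y for x, y in zip(sigs[a], sigs[b]))
                for a, b in pairs)
    return total / (len(pairs) * len(rules))

# Hypothetical toy inventory and rules (assumptions, not project data).
letters = {"n", "h", "m", "u"}
rules = {
    "has_arch": {"n", "h", "m"},
    "has_ascender": {"h"},
    "double_arch": {"m"},
}

print(rules_needed(letters, rules))
print(mean_hamming(letters, rules))
```

On this reading, a script whose letters need few rules would count as more regular, and a higher mean distance would indicate letters spread more widely across the space of possible shapes. Both measures are stand-ins chosen for simplicity.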
