Preserving the treasure of rarely-used words

Rarely-used words are a treasure; we should use them rarely. Human languages are rich, and there are many possible sentences someone can use to express a concept. Try to make good use of this treasure when drawing from it, because each time you use a word you take away a bit of the word's singularity.

Our ancestors devised a number of tools to help us arbitrate which construction is best for expressing a particular concept. One such tool is information theory, which is at the root of most, if not all, recent communication breakthroughs.
This is not an essay on communication theory, so I will just use some fascinating findings to hand-wave a number of properties of human languages. On one hand, frequent words are generally short to save resources when transmitting sentences (i.e., they necessitate fewer characters, fewer strokes of the pen, than their less-frequent counterparts). On the other hand, prevalent words carry less information (i.e., you can remove a `the` and still understand this sentence, but the sentence becomes misleading if you remove `prevalent`). The only thing preventing us from writing unnecessary verbiage is the amount of energy and time one is willing to spend writing and speaking, or listening and reading, a document. We do not have infinite time or energy. Hence, what matters at the end of the road is how precisely one can express a concept within an acceptable word budget. A word budget can be well-spent, in which case redundancies are rare. But a word budget can also be misspent -- because communicating is difficult.

Speaking and writing are difficult. Most often, short sentences are better than long sentences, simple words are better than elevated language, and a direct style is more engaging than convoluted constructions. Unfortunately, it requires effort to come up with short, simple, and direct sentences. In particular, face-to-face and phone-based discussions happen synchronously; that is, your audience is busy waiting for your opinion. It is quite impolite, or uncomfortable in some cultures, to make your audience wait for an extended duration before replying. Regrettably, we humans are poorly trained to build flawless sentences on the fly. We are also quite tolerant of clunky sentences, because we want to be nice. As a result, we do not help each other much. This status quo is self-reinforcing: we get used to bad communication, which lowers the effort required to communicate, and we do not train to communicate above this level. More concerning, once a rarely-used word starts being used at an increasing frequency, there is a risk that we adopt an imprecise or misleading meaning for it.

This page is my attempt to preserve the rarity of some infrequent words: technical jargon in computer engineering. In this field, we ought to describe computer programs, system behaviors, and interfaces accurately. At the same time, we need to bend the meaning of human words to fit the mechanical processes of computer code. Most often, it is just our human languages evolving and we should not fight it excessively, but sometimes people misemploy a word or stretch a concept to gain some advantage, to sell more products, or to attract more mindshare. Such situations are unfortunate, and this page discusses when best to use some technical terms.

-= Composable =-

Software is built from small pieces put together in a program or a library. Software development may look a bit abstract to people outside the software industry. Developers often explain their craft using car analogies or references to Lego bricks to illustrate that a final product is assembled from many small parts which fit together thanks to articulations and connections. An equally valid -- yet less flattering -- analogy for software development is the sandcastle: we can easily reshape a sandcastle by taking away or adding small bits of sand; we don't really know why it stands, but it always breaks apart after a short timespan. In summary, we can build castles with Legos or with sand, but the experience will be different and the results will have vastly different properties. It turns out that developers have many choices when writing programs and systems. "Prefer composition over inheritance" is a sentence you may have heard before; "composable configuration" is another one. It seems that "composable" is an attractive property of software, and you will see it used more and more. Pay attention, because the term "composition" is often used when "combination" is more adequate (but less flattering, because of "combinatorial explosion"). I hope I can convince you that these two notions diverge in subtle-yet-crucial ways for software engineering.

In a sentence, composition is a form of combination that destroys information. That is, after combining two items into a new item, you no longer want to access those items as you would have done before combining them. What object-oriented programming (OOP) texts present as 'composition' in the famous mantra "composition over inheritance" is essentially 'combination' with 'encapsulation'. Sometimes, OOP texts distinguish 'composition' from 'aggregation', thereby restricting the definition of 'composition'. I find this restriction essential but unsatisfying, because it needs another concept: the 'ownership' of objects. Ownership of objects does not translate well outside OOP, yet we use 'composition' for templates or configurations too (what really matters is the dependency link). Thus, there is something more, and I believe the meaning of 'composition' is stretched in the OOP mantra.
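As a rough sketch of the OOP distinction (the class names are mine, purely illustrative): 'composition' with encapsulation hides the combined part behind the new object, while 'aggregation' keeps the part reachable as before.

```python
class Engine:
    def start(self):
        return "vroom"

# 'Composition' with encapsulation: the Engine is created and owned inside
# Car; callers interact with the Car and never reach the Engine directly.
class Car:
    def __init__(self):
        self._engine = Engine()   # hidden: outside code should not touch it
    def drive(self):
        return self._engine.start()

# 'Aggregation': the Engine lives outside and stays accessible as before;
# combining it into Garage added structure without hiding anything.
class Garage:
    def __init__(self, engine):
        self.engine = engine      # the caller keeps its own reference

engine = Engine()
garage = Garage(engine)
print(Car().drive())             # "vroom", without ever naming an Engine
print(garage.engine is engine)   # True: the combined item is still visible
```

Note how the Car variant depends on 'ownership' (the Car makes and keeps its Engine), which is exactly the concept that does not translate well outside OOP.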

Let's take an example using simple structures: natural numbers like zero, one, two etc. Natural numbers have little content and "just exist" thanks to the power of the human mind and a few axioms. One can take two numbers and combine them. For instance, one can sum two numbers:

C = A + B
One could also combine numbers using their product:
D = A x B
Last but not least, one could combine numbers by storing them in a two-element list:
L = [A, B]
I would argue that the first two combinations capture the essence of 'composition' but not the latter one. This example may seem silly or outrageous, but I assure you that my intentions are good: we can get some intuition about the nature of composition by looking at the difference between the sum and the product on one hand, and the list on the other hand. Where we do 'composition', the right-hand side is no longer needed once it has been reduced to the left-hand side. That is, we can destroy information from our program. In the blatantly not-composed list example, however, we did not delete information; we added more structure to hold the same amount of information as before. Here lies the distinction between 'composition' and 'combination': one destroys information and the other adds information. The more information we need to process, the more time-consuming it becomes, hence it is natural that we prefer programs and libraries which "compose well".
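The asymmetry is easy to observe in a few lines of Python (a minimal sketch of the same three combinations):

```python
a, b = 4, 3

# Composition: the sum reduces two numbers to one; the operands are gone.
c = a + b           # c == 7; nothing about 7 reveals it came from 4 and 3
print(5 + 2 == c)   # True: a different pair composes to the same value

# Combination: the list keeps both operands and adds structure around them.
l = [a, b]           # [4, 3]: the original pair is still fully recoverable
print([5, 2] == l)   # False: a different pair yields a different list
```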

I like the number-sum vs. number-list example because it is simple enough to give us an intuition distinguishing 'combination' and 'composition'. One could argue that this example is not entirely honest, because one way to represent numbers is as a linked list of "nothing", and the sum of two numbers is again a list; hence the sum of two numbers does not seem to destroy either of the two original lists. Let's have a closer look using the following (ill-defined) notation:

 3 = () - () - () - . 
 4 = () - () - () - () - . 
which means we sort of have these two fundamental rules:
 0 = . AND +1 = () -
Using this notation, summing values means concatenating the two lists.
7 = 4 + 3 = () - () - () - () - () - () - () - .
It's not clear whether we destroyed information, so let's have a look at making a list of numbers (abusing the ill-defined notation):
[4, 3] = (() - () - () - () - .) - (() - () - () - .) - .
There is much noise on this line, but the distinction between the two operations becomes clearer: we have a single 'dot' after summing two numbers; we have three 'dots' when combining two numbers in a list. These disappearing dots, I claim, are the information one destroys by composing two numbers. You already know that if I gave you 7, you would have no clue what the original numbers were; I could have built 7 with '5+2'. Hence '5+2' is interchangeable with '4+3', but [5,2] is not interchangeable with [4,3]. I know this last statement teaches you nothing. However, the whole point of this digression was to show before your eyes that there exist at least two combination processes with different mechanics, and you knew about it already. So pay attention to marketing speech conflating the two notions, and ask yourself which information is destroyed or added when combining elements.
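The ill-defined notation above can be mimicked in Python (this encoding of numbers as lists of `()` markers is just for illustration, not a serious representation):

```python
def nat(n):
    """Encode a natural number as a list of n '()' markers."""
    return [()] * n

def add(x, y):
    """Summing is concatenation: two lists merge into one, with a
    single implicit terminator -- the boundary between x and y is lost."""
    return x + y

seven = add(nat(4), nat(3))
print(seven == nat(7))                    # True
print(add(nat(5), nat(2)) == seven)       # True: 5+2 interchangeable with 4+3

# Listing, by contrast, nests the operands: each inner list keeps its own
# terminator, so the original pair stays recoverable.
pair = [nat(4), nat(3)]
print(pair == [nat(5), nat(2)])           # False: [5,2] is not [4,3]
```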

-= Abstract =-

You might have heard that some codebase is too abstract when 'indirected' would be a better term.

Most code is terrible; my own code is likely a good contributor to the terribleness of code in general. When discussing bad code, developers often refer to "too-abstract" code when they actually mean "too-indirected" code. Indirection is useful: it allows us to set up decoupling points between two subparts of the same system -- for instance, to let tests run in a different configuration from the production code. That said, indirections are often over-used, resulting in "too-indirected" code: the business logic becomes buried within layers of adapters, interface implementations, and other flavors of dependency injection. Unfortunately, indirections come with their own logic and do not _compose_ well with other indirections. Integrating multiple libraries with varying flavors of indirection produces hard-to-follow code, in which the glue-code logic outweighs the business-logic code. Codebases with lots of indirections give rise to code with a configuration ceremony, where passing arguments requires a dance with leaps and bounds, jumps, and hops. Such symptoms do not correspond to abstract code; they correspond to indirected code.
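A hypothetical sketch of the useful side of indirection: a single decoupling point lets tests run with a different configuration than production, without touching the business logic (all names below are made up for illustration).

```python
# Hard-wired: the business logic is welded to its production configuration,
# so a test cannot exercise it with a different rate.
def price_with_tax_direct(price):
    return price * 1.20        # production tax rate baked in

# One level of indirection: the rate is injected, creating a decoupling
# point between the business logic and its configuration.
def price_with_tax(price, tax_rate):
    return price * tax_rate

print(price_with_tax(100, 1.20))   # production configuration
print(price_with_tax(100, 1.00))   # test configuration: taxes disabled
```

One injected parameter is cheap; the "too-indirected" symptoms appear when every such parameter arrives through its own factory, adapter, or injection framework, and the dance to pass a single value outweighs the business logic itself.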

Abstractions have a different nature than indirections. Abstractions are mental constructs more than code constructs. Abstractions let us speak using accurate descriptions of a behavior for a class of systems. That is, rather than being useful once, an abstraction is all about re-use. Although indirections enable re-use, indirections do not help people have accurate and useful discussions outside code discussions (e.g., refactoring how a parameter trickles down to the business logic vs. finding a parallel in another business domain).

It is correct that abstractions make code hard to follow for people who do not know a particular abstraction. However, abstractions can be taught outside the scope of a particular problem. For instance, you can learn about convexity in mathematics and use it in your code as a property of a particular function. Someone not well-versed in mathematics might not understand the importance of this property and hence may be unable to grasp how such a library works. The converse is not true: code that defies human understanding is not always abstract.
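To make the convexity example concrete, here is a sketch (not a production implementation) of a routine that only works because its input is convex -- a reader who knows the abstraction sees why it is correct, while a reader who does not just sees an opaque loop:

```python
def argmin_convex(f, lo, hi, iters=100):
    """Ternary search for the minimum of f on [lo, hi].
    Correct only because f is convex: comparing two interior points
    tells us which third of the interval cannot contain the minimum."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2   # minimum cannot lie in (m2, hi]
        else:
            lo = m1   # minimum cannot lie in [lo, m1)
    return (lo + hi) / 2

x = argmin_convex(lambda t: (t - 3.0) ** 2, -10.0, 10.0)
print(x)   # close to 3.0, the true minimum
```

No amount of stepping through the loop reveals the convexity requirement; that knowledge lives in the abstraction, outside the code.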

Indirections make code hard to follow for a simple reason: indirections add distance between where objects are created in code and where they are used. Indirections can take different shapes: layers, factories, super-classes, dependencies, etc. In short, there is no lack of imagination for hiding indirections behind a nicer name, but keep in mind that abstractions are not indirections.