Possible Accidental Complexity in Nexmo Voice API

Intro:

Accidental and Essential complexity are the terms that were mentioned in the book The mythical man month.

In my own words, the definitions are,

Accidental Complexity is a complexity that software has by accident and could be avoided with a careful thought process. For example, ArrayList offers indefinite element addition. If you ask the user resize the internal array before adding each element, that would be insane. Because that can be internally taken care without the knowledge of the user.
Essential Complexity is a complexity in the software that occurs by the nature of the problem. For eg, to communicate a server with a self-signed certificate, the user must take the responsibility of providing the truststore or necessary configuration to ignore SSL check. Because it is not possible for the JVM to make a choice. This type of complexity is inevitable.

In an ideal system, there must be no "Accidental Complexity". Extreme care and a review by third-eye are required to avoid this type of complexities.

Recent experience with Accidental Complexity:

Problem:

Recently, I was playing around with Nexmo Voice API. Basically, the Nexmo Voice is to connect a call to a given number and to talk a given SSML or stream audio.

There are a lot of APIs for Nexmo Voice. But let’s consider a couple of APIs to discuss the concept,

We have a startTalk to talk a given message in the call identified by the given id.
A stopTalk to stop speaking in the call identified by the given call id.

Assume that you have to speak a message, pause for 2 seconds, talk another message and stop talking. You might expect the code something like this,

client.startTalk("<call Id>", "SSML for message#1"); #1
Thread.sleep(2000); #2
client.startTalk("<call Id>", "SSML for message#2"); #3
client.stopTalk("<call Id>"); #4

But it won’t work that way.

The Nexmo server will respond to your call asynchronously.

Hence,

Message in #1 will be spoken for 2s, whether the 'message #1' is completed or not, Nexmo will start talking 'message#2'. Hence if 'message#1' is not completed, you will hear those 2 messages spoken in parallel.
Immediately after that, whether the "messages" are completed or not, stopTalk will abruptly stop the messages being spoken.

Hence to have it work in the expected way, you have to code in the below way,

client.startTalk("<call Id>", "SSML for message#1"); #1
Thread.sleep(<number of seconds taken to complete speaking message#1>); #1.1
Thread.sleep(2000); #2
client.startTalk("<call Id>", "SSML for message#2"); #3
Thread.sleep(<number of seconds taken to complete speaking message#2>); #3.1
client.stopTalk("<call Id>"); #4

You have to experiment and calculate the number of seconds required for "#1.1" and "#3.1" to have the Nexmo speak the message in expected way.

Now the problem is "Nexmo expects the caller to know the number of seconds required to speak a given message", to synchronize between multiple messages. In my opinion, this is an "Accidental complexity". A client can just give the set of instruction to be executed by Nexmo server. It need not know how much time the server takes to execute the instruction.

Proposal:

To avoid the "Accidental Complexities", probably Nexmo might have implemented a queue based approach, in which the server executes the instruction one by one that is being called on a "call id". So that the works just fine,

client.startTalk("<call Id>", "SSML for message#1"); #1
Thread.sleep(2000); #2
client.startTalk("<call Id>", "SSML for message#2"); #3
client.stopTalk("<call Id>"); #4

Summary:

In case, if we need a facility to preempt, there can be additional calls such as preemptTalk, preemptStop which pre-empts the calls that has been made already.
As I used Nexmo, the APIs that we are talking about has nothing to do with what the other user talks, so it can be exected as fire-and-forgot.
If you still think, you need the status for each invocation before making the next call, the Nexmo can provide the libraries that let us write the Nexmo instructions conditionally or platform that supports DSLs with conditional instructions for Nexmo, but never adding the complexities to the users as mentioned above.

Note: Nexmo is one of the widely used servers for Voice. I believe there might be some points on the design considerations that I miss. In that case, this article will open up the discussion on that front.

If you found the article helpful, please share or cite the article, and spread the word:

For any feedback or corrections, please write in to: kannangce@rediffmail.com

My Thought Buddy

(->> thoughts (filter #(curious? %)) (post-here))