One of the most intriguing technologies on the original Star Trek TV show was the universal translator. The show's writers needed to explain how aliens from across the galaxy were able to speak English. The answer was that the human crew of the Enterprise wore a device that listened to the other speakers and instantaneously translated their spoken words into English.

Just a couple of weeks ago, I saw a demonstration by Microsoft of a new technology that it had incorporated into Skype, a real-time remote communication tool. Skype has actually been around for quite some time, and millions of people use it to have free conversations over the web, whether with a friend across the street or a family member across the world. I personally make tremendous use of it, including the video-calling option, so that my mother can see her grandchildren any time she wishes. The new feature introduced by Microsoft was a real-time translation tool that listened to the conversation and immediately translated the spoken words into the desired language. The demonstration was between a German speaker and an English speaker, but it could have been between any two languages that the system supports. Was the translation perfect? No. Was it good enough for friends to communicate freely? I would say definitely yes. Was it good enough for business partners from across the world to hold a serious meeting where important decisions are being made? Probably. The important thing to remember is that five years from now, Skype will be much better than it is today. And the same can be said of every other text-to-speech and real-time translation tool.

The implications for medicine are tremendous. Especially if one lives in a multicultural environment, the ability to carry on a serious conversation with a patient is critical. Imagine a patient arriving after a car accident when no one in the emergency room speaks the patient’s native tongue. At the very least, this is incredibly stressful for the patient. In the worst-case scenario, the patient is unable to tell the doctor about a critical illness, an omission that could literally cost the patient’s life if it is not considered when beginning treatment. So I personally hope that this type of technology will become standard very soon in all medical environments. Actually, I think this could be a killer application [which means an application that drives people to buy a given technology] for smart watches. Imagine a doctor simply placing the smart watch between him- or herself and the patient and then speaking freely, as the watch speaks out the translation. This could literally be transformative in healthcare. I am not aware of any hospitals presently working with Google or Pebble or Apple or any of the other smart watch companies to bring this technology to bear, but I hope it will be implemented soon.

Perhaps it is the technologist in me that continues to be astounded by a computer’s ability to understand language. Here, the term “understand” does not mean that the computer actually has a mind that can internalize and even respond emotionally to language. Rather, it means that the computer can take in natural language in much the same way that it might take in a typed computer program. The computer then analyzes the syntax, semantics, and context of the human language, much as it analyzes a software program. One of the key missing elements in understanding human language, a field also called natural language processing [NLP], is “common sense”.
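To make the comparison to a computer program concrete, here is a toy sketch in Python of the very first step such a system performs: breaking a sentence into tokens, just as a compiler first breaks a program into tokens. This is purely an illustration of the idea; real translation systems use far more sophisticated tokenizers.

```python
import re

def tokenize(text):
    """Split text into word tokens and punctuation tokens."""
    # \w+ matches runs of letters/digits; [^\w\s] matches single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("I went into the water."))
# -> ['I', 'went', 'into', 'the', 'water', '.']
```

Only after a step like this can the system begin to analyze syntax and semantics.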

Common sense is that magical ability we have to understand the following phrase: “I was at the beach and went into the water. It was cold”. First of all, humans inherently understand that there is a connection between being at a beach and having access to water. Also, when the speaker says that “it was cold”, the greater likelihood is that the speaker is referring to the water being cold. Why? Because people tend not to go to the beach on a cold day. Moreover, if this phrase is spoken during the summer months, then the air temperature is likely to be warm, not cold. Now assume that the sentence preceding the phrase above was “I was in Florida last week”. The assumption now is that the statement refers to a beach in Florida and not in the speaker’s home city. And if, based on the weather over the last few days, Florida has been consistently warm, this further confirms the assumption that the word “cold” was referring to the water.
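One way to picture how a computer might use common sense to decide what “it” refers to is with a hand-written fact table. The sketch below is a toy illustration only, with facts I have made up for the example, not a real NLP system:

```python
# Hand-coded "common sense": which properties each noun plausibly has.
# These facts are invented for illustration.
COMMON_SENSE = {
    "water": {"cold", "wet", "deep"},
    "beach": {"sandy", "crowded"},
}

def resolve_it(candidates, property_word):
    """Return the candidate nouns whose known properties include property_word."""
    return [noun for noun in candidates
            if property_word in COMMON_SENSE.get(noun, set())]

# "I was at the beach and went into the water. It was cold."
# The pronoun "it" could refer to either noun; common sense picks the water.
print(resolve_it(["beach", "water"], "cold"))
# -> ['water']
```

The hard part, of course, is that a real system would need millions of such facts, plus the weather in Florida last week.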

We do all of this language processing in a fraction of a second, and subconsciously. Before we even get to the point of understanding the meaning of the various phrases, our brains have to break the spoken sentences down into components: we determine what is the subject, what is the object, and what is the descriptor in these sentences. A linguist could give far more detail about the entire process of understanding a sentence, but even from what I have written here, I believe it is quite clear that we are demanding a tremendous amount from a computer when we ask it to mimic a human’s ability to understand human language. Much of the understanding of such sentences rests on the human characteristic of having “common sense”. It is common sense that water is wet, that it can be cold, that Florida is warm, and that people avoid the beach when it is cold outside. This information is not only important but actually critical for a human to fully understand the sentences above. As computers understand more and more of the context in which we speak, including our personal context [i.e., where we are and what we are doing] as well as the associations between concepts like wet and beach and sand and cold and so on, computers will get better and better at NLP. There is a more detailed discussion of this whole concept in the following article. It is definitely worth reading.
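The breakdown into subject, verb, and descriptor can also be sketched in a few lines of toy Python. This handles only one simple sentence pattern that I chose for the example; real parsers use full grammars and statistical models:

```python
def parse_simple(sentence):
    """Parse sentences of the form '<subject> was <descriptor>'.

    Returns None for sentences that do not match this one toy pattern.
    """
    words = sentence.rstrip(".").split()
    if "was" not in words:
        return None
    i = words.index("was")
    return {
        "subject": " ".join(words[:i]),
        "verb": "was",
        "descriptor": " ".join(words[i + 1:]),
    }

print(parse_simple("The water was cold."))
# -> {'subject': 'The water', 'verb': 'was', 'descriptor': 'cold'}
```

Even this trivial pattern-matcher shows how much structure a computer must recover before the question of meaning, let alone common sense, even arises.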

Thanks for reading and thanks for listening.