The Turing Test, developed by Alan Turing, is a test of a machine's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human. In the test, human observers decide whether responses are produced by a machine or by a human.
The original proposal involved a dialogue conducted over two keyboards (teleprinters) situated in separate rooms. Today's chatbots essentially perform the same task.
Companies such as Apple and Amazon invest substantial resources in making their chatbot dialogue sound convincingly human. They currently use teams of human evaluators to analyze conversational exchanges and make improvements, but the process is lengthy and costly. What if you could speed up the process of evaluating dialogue responses using… a machine?
Researchers at McGill University in Montreal, Canada set out to build an Automatic Dialogue Evaluation Model (ADEM). During testing, the model was able to produce exactly the same rankings as human evaluators when scoring responses.
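The validation idea described above can be sketched in a few lines: a trained model assigns a quality score to each candidate response, and the model is judged by whether the ordering its scores induce matches the ordering induced by human scores. This is an illustrative sketch only; the function names, the toy scores, and the exact agreement check are assumptions, not ADEM's actual implementation.

```python
# Illustrative sketch of ranking-agreement validation for an automatic
# dialogue evaluator (ADEM-style). Scores and helpers are hypothetical.

def rank(scores):
    """Return each item's rank position (0 = best) by descending score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, index in enumerate(order):
        ranks[index] = position
    return ranks

def rankings_agree(human_scores, model_scores):
    """True when both score lists induce the same ordering of responses."""
    return rank(human_scores) == rank(model_scores)

# Toy example: three candidate replies scored by humans (1-5 scale)
# and by a model (0-1 scale). The scales differ, but the ordering
# they induce is what matters for agreement.
human = [4.5, 2.0, 3.1]
model = [0.91, 0.22, 0.58]
print(rankings_agree(human, model))  # True: both rank reply 0 > 2 > 1
```

A ranking-based comparison like this sidesteps the problem that human and model scores live on different scales; only the relative ordering of responses is compared.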
The team plans to release the model as open source.