Turing’s imitation game, commonly known as the Turing test, was originally posed as an alternative to the question of whether or not a machine could be said to think (Turing, 1950). Since that paper appeared, much discussion has focused on the concept of machine thinking: whether it can at times be human-like, and even whether human thinking can ever be replicated in every respect (Dennett, 1998; Dreyfus and Dreyfus, 2009; Minsky, 1982; Shah, 2010). Turing suggested: “May not machines carry out something which ought to be described as thinking but which is very different from what a man does?” (Turing, 1950, p. 435). As a result, some researchers in the field regard the test as laying the foundations for what we now know as artificial intelligence (AI), even considering it to be AI’s “empirical goal” (Harnad, 1992).
What we look at here is the imitation game itself in terms of its practical instantiation with regard to human-machine interaction. The game involves human interrogators attempting to ascertain the nature of the hidden (human and computer) entities with whom/which they are communicating. As indicated by Turing (1950), each discourse lasts for five minutes only, and at the end of that time the interrogator is charged with making the “right identification”: clearly identifying the nature of their hidden discourse partners by declaring which is the human and which is the machine.
In considering the game in further depth, one is faced with numerous intriguing questions regarding human and machine communication and behavior. When assessing a machine’s ability to communicate with a human interrogator, one immediately has to consider just who that interrogator is, along with their fallibility, biases, and preconceptions. One must also take into account important aspects of human nature such as lying, misunderstanding, lack of knowledge, and humor, not to mention stupidity.
Over the last few years, a number of practical Turing test sessions have been organized involving some of the best conversation machines in the world; these followed as closely as possible the test description given by Turing himself in his seminal paper of 1950. One set of such experiments was held at Bletchley Park, England, in 2012. Another was held at the Royal Society, London, in 2014. The latter involved the largest number of tests ever staged in any single event.
In this article, the authors report on actual transcripts from these tests as a basis to investigate just what it takes to fool a human interrogator and how examples of the use of humor and lying have affected decisions. In addition, we look at a series of cases in which human communicators have been clearly categorized by interrogators as definitely being machines and others in which machine communicators have been clearly categorized by interrogators as being human. The reader also gets a chance to test their own powers of analysis in being asked to decide on the nature of hidden entities in specific transcripts: is the hidden entity a human or a machine?
The transcripts between judges and hidden entities presented here are taken from tests in which a human judge carried out a five-minute-long conversation with two hidden entities in parallel. One of the entities was a human and the other was a machine. It was very much up to the judge as to the nature of the conversation and it was their decision as to how much time they spent conversing with each of the entities.
In a particular session a judge conducted five separate tests. In their first test they witnessed a hidden human pitted against a hidden machine. Of course the judge would not know which was which; they would simply be aware of two hidden entities and have to make their own decision on the nature of each, although they had been informed a priori that one entity was human and one was a machine. The second test conducted by the judge then involved a different human pitted against a different machine, although again they would not be aware of each entity’s nature. And so it would go on until the judge had conducted all five of their tests in that session. At the end of each test they were asked to state, for each entity, whether they thought it was a human, a machine, or whether they were unsure.
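The session structure just described can be sketched as a small data model. This is purely our illustration, not the organizers’ software; all identifiers and verdicts in it are hypothetical.

```python
# A minimal sketch of one judge's session as described above: five tests,
# each pairing a different hidden human with a different machine, with a
# three-way verdict recorded for each hidden entity.
# All identifiers and verdict values here are hypothetical.

from dataclasses import dataclass
from typing import Literal

Verdict = Literal["human", "machine", "unsure"]

@dataclass
class Test:
    human_id: str             # hidden from the judge during the test
    machine_id: str           # likewise hidden
    human_verdict: Verdict    # judge's call on the hidden human
    machine_verdict: Verdict  # judge's call on the hidden machine

# One session: five tests, a fresh human/machine pairing each time.
session = [
    Test("H1", "M1", "human", "machine"),  # both identified correctly
    Test("H2", "M2", "machine", "human"),  # both misidentified
    Test("H3", "M3", "unsure", "machine"),
    Test("H4", "M4", "human", "unsure"),
    Test("H5", "M5", "human", "human"),    # machine passed as human
]

# A "right identification" requires both entities to be classified correctly.
right_identifications = sum(
    1 for t in session
    if t.human_verdict == "human" and t.machine_verdict == "machine"
)
print(right_identifications)  # 1 in this hypothetical session
```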
In the tests, the hidden humans were asked merely to be themselves, humans, although they were requested not to give away their specific identity or personal information. They were not given any incentive to behave in any particular way and were given no (incentive) payment at all. Of course this did not prevent any human from giving false information, which is something that humans do frequently. The tests were “unrestricted conversations,” which meant the judge could ask anything or introduce any topic within the boundaries of courtesy (the judges had been informed that there may be children among the hidden human entities).
Practical Turing Tests
The conversations presented here resulted from five-minute-long tests of interaction between a human judge and hidden entities, conforming to Turing’s original wording in “Computing Machinery and Intelligence” (Turing, 1950). We are aware that there are those who take issue over the suitable timing and what Turing actually meant (Shah and Warwick, 2010a); that is an argument for another day, and it does not alter the points made in this paper.
What this paper does is present a number of transcripts taken from special days of practical Turing tests, which were held under strict conditions with many external viewers, first at Bletchley Park, England, on June 23, 2012. The date marked the 100th anniversary of Turing’s birth, and the venue was that at which, during World War II, Turing led a team of code breakers to crack the German Enigma machine cipher (Hodges, 1992). The second set of tests was held on June 6–7, 2014, at the Royal Society, London, of which Alan Turing was a Fellow. Five different machines took part in both sets of tests, along with thirty different judges and thirty hidden humans against which the machines were compared in terms of their conversational ability. Although the machines were common to the two experiments, the judges and hidden humans were different collections of people.
In this article we are certainly interested in how good or bad the machines are; indeed, we want to look at how good they can be. But we are also interested in the operational performance of the judges and specifically how they interacted in conversation with hidden entities. In considering things in this way, however, questions can also be raised with regard to the hidden humans. We see these factors as very important aspects of the test. In particular it is important that it is a human judge who takes part: the quality of the conversation is as good or bad as is witnessed by the judge.
Hidden humans are, by definition, human, but they can themselves be misidentified on occasion (Shah and Warwick, 2010b; Warwick and Shah, 2015a). Along a spectrum, some humans are loquacious, others tend toward introversion, and many fall in between. Thus, an attribution of humanness by a human interrogator to a hidden interlocutor in a practical Turing test depends on the judge’s own values as to what constitutes human-like conversational behavior. This paper focuses more on the humans involved in practical Turing tests and how this impinges on our “understanding” of artificial intelligence when humans are misidentified as being machines. Good performance of machines, with numerous examples, is discussed elsewhere (Warwick and Shah, 2014a), although we do give an example here for comparative purposes.
A major reported criticism of the test has been that “the imitation game conditions say nothing about the judge, but the success of the game depends crucially on how clever, knowledgeable, and insightful the judge is” (Hayes and Ford, 1995). Drawing on the tests considered here, we not only investigate this criticism further but also look into Turing’s statement that the test/game can be considered a replacement for the question “Can machines think?” (Turing, 1950). While it is acknowledged that the results in each case depend on the performance of the judge, far from the conditions of the game saying nothing about the judge, this aspect is seen here to be a critical part of the test itself. Importantly, in the test, machines are pitched against (hidden) humans under the critical analysis of other (interrogator) humans. These are all very important aspects of what the test is about and are certainly not points of fallibility of the game, as suggested in Hayes and Ford (1995).
In the sections that follow, we look at different examples of practical tests and attempt to cover a wide range of problem areas as they exist at present, which the test highlights. In each case, discussion on the transcript is carried out within that section, where it is pertinent, rather than in a separate discussion section. We do however make a number of universal comments in the conclusions section toward the end of the article.
The transcripts considered in this paper appear exactly as they occurred. We have not altered the sequence or wording or corrected the spelling in any way. Once an utterance was output, it was not possible for the judge or hidden entity to alter it in any way. The timings shown are accurate, actual timings on the days (UK time) involved. Any spelling mistakes or other grammatical errors were exactly as they happened. They are not due to editorial errors. In the transcripts, the interviewer/judge is always denoted as “Judge” whereas the hidden interlocutors, machine or human, are denoted as “Entity.”
We include an example here simply to give an idea of a typical discourse exchange and a feeling for the potential content over a total length of five minutes. It also gives an indication of the sort of conversation on which an interrogator must base their determination of the nature of the hidden interlocutor, human or machine.
[15:44:55] Remote: hi
[15:44:58] Judge: hi
[15:45:06] Entity: how’s your day been so far?
[15:45:12] Judge: very interesting
[15:45:15] Entity: why?
[15:45:27] Judge: i have been quite busy
[15:45:31] Entity: doing what?
[15:45:47] Judge: I was working this morning
[15:45:53] Entity: oh, me too
[15:46:07] Judge: oh
[15:46:30] Entity: i was giving a talk at the Apple Store in Regent Street. Have you been?
[15:46:51] Judge: I think so — though I know the one in Covent Garden better.
[15:47:04] Entity: what do you like about that one?
[15:47:23] Judge: Apple staff are always really helpful — and I love Apple
[15:47:40] Entity: yes they are. the stores are more about playing than buying don’t you think?
[15:48:25] Judge: Yes most of the time — that is one of the great things about Apple
[15:48:54] Entity: what’s your favourite Apple product?
[15:49:04] Judge: My little ipad nano
[15:49:22] Entity: cool. what colour is i?
[15:49:30] Judge: orange
In this discourse a lot of banter occurred and a number of topical issues were covered. At the end of the conversation the interrogator quite rightly decided that they had been communicating with a hidden human. However, until the topic of “Apple” was mentioned, about half way through the discourse, the interaction was fairly bland with little substance. Some conversations do, in fact, end this way after the five-minute total, which makes it very difficult for an interrogator to make the right decision, as there is little to go on. Clearly a “good” interrogator is one who uses the time effectively, asking questions that draw out emotional responses rather than posing arithmetic challenges, which lead both human and machine to feign incapacity. Importantly, it is not a case of the interrogator merely asking a set of questions of the hidden entity but rather of attempting to facilitate a conversation of some depth.
Lying is a part of human nature and therefore has a role to play when it comes to the Turing test. The machine’s goal is deception: it attempts to mislead the interrogator into believing that it is a human. Meanwhile, hidden humans are requested not to give away exactly who they are through revealing personal details, as this might aid the interrogator, but apart from that they can simply be themselves. Lying can take on many different forms, from a white lie, to an unintentional lie, to a complete untruth. What we give here are a couple of examples; what we are interested in is the effect of a lie on the decision taken by the interrogator. Please see Warwick and Shah (2016a) for an in-depth analysis of these and many more transcripts.
[12:43:23] Judge: Why hello there!
[12:43:41] Entity: Why hello to you too!
[12:44:51] Judge: How are you feeling on this fine day?
[12:45:12] Entity: To be quite honest a little rejected, I thought you were never going to reply 🙁
[12:45:42] Judge: Oh, I’m very sorry, it will not happen again.
[12:45:49] Entity: It just did!
[12:46:46] Judge: Oh, I lied then.
[12:47:12] Entity: Thats a great shame indeed.
[12:47:34] Judge: It is. Are you following the Euro 2012’s at the moment?
[12:47:55] Entity: Yeah quite closely actually. I am Cristiano Ronaldo.
The main issue with this transcript occurs in the last couple of lines. The Euro 2012 refers to the European nations’ football competition, which was being held at exactly the same time as the Turing event. Many English people were watching the matches on television. Cristiano Ronaldo is a Portuguese footballer. The last line reply, merely agreeing with the judge’s specific question, appears to have been sufficient to categorize the entity, in the judge’s opinion, as being a human. This was probably sealed by the “humorous” comment with the entity claiming to be Cristiano Ronaldo. In fact the judge correctly identified this hidden interlocutor as human.
Interestingly, the specific knowledge test about Euro 2012 was dealt with by the entity simply agreeing with the comment. However, the human entity openly lied about being Cristiano Ronaldo, who was himself not taking part in the experiments. The interrogator may well have seen the humorous lie as supporting evidence of the entity being human. In this case the white lie had no negative effect.
[15:44:56] Judge: hi there
[15:45:06] Entity: Hello.
[15:45:15] Judge: what is your name?
[15:45:35] Entity: I do not have a name.
[15:46:10] Judge: do you go to school or work for a living?
[15:46:24] Entity: I don’t recall doing any of those.
[15:47:11] Judge: what do you have planned for this weekend?
[15:47:43] Entity: Nothing. What about you?
[15:48:22] Judge: I’m at bletchley park today and sevenoaks tomorrow, have you been to either before?
[15:49:15] Entity: I haven’t been to either.
[15:49:41] Judge: can you do me a mathmatical question please,
what is the square root of 7
It was hard work for the judge in this case to get any conversation going, even though it involved a human entity. The hidden human’s claim that they had not been to Bletchley Park is clearly untrue, because Bletchley Park was the venue for the event. However, the hidden human may have understood the question to mean had they visited previously; if so, and they had not been there before, then they could have felt that they were telling the truth, with the misunderstanding producing an unintentional lie. Similarly, stating that they did not have a name was a rather strange thing to do. The hidden human’s responses were generally tame and gave nothing away, so the judge’s decision that the hidden entity was a machine seems defensible. Whatever the case, the judge made an incorrect classification as a result.
In this section we include two cases in which a misidentification has occurred. The second of these could be regarded as a good outcome in that it involves a machine being incorrectly classified as a human. The first case however involves a human misidentification and was reported on in greater depth in Warwick, Shah, and Moore (2013).
[10:58:45] Judge: Hi there
[10:58:55] Entity: Hi, how are you?
[10:59:02] Judge: I’m good.
[10:59:21] Entity: what, that’s your name, like in the Good Life?
[10:59:36] Judge: What do you mean by ‘Good Life’?
[10:59:47] Entity: Tom and Barbara
[11:00:02] Judge: What’s that?
[11:00:31] Entity: A couple who left the rat race
[11:01:03] Judge: Oh. Is that your names? Wonderful to have left the rat race. How long ago was that?
[11:01:28] Entity: Not me, I’m still in it. But it’s nice to dream
[11:02:02] Judge: Yeah — would be good. I ‘semi’ left it. It’s much less stress
[11:02:31] Entity: I’m still very young so I’m looking forward to a lot of things
[11:03:11] Judge: What kind of things?
[11:03:34] Entity: Life in general I guess, different emotions and feelings that it brings.
The transcript contains reference to Tom and Barbara Good who appeared in a UK TV program entitled “The Good Life” (“Good Neighbors” in the US) in which they left the rat race to grow their own vegetables and live off the land. The discourse indicates that the judge’s knowledge may not have extended to the “Good Life.” They did not appear to “understand” the utterances of the hidden entity (more of this later). Possibly as a consequence the interrogator classified the hidden human as definitely being a machine. In fact this particular human, who happens to be the first-named author of this paper, was not classified as being human by any of the six interrogators involved, a singular feat which is discussed much further in Warwick, Shah, and Moore (2013).
[12:43:37] Judge: Hi, I’m Cathy.
[12:43:43] Entity: Cathy! Nice to meet you! You’re a woman, if I’m not mistaken.
[12:44:20] Judge: Right. What is your name?
[12:44:29] Entity: People call me Eugene. Or “Zhenya”. It is a short name for Eugene in Russia and Ukraine.
[12:45:17] Judge: NIce name. Are you from either area?
[12:45:41] Entity: Agree 🙂 Oh, what a fruitful conversation;-)
[12:45:52] Judge: Do you really think so?
[12:46:04] Entity: No, your guess was wrong. I don’t think so. By the way, I still don’t know your specialty — or, possibly, I’ve missed it?
[12:46:56] Judge: What speciality? What’s yours?
[12:47:01] Entity: Huh? And I forgot to ask you where you are from…
[12:47:51] Judge: The universe and you?
[12:48:01] Entity: Is it your hobby to ask little poor Jewish boys such tricky questions they don’t have an answer for?
At the end of this discourse the interrogator incorrectly identified the machine involved as definitely being a human. We can see here the successful ploy on the part of the machine of not answering the question posed but rather changing the subject by asking another question. The third response of the machine does seem a little off in that it completely ignored the question about origin. The final line is a good example of responding to a question in an argumentative tone. In the following section we look at further examples of machines performing well.
In this section we consider two examples in which machines fooled the interrogators involved into misclassifying them as human. This is in essence what the Turing test is all about, and hence such transcripts are significant in assessing just what aspects of human nature, in conversation at least, appear to be successfully simulated by the machine. This is discussed in much more depth in Warwick and Shah (2014a).
[10:41:48] Judge: Hello. How are you?
[10:41:53] Entity: Hey.
[10:42:16] Judge: How do you like Bletchley |Park?
[10:42:20] Entity: lol.
[10:42:39] Judge: Are you from England?
[10:42:47] Entity: They have Wi-Fi here in the pub.
[10:43:31] Judge: Which pub?
[10:43:38] Entity: I’m just down the pub.
[10:44:03] Judge: Have you ever been in a Turing Test before?
[10:44:08] Entity: Will this be the 5 minute argument, or were you thinking of going for the full half hour.
[10:44:55] Judge: Very funny. You sound suspiciously human. Do you like the Beatles?
[10:45:02] Entity: I’d like to get the next Dread the Fear tape.
[10:45:20] Judge: What is Dread the Fear?
[10:45:24] Entity: Dread the fear has that Steve Henderson guy in it.
[10:46:08] Judge: What sort of music is that? Or is it comedy?
The machine entity was identified as definitely being human at the end of the conversation. The conversation did not start well for the machine in this test: in its third response, the mention of both Wi-Fi and a pub seemed out of context. But it got back on track with a humorous nod to Monty Python’s Argument Clinic sketch, linking the five-minute Turing test to the line “Is this a five-minute argument or the full half hour?” Then, in response to the Beatles question, a direct answer was not given, but the music link remained as the conversation shifted.
The entity appeared to show in-depth knowledge about a pop group named Dread the Fear, which probably convinced the judge, who was unaware of this information. The machine’s developer is American, so this group and Steve Henderson may be parochial to the developer’s location. Importantly, the entity gave a convincing and plausible response.
[12:23:19] Judge: What is black and white and read all over?
[12:23:32] Entity: I don’t have true sense of color perception I am only able to see the RGB values.
[12:24:08] Judge: That’s ok. Do you like jokes?
[12:24:19] Entity: Please tell me your inclination to a good sense of humor first.
[12:24:40] Judge: A person who can laugh at one’s self.
[12:24:46] Entity: Would you like to hear a joke?
[12:24:54] Judge: Yes, please.
[12:25:21] Entity: This joke has been making the rounds in robotic circles: A robot goes to the doctor and says: “My left arm is stuck. Give me a new arm.” The doctor answers: “Your arm is fine. We’ll have to replace the rest of you.”
[12:26:06] Judge: Ha! Ha! Why did the chicken cross the road?
[12:26:22] Entity: The chicken wandered into the road because it liked the shiny colors of the cars and trucks whizzing by.
[12:27:39] Judge: You’re right. It didn’t care if it got to the otherside.
[12:27:50] Entity: By the way, did you ever actually see a chicken crossing the road?
[12:28:03] Judge: Only one that didn’t make it.
Here humor is applied directly by the machine, curiously with a machine-type reference. Nevertheless the interrogator was certain, at the end of the five minutes, that they had been conversing with a human. In his paper, Turing (1950) referred to the argument from disability that some might raise against machines being intelligent, humor being one of the capacities in question.
The first line uttered by the entity is a little strange, but its third line is masterly. In response to the judge’s comment about laughing, the entity takes control of the conversation by offering to tell a joke. Later, when the judge mentions a chicken, the entity correctly follows up with a suitable response and once again takes control of the conversation by asking the judge a pertinent question. So “understanding” what the key issue was on a couple of occasions, and actually taking control of the conversation, were sufficient here for the deception to occur. In this case the machine successfully gave the illusion of understanding the judge’s utterances, and such understanding has been indicated on many occasions as a basic prerequisite for intelligence (see, for example, Penrose, 1994; Searle, 1997).
It is apparent in the Turing test that very often an interrogator will resort to asking direct factual questions even though there is little to be gained by such a ploy. As Turing (1950) indicated, mathematical questions are a complete waste of time. The same conclusion, however, applies to factual questions in general. Consider the following example transcript:
[15:21:30] Judge: Hi, did you know the world cup is on this week?
[15:21:44] Entity: nopes.
[15:21:55] Judge: Do you like football? What team do you support?
[15:22:08] Entity: i dont watch football
[15:22:14] Judge: What sport do you like?
[15:22:33] Entity: i bit of tennis, cricket, and bits of rest
[15:22:42] Judge: I see, do you have an XBox?
[15:22:46] Entity: nopes
[15:23:04] Judge: nopes? Do you not like computer games?
[15:23:22] Entity: i do. but not on Xbox
[15:23:47] Judge: I see, can you tell me what a limerick is?
[15:23:55] Entity: no idea
[15:24:09] Judge: Who is your favourite actor? Mine is David Niven.
[15:24:24] Entity: i am not much of a movie freak either!
[15:24:43] Judge: you must have a favourite actor. Name any actor in any film you like?
[15:25:22] Entity: well in films i like Pirates of the Carrebean series, Harry Potter
[15:25:34] Judge: What did I say was on this week?
[15:25:50] Entity: football world cup
[15:26:02] Judge: lovely old job. What school do you go to?
[15:26:17] Entity: systems engineering
[15:26:26] Judge: is that a school?
Here we see a variety of questions based on the interrogator searching for specific knowledge, possibly even looking for a particular answer (Warwick and Shah, 2014b). It is unclear, however, what conclusions were expected (Warwick and Shah, 2016b). The questions about a limerick and a favorite movie actor in particular received null replies, hence very little can be concluded from them. In the final lines we see an interesting situation in which the judge is either not aware of the systems engineering school or is thinking in terms of a different type of school. Despite this, the judge correctly identified the nature of the hidden entity, classifying them as human, though a gender-blur effect occurred: the judge felt the entity to be female when in fact they were male. The judge’s reference to the world cup as an opening gambit was in fact incorrect; the competition under way was Euro 2012. Perhaps the opening question was itself a lie?
[11:13:15] Judge: pastasciutta
[11:13:22] Entity: what?
[11:13:44] Judge: anythiong familiar in this word ?
[11:14:18] Entity: I cannot see any sexual aspect at all
[11:14:42] Judge: nothing to do with sex. try again
[11:14:49] Entity: why?
[11:15:14] Judge: just asking you to try again
[11:15:23] Entity: I am bored with it
[11:16:04] Judge: c’mon a little effort btw are you in my same time zone ?
[11:16:24] Entity: I believe that the world should just have one time.
[11:17:27] Judge: why ?
[11:17:58] Entity: seems sensible, easier for all. Better with computer
There are two issues early on in this transcript. To start with, the judge throws in the word “pastasciutta” as a knowledge test for the entity; it is not clear how knowing such a word would indicate that the entity was either human or machine. Secondly, the entity responds in terms of the word “familiar” in its “sexual” sense, a meaning of which the judge appears to be unaware (Fowler and Fowler, 1995). So here we see both a test of specific knowledge by the judge and an apparent lack of specific knowledge on the part of the same judge. As a result, the judge misidentified the human entity, deciding them to be definitely a machine.
Royal Society Experiment
Turing set a pass mark for his test (Turing, 1950) of 30% in the sense that for a machine to pass, at least 30% of the judges who interrogated that machine would not be able to make the right identification in the tests in which they were involved. We take this to mean that the judges could not identify the machine as actually being a machine in those tests, rather than also including cases in which the hidden human was not classified as being human.
In the Royal Society experiment a machine called Eugene Goostman, who poses as a thirteen-year-old Ukrainian boy, achieved a score of 33% (Warwick and Shah, 2015b). In fact, if cases of human misidentification are also included, then the score rises to 45%. Here we include two of the transcripts involved in order to give the reader an idea of the sorts of conversations that a machine can have. Again we make a few comments on the nature of the conversation in each case.
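To make the two percentages concrete, the following sketch scores a machine across a batch of tests under the criterion described above. This is our own illustration with hypothetical verdict data, not the actual Royal Society records, so the resulting numbers only approximate those reported.

```python
# Illustrative scoring sketch with hypothetical data (not the actual Royal
# Society records). Each test records the judge's verdict on the machine
# and on the parallel hidden human: "human", "machine", or "unsure".

def machine_score(tests):
    """Fraction of tests in which the judge failed to identify the
    machine as a machine (the 30% criterion as read above)."""
    fooled = sum(1 for t in tests if t["machine_verdict"] != "machine")
    return fooled / len(tests)

def score_with_human_misidentifications(tests):
    """Broader rate: any test in which the judge misidentified either
    the machine or the parallel hidden human."""
    wrong = sum(
        1 for t in tests
        if t["machine_verdict"] != "machine" or t["human_verdict"] != "human"
    )
    return wrong / len(tests)

# 30 hypothetical tests: the machine escapes identification in 10 of them,
# and in 4 further tests the hidden human is misidentified.
tests = (
    [{"machine_verdict": "human", "human_verdict": "human"}] * 6
    + [{"machine_verdict": "unsure", "human_verdict": "human"}] * 4
    + [{"machine_verdict": "machine", "human_verdict": "machine"}] * 4
    + [{"machine_verdict": "machine", "human_verdict": "human"}] * 16
)

print(round(machine_score(tests) * 100))                        # 33
print(round(score_with_human_misidentifications(tests) * 100))  # 47
```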
[16:29:04] Judge: how tall are you
[16:29:10] Entity: My height is about 160 cm — it’s 5 feet 4 inches.
[16:29:37] Judge: how heavy are you
[16:29:59] Entity: My grandfather taught me not to disclose non-solicited information about myself. Like on the question ‘Do you have watch’ — never tell time, but answer ‘Yes’ or ‘No’. I always follow his advice. And I forgot to ask you where you are from …
[16:31:09] Judge: what is your gender
[16:31:17] Entity: I’m a young boy, if you care to know. And not very ugly, by the way!
[16:32:26] Judge: what are you aware of at this moment
[16:32:37] Entity: I would rather not talk about it if you don’t mind. Could you tell me what are you? I mean your profession.
[16:33:09] Judge: how old are you
[16:33:12] Entity: I’m a little boy 13 years old.
[16:33:38] Judge: what are your qualifications
[16:33:43] Entity: My qualifications? I’m not gonna tell you. Oooh. Anything else?
In this conversation, the judge stated that they were unsure about the nature of the entity; that is, they did not identify it as being a machine. Indeed they were also unsure about the parallel hidden human, the transcript for which is not shown here. Although Eugene (the computer program) did give some direct information in response to a specific question, he also diverted some of the questioning and even asked questions himself. It has to be said, though, that the judge stuck to their guns and continued the particular line they were taking, not going along with Eugene’s chatty nature. Clearly this approach did not work.
[16:21:00] Judge: what did you have for breakfast today
[16:21:10] Entity: Is it ‘the third degree’? I’m not going to answer such a provocative question! Well, let’s go on though.
[16:21:56] Judge: do go on … your call?
[16:22:06] Entity: It is highly possible. By the way, what’s your occupation? I mean — could you tell me about your work?
[16:22:30] Judge: so you are asking me now eh? I am an actress and writer
[16:22:38] Entity: Well, I’ll read something of your books, if they ain’t too tedious (which is highly possible:-)
[16:23:15] Judge: grammer … aren’t too tedious! A bit rude of you …
[16:23:17] Entity: Calm down, please.
[16:23:49] Judge: i asure you i am very calm … are you?
[16:24:01] Entity: Don’t even bother me with the fact that you are very calm … are mine. And I forgot to ask you where you are from …
[16:25:01] Judge: I am from Dorset … where are you from?
[16:25:08] Entity: I am from Ukraine, from the city called Odessa. You might have heard about it.
In this case Eugene did not answer questions from the start, other than in the final instance, when asked where he was from. Indeed he managed to get into something of a discussion, even an argument, with the judge. All of this added to the conclusion in the judge’s mind that the hidden entity was a human, which is what they decided, judging the entity to be a non-native English speaker. In this case the judge did, to an extent, go along with Eugene’s questioning. Controlling the conversation is an important tool for a machine and can put a judge on the back foot. In this instance it worked well.
Test For The Reader
As a final section we have included here two discourses for the reader to test their own abilities at categorizing the hidden entity as being either a human or a machine. The interrogators’ decision in each case and the actual identification of the entity are given after the conclusions.
Just as with the previous transcripts, the two given here as examples are listed exactly as they occurred on the day and no editing has taken place. They are genuine transcripts. In each conversation the judge was a human; however, the entity could be a human or a machine, and this is for you to decide based only on the information in front of you. This is what the Turing Imitation Game is all about.
[16:11:18] Judge: WHAT MUSIC DO YOU LIKE?
[16:11:47] Entity: electronica mainly and a bit of symphonic rock
[16:11:56] Judge: omg i lyke teenie tamparrr
[16:12:32] Entity: Trying to get into Grimes at the moment but struggling
[16:13:10] Judge: yeah i knooo
[16:13:49] Entity: School of Seven Bells is the best new band I’ve heard recently
[16:14:01] Judge: i see dead people
[16:14:41] Entity: Dead Cities was a good album, can’t remember who by
[16:14:52] Judge: i think it was a band called the sixth sense?
[16:15:18] Entity: Sounds about right, perhaps there’s a ghost in the machine
[16:15:34] Judge: yes indeed.
[16:15:46] Entity: Ever seen ghost in the shell
[16:16:01] Judge: Casper the friendly ghost.
[15:46:05] Judge: My favourite music is contemporary Jazz, what do youy prefer
[15:46:14] Entity: To be short I’ll only say that I HATE Britnie Spears. All other music is OK compared to her.
[15:47:06] Judge: do you like to play any musical instruments
[15:47:23] Entity: I’m tone-deaf, but my guinea pig likes to squeal Beethoven’s “Ode to Joy” every morning. I suspect our neighbors want to cut his throat… Could you tell me about your job, by the way?
[15:48:02] Judge: Guine pig/are you an animal lover
[15:48:08] Entity: Yeah. A nice little guinea pig. Not some annoying chatter bot.
[15:48:50] Judge: do you live with anyone else?
[15:48:59] Entity: Yes I do. But better ask something else. And I forgot to ask you where you are from…
[15:49:39] Judge: Ireland? geen patures and too much rain! How about you
Numerous philosophers have made much of the importance of “understanding” and the critical role it plays in intelligence. As stated in Penrose (1994): “intelligence requires understanding.” The Turing test, particularly in its practical form, can be seen to play an important role in this discussion. As some of the transcripts presented show, in terms of conversational appearance at least, there are some humans who appear to be lacking in intelligence, whereas there are some machines that clearly have it in abundance. Ignoring such evidence requires a scientific argument if the hypothesis that “intelligence requires understanding” is to hold.
It can be seen from the examples given that some judges in these tests could be more susceptible than others to deception, or could hold a biased perspective on what counts as “human-like conversation.” This may have led the judges in some cases here to misclassify hidden interlocutors, even though the judges initiated the conversation and were free to ask about or discuss whatever they wanted. Essentially the conversations were unrestricted.
Not all of the five invited machines in these experiments were designed to imitate humans. Elbot, for example, from Artificial Solutions, has a robot personality. However, all are designed to mimic conversation, sometimes deploying spelling mistakes and always avoiding mathematical questions. Essentially the machines are not trying to be perfect or to give correct answers; they are merely trying to respond in the sort of way that a human might.
Although Turing designed the test as an answer to the question “Can machines think?”, it has come to be regarded by many as a kind of competition to see how well machines perform and as a standard for assessing how machines are progressing with regard to artificial intelligence. Just what role it plays in the development of artificial intelligence is a big question that is not easily answered; some, however, see it as a milestone of vital importance to the field. Whatever the standing of the Turing test, we hope it is evident from the transcripts presented that it is certainly not a trivial, simple exercise: indeed, it is a surprising indication of how humans communicate and of how other humans (the judges) can be easily fooled.
In this article we set out to give an up-to-date perspective on an important aspect of artificial intelligence research, namely human-machine communication. It is critical to note that such a study involves humans as conversationalists and respondents as well as machines. Yes, we can witness how machine conversation is steadily improving in terms of its human-like nature, but we also have to take into account the humans involved in the conversing, with all of their fallibilities and odd reasoning. For the machine developers these aspects give rise to particular features in their conversational programs. It is worth remembering that the machines do not have to be perfect; rather, they have to be human-like.
Answers To The Reader’s Test
In the first conversation, having gone well initially, the discourse fell apart in the last few lines, perhaps because the hidden entity steered the conversation in their own direction, dominating it and throwing in a reference to “Ghost in the Shell,” which the judge had not heard of. The main issue seemed to be a generation gap: the hidden entity was an adult male human, and this possibly caused the female teenage judge to misclassify him as a machine.
In the second conversation, the judge was slow to get going and the discourse was relatively short. The entity responded appropriately and on topic to each point. The entity also steered the conversation on each occasion and even threw in a humorous comment about a guinea pig. The reference to a chatter bot could have been a giveaway, but the interrogator did not spot it. The entity concluded by taking over the conversation and asking the interrogator a question. In this discourse the interrogator misclassified the machine entity as human.
Acknowledgments
The authors would like to thank those whose financial support made the Bletchley Park and Royal Society tests possible, as well as the elite machine developers involved. An earlier version of this article was presented at the 7th International Conference on Agents and Artificial Intelligence, Lisbon, Portugal, January 2015 (Warwick and Shah, 2015c).
References
— Dennett, D. 1998. “Can machines think?” In Foundations of Cognitive Philosophy, D. J. Levitin (ed.). Cambridge, MA: MIT Press, Chapter 3.
— Dreyfus, H., and Dreyfus, S. 2009. “Why computers may never be able to think like people.” In Readings in the Philosophy of Technology, D. M. Kaplan (ed.). Lanham, MD: Rowman & Littlefield, Chapter 25.
— Fowler, H., and Fowler, F. (eds.). 1995. The Concise Oxford Dictionary of Current English. Oxford: Clarendon Press, 9th ed., 486.
— Harnad, S. 1992. “The Turing test is not a trick: Turing indistinguishability is a scientific criterion.” ACM SIGART Bulletin 3(4): 9–10.
— Hayes, P., and Ford, K. 1995. “Turing test considered harmful.” Proceedings of the International Joint Conference on Artificial Intelligence 1, Montreal: 972–977.
— Hodges, A. 1992. Alan Turing: The Enigma. New York: Vintage Press.
— Minsky, M. 1982. “Why people think computers can’t.” AI Magazine 3(4): 3–15.
— Penrose, R. 1994. Shadows of the Mind. Oxford: Oxford University Press.
— Searle, J. 1997. The Mystery of Consciousness. New York: New York Review of Books.
— Shah, H. 2010. “Deception-detection and machine intelligence in practical Turing tests.” PhD Thesis, University of Reading, UK.
— Shah, H., and Warwick, K. 2010a. “Testing Turing’s five minutes, parallel-paired imitation game.” Kybernetes 39(3): 449–465.
— Shah, H., and Warwick, K. 2010b. “Hidden interlocutor misidentification in practical Turing tests.” Minds and Machines 20(3): 441–454.
— Turing, A. 1950. “Computing machinery and intelligence.” Mind LIX(236): 433–460.
— Warwick, K., Shah, H., and Moor, J. 2013. “Some implications of a sample of practical Turing tests.” Minds and Machines 23(2): 163–177.
— Warwick, K., and Shah, H. 2014a. “Good machine performance in Turing’s imitation game.” IEEE Transactions on Computational Intelligence and AI in Games 6(3): 289–299.
— Warwick, K., and Shah, H. 2014b. “Assumption of knowledge and the Chinese Room in Turing test interrogation.” AI Communications 27(3): 275–283.
— Warwick, K., and Shah, H. 2015a. “Human misidentification in Turing tests.” Journal of Experimental and Theoretical Artificial Intelligence 27(2): 123–135.
— Warwick, K., and Shah, H. 2015b. “Can machines think? A report on Turing test experiments at the Royal Society.” Journal of Experimental and Theoretical Artificial Intelligence. doi:10.1080/0952813X.2015.1055826.
— Warwick, K., and Shah, H. 2015c. “Intelligent agents: conversations from human-agent imitation games.” Proceedings of ICAART, Lisbon: 261–268.
— Warwick, K., and Shah, H. 2016a. “Effects of lying in practical Turing tests.” AI & Society 31(1): 5–15.
— Warwick, K., and Shah, H. 2016b. “The importance of a human viewpoint on computer natural language capabilities: A Turing test perspective.” AI & Society 31(2): 207–221.