14 December 2021

AI Reading Comprehension Systems – the Problems and Progress

Estimated reading time: 5 minutes

Reading comprehension is a skill we learn during our early school years. It enables us to understand the meaning of what we read. It is also recognised as important for AI systems, since they can engage better with users when they can comprehend what users mean or intend. For example, search engines can deliver more concise and relevant answers if they can comprehend the meaning of user queries. Search engines usually answer queries by showing lists of Web sites ranked according to their perceived importance. However, these lists often contain many irrelevant references unless the user’s intended meaning is fully understood. Accessing precise information that matches the intended meaning is crucial to successful systems. Reading comprehension systems have many other uses, such as in chatbots and virtual agents, and in reading road signs in autonomous cars.

Reading comprehension is a skill that enables us to understand the meaning of something that we read

There are some moderately successful AI reading comprehension applications in current use. In this article, I give some brief insights into how they work and describe some benchmark tests in use – some of which claim performance levels exceeding human capabilities. Many question these claims since current AI systems still fall short on semantic understanding.

The chasm between AI and human understanding

Reading comprehension poses a formidable challenge for AI systems because it exemplifies the chasm between humans and AI: a lack of understanding. This inability of AI systems to understand as humans do is a difference that some say is unbridgeable, because the machine will never understand semantics and human intentions in the way humans do. However, while it may be true that AI systems do not understand the meaning of language as humans do, that does not preclude them from performing tasks that simulate certain levels of understanding. For example, suppose I said that my friend could run the 100 metres in less than 10 seconds. We would infer from this statement that my friend is a good athlete. It may be tempting to believe that the machine would need a similar understanding of such life events to draw the same conclusion. But drawing this type of inference is not beyond the capabilities of AI systems, because knowledge describing the relationship between attainment and achievement level could be encoded, and inferences then made that reflect some forms of human understanding. Tasks of this kind could be implemented if a relatively small subset of natural language is used in a specific domain – such as chatbot sales assistants.
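
The idea of encoding knowledge that links attainment to achievement level can be sketched in a few lines of Python. This is only an illustration: the rule table, the thresholds, the labels and the function name infer_level are all assumptions invented for the example, not taken from any real system.

```python
# Hypothetical knowledge base: (event, time limit in seconds, conclusion).
# The thresholds and labels are illustrative assumptions, not official standards.
# Rules are ordered from the strictest limit to the most lenient one.
RULES = [
    ("100m", 10.0, "good athlete"),
    ("100m", 13.0, "fit amateur"),
]


def infer_level(event: str, seconds: float) -> str:
    """Return the first conclusion whose time limit the performance beats."""
    for rule_event, limit, conclusion in RULES:
        if rule_event == event and seconds < limit:
            return conclusion
    return "no conclusion"


print(infer_level("100m", 9.9))   # good athlete
print(infer_level("100m", 12.0))  # fit amateur
```

A table like this only covers the narrow domain it was written for, which is why the approach suits restricted settings better than open-ended language.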

How AI reading comprehension systems work

Most reading comprehension AI systems work by reading queries, comprehending them, and providing answers. The user asks questions about written sections of text in a particular document (or perhaps the results of a search of the World Wide Web), and the answers are given in a presentable, concise format. There are many commercially available AI systems that can read and comprehend text at various levels of ability. Well-known examples include Alexa and Siri. In the case of Alexa, a user can ask a question like: “Alexa, for how long was Lloyd George UK Prime Minister?”. Alexa might reply with an answer like: “Lloyd George was Prime Minister for five years and ten months.” This is one of the simplest types of AI reading comprehension task because Alexa is merely extracting the relevant sections of text. The text may be read from Wikipedia documents that are related to Lloyd George and presented in a concisely re-arranged format. This type of reading comprehension is called knowledge extraction and does not require a great deal of language understanding. Other Web-based systems are variations that use written queries. For example, Microsoft uses a Web system that allows the input of a document in one pane and the question to be asked in another, with the answer then shown in a third pane on the same screen. Again, this is done mainly using knowledge extraction.
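
To give a feel for why knowledge extraction needs little language understanding, here is a toy sketch that answers a question by returning the sentence of a passage that shares the most content words with it. It is not how Alexa or the Microsoft system works. The stopword list, the function names and the hand-written sample passage are assumptions made for illustration.

```python
import re

# Small hand-picked stopword list (an assumption for this example).
STOPWORDS = {"a", "an", "the", "was", "is", "for", "how", "long",
             "of", "in", "to", "and", "who", "what", "when"}


def tokens(text: str) -> set[str]:
    """Lower-case the text and return its content words as a set."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def extract_answer(question: str, passage: str) -> str:
    """Return the passage sentence sharing the most content words with the question."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]
    q = tokens(question)
    return max(sentences, key=lambda s: len(q & tokens(s)))


# Hand-written sample passage, standing in for a retrieved document.
PASSAGE = (
    "David Lloyd George was born in Manchester in 1863. "
    "Lloyd George was Prime Minister of the United Kingdom from 1916 to 1922. "
    "He died in 1945."
)

print(extract_answer("For how long was Lloyd George UK Prime Minister?", PASSAGE))
```

The sketch finds the right sentence purely by word overlap. Turning “from 1916 to 1922” into “five years and ten months” would need a further step that this example does not attempt.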


But for other queries, understanding may become a necessary prerequisite, inasmuch as the meaning of the sentence may be unclear. For example, consider the following statement:

 The drill doesn’t fit into the box because it is too big.

What does “it” refer to in this statement? Most human readers would assume that the object of reference here is the drill from our common sense perception of these objects.

 The drill doesn’t fit into the box because it is too small.     

But what about this statement? What does “it” refer to here? The only change from one sentence to the other is that the word “big” has become “small” at the end of the sentence. In this case, we would infer from our common-sense understanding that “it” refers to the box.

However, this is a difficult decision for an AI system to make without common-sense world knowledge, particularly an understanding of the concept of size – i.e., the difference between big and small.
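
The sketch below shows how much has to be spelled out by hand to resolve “it” in just this one sentence pattern. The regular expression, the function name resolve_it and the single size rule are assumptions made for illustration; a system like this would fail on almost any other sentence.

```python
import re

# Matches only the toy sentence pattern used in the two examples above.
PATTERN = re.compile(
    r"The (?P<item>\w+) doesn[’']t fit into the (?P<container>\w+) "
    r"because it is too (?P<size>big|small)\."
)


def resolve_it(sentence: str) -> str:
    """Return the noun that "it" refers to in the toy sentence pattern."""
    match = PATTERN.fullmatch(sentence.strip())
    if match is None:
        raise ValueError("sentence does not match the toy pattern")
    # Hand-coded common sense: fitting fails when the item is too big
    # or when the container is too small.
    return match["item"] if match["size"] == "big" else match["container"]


print(resolve_it("The drill doesn’t fit into the box because it is too big."))    # drill
print(resolve_it("The drill doesn’t fit into the box because it is too small."))  # box
```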

As another example of the problems AI systems have in interpreting natural language, consider the ambiguity in the sentence below:

We saw her duck.

This sentence could mean that the writer saw the duck belonging to the woman, or it could mean that the writer saw the woman duck to avoid being hit by an object hurled in her direction, or it could even mean that we use a saw to cut her duck into parts. Humans would be able to resolve this from the context in which the sentence is used. For example, if it were taken from a paragraph that included a previous sentence referring to an object thrown towards the woman, then we would conclude that it means we saw her duck to avoid the object. To avoid ambiguity, AI reading comprehension systems would similarly need to infer the intended meaning from the context.
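
As a rough illustration of using context to choose between the readings of “We saw her duck”, the sketch below scores each reading by how many of its cue words appear in the surrounding text. The cue-word lists and the function name disambiguate are invented for this example, and real systems learn such associations from data instead of relying on hand-written lists.

```python
import re

# Each reading is paired with hypothetical cue words that would support it.
READINGS = {
    "we saw the duck that belongs to her": {"pond", "pet", "bird", "feathers", "quack"},
    "we saw her lower her head to dodge something": {"thrown", "threw", "hurled", "ball", "dodge"},
    "we cut her duck with a saw": {"cut", "carve", "blade", "workshop"},
}


def disambiguate(context: str) -> str:
    """Pick the reading whose cue words overlap most with the context."""
    words = set(re.findall(r"[a-z]+", context.lower()))
    scores = {reading: len(cues & words) for reading, cues in READINGS.items()}
    best = max(scores, key=scores.get)  # ties go to the reading listed first
    return best if scores[best] > 0 else "ambiguous without more context"


print(disambiguate("Someone threw a ball towards the woman."))
print(disambiguate(""))
```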

There are many other difficulties with AI comprehension, including the use of aphorisms and metaphor, and the subtle understanding of a writer’s intentions – especially when interpreting prose and poetry. Other problems include identifying coded criticisms of others, humour, and much more. Humans are much better prepared than machines for the formidable challenges posed by communication in natural language. This was understood in the 1960s when AI research into language translation projects began. In the early days of this research, systems were built using a rule-based approach, with rules governing the use of nouns, verbs, and so on. This approach worked well for understanding the structure of sentences (i.e., the grammar), but failed with the semantics (i.e., the meaning of a sentence).

Humans can answer questions from the context

The systems implemented nowadays use deep learning. In such systems, the learning takes place using hundreds of thousands of paragraphs (usually from Wikipedia). A paragraph, along with a question about it, is given as input, and the output is the deep learning network’s prediction of the answer. The deep learning approach has outperformed humans in some comprehension tasks, according to some of the tests that have been developed for measuring the effectiveness of AI reading comprehension systems. Another success came in 2019, when a system from Alibaba, the Chinese technology company, outperformed humans when tested on a dataset developed by Microsoft.
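
One common way such a network can express its prediction, assumed here and not described in this article, is to give every token in the paragraph a “start” score and an “end” score and then return the span with the highest combined score. The sketch below shows only that final selection step. The scores are made-up numbers standing in for a trained network’s output, and best_span and max_len are hypothetical names.

```python
def best_span(start_scores, end_scores, max_len=10):
    """Return (start, end) token indices maximising start score + end score."""
    best, best_score = (0, 0), float("-inf")
    for i, start in enumerate(start_scores):
        # Only consider spans that end at or after the start token
        # and contain at most max_len tokens.
        for j in range(i, min(i + max_len, len(end_scores))):
            score = start + end_scores[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best


words = "Lloyd George was Prime Minister from 1916 to 1922".split()
# Made-up scores, one per token; a real system would compute these.
start = [0.1, 0.0, 0.0, 0.2, 0.0, 0.3, 2.5, 0.1, 0.4]
end = [0.0, 0.1, 0.0, 0.0, 0.3, 0.0, 0.5, 0.2, 2.8]

i, j = best_span(start, end)
print(" ".join(words[i:j + 1]))  # 1916 to 1922
```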


Some tests have been developed for measuring AI reading comprehension performance. One test that has become the de facto standard is called SQuAD (the Stanford Question Answering Dataset). The test has its origins at Stanford University, California, and works by using paragraphs taken from Wikipedia articles. Each of these paragraphs comes with questions and answers supplied by paid human workers, known as Mechanical Turk workers. AI systems can then be tested against the answers to these questions, and their results compared with those of humans or of other AI reading comprehension systems.
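
SQuAD results are usually reported as an exact-match score and a word-overlap F1 score against the human answers. The functions below are a simplified sketch in that spirit, not the official evaluation script. The normalisation steps and the function names are assumptions chosen for illustration.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lower-case, strip ASCII punctuation and articles, and split into tokens."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, truth: str) -> bool:
    """True when the two answers are identical after normalisation."""
    return normalize(prediction) == normalize(truth)


def f1(prediction: str, truth: str) -> float:
    """Harmonic mean of token precision and recall between the two answers."""
    pred, gold = normalize(prediction), normalize(truth)
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The five years", "five years"))
print(round(f1("five years and ten months", "five years"), 3))
```

Exact match rewards only answers that agree completely, while F1 gives partial credit when a system’s answer overlaps with the human one.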


AI systems are improving in reading comprehension but still lack the level of semantic understanding required to perform robustly. As with many deep learning algorithms, they work well but show signs of instability when given inputs that deviate from their training data. At present, they have some way to go before they can get close to a human level of competence. Nevertheless, they are time-saving and easy to use, particularly when there is a need to comprehend sections of lengthy documents, and, like many machine learning applications, they are constantly improving.

By Dr Keith Darlington
