How Machine-Learning can help students to tackle academic literature


Most college students today would agree that keeping on top of their reading list is challenging. Developing the skills to critically read and analyse academic literature is crucial to students’ success, but many get overwhelmed, not just by the volume of reading but by the complexity of the material. Techniques such as skim-reading, when done effectively, can help students get some insight into a new text, but many fall into the trap of diving straight into cover-to-cover reading, only to give up quickly in frustration.

This long-standing challenge is now starting to be tackled with machine learning (ML). The capability now exists for research papers and other lengthy academic texts to be mined for information pivotal to understanding, and presented to students in a more accessible, non-linear way.

But first, let’s dive a little deeper into the challenges with current undergraduate research methods.

Information anxiety

The ‘information anxiety’ [1] that many students experience at college or university can have a very real impact on their academic performance. Developing strong research and analytic skills is vital, but the way in which reading is approached can influence the acquisition of knowledge and overall understanding of a subject.

Some common problems are:

1.   Finding reliable, relevant primary and secondary sources

Gen Z students have grown up finding information through internet searches, and while that has its benefits when it comes to acquiring knowledge, there are obvious pitfalls in the form of fake or unreliable information. The first port of call for any student should be the reading list provided by their course tutor (which will also include wider reading titles), but at some point they will also need to conduct sustained, independent research, for example for their dissertation. This is where the volume of freely available information becomes a double-edged sword: access to academic literature is greater than ever, but locating the most useful, relevant, and reliable material is increasingly difficult.

Recent estimates [2] suggest that as many as 2 million research papers are published every year, and the number is only expected to grow. The challenge then for students is to critically screen the growing volume of available literature and discern not only what is most relevant to their research, but also what is most reliable.

2.   Distilling papers and chapters into meaningful knowledge

Even when they’ve identified and downloaded relevant research papers, students are faced with the daunting task of converting them into actionable knowledge. Screening the abstract and skim-reading the full text can be a starting point here, but a lot of crucial information about methodology, findings, limitations, and future work can be missed using this approach. Ultimately, the goal for students at this stage should be to know enough about a study or piece of research to be able to concisely articulate it, and its implications, in their own words – the acid test of understanding.

3.   Identifying and analysing significant statistics

Locating and interpreting relevant statistical and technical information in research literature can also prove tricky when you’re trying to screen lots of texts to find the most valuable ones. Results will rarely be covered comprehensively in the abstract, and simply jumping to the results section can leave you without important context, which could completely alter your interpretation of the findings of a study. While it can be useful to tackle the full text section by section, sometimes in a non-linear way, it’s crucial to consider each part in the context of the whole paper to avoid gaps in knowledge.

How can machine learning help?

Reading technology that employs machine learning (ML) and natural language processing (NLP) is ‘trained’ to identify patterns in sequences of words and phrases from vast amounts of text. This training results in something called a language model, which embeds knowledge in the form of word and phrase probabilities. This model is then ‘fine-tuned’ to use that knowledge for a particular task. For example, a summarization model might have been fine-tuned to learn how sentences in a research paper map to sentences in its abstract, expressed as a probability distribution. The model will use this probability distribution to generate a concise summary of a previously unseen paper.
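To make the idea of scoring text by word probabilities more concrete, here is a deliberately tiny sketch in Python. It is not a trained language model and not how any particular product works: it assumes a simple word-frequency heuristic, a hand-picked stop-word list, and naive sentence splitting, and the names `summarize`, `tokenize`, and `split_sentences` are purely illustrative.

```python
import re
from collections import Counter

# A tiny stop-word list (an assumption; real systems use far richer resources).
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are", "was",
              "were", "that", "this", "it", "for", "on", "with", "as", "by", "we"}


def split_sentences(text):
    """Naively split text into sentences on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lower-case the sentence and keep alphabetic words that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    words = [w for s in sentences for w in tokenize(s)]
    if not words:
        return []
    counts = Counter(words)
    total = len(words)
    # Probability of each word within this document (simple relative frequency).
    prob = {w: c / total for w, c in counts.items()}

    def score(sentence):
        tokens = tokenize(sentence)
        # Average word probability, so long sentences are not favoured automatically.
        return sum(prob[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]


text = ("Sleep improves memory. Students who sleep more recall more. "
        "The weather was nice.")
print(summarize(text, max_sentences=1))
```

Sentences built from the words that recur most often in the document score highest. A fine-tuned model learns a far richer version of this signal from many paper and abstract pairs, and can generate new sentences rather than only selecting existing ones.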

A summarization model can be integrated with other models that have been trained to identify key terms, mentions of study participants, key methods, significant findings, and statements that discuss how the results compare with previous work. Such a combination of models provides more than a simple summary – it also highlights actionable knowledge such as:

The key result

A single-sentence summary that encapsulates the main takeaway message, in plain language, with, for example, complex terms defined and explained.

‘What’, ‘who’, and ‘why’

What research was done, with whom, what was found, and why are the results important? How many participants were selected and from which larger sample? How many dropped out or were excluded? Which statistical methods were used to analyse the results? Some of this should be in the abstract, but machine learning can instantly highlight this information in the body of the paper, so you don’t have to search for it.

Reliability and validity

How does the study build on methods used previously? How do the results compare with what was previously known? Do they improve on them, confirm them, or differ from them? What were the limitations, and what needs to be done next? A good paper should always include this information – but with machine-learning-assisted reading, you can jump straight to it, rather than having to scan through many pages to find out whether it is there.
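As a rough illustration of how sentences might be routed to categories such as participants, methods, limitations, and comparisons with earlier work, the sketch below uses hand-written patterns. This is an assumption made for the sake of a short example: the systems described above use trained models rather than fixed rules, and the `CUES` table and `highlight` function are hypothetical names.

```python
import re

# Hypothetical cue patterns; a trained model would learn such signals from labelled papers.
CUES = {
    "participants": re.compile(r"\b\d+\s+(participants|patients|students|subjects)\b", re.I),
    "method": re.compile(r"\b(t-test|ANOVA|regression|chi-squared?)\b", re.I),
    "limitation": re.compile(r"\b(limitations?|future work|further research)\b", re.I),
    "comparison": re.compile(r"\b(consistent with|in contrast to|previous (work|studies))\b", re.I),
}


def highlight(sentences):
    """Map each cue label to the sentences that match its pattern."""
    found = {label: [] for label in CUES}
    for sentence in sentences:
        for label, pattern in CUES.items():
            if pattern.search(sentence):
                found[label].append(sentence)
    return found


paper = [
    "We recruited 120 students from two universities.",
    "Scores were compared using a t-test.",
    "Our results are consistent with previous studies.",
    "A limitation is the small sample.",
]
for label, matches in highlight(paper).items():
    print(label, "->", matches)
```

Fixed patterns like these miss paraphrases and will mislabel some sentences, which is one reason learned models are used in practice.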

Machine-learning assisted reading is the future

With an estimated 2 million papers being published each year, it is becoming increasingly challenging for college students to keep up to date with relevant and citable research. Summarization technology can help solve this problem by presenting the most important information as bite-sized knowledge in context.

With Scholarcy, students and researchers can harness the power of machine learning to turn their reading lists and research collections into summary flashcards that highlight the core concepts, key learning points, and critical analyses. This will help them to identify and retain the key facts, faster.


1. Bawden, D. and Robinson, L. (2009) ‘The dark side of information: overload, anxiety and other paradoxes and pathologies’, Journal of Information Science, 35(2). Available at: https://journals.sagepub.com/doi/abs/10.1177/0165551508095781

2. Altbach, P. and de Wit, H. (2021) ‘Too much academic research is being published’, University World News. Available at: https://www.universityworldnews.com/post.php?story=20180905095203579