Case Study:

Large Generic Company Switching from Manual to Automated Literature Screening

Martti Ahtola | March 12, 2025

In this case study, we look at specific aspects of a situation where a large generic company switched from manual literature screening to Tepsivo Literature.

But to be clear at the very beginning: do not expect this article to be just another piece of promotional material about our automated solution. In fact, what we want to share with you today is quite different from simply listing the benefits of the technology. Instead, let’s have a look at real customer concerns that arose from thorough assessment and daily use of the service, and at how we have had to address, analyse and explain them over the past years.

The use case of the generic company, one of our very important customers who had several concerns, questions and observations, is a good example of the most common pain points that companies might face when it comes to automation of literature screening. For this reason, we feel that this customer case and its learnings can be an insightful read for anyone who wants to get a better understanding of the implications that the use of automation has for this pharmacovigilance activity.

A lot of valid questions

Generally speaking, whenever a pharmaceutical company updates their processes, starts using new tools, or experiences a change in staff performing the activities, it is natural that there are questions before, during and after the change.

No surprise this is also the case when a pharmaceutical company changes from manual literature screening to automated literature screening – there can be several questions that the product safety department would want to know beforehand, and once the automated screening has started, there is often a set of new questions.

The enquiries are usually related to:

🧩 The journals that will be covered in the automated screening

🧩 Keywords that will be used in the search

🧩 Safety assessment process

🧩 Why certain articles were not identified

🧩 Why there are fewer (or more) safety reports than before

and many more…

These are all valid questions, and they are the real pain points when switching to a new way of medical literature monitoring and assessing safety data in the articles.

Often, the concerns have more to do with the way things used to be done, rather than with the automation. Sometimes the concerns are linked to an actual issue with automation, for example detection of certain keywords.

In any case, these all touch on topics that we know are very important and need to be addressed with care. This means not only explaining them to the customer in detail where a clear answer is available, but also challenging the automation process if the customer makes a point that potentially calls for an ad-hoc test or analysis.

And from time to time, such a situation arises.

A suspicious drop in the number of safety reports

Tepsivo has been automating literature monitoring since 2021, and ever since the beginning, we have had internal and external questions and concerns about the functionality of the automation technology.

There were concerns about the daily screening automation stopping without us noticing it, there were concerns about structural changes at the source (website structure changing) that would prevent us from receiving data, and there were concerns about picking up the drug safety relevant information in the large amounts of data scraped by the automation.
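The first two concerns can be pictured as a simple freshness check. The sketch below is only an illustration under our own assumptions, not a description of how Tepsivo Literature is built: the function name, the 36-hour threshold, the source names and the run log are all hypothetical.

```python
from datetime import datetime, timedelta

def find_stalled_sources(last_runs, now, max_age=timedelta(hours=36), min_records=1):
    """Return the sources whose last run is too old or returned too few records.

    last_runs maps a source name to (time the last run finished, records received).
    """
    stalled = []
    for name, (finished_at, record_count) in sorted(last_runs.items()):
        too_old = now - finished_at > max_age       # the daily run may have stopped
        too_few = record_count < min_records        # the site structure may have changed
        if too_old or too_few:
            stalled.append(name)
    return stalled

# Hypothetical run log for three sources.
now = datetime(2025, 3, 12, 8, 0)
last_runs = {
    "journal-a": (datetime(2025, 3, 12, 2, 0), 14),  # ran six hours ago, data received
    "journal-b": (datetime(2025, 3, 9, 2, 0), 11),   # no run for three days
    "journal-c": (datetime(2025, 3, 12, 2, 0), 0),   # ran, but received nothing
}
print(find_stalled_sources(last_runs, now))  # ['journal-b', 'journal-c']
```

A source that returns nothing is not necessarily broken, since a quarterly journal will have many empty days, so in practice a flag like this would prompt a manual look rather than prove a fault.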

With those concerns assessed and addressed with corresponding control mechanisms, we recently performed an assessment for a generic company using our literature monitoring and literature review tool, Tepsivo Literature, whose main concern was that our solution may be missing articles that contain safety relevant information.

This is because the company was used to receiving large numbers of safety reports from the countries that were now covered by Tepsivo Literature, but now the number of safety reports had dropped significantly, even as the number of identified articles with potentially safety relevant data was large, much larger than it had been for the manual literature screening.

The assessment plan outline

Together with the customer we agreed on a plan to assess:

1) the safety reports they had received previously from the manual local literature screening,

2) the current automated literature screening process,

3) the safety assessment process, a service which was also covered by Tepsivo.

Before this particular plan we had already assessed the journals and other sources that the customer was screening manually before switching to Tepsivo Literature.

During this process, we reported which sources could be screened automatically, which journals were obsolete, and which could not be automatically screened (either because they were published as hard copies only or the journals were non-scientific, commercial-driven publications that allowed access to their article information only for paying subscribers).

We performed this analysis for two main reasons:

1) Our customer asked us to perform it, and we want to keep our customers happy.

2) While there have been unnecessary concerns about automation since the late 18th century and the beginning of the Industrial Revolution, there are also relevant concerns.

We wanted to demonstrate with this analysis that automated literature screening is reliable and provides better quality results than manual literature monitoring.

Full compliance, insightful learnings

Before diving deeper into the individual lessons learned, here is the summary: the result of the analysis was to continue with Tepsivo Literature and the process as before. No changes were necessary.

Of course, this might not be a surprising result, given that we developed the solution with the goal of providing a significantly better alternative to manual screening. Tepsivo’s search strategy is in full compliance with the EU legislation and EU GVP, concentrating on finding matches for the brand names, active substances and related safety terms. Therefore, rather than questioning the functionality itself, the interesting part of the analysis was revealing the case-specific explanations that answered the client’s doubts.

The points below can help in understanding what companies, particularly well-established ones, may be concerned about after moving from a traditional way of doing things to a modern-day approach.

Number of hits and safety relevant articles

The main challenge with this analysis was that the main data sets to compare were not really comparable.

You cannot compare the number of safety hits in journal X from 2022 to 2023 and expect them to be the same. In addition to that, the actual list of sources being screened had also been updated, because Tepsivo Literature is able to cover a much larger set of sources.

Comparing apples to oranges

The variables that had changed:

1) Dataset

 – Articles from 2022 vs. articles from 2023

 – Different list of sources

2) Keywords

3) Monitoring process

4) Safety assessment process

In general, we should not expect that the past predicts the future. In some cases, the past data might give some indication of what can be expected from the future, but in this case, too many variables had changed to make a reliable comparison.

Simple workaround

So instead, we decided to assess the past data from the perspective of current processes. We created an artificial data set that mimics the data a manual search would find, to see what articles and safety data would have been identified during 2023.

The safety assessment process for detecting safety relevant articles during the manual literature screening and later in the automated literature screening varied and this was the main reason behind the difference in number of safety reports.

The customer had picked up several articles as safety relevant that would have been marked as safety relevant neither by the Tepsivo Literature automation nor during the safety assessment by our PV Specialist.

In short, the main reason for the difference in the number of safety relevant articles was that the titles and abstracts of the articles, the first layer of literature screening, were missing the relevant keywords and sentences that would indicate that product related safety relevant information is discussed within the article.

There are a couple of different sides to this one main reason: picking up articles based on something other than standard drug safety relevant terminology (COVID-19), and picking up articles through laborious and non-harmonized manual reading processes, where the reviewer reads through full journals published locally.

Customer’s safety assessment process

To compare Tepsivo’s Safety Assessment process to the Customer’s Safety Process, we reviewed 25 safety reports from a list provided by the customer.

These 25 cases were all literature cases based on articles from journals that had previously been covered by manual literature screening; the local literature screening for these countries is now automated by Tepsivo Literature.

Non-standard ways to identify relevant articles

Some of the 25 safety reports provided by the customer were from articles related to COVID-19. These articles did not meet the standard requirements for drug safety reports for the customer’s products based on the title and abstract.

We assumed that the company had had special rules related to “COVID-19” that had overridden the standard rules for identifying safety relevant information. COVID-19 related safety reports might or might not be relevant to periodic pharmacovigilance assessment in the future.

In general, COVID-19 multiplied the number of adverse event reports sent to the local regulatory authorities in many countries, and COVID-19 reports now make up large portions of the adverse event reports in central pharmacovigilance databases such as FAERS and EudraVigilance.

Many of the example reports provided by the customer contained the contextual keyword “case report” without any product related keywords in the title or abstract. This means that “case report” had also overridden the basic rules of identification of potential safety relevant articles and had warranted a full article review.

It was noticed during the assessment that some of the abstracts contained terms that were most likely detected by a PV Specialist reading the text (and not by keyword search) as potentially safety relevant, but would not be part of a standardized search string.

Such terms include, for example, specific adverse event terms and alternative terms for drugs, drug groups or treatments. In a standardized string-based keyword search, adverse events are searched with terms such as “adverse event” or “toxicity” but not necessarily with specific terms such as “headache” and “pain”. Products can be referred to with terms other than the brand name or active pharmaceutical ingredient. For example, general terms such as “pain relievers” or “antibiotics” may be used instead of a specific substance such as “paracetamol”.
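To make the difference concrete, here is a minimal sketch of a string-based title and abstract screen. It is an illustration only: the term lists, the function names and the rule that a hit needs at least one product term and one safety term are our assumptions for the example, not the actual search string used in Tepsivo Literature.

```python
# Illustrative term lists (assumptions, not a real search string).
PRODUCT_TERMS = ["paracetamol"]
SAFETY_TERMS = ["adverse event", "toxicity"]

def contains_any(text, terms):
    """True if any term occurs in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return any(term in lowered for term in terms)

def is_potential_hit(title, abstract):
    """Flag an article when title or abstract names a product term and a safety term."""
    text = f"{title} {abstract}"
    return contains_any(text, PRODUCT_TERMS) and contains_any(text, SAFETY_TERMS)

# Found by the standardized search: product name and a generic safety term are present.
print(is_potential_hit(
    "Hepatic toxicity after paracetamol overdose",
    "We describe adverse events observed in three patients.",
))  # True

# Missed by the standardized search: a human reader may see "pain relievers" and
# "headache" as relevant, but neither is part of the term lists.
print(is_potential_hit(
    "Persistent headache in a patient using pain relievers: a case report",
    "The patient reported daily use of over-the-counter analgesics.",
))  # False
```

The second article is the kind of hit that a PV Specialist reading the text might pick up while a standardized search string would not.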

If these reports have been identified from the literature with different standards compared to “normal” cases, it should be noted during the review of literature in signal detection and preparation of aggregate safety reports.

There is safety relevant information in the article, but it can be found only in the full article

As per our assessment of those articles including the safety relevant information, about 90% of them did not contain relevant keywords in the title or abstract that would have suggested that further analysis of the full text article for safety purposes would be required.

This highlights one issue with manual local literature screening outside of databases: there are a variety of ways in which the search is performed on the articles, whereas “global literature searches” are performed in one or more databases based on the titles and abstracts.

This does not mean that the articles do not contain safety data, nor does it mean that reading full articles of local journals is necessarily a bad thing. It does mean that there are different standards for identifying safety relevant articles in “local literature”: higher emphasis is put on articles from lower quality journals (those not providing their article data to the scientific community) and on articles that discuss topics other than the safety-risk profile of a medicinal product or substance (articles that do not mention safety topics in the title or abstract).

In the traditional manual local literature screening, more safety data is being reported from a smaller lower quality dataset compared to the more organized ways of searching safety data from literature.

There might be safety relevant information in the article, but it could not be confirmed

We could not review all of the 25 article titles and abstracts fully, either because it was unclear which article the hit referred to (article and journal not found in a Google search) or because the text was unavailable: it had been presented in a webinar or in a conference poster and was not available to people outside these individual events.

This issue can of course be avoided by attaching the source material to the safety reports, so it might not always be such a big issue, but if the journal or article cannot be found with a Google search, it should ring some alarm bells.

Does a journal that’s not found with a Google search (or other major search engine or large language model) seem like a trustworthy source to make decisions related to your product?

Journal and article coverage

Not everything can be screened with automation. Some non-scientific journals actively try to prevent their articles from being read by non-paying people and automation.

And this is fine. Not everything needs to be monitored to meet the legal requirements, nor should everything be monitored. We guide our customers to monitor journals that provide their article information to the scientific community in order to avoid low quality data that can easily ruin data sets. We will look at these different cases of “low-quality” literature sources below in more detail.

We received from our customer a list of safety reports they had tracked before starting to use Tepsivo Literature. The reports were for two specific countries where an especially high number of safety reports had been generated before Tepsivo Literature was implemented. The main concern was the low number of valid safety reports identified by Tepsivo Literature the following year in these two countries.

Journal coverage

During the review we identified that the list contained 109 safety reports from 95 unique articles. These 95 unique articles were from 48 different journals. 34 journals (71%) of these 48 journals were at that point covered by Tepsivo Literature.

The remaining 14 journals were then investigated to identify reasons why they were not covered by Tepsivo Literature and whether they could be added to the automation. The results of this investigation are described below.

It should be highlighted that, at the time we reviewed the source list that the generic pharmaceutical company had been screening manually, Tepsivo Literature automatically searched only about two thirds of the journals in which our customer had previously identified “case reports”. Even so, the total number of relevant journals covered by Tepsivo Literature in these two countries was much higher (by multiples of 10 or 100) than when the local literature was screened manually.

Not only was a much larger number of journals covered, but in general these were journals of much higher quality.

Article coverage

Of the 95 articles that had resulted in the creation of one or more safety reports at the customer’s pharmacovigilance department, 62 articles (65%) were from journals that were covered by the latest version of Tepsivo Literature, a couple of the articles were from journals covered by EMA’s Medical Literature Monitoring (MLM) service, and the others were from sources covered by neither Tepsivo Literature nor MLM.

Tepsivo Literature covers all or most of the articles covered by EMA’s MLM, but “global” literature screening was not in scope for this generic customer.
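The coverage figures quoted in this section can be reproduced with a few lines of arithmetic. The snippet below uses only the counts stated above; the helper name is ours.

```python
def coverage(covered, total):
    """Return the covered share as a percentage, rounded to the nearest whole number."""
    return round(100 * covered / total)

journals_total, journals_covered = 48, 34
articles_total, articles_covered = 95, 62

print(coverage(journals_covered, journals_total))  # 71
print(coverage(articles_covered, articles_total))  # 65
print(journals_total - journals_covered)           # 14 journals left to investigate
```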

Assessment of journals not covered by Tepsivo Literature

Journal is behind a paywall

By paywall, we mean publications that do not share their contents with the public at all, or share them only in a very limited way (for example titles, a table of contents or the cover). The information about the contents of the articles is available only to paying subscribers.

It is important to note that this is completely different from paid articles, which are common practice in scientific journals. Most or all paid scientific journals provide the abstracts of their articles on their website and to databases. This way the most important content of the articles is available for review and comment by the scientific community.

If the journal’s contents are fully behind a paywall, it should not matter if the journal claims to be “scientific”, “peer-reviewed” or “for specialists.” If the abstracts are not available, it is impossible to judge the quality and reliability of the content.

Many of these kinds of journals are membership journals of local associations (pharmacist association, national association of doctors) and the journals are a membership benefit. In other situations there might be purely economic reasons motivating the publisher.

Setting up an automated search for a journal behind a paywall is technically possible, but it can be difficult. Even if for some reason we decided to automate the screening of such a journal and set up an automation that logs in to the website and searches the journal contents, there would be another big issue related to the paywall: the user license.

Usually, when a person subscribes to a journal, they enter into an agreement with the publisher that states something along the lines of “this is for your personal use only, do not use for commercial purposes, do not share with others.” In practice this would mean that any scraping of such a journal would be in violation of the subscription agreement.

Issue of the past – journal with PDF download only

As with paywalled and print journals, if the journal is published as a PDF only and the abstracts of the articles cannot be searched via the internet (read: Google), the journal’s importance can be considered much lower than that of a publication that shares its abstracts in a format that is searchable online.

Excluding this type of journal from the list of sources could be considered.

It is technically possible to set Tepsivo Literature to download the PDFs, use OCR on them and enter them into the system’s article database in the same way as any other article, but it should be considered whether publications of this type are of the quality that should be screened.

As of the latest version of Tepsivo Literature, this issue has been overcome by adding both fully automated PDF screening and a hybrid screening of PDFs.

Source information is incorrect

One issue detected in the testing we performed is that Tepsivo Literature’s source catalog depends on the indexing information provided by the database providers (PubMed, Embase, DOAJ, OpenAIRE etc.) and / or journal publishers or sometimes the journals themselves if a small local journal does not have a publisher.

The information source may claim that the journal is indexed in a certain national or global database, but when we searched the database for articles of a specific journal, we found that it contained no articles from that journal, even though we could confirm from an alternative source, for example the journal’s website or a library, that issues and articles had been published.

As Tepsivo Literature covers a large number (tens of thousands) of journals, Tepsivo does not regularly confirm for each journal separately whether the articles are actually available in each of the databases. The assumptions are that:

– The journal is not discontinued.

– The journal, publisher and/or the database indexing the journal provide correct information about the journal.

The risk that the journal, the database or both provide incorrect information needs to be taken into consideration when performing source review activities.
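A check of this kind can be sketched as a comparison between what the catalog claims and what is actually found. The example below is a hypothetical illustration: the function name, the journal names and the counts are made up, and in practice the observed counts would come from querying each database, which the sketch does not attempt.

```python
def find_unverified_journals(claimed_indexing, observed_counts):
    """Return (journal, database) pairs where indexing is claimed but no articles were found.

    claimed_indexing maps a journal to the set of databases said to index it.
    observed_counts maps (journal, database) to the number of articles actually found.
    """
    flagged = []
    for journal, databases in sorted(claimed_indexing.items()):
        for database in sorted(databases):
            # A missing entry is treated the same as zero articles found.
            if observed_counts.get((journal, database), 0) == 0:
                flagged.append((journal, database))
    return flagged

# Hypothetical catalog claims and search results.
claimed_indexing = {
    "Journal A": {"PubMed", "DOAJ"},
    "Journal B": {"Embase"},
}
observed_counts = {
    ("Journal A", "PubMed"): 120,
    ("Journal A", "DOAJ"): 0,
}
print(find_unverified_journals(claimed_indexing, observed_counts))
# [('Journal A', 'DOAJ'), ('Journal B', 'Embase')]
```

A flagged pair does not prove that the journal has been discontinued; it only marks a source whose indexing information should be confirmed from the journal’s website or a library.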

These types of fail-safe and issue-specific activities are actually the top improvement tasks for Tepsivo Literature to ensure we cover all bases, just in case. While these features do not provide additional benefit to the monitoring of the drug safety profiles, they give additional peace of mind to those who put their trust in the automation.

🙏 We would like to use this opportunity to remind all those who take part in publishing life sciences journals that if you stop publishing your journal or no longer provide the article information to databases or libraries, please remember to communicate it to those parties and also announce it on your own website.

Even if your journal is meant for the small community of local paediatricians, there might be 100 pharmaceutical companies who rely on the data you have provided to be correct and up to date.

Conclusion

This case study of the generic company’s transition from manual literature screening to Tepsivo Literature’s automated system reveals several advantages and challenges inherent in each approach. In our view, Tepsivo Literature, already in its current version, offers the most comprehensive search available without incurring senseless costs to pharma companies and ultimately patients.

Automated literature screening like Tepsivo Literature provides a substantially broader and more consistent coverage of relevant journals, significantly increasing the number of articles identified for potential safety information. This wider net is crucial for ensuring that pharmaceutical companies do not miss emerging safety concerns.

The automation reduces the variability and subjectivity associated with manual screening, leading to more standardized and efficient processes aligned with current regulatory guidelines. Covering the largest dataset currently possible ensures legal compliance, and doing so with marginal human effort substantially reduces costs.

Some challenges remain. We mentioned the inability to automatically access content behind paywalls or in hard-copy-only journals, but considering the relevance of journals that do not share abstracts with the scientific community, or that do not even have an online version in the age of AI, this is unlikely to be limiting.

Moreover, the reliance on titles and abstracts for initial screening can miss safety-relevant information that requires full-text analysis, a step more common in manual processes. This is possible, but very rare, and full-text screening is not legally required, since the abstract assessment is fully sufficient.

Despite these limitations, the automated approach of Tepsivo Literature proves to be much more scalable and significantly less time consuming, freeing up valuable human resources for more complex safety assessments and massively saving costs. Ensuring accurate indexing and comprehensive coverage continues to be a focus for improvement, but the benefits of automation in terms of speed, scope, and compliance are evident.

Implementing a hybrid model that includes manual review of specific, inaccessible journals may address some shortcomings. Ultimately, balancing both methods may offer the most comprehensive solution for medical literature monitoring, combining the depth of manual review with the efficiency of automation.

While such a hybrid model would offer the most wide-reaching screening possible, it would also substantially increase the price compared to fully automated screening, and you would be right to ask why, since it makes no difference in terms of regulatory compliance.

Another topic that was not assessed in this case study, nor discussed above, is the automation of the safety assessment beyond keyword detection. In our view, the current large language models offered by companies like OpenAI, Microsoft and Google have become so accurate, and systematically more reliable than human specialists in safety assessment, that generic companies with large portfolios should soon consider fully automating the safety assessment (and data entry).

And an important update: new AI features are already included in Tepsivo Literature to overcome any potential shortcomings of the assessed version and keep the costs down.
