ASPECTS OF UTILIZATION AND LIMITATIONS OF ARTIFICIAL INTELLIGENCE IN DRUG SAFETY

It was once thought that computers could not perform tasks on their own and needed human intelligence, but this is now possible with the help of artificial intelligence (AI). AI has the potential to impact nearly every aspect of medical science. As pharmacovigilance (PV) deals with data concerning drug safety, it is considered a field that AI will transform enormously in the near future. This article reviews the research done to implement AI technologies in PV activities. Among the many PV activities, case processing is the most resource-consuming area, and signal detection is considered to be a poorly functioning area due to various limitations. Introducing AI could address the limitations in these areas and help focus resources on establishing the real-world risk-benefit ratio, giving a better understanding of the safety profile of drugs and enabling timely action for the well-being of people.


INTRODUCTION
The recent growth of artificial intelligence (AI) has had a tremendous impact on data science, and AI has huge scope for use in pharmacovigilance (PV), a field that needs alternative methods to cope with the increasing load of drug safety data accumulating as a result of developments in spontaneous reporting systems and the use of social media as a source of PV data [1]. The objective of this article is to give an overall review of the research done in this field.

AI
In simple terms, AI is a machine that has acquired human-like intelligence by learning to recognize patterns in large volumes of data, just as a child learns from its surroundings and becomes an intellectual human being. Fig. 1 illustrates that in traditional programming, a computer running a human-made program receives data and produces results, whereas in AI, the computer is trained on datasets containing a large volume of data together with the corresponding results, and produces the program itself. That program (the trained AI model) can then be used to produce results for various tasks such as classification (predicting the best category for the given data), regression (predicting a continuous outcome), and clustering [2]. This is made possible by the various machine learning (ML) techniques illustrated in Fig. 2, such as supervised and unsupervised learning, which are built on various mathematical (linear, nonlinear, or logistic regression), probabilistic (prior or unconditional probability, conditional probability, and Bayes' theorem), and statistical (Frequentist and Bayesian statistics) methods [3,4]. Most of the time, drug safety data are in an unstructured format (e.g., free text) that needs to be converted into a machine-readable format before the tasks can be performed. This can be done by an AI method called natural language processing (NLP). Likewise, optical character recognition is used to read handwritten documents, and AI-based speech recognition systems are used to capture information reported over the phone [5].
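The contrast between a human-written program and a program learned from labelled data can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the four training sentences are invented for the example, the function names (`train`, `predict`, `rule_based`) are hypothetical, and the word-count scoring is a deliberately crude stand-in for a real supervised ML classifier.

```python
from collections import Counter


def tokenize(text):
    """Split free text into lowercase words (a very crude form of NLP)."""
    return text.lower().split()


def rule_based(text):
    """Traditional programming: a human writes the decision rule by hand."""
    return "serious" if "hospitalised" in text.lower() else "non-serious"


def train(examples):
    """Supervised learning: derive per-label word counts from (text, label) pairs."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(tokenize(text))
    return counts


def predict(model, text):
    """Choose the label whose training vocabulary overlaps the text the most."""
    words = tokenize(text)
    scores = {label: sum(c[w] for w in words) for label, c in model.items()}
    return max(scores, key=scores.get)


# Toy labelled dataset, invented purely for illustration.
toy_training_data = [
    ("patient hospitalised with liver failure", "serious"),
    ("life-threatening anaphylaxis after injection", "serious"),
    ("mild headache resolved without treatment", "non-serious"),
    ("transient nausea mild and resolved", "non-serious"),
]
model = train(toy_training_data)
```

In this sketch the hand-written rule only recognizes the one keyword its author anticipated, while the trained model picks up whatever vocabulary the labelled examples happen to contain.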

PV
PV is the science and activities relating to the detection, assessment, understanding, and prevention of adverse effects of drugs or any other drug-related issues [6]. It involves various activities such as case processing, signal detection, and risk management (see Table 1 for explanations of the terminologies used in PV).

VARIOUS SOURCES OF ADVERSE EVENT REPORTS
Unsolicited (voluntary) sources include spontaneous reports by physicians and consumers, literature sources, and social media. Solicited (organized) sources include phase Ⅳ study reports, reports coming from health authorities and pharmaceutical companies, reports from patient support programs, and social media.

Utilization in case processing
Case processing demands a large human workforce and is therefore costly and time-consuming. It consumes most of the PV budget of a pharmaceutical company. Nearly all case processing activities could be performed using NLP and ML. For example, NLP would contextualize the unstructured data from various sources of PV and make it easier for ML to perform various tasks such as extracting individual case safety report (ICSR) information [8], validity evaluation [8], seriousness evaluation [9], expectedness evaluation, and causality assessment. ICSR information could be extracted from various documents such as medical literature, case narratives (case reports), medication review posts in social media, and electronic clinical records (e.g., discharge summaries). Medical Dictionary for Regulatory Activities (MedDRA) coding and WHO drug dictionary coding could be automated using ML along with rule-based systems [10]. AI along with rule-based systems could be used to check for duplicates and to categorize ICSRs (e.g., physician-reported ICSRs versus patient-reported ICSRs) for aggregate reporting. AI could be used to find sensitive information in free text so that case narratives can be anonymized and shared for PV purposes without compromising patient confidentiality. AI could also be used to screen voluntarily reported adverse events in order to identify serious events and exclude non-serious events [11]. This would save the time, cost, and manpower required for these activities. Introducing AI in PV also improves the quality and accuracy of drug safety results [12,13].

Sujith et al.
Some examples of research done in utilizing AI for case processing
• Abatemarco et al. [14] used data received by Celgene's drug safety department and applied deep learning approaches to ICSR processing. All 10 trained AI models reached the minimum evaluation score of 75%, and six of them reached 90% or above; thus, all models showed potential for future use
• Kajal et al. [15] used abstracts of case reports published in PubMed Central and, using classifiers based on ML and NLP techniques, extracted adverse events with 89% accuracy and identified the suspect drug with 79% accuracy
• Ramesh et al. [16] used a named entity tagger based on supervised ML and detected medication information and adverse event entities in narratives from the Food and Drug Administration's (FDA's) Adverse Event Reporting System (AERS). The best performing tagger achieved an overall F1 score of 73%
• Dev et al. [11] used traditional ML and deep learning techniques to classify adverse event case reports published in PubMed Central as serious versus non-serious, and the final model achieved an average F1 score of 95%.

Fig. 3 illustrates that when the volume of data is low, signals are detected through qualitative analysis by PV experts who review the spontaneous ICSRs and medical literature, but when the volume of data is high, detection is done by quantitative analysis of reports with the help of various data mining methods [17]. Conventionally, disproportionality analysis (DPA) algorithms are used for evaluating two-dimensional associations (e.g., drug-event associations). Here, the observed value is compared with the expected value to measure the disproportionality between them. For example, if a few cases of an event have occurred in patients taking a particular drug, then to know whether they occurred by chance or reflect a real association, we would need three details: how many people get the adverse event in general, how many people take the drug in general, and the reporting rate. In reality, getting those details is very difficult. Hence, we could use the database of spontaneous reports itself to compute, using DPA algorithms, the number of cases we would expect if the drug and the event were co-occurring by chance. If the observed number of cases exceeds the expected number, the reports could be submitted to the scrutiny of expert reviewers, as there could be a possible association between the drug and the event. DPA uses either the Frequentist approach (proportional reporting ratio [PRR], reporting odds ratio [ROR]) or the

PV terminologies: Explanations
Case processing: It involves case intake (text element contextualization, identification of duplicates, determining validity), evaluation (seriousness evaluation, expectedness evaluation, and causality assessment), and reporting
Signal: It is the information that arises from one or multiple sources of PV, which suggests a new potentially causal association between an intervention and an event, either adverse or beneficial, that is judged to be of sufficient likelihood to justify verificatory action [7]
Adverse event: It is any untoward medical occurrence that may happen during treatment with a drug but which does not necessarily have a causal relationship with the treatment
Adverse drug reaction: It is a response to a drug that is noxious and unintended and which occurs at doses normally used in man and has a causal relationship with the treatment
ICSR: It is a drug safety report containing information describing the reporter, patient, suspected drug, and the adverse event that occurred at a specific point of time
Aggregate reporting: It is periodic reporting of cumulative safety information about a drug to regulatory agencies as per regulatory requirements
Medical dictionary for regulatory activities: It converts the reported adverse events into standardized medical terminology and gives an identification code for the event to create uniformity across. MSSO (Maintenance and support service organization) releases updated MedDRA versions twice a year (in March and September)
XML (Extensible Mark-up Language): It is a mark-up language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable
VigiBase: It

Statistical algorithms use MedDRA terminologies in the process of signal detection; usage of very specific names for ADR-coding will cause dilution of signals. To tackle this situation, standardized MedDRA Queries (SMQs) have been created. They compile related MedDRA terms specific to a medical condition.
SMQs are created by a team of experts through a manual study of MedDRA, which is laborious and time-consuming. Unsupervised ML techniques could be used for this purpose by clustering related MedDRA terms together to create SMQs. In the future, this may replace the manual work in this regard [23,24]. ML-based algorithms could also be used to look in longitudinal medical records for patterns of time-to-onset that may suggest whether a particular drug increases or decreases the risk of an adverse event.

Some examples of research done in utilizing ML techniques for signal detection
• Botsis et al. [25] and Ball and Botsis [26] used a network analysis approach to help detect signals from the safety surveillance data of the FDA's Vaccine AERS
• Harpaz et al. [27] used a clustering approach and discovered that a large proportion (41%) of the clusters contained associations (e.g., chlorpromazine-hepatotoxicity, bosentan-hepatic steatosis, and methotrexate-pancytopenia) that are currently unrecognized but all of which are supported by older case reports
• Ji-Hwan et al. [28] compared ML algorithms with traditional DPA methods using a dataset of known and unknown ADRs of nivolumab and docetaxel taken from the Korean national spontaneous reporting database and found that ML algorithms outperformed traditional DPA methods in detecting new ADR signals.
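The traditional DPA measures that the ML approaches are compared against can be computed from a 2×2 table of report counts taken from the spontaneous reporting database itself. The following is a minimal sketch, not a validated implementation: the function name `dpa_measures` and the cell names `a`, `b`, `c`, `d` are assumptions made for this illustration, and the formulas are the standard textbook definitions of the expected count under independence, the PRR, and the ROR.

```python
def dpa_measures(a, b, c, d):
    """Disproportionality measures from a 2x2 table of report counts.

    a: reports with the drug of interest AND the event of interest
    b: reports with the drug of interest and any other event
    c: reports with any other drug and the event of interest
    d: reports with any other drug and any other event
    """
    if min(a, b, c, d) <= 0:
        # Zero cells need a continuity correction, which this sketch omits.
        raise ValueError("all four cells must be positive in this sketch")
    n = a + b + c + d
    # Count expected for the drug-event pair if they co-occurred by chance.
    expected = (a + b) * (a + c) / n
    # Proportional reporting ratio: event share for the drug vs. all other drugs.
    prr = (a / (a + b)) / (c / (c + d))
    # Reporting odds ratio: odds of the event for the drug vs. all other drugs.
    ror = (a * d) / (b * c)
    return {"observed": a, "expected": expected, "prr": prr, "ror": ror}


# Counts invented purely for illustration.
example = dpa_measures(a=20, b=80, c=100, d=9800)
```

With these made-up counts, 20 cases are observed against about 1.2 expected by chance, so the pair would be a candidate for expert review. In practice a confidence interval or a minimum case count is usually required as well before a pair is flagged.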

Utilization of AI in preclinical toxicity prediction studies
Preclinical animal studies for toxicity testing are expensive, time-consuming, and laborious, and toxicity may vary between species. Hence, in silico statistical methods (e.g., the Quantitative Structure-Activity Relationship method) have been developed. These statistically establish a plausible relationship between physicochemical characteristics (e.g., molecular structure) and toxicity (e.g., Lethal Dose 50%). The physicochemical characteristics must be converted into a machine-readable format called chemical descriptors (e.g., molecular fingerprints) before they can be used by the machine. Toxicology databases such as Toxicology Data Network, Hazardous Substances Data Bank, and Toxicity Forecaster hold toxicity information on various chemical compounds. These databases could be used to train and develop an AI model for predicting toxicity during the initial stages of drug development. Apart from saving time and cost in the drug development process, such a model could substantiate post-marketing safety signals [29].

Utilization of AI in the detection of adverse events associated with DDIs from medical literature
Deep learning could be used to identify adverse events in medical literature, especially those associated with drug-drug interactions (DDIs). Giving AI access to all the medical literature for training could produce a single model capable of detecting and categorizing DDIs as a classification task. For example, sentences that mention a pair of drugs could be classified as either describing a true DDI or not describing any DDI. This use of AI would improve medical practice and the safe use of drugs by improving the knowledge of health professionals about DDIs. Other resources such as molecular information and social media could be used in addition to medical literature for training the AI model, to improve its performance in detecting DDIs [30].

Social media as a source of PV
Drug safety data are often shared or searched for by patients on social media. These data can be extracted and evaluated using an AI model. Specialized health-care social networks and forums (e.g., Medications.com), generic social networking sites (e.g., Twitter), and internet search logs are potential sources of PV data, and they offset limitations of traditional reporting systems such as lack of geographic diversity and under-reporting [31]. Thus, social media sources may supplement traditional sources and act as an important source for signal detection, as the adverse events shared on social media are quite often different from those in traditional sources, and there is a strong possibility of detecting a new or rare adverse event [32,33]. ML and NLP techniques could be used to extract drug abuse data from social media for toxicovigilance. This would help in identifying trends of drug abuse and assist in tailoring community interventions to ensure safe use of drugs [34].
Some examples of research done in utilizing AI for extracting drug safety data from social media
• Comfort et al. [35] used ML to identify ICSRs from social media. The trained AI model reached 83% accuracy and took 48 h to complete a task that would have taken human experts 44,000 h to perform
• Nikfarjam et al. [36] compared a supervised ML-based approach with several strong baseline systems for extracting mentions of AEs from social media and found that the ML-based approach outperformed all baseline systems, achieving F1 scores of 82% and 72% for the DailyStrength and Twitter corpora, respectively
• Jing et al. [37] used semi-supervised ML for extracting adverse drug events from patient-generated content in social media and attained a highest area under the curve (AUC) value of 81.48%
• Sampathkumar et al. [38] used supervised ML to extract adverse event posts from online health forums (Medications.com, SteadyHealth.com) and attained an average F1 score of 76%.

Limitations in the training phase of AI model
First of all, we have to develop a trained AI model, which in turn needs training datasets (e.g., TwitMed) for learning. The emergence of electronic health records (e.g., VigiBase, Oracle Argus safety system) fulfilled the requirements for introducing AI in PV, but regulations are needed to govern access to these databases for obtaining datasets to use in AI [39]. Although unsupervised ML does not need annotation (labeling or tagging), it cannot be used for all tasks, and the results it produces are tricky to interpret. Hence, the training dataset must be manually annotated before it can be used for tasks with supervised ML. This is a laborious job. To reduce the time and cost of manual annotation and to increase the accuracy of the AI model, we could use annotated datasets from multiple corpora [40], or we could use semi-supervised ML, which uses a small amount of labeled data with a large amount of unlabeled data [41]. Although laborious, the manual generation of an annotated corpus is a one-time task unless standards change later. Because the AI model is trained on datasets, any change in the standardization of the elements (e.g., updates in MedDRA coding or in the WHO drug dictionary) will make it produce error-prone results for the corresponding tasks (e.g., automated MedDRA coding and automated WHO drug dictionary coding) [14]. Hence, if any changes are made in the task elements, the AI model should be updated using new training datasets. This again requires human work to produce the new training datasets according to the new standards. Apart from this, the AI developer and the domain expert have to invest a significant amount of time to understand the errors and tune the model to correct them. After training, the AI model needs to be evaluated and validated against an Acceptable Quality Limit.
At the start of AI model development, the total dataset is divided into a training dataset and a test dataset for an unbiased evaluation of the model. The evaluation metrics used include F1 score, accuracy, and AUC. Usually, an AI model is evaluated using the F1 score, which combines precision and recall (Fig. 4). An F1 score above 75% is often considered to be the set-point for most of the tasks [42]. AUC is used when the dataset is imbalanced (positive samples are far fewer than negative ones, or vice versa). Hence, initially, Pharma agencies should provide sufficient funding for the development of the AI model and also for the development of the annotated corpus needed for training and validating ML.
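The dataset split and the evaluation metrics mentioned above can be made concrete with a short sketch. It assumes a binary classification task; the function names (`split_dataset`, `classification_metrics`), the 80/20 split, and the toy labels are all illustrative choices rather than anything prescribed by the cited studies. Precision, recall, F1, and accuracy follow their standard definitions.

```python
import random


def split_dataset(records, test_fraction=0.2, seed=0):
    """Shuffle the records and hold out a fraction as an unseen test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)


def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1 score, and accuracy for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


train_set, test_set = split_dataset(range(10))

# Toy imbalanced labels (1 = serious, 0 = non-serious), invented for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
meets_set_point = metrics["f1"] > 0.75  # the 75% set-point cited in the text
```

On these toy labels the accuracy is 80% while the F1 score is only about 67%, which shows why accuracy alone can be misleading when positive cases are rare.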

Dataset related limitations
As mentioned earlier, the AI model is trained using datasets. Hence, if the training dataset is inadequate, biased, or unequally distributed, it will affect the function of the AI model, and the task results it produces will be error-prone. For example, if each physician reports the symptoms of an adverse event in their own terms, the data will be biased.
Hence, the reporting terms should be standardized. For this purpose, PV databases are loaded with MedDRA. To avoid confusion between the various marketing names of a drug, PV databases are also loaded with the WHO drug dictionary. The ICSRs used for AI training should be diverse and representative. Hence, they should cover all the factors such as the type of report, source country, seriousness category of the AE, Investigator's Brochure, and Company Core Data Sheet. The sampling strategy should make sure that the possible values for each factor are appropriately diverse and representative [42].

Data privacy and Social media related limitations
There are data privacy issues in allowing AI to use safety data from social media, so regulations should be developed to address them [43]. The use of social media safety data by an AI has its own limitations, as the data contain misspellings, non-medical terms and slang, duplicates due to multiple postings, incomplete records with important information missing, a lack of standards, a high volume of data, and a low signal-to-noise ratio (only a small proportion of drug safety data collected from social media contains information associated with ADRs) [32]. These limitations could be overcome or minimized by spellchecking, keyword searching, and sentiment analysis with the help of NLP techniques such as tokenization, stemming, lemmatization, part-of-speech tagging, named entity recognition, and chunking.
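A few of the clean-up steps named above (tokenization, removal of duplicate postings, and keyword searching with lay terms mapped to medical terms) can be sketched with the Python standard library alone. Everything specific here is an assumption made for illustration: the `LAY_TO_MEDICAL` lexicon, the drug name "DrugX", and the example posts are invented, and a real pipeline would rely on a proper NLP toolkit and a curated vocabulary for stemming, lemmatization, and entity recognition.

```python
import re

# Matches runs of lowercase letters and digits; punctuation and emoji are dropped.
TOKEN_RE = re.compile(r"[a-z0-9]+")

# Hypothetical mini-lexicon mapping lay or slang words to medical terms.
LAY_TO_MEDICAL = {"puking": "vomiting", "dizzy": "dizziness", "rash": "rash"}


def tokenize(post):
    """Lowercase the post and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(post.lower())


def deduplicate(posts):
    """Keep the first copy of posts that are identical after normalization."""
    seen = set()
    unique = []
    for post in posts:
        key = " ".join(tokenize(post))
        if key not in seen:
            seen.add(key)
            unique.append(post)
    return unique


def find_adverse_event_terms(post):
    """Keyword search: return the medical terms matched by the post's tokens."""
    return sorted({LAY_TO_MEDICAL[t] for t in tokenize(post) if t in LAY_TO_MEDICAL})


# Example posts invented for illustration.
posts = [
    "Started DrugX, been PUKING all day!!",
    "started drugx been puking all day",
    "DrugX made me so dizzy + rash :(",
]
unique_posts = deduplicate(posts)
terms = [find_adverse_event_terms(p) for p in unique_posts]
```

Exact-match deduplication of this kind only catches reposts that differ in case or punctuation; near-duplicates and misspellings would need fuzzy matching or a spellchecker on top.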

Ethics related limitations
Ethical issues arising from the use of AI should be addressed by AI developers in collaboration with medical professionals, ethicists, and philosophers [44]. For example, the inferences made by an AI, or as understood by decision-makers, may sometimes be discriminatory because of biases resulting from changes in group characteristics. This would be against the ethical principle of justice (treat all people equally) [43]. There are also ethical issues in the sharing of drug safety data between various PV agencies. ICH (International Council for Harmonization of Technical Requirements for Pharmaceuticals for Human Use) has given some guidelines regarding this. The Oracle Argus safety system is designed to follow ICH guidelines in data sharing.

Limitations related to the complexity of medical sciences
Even if it becomes possible in the future to automate all the processes of PV, there are some complex aspects of medical science that a machine may not be able to learn. For example, some anti-psoriatic drugs generate new auto-antibodies that may aggravate the disease; here the aggravation of the disease is itself an adverse reaction to the treating drug. In this scenario, a machine trained to differentiate the disease from an adverse event cannot identify an adverse event that presents as disease aggravation [35]. To tackle this kind of limitation, we could implement rule-based systems along with AI, in which the machine submits any document showing uncertainty in detecting ADRs for human handling before it is reported to the regulatory authority.
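One way to picture such a combination of rules and AI is a small routing function that sends a case to a human reviewer whenever the model is unsure or the narrative hints at disease aggravation. This is a hypothetical sketch only: the function name `route_case`, the 0.9 confidence threshold, the label names, and the list of trigger word stems are assumptions for illustration and are not taken from any cited system.

```python
# Hypothetical word stems suggesting that worsening of the treated disease
# might itself be an adverse reaction and deserves expert judgement.
REVIEW_STEMS = ("aggravat", "exacerbat", "worsen")


def route_case(predicted_label, confidence, narrative, threshold=0.9):
    """Decide whether a model prediction can be used or needs human handling.

    predicted_label: label proposed by the (assumed) trained AI model
    confidence: model confidence for that label, between 0 and 1
    narrative: free-text case narrative
    """
    text = narrative.lower()
    # Rule 1: low model confidence always goes to a human reviewer.
    if confidence < threshold:
        return "human_review"
    # Rule 2: possible disease aggravation goes to a human reviewer even when
    # the model is confident, because the model may mistake it for the disease.
    if any(stem in text for stem in REVIEW_STEMS):
        return "human_review"
    return "auto_process"
```

For example, a confident "not an ADR" prediction on a narrative such as "Psoriasis aggravated after starting therapy" would still be routed to a human under the second rule.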

CONCLUSION
As PV is a rapidly growing field in developing countries, a huge volume of drug safety data accumulates every year, and in the future it will exceed the capacity of human processing. Social media "big data" contains useful drug safety information that cannot be processed by humans because of its sheer volume. Hence, there is an urgent need to develop and implement AI models for PV activities. Automation in PV not only reduces the cost, time, and manpower required but also improves the quality and accuracy of drug safety results. It gives PV professionals more time to apply their knowledge in assessing a case rather than spending time simply capturing the relevant data from a safety database. This will improve their decision-making, help to detect signals in time and act on them, and support a better understanding of the risk-benefit ratio of the drug. As AI has a wide range of applications in the various branches of medical science related to PV, it has huge potential to exert a direct and indirect multimodal influence on drug safety. Thus, the implementation of AI in PV will take drug safety to a higher level.