Machine learning-based clinical decision support using laboratory data

Hikmet Can Çubukçu; Deniz İlhan Topcu; Sedef Yenice

doi:10.1515/cclm-2023-1037

Publicly Available Published by De Gruyter November 29, 2023

From the journal Clinical Chemistry and Laboratory Medicine (CCLM)

Abstract

Artificial intelligence (AI) and machine learning (ML) are becoming vital in laboratory medicine and the broader context of healthcare. In this review article, we summarized the development of ML models and how they contribute to clinical laboratory workflow and improve patient outcomes. The process of ML model development involves data collection, data cleansing, feature engineering, model development, and optimization. These models, once finalized, are subjected to thorough performance assessments and validations. Recently, due to the complexity inherent in model development, automated ML tools were also introduced to streamline the process, enabling non-experts to create models. Clinical Decision Support Systems (CDSS) use ML techniques on large datasets to aid healthcare professionals in test result interpretation. They are revolutionizing laboratory medicine, enabling labs to work more efficiently with less human supervision across pre-analytical, analytical, and post-analytical phases. Despite contributions of the ML tools at all analytical phases, their integration presents challenges like potential model uncertainties, black-box algorithms, and deskilling of professionals. Additionally, acquiring diverse datasets is hard, and models’ complexity can limit clinical use. In conclusion, ML-based CDSS in healthcare can greatly enhance clinical decision-making. However, successful adoption demands collaboration among professionals and stakeholders, utilizing hybrid intelligence, external validation, and performance assessments.

Keywords: machine learning; clinical laboratory; decision support; total testing process; data

Introduction

The field of artificial intelligence (AI) has witnessed pivotal moments where AI systems have triumphed over human experts. Two notable clashes, namely Deep Blue vs. Garry Kasparov and AlphaGo against Lee Se-dol, have served as compelling demonstrations of AI’s superiority in challenging cognitive tasks [1]. These occurrences have sparked a profound debate regarding the role of AI in our society, prompting a reassessment of our perception of AI as a mere competitor. The implications of AI, particularly its subset known as machine learning (ML), hold immense potential for revolutionizing the healthcare sector including clinical laboratories.

AI encompasses a wide range of technologies capable of autonomously making decisions and exhibiting intelligent behavior through data analysis. These technologies can be classified into two main categories: non-adaptive and adaptive approaches. Non-adaptive AI systems operate based on predefined rules, while adaptive AI systems, such as ML, leverage mathematical functions and statistical techniques to derive insights from input data without explicit instructions. ML techniques can further be categorized into supervised learning, where labeled data is provided for training, and unsupervised learning, which operates without the need for labeled data [2]. Deep learning, a subset of ML, is inspired by the structure of biological neural networks and employs multi-layered artificial neural networks to learn complex patterns and make predictions based on the data. The ability of deep learning algorithms to automatically extract intricate features from vast datasets has led to significant advancements in various domains [3, 4].
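The distinction between non-adaptive and adaptive approaches can be illustrated with a minimal sketch. It assumes a single numeric analyte and a binary label; the cutoff of 5.0, the toy values, and both function names are hypothetical and chosen only for illustration. The "learning" step here is a one-feature decision stump, swapped in as the simplest possible supervised learner.

```python
def rule_based_flag(value, cutoff=5.0):
    """Non-adaptive: the cutoff is fixed in advance (hypothetical value)."""
    return value > cutoff


def learn_cutoff(values, labels):
    """Adaptive: choose the cutoff that makes the fewest errors on labeled data."""
    best_cutoff, best_errors = None, len(values) + 1
    for candidate in sorted(set(values)):
        # Count samples where "value > candidate" disagrees with the label.
        errors = sum((v > candidate) != bool(y) for v, y in zip(values, labels))
        if errors < best_errors:
            best_cutoff, best_errors = candidate, errors
    return best_cutoff


# Toy labeled data (illustrative only, not taken from any study).
values = [3.9, 4.2, 4.8, 5.3, 5.6, 6.1]
labels = [0, 0, 0, 0, 1, 1]

learned = learn_cutoff(values, labels)
fixed_errors = sum(rule_based_flag(v) != bool(y) for v, y in zip(values, labels))
learned_errors = sum((v > learned) != bool(y) for v, y in zip(values, labels))
print(learned, fixed_errors, learned_errors)
```

On this toy data the fixed rule misclassifies one sample, while the learned cutoff adapts to the labels. An unsupervised method would instead have to find structure in `values` without access to `labels`.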

This review article comprehensively examines the impact of AI, with a particular emphasis on ML, and its potential to enhance the proficiency of laboratory professionals in the clinical laboratory. The primary objective is to present a thorough and all-encompassing overview of ML-based clinical decision support systems that effectively utilize laboratory data across the pre-analytical, analytical, and post-analytical phases. Through a meticulous exploration of the role of laboratory data in decision-making during these phases, the article further delves into the diverse applications of ML algorithms and models in augmenting decision support and addresses the challenges associated with the implementation of ML-based decision support systems.

Overview of machine learning-based decision support using clinical laboratory data

Machine learning model development and performance evaluation

The development and performance evaluation of ML models are essential steps in harnessing the potential of data-driven decision-making across diverse domains. Key steps of this pipeline are given in Figure 1. The ML model development process commences with (1) data collection, followed by (2) careful curation of the initial dataset (e.g., data cleaning) and (3) feature engineering (e.g., feature extraction and selection). The resulting dataset is subsequently divided into distinct subsets: the training dataset, the tuning dataset, and the internal validation (testing) dataset. The training dataset serves as the foundation for (4) ML model development, enabling the model to learn patterns and relationships within the data. The inclusion of a tuning dataset, while dependent on the methodology applied, can be valuable in optimizing the model’s performance by fine-tuning its parameters and configurations [5]. Once the ML model has been constructed and optimized, it undergoes a rigorous performance evaluation (5), explainability assessment, and validation process (6). This evaluation encompasses not only the internal validation testing dataset but also an independent external validation dataset that is separate from the dataset used during model development [5, 6].
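The division of a curated dataset into training, tuning, and internal validation subsets can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the 60/20/20 proportions, the fixed seed, the function name `split_dataset`, and the toy records are all assumptions made for the example.

```python
import random


def split_dataset(records, train_frac=0.6, tune_frac=0.2, seed=42):
    """Shuffle records and split them into training, tuning, and
    internal validation subsets (the fractions are illustrative)."""
    if train_frac <= 0 or tune_frac < 0 or train_frac + tune_frac >= 1:
        raise ValueError("fractions must leave a non-empty validation subset")
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_tune = round(len(shuffled) * tune_frac)
    train = shuffled[:n_train]
    tune = shuffled[n_train:n_train + n_tune]
    internal_validation = shuffled[n_train + n_tune:]
    return train, tune, internal_validation


# Toy records standing in for curated laboratory results.
records = [{"id": i, "value": 135 + i % 10} for i in range(100)]
train, tune, internal_validation = split_dataset(records)
print(len(train), len(tune), len(internal_validation))
```

In practice the split is usually made at the patient level, so that results from one patient do not appear in more than one subset; the sketch above does not handle that case.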

Figure 1:

Machine learning model development steps. Yellow boxes indicate steps that can be conducted by AutoML tools.

The inclusion of an external validation dataset allows for unbiased testing of the model’s performance in a novel population characterized by diverse demographics. It serves as a robust mechanism to assess the model’s external validity and determine its applicability across a broad range of patients or individuals. During external validation, the model is rigorously tested to assess its discrimination and calibration performance, employing established metrics and evaluation techniques [7]. The final stage (7) in ML involves the deployment of the model, a critical process where the developed algorithm is implemented into the real-world environment for practical use.
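Discrimination and calibration can be made concrete with a short sketch. It assumes a binary outcome and predicted probabilities from an already trained model. The area under the ROC curve (AUC) is used here for discrimination, and the Brier score and the difference between mean predicted risk and observed event rate are used as simple calibration-related summaries. These particular metrics, the function names, and the toy numbers are illustrative choices, not ones prescribed by the article.

```python
def roc_auc(y_true, y_prob):
    """Discrimination: probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def mean_calibration_error(y_true, y_prob):
    """Mean predicted risk minus observed event rate (0 is ideal)."""
    return sum(y_prob) / len(y_prob) - sum(y_true) / len(y_true)


# Toy external validation set (illustrative values only).
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, y_prob)
brier = brier_score(y_true, y_prob)
calibration_gap = mean_calibration_error(y_true, y_prob)
print(auc, brier, calibration_gap)
```

A model can discriminate well and still be poorly calibrated in a new population, which is one reason both properties are checked during external validation.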

Automated machine learning

The series of steps involved in ML model development can be complex. From data preparation to performance evaluation, each stage requires specific knowledge and skills, which could be overwhelming for non-data science professionals [8, 9]. To overcome this problem, automated machine learning (AutoML) tools have been introduced which can build high-quality machine learning models for specific tasks without human expertise. These tools aim to automate the pipeline of ML model development, including feature engineering, model development, hyperparameter optimization, and performance evaluation, all of which typically necessitate experienced users as shown in Figure 1 [10, 11]. Therefore, they reduce the need for data scientists, enabling domain experts to create ML applications with minimal requirements of statistical and ML expertise [11]. Moreover, by automating some of the ML development components requiring expertise, healthcare professionals can more rapidly build, validate, and deploy ML solutions, and therefore more readily improve the quality of healthcare for patients. Despite the increase in AutoML research, the deployment of these models in clinical practice within the healthcare field is significantly limited due to factors such as explainability issues and data quality [10].
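One of the steps that AutoML tools automate, hyperparameter optimization, can be sketched by hand to show what is being taken over. The example below is not an AutoML tool: it is a plain grid search over the number of neighbors `k` of a one-feature k-nearest neighbors classifier, scored on a tuning set. The data, the candidate values, and all function names are hypothetical.

```python
def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D feature)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def accuracy(train, data, k):
    """Fraction of samples in data that are classified correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)


def search_k(train, tune, candidates=(1, 3, 5)):
    """Evaluate every candidate k on the tuning set and return the best one."""
    scores = {k: accuracy(train, tune, k) for k in candidates}
    # Highest accuracy wins; ties are broken in favor of the smaller k.
    best_k = max(scores, key=lambda k: (scores[k], -k))
    return best_k, scores


# Toy data as (feature, label) pairs; (3.0, 1) is a deliberately noisy label.
train = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 0),
         (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1)]
tune = [(2.9, 0), (3.1, 0), (5.5, 1), (8.5, 1)]

best_k, scores = search_k(train, tune)
print(best_k, scores)
```

AutoML tools typically extend this idea to a search over many algorithms, preprocessing choices, and hyperparameters at once, which is the part that otherwise requires an experienced user.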

Clinical decision support

Clinical decision support (CDS) systems play a crucial role in assisting healthcare professionals in the interpretation of test results, mitigating interpretative subjectivity, and minimizing inconsistencies [12]. Traditionally, CDS tools have relied on rule-based systems to provide decision support. However, recent advancements in AI have demonstrated promising potential in enhancing CDS systems [13]. Particularly, AI has facilitated the development of new diagnostic and prognostic models through the utilization of ML techniques on extensive clinical datasets [14].

The integration and simultaneous interpretation of clinical data, imaging findings, and laboratory results make notable contributions to the evaluation of diagnosis and prognosis. A significant proportion, 66 %, of clinical decision-making is based on in vitro diagnostics (IVD) [15]. AI/ML models hold substantial potential in improving the contribution of IVD to clinical decision-making, thereby enhancing patient care and outcomes.

Specifically, AI/ML-driven CDS systems have important applications in various healthcare domains, including:

Diagnosis: AI/ML models can aid in the early and accurate diagnosis of diseases by analyzing patient data and test results [14], reducing the risk of misdiagnosis and improving patient outcomes.

Treatment planning: CDS systems powered by AI/ML can assist healthcare professionals in creating personalized treatment plans [16], considering patient-specific factors and the latest medical research.

Prognostics: Predicting disease progression and patient outcomes is another critical application [14]. These models can help healthcare providers make informed decisions about patient care and intervention.

Medication management: AI/ML-driven CDS can aid management of medications by checking dosages and potential drug interactions to enhance medication safety [17].

Patient monitoring: Continuous monitoring and real-time data analysis enable early detection of health issues and timely interventions, improving patient care and reducing healthcare costs [18].

By harnessing the power of AI/ML, the accuracy and efficiency of CDS systems can be significantly improved, allowing healthcare professionals to make more informed and effective decisions in their clinical practice.

Recent technological advances in laboratory medicine and the role of machine learning in clinical decision-making

The field of laboratory medicine is undergoing significant transformations as a result of two prominent technological advancements: automation and AI [14]. While the invention of microprocessors triggered total laboratory automation, nowadays, AI has paved the way for complex devices that include software with AI technologies. Clinical laboratories are experiencing a paradigm shift through the integration of sophisticated automated systems empowered by AI-driven software and advanced robotic technologies. This convergence enables the accomplishment of greater volumes of work with a reduced need for extensive human intervention within the laboratory setting [19]. In the context of the fourth industrial revolution, predictions suggest that approximately 30 billion interconnected devices will form the Internet of Things (IoT). Consequently, the convergence of cyber-physical systems, IoT, cloud computing, ML, and AI presents a tangible reality in the present and an anticipated future reality, revolutionizing laboratory medicine [19, 20].

Integrated diagnostics, which entails the integration of radiology, pathology, and laboratory medicine with advanced information technology, holds immense promise in transforming the landscape of disease diagnosis and therapeutic interventions [21]. Moreover, in the present era, healthcare practitioners are confronted with an escalating volume and diversity of data, encompassing various domains such as imaging, genomics, proteomics, clinical observations, as well as personal and environmental records. In light of this burgeoning data landscape, the utilization of AI and ML technologies undoubtedly offers valuable prospects for the comprehensive analysis and interpretation of this vast wealth of information [22].

Decision support with machine learning in the total testing process

The implementation of ML models incorporating laboratory results has attracted increasing attention in the scientific literature. These ML models have been applied across various stages of the total testing process, including pre-analytical, analytical, and post-analytical phases, as given in Figure 2.

Figure 2:

Decision support with machine learning in the total testing process.

To provide a comprehensive overview, we conducted an extensive review of the scientific literature on the utilization of ML in laboratory medicine. Specifically, we selected articles from reputable laboratory medicine journals describing AI/ML applications that have demonstrated utility in the pre-analytical, analytical, and post-analytical phases of the total testing process. Furthermore, to ensure the quality and relevance of the selected articles, we scrutinized them based on the following criteria: data features, ML methods employed, programming languages and packages utilized, the performance of the best ML model, considerations of model explainability, reported study limitations, and merits or outcomes observed. The findings of this review are summarized in Table 1, which provides a detailed analysis of the different approaches, methodologies, and outcomes reported in the literature. By synthesizing and analyzing the existing body of research, this review aims to shed light on the current state of ML implementation in laboratory medicine and provide insights for future developments and applications in this field.

Table 1:

Studies on the implementation of artificial intelligence and machine learning in laboratory medicine.

Phase	Study	Aim	Data	ML methods	Language and packages	Best model’s performance	Explainability	Limitations	Merits/outcome
Pre-analytical	Fang 2021	To identify clotted specimens.	192 clotted specimens and 2,889 no-clot-detected specimens results (TT, Fbg, PTT, PT, D-dimer results, and labels about the presence of clot)	Standard and momentum backpropagation neural networks (BPNNs)	R	Momentum BPNNs; AUC: 0.971, accuracy: 0.953, specificity 0.967, sensitivity 0.940.	Logistic regression coefficients were given^INT	The noticeable disparity in the distribution of age	A potential method for identifying clotted samples using coagulation test results
Pre-analytical	Farrell 2021	To identify mislabelled samples	127,256 sets of consecutive results (age, gender, specimen collection time, present and previous results of sodium, chloride, potassium, bicarbonate, creatinine, and urea)	Decision trees, random forest, artificial neural network (ANN), k-nearest neighbors, extreme gradient boosting, support vector machines, logistic regression	R, Packages: rpart, class, randomForest, e1071, xgboost, keras, and caret	ANN; accuracy: 92.1 %, AUC: 0.977	None	Randomly introduced labeling errors, omitting non-random errors	Machine learning algorithms achieved better performance than humans in identifying incorrectly labeled samples.
Pre-analytical	Farrell 2021.2	To detect errors related to wrong blood in the tube (WBIT)	141,396 sets of data items (age, sex, current and previous electrolytes, urea and creatinine (EUC) results, and their delta values)	ANN	R Package: keras	Sensitivity: 90.6 %, specificity: 94.5 %, and accuracy: 92.5 %	None	Randomly introduced labeling errors, omitting non-random errors	The performance of human interaction with AI models (for WBIT errors) was lower than autonomously functioning AI models.
Pre-analytical	Ialongo 2017	To manage sample dilution of serum-free light chain (sFLC) testing	6,099 database entries (sFLC results, dilution status, patient’s hospital status)	ANN utilizing the multi-layer perceptron (MLP-ANN)	SPSS 20	MLP-ANN reduced wasted tests for κ-FLC and λ-FLC by 69.4 and 70.8 % respectively.	Relative importance for the features was calculated using the Garson algorithm^{MS, G}	ANN model was systematically unable to recognize some particular cases, with no external validation.	MLP-ANN reduced the number of sFLC testing by managing specimen dilution.
Pre-analytical	Mitani 2020	To detect specimen mix-ups	2,159,354 records (complete blood cell counts and biochemical tests, differences between consecutive results)	Gradient-boosting decision tree (GBDT)	Python, Package: XGBoost	AUC: 0.998	Reported as SHAP values^{MA, G}	Simulation of mix-up, single-center study, no external validation	ML model performed efficient specimen mix-up detection
Pre-analytical	Rosenbaum 2018	To identify WBIT errors	20,638 patient results (11 clinical chemistry analytes, absolute changes, velocity)	Logistic regression, support vector machine	R	Support vector machine; AUC: 0.97	Indirect: PPV of univariate and multivariate delta checks	No external validation	The authors created an ML model to detect and prevent WBIT errors, reducing potential harm to patients. ML model was superior to conventional single-analyte delta checks.
Pre-analytical	Streun 2021	To reveal chemical manipulation in urine samples	702 urine samples (Mass Spectrometry results)	ANN	R, Python, Packages: caret, keras, Tensorflow	Accuracy: 95.4 %	Feature importance was assessed using local interpretable model-agnostic explanations (LIME)^{MA, L}	Lack of different adulterant concentrations, no external validation	ANN model was built to reveal chemical urine manipulation.
Pre-analytical	Yang 2022	To develop a deep-learning-based model to evaluate serum quality using sample images	16,427 centrifuged blood images with serum indices (hemolytic, icteric, and lipemic index values)	Convolutional neural networks (CNNs) (Inception-Resnet-V2 network)	Packages: Keras, Tensorflow	Hemolysis detection: AUC 0.989, Icterus detection: AUC 0.996, Lipemia detection: AUC 0.993.	None	No external validation	The deep learning model for automated assessment of serum quality
Pre-analytical	Zhang 2020	To improve PBFC test utilization	784 PBFC samples (history of hematological malignancy, CBC/diff parameters)	Decision tree, logistic regression model	R, Package: rpart	The decision tree model demonstrated a sensitivity of 98 % and specificity of 65 %, with an AUC of 0.906.	Odds ratios for logistic regression were given^INT; proposed decision tree was given^INT	Small sample size, no external validation	ML models for PBFC triaging reduced unnecessary utilization by 35–40 %.
Pre-analytical	Zhou 2022	To detect sample mix-ups using delta check method-based deep learning	423,290 hematology test results	Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR), K-Nearest Neighbor (KNN), Naive Bayesian Classifier (NBC), and Deep Belief Network (DBN).	Python, Packages: Keras, sklearn	DC method based on DBN AUC 0.977, accuracy 93.1 %, TPR 92.9 %, TNR 93.3 %	None	Lack of explainability	DC method based on DBN outperformed RCV and empirical delta check for specimen mix-up detection.
Analytical	Bigorra 2017	To attain automated differentiation between reactive lymphoid cells (RLC) and blast cells of lymphoid and myeloid origin	Total dataset: 916 blood cell images from 47 patients. Train set: 696 images from 32 patients. Test set: 220 images from 15 new patients.	Support vector machine	None declared	SVM (global accuracy 80 %, reactive lymphoid cell 85.11 %, lymphoid blast cell 73.97 %, myeloid blast cell 82 %)	Feature selection was performed before model development.	No external validation	Automatically distinguishing between reactive lymphocytes and blast cells in general, and specifically recognizing myeloblasts and lymphoblasts.
Analytical	Chabrun 2023	To analyze peripheral leukocytes using deep learning approaches to predict VEXAS syndrome	12 patients (197 blood smears)	Convolutional neural networks + support vector machine	Python, Package: sklearn	ROC-AUCs from 0.87 to 0.95 (VEXAS patients were effectively distinguished from both UBA1-WT and MDS patients)	Visualization: UMAP was used for two-dimensional visual representations of the encodings.	Small sample size	Deep learning accurately distinguished neutrophils and monocytes drawn from patients with VEXAS syndrome.
Analytical	Durant 2017	To classify erythrocytes based on morphology	3,737 labeled cells	Convolutional Neural Networks (CNN)	Python, Packages: Theano, Lasagne	CNN achieved a recall of 92.70 %, precision of 89.39 %, and correct classification frequency of 90.60 %.	None	No external validation	CNN demonstrated high accuracy in measuring erythrocyte morphology profiles.
Analytical	Mohlman 2020	To distinguish diffuse large B-cell lymphoma (DLBCL) from Burkitt lymphoma (BL) based on histologic images.	10,818 H&E-stained tissue slide images: 36 cases of DLBCL and 34 cases of BL.	CNN	Python, Platform: Tensorflow	CNN achieved an AUC of 0.92	None	The presence of more training images from BL may have resulted in a slight bias.	Tool designed for distinguishing a specific subset of BL and DLBCL cases.
Analytical	Sun 2022	To detect fetal nucleated red blood cells (fNRBCs)	4,760 pictures of fNRBCs from 260 cell slides of umbilical cord blood samples	K-nearest neighbor, support vector machine, CNN	None declared	Accuracy: 98.5 %, sensitivity: 96.5 %, specificity: 100 % for CNN.	None	No external validation	Model for fast recognition of fNRBC
Analytical	Yu 2019	To verify analytically acceptable MS results using ML	1,267 urine samples of 11-nor-9-carboxy-delta-9-tetrahydrocannabinol	AdaBoost, decision tree, K-nearest neighbors, logistic regression, random forest, and SVM	Python, Package: Scikit-learn	Precision 81 %, recall 100 %, and F1 score 90 % for SVM.	Indirect: The impact of features was assessed via feature subsetting and AUC-based performance evaluation.	No external validation	ML model reduced manual review requirement by about 87 %.
Analytical	Zhou 2022	To build patient-based real-time Quality Control using machine learning (ML-based QC)	1,195,000 patient results	Random Forest (RF)	None declared	Albumin at critical bias showed an AUC of 0.985, accuracy of 75 %, sensitivity of 71.3 %, specificity of 99.6 %, and FPR of 0.45 %.	Indirect: The clinical effectiveness of ML-based QC was evaluated.	Artificial error data to validate the model	ML-based QC was found superior to PBRTQC
Analytical	Çubukçu 2021	To integrate conventional quality control (QC) rules, exponentially weighted moving average (EWMA), and cumulative sum (CUSUM) in a machine learning model.	170,000 simulated QC results	Random forest	Python, sklearn	RF model: false rejection probability 0.0048, highest error detection rate for errors <1 SD	The most predictive features in terms of feature importance were CUSUM and EWMA.	Absence of the multi rules performance evaluation, no real-world implementation	RF model showed an acceptable probability of error detection for most degrees of error.
Post-analytical	Aguirre 2022	To develop machine learning algorithms based on cell population data for sepsis prediction at the Emergency Department (ED).	698 patient results: CBC differentials (cell population data research parameters: morphological features of Neu, Lym, Mon), WBC, N/L ratio	XGBoost, Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR), Multi-layer Perceptron (MLP), Naive Bayes (NB), and K-Nearest Neighbors (K-NN).	Python, Packages: scikit-learn. R for statistics	MLP achieved an AUC of 0.95, with an accuracy range of 0.86–0.87 and precision ranging from 0.84 to 0.73.	SHAP values were reported.^{MA, G}	No external validation	ML and AI models successfully enabled the early detection of sepsis.
Post-analytical	Anudeep 2022	To perform LDL-C estimation based on HDL-C levels, total cholesterol, and triglyceride levels.	13,391 specimens of lipid profile including triglycerides (TG), total cholesterol (TC), HDL-C, and LDL-C.	Random forests (RF), XGBoost, support vector regression (SVR)	Python, Package: scikit-learn	RF showed a strong correlation (r=0.98) with direct LDL-C, 92 % accuracy for ATP III classification, and a mean absolute difference of 3.12.	Coefficients for the linear regression were given^INT.	Lack of external validation, clinical characteristics of a population, the complexity of models	XGBoost and random forest models showed superior performance compared to six commonly used LDL-C calculating formulas for predicting LDL-C.
Post-analytical	Bancal 2023	To achieve the most accurate prediction of calcium status, various markers were combined with ionized calcium.	7,047 patient records (Ionized Ca, total Ca, corrected Ca values, arterial pH, and albumin)	Random forest regression	R Package: caret	RF accuracy was 0.81, and sensitivities for hypocalcemia, normocalcemia, and hypercalcemia were 0.81, 0.80, and 0.90, respectively, with corresponding PPVs of 0.88, 0.74, and 0.65.	Indirect: PCA was used to investigate the associations between ionized calcium and variables.	No external validation	ML achieves a concordance rate of 81 % with ionized calcium, unaffected by common pathological conditions such as hypoalbuminemia, acid-base disorders, renal insufficiency, phosphatemia, and inflammation.
Post-analytical	Barakett-Hamade 2021	To predict levels of LDL-C	31,922 results of lipid profile (non-HDL-C, TG, and LDL-C) gender, age, and sampling hour	K-Nearest Neighbors (KNN)	SPSS Version 26.0	KNN (overall TG levels ICC with measured LDL 0.925, Bland Altman >upper limit 3.1 %, <lower limit 0 %)	Feature importance was reported.	No external validation	The ML algorithm shows better agreement with LDL-C in comparison to the commonly used equations, particularly in cases of mild and severe hypertriglyceridemia.
Post-analytical	Barnhart-Magen 2013	To create an artificial neural network-based screening method for diagnosing thalassemia minor (TM) patients.	526 patients with their CBC parameters, comprising α and β thalassemia minor cases, along with a control group of patients with iron-deficiency anemia, myelodysplastic syndrome, and healthy individuals	ANN	None declared	TM prediction using MCV, RDW, and RBC: TM vs. control – sensitivity 1, specificity 0.958, PPV 0.957, NPV 1. TM vs. control (MDS, IDA) – sensitivity 0.902, specificity 0.968, PPV 0.971, NPV 0.895.	None	No external validation	ANN model had the potential to reduce cost and increase accuracy in diagnosing TM patients.
Post-analytical	Bayani 2022	To predict grades of esophageal varices	490 cirrhosis patients and their dataset consisted of 26 routine laboratory parameters (including CBC parameters, bilirubin, AST, ALP, PT, INR, albumin, K, Cr, Na) as well as clinical data.	Ensemble learning methods, including Catboost and XGB classifier	Python, Package: None declared	CatBoost (Prec 1, Recall 1, accuracy 1, mean squared error 0.0314)	Detailed feature importance analysis revealed that the Child score, WBC, INR, and vitamin K level were predictive factors.	No external validation, small sample size	ML model effectively predicted EV grades in patients with cirrhosis, which can help clinicians avoid unnecessary procedures and improve predictions.
Post-analytical	Bayani 2022.2	To predict esophageal varices grades	490 patients (routine laboratory (CBC parameters, bilirubin, AST, ALP, PT, INR, albumin, K, Cr, Na) and clinical data)	SVM, logistic regression, RF, and ANN	Python	Random forest: average ROC curve AUC of 0.99	None	Small sample size, no external validation	A highly accurate non-invasive approach (ML) is employed for predicting the occurrence of esophageal varices (EV) in patients with liver cirrhosis.
Post-analytical	Bigorra 2022	To assist in the diagnosis of lymphocytosis, using a machine learning (ML) model based on complete blood count (CBC) parameters	1,565 samples were collected, including population parameters such as age and sex, as well as CBC parameters	RF, DT, naive Bayes classifier (NBC), KNN, SVM, and ANN	Package: Scikit-Learn	ANN achieved a global weighted accuracy of 95.8 % when classifying normal controls, benign, neoplastic, and spurious cases.	None	No external validation	Cost-effective model for lymphocytosis diagnosis with high accuracy
Post-analytical	Bigorra 2020	To aid diagnosis of polyclonal B-cell lymphocytosis (PPBL)	211 specimens from 101 normal controls and 110 patients with PPBL and SMZL. The collected data comprised age, gender, CBC parameters, flags, and CellaVision differentials.	DT, KNN, NBC, NN, RF, SVM	Python, Package: Scikit-Learn	NBC achieved an accuracy of 93.4 %, with a precision of 94.0 %, a recall of 93.0 %, and an F1-score of 94.0 %.	None	Small sample size	ML model was developed for the detection of PPBL.
Post-analytical	Cabitza 2020	To predict COVID-19 using routine blood tests	1,624 patients were included in the study, with data collected on age, gender, CBC parameters, CO-Oxymetry values, clinical chemistry markers, and coagulation parameters.	NB, KNN, LR, RF, SVM	Python, Package: scikit-learn	KNN and RF models achieved AUCs of 0.75–0.78 and external validation specificities of 0.92–0.96.	Feature importance was reported^{MA, G}	None	ML models were developed to identify COVID-19 through routine blood tests.
Post-analytical	Cadamuro 2023	To evaluate and interpret laboratory results with case-based scenarios	10 simulated laboratory reports were evaluated	Natural language processing (NLP) ChatGPT	NA	Regarding rating from 1 (very low) to 6 (very high): relevance 5–6, correctness 4–5, helpfulness 3–4, safety 5–6.	NA	NA	This study showed the ability of ChatGPT for laboratory result interpretation.
Post-analytical	Chocholova 2018	To differentiate between rheumatoid arthritis patients who are seropositive and seronegative.	Data from 31 seropositive patients, 16 seronegative patients, and 53 controls were collected, including RA markers and glycan analysis.	ANN	Matlab	Discrimination accuracy of 92.5 %.	None	Small sample size	ANN model was built to classify seropositive and seronegative rheumatoid arthritis (RA) patients.
Post-analytical	Demirci 2016	To create a decision algorithm model for reporting the results of biochemistry tests.	1,847 samples in the train set and 7,054 samples in the test set, including laboratory results, delta check values, HIL index, and age.	ANN	Weka software	Sensitivity of 92.2 % and specificity of 99.6 %.	None	Absence of internal quality control and calibration data	ANN model to evaluate and report medical test results
Post-analytical	Dobrijević 2023	To distinguish between SARS-CoV-2 and RSV infections in infants for differential diagnosis.	77 infants’ complete blood count, recalculated parameters (ratios), and CRP levels were examined.	Decision tree algorithms (e.g., random forest, optimized forest model)	WEKA version 3.8.6	Random forest: Accuracy 81.8 %, optimized forest: Sensitivity 72.7 %, specificity 88.6 %, PPV 82.8 %, NPV 81.3 %.	Reported as decision tree^INT	Small sample size, no external validation	The decision-making process for differentiating between SARS-CoV-2 and RSV in newborns was improved by the ML model.
Post-analytical	Fan 2022	To create a machine learning (ML) approach for estimating LDL-C.	Data from 111,448 individuals including demographic information (age, gender) and lipid profile (LDL-C, HDL, TG, TC)	Bagging random forest, M5P tree, M5Rules, Random Committee Multilayer Perceptron	Auto-WEKA	The MAD and RMSE values for the Bagging M5Rules and ML models were lower than those for the LDL-C equations.	Feature selection methods were utilized to select predictive features.	Lack of reference method measurement. Clinical data is unavailable	In comparison to other LDL-C equations, ML models exhibited lower bias.
Post-analytical	Feng 2022	To develop an ML model that utilizes RBC parameters to distinguish α-thalassemia carriers among patients with low HbA2 levels.	1,213 patients with low HbA2 were included, and their demographic and hematological parameters (age, gender, pregnancy status, Hb, Hct, RBC, MCV, MCH, RDW, HbF, HbA, HbA2) were collected.	14 models including random forest	R	The random forest model achieved an AUC of 0.948, specificity of 0.967, accuracy of 0.915, PPV of 0.942, and NPV of 0.901 on the external validation dataset.	Weights of features in the RF model were reported	None declared	The RF model efficiently distinguishes α-thalassemia carriers from patients with low HbA2 levels.
Post-analytical	Gui 2023	To assess the potential of volatile organic compounds (VOCs) as novel diagnostic biomarkers for perihilar cholangiocarcinoma (PHCCA) in bile samples.	200 bile specimens from PHCCA and BBD patients were analyzed for 19 VOCs.	DT, KNN, linear discriminant analysis (LDA), partial least-squares discriminant analysis (PLS-DA), and SVM	R and Matlab, Packages: pheatmap, ggord	SVM achieved a sensitivity of 93.1 %, a specificity of 100 %, and an AUC of 0.966.	None	Some unknown substances were not included in this study.	ML models utilizing VOCs can assist in the diagnosis of PHCCA.
Post-analytical	Han 2021	To distinguish benign from malignant breast lesions without invasive procedures.	102 healthy women, 158 patients with benign breast lesions, and 173 with malignant breast lesions (plasma cell-free DNA (cfDNA) data)	SVM	R Package: e1071	The SVM model achieved an AUC of 0.777 for identifying benign breast lesions and an AUC of 0.824 for identifying malignant breast lesions.	None	The classifier’s accuracy is not sufficient for practical clinical use.	A noninvasive method utilizing cell-free DNA can be used to differentiate between malignant and benign breast lesions.
Post-analytical	Hatami 2022	To forecast ascites grades in cirrhotic patients	492 subjects with cirrhosis (routine laboratory and clinical data)	KNN, SVM, random forest, and ANN	Python	KNN achieved an accuracy of 94 %.	None	No external validation, small sample size, insufficient number of data for grade 0 ascites	ML models were developed to predict ascites grades.
Post-analytical	Hauser 2021	To predict chronic myelogenous leukemia (CML) using blood cell counts	1,623 patients with BCR-ABL1 (laboratory results (CBC parameters and differentials), patient demographics (age and sex), and clinical information)	XGBoost, least absolute shrinkage and selection operator (LASSO)	R Packages: xgboost, glmnet	AUC values: 2–5 years (0.59–0.67), 0.5–1 year (0.75–0.80), at diagnosis (0.87–0.92).	Relative feature importance (as “Gain” values) was calculated^INT	No external validation, rare incidence (6.2 %) of CML in the study data, higher male gender in the study population	ML models using blood cell counts can aid the diagnosis of CML earlier in the disease course.
Post-analytical	He 2021	Prediction of Down syndrome in second trimester antenatal screening using ML	58,972 pregnant women, including 49 Down Syndrome (DS) cases (biological markers (uE3, AFP, and free ß-hCG), along with weight, maternal and gestational age)	RF	Python, Package: Scikit-learn	The model achieved an 85.2 % detection rate for DS while ensuring a false positive rate of 5 %.	Feature importance weight was given using Gini values^INT uE3 MOM, free B-HCG MOM most predictive features.	All of the screening information was obtained from the Han people.	The RF model enhanced the detection rate of DS.
Post-analytical	Hu 2021	To evaluate the diagnostic value of clinical indexes and urine polypeptide research in Gestational Diabetes Mellitus (GDM)	78 GDM patients, 30 normal pregnant women (serum TG, HDL-C, fasting plasma glucose (FPG) and glycosylated hemoglobin (HbA1c), and 7 GDM-related urinary polypeptides)	Multiple logistic regression equation, multilayer perceptron neural network model, radial basis function, and discriminant analysis function models	SPSS 22	Multilayer perceptron neural network: AUC 0.942.	Feature importance was reported	Small sample size, no external validation	ML model predicted GDM using blood glucose, blood lipid, and urine polypeptides.
Post-analytical	Hu 2023	To create a CNN-based system that can recognize pictures from immunofixation electrophoresis (IFE) automatically.	12,703 IFE images annotated by experts	Convolutional neural networks	Python, Packages: PyTorch, PyMIC	The average accuracy of 99.82 %, sensitivity of 93.17 %, and specificity of 99.93 %.	The model’s prediction was visually explained using score-based class activation map^{MS, L}	Excluded heavy chain-positive patterns and limited to IFE images from two imaging systems in a single hospital.	AI technology recognized IFE photos automatically with human-level performance.
Post-analytical	Janssens 2022	To reduce the synthetic cannabinoid receptor agonist (SCRA) screening workload by automating the interpretation of the activity-based screening output	968 serum samples	Random forest	Python, Packages: tsfresh, Scikit-learn	For a threshold of 0.055, the sensitivity and specificity were 94.0 %.	Given as decision tree^INT	More false positives than experts scoring	An ML model could accurately and consistently identify circulating SCRAs.
Post-analytical	Kurstjens 2022	To evaluate the risk of low body iron stores, as indicated by low levels of plasma ferritin.	The dataset consisted of 12,009 records, including CBC parameters and CRP results	Random forest	Python, scikit-learn	Random forest AUC 0.90–0.92	Feature importance was given	ML model was developed using anemic primary care adult patients.	A machine learning model predicting low ferritin levels was developed.
Post-analytical	Lafuente-Ganuza 2020	To determine the presence or absence of early-onset pre-eclampsia (PE).	630 pregnant women (NT-proBNP, PlGF, and sFlt-1 results)	Decision tree (DT) and random forest (RF)	Python, Package: scikit-learn	The NPV for early-onset pre-eclampsia is 100 %, while the PPV is 87 %.	Feature contributions for random forest models were given^INT	Cut-off values apply only to Elecsys Immunoassays.	Superior NPV and PPV for early-onset PE compared to conventional methods.
Post-analytical	Lee 2019	To use deep neural networks (DNN) to improve LDL-C estimation.	19,332 participants with recorded total cholesterol, HDL-C, LDL-C, and TG results.	Deep neural network (DNN)	Python, Platform: Tensorflow	The DNN model achieved a mean squared error ranging from 59.6 to 69.4.	None	Performance-based on clinical decision limits was not provided	DNN model outperformed other conventional formulas
Post-analytical	Lee 2022	To predict Sjogren Syndrome with ML	178 samples (primary Sjogren Syndrome, rheumatoid arthritis, secondary Sjogren Syndrome) with anti-MDA-modified peptide adducts autoantibodies levels.	Random forest	WEKA Package: scikit-learn	Sensitivity 93.7 %, specificity 84.4 %, accuracy 88.0 %.	Odds ratios were given for logistic regression^INT	Small sample size, no external validation	ML model was developed to discriminate primary Sjogren patients from other subjects.
Post-analytical	Lin 2022	To detect dyscalcemia using AI enabled-ECG method	121,848 instances consisting of 12-lead ECG traces signals and albumin-adjusted calcium (aCa) values.	Deep learning	R	The area under the curve (AUC) for hypercalcemia was 0.8948, while for hypocalcemia it was 0.7723.	Relative feature importance as “Gain” values of the XGB model were reported^INT	ECG-aCa has a low positive predictive value of 4.5 % for predicting hypercalcemia.	Tool to detect severe dyscalcemia for early diagnosis
Post-analytical	Luo 2016	To estimate ferritin results using other test results	A dataset of 5,128 entries containing clinical and routine laboratory data.	Random forest regression, Bayesian linear regression, and lasso regression, logistic regression (for classification)	Python Package: Scikit-learn	High accuracy (AUC: 0.97) in discriminating normal and abnormal ferritin results.	Indirect: Univariate associations were given	Numerical ferritin results were moderately accurate	ML model was built to predict ferritin status.
Post-analytical	Meng 2023	To create a predictive model to identify the early signs of seroconversion to positive thyroid autoantibodies.	Dataset of 26,549 individuals (anti-TPO and anti-Tg antibodies, liver function, kidney function, biochemistry, CBC results)	Logistic regression model	R Package: RMS	AUC of 0.838	Odds ratios were given for logistic regression^INT	None declared	An ML model was developed as an early warning system for the conversion of anti-TPO and anti-Tg antibodies.
Post-analytical	Mo 2023	To predict thalassemia using red blood cell indices	8,693 records (CBC parameters and genetic test results for thalassemia)	Deep neural network	Python Platform: TensorFlow	AUC: 0.960, accuracy: 0.897, Youden’s index: 0.794, F1 score: 0.897, sensitivity: 0.883, specificity: 0.911, PPV: 0.914, NPV: 0.882	The importance of features was assessed using feature subsets	Lack of external validation	DNN model outperformed the traditional screening model.
Post-analytical	Monaghan 2022	To classify and differentiate acute leukemias from nonneoplastic cytopenias.	531 patients with cytopenias and/or acute leukemia (37 flow cytometry (FC) parameters)	Gaussian mixture model, Fisher kernel methods, and SVM	Python Package: scikit-learn	Accuracy: 94.2 %, AUC: 99.5 %	Feature selection was performed.	No external validation	ML tool to detect acute leukemias using FC parameters
Post-analytical	Ng 2021	To diagnose B-cell malignancies with ML using FC parameters	3,417 blood samples, including both B-cell malignancies and healthy individuals (FC parameters)	Random forest classification	Python Package: scikit-learn	The model classified B-cell malignancies with a sensitivity of 83.57 %, specificity of 99.26 %, PPV of 96.69 % and NPV of 95.87 %, and an accuracy of 96.02 %.	None	No external validation	ML model to detect B-cell malignancies using FC parameters
Post-analytical	Ng 2015	To diagnose classical Hodgkin Lymphoma (cHL) by ML using FC data	144 clinical cases (FC parameters)	Random forest, gradient boosting, and SVM	Python Package: scikit-learn	The SVM model achieved an AUC of 0.96, an accuracy of 0.95, a sensitivity of 1, and a specificity of 0.91.	The importance of features was assessed using feature subsets	No external validation	ML model to aid in the identification of cHL
Post-analytical	Peña-Bautista 2019	To detect Alzheimer’s Disease (AD) with ANN using lipid peroxidation constitutes	96 participants (70 early AD, 26 healthy controls) (urine and plasma lipid peroxidation constitutes)	Linear discriminant analysis (partial least squares, PLS) and non-linear discriminant analysis (ANN, SVM)	SPSS 20	ANN achieved an accuracy of 0.882, with a sensitivity of 88.2 % and a specificity of 76.9 %.	None	Small sample size, no external validation	ANN model to detect Alzheimer’s Disease using lipid peroxidation markers.
Post-analytical	Rashidi 2021	To create models for predicting AKI utilizing an automated ML technique using creatinine, NGAL, and/or urine output (UOP)	125 adult individuals with burn injuries or trauma unrelated to burns (NGAL, creatinine, and UOP)	Automated Machine Intelligence Learning Optimizer (MILO) (LR, NB, KNN, SVM, RF, XGBoost, and DNN)	Automated Machine Intelligence Learning Optimizer (MILO)	The logistic regression model achieved an accuracy of 96 %, a sensitivity of 92.3 %, a specificity of 97.7 %, and an AUC of 0.96.	Odds ratios were given for logistic regression^INT	Small sample size	Machine learning enhanced the predictive performance of biomarkers
Post-analytical	Reix 2019	To develop a therapeutic decision tree model using uPA/PAI-1 for breast cancer care	315 women diagnosed with breast cancer (Tumor size, nodal status, histological grade, ER and PR-H score, Ki 67, VI, uPA/PAI-1 levels, age, and comorbidities)	Decision tree	R Package: rpart	The agreement between the therapeutic recommendations of the decision tree and the actual treatment ranged from 75 to 100 %.	Given as decision tree^INT	Small sample size	The decision tree based on uPA/PAI-1 aided in making therapeutic decisions.
Post-analytical	Rigo-Bonnin 2022	To forecast outcomes of patients with COVID-19	326 COVID-19 patients in critical condition with recorded demographics, comorbidities, laboratory variables, symptoms, and hospital stays.	ANN and binary logistic regression (BLR)	SPSS Statistics 21.0	ANN AUC 0.917, NPV 95.9 %	Odds ratios were given for logistic regression^INT	Small sample size, no external validation	ML predicted COVID-19 patient outcomes using ICU admission on the first day.
Post-analytical	Sans 2019	To develop a portable and biocompatible device connected to a mass spectrometer combined with ML to detect ovarian cancer	192 variety of small metabolites’ levels of Fallopian tube, ovarian, and peritoneum tissue specimens	Lasso classification model	R	For high-grade serous carcinoma, sensitivity was 96.7 % and specificity was 95.7 %. For overall cancer, sensitivity was 94.0 % and specificity was 94.4 %.	Features selected by Lasso regression analysis	Small sample size	A handheld device coupled with an ML model offers rapid and accurate ovarian cancer diagnosis.
Post-analytical	Shang 2022	To predict seroconversion of HBeAg	260 patients with chronic hepatitis B (CHB) (laboratory and clinical variables)	KNN, SVM, DT, RF, gradient boosting (GB), XGBoost, NB, LR.	R, Packages: caret, Boruta, glmnet, pROC, VennDiagram, and MLeval	XGBoost achieved an AUC of 0.910	Variable importance was reported for the XGBoost model^{MS, G}	No external validation, small sample size	ML model predicted HBeAg seroconversion in HBeAg-positive patients with CHB undergoing treatment.
Post-analytical	Simonson 2022	To predict additional panels needed to differentiate between chronic lymphocytic lymphoma and mantle cell lymphoma.	A total of 9,635 cases with flow cytometry data.	Convolutional neural networks	Python, Packages: CSparser, sklearn, TensorFlow	Accuracy 94 %, AUROC 89 %, recall 78 %, F1 score 0.62, precision 51 %.	SHAP values were given^{MA, G}	Relatively low PPV	Enhanced efficiency and consistency in the laboratory workflow for requesting additional antibody panels by utilizing a CNN model.
Post-analytical	Simonson 2021	To detect classic Hodgkin lymphoma using two-dimensional (2D) histograms of flow cytometry data	A dataset consisting of flow cytometry data from 1,222 samples.	Convolutional neural networks	Python, Packages: fcsparser, sklearn, tensorflow	The EnsembleCNN classifier achieved an accuracy of 88.2 %, precision of 82.4 %, recall (sensitivity) of 67.7 %, and F1 score of 74.3 %, with an AUC of 0.92.	SHAP values were given^{MA, G}	No external validation	CNN model to identify cell populations for cHL
Post-analytical	Soerensen 2022	To use standard blood tests to identify people at risk of cancer	6,592 patients with 25 routine laboratory blood tests and cancer diagnoses within a 730-day follow-up.	Random Forest, ANN	SAS	ANN achieved an AUC of 0.79.	None	Relatively small sample size, no external validation	A simple risk score for predicting cancer within 90 days was generated by the ML model.
Post-analytical	Streun 2022	To screen synthetic cannabinoids (SCs) based on metabolome using a machine learning algorithm.	474 urine samples were analyzed for metabolite levels.	Random forest	R Package: randomForest	The model correctly classified 88 % of the test set, with 80 % of positive samples and 96 % of negative samples.	Feature selection was performed using ROC curves and feature importance analysis.	Small sample size	The combination of the random forest (RF) approach and metabolomics introduces a new screening strategy for novel SCs.
Post-analytical	Stroek 2023	To improve the PPV of the newborn congenital hypothyroidism screening	The dataset consists of 4,668 newborn screening data including age at NBS sampling, gestational age, TSH, T4, TBG, and T4/TBG ratio	Random forest	R Package: Caret	Specificity: 62 %, Sensitivity: 100 %, accuracy: 68 %, PPV: 26 %	Feature importance weight was determined using Gini values and decreasing accuracy.	Incomplete dataset	ML model increased the PPV value for congenital hypothyroidism from 21 to 26 %
Post-analytical	Su 2020	To predict cardiovascular diseases (CVD)	498 subjects (laboratory and clinical data)	Random forest, logistic regression	R	AUC of 0.802 for the random forest model.	Odds ratios were given for logistic regression^INT	No external validation, small sample size	Tool for the early prediction of CVD
Post-analytical	Su 2022	To predict urosepsis at an early stage.	574 subjects (patients with urinary tract infection and patients with urosepsis) with laboratory data including procalcitonin, C-reactive protein, and D-dimer	KNN, SVM, RF, ANN, LR, and naive Bayes	Python	ANN; accuracy 92.9 %, AUC 0.946	Feature selection was performed using Gini, LASSO, Ridge	The small sample size for urosepsis, no external validation	ANN model for predicting urosepsis
Post-analytical	Sun 2023	To detect intracranial aneurysm (IA) rupture with ML using plasma metabolic profiles	105 participants (IA patients and healthy subjects) (metabolomic data)	LASSO, random forest, and logistic regression	R Packages: glmnet, varSelRF	Logistic regression? AUC 0.929	Indirect: Log2 Fold change values were given	Small sample size, no external validation	Non-invasive IA risk assessment and diagnostic tool.
Post-analytical	Tang 2022	To develop ML methods in conjunction with changes in salivary glycopatterns to diagnose hepatocellular carcinoma	203 saliva samples (lectin microarray results for salivary glycopatterns)	RF, SVM, and LASSO	R	Random forest achieved an AUC of 0.886 for HCC diagnosis.	Gini values were calculated for each feature in the RF model^INT	Small sample size	ML model using salivary glycopatterns as a diagnostic tool for HCC diagnosis
Post-analytical	Topcu 2022	To estimate urine osmolality using an AutoML tool	300 urinalysis samples (urinalysis parameters)	H2O AutoML (generalized linear model (GLM), default random forests (DRF), gradient boosting machine (GBM), deep neural networks, extremely randomized tree (XRT))	R	The R2 value ranged between 0.70 and 0.83, and around 70–84 % of the results were within the agreed limit.	Permutation feature importances were given^{MA, G}	No external validation	ML models for estimating urine osmolality
Post-analytical	Binson 2021	To identify lung cancer and chronic obstructive pulmonary disease (COPD) using chemical gas sensor array-based electronic-nose device using ML	199 participants: 55 COPD, 51 lung cancer, and 93 controls. (VOCs results in exhaled air)	XGBoost, AdaBoost, and random forest	Matlab R2020b	XGBoost, classification accuracy of 79.31 % for lung cancer, 76.67 % for COPD	Feature selection was performed before model development	Small sample size, no external validation	Detection of lung cancer and COPD using a portable device with ML
Post-analytical	Van Woensel 2021	To establish an AI-based reflex protocol to identify pituitary dysfunction	875 patient cases (initial test results, reordered laboratory test results, clinical information)	Semantic Web technology	Apache Jena	Concordance with the laboratory clinician was 92 %	Indirect: Criteria were given.	Retrospective nature	The AI-based protocol can detect pituitary dysfunction at a low cost
Post-analytical	Vogg 2023	To distinguish adrenocortical carcinoma from adrenocortical adenoma by ML using a urinary steroid profile	352 patients with adrenal tumors (Eleven steroids detected by LC-MS/MS)	Decision tree strategy and random forest	R Packages: ctree, partykit	NPV 100 %, PPV 87.5 %	A decision tree was provided^INT	Small sample size	ML model using LC-MS/MS data for ruling out adrenocortical carcinoma
Post-analytical	Wang 2020	To set an autoverification system with ML	3,756,239 records (demographic information and test results)	KNN, Naïve Bayes, Xgboost, and RF	Python, Package: Scikit-learn	Ensemble model using top three models: 89.60 % passing rate, FNR 0.095 %	None	Lack of clinical information	ML-assisted autoverification system outperformed rule-based system and reduced workload
Post-analytical	Wang 2021	To differentiate benign prostate hyperplasia (BPH) and prostate cancer (PCa)	79 subjects with BPH or PCa with GC-MS-based metabolite data	SVM	SIMCA-P 14.1	The combination of the three-marker panel increased the AUC values for cPSA and tPSA to 0.781 in diagnosing PCa.	Indirect: The SVM model revealed the importance of metabolites	Small sample size, no external validation	ML using metabolomic data revealed that myoinositol, L-serine, and decanoic acid could be possible biomarkers for separating PCa from BPH.
Post-analytical	Wilkes 2018	To interpret urine steroid profiles	1,314 urine steroid profiles	RF, weighted-subspace RF, and Xgboost	R	Weighted-subspace RF achieved discrimination performance with an AUC of 0.955 for abnormal vs. normal classification, and an AUC of 0.873 for multiclass classification.	Boruta feature selection was performed before model development	No external validation	ML model for automated interpretation of urine steroid profiles
Post-analytical	Wilkes 2020	To interpret plasma amino acid (PAA) profiles with ML	2,084 plasma amino acid (PAA) profiles	Xgboost, RF, and weighted-subspace RF	R Package: caret	XGBT demonstrated an AUC of 0.953 for abnormal vs. normal classification, while an ensemble of three ML models achieved an AUC of 0.957. In the EQA scheme, 8 out of 9 interpretations were correct.	Detailed feature selection was performed before model development	No external validation?	ML model to aid the interpretation of PAA
Post-analytical	Wu 2022	To develop ML based diagnostic tool for encapsulating peritoneal sclerosis (EPS) using microRNA testing	142 effluents consisting of 62 EPS samples and 80 non-EPS samples, with miRNA results.	AdaBoost, Multiple logistic regression, DT, gradient tree boosting, RF	Sigma plot software	Random forest achieved a sensitivity of 100 % and specificity of 88.9 %.	Indirect: Log10 Fold change values were given	The small sample size and functions of selected miRNAs were unknown	ML-based diagnostic tool for EPS using microRNA testing
Post-analytical	Yang 2020	To predict SARS-CoV-2 infection using machine learning	3,356 subjects with information on demographic features (age, sex, race), 27 routine laboratory results, and RT-PCR results.	DT, gradient boosting DT (GBDT), RF, Logistic regression	Python Package: Scikit-learn	The GBDT model achieved an AUC of 0.838, sensitivity of 0.758, and specificity of 0.740 on an independent dataset.	SHAP values were given^{MA, G}	Only severe cases were included in the study	ML model using routine laboratory tests was developed for the detection of SARS-CoV-2 infected patients
Post-analytical	Yang 2021	To detect ovarian cancer with ML model using carcinoembryonic antigen (CEA) and salivary mRNAs	280 subjects (140 patients, 140 healthy controls), 120 subjects for external validation (60 patients, 60 controls) (CEA and salivary mRNA biomarkers)	Decision tree algorithm	Matlab Packages: fitctree, predict	ML model achieved a sensitivity of 85 % and a specificity of 88.3 %	Indirect: mRNA levels were compared between groups.	Small sample size	ML model using CEA and salivary mRNA biomarkers could detect ovarian cancer.
Post-analytical	Yang 2022.2	To develop a diagnostic model with ML using plasma lipidomics data for colorectal cancer diagnosis	99 subjects (49 CRC patients, 50 healthy controls) (Metabolomics data)	SVM, KNN, partial least squares (PLS), RF	R Package: caret	The SVM model achieved an accuracy of 100 % and a kappa score of 1.000.	Indirect: Recursive feature elimination (RFE) was utilized for ranking features	Small sample size, no external validation	A diagnostic model with ML using plasma lipidomics data to detect colorectal cancer
Post-analytical	Zheng 2021	To diagnose COPD using serum metabolic biomarkers	54 patients with COPD and 74 normal individuals, and their serum metabolites	Least-squares SVM	Matlab, Package: LS-SVM toolbox	Polynomial LS-SVM AUC 0.90, accuracy 84.62 %	PLS-discriminate analysis was used to derive variable importance	No external validation, small sample size	ML model using serum metabolites for diagnosis of COPD
Post-analytical	Zheng 2017	Predictive diagnosis of the major depression	72 depressive patients and 54 healthy subjects (NMR spectroscopy data of metabolites)	Least-squares SVM	Matlab, Package: LS-SVM toolbox	The LS-SVM model achieved an AUC of 0.96.	None	No external validation, small sample size	LS-SVM-RBF using metabolites can aid major depression diagnosis.
Post-analytical	Constantinescu 2022	To integrate machine learning algorithms, mass spectrometry-based steroidomics, and LIMS to automate the interpretation of plasma steroid profiles in patients with	Plasma steroidomics data of 22 hypertension and adrenal adenoma patients (plasma steroid profiling)	Linear Discriminant Analysis (LDA), SVM, and RF	Matlab	Primary aldosteronism (PA) probabilities ranged from 89 to 100 % (median 99 %) in PA patients, and from 2 to 90 % (median 21 %) in non-PA patients.	None	Performance characteristics were not reported with conventional metrics. No external validation	ML-based steroidomics models demonstrated diagnostic utility in patients with PA.
Post-analytical	Çubukçu 2022	To create a clinical decision support tool to aid in the diagnosis of COVID-19	Laboratory data of clinical chemistry and complete blood count parameters from a total of 1,391 patients	SVM, XGBoost, RF	Python Package: Scikit learn	Random forest achieved a specificity of 91.2 %, a sensitivity of 79.6 %, and an accuracy of 85.3 %.	Boruta feature selection was performed before model development	Absence of vaccinated subjects, lack of certain SARS-CoV2 variants	The study provided machine learning models as tools to support clinical decision-making in COVID-19 cases, aiding physicians in their clinical judgments.
Post-analytical	Çubukçu 2022	To estimate LDL-C using machine learning models	Laboratory data of 59,415 samples (total cholesterol, HDL-C, LDL-C, and TG)	Gradient-boosted trees, ANN, Linear regression	KNIME Analytics Platform, R, Python	For TG 177–399 mg/dL and LDL-C < 70 mg/dL, the ANN model demonstrated a sensitivity of 67.81 %, PPV of 73.33 %, specificity of 98.69 %, and an F-score of 70.46 %.	Linear regression coefficients were given^INT	No external validation	The ANN, gradient-boosted trees, and linear regression models showed superior performance compared to traditional formulas.
Post-analytical	Dabla 2022	To evaluate sick children admitted to the pediatric emergency department (ED) and discover novel patterns in their clinical and laboratory attributes.	158 children (51 clinical and laboratory parameters)	Association rule mining (Hotspot algorithm)	Not reported	NA	Rules extraction^INT	Small sample size	The study offered a tool for the management of pediatric patients in an emergency setting
Pre&Post-analytical	Benirschke 2020	To develop a predictive model for falsely increased point-of-care (POC) whole-blood potassium (K) results.	3,489 results of patients (sex, age, Na, K, Cl, CO2, GFR, Creat, iCa, and BUN)	A multivariate logistic regression model	R Packages: Tidyverse, readxl, lubridate, mcr, caret, ROCR, and pROC	Logistic regression (AUC 0.995, sensitivity of 88.2 %, and a specificity of 96.4 %)	Significant contributors for logistic regression were given (K, CO2, Creat, iCa, and sex)^INT	No external validation; used core laboratory K as input.	ML model to detect laboratory errors and to alert for suspicious K results of POC.
Pre&Post-analytical	Chabrun 2021	To utilize deep learning techniques to achieve expert-level interpretation of serum protein electrophoresis (SPE).	The dataset consisted of 159,969 entries for SPE	4 different neural network models	Python, R	M-spike detection AUC 0.96 (external test set) Classification: Accuracy 88.1 % (external test set) Hemolysis detection: AUC 0.95	Indirect: Expert-system based approach was utilized	Models have not yet been validated by a regulatory authority, the lack or incompleteness of annotations	The deep learning model for high-throughput SPEs analyses and interpretation.
Total testing process	Tsai 2022	To predict turnaround time (TAT)	90,543 clinical chemistry samples (TAT)	Ridge Regression, Extra Trees (ET) Regressor, and K Neighbors Regressor were included in the PyCaret (AutoML) framework	Python Package: PyCaret	The ET Regressor model achieved an R2 score of 0.63, with a mean absolute error of 2.42 min and a mean absolute percentage error of 7.35 %.	SHAP values were given^{MA, G}	Right-skewed data	ML model to predict TAT

G, global; INT, intrinsic explainability; L, local; MA, model-agnostic; MS, model-specific.

Machine learning in the pre-analytical phase

The use of AI and ML methods in laboratory medicine has grown markedly in recent years, and these techniques have shown potential in several areas of the pre-analytical phase. This section summarizes notable studies that address these applications.

Clot detection

Fang et al. explored the identification of clotted specimens from coagulation test results [23]. Using standard and momentum backpropagation neural networks (BPNNs), the researchers developed a model that achieved an area under the curve (AUC) of 0.971, accuracy of 0.953, specificity of 0.967, and sensitivity of 0.940. These findings indicate that ML can accurately identify clotted samples based on coagulation test results [23].
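The performance figures quoted throughout this section (sensitivity, specificity, accuracy) all derive from a confusion matrix. The following minimal Python sketch shows how they are computed; the labels and predictions are hypothetical illustration data, not values from the cited study, and the function name is an assumption introduced here.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = clotted, 0 = not clotted)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "accuracy": (tp + tn) / len(y_true),
    }

# Hypothetical reference labels and model predictions for ten specimens
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because sensitivity and specificity are computed within the positive and negative groups separately, they do not depend on how common clotted specimens are, whereas accuracy does.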

Specimen mix-up – wrong blood in tube error detection

Farrell et al. investigated the detection of mislabeled samples [24]. They employed a range of ML methods, including decision trees, random forest, artificial neural network (ANN), k-nearest neighbors, extreme gradient boosting, support vector machine, and logistic regression. The ANN model performed best, achieving an accuracy of 92.1 % and an area under the curve (AUC) of 0.977. These findings indicate that ML algorithms can outperform human reviewers in identifying incorrectly labeled samples [24]. In a further study, Farrell et al. focused specifically on wrong blood-in-tube (WBIT) errors [25]. Using an ANN model, they achieved a sensitivity of 90.6 %, specificity of 94.5 %, and accuracy of 92.5 %. This study provides evidence that autonomously functioning AI models detect WBIT errors better than human reviewers [25]. Zhou et al. studied the detection of sample mix-ups by combining the delta check method with deep learning techniques [26]. They compared several ML algorithms, including Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR), K-Nearest Neighbor (KNN), Naive Bayesian Classifier (NBC), and Deep Belief Network (DBN). The DBN-based delta check method outperformed the other approaches, yielding an AUC of 0.977, accuracy of 93.1 %, true positive rate (TPR) of 92.9 %, and true negative rate (TNR) of 93.3 %. However, the study had limitations in terms of explainability [26]. Mitani et al. developed a gradient-boosting-decision-tree (GBDT) model to detect specimen mix-ups [27]. The model achieved an AUC of 0.998. Nonetheless, this study was limited to a simulation-based approach and lacked external validation [27]. Rosenbaum et al. sought to identify WBIT errors using logistic regression and support vector machine (SVM) models [28]. The SVM model outperformed conventional single-analyte delta checks, with an AUC of 0.97. The ML models showed superior capability in detecting WBIT errors, which could reduce potential harm to patients [28].
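The delta check idea behind several of these studies can be illustrated with a short Python sketch: each sample is described by the per-analyte difference between the current and the previous result under the same patient label, and mix-up examples are simulated by pairing one patient's previous result with another patient's current result. This is a simplified illustration under stated assumptions, not the method of any cited study; the analyte values, function names, and the rotation-based swap are all hypothetical.

```python
def delta_features(current, previous):
    """Per-analyte difference between the current and previous result."""
    return {a: current[a] - previous[a] for a in current if a in previous}

def simulate_mixups(pairs):
    """Pair each patient's previous result with the next patient's current result."""
    currents = [cur for _, cur in pairs]
    rotated = currents[1:] + currents[:1]
    return [(prev, cur) for (prev, _), cur in zip(pairs, rotated)]

# Hypothetical (previous, current) results for three patients
pairs = [
    ({"Na": 140.0, "K": 4.0, "Creat": 0.9}, {"Na": 139.0, "K": 4.1, "Creat": 0.9}),
    ({"Na": 132.0, "K": 5.2, "Creat": 2.4}, {"Na": 133.0, "K": 5.0, "Creat": 2.5}),
    ({"Na": 145.0, "K": 3.4, "Creat": 0.6}, {"Na": 144.0, "K": 3.5, "Creat": 0.6}),
]

# Label 0 = correctly labeled sample, label 1 = simulated mix-up
dataset = [(delta_features(cur, prev), 0) for prev, cur in pairs]
dataset += [(delta_features(cur, prev), 1) for prev, cur in simulate_mixups(pairs)]

for features, label in dataset:
    print(label, features)
```

A classifier trained on such multi-analyte delta vectors can weigh several analytes jointly, which is the stated advantage over single-analyte delta checks with fixed thresholds.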

Sample dilution management

In the realm of serum-free light chain (sFLC) testing, Ialongo (2017) tackled the critical issue of sample dilution management [29]. To address this challenge, they employed an artificial neural network (ANN) model known as MLP-ANN. The implementation of MLP-ANN resulted in a remarkable reduction in wasted tests for κ-FLC and λ-FLC, with reductions of 69.4 and 70.8 %, respectively. Despite these promising results, it should be noted that the MLP-ANN model exhibited limitations in recognizing certain cases, and the study lacked external validation [29].

Detecting chemical manipulation in urine samples

In the domain of urine sample analysis, Streun et al. used ML techniques, specifically an artificial neural network (ANN), to detect instances of chemical manipulation [30]. The ANN model achieved an accuracy of 95.4 %. The authors also investigated feature importance using local interpretable model-agnostic explanations [30]. Although the study did not include external validation, the results indicate that the ANN model holds promise for identifying chemical manipulation in urine samples, which could enhance the overall integrity and reliability of laboratory testing procedures [30].

Assessing serum quality based on hemolysis, icterus, and lipemia

In their research, Yang et al. developed a deep learning-based model utilizing convolutional neural networks (CNNs) to evaluate serum quality based on sample images [31]. The CNN model demonstrated exceptional performance, achieving high area under the curve (AUC) values for detecting hemolysis (0.989), icterus (0.996), and lipemia (0.993). It is important to note, however, that this study lacked external validation [31].

Improving PBFC test utilization

To enhance the utilization of peripheral blood flow cytometry (PBFC) tests, Zhang et al. conducted a study [32]. Decision tree and logistic regression models were employed, resulting in a sensitivity of 98 % and specificity of 65 %, with an AUC of 0.906. The utilization of ML models effectively reduced unnecessary PBFC test utilization by 35–40 %, optimizing resource allocation [32].
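Sensitivity, specificity, and accuracy all follow directly from confusion-matrix counts. The sketch below uses hypothetical counts, chosen only so that the ratios reproduce the reported 98 % and 65 %; they are not the study's data.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Compute basic diagnostic performance measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts: 100 specimens with positive findings, 100 without.
metrics = confusion_metrics(tp=98, fp=35, tn=65, fn=2)
```

With a triage model of this kind, the specificity corresponds to the share of specimens without findings that could be withheld from testing, at the cost of the few positives missed.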

Overall, these studies demonstrate the successful implementation of ML techniques in the pre-analytical phase of laboratory medicine. They offer promising solutions for identifying clotted specimens, detecting mislabeled samples and WBIT errors, managing sample dilution, identifying specimen mix-ups, revealing chemical manipulation, evaluating serum quality, improving PBFC test utilization, and detecting sample mix-ups using the delta check method. The use of ML algorithms has shown superior performance compared to traditional approaches, contributing to enhanced efficiency and patient safety in laboratory workflows. However, some studies have limitations such as disparities in sample distribution, lack of external validation, and limited explainability. Future research should focus on addressing these limitations and further exploring the potential of ML in the pre-analytical phase of laboratory medicine.

Machine learning in the analytical phase

The analytical phase involves the examination process, and ML algorithms have shown promise in various analytical tasks. In this section, we summarize notable studies that have explored the implementation of ML in the analytical phase.

Cell image analysis

Bigorra et al. conducted a study aiming to automate the differentiation between reactive lymphoid cells (RLC) and blast cells of lymphoid and myeloid origin [33]. Using a dataset of 916 blood cell images, Support Vector Machines (SVM) were employed, achieving an overall accuracy of 80 %. Notably, the SVM model exhibited high accuracy in distinguishing reactive lymphoid cells (85.11 %) and myeloid blast cells (82 %), although the accuracy for lymphoid blast cells was relatively lower (73.97 %). The model incorporated statistical features extracted from the color components of the images, followed by dimensionality reduction using Principal Component Analysis (PCA) and feature selection based on mutual information maximization. However, external validation was lacking in this study. Nevertheless, this research shows promise in automating the distinction between reactive lymphocytes and blast cells, particularly in recognizing myeloblasts and lymphoblasts [33]. Chabrun et al. investigated the analysis of peripheral leukocytes and the prediction of VEXAS syndrome through deep learning approaches [34]. Convolutional Neural Networks (CNN) and support vector machine (SVM) algorithms were employed on a dataset of 197 blood smears from 12 patients. The deep learning models demonstrated satisfactory performance, effectively distinguishing VEXAS patients from both UBA1-WT and MDS patients, with ROC-AUCs ranging from 0.87 to 0.95. The workflow involved Python programming utilizing the sklearn package. However, it is important to acknowledge the limitation of the small sample size in this study [34]. In the field of erythrocyte morphology classification, Durant et al. employed CNNs to classify erythrocytes based on their morphology [35]. Utilizing a dataset of 3,737 labeled cells, the CNN model achieved a recall of 92.70 %, a precision of 89.39 %, and a correct classification frequency of 90.60 %. 
The study implemented the CNN model using Python programming with Theano and Lasagne packages. However, external validation was not performed, which should be taken into consideration [35]. Mohlman et al. investigated the differentiation between diffuse large B-cell lymphoma (DLBCL) and Burkitt lymphoma (BL) based on histologic images [36]. CNNs were applied to a dataset of 10,818 H&E-stained tissue slide images, comprising 36 cases of DLBCL and 34 cases of BL. The CNN model achieved an AUC of 0.92 for distinguishing between the two types of lymphoma. The study utilized Python programming with the Tensorflow platform. It is important to note that the presence of a higher number of training images from BL may have introduced a slight bias, which should be considered when interpreting the results [36]. Sun et al. conducted a study to detect fetal nucleated red blood cells (fNRBCs) utilizing various machine learning algorithms, including K-nearest neighbor (KNN), support vector machine (SVM), and CNN [37]. The study analyzed 4,760 pictures of fNRBCs from 260 cell slides of umbilical cord blood samples. The CNN model achieved an accuracy of 98.5 %, sensitivity of 96.5 %, and specificity of 100 % for fNRBC detection. However, the study did not explicitly address the aspect of explainability, which warrants further consideration [37].

These studies utilize various techniques, including statistical feature extraction, dimensionality reduction, and deep learning models, demonstrating the versatility and potential of ML and image analysis methods in the field of cell analysis. However, it is important to address the limitations mentioned, such as the absence of external validation, small sample sizes, and the potential bias introduced by imbalanced training images. Further research and validation are necessary to ensure the reliability and generalizability of these findings. Overall, these studies contribute valuable insights and methodologies to cell image analysis, paving the way for the development of automated systems for cell classification, disease prediction, and diagnostic support in clinical and research settings.

Assessing analytically acceptable mass spectrometry results

Yu et al. utilized ML algorithms, including AdaBoost, decision tree, K-nearest neighbors (KNN), logistic regression, random forest, and support vector machine (SVM), to verify analytically acceptable mass spectrometry (MS) results [38]. Using a dataset of 1,267 urine samples targeting 11-nor-9-carboxy-delta-9-tetrahydrocannabinol, the SVM model achieved a precision of 81 %, recall of 100 %, and an F1 score of 90 %. Although external validation was not performed, the ML model reduced manual review needs by approximately 87 % [38]. These findings demonstrate the potential of ML in automating the assessment of MS results, improving efficiency, and warrant further research for external validation and broader applicability.
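The reported F1 score can be checked from the precision and recall, since F1 is their harmonic mean. A minimal sketch (the function name is ours):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the SVM model [38].
f1 = f1_score(precision=0.81, recall=1.00)
```

The result is approximately 0.895, which rounds to the reported 90 %.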

Quality control

Zhou et al. developed MLiQC, a real-time patient-based quality control (QC) system for laboratory medicine built on ML [39]. The researchers applied the Random Forest (RF) algorithm to a large dataset of 1,195,000 patient results. The RF model achieved an area under the curve (AUC) of 0.985 for the detection of critical bias in albumin, with an accuracy of 75 %, sensitivity of 71.3 %, specificity of 99.6 %, and a low false positive rate (FPR) of 0.45 %. Validation was performed using artificial error data, although external validation was not explicitly mentioned. MLiQC outperformed the traditional patient-based real-time quality control (PBRTQC) method [39]. In another investigation, Çubukçu proposed an ML model that integrated conventional QC rules, exponentially weighted moving average (EWMA), and cumulative sum (CUSUM) charts [40]. The model employed the random forest algorithm on a dataset of 170,000 simulated QC results. The RF model achieved a low false rejection probability of 0.0048 and demonstrated the highest error detection rate for errors smaller than one standard deviation. The study identified CUSUM and EWMA as the most important features in terms of predictive capability. However, the model was not evaluated against multi-rules and was not implemented in a real-world setting [40].
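To make the EWMA and CUSUM statistics concrete, the sketch below computes both from a short series of QC results expressed as z-scores. It uses the textbook definitions with commonly used but here arbitrary settings (smoothing weight 0.2, reference value 0.5 SD) and is not the feature construction used in [40].

```python
def ewma(values, target=0.0, lam=0.2):
    """Exponentially weighted moving average, started at the target value."""
    z = target
    series = []
    for x in values:
        z = lam * x + (1 - lam) * z
        series.append(z)
    return series


def cusum(values, target=0.0, k=0.5):
    """Tabular (two-sided) CUSUM: accumulates deviations beyond the slack k."""
    upper = lower = 0.0
    uppers, lowers = [], []
    for x in values:
        upper = max(0.0, upper + (x - target) - k)
        lower = max(0.0, lower - (x - target) - k)
        uppers.append(upper)
        lowers.append(lower)
    return uppers, lowers


# Hypothetical QC z-scores showing a small sustained upward shift.
qc_z = [0.0, 1.0, 1.0]
ewma_series = ewma(qc_z)
cusum_upper, cusum_lower = cusum(qc_z)
```

Both statistics accumulate evidence over consecutive results, which is why they respond to small sustained shifts that a single-result rule would miss.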

Both studies showcase the potential of ML in enhancing quality control practices in laboratory medicine. However, further research and validation are necessary to address the limitations and ensure the effectiveness and practicality of these approaches in real-world settings.

In summary, the implementation of ML in the analytical phase of laboratory medicine has demonstrated promising results across various applications. These studies have showcased the potential of ML algorithms, such as SVM, CNN, and RF, in differentiating between reactive lymphoid cells and blast cells, analyzing peripheral leukocytes, classifying erythrocyte morphology, distinguishing between different types of lymphoma, detecting fetal nucleated red blood cells, verifying MS results, and developing real-time quality control methods. While these studies present advancements in the field, further validation, and larger-scale implementation are necessary to ensure the robustness and generalizability of ML models in laboratory medicine’s analytical phase.

Machine learning in the post-analytical phase

The integration of ML techniques in the post-analytical phase of healthcare has shown promising applications across a wide range of medical disciplines. Studies have explored the utilization of ML algorithms to enhance various diagnostic processes, such as predicting disease outcomes, estimating biomarker levels, and differentiating between different medical conditions [11, 41–110].

These studies addressed a diverse range of medical conditions, including sepsis, cardiovascular diseases, cancer, endocrine disorders, infectious diseases, and autoimmune diseases, as given in Table 1. ML algorithms were applied to various laboratory parameters, such as blood tests, urine tests, metabolic profiles, genetic markers, and imaging data.

The reviewed studies demonstrated the efficacy of ML algorithms in several diagnostic applications. Predictive models were developed for diseases such as sepsis [41], cancer [78], diabetes, and gestational disorders [62]. ML algorithms were also used for estimating LDL cholesterol levels [106], identifying specific infections [88], and distinguishing between benign and malignant lesions [58]. Furthermore, ML-based approaches showed promise in improving the accuracy of diagnosing genetic disorders, autoimmune diseases [68], and hematological malignancies [73].

However, challenges such as data quality, interpretability, and algorithm validation need to be addressed to ensure the safe and effective implementation of ML in clinical practice. Future research should focus on large-scale validation studies, standardized protocols, and the integration of ML algorithms into existing laboratory information systems.

Explainable AI

As the examples provided thus far show, the introduction of ML in laboratory medicine and its impact on clinical decision-making offer a cutting-edge methodology with considerable promise. However, as with many healthcare ML applications, the adoption of ML in routine clinical laboratory practice is often hindered by substantial concerns about what is commonly called the “black-box” problem [41, 61–63, 111]. In essence, the term “black box” refers to AI models that produce outputs without revealing their internal decision-making processes [112]. The internal decision-making procedures of high-performance black-box models are typically incomprehensible to humans [61, 76, 113].

This is where explainable AI (XAI) comes into play, as it provides transparency and interpretability to the decision-making process of AI algorithms, allowing clinicians to make informed decisions and take responsibility for the outcomes [63, 113]. In contrast to applications like AI-based advertising recommendations, explanations are crucial for users to understand, trust, and effectively manage these tools in high-stakes AI applications, such as autonomous vehicles and healthcare, where decisions can have life-or-death consequences [112]. Since the clinical laboratory influences medical decisions with its results, the models developed in this field must also be explainable. Explainability in ML models is essential for reliable AI in clinical contexts, for regulatory compliance, and for determining who is responsible for AI faults [111].

In addition to the transparency and interpretability offered by XAI, the integrity and quality of the underlying data are equally crucial for ML models, which is where the FAIR data principles become essential [114, 115]. These principles matter most in high-stakes applications, where the accuracy and consistency of data directly influence the quality of AI-driven insights and decisions [114].

These principles ensure that data is well described for both humans and computers, encompassing the following aspects.

Findability ensures that human and automated systems can easily locate and retrieve data [116]. This aspect can be ensured by assigning unique and persistent identifiers, enriching descriptions with detailed metadata, and clearly including the data’s identifier within the metadata for easy discovery [115].
Accessibility implies that data can be accessed with well-defined mechanisms [116]. It can be achieved through retrievability via standardized, open protocols, potentially inclusive of authentication and authorization processes, with a commitment to keeping metadata available even after data is no longer available [117].
Interoperability allows for the integration and collaborative use of data from diverse sources [116]. It can be facilitated using universally understandable languages and FAIR-compliant vocabularies and incorporating references that connect to other relevant datasets [115].
Reusability is achieved when data is well documented and maintained in formats beneficial to future research, extending the utility of datasets beyond their initial purpose. This aspect is promoted by comprehensive, accurate descriptions, clear usage licenses, thorough documentation of origins, and adherence to established community standards, enhancing transparency and utility for various users [115].
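As a rough illustration of how these four aspects translate into dataset metadata, the sketch below defines a minimal record and a completeness check. All field names and example values are hypothetical and do not follow any formal FAIR metadata schema; the mapping of fields to principles is our own simplification.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    identifier: str = ""       # Findability: unique, persistent identifier
    description: str = ""      # Findability: rich descriptive metadata
    access_protocol: str = ""  # Accessibility: standardized retrieval protocol
    vocabulary: str = ""       # Interoperability: shared coding system
    license: str = ""          # Reusability: clear usage license
    provenance: str = ""       # Reusability: documented origin of the data


def missing_fair_fields(record):
    """Return the names of metadata fields that are still empty."""
    return [name for name, value in vars(record).items() if not value]


# Hypothetical, partially described training dataset.
record = DatasetRecord(
    identifier="doi:10.0000/hypothetical-dataset",
    description="De-identified chemistry results used for model training",
    access_protocol="https",
    vocabulary="LOINC",
)
gaps = missing_fair_fields(record)
```

Here `gaps` lists the license and provenance fields, the two items that would have to be completed before the dataset could be considered reusable in this simplified sense.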

The FAIR data principles support a high standard of data management and stewardship, ensuring that the data used for training and validating ML models are well-curated, standardized, and transparent [114, 115].

Our literature review on the use of the XAI approach identified a growing interest in applying explainability tools to ML models, as shown in Table 1. Although XAI results have recently been reported extensively for healthcare as a whole, relatively few studies have focused specifically on laboratory medicine [112]. Therefore, in this section of our review, we provide a holistic overview based on the publications identified in our research and introduce the principles of XAI [105, 119]. Because this approach is still quite new, further research outcomes can be anticipated in the future.

Approaches for explainable AI

XAI approaches are categorized in a variety of ways in the literature; Figure 3 provides a basic illustration of one such classification.

Figure 3: Basic classification of explainability methods.

Considering the diverse classifications of XAI techniques in existing literature, some model types inherently offer explainability, often denoted as “Transparent Models” or “Explanation by Design.” These models’ decision-making procedures are typically simple and can be easily understood by humans [118]. It is also possible to visually illustrate them in a way that is simple to comprehend. Examples of intrinsically explainable models include linear models, decision trees, rule-based systems, Naive Bayes classifiers, and K-Nearest Neighbors (K-NN) [112].

For instance, in linear models with a limited number of features, the coefficients or weights associated with the linear equation can provide meaningful insights into predictive behavior. As an example, Çubukçu (2022) used a linear model for LDL-C prediction, which offered easily interpretable coefficients for LDL-C estimation [106]. Similarly, Zhang (2020) demonstrated the transparency and ease of interpretation of decision tree models, which they used to triage peripheral blood flow cytometry specimens [32]. Although certain models, such as rule-based algorithms, decision trees, and linear regression, have transparent decision-making processes, their performance is typically inferior to that of more complicated “black-box” models, such as deep learning or ensemble models [119].
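To illustrate what interpretable coefficients look like in practice, the sketch below implements the classic Friedewald equation, a long-established linear formula for estimating LDL-C in mg/dL. It is used here purely as a familiar example of a transparent linear model; it is not the model developed in [106].

```python
def friedewald_ldl(total_cholesterol, hdl_cholesterol, triglycerides):
    """Estimate LDL-C (mg/dL) with the Friedewald equation.

    Each coefficient is directly readable: +1 for total cholesterol,
    -1 for HDL-C, and -1/5 for triglycerides.
    """
    if triglycerides >= 400:
        # The equation is conventionally not applied at or above 400 mg/dL.
        raise ValueError("Friedewald estimate is not valid for triglycerides >= 400 mg/dL")
    return total_cholesterol - hdl_cholesterol - triglycerides / 5


# Hypothetical lipid panel in mg/dL.
ldl_estimate = friedewald_ldl(total_cholesterol=200, hdl_cholesterol=50, triglycerides=150)
```

A reader can see at a glance how each input moves the estimate, which is exactly the property that black-box models lack.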

Despite their potential to enhance the trustworthiness of AI and promote unbiased decision-making, these models may exhibit lower levels of prediction and inference accuracy compared to their black-box counterparts. Also, their optimal performance is limited to tabular or relational data structures, and they may encounter difficulties when processing more intricate data types, such as images or text [113].

Given the complexity of black-box models, special techniques have been developed to interpret their decision-making processes [83]. These techniques, known as “black-box model explainers”, fall into two main categories: model-specific explainers and model-agnostic explainers, as explained below [113].

Model-specific explainers

Model-specific explainers are customized for a particular type of model, such as DeepLIFT for neural networks and Score-CAM for CNN networks [63, 112, 120]. These kinds of explainers leverage the understanding of the model’s structure and functions, making them more accurate than model-agnostic explainers [118, 121]. As an example, Hu (2023) used score-based class activation maps (score CAMs) to visually explain the gel immunofixation interpretation of the model for classifying monoclonal gammopathies. According to the study results, the maps accurately highlighted the targeted regions in the bands and revealed potential misclassifications [62]. Another example of a model-specific explainability method is using Gini weights in random forest models, which provide a relatively simple and understandable measure of feature importance. This method was employed by other researchers to report the feature importance of their random forest models [61, 90].
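The Gini-based importance mentioned above rests on a simple quantity: the decrease in Gini impurity produced by a split. A minimal sketch of that calculation for a single split follows; in a random forest, these decreases are accumulated per feature across all trees, a step that is omitted here. The labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a set of class labels (0 for an empty or pure set)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted_children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted_children


# Hypothetical node with two classes, split in two different ways.
parent = [0, 0, 1, 1]
informative_split = gini_decrease(parent, left=[0, 0], right=[1, 1])
uninformative_split = gini_decrease(parent, left=[0, 1], right=[0, 1])
```

A feature whose splits separate the classes well produces large decreases, and therefore a high importance score.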

Model-agnostic explainers

These types of explainers do not require access to the model’s internal structure or process and create explanations based on a model’s input-output behavior. Therefore, model-agnostic explainers are more flexible than model-specific explainers and can be used to interpret any ML model [118, 121, 122]. Model-agnostic explainers can be further divided into global and local explanations according to the explanation’s scope:

Global explanations interpret the general workings of a model, aiming to provide a comprehensive understanding of its decision-making process. As seen in Table 1, feature importance is one of the most used global explanation methods. For this purpose, methods such as SHAP (SHapley Additive exPlanations) and permutation feature importance are frequently utilized. Permutation feature importance is a valuable tool for discerning the significance of features, offering a global perspective on how these features influence the model’s overall performance [29, 30, 123]. SHAP explanations are based on the Shapley values from game theory and can also be used to identify the most essential features for a model’s predictions [41]. Examples of studies utilizing these two feature-importance methods can be found in Table 1.
Local explanations offer insights into specific decisions made by an ML model, focusing on the interpretation of individual predictions. Consequently, they can be employed for the individual evaluation of predictions, providing a detailed understanding of the model’s decision-making process on a case-by-case basis [123]. In contrast to global explanations, their use in clinical laboratory settings is less common. For feature importance calculations, it is possible to utilize the “Local Interpretable Model-agnostic Explanations” (LIME) technique, which builds a local interpretable model that approximates the black-box model. Streun et al. utilized this approach to provide interpretability for their adulterated urine sample prediction model [30]. Both SHAP and LIME can be utilized for local explanations as well as global explanations. Furthermore, breakdown plots may also be used for this purpose. However, if the features are correlated, the effectiveness of these methods may be diminished [123]. Topcu (2023) utilized both breakdown plots and SHAP as local explainability methods for ML models in the classification of HbA1c. The study demonstrated how these local explanation techniques could provide different insights into feature contributions, highlighting their utility in understanding and comparing complex models [124].
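The Shapley values underlying SHAP can be computed exactly for a small model by averaging each feature's marginal contribution over all subsets of the other features. The sketch below does this for a hypothetical three-feature model. It assumes that an "absent" feature is represented by a fixed baseline value, which is one common simplification; practical SHAP implementations rely on approximations, because the exact computation grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values for one prediction of a model with few features."""
    n = len(instance)

    def value(present):
        # Features not in `present` are replaced by their baseline value.
        x = [instance[i] if i in present else baseline[i] for i in range(n)]
        return model(x)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                total += weight * (value(present | {i}) - value(present))
        contributions.append(total)
    return contributions


def toy_model(x):
    """Hypothetical model with two main effects and one interaction."""
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]


phi = shapley_values(toy_model, instance=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is shared equally between the two features involved. Averaging the absolute values of such local contributions over many instances is one way of obtaining a global importance ranking.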

Challenges and optimal implementation ways of machine learning models

Challenges

AI and ML models have been increasingly applied in laboratory medicine for diagnostic and prognostic purposes. However, the integration of these approaches in healthcare and clinical laboratory settings poses several challenges and considerations that need to be addressed. These include the potential for deskilling among healthcare providers and laboratory professionals due to task automation [125]. In addition, ML models have inherent uncertainties and performance limitations that need to be considered.

The complexity and interpretability of ML algorithms are other important problems that may restrict their clinical utility. It is crucial to create ML algorithms that clinicians can understand and find transparent [126]. Black box models, like artificial neural networks, lack explainability, although methods exist to enhance interpretability [125]. Different users have varying needs; while some may seek a high-level understanding of a model’s decision-making process, others might require detailed insights into specific model behaviors. Ensuring fairness and addressing biases, especially related to sensitive attributes such as gender and race, has significant importance, and without proper methods, bias can remain hidden. Techniques available for XAI are still maturing and may not be sufficient to address all the complexities inherent in advanced ML models [127]. Furthermore, there are no standardized evaluation techniques for XAI tools currently available [122].

Biased investigations raise ethical issues and call for validation across multiple populations [128]. To prevent negative outcomes, adverse events, and probable underperformance, ML models must be validated on independent datasets covering demographically heterogeneous groups to ensure their generalizability and reproducibility before implementation. However, obtaining independent datasets can be difficult in clinical laboratory medicine [126, 128]. Training and validating ML algorithms requires a substantial amount of high-quality data. The adoption of ML is affected by infrastructure constraints, data quality issues, and privacy protection [129]. The wide variety of ML algorithms makes it difficult to select the best solution for a given task, and the selection of hyperparameters and data preprocessing can also affect how well ML algorithms perform. To preserve the privacy of health information and guarantee adherence to applicable rules, appropriate consent and patient governance processes must be in place [126, 129].

Transferability and performance evaluation of ML models are also essential before implementation. Besides, the dynamic nature of healthcare requires ML development to capture evolving professional knowledge and be continuously monitored to align with sustainability goals [128].

Insufficient infrastructure, including instruments, laboratory information systems (LIS), and electronic health record (EHR) systems, may hinder the seamless incorporation and interface of cutting-edge ML technologies. Strengthening information systems and infrastructure is essential to support the effective integration of ML in healthcare settings [130].

AI/ML-based clinical decision support (CDS) systems are currently being implemented in clinical practice as Software as a Medical Device (SaMD). This scenario presents additional challenges for manufacturers and users within healthcare facilities. Although ISO 15189:2022 does not explicitly delineate AI-based SaMD [131], clearer definitions and compliance standards have been established in Europe through the Medical Device Regulation (MDR) and the In Vitro Diagnostic Medical Devices Regulation (IVDR) [132, 133]. The regulatory classification depends on the software’s intended purpose: if it is primarily associated with in vitro diagnostic (IVD) data, it falls under IVDR [132, 134]; otherwise, it falls under MDR [133]. Both IVDR and MDR prescribe classifications for these devices. Despite nomenclature discrepancies, the criticality of the medical condition and the significance of the information addressed by SaMD are pivotal considerations for compliance standards [132, 133]. In-house CDS systems must also adhere to the security and performance criteria stipulated by the regulations. Additionally, the use of in-house devices is permissible only when no equivalent medical device is available on the market, as specified by MDR and IVDR [132, 133].

Other issues to be concerned about include the still-uncertain liability and responsibility for AI and ML-assisted clinical decision-making. Determining the responsibility for clinical decisions and the extent of disclosure regarding ML integration is an ongoing discussion [13, 129]. Overall, translating research techniques into clinical practice, establishing accountability for clinical decisions [129], and the extent of disclosure of ML integration are the major challenges associated with the use of ML models [135]. Nevertheless, recent advances in AI offer an exciting opportunity to improve healthcare [135].

To overcome these challenges, validation, interpretability, collaboration, and interdisciplinary cooperation are required. It is also important to develop problem-solving strategies, such as procedures to correct class imbalances and the continued development of sampling and data augmentation techniques [94, 136]. As pointed out above, it is worth considering the ethical issues that arise from the use of AI technologies in laboratory medicine, such as the use of sensitive patient data [137].
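One of the simplest procedures for correcting class imbalance is random oversampling of the minority class. The sketch below is a minimal, library-free illustration with hypothetical data; the function name and the fixed seed are our own choices, and in practice resampling should be applied to the training split only, so that validation data keep their original class distribution.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical training data: five negative and two positive samples.
rows = [[1.0], [1.1], [0.9], [1.2], [1.0], [3.0], [3.2]]
labels = [0, 0, 0, 0, 0, 1, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

Because the added rows are exact duplicates, this approach adds no new information and can encourage overfitting, which is one reason more elaborate sampling and augmentation techniques continue to be developed.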

Optimal implementation ways

The integration of ML-based CDSS poses various challenges that must be addressed to ensure their successful implementation.

An optimal approach to CDSS implementation would be to use ML to augment human performance and decrease interobserver variability [138], rather than to replace human decision-makers entirely. A hybrid intelligence approach, combining humans and AI, can provide the best of both worlds [139].

External validation of ML-based CDSS is crucial before their deployment in clinical settings, ideally at a different location from where the model was developed. The uncertainty connected to CDSS predictions should also be disclosed to stakeholders [140].
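One simple way to disclose the uncertainty of a performance estimate obtained during external validation is to report it with a confidence interval. The sketch below computes a Wilson score interval for a proportion such as sensitivity. The counts are hypothetical, and the Wilson interval is only one of several reasonable choices; the cited sources do not prescribe a specific method.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95 %)."""
    if n == 0:
        raise ValueError("n must be positive")
    p = successes / n
    denominator = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denominator
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denominator
    return centre - half_width, centre + half_width


# Hypothetical external validation: 90 of 100 true positives detected.
lower, upper = wilson_interval(successes=90, n=100)
```

Reporting the interval alongside the point estimate shows stakeholders how much a sensitivity of 90 % could plausibly vary given the size of the validation set.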

Clinical laboratory and healthcare professionals should be involved in the development and integration of CDSS to mitigate challenges related to adoption [141]. Furthermore, comprehensive performance evaluation, using supported human decisions as endpoints, should be conducted before deploying ML-based CDSS in clinical settings [138].

To ensure the successful implementation of ML-based CDSS, laboratories must meet certain preconditions, such as the availability of well-categorized, structured, standardized, and complete clinical data [142]. As technologies such as ML, AI, IoT, big data, and advanced analytics become more mainstream, organizations must adapt to reap their benefits [143].

A few noteworthy guiding documents have emerged to provide laboratory professionals and other stakeholders with invaluable recommendations, shedding light on the best practices in this relatively new field. The IFCC working group has contributed significantly by publishing practical recommendations aimed at facilitating the implementation of ML in laboratory medicine [126]. Additionally, the principles presented by the Food and Drug Administration (FDA), Health Canada, and the Medicines & Healthcare Products Regulatory Agency serve to guide the development of medical devices, offering a comprehensive framework for the adoption of “good machine-learning practices” [144]. These guiding documents collectively contribute to the advancement and standardization of ML practices in laboratory medicine and medical device development, fostering improved outcomes and ensuring the highest level of quality and safety in the healthcare industry.

Conclusions

The use of machine learning (ML)-based clinical decision support systems (CDSS) in laboratory medicine and healthcare, from precision medicine to population health, greatly improves clinical decision-making. However, it is crucial to address the challenges associated with their implementation and ensure their successful adoption.

To make sure that the models created are suitable and the best that can be done at a specific time, ethical issues, including data access, permission, and biases, must be addressed. Another important obstacle to AI-driven technologies in clinical practice is the lack of transparency in some AI algorithms, especially black box ones. Explainability enables professionals to assess system reliability, explain AI recommendations to patients, and foster clinician-patient trust. Additionally, XAI makes sure that treatment recommendations are supported by trustworthy data and ethical standards.

Overall, integrating ML-based CDSS into healthcare and laboratory medicine requires careful attention to the successful and ethical use of AI and ML algorithms. This calls for collaboration among laboratory professionals, clinicians, and other stakeholders, together with hybrid intelligence approaches, external validation, comprehensive performance evaluation, and fulfillment of the preconditions for successful implementation.

Corresponding author: Hikmet Can Çubukçu, MD, EuSpLM, General Directorate of Health Services, Rare Diseases Department, Turkish Ministry of Health, Bilkent Yerleskesi, 6001. Cadde, Universiteler Mahallesi 06800, Ankara, Türkiye; and Hacettepe University Institute of Informatics, Ankara, Türkiye, Phone: +905376728807, E-mail: hikmetcancubukcu@gmail.com

Research ethics: Not applicable.
Informed consent: Not applicable.
Author contributions: The authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Competing interests: The authors state no conflict of interest.
Research funding: None declared.
Data availability: Not applicable.

References

1. Granter, SR, Beck, AH, Papke, DJ Jr. AlphaGo, deep learning, and the future of the human microscopist. Arch Pathol Lab Med 2017;141:619–21. https://doi.org/10.5858/arpa.2016-0471-ed.

2. Herman, DS, Rhoads, DD, Schulz, WL, Durant, TJS. Artificial intelligence and mapping a new direction in laboratory medicine: a review. Clin Chem 2021;67:1466–82. https://doi.org/10.1093/clinchem/hvab165.

3. Goecks, J, Jalili, V, Heiser, LM, Gray, JW. How machine learning will transform biomedicine. Cell 2020;181:92–101. https://doi.org/10.1016/j.cell.2020.03.022.

4. De Bruyne, S, Speeckaert, MM, Van Biesen, W, Delanghe, JR. Recent evolutions of machine learning applications in clinical laboratory medicine. Crit Rev Clin Lab Sci 2021;58:131–52. https://doi.org/10.1080/10408363.2020.1828811.

5. Faes, L, Liu, X, Wagner, SK, Fu, DJ, Balaskas, K, Sim, DA, et al. A clinician’s guide to artificial intelligence: how to critically appraise machine learning studies. Transl Vis Sci Technol 2020;9:7. https://doi.org/10.1167/tvst.9.2.7.

6. Rabbani, N, Kim, GYE, Suarez, CJ, Chen, JH. Applications of machine learning in routine laboratory medicine: current state and future directions. Clin Biochem 2022;103:1–7. https://doi.org/10.1016/j.clinbiochem.2022.02.011.

7. Staartjes, VE, Kernbach, JM. Significance of external validation in clinical machine learning: let loose too early? Spine J 2020;20:1159–60. https://doi.org/10.1016/j.spinee.2020.02.016.

8. Auffray, C, Balling, R, Barroso, I, Bencze, L, Benson, M, Bergeron, J, et al. Making sense of big data in health research: towards an EU action plan. Genome Med 2016;8:71. https://doi.org/10.1186/s13073-016-0323-y.

9. Rajkomar, A, Dean, J, Kohane, I. Machine learning in medicine. N Engl J Med 2019;380:1347–58. https://doi.org/10.1056/nejmra1814259.

10. Waring, J, Lindvall, C, Umeton, R. Automated machine learning: review of the state-of-the-art and opportunities for healthcare. Artif Intell Med 2020;104:101822. https://doi.org/10.1016/j.artmed.2020.101822.

11. Topcu, D, Bayraktar, N. Searching for the urine osmolality surrogate: an automated machine learning approach. Clin Chem Lab Med 2022;60:1911–20. https://doi.org/10.1515/cclm-2022-0415.

12. Bright, TJ, Wong, A, Dhurjati, R, Bristow, E, Bastian, L, Coeytaux, RR, et al. Effect of clinical decision-support systems: a systematic review. Ann Intern Med 2012;157:29–43. https://doi.org/10.7326/0003-4819-157-1-201207030-00450.

13. Baron, JM, Kurant, DE, Dighe, AS. Machine learning and other emerging decision support tools. Clin Lab Med 2019;39:319–31. https://doi.org/10.1016/j.cll.2019.01.010.

14. Naugler, C, Church, DL. Automation and artificial intelligence in the clinical laboratory. Crit Rev Clin Lab Sci 2019;56:98–110. https://doi.org/10.1080/10408363.2018.1561640.

15. Rohr, UP, Binder, C, Dieterle, T, Giusti, F, Messina, CG, Toerien, E, et al. The value of in vitro diagnostic testing in medical practice: a status report. PLoS One 2016;11:e0149856. https://doi.org/10.1371/journal.pone.0149856.

16. Wen, Z, Wang, S, Yang, DM, Xie, Y, Chen, M, Bishop, J, et al. Deep learning in digital pathology for personalized treatment plans of cancer patients. Semin Diagn Pathol 2023;40:109–19. https://doi.org/10.1053/j.semdp.2023.02.003.

17. Damiani, G, Altamura, G, Zedda, M, Nurchis, MC, Aulino, G, Heidar Alizadeh, A, et al. Potentiality of algorithms and artificial intelligence adoption to improve medication management in primary care: a systematic review. BMJ Open 2023;13:e065301. https://doi.org/10.1136/bmjopen-2022-065301.

18. Hurvitz, N, Azmanov, H, Kesler, A, Ilan, Y. Establishing a second-generation artificial intelligence-based system for improving diagnosis, treatment, and monitoring of patients with rare diseases. Eur J Hum Genet 2021;29:1485–90. https://doi.org/10.1038/s41431-021-00928-4.

19. Ialongo, C, Bernardini, S. Total laboratory automation has the potential to be the field of application of artificial intelligence: the cyber-physical system and “Automation 4.0”. Clin Chem Lab Med 2019;57:e279–81. https://doi.org/10.1515/cclm-2019-0226.

20. Neumaier, M. Diagnostics 4.0: the medical laboratory in digital health. Clin Chem Lab Med 2019;57:343–8. https://doi.org/10.1515/cclm-2018-1088.

21. Lippi, G, Plebani, M. Integrated diagnostics: the future of laboratory medicine? Biochem Med 2020;30:010501. https://doi.org/10.11613/bm.2020.010501.

22. Gruson, D, Helleputte, T, Rousseau, P, Gruson, D. Data science, artificial intelligence, and machine learning: opportunities for laboratory medicine and the value of positive regulation. Clin Biochem 2019;69:1–7. https://doi.org/10.1016/j.clinbiochem.2019.04.013.

23. Fang, K, Dong, Z, Chen, X, Zhu, J, Zhang, B, You, J, et al. Using machine learning to identify clotted specimens in coagulation testing. Clin Chem Lab Med 2021;59:1289–97. https://doi.org/10.1515/cclm-2021-0081.

24. Farrell, CJ. Identifying mislabelled samples: machine learning models exceed human performance. Ann Clin Biochem 2021;58:650–2. https://doi.org/10.1177/00045632211032991.

25. Farrell, CL. Decision support or autonomous artificial intelligence? The case of wrong blood in tube errors. Clin Chem Lab Med 2022;60:1993–7. https://doi.org/10.1515/cclm-2021-0873.

26. Zhou, R, Liang, YF, Cheng, HL, Wang, W, Huang, DW, Wang, Z, et al. A highly accurate delta check method using deep learning for detection of sample mix-up in the clinical laboratory. Clin Chem Lab Med 2022;60:1984–92. https://doi.org/10.1515/cclm-2021-1171.

27. Mitani, T, Doi, S, Yokota, S, Imai, T, Ohe, K. Highly accurate and explainable detection of specimen mix-up using a machine learning model. Clin Chem Lab Med 2020;58:375–83. https://doi.org/10.1515/cclm-2019-0534.

28. Rosenbaum, MW, Baron, JM. Using machine learning-based multianalyte delta checks to detect wrong blood in tube errors. Am J Clin Pathol 2018;150:555–66. https://doi.org/10.1093/ajcp/aqy085.

29. Ialongo, C, Pieri, M, Bernardini, S. Smart management of sample dilution using an artificial neural network to achieve streamlined processes and saving resources: the automated nephelometric testing of serum free light chain as case study. Clin Chem Lab Med 2017;55:231–6. https://doi.org/10.1515/cclm-2016-0263.

30. Streun, GL, Steuer, AE, Ebert, LC, Dobay, A, Kraemer, T. Interpretable machine learning model to detect chemically adulterated urine samples analyzed by high resolution mass spectrometry. Clin Chem Lab Med 2021;59:1392–9. https://doi.org/10.1515/cclm-2021-0010.

31. Yang, C, Li, D, Sun, D, Zhang, S, Zhang, P, Xiong, Y, et al. A deep learning-based system for assessment of serum quality using sample images. Clin Chim Acta 2022;531:254–60. https://doi.org/10.1016/j.cca.2022.04.010.

32. Zhang, ML, Guo, AX, Kadauke, S, Dighe, AS, Baron, JM, Sohani, AR. Machine learning models improve the diagnostic yield of peripheral blood flow cytometry. Am J Clin Pathol 2020;153:235–42. https://doi.org/10.1093/ajcp/aqz150.

33. Bigorra, L, Merino, A, Alférez, S, Rodellar, J. Feature analysis and automatic identification of leukemic lineage blast cells and reactive lymphoid cells from peripheral blood cell images. J Clin Lab Anal 2017;31:1–9. https://doi.org/10.1002/jcla.22024.

34. Chabrun, F, Lacombe, V, Dieu, X, Geneviève, F, Urbanski, G. Accurate stratification between VEXAS syndrome and differential diagnoses by deep learning analysis of peripheral blood smears. Clin Chem Lab Med 2023;61:1275–9. https://doi.org/10.1515/cclm-2022-1283.

35. Durant, TJS, Olson, EM, Schulz, WL, Torres, R. Very deep convolutional neural networks for morphologic classification of erythrocytes. Clin Chem 2017;63:1847–55. https://doi.org/10.1373/clinchem.2017.276345.

36. Mohlman, JS, Leventhal, SD, Hansen, T, Kohan, J, Pascucci, V, Salama, ME. Improving augmented human intelligence to distinguish Burkitt lymphoma from diffuse large B-cell lymphoma cases. Am J Clin Pathol 2020;153:743–59. https://doi.org/10.1093/ajcp/aqaa001.

37. Sun, C, Wang, R, Zhao, L, Han, L, Ma, S, Liang, D, et al. A computer-aided diagnosis system of fetal nucleated red blood cells with convolutional neural network. Arch Pathol Lab Med 2022;146:1395–401. https://doi.org/10.5858/arpa.2021-0142-oa.

38. Yu, M, Bazydlo, LAL, Bruns, DE, Harrison, JH Jr. Streamlining quality review of mass spectrometry data in the clinical laboratory by use of machine learning. Arch Pathol Lab Med 2019;143:990–8. https://doi.org/10.5858/arpa.2018-0238-oa.

39. Zhou, R, Wang, W, Padoan, A, Wang, Z, Feng, X, Han, Z, et al. Traceable machine learning real-time quality control based on patient data. Clin Chem Lab Med 2022;60:1998–2004. https://doi.org/10.1515/cclm-2022-0548.

40. Çubukçu, HC. Performance evaluation of internal quality control rules, EWMA, CUSUM, and the novel machine learning model. Turk J Biochem 2021;46:661–70. https://doi.org/10.1515/tjb-2021-0199.

41. Aguirre, U, Urrechaga, E. Diagnostic performance of machine learning models using cell population data for the detection of sepsis: a comparative study. Clin Chem Lab Med 2023;61:356–65. https://doi.org/10.1515/cclm-2022-0713.

42. Anudeep, PP, Kumari, S, Rajasimman, AS, Nayak, S, Priyadarsini, P. Machine learning predictive models of LDL-C in the population of eastern India and its comparison with directly measured and calculated LDL-C. Ann Clin Biochem 2022;59:76–86. https://doi.org/10.1177/00045632211046805.

43. Bancal, C, Salipante, F, Hannas, N, Lumbroso, S, Cavalier, E, De Brauwere, DP. A new approach to assessing calcium status via a machine learning algorithm. Clin Chim Acta 2023;539:198–205. https://doi.org/10.1016/j.cca.2022.12.018.

44. Barakett-Hamade, V, Ghayad, JP, McHantaf, G, Sleilaty, G. Is machine learning-derived low-density lipoprotein cholesterol estimation more reliable than standard closed form equations? Insights from a laboratory database by comparison with a direct homogeneous assay. Clin Chim Acta 2021;519:220–6. https://doi.org/10.1016/j.cca.2021.05.008.

45. Barnhart-Magen, G, Gotlib, V, Marilus, R, Einav, Y. Differential diagnostics of thalassemia minor by artificial neural networks model. J Clin Lab Anal 2013;27:481–6. https://doi.org/10.1002/jcla.21631.

46. Bayani, A, Hosseini, A, Asadi, F, Hatami, B, Kavousi, K, Aria, M, et al. Identifying predictors of varices grading in patients with cirrhosis using ensemble learning. Clin Chem Lab Med 2022;60:1938–45. https://doi.org/10.1515/cclm-2022-0508.

47. Bayani, A, Asadi, F, Hosseini, A, Hatami, B, Kavousi, K, Aria, M, et al. Performance of machine learning techniques on prediction of esophageal varices grades among patients with cirrhosis. Clin Chem Lab Med 2022;60:1955–62. https://doi.org/10.1515/cclm-2022-0623.

48. Bigorra, L, Larriba, I, Gutiérrez-Gallego, R. A physician-in-the-loop approach by means of machine learning for the diagnosis of lymphocytosis in the clinical laboratory. Arch Pathol Lab Med 2022;146:1024–31. https://doi.org/10.5858/arpa.2021-0044-oa.

49. Bigorra, L, Larriba, I, Gutiérrez-Gallego, R. Abnormal characteristic “round bottom flask” shape volume-based scattergram as a trigger to suspect persistent polyclonal B-cell lymphocytosis. Clin Chim Acta 2020;511:181–8. https://doi.org/10.1016/j.cca.2020.10.015.

50. Cabitza, F, Campagner, A, Ferrari, D, Di Resta, C, Ceriotti, D, Sabetta, E, et al. Development, evaluation, and validation of machine learning models for COVID-19 detection based on routine blood tests. Clin Chem Lab Med 2020;59:421–31. https://doi.org/10.1515/cclm-2020-1294.

51. Cadamuro, J, Cabitza, F, Debeljak, Z, De Bruyne, S, Frans, G, Perez, SM, et al. Potentials and pitfalls of ChatGPT and natural-language artificial intelligence models for the understanding of laboratory medicine test results. An assessment by the European Federation of Clinical Chemistry and Laboratory Medicine (EFLM) Working Group on Artificial Intelligence (WG-AI). Clin Chem Lab Med 2023;61:1158–66. https://doi.org/10.1515/cclm-2023-0355.

52. Chocholova, E, Bertok, T, Jane, E, Lorencova, L, Holazova, A, Belicka, L, et al. Glycomics meets artificial intelligence – potential of glycan analysis for identification of seropositive and seronegative rheumatoid arthritis patients revealed. Clin Chim Acta 2018;481:49–55. https://doi.org/10.1016/j.cca.2018.02.031.

53. Demirci, F, Akan, P, Kume, T, Sisman, AR, Erbayraktar, Z, Sevinc, S. Artificial neural network approach in laboratory test reporting: learning algorithms. Am J Clin Pathol 2016;146:227–37. https://doi.org/10.1093/ajcp/aqw104.

54. Dobrijević, D, Andrijević, L, Antić, J, Rakić, G, Pastor, K. Hemogram-based decision tree models for discriminating COVID-19 from RSV in infants. J Clin Lab Anal 2023;37:e24862. https://doi.org/10.1002/jcla.24862.

55. Fan, G, Zhang, S, Wu, Q, Song, Y, Jia, A, Li, D, et al. A machine learning-based approach for low-density lipoprotein cholesterol calculation using age, and lipid parameters. Clin Chim Acta 2022;535:53–60. https://doi.org/10.1016/j.cca.2022.08.007.

56. Feng, P, Li, Y, Liao, Z, Yao, Z, Lin, W, Xie, S, et al. An online alpha-thalassemia carrier discrimination model based on random forest and red blood cell parameters for low HbA(2) cases. Clin Chim Acta 2022;525:1–5. https://doi.org/10.1016/j.cca.2021.12.003.

57. Gui, X, Zhang, X, Xin, Y, Liu, Q, Wang, Y, Zhang, Y, et al. Identification and validation of volatile organic compounds in bile for differential diagnosis of perihilar cholangiocarcinoma. Clin Chim Acta 2023;541:117235. https://doi.org/10.1016/j.cca.2023.117235.

58. Han, BW, Cai, GX, Liu, Q, Yang, X, Guo, ZW, Huang, LM, et al. Noninvasive discrimination of benign and malignant breast lesions using genome-wide nucleosome profiles of plasma cell-free DNA. Clin Chim Acta 2021;520:95–100. https://doi.org/10.1016/j.cca.2021.06.008.

59. Hatami, B, Asadi, F, Bayani, A, Zali, MR, Kavousi, K. Machine learning-based system for prediction of ascites grades in patients with liver cirrhosis using laboratory and clinical data: design and implementation study. Clin Chem Lab Med 2022;60:1946–54. https://doi.org/10.1515/cclm-2022-0454.

60. Hauser, RG, Esserman, D, Beste, LA, Ong, SY, Colomb, DG, Bhargava, A, et al. A machine learning model to successfully predict future diagnosis of chronic myelogenous leukemia with retrospective electronic health records data. Am J Clin Pathol 2021;156:1142–8. https://doi.org/10.1093/ajcp/aqab086.

61. He, F, Lin, B, Mou, K, Jin, L, Liu, J. A machine learning model for the prediction of down syndrome in second trimester antenatal screening. Clin Chim Acta 2021;521:206–11. https://doi.org/10.1016/j.cca.2021.07.015.

62. Hu, Z, Zhang, M. Establishment of clinical diagnostic models using glucose, lipid, and urinary polypeptides in gestational diabetes mellitus. J Clin Lab Anal 2021;35:e23833. https://doi.org/10.1002/jcla.23833.

63. Hu, H, Xu, W, Jiang, T, Cheng, Y, Tao, X, Liu, W, et al. Expert-level immunofixation electrophoresis image recognition based on explainable and generalizable deep learning. Clin Chem 2023;69:130–9. https://doi.org/10.1093/clinchem/hvac190.

64. Janssens, LK, Boeckaerts, D, Hudson, S, Morozova, D, Cannaert, A, Wood, DM, et al. Machine learning to assist in large-scale, activity-based synthetic cannabinoid receptor agonist screening of serum samples. Clin Chem 2022;68:906–16. https://doi.org/10.1093/clinchem/hvac027.

65. Kurstjens, S, de Bel, T, van der Horst, A, Kusters, R, Krabbe, J, van Balveren, J. Automated prediction of low ferritin concentrations using a machine learning algorithm. Clin Chem Lab Med 2022;60:1921–8. https://doi.org/10.1515/cclm-2021-1194.

66. Lafuente-Ganuza, P, Lequerica-Fernandez, P, Carretero, F, Escudero, AI, Martinez-Morillo, E, Sabria, E, et al. A more accurate prediction to rule in and rule out pre-eclampsia using the sFlt-1/PlGF ratio and NT-proBNP as biomarkers. Clin Chem Lab Med 2020;58:399–407. https://doi.org/10.1515/cclm-2019-0939.

67. Lee, T, Kim, J, Uh, Y, Lee, H. Deep neural network for estimating low density lipoprotein cholesterol. Clin Chim Acta 2019;489:35–40. https://doi.org/10.1016/j.cca.2018.11.022.

68. Lee, YJ, Lin, YC, Liao, CC, Chang, YS, Huang, YH, Tsai, IJ, et al. Using anti-malondialdehyde-modified peptide adduct autoantibodies in serum of Taiwanese women to diagnose primary Sjogren’s syndrome. Clin Biochem 2022;108:27–41. https://doi.org/10.1016/j.clinbiochem.2022.07.002.

69. Lin, C, Chen, CC, Chau, T, Lin, CS, Tsai, SH, Lee, DJ, et al. Artificial intelligence-enabled electrocardiography identifies severe dyscalcemias and has prognostic value. Clin Chim Acta 2022;536:126–34. https://doi.org/10.1016/j.cca.2022.09.021.

70. Luo, Y, Szolovits, P, Dighe, AS, Baron, JM. Using machine learning to predict laboratory test results. Am J Clin Pathol 2016;145:778–88. https://doi.org/10.1093/ajcp/aqw064.

71. Meng, Y, Xu, Y, Liu, J, Qin, X. Early warning signs of thyroid autoantibodies seroconversion: a retrospective cohort study. Clin Chim Acta 2023;545:117365. https://doi.org/10.1016/j.cca.2023.117365.

72. Mo, D, Zheng, Q, Xiao, B, Li, L. Predicting thalassemia using deep neural network based on red blood cell indices. Clin Chim Acta 2023;543:117329. https://doi.org/10.1016/j.cca.2023.117329.

73. Monaghan, SA, Li, JL, Liu, YC, Ko, MY, Boyiadzis, M, Chang, TY, et al. A machine learning approach to the classification of acute leukemias and distinction from nonneoplastic cytopenias using flow cytometry data. Am J Clin Pathol 2022;157:546–53. https://doi.org/10.1093/ajcp/aqab148.

74. Ng, DP, Wu, D, Wood, BL, Fromm, JR. Computer-aided detection of rare tumor populations in flow cytometry: an example with classic Hodgkin lymphoma. Am J Clin Pathol 2015;144:517–24. https://doi.org/10.1309/ajcpy8e2lyhcgufp.

75. Ng, DP, Zuromski, LM. Augmented human intelligence and automated diagnosis in flow cytometry for hematologic malignancies. Am J Clin Pathol 2021;155:597–605. https://doi.org/10.1093/ajcp/aqaa166.

76. Peña-Bautista, C, Durand, T, Oger, C, Baquero, M, Vento, M, Cháfer-Pericás, C. Assessment of lipid peroxidation and artificial neural network models in early Alzheimer Disease diagnosis. Clin Biochem 2019;72:64–70. https://doi.org/10.1016/j.clinbiochem.2019.07.008.

77. Rashidi, HH, Makley, A, Palmieri, TL, Albahra, S, Loegering, J, Fang, L, et al. Enhancing military burn- and trauma-related acute kidney injury prediction through an automated machine learning platform and point-of-care testing. Arch Pathol Lab Med 2021;145:320–6. https://doi.org/10.5858/arpa.2020-0110-oa.

78. Reix, N, Lodi, M, Jankowski, S, Molière, S, Luporsi, E, Leblanc, S, et al. A novel machine learning-derived decision tree including uPA/PAI-1 for breast cancer care. Clin Chem Lab Med 2019;57:901–10. https://doi.org/10.1515/cclm-2018-1065.

79. Rigo-Bonnin, R, Gumucio-Sanguino, VD, Pérez-Fernández, XL, Corral-Ansa, L, Fuset-Cabanes, M, Pons-Serra, M, et al. Individual outcome prediction models for patients with COVID-19 based on their first day of admission to the intensive care unit. Clin Biochem 2022;100:13–21. https://doi.org/10.1016/j.clinbiochem.2021.11.001.

80. Sans, M, Zhang, J, Lin, JQ, Feider, CL, Giese, N, Breen, MT, et al. Performance of the MasSpec pen for rapid diagnosis of ovarian cancer. Clin Chem 2019;65:674–83. https://doi.org/10.1373/clinchem.2018.299289.

81. Shang, H, Hu, Y, Guo, H, Lai, R, Fu, Y, Xu, S, et al. Using machine learning models to predict HBeAg seroconversion in CHB patients receiving pegylated interferon-α monotherapy. J Clin Lab Anal 2022;36:e24667. https://doi.org/10.1002/jcla.24667.

82. Simonson, PD, Lee, AY, Wu, D. Potential for process improvement of clinical flow cytometry by incorporating real-time automated screening of data to expedite addition of antibody panels. Am J Clin Pathol 2022;157:443–50. https://doi.org/10.1093/ajcp/aqab166.

83. Simonson, PD, Wu, Y, Wu, D, Fromm, JR, Lee, AY. De novo identification and visualization of important cell populations for classic Hodgkin lymphoma using flow cytometry and machine learning. Am J Clin Pathol 2021;156:1092–102. https://doi.org/10.1093/ajcp/aqab076.

84. Soerensen, PD, Christensen, H, Gray Worsoe Laursen, S, Hardahl, C, Brandslund, I, Madsen, JS. Using artificial intelligence in a primary care setting to identify patients at risk for cancer: a risk prediction model based on routine laboratory tests. Clin Chem Lab Med 2022;60:2005–16. https://doi.org/10.1515/cclm-2021-1015.

85. Streun, GL, Steuer, AE, Poetzsch, SN, Ebert, LC, Dobay, A, Kraemer, T. Towards a new qualitative screening assay for synthetic cannabinoids using metabolomics and machine learning. Clin Chem 2022;68:848–55. https://doi.org/10.1093/clinchem/hvac045.

86. Stroek, K, Visser, A, van der Ploeg, CPB, Zwaveling-Soonawala, N, Heijboer, AC, Bosch, AM, et al. Machine learning to improve false-positive results in the Dutch newborn screening for congenital hypothyroidism. Clin Biochem 2023;116:7–10. https://doi.org/10.1016/j.clinbiochem.2023.03.001.

87. Su, X, Xu, Y, Tan, Z, Wang, X, Yang, P, Su, Y, et al. Prediction for cardiovascular diseases based on laboratory data: an analysis of random forest model. J Clin Lab Anal 2020;34:e23421. https://doi.org/10.1002/jcla.23421.

88. Su, M, Guo, J, Chen, H, Huang, J. Developing a machine learning prediction algorithm for early differentiation of urosepsis from urinary tract infection. Clin Chem Lab Med 2023;61:521–9. https://doi.org/10.1515/cclm-2022-1006.

89. Sun, K, Zhang, X, Li, X, Li, X, Su, S, Luo, Y, et al. Plasma metabolic signatures for intracranial aneurysm and its rupture identified by pseudotargeted metabolomics. Clin Chim Acta 2023;538:36–45. https://doi.org/10.1016/j.cca.2022.11.002.

90. Tang, Z, Zhang, F, Wang, Y, Zhang, C, Li, X, Yin, M, et al. Diagnosis of hepatocellular carcinoma based on salivary protein glycopatterns and machine learning algorithms. Clin Chem Lab Med 2022;60:1963–73. https://doi.org/10.1515/cclm-2022-0715.

91. Binson, VA, Subramoniam, M, Mathew, L. Detection of COPD and Lung Cancer with electronic nose using ensemble learning methods. Clin Chim Acta 2021;523:231–8. https://doi.org/10.1016/j.cca.2021.10.005.

92. Van Woensel, W, Elnenaei, M, Abidi, SSR, Clarke, DB, Imran, SA. Staged reflexive artificial intelligence driven testing algorithms for early diagnosis of pituitary disorders. Clin Biochem 2021;97:48–53. https://doi.org/10.1016/j.clinbiochem.2021.08.005.

93. Vogg, N, Müller, T, Floren, A, Dandekar, T, Riester, A, Dischinger, U, et al. Simplified urinary steroid profiling by LC-MS as diagnostic tool for malignancy in adrenocortical tumors. Clin Chim Acta 2023;543:117301. https://doi.org/10.1016/j.cca.2023.117301.

94. Wang, H, Wang, H, Zhang, J, Li, X, Sun, C, Zhang, Y. Using machine learning to develop an autoverification system in a clinical biochemistry laboratory. Clin Chem Lab Med 2021;59:883–91. https://doi.org/10.1515/cclm-2020-0716.

95. Wang, W, He, Z, Kong, Y, Liu, Z, Gong, L. GC-MS-based metabolomics reveals new biomarkers to assist the differentiation of prostate cancer and benign prostatic hyperplasia. Clin Chim Acta 2021;519:10–17. https://doi.org/10.1016/j.cca.2021.03.021.

96. Wilkes, EH, Rumsby, G, Woodward, GM. Using machine learning to aid the interpretation of urine steroid profiles. Clin Chem 2018;64:1586–95. https://doi.org/10.1373/clinchem.2018.292201.

97. Wilkes, EH, Emmett, E, Beltran, L, Woodward, GM, Carling, RS. A machine learning approach for the automated interpretation of plasma amino acid profiles. Clin Chem 2020;66:1210–18. https://doi.org/10.1093/clinchem/hvaa134.

98. Wu, KL, Chou, CY, Chang, HY, Wu, CH, Li, AL, Chen, CL, et al. Peritoneal effluent MicroRNA profile for detection of encapsulating peritoneal sclerosis. Clin Chim Acta 2022;536:45–55. https://doi.org/10.1016/j.cca.2022.09.007.

99. Yang, HS, Hou, Y, Vasovic, LV, Steel, PAD, Chadburn, A, Racine-Brzostek, SE, et al. Routine laboratory blood tests predict SARS-CoV-2 infection using machine learning. Clin Chem 2020;66:1396–404. https://doi.org/10.1093/clinchem/hvaa200.

100. Yang, J, Xiang, C, Liu, J. Clinical significance of combining salivary mRNAs and carcinoembryonic antigen for ovarian cancer detection. Scand J Clin Lab Invest 2021;81:39–45. https://doi.org/10.1080/00365513.2020.1852478.

101. Yang, C, Zhou, S, Zhu, J, Sheng, H, Mao, W, Fu, Z, et al. Plasma lipid-based machine learning models provides a potential diagnostic tool for colorectal cancer patients. Clin Chim Acta 2022;536:191–9. https://doi.org/10.1016/j.cca.2022.09.002.

102. Zheng, H, Hu, Y, Dong, L, Shu, Q, Zhu, M, Li, Y, et al. Predictive diagnosis of chronic obstructive pulmonary disease using serum metabolic biomarkers and least-squares support vector machine. J Clin Lab Anal 2021;35:e23641. https://doi.org/10.1002/jcla.23641.

103. Zheng, H, Zheng, P, Zhao, L, Jia, J, Tang, S, Xu, P, et al. Predictive diagnosis of major depression using NMR-based metabolomics and least-squares support vector machine. Clin Chim Acta 2017;464:223–7. https://doi.org/10.1016/j.cca.2016.11.039.

104. Constantinescu, G, Schulze, M, Peitzsch, M, Hofmockel, T, Scholl, UI, Williams, TA, et al. Integration of artificial intelligence and plasma steroidomics with laboratory information management systems: application to primary aldosteronism. Clin Chem Lab Med 2022;60:1929–37. https://doi.org/10.1515/cclm-2022-0470.

105. Çubukçu, HC, Topcu, D, Bayraktar, N, Gülşen, M, Sarı, N, Arslan, AH. Detection of COVID-19 by machine learning using routine laboratory tests. Am J Clin Pathol 2022;157:758–66. https://doi.org/10.1093/ajcp/aqab187.

106. Çubukçu, HC, Topcu, D. Estimation of low-density lipoprotein cholesterol concentration using machine learning. Lab Med 2022;53:161–71. https://doi.org/10.1093/labmed/lmab065.

107. Dabla, PK, Upreti, K, Singh, D, Singh, A, Sharma, J, Dabas, A, et al. Target association rule mining to explore novel paediatric illness patterns in emergency settings. Scand J Clin Lab Invest 2022;82:595–600. https://doi.org/10.1080/00365513.2022.2148121.

108. Benirschke, RC, Gniadek, TJ. Detection of falsely elevated point-of-care potassium results due to hemolysis using predictive analytics. Am J Clin Pathol 2020;154:242–7. https://doi.org/10.1093/ajcp/aqaa039.

109. Chabrun, F, Dieu, X, Ferre, M, Gaillard, O, Mery, A, Chao de la Barca, JM, et al. Achieving expert-level interpretation of serum protein electrophoresis through deep learning driven by human reasoning. Clin Chem 2021;67:1406–14. https://doi.org/10.1093/clinchem/hvab133.

110. Tsai, ER, Demirtas, D, Hoogendijk, N, Tintu, AN, Boucherie, RJ. Turnaround time prediction for clinical chemistry samples using machine learning. Clin Chem Lab Med 2022;60:1902–10. https://doi.org/10.1515/cclm-2022-0668.

111. Carobene, A, Cabitza, F, Bernardini, S, Gopalan, R, Lennerz, JK, Weir, C, et al. Where is laboratory medicine headed in the next decade? Partnership model for efficient integration and adoption of artificial intelligence into medical laboratories. Clin Chem Lab Med 2023;61:535–43. https://doi.org/10.1515/cclm-2023-0352.

112. Linardatos, P, Papastefanopoulos, V, Kotsiantis, S. Explainable AI: a review of machine learning interpretability methods. Entropy 2020;23:1–45. https://doi.org/10.3390/e23010018.

113. Yang, G, Ye, Q, Xia, J. Unbox the black-box for the medical explainable AI via multi-modal and multi-centre data fusion: a mini-review, two showcases and beyond. Inf Fusion 2022;77:29–52. https://doi.org/10.1016/j.inffus.2021.07.016.

114. Bellamy, RKE, Dey, K, Hind, M, Hoffman, SC, Houde, S, Kannan, K, et al. AI Fairness 360: an extensible toolkit for detecting and mitigating algorithmic bias. IBM J Res Dev 2019;63:4:1–4:15. https://doi.org/10.1147/jrd.2019.2942287.

115. Wilkinson, MD, Dumontier, M, Aalbersberg, IJ, Appleton, G, Axton, M, Baak, A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 2016;3:160018. https://doi.org/10.1038/sdata.2016.18.

116. Jacobsen, A, de Miranda Azevedo, R, Juty, N, Batista, D, Coles, S, Cornet, R, et al. FAIR principles: interpretations and implementation considerations. Data Intell 2020;2:10–29. https://doi.org/10.1162/dint_r_00024.

117. Queralt-Rosinach, N, Kaliyaperumal, R, Bernabé, CH, Long, Q, Joosten, SA, van der Wijk, HJ, et al. Applying the FAIR principles to data in a hospital: challenges and opportunities in a pandemic. J Biomed Semantics 2022;13:12. https://doi.org/10.1186/s13326-022-00263-7.

118. Barredo Arrieta, A, Díaz-Rodríguez, N, Del Ser, J, Bennetot, A, Tabik, S, Barbado, A, et al. Explainable Artificial Intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf Fusion 2020;58:82–115. https://doi.org/10.1016/j.inffus.2019.12.012.

119. Holzinger, A, Langs, G, Denk, H, Zatloukal, K, Müller, H. Causability and explainability of artificial intelligence in medicine. Wiley Interdiscip Rev Data Min Knowl Discov 2019;9:e1312. https://doi.org/10.1002/widm.1312.

120. Gunning, D, Stefik, M, Choi, J, Miller, T, Stumpf, S, Yang, GZ. XAI-Explainable artificial intelligence. Sci Robot 2019;4:1–2. https://doi.org/10.1126/scirobotics.aay7120.

121. Ali, S, Abuhmed, T, El-Sappagh, S, Muhammad, K, Alonso-Moral, JM, Confalonieri, R, et al.. Explainable Artificial Intelligence (XAI): what we know and what is left to attain trustworthy artificial intelligence. Inf Fusion 2023;99:101805. https://doi.org/10.1016/j.inffus.2023.101805.Search in Google Scholar

122. Markus, AF, Kors, JA, Rijnbeek, PR. The role of explainability in creating trustworthy artificial intelligence for health care: a comprehensive survey of the terminology, design choices, and evaluation strategies. J Biomed Inf 2021;113:103655. https://doi.org/10.1016/j.jbi.2020.103655.Search in Google Scholar PubMed

123. Alfeo, AL, Zippo, AG, Catrambone, V, Cimino, M, Toschi, N, Valenza, G. From local counterfactuals to global feature importance: efficient, robust, and model-agnostic explanations for brain connectivity networks. Comput Methods Progr Biomed 2023;236:107550. https://doi.org/10.1016/j.cmpb.2023.107550.Search in Google Scholar PubMed PubMed Central

124. Topcu, Dİ. How to explain a machine learning model: HbA1c classification example. J Med Palliat Care 2023;4:117–25. https://doi.org/10.47582/jompac.1259507.Search in Google Scholar

125. Cabitza, F, Rasoini, R, Gensini, GF. Unintended consequences of machine learning in medicine. JAMA 2017;318:517–8. https://doi.org/10.1001/jama.2017.7797.Search in Google Scholar PubMed

126. Master, SR, Badrick, TC, Bietenbeck, A, Haymond, S. Machine learning in laboratory medicine: recommendations of the IFCC Working Group. Clin Chem 2023;69:690–8. https://doi.org/10.1093/clinchem/hvad055.Search in Google Scholar PubMed PubMed Central

127. Ghassemi, M, Oakden-Rayner, L, Beam, AL. The false hope of current approaches to explainable artificial intelligence in health care. Lancet Digit Health 2021;3:e745–50. https://doi.org/10.1016/s2589-7500(21)00208-9.Search in Google Scholar

128. Magrabi, F, Ammenwerth, E, McNair, JB, De Keizer, NF, Hyppönen, H, Nykänen, P, et al.. Artificial intelligence in clinical decision support: challenges for evaluating AI and practical implications. Yearb Med Inf 2019;28:128–34. https://doi.org/10.1055/s-0039-1677903.Search in Google Scholar PubMed PubMed Central

129. Peiffer-Smadja, N, Rawson, TM, Ahmad, R, Buchard, A, Georgiou, P, Lescure, FX, et al.. Machine learning for clinical decision support in infectious diseases: a narrative review of current applications. Clin Microbiol Infect 2020;26:584–95. https://doi.org/10.1016/j.cmi.2019.09.009.Search in Google Scholar PubMed

130. Sharma, G, Carter, A. Artificial intelligence and the pathologist: future frenemies? Arch Pathol Lab Med 2017;141:622–3. https://doi.org/10.5858/arpa.2016-0593-ed.Search in Google Scholar

131. International Organization for Standardization. ISO 15189:2012 Medical laboratories – requirements for quality and competence; 2022.Search in Google Scholar

132. The European Parliament and the Council of the European Union. European Parliament and Council Regulation (EU) 2017/746 of 5 April 2017 on in vitro diagnostic medical devices. Available at: https://eur-lex.europa.eu/eli/reg/2017/746/oj [Accessed 16 Nov 2023].Search in Google Scholar

133. The European Parliament and the Council of the European Union. European Parliament and Council Regulation (EU) 2017/745 of 5 April 2017 on medical devices. Available at: https://eur-lex.europa.eu/eli/reg/2017/745/oj [Accessed 16 Nov 2023].Search in Google Scholar

134. Vanstapel, F, Orth, M, Streichert, T, Capoluongo, ED, Oosterhuis, WP, Çubukçu, HC, et al.. ISO 15189 is a sufficient instrument to guarantee high-quality manufacture of laboratory developed tests for in-house-use conform requirements of the European In-Vitro-Diagnostics Regulation. Clin Chem Lab Med 2023;61:608–26. https://doi.org/10.1515/cclm-2023-0045.Search in Google Scholar PubMed

135. Kelly, CJ, Karthikesalingam, A, Suleyman, M, Corrado, G, King, D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med 2019;17:195. https://doi.org/10.1186/s12916-019-1426-2.Search in Google Scholar PubMed PubMed Central

136. Ellis, RJ, Sander, RM, Limon, A. Twelve key challenges in medical machine learning and solutions. Intell Based Med 2022;6:100068. https://doi.org/10.1016/j.ibmed.2022.100068.Search in Google Scholar

137. Pennestrì, F, Banfi, G. Artificial intelligence in laboratory medicine: fundamental ethical issues and normative key-points. Clin Chem Lab Med 2022;60:1867–74. https://doi.org/10.1515/cclm-2022-0096.Search in Google Scholar PubMed

138. Vasey, B, Ursprung, S, Beddoe, B, Taylor, EH, Marlow, N, Bilbro, N, et al.. Association of clinician diagnostic performance with machine learning-based decision support systems: a systematic review. JAMA Netw Open 2021;4:e211276. https://doi.org/10.1001/jamanetworkopen.2021.1276.Search in Google Scholar PubMed PubMed Central

139. van Baalen, S, Boon, M, Verhoef, P. From clinical decision support to clinical reasoning support systems. J Eval Clin Pract 2021;27:520–8. https://doi.org/10.1111/jep.13541.Search in Google Scholar PubMed PubMed Central

140. Bates, DW, Auerbach, A, Schulam, P, Wright, A, Saria, S. Reporting and implementing interventions involving machine learning and artificial intelligence. Ann Intern Med 2020;172:S137–44. https://doi.org/10.7326/m19-0872.Search in Google Scholar PubMed

141. Schwartz, JM, Moy, AJ, Rossetti, SC, Elhadad, N, Cato, KD. Clinician involvement in research on machine learning-based predictive clinical decision support for the hospital setting: a scoping review. J Am Med Inf Assoc 2021;28:653–63. https://doi.org/10.1093/jamia/ocaa296.Search in Google Scholar PubMed PubMed Central

142. Bietenbeck, A, Streichert, T. Preparing laboratories for interconnected health care. Diagnostics 2021;11:1–8. https://doi.org/10.3390/diagnostics11081487.Search in Google Scholar PubMed PubMed Central

143. Gopal, G, Suter-Crazzolara, C, Toldo, L, Eberhardt, W. Digital transformation in healthcare – architectures of present and future information technologies. Clin Chem Lab Med 2019;57:328–35. https://doi.org/10.1515/cclm-2018-0658.Search in Google Scholar PubMed

144. The U.S. Food and Drug Administration, Health Canada, and the United Kingdom’s Medicines and Healthcare products Regulatory Agency. Good machine learning practice for medical device development: guiding principles, October 2021. Available at: https://www.fda.gov/medical-devices/software-medical-device-samd/good-machine-learning-practice-medical-device-development-guiding-principles [Accessed 1 May 2023].Search in Google Scholar

Received: 2023-09-15

Accepted: 2023-11-17

Published Online: 2023-11-29

Published in Print: 2024-04-25
