Ann Lab Med 2023; 43(5): 399-400

Published online September 1, 2023

Copyright © Korean Society for Laboratory Medicine.

Laboratory Data Quality Evaluation in the Big Data Era

Sollip Kim, M.D., Ph.D.

Department of Laboratory Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea

Correspondence to: Sollip Kim, M.D., Ph.D.
Department of Laboratory Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88 Olympic-ro 43-gil, Songpa-gu, Seoul 05505, Korea
Tel: +82-2-3010-4553, Fax: +82-2-2045-3081

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

“Big data” are increasingly being used to conduct research in the field of healthcare [1] as well as artificial intelligence. Laboratory results account for a large proportion of big data in healthcare. As most test results from clinical laboratories are quantitative, big data researchers who are not experts in the field of laboratory medicine often believe that all numerical results are appropriate for research. However, this is not true. Despite the long journey of standardization and harmonization efforts [2-4], a large bias in test results is observed when the same sample is tested in different laboratories. Even for standardized or harmonized test items, big data results may be biased if unreliable test results from certain laboratories are included. Therefore, it is challenging to select reliable research-level, real-world laboratory results, obtained for clinical purposes, for use as secondary data in big data analysis [5].

In this issue of Annals of Laboratory Medicine, Cho, et al. [6] propose a strategy for evaluating the quality of laboratory results suitable for big data research. They analyzed more than 30,000 external quality assessment (EQA) results for seven test items, using commutable frozen human serum pools in the Korean Association of External Quality Assessment Service (KEQAS) program [7]. EQA results from the accuracy-based proficiency testing program, such as HbA1c, creatinine, total cholesterol, and triglyceride, were compared with target values measured using the reference measurement procedure used in certified reference laboratories. EQA results of alpha-fetoprotein and prostate-specific antigen with relevant international standards were compared with mean peer group values. EQA results of cardiac troponin I (cTnI), for which harmonization was still ongoing, were compared with an all-method mean value. The acceptance rates of the EQA results of the seven test items were only 67.5%–100%, 42.9%–100%, and 22.9%–99.5% within the minimum, desirable, and optimum criteria, respectively. The EQA results from the KEQAS participants exhibited significant differences according to the quality grade based on the total error. For example, the mean percentage bias for cTnI results within the optimum, desirable, minimum, and unacceptable criteria was 4.4%, 6.5%, 7.2%, and 46.0%, respectively. Cho, et al. [6] concluded that even test results that passed the EQA acceptance criteria did not guarantee the quality for inclusion in big data. Thus, when constructing laboratory big data, data quality should be evaluated and poor quality data excluded.
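The grading described above can be illustrated with a minimal sketch. It assumes that a single EQA result is graded by its absolute percentage deviation from the target value, and that the three criteria are nested limits (optimum tightest, minimum loosest). The numeric limits below are hypothetical placeholders, not the values used by Cho, et al. [6] or by KEQAS; the function names are likewise illustrative.

```python
# Hypothetical allowable-deviation limits in percent, ordered from tightest
# to loosest. Real limits are analyte-specific and are NOT taken from the
# article; these numbers are placeholders for illustration only.
HYPOTHETICAL_LIMITS = (
    ("optimum", 5.0),
    ("desirable", 10.0),
    ("minimum", 15.0),
)


def percent_bias(result: float, target: float) -> float:
    """Signed deviation of an EQA result from its target, in percent of target."""
    if target <= 0:
        raise ValueError("target must be positive")
    return (result - target) * 100.0 / target


def quality_grade(result: float, target: float, limits=HYPOTHETICAL_LIMITS) -> str:
    """Return the tightest criterion the result satisfies, else 'unacceptable'."""
    deviation = abs(percent_bias(result, target))
    for name, limit in limits:
        if deviation <= limit:
            return name
    return "unacceptable"


if __name__ == "__main__":
    # Three invented results for one sample whose target value is 100.
    for value in (102.0, 112.0, 150.0):
        print(value, quality_grade(value, 100.0))
```

Under this scheme, a result that passes only the minimum criterion is still accepted by the EQA program, which is why passing EQA alone may not be a sufficient filter for research-level data.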

Although Cho, et al. [6] did not suggest a detailed evaluation protocol, they highlighted the necessity of evaluating data quality and established a new evaluation model using EQA data. As EQA can only guarantee a laboratory’s performance at a given point in time and big data in healthcare include longitudinal patient records, accumulated EQA results from each laboratory must be analyzed to determine whether they can be included in big data analysis [5]. Further evaluation of other test items is warranted.
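One possible way to use accumulated EQA results is sketched below. Since no detailed protocol has been proposed, the inclusion rule here is purely an assumption for illustration: a laboratory is considered eligible only if every one of its EQA rounds in the period falls within a chosen limit and it has participated in a minimum number of rounds. The record layout, the function name `eligible_labs`, and the parameters `limit_pct` and `min_rounds` are all hypothetical.

```python
from collections import defaultdict


def eligible_labs(records, limit_pct, min_rounds=1):
    """Select laboratories whose accumulated EQA results all meet a limit.

    records: iterable of (lab_id, round_id, result, target) tuples.
    limit_pct: hypothetical allowable absolute deviation from target, in percent.
    min_rounds: hypothetical minimum number of EQA rounds required.
    """
    worst = defaultdict(float)   # largest absolute % deviation seen per lab
    rounds = defaultdict(int)    # number of EQA rounds seen per lab
    for lab_id, _round_id, result, target in records:
        if target <= 0:
            raise ValueError("target must be positive")
        deviation = abs(result - target) * 100.0 / target
        worst[lab_id] = max(worst[lab_id], deviation)
        rounds[lab_id] += 1
    return sorted(
        lab for lab in worst
        if worst[lab] <= limit_pct and rounds[lab] >= min_rounds
    )


if __name__ == "__main__":
    # Invented example data: lab B fails one round and is excluded.
    example = [
        ("A", "2020-1", 101.0, 100.0),
        ("A", "2020-2", 103.0, 100.0),
        ("B", "2020-1", 100.0, 100.0),
        ("B", "2020-2", 120.0, 100.0),
    ]
    print(eligible_labs(example, limit_pct=5.0))
```

A stricter or more lenient rule (for example, excluding only the time window around a failed round) could be substituted; the sketch only shows that the decision is made per laboratory over time rather than per single EQA result.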

In summary, Cho, et al. [6] showed that participants’ EQA results can serve as a surrogate for real laboratory data when evaluating data quality. As specialists in laboratory medicine, we should continue to develop appropriate methods for research-level laboratory data quality assessment in the big data era.

Kim S contributed to writing the manuscript and approved the final manuscript.

  1. Wang L and Alexander CA. Big data analytics in medical engineering and healthcare: methods, advances and challenges. J Med Eng Technol 2020;44:267-83.
  2. Jeong T, Cho E, Lee K, Lee W, Yun YM, Chun S, et al. Recent trends in creatinine assays in Korea: Long-term accuracy-based proficiency testing survey data by the Korean Association of External Quality Assessment Service (2011-2019). Ann Lab Med 2021;41:372-9.
  3. Yoon YA, Lee YW, Kim S, Lee K, Park HD, Chun S, et al. Standardization status of total cholesterol concentration measurement: analysis of Korean External Quality Assessment Data. Ann Lab Med 2021;41:366-71.
  4. Nam Y, Lee JH, Kim SM, Jun SH, Song SH, Lee K, et al. Periodic comparability verification and within-laboratory harmonization of clinical chemistry laboratory results at a large healthcare center with multiple instruments. Ann Lab Med 2022;42:150-9.
  5. Kim S, Cho EJ, Jeong TD, Park HD, Yun YM, Lee K, et al. Proposed model for evaluating real-world laboratory results for big data research. Ann Lab Med 2023;43:104-7.
  6. Cho EJ, Jeong TD, Kim S, Park HD, Yun YM, Chun S, et al. A new strategy for evaluating the quality of laboratory results for big data research: using external quality assessment survey data (2010-2020). Ann Lab Med 2023;43:425-33.
  7. Kim S, Lee K, Park HD, Lee YW, Chun S, Min WK. Schemes and Performance Evaluation Criteria of Korean Association of External Quality Assessment (KEQAS) for Improving Laboratory Testing. Ann Lab Med 2021;41:230-9.