OPEN ACCESS pISSN 2234-3806
eISSN 2234-3814

Table 1.
Key data types, preprocessing techniques, and applications for artificial intelligence in electronic health records
| Data type | Examples of data | Preprocessing techniques | Applications |
|---|---|---|---|
| Numerical | Clinically measured parameters (e.g., blood pressure level, blood glucose level) | Outlier removal, imputation of missing values, scaling (standardization, normalization) | Disease/patient status identification, disease occurrence prediction, clinical outcome estimation/prediction, numerical abnormality detection, reasoning of contributing factors to outcomes |
| Categorical | Coded parameters (e.g., patient type, disease code) | Imputation of missing values, encoding | Same as for numerical data |
| Text | Nursing note, doctor’s note, manual error report | Tokenization, non-word removal, vectorization (e.g., bag of words, term frequency-inverse document frequency, word embedding) | Report generation, disease/patient status identification, disease occurrence prediction, reasoning of contributing factors to outcomes |
| Image | X-ray, CT, MRI, PET, US (image), tissue image, skin photography | Image conversion (e.g., resampling, bit-depth conversion, domain transformation, normalization, regularization), image processing (e.g., noise reduction, image quality enhancement, image restoration, segmentation), recognition/feature extraction (e.g., region of interest, object/situation, feature) | Improving traditional image processing technology, identification/classification (e.g., cell type), counting/enumeration (e.g., cell, chromosome), numerical estimation (e.g., ventricular volume, lung volume), disease/patient status identification, reasoning of contributing factors to outcomes, data curation (e.g., annotation, labeling, description) |
| Video | US (video), echo, endo, telemedicine/teleconsultation, surgical video, video for medical education | Same as for image preprocessing, plus frame processing (e.g., frame extraction and selection, temporal resampling, temporal segmentation) | Same as for image data, plus captioning |
| Speech | Psychiatric consultation, diagnostic conversation | Traditional audio processing (e.g., noise reduction, normalization, feature extraction, segmentation, domain transformation) | Report generation, speech-to-text transformation for medical dialogue, disease identification using voice or contents |
| Signal | Auscultation (heart and lung sound), ECG, EEG, EMG, EOG, snoring | Signal conversion (e.g., resampling, bit-depth conversion, domain transformation, normalization, regularization), signal conditioning (e.g., noise reduction, signal quality enhancement, signal restoration), feature extraction (e.g., QRS complex) | Improving traditional bio-signal processing technologies, disease/patient status identification, clinical outcome prediction, reasoning of contributing factors to outcomes, data curation (e.g., annotation, labeling) |

Abbreviations: CT, computed tomography; MRI, magnetic resonance imaging; PET, positron emission tomography; US, ultrasound; ECG, electrocardiography; EEG, electroencephalography; EMG, electromyography; EOG, electrooculography; Echo, echocardiography; Endo, endoscopic imaging; QRS, Q, R, and S waves.
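
As an illustration of the numerical and categorical rows of the table, the following minimal sketch chains outlier removal, imputation of missing values, scaling, and encoding using only the Python standard library. The table names the technique families only, so the specific choices here are assumptions: the 1.5 × IQR rule for outliers, median imputation, z-score standardization, and an "unknown" category for missing codes. All function names and sample values are hypothetical.

```python
import statistics


def remove_outliers_iqr(values, k=1.5):
    """Mark values outside [Q1 - k*IQR, Q3 + k*IQR] as missing (None).

    The IQR rule is an assumed choice; the table does not name a method.
    """
    observed = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(observed, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v if v is not None and low <= v <= high else None for v in values]


def impute_median(values):
    """Fill missing entries with the median of the observed values (assumed choice)."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def standardize(values):
    """Scale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def one_hot_encode(labels, missing="unknown"):
    """Impute missing codes with a placeholder category, then one-hot encode."""
    filled = [missing if x is None else x for x in labels]
    categories = sorted(set(filled))
    rows = [[1 if x == c else 0 for c in categories] for x in filled]
    return categories, rows


# Hypothetical systolic blood pressure readings (mmHg); None marks a missing value.
systolic = [118, 125, None, 132, 121, 410, 128, 119, 135]
scaled = standardize(impute_median(remove_outliers_iqr(systolic)))

# Hypothetical patient-type codes with one missing entry.
categories, encoded = one_hot_encode(["inpatient", "outpatient", None, "inpatient"])
```

The order of the steps matters in this sketch: outliers are removed before imputation so that an implausible reading such as 410 mmHg does not distort the median or the scaling statistics. In practice, the statistics used for imputation and scaling would be estimated on training data only and then reused for new records.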

Ann Lab Med 2025;45:1~11 https://doi.org/10.3343/alm.2024.0258

© Ann Lab Med