Data type | Examples of data | Preprocessing techniques | Applications |
---|---|---|---|
Numerical | Clinically measured parameters (e.g., blood pressure level, blood glucose level) | Outlier removal, imputation of missing values, scaling (standardization, normalization) | Disease/patient status identification, disease occurrence prediction, clinical outcome estimation/prediction, numerical abnormality detection, reasoning of contributing factors to outcomes |
Categorical | Coded parameters (e.g., patient type, disease code) | Imputation of missing values, encoding | Same as for numerical data |
Text | Nursing notes, doctor’s notes, manual error reports | Tokenization, non-word removal, vectorization (e.g., bag of words, term frequency-inverse document frequency, word embedding) | Generating reports, disease/patient status identification, disease occurrence prediction, reasoning of contributing factors to outcomes |
Image | X-ray, CT, MRI, PET, US (image), tissue image, skin photography | Image conversion (e.g., resampling, bit-depth conversion, domain transformation, normalization, regularization), image processing (e.g., noise reduction, image quality enhancement, image restoration, segmentation), recognition/feature extraction (e.g., region of interest, object/situation, feature) | Improving traditional image processing technology, identification/classification (e.g., cell type), counting/enumeration (e.g., cell, chromosome), numerical estimation (e.g., ventricular volume, lung volume), disease/patient status identification, reasoning of contributing factors to outcomes, data curation (e.g., annotation, labeling, description) |
Video | US (video), echo, endo, telemedicine/teleconsultation, surgical video, video for medical education | Same as for image data, plus frame processing (e.g., frame extraction and selection, temporal resampling, temporal segmentation) | Same as for image data, plus captioning |
Speech | Psychiatric consultation, diagnostic conversation | Traditional audio processing (e.g., noise reduction, normalization, feature extraction, segmentation, domain transformation) | Generating reports, speech-to-text transformation for medical dialogue, disease identification using voice or contents |
Signal | Auscultation (heart and lung sound), ECG, EEG, EMG, EOG, snoring | Signal conversion (e.g., resampling, bit-depth conversion, domain transformation, normalization, regularization), signal conditioning (e.g., noise reduction, signal quality enhancement, signal restoration), feature extraction (e.g., QRS complex) | Improving traditional bio-signal processing technologies, disease/patient status identification, clinical outcome prediction, reasoning of contributing factors to outcomes, data curation (e.g., annotation, labeling) |
Abbreviations: CT, computed tomography; MRI, magnetic resonance imaging; PET, positron emission tomography; US, ultrasound; ECG, electrocardiography; EEG, electroencephalography; EMG, electromyography; EOG, electrooculography; Echo, echocardiography; Endo, endoscopic imaging; QRS, Q, R, and S waves.
© Ann Lab Med
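
The numerical and categorical preprocessing steps listed in the table (imputation of missing values, outlier removal, scaling, encoding) can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the method of any particular study: median imputation, the 1.5 × IQR outlier rule, one-hot encoding, the function names, and the sample values are all assumptions chosen for the example.

```python
import math
import statistics


def impute_median(values):
    """Replace missing entries (None or NaN) with the median of observed values."""
    observed = [v for v in values if v is not None and not math.isnan(v)]
    med = statistics.median(observed)
    return [med if v is None or math.isnan(v) else v for v in values]


def remove_outliers_iqr(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule, k=1.5)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def standardize(values):
    """Standardization: rescale to zero mean and unit (population) variance."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mean) / sd for v in values]


def min_max_normalize(values):
    """Normalization: rescale linearly onto the interval [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("cannot normalize a constant column")
    return [(v - low) / (high - low) for v in values]


def one_hot(labels):
    """Encode categorical labels as one-hot rows; columns follow sorted categories."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows


if __name__ == "__main__":
    # Hypothetical systolic blood pressure readings (mmHg); None marks a
    # missing value and 300 is an implausible entry used to show outlier removal.
    systolic = [118, 122, None, 130, 125, 300, 121, 127]
    cleaned = remove_outliers_iqr(impute_median(systolic))
    print(standardize(cleaned))
    print(min_max_normalize(cleaned))

    # Hypothetical patient-type codes.
    print(one_hot(["inpatient", "outpatient", "inpatient"]))
```

The order of operations matters in this sketch: imputing before outlier removal lets the extreme value influence the fill value, which is why a median (rather than a mean) is used here. In practice, scaling parameters such as the mean and standard deviation would normally be estimated on training data only and then reused on held-out data.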