Sleep and Breathing, vol. 30, no. 1, 2026 (SCI-Expanded, Scopus)
Purpose: To explore the feasibility of using camera-derived, non-contact audio synchronized with polysomnography (PSG) for clinically relevant sleep-apnea classification, and to benchmark compact deep models under a subject-aware design on a previously unstudied, real-world dataset.

Methods: Thirty-two adults underwent simultaneous PSG and camera-based non-contact audio recording. The synchronized audio segments were used to train and compare three compact deep-learning architectures (convolutional, attention-augmented, and transformer-based) under a subject-aware evaluation design that prevented identity leakage. Model performance and calibration were assessed at both the segment and subject levels using standard statistical tests.

Results: Subject-level evaluation was based on a very small, imbalanced test set of six subjects (one positive). Within this limited but previously unstudied local dataset, the CNN_trans model achieved apparently perfect ranking performance (AUC = 1.00; 95% CI 0.00–1.00), with recall = 1.00 and precision = 0.55; this likely reflects the small, imbalanced test cohort. The wide confidence interval indicates substantial statistical uncertainty, and DeLong comparisons showed no significant AUC difference between CNN_trans and CNN_att (ΔAUC = −0.042; p = 0.43).

Conclusion: PSG-synchronized, non-contact audio supports accurate and well-calibrated sleep-apnea classification with compact deep models. This subject-aware evaluation suggests that contactless acoustic monitoring has potential clinical relevance, motivating larger, multi-site validation.
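The subject-aware design described in Methods keeps every segment from a given participant on a single side of the train/test split, so no identity leaks between partitions. A minimal sketch of such a split; the array names, feature sizes, and the choice of GroupShuffleSplit are illustrative assumptions, not the authors' pipeline:

import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_segments = 1000
features = rng.normal(size=(n_segments, 128))        # placeholder acoustic features
subject_ids = rng.integers(0, 32, size=n_segments)   # 32 participants, as in the study
labels = rng.integers(0, 2, size=n_segments)         # toy apnea/non-apnea segment labels

# test_size is the fraction of *groups* (subjects): roughly 6 of 32 held out.
splitter = GroupShuffleSplit(n_splits=1, test_size=6 / 32, random_state=42)
train_idx, test_idx = next(splitter.split(features, labels, groups=subject_ids))

# Verify: no subject appears in both partitions, i.e. no identity leakage.
assert not set(subject_ids[train_idx]) & set(subject_ids[test_idx])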
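Subject-level metrics such as the reported AUC, recall, and precision require pooling segment-level model outputs into one score per participant. A hedged sketch with toy data; mean pooling and the 0.5 decision threshold are assumptions, since the abstract does not state the pooling rule:

import numpy as np
from sklearn.metrics import roc_auc_score, recall_score, precision_score

def pool_by_subject(seg_probs, seg_subjects):
    """Average segment-level probabilities within each subject."""
    subjects = np.unique(seg_subjects)
    scores = np.array([seg_probs[seg_subjects == s].mean() for s in subjects])
    return subjects, scores

# Toy held-out data: 4 subjects, 2 segments each, subject 1 is the positive.
seg_probs = np.array([0.9, 0.8, 0.7, 0.2, 0.6, 0.1, 0.3, 0.4])
seg_subjects = np.array([1, 1, 2, 2, 3, 3, 4, 4])
label_by_subject = {1: 1, 2: 0, 3: 0, 4: 0}

subjects, scores = pool_by_subject(seg_probs, seg_subjects)
y_true = np.array([label_by_subject[s] for s in subjects])
y_pred = (scores >= 0.5).astype(int)  # assumed threshold

print(f"AUC={roc_auc_score(y_true, scores):.2f} "
      f"recall={recall_score(y_true, y_pred):.2f} "
      f"precision={precision_score(y_true, y_pred):.2f}")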
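The DeLong comparison reported in Results (ΔAUC = −0.042; p = 0.43) tests whether two correlated AUCs, computed from two models on the same test subjects, differ significantly. A self-contained sketch of the fast midrank formulation of the DeLong test; the toy scores below are illustrative only, and the toy set uses two positives because the variance terms need at least two examples per class:

import numpy as np
from scipy import stats

def midrank(x):
    """Midranks, with ties sharing the average rank."""
    order = np.argsort(x)
    xs = x[order]
    n = len(x)
    ranks = np.zeros(n)
    i = 0
    while i < n:
        j = i
        while j < n and xs[j] == xs[i]:
            j += 1
        ranks[i:j] = 0.5 * (i + j - 1) + 1  # average of ranks i+1 .. j
        i = j
    out = np.empty(n)
    out[order] = ranks
    return out

def delong_test(y_true, probs_a, probs_b):
    """Two-sided p-value for the difference of two correlated AUCs."""
    pos = y_true == 1
    m, n = int(pos.sum()), int((~pos).sum())
    aucs, vp, vn = [], [], []
    for p in (probs_a, probs_b):
        tz = midrank(p)
        aucs.append((tz[pos].sum() - m * (m + 1) / 2) / (m * n))
        vp.append((tz[pos] - midrank(p[pos])) / n)          # positive-class components
        vn.append(1.0 - (tz[~pos] - midrank(p[~pos])) / m)  # negative-class components
    cov = np.cov(np.array(vp)) / m + np.cov(np.array(vn)) / n
    d = aucs[0] - aucs[1]
    z = d / np.sqrt(cov[0, 0] + cov[1, 1] - 2 * cov[0, 1])
    return d, 2 * stats.norm.sf(abs(z))

# Toy comparison on six subjects (two positives).
y = np.array([1, 0, 0, 0, 0, 1])
dauc, p = delong_test(y, np.array([0.9, 0.2, 0.3, 0.1, 0.4, 0.8]),
                         np.array([0.6, 0.7, 0.3, 0.2, 0.5, 0.8]))
print(f"dAUC={dauc:+.3f}, p={p:.2f}")  # dAUC=+0.125, p≈0.48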