Machine learning models for suicide risk assessment on social media are vulnerable to adversarial attacks and may fail when deployed in real-world settings with noisy or manipulated data. We propose a robust approach to suicide risk assessment that uses deep adversarial learning to harden models against input perturbations. Our method incorporates adversarial training techniques designed specifically for text-based mental health applications, where small changes in language can significantly alter meaning and clinical interpretation. Models are trained to withstand both intentional adversarial attacks and natural variation in social media language, including typos, slang, and evolving linguistic patterns. To this end, we develop a comprehensive adversarial training framework that generates realistic perturbations while preserving the clinical meaning of mental health indicators. The robust model maintains high accuracy on clean data while performing markedly better on perturbed inputs. Evaluation on multiple social media datasets demonstrates that our adversarially trained models are more reliable and generalizable for real-world deployment in mental health monitoring systems.
Robust suicide risk assessment on social media via deep adversarial learning
July 1, 2021
R. Sawhney, H. Joshi, S. Gandhi, D. Jin, R.R. Shah | JAMIA 2021
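The paper's own framework is not reproduced here, but the core mechanic of adversarial training for text models can be illustrated with a minimal sketch. Because text tokens are discrete, a common technique is to apply the perturbation in continuous embedding space with an FGSM-style step and train on the clean and perturbed batches jointly. The `RiskClassifier` architecture, its dimensions, the four risk labels, and the `epsilon` budget below are all hypothetical placeholders, not the authors' model or hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RiskClassifier(nn.Module):
    """Toy text classifier (hypothetical): embeddings -> mean pool -> linear head."""
    def __init__(self, vocab_size=10000, embed_dim=128, num_classes=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids=None, embeds=None):
        # Accept precomputed embeddings so the adversarial step can
        # perturb them directly; tokens are discrete, so the attack
        # operates in continuous embedding space instead.
        if embeds is None:
            embeds = self.embedding(token_ids)
        return self.head(embeds.mean(dim=1))

def adversarial_step(model, optimizer, token_ids, labels, epsilon=0.01):
    """One training step on a clean batch plus its FGSM-style perturbation."""
    # Gradient of the loss w.r.t. the embeddings of the clean batch.
    embeds = model.embedding(token_ids).detach().requires_grad_(True)
    F.cross_entropy(model(embeds=embeds), labels).backward()

    # Adversarial example: a small step along the gradient sign,
    # bounded by epsilon so the input stays close to the original.
    adv_embeds = (embeds + epsilon * embeds.grad.sign()).detach()

    # Joint objective: stay accurate on clean data, resist perturbations.
    optimizer.zero_grad()
    loss = (F.cross_entropy(model(token_ids=token_ids), labels)
            + F.cross_entropy(model(embeds=adv_embeds), labels))
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical usage with random stand-in data.
model = RiskClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
token_ids = torch.randint(0, 10000, (8, 32))  # 8 posts, 32 tokens each
labels = torch.randint(0, 4, (8,))            # 4 illustrative risk levels
print(adversarial_step(model, optimizer, token_ids, labels))
```

The fixed epsilon bound is a crude stand-in for the constraint the abstract describes: in the paper's setting, perturbations are generated to remain realistic and to preserve the clinical meaning of mental health indicators, which a plain norm bound does not guarantee.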