Detecting Poison Attacks using Neural Collapse Geometry
There are two types of data poisoning attacks: untargeted and targeted. Detecting untargeted attacks is fairly simple, because the overall accuracy of the model decreases. Targeted attacks are the real problem: unless a query for that specific label or feature is triggered, you will never know that you have been hacked! There are methods to identify targeted attacks, but let's try to figure out whether these attacks can be identified or corrected using the embeddings’ representation, utilising Neural Collapse to detect poisoned labels or features.
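To make the idea concrete, here is a minimal sketch of what such a check could look like. It assumes that, under Neural Collapse, the last-layer embeddings of each class cluster tightly around their class mean, so a sample whose embedding sits closer to another class's mean than to the mean of its own assigned label is a candidate for a poisoned label. The function name `flag_suspect_labels`, the nearest-class-mean rule, and the synthetic 2D embeddings are all illustrative assumptions, not a method taken from this post or from any library.

```python
import numpy as np

def flag_suspect_labels(embeddings, labels):
    """Return indices of samples whose embedding is nearer to another
    class mean than to the mean of the class they are labeled with.
    (Hypothetical helper for illustration.)"""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # Class means computed from the (possibly poisoned) labels.
    means = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    # Distance from every sample to every class mean: (n_samples, n_classes).
    dists = np.linalg.norm(embeddings[:, None, :] - means[None, :, :], axis=2)
    nearest = classes[np.argmin(dists, axis=1)]
    return np.flatnonzero(nearest != labels)

# Synthetic "collapsed" embeddings (assumption): three tight clusters whose
# centers form an equilateral triangle, i.e. a simplex arrangement in 2D.
rng = np.random.default_rng(0)
angles = np.deg2rad([90.0, 210.0, 330.0])
centers = 5.0 * np.stack([np.cos(angles), np.sin(angles)], axis=1)
true_labels = np.repeat(np.arange(3), 50)
embeddings = centers[true_labels] + 0.1 * rng.standard_normal((150, 2))

# Simulate a targeted label flip: three class-0 samples relabeled as class 1.
poisoned_labels = true_labels.copy()
poisoned_labels[[0, 1, 2]] = 1

suspects = flag_suspect_labels(embeddings, poisoned_labels)
print(suspects)
```

On this toy data the three flipped samples are the only ones flagged. Real embeddings are never this cleanly collapsed, and an attacker who poisons a large fraction of a class will drag its mean with them, so treat this as a starting point rather than a detector.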