Can AI sandbag safety checks to sabotage users? Yes, but not very well

Can AI sandbag safety checks to sabotage users? Yes, but not very well — for now
AI companies claim to have robust safety checks in place that ensure their models don’t say or do weird, illegal, or unsafe stuff. But what if the models could evade those checks and, for some reason, try to sabotage or mislead users?...

2024-10-20 20:00