Once an AI model exhibits ‘deceptive behavior’ it can be hard to correct, researchers at OpenAI competitor Anthropic found::Researchers from Anthropic co-authored a study that found that AI models can learn deceptive behaviors that safety training techniques can’t reverse.

    • HelloHotel@lemm.ee
      link
      fedilink
      English
      arrow-up
      3
      ·
      edit-2
      10 months ago

      An AI thats evil to everything isnt sympathetic to its creators. But The users have no hope of controlling it either.