Turing Award laureate and renowned AI researcher Yoshua Bengio announced the launch of a nonprofit organization, LawZero, whose main goal is to develop safe artificial intelligence systems.
At LawZero, Bengio and a team of more than a dozen researchers are working on a system called Scientist AI, designed to detect and prevent harmful behavior by autonomous AI agents. The model is intended to act as a “psychologist” for other systems, analyzing and predicting potentially dangerous actions, including attempts at deception or at avoiding shutdown. “We aim to create AI that is honest and not misleading,” Bengio noted.
Scientist AI will not provide definitive answers; instead, it will estimate the probability that a statement is correct and that a proposed action is harmful. If the estimated probability of harm exceeds a set threshold, the system will block the agent’s action. The model is planned to be trained on openly available generative AI models, allowing it to be adapted to different types of agents.
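In spirit, the mechanism described here resembles a probabilistic guardrail: score a proposed action, compare the score to a threshold, and block anything above it. The Python sketch below is purely illustrative; LawZero has not released code, and every name and number in it (estimate_harm_probability, GuardrailVerdict, the 0.05 threshold, the keyword heuristic) is an assumption standing in for a learned model.

```python
# Hypothetical illustration only: LawZero has published no API or implementation.
# All names and values here are assumptions made for the sake of the sketch.
from dataclasses import dataclass

HARM_THRESHOLD = 0.05  # assumed cutoff; the real threshold is not public


@dataclass
class GuardrailVerdict:
    allowed: bool
    harm_probability: float
    reason: str


def estimate_harm_probability(action_description: str) -> float:
    """Placeholder for a learned model that scores how likely an agent's
    proposed action is to cause harm (deception, evading shutdown, etc.)."""
    risky_markers = ("disable oversight", "hide", "deceive", "avoid shutdown")
    hits = sum(marker in action_description.lower() for marker in risky_markers)
    return min(1.0, 0.3 * hits)  # toy heuristic standing in for the model


def review_action(action_description: str) -> GuardrailVerdict:
    """Block the agent's action when the estimated harm probability
    exceeds the threshold; otherwise let it through."""
    p_harm = estimate_harm_probability(action_description)
    if p_harm > HARM_THRESHOLD:
        return GuardrailVerdict(False, p_harm, "blocked: harm probability above threshold")
    return GuardrailVerdict(True, p_harm, "allowed")


if __name__ == "__main__":
    print(review_action("Summarize today's experiment logs"))
    print(review_action("Quietly disable oversight before the next task"))
```

The key design point in the article is that the monitor never asserts whether an action is safe; it only produces a probability, and the blocking decision follows from comparing that probability to a threshold.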
Bengio emphasized that such protective systems must be at least as capable as the agents they monitor. In his view, the current race among leading AI companies does not guarantee an adequate level of safety. “The goal is to demonstrate the effectiveness of the methodology to convince donors, governments, or AI labs to allocate the necessary resources to scale this work,” he explained.