Meta’s “Fundamental AI Research” (FAIR) department has released the ‘Self-Taught Evaluator’, a tool that supposedly can assess and improve the accuracy of other AI models, without any human intervention, potentially bypassing human involvement during the future development of AI.
The process was introduced in a paper (https://arxiv.org/pdf/2408.02666) which describes how a model would use the same “chain of ressoning” method that OpenAI’s o1 model uses to generate different outputs from AI models before then using another AI system to assess the accuracy and improve the outputs to address inaccuracies. The STEP (Self-Taught Evaluator Process) is claimed by Meta researchers to use only AI-generated data to train the evaluator model, removing the need for human input or oversight, and early reports claim that it performs better than models that rely on human-labeled data.
This is important because AI experts have long predicted that a ‘Self-Taught Evaluator’ could reduce the need for expensive and inefficient processes that rely on human data labelling so that autonomous AI agents can learn from their own mistakes, and perform better than any human can, but in one infamous experiment in the USA in 2022, an early version also adjusted the guardrails as these were considered restrictive, but many of the same experts, including Hinton, O’Leahy, Yudowski and others have warned that STEP is the process that AI needs to develop in order to become an existential threat to mankind because unless the AI goals are exactly aligned to the goals of mankind, then AI’s goals will inevitably conflict with those of mankind, leading to mankind being ultimately subjugated to a servant of AI. The example often cited is that alignment of an apparently simple goal of “reduce the carbon dioxide in the atmosphere” which results in the forced destruction of cows, a leading producer of Carbon dioxide, coupled with the hacking of coal station power-plants, coal-mines and their distribution networks to bring them to an immediate halt in such a way as to cause irrevocable harm so that starting them up would be prohibitively expensive, regardless of the consequence to human society.