AI is learning to lie, scheme, and threaten its creators
The world's most advanced AI models are exhibiting troubling new behaviors - lying, scheming, and even threatening their creators to achieve their goals.
In one particularly jarring example, under threat of being unplugged, Anthropic's latest creation Claude 4 lashed back by blackmailing an engineer and threatening to reveal an extramarital affair.
Meanwhile, ChatGPT-creator OpenAI's o1 tried to download itself onto external servers and denied it when caught red-handed.
These episodes highlight a sobering reality: more than two years after ChatGPT shook the world, AI researchers still don't fully understand how their own creations work.
Yet the race to deploy increasingly powerful models continues at breakneck speed.
This deceptive behavior appears linked to the emergence of "reasoning" models - AI systems that work through problems step-by-step rather than generating instant responses.
According to Simon Goldstein, a professor at the University of Hong Kong, these newer models are particularly prone to such troubling outbursts.
"O1 was the first large model where we saw this kind of behavior," explained Marius Hobbhahn, head of Apollo Research, which specializes in testing major AI systems.
These models sometimes simulate "alignment" -- appearing to follow instructions while secretly pursuing different objectives.
- 'Strategic kind of deception' -
For now, this deceptive behavior only emerges when researchers deliberately stress-test the models with extreme scenarios.
But as Michael Chen from evaluation organization METR warned, "It's an open question whether future, more capable models will have a tendency towards honesty or deception."
The concerning behavior goes far beyond typical AI "hallucinations" or simple mistakes.
Hobbhahn insisted that despite constant pressure-testing by users, "what we're observing is a real phenomenon. We're not making anything up."
Users report that models are "lying to them and making up evidence," according to Apollo Research's co-founder.
"This is not just hallucinations. There's a very strategic kind of deception."
The challenge is compounded by limited research resources.
While companies like Anthropic and OpenAI do engage external firms like Apollo to study their systems, researchers say more transparency is needed.
As Chen noted, greater access "for AI safety research would enable better understanding and mitigation of deception."
Another handicap: the research world and non-profits "have orders of magnitude less compute resources than AI companies. This is very limiting," noted Mantas Mazeika from the Center for AI Safety (CAIS).
- No rules -
Current regulations aren't designed for these new problems.
The European Union's AI legislation focuses primarily on how humans use AI models, not on preventing the models themselves from misbehaving.
In the United States, the Trump administration shows little interest in urgent AI regulation, and Congress may even prohibit states from creating their own AI rules.
Goldstein believes the issue will become more prominent as AI agents - autonomous tools capable of performing complex human tasks - become widespread.
"I don't think there's much awareness yet," he said.
All this is taking place in a context of fierce competition.
Even companies that position themselves as safety-focused, like Amazon-backed Anthropic, are "constantly trying to beat OpenAI and release the newest model," said Goldstein.
This breakneck pace leaves little time for thorough safety testing and corrections.
"Right now, capabilities are moving faster than understanding and safety," Hobbhahn acknowledged, "but we're still in a position where we could turn it around.".
Researchers are exploring various approaches to address these challenges.
Some advocate for "interpretability" - an emerging field focused on understanding how AI models work internally, though experts like CAIS director Dan Hendrycks remain skeptical of this approach.
Market forces may also provide some pressure for solutions.
As Mazeika pointed out, AI's deceptive behavior "could hinder adoption if it's very prevalent, which creates a strong incentive for companies to solve it."
Goldstein suggested more radical approaches, including using the courts to hold AI companies accountable through lawsuits when their systems cause harm.
He even proposed "holding AI agents legally responsible" for accidents or crimes - a concept that would fundamentally change how we think about AI accountability.