Researchers used questions from the NPR Sunday Puzzle challenge to build a benchmark for testing AI 'reasoning' models.
Announced in December, 2.0 Flash Thinking rivals OpenAI's o1 and o3-mini reasoning models in that it's capable of working ...