Anthropic claims its latest model is the best in its class

OpenAI rival Anthropic is releasing a powerful new generative AI model called Claude 3.5 Sonnet. But it's more of an incremental step than a monumental leap forward.

Claude 3.5 Sonnet can analyze and generate text as well as images, and it's Anthropic's best-performing model yet – at least on paper. On several AI benchmarks for reading, coding, math, and vision, Claude 3.5 Sonnet outperforms the model it replaces, Claude 3 Sonnet, and beats Anthropic's previous flagship model, Claude 3 Opus.

Benchmarks aren't necessarily the most useful measure of AI progress, in part because many of them test for esoteric edge cases that don't apply to the average person, such as answering health exam questions. But for what it's worth, Claude 3.5 Sonnet narrowly beats competing leading models, including OpenAI's recently launched GPT-4o, on some of the benchmarks Anthropic tested it against.

In addition to the new model, Anthropic is releasing what it calls Artifacts, a workspace where users can edit and add to content (e.g. code and documents) generated by Anthropic's models. Artifacts is currently in preview and will receive new features in the near future, such as ways to collaborate with larger teams and store knowledge bases, Anthropic says.

Focus on efficiency

Claude 3.5 Sonnet is slightly more performant than Claude 3 Opus, and Anthropic says the model better understands nuanced and complex instructions, along with concepts like humor. (AI is notoriously unfunny.) But perhaps more importantly for developers who use Claude to build apps that require fast responses (for example, customer service chatbots), 3.5 Sonnet is faster. It's about twice as fast as 3 Opus, Anthropic claims.

According to Anthropic, vision – analyzing photographs – is an area in which Claude 3.5 Sonnet significantly improves over 3 Opus. 3.5 Sonnet can interpret charts and graphs more accurately and transcribe text from “imperfect” images, such as photos with distortions and visual artifacts.

Michael Gerstenhaber, product leader at Anthropic, says the improvements are the result of architectural changes and new training data, including AI-generated data. Which data specifically? Gerstenhaber wouldn't reveal it, but he suggested that Claude 3.5 Sonnet gets a lot of its power from these training sets.

“What matters to [businesses] is whether or not AI helps them meet their business needs, not whether or not AI is competitive on a benchmark,” Gerstenhaber told JS. “And from that perspective, I believe Claude 3.5 Sonnet will be a step ahead of everything else we have available – and everything else in the industry as well.”

The secrecy surrounding training data may be for competitive reasons. But it could also be to protect Anthropic from legal challenges – especially those related to fair use. The courts have yet to decide whether vendors like Anthropic and its competitors, such as OpenAI, Google, Amazon, and so on, have the right to train on public data, including copyrighted data, without compensating or crediting the creators of that data.

So all we know is that Claude 3.5 Sonnet was trained on a lot of text and images, like Anthropic's previous models, plus feedback from human testers to try to 'align' the model with users' intentions, in the hope of preventing it from producing toxic or otherwise problematic text.

What else do we know? Well, Claude 3.5 Sonnet's context window – the amount of text the model can analyze before generating new text – is 200,000 tokens, the same as 3 Sonnet. Tokens are subdivided pieces of raw data, such as the syllables 'fan', 'tas' and 'tic' in the word 'fantastic'; 200,000 tokens correspond to approximately 150,000 words.
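To make that ratio concrete, here is a minimal Python sketch that turns the article's rule of thumb (200,000 tokens for roughly 150,000 words) into a rough size check. The function names are hypothetical, and counting whitespace-separated words is only an approximation: real token counts depend on the model's tokenizer and on the language of the text.

```python
# Rule of thumb from the article: 200,000 tokens ~ 150,000 words.
CONTEXT_WINDOW_TOKENS = 200_000
WORDS_PER_TOKEN = 150_000 / 200_000  # 0.75, an approximation, not a tokenizer


def estimate_tokens(text: str) -> int:
    """Roughly estimate a token count from a whitespace word count."""
    word_count = len(text.split())
    return round(word_count / WORDS_PER_TOKEN)


def fits_in_context(text: str) -> bool:
    """Return True if the estimated token count fits in the context window."""
    return estimate_tokens(text) <= CONTEXT_WINDOW_TOKENS
```

By this estimate, a 90,000-word manuscript comes out at around 120,000 tokens and would fit, while anything much past 150,000 words would not.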

And we know that Claude 3.5 Sonnet is available today. Users of Anthropic's web client and the Claude iOS app can access it for free; subscribers to Anthropic's Claude Pro and Claude Team paid plans receive 5x higher rate limits. 3.5 Sonnet is also live on Anthropic's API and managed platforms such as Amazon Bedrock and Google Cloud's Vertex AI.

“Claude 3.5 Sonnet is truly a step change in intelligence without sacrificing speed, and it sets us up for future releases across the entire Claude model family,” said Gerstenhaber.

Claude 3.5 Sonnet also powers Artifacts, which causes a special window to appear in the Claude web client when a user asks the model to generate content such as code snippets, text documents, or website designs. Gerstenhaber explains: “Artifacts are the model output that sets aside generated content and allows you as a user to iterate on that content. Let's say you want to generate code: the artifact is placed in the UI and then you can talk to Claude and iterate on the document to improve it so you can run the code.”

The bigger picture

So what is the significance of Claude 3.5 Sonnet in the broader context of Anthropic – and the AI ecosystem for that matter?

Claude 3.5 Sonnet shows that incremental progress is the extent of what we can now expect on the modeling front, barring a major breakthrough in research. Recent months have seen flagship releases from Google (Gemini 1.5 Pro) and OpenAI (GPT-4o) that marginally move the needle in terms of benchmark and quality performance. But there hasn't been a move to match the jump from GPT-3 to GPT-4 in quite some time, due to the rigidity of current model architectures and the enormous computing power they require to train.

As generative AI vendors turn their attention to data management and licensing rather than promising new scalable architectures, there are signs that investors are becoming wary of the longer-than-expected path to ROI for generative AI. Anthropic is somewhat inoculated against these pressures, as it is in the enviable position of serving as Amazon's (and to a lesser extent Google's) insurance against OpenAI. But the company's revenue, expected to reach just under $1 billion by the end of 2024, is a fraction of OpenAI's – and I'm sure Anthropic's backers won't let that fact be forgotten.

Despite a growing customer base that includes household brands like Bridgewater, Brave, Slack and DuckDuckGo, Anthropic still lacks a certain corporate cachet. Tellingly, it was OpenAI – not Anthropic – that PwC recently partnered with to resell generative AI offerings to the enterprise.

So Anthropic is taking a strategic and common approach to moving forward, investing development time in products like Claude 3.5 Sonnet to deliver slightly better performance at commodity prices. 3.5 Sonnet has the same price as 3 Sonnet: $3 per million tokens fed into the model and $15 per million tokens generated by the model.
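For developers weighing those prices, the arithmetic can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, the two rates are the per-million-token prices quoted above, and it assumes simple linear billing with no discounts, taxes, or other fees.

```python
# Prices quoted in the article, in US dollars per million tokens.
INPUT_PRICE_PER_MILLION = 3.00    # tokens fed into the model
OUTPUT_PRICE_PER_MILLION = 15.00  # tokens generated by the model


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request, assuming linear per-token billing."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost
```

Under these assumptions, a prompt that fills the entire 200,000-token context window and returns a 1,000-token answer would cost roughly $0.62, almost all of it on the input side.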

Gerstenhaber talked about this in our conversation. “When you build an application, the end user doesn't need to know what model is being used or how an engineer has optimized for their experience,” he said, “but the engineer could have the tools to optimize for that experience along the vectors that need to be optimized, and cost is certainly one of them.”

Claude 3.5 Sonnet does not solve the hallucination problem. Mistakes are almost certain to be made. But it could be attractive enough to persuade developers and enterprises to move to the Anthropic platform. And ultimately that's what Anthropic is all about.

With that same goal in mind, Anthropic has doubled down on tooling such as its experimental steering AI, which allows developers to 'control' the internal functions of its models; integrations to let the models take actions within apps; and tools built on top of its models, such as the aforementioned Artifacts experience. It has also hired an Instagram co-founder as head of product. And it has expanded the availability of its products, most recently bringing Claude to Europe and opening offices in London and Dublin.

All things considered, Anthropic seems to have come around to the idea that building an ecosystem around models – and not just models themselves – is the key to retaining customers as the capability gap between models narrows.

Still, Gerstenhaber emphasized that bigger and better models – like Claude 3.5 Opus – are on the horizon, with features like web search and the ability to remember preferences.

“I haven't seen deep learning hit a wall yet, and I'll leave it to researchers to speculate about the wall, but I think it's a little early to draw conclusions about that, especially when you look at the pace of innovation,” he said. “There is very rapid development and very rapid innovation, and I have no reason to believe it will slow down.”

We will see.
