Be part of our day by day and weekly newsletters for the newest updates and unique content material on industry-leading AI protection. Study Extra
There’s a brand new king on the town: Matt Shumer, co-founder and CEO of AI writing startup HyperWrite, at present unveiled Reflection 70B, a brand new massive language mannequin (LLM) based mostly on Meta’s open supply Llama 3.1-70B Instruct that leverages a brand new error self-correction method and boasts superior efficiency on third-party benchmarks.
As Shumer introduced in a publish on the social community X, Reflection-70B now seems to be “the world’s high open-source AI mannequin.”
He posted the next chart displaying its benchmark efficiency right here:
Reflection 70B has been rigorously examined throughout a number of benchmarks, together with MMLU and HumanEval, utilizing LMSys’s LLM Decontaminator to make sure the outcomes are free from contamination. These benchmarks present Reflection constantly outperforming fashions from Meta’s Llama sequence and competing head-to-head with high business fashions.
You may attempt it your self right here as a demo on a “playground” web site, however as Shumer famous on X, the announcement of the brand new king of open-source AI fashions has flooded the demo website with site visitors and his workforce is scrambling to seek out sufficient GPUs (graphics processing models, the dear chips from Nvidia and others used to coach and run most generative AI fashions) to spin as much as meet the demand.
How Reflection 70B stands aside
Shumer emphasised that Reflection 70B isn’t simply aggressive with top-tier fashions however brings distinctive capabilities to the desk, particularly, error identification and correction.
As Shumer advised VentureBeat over DM: “I’ve been fascinated about this concept for months now. LLMs hallucinate, however they will’t course-correct. What would occur in case you taught an LLM the best way to acknowledge and repair its personal errors?”
Therefore the identify, “Reflection” — a mannequin that may replicate on its generated textual content and assess its accuracy earlier than delivering it as outputs to the person.
The mannequin’s benefit lies in a method referred to as reflection tuning, which permits it to detect errors in its personal reasoning and proper them earlier than finalizing a response.
Reflection 70B introduces a number of new particular tokens for reasoning and error correction, making it simpler for customers to work together with the mannequin in a extra structured manner. Throughout inference, the mannequin outputs its reasoning inside particular tags, permitting for real-time corrections if it detects a mistake.
The playground demo website consists of steered prompts for the person to make use of, asking Reflection 70B what number of letter “r” situations there are within the phrase “Strawberry” and which quantity is bigger, 9.11 or 9.9, two easy issues many AI fashions — together with main proprietary ones — fail to get proper constantly. Our exams of it have been sluggish, however Reflection 70B in the end offered the right response after 60+ seconds.
This makes the mannequin notably helpful for duties requiring excessive accuracy, because it separates reasoning into distinct steps to enhance precision. The mannequin is out there for obtain through the AI code repository Hugging Face, and API entry is about to be out there later at present by means of GPU service supplier Hyperbolic Labs.
An much more highly effective, bigger mannequin on the way in which
The discharge of Reflection 70B is barely the start of the Reflection sequence. Shumer introduced that a good bigger mannequin, Reflection 405B, will likely be made out there subsequent week.
He additionally advised VentureBeat that HyperWrite is engaged on integrating the Reflection 70B mannequin into its main AI writing assistant product.
“We’re exploring numerous methods to combine the mannequin into HyperWrite — I’ll share extra on this quickly,” he pledged.
Reflection 405B is anticipated to outperform even the highest closed-source fashions available on the market at present. Shumer additionally mentioned HyperWrite would launch a report detailing the coaching course of and benchmarks, offering insights into the improvements that energy Reflection fashions.
The underlying mannequin for Reflection 70B is constructed on Meta’s Llama 3.1 70B Instruct and makes use of the inventory Llama chat format, guaranteeing compatibility with current instruments and pipelines.
Shumer credit Glaive for enabling speedy AI mannequin coaching
A key contributor to Reflection 70B’s success is the artificial information generated by Glaive, a startup specializing within the creation of use-case-specific datasets.
Glaive’s platform permits the speedy coaching of small, extremely centered language fashions, serving to to democratize entry to AI instruments. Based by Dutch engineer Sahil Chaudhary, Glaive focuses on fixing one of many largest bottlenecks in AI growth: the supply of high-quality, task-specific information.
Glaive’s method is to create artificial datasets tailor-made to particular wants, permitting corporations to fine-tune fashions rapidly and affordably. The corporate has already demonstrated success with smaller fashions, akin to a 3B parameter mannequin that outperformed many bigger open-source alternate options on duties like HumanEval. Spark Capital led a $3.5 million seed spherical for Glaive greater than a 12 months in the past, supporting Chaudhary’s imaginative and prescient of making a commoditized AI ecosystem the place specialist fashions could be educated simply for any activity.
By leveraging Glaive’s know-how, the Reflection workforce was capable of quickly generate high-quality artificial information to coach Reflection 70B. Shumer credited Chaudhary and the Glaive AI platform for accelerating the event course of, with information generated in hours relatively than weeks.
In complete, the coaching course of took three weeks, in response to Shumer in a direct message to VentureBeat. “We educated 5 iterations of the mannequin over three weeks,” he wrote. “The dataset is totally customized, constructed utilizing Glaive’s artificial information era techniques.”
HyperWrite is a uncommon Lengthy Island AI startup
At first look, it looks as if Reflection 70B got here from nowhere. However Shumer has been on the AI recreation for years.
He based his firm, initially referred to as Otherside AI, in 2020 alongside Jason Kuperberg. It was initially based mostly in Melville, New York, a hamlet about an hour’s drive east of New York Metropolis on Lengthy Island.
It gained traction round its signature product, HyperWrite, which began as a Chrome extension for shoppers to craft emails and responses based mostly on bullet factors, however has advanced to deal with duties akin to drafting essays, summarizing textual content, and even organizing emails. HyperWrite counted two million customers as of November 2023 and landed the co-founding duo a spot on Forbes‘ annual “30 Below 30” Listing, in the end spurring Shumer and Kuperberg and their rising workforce to alter the identify of the corporate to match their hit product.
HyperWrite’s newest spherical, disclosed in March 2023, noticed a $2.8 million injection from buyers together with Madrona Enterprise Group. With this funding, HyperWrite has launched new AI-driven options, akin to turning net browsers into digital butlers that may deal with duties starting from reserving flights to discovering job candidates on LinkedIn.
Shumer notes that accuracy and security stay high priorities for HyperWrite, particularly as they discover advanced automation duties. The platform remains to be refining its private assistant instrument by monitoring and making enhancements based mostly on person suggestions. This cautious method, much like the structured reasoning and reflection embedded in Reflection 70B, reveals Shumer’s dedication to precision and accountability in AI growth.
What’s subsequent for HyperWrite and the Reflection AI mannequin household?
Trying forward, Shumer has even larger plans for the Reflection sequence. With Reflection 405B set to launch quickly, he believes it is going to surpass the efficiency of even proprietary or closed-source LLMs akin to OpenAI’s GPT-4o, presently the worldwide chief, by a major margin.
That’s dangerous information not just for OpenAI — which is reportedly searching for to lift a major new spherical of personal funding from the likes of Nvidia and Apple — however different closed-source mannequin suppliers akin to Anthropic and even Microsoft.
It seems that as soon as once more within the fast-moving gen AI area, the steadiness of energy has shifted.
For now, the discharge of Reflection 70B marks a major milestone for open-source AI, giving builders and researchers entry to a robust instrument that rivals the capabilities of proprietary fashions. As AI continues to evolve, Reflection’s distinctive method to reasoning and error correction could set a brand new normal for what open-source fashions can obtain.