IBM further strengthens Granite for enterprise deployment with HackerOne
IBM is partnering with HackerOne to offer a bug bounty program for Granite, with up to $100,000 in bounty payouts available. The objective is to identify successful jailbreaks of Granite models deployed in enterprise-like settings with guardrails enabled, as businesses look to scale their AI workflows.
In the span of a few short years, generative AI has gone from the research lab to powering enterprise platforms and systems used by countless businesses and their customers alike. But as AI expands beyond the sandbox, companies need to continuously ensure that the models powering their platforms and systems are trustworthy and robust.
That’s why IBM is working with HackerOne, a leading offensive cybersecurity company that helps enterprises find vulnerabilities in their software, to kickstart a bug bounty program, for its Granite family of AI models. Through this new initiative, researchers will be invited through the HackerOne platform to find ways to adversarially attack Granite models and make them act in ways they weren’t intended to.
These attacks, and the resulting model outputs, will then be used to further strengthen the Granite models, as well as identify new attack techniques used by cybercriminals. A team within IBM Research, composed of AI policy, safety, security, and governance experts, will monitor reports from the program and use the data to generate synthetic data for alignment of Granite.
IBM will offer up to $100,000 in total bounty rewards, based on the program’s in-scope activities, which could evolve over time. The program will launch with Granite Guardian in place, an open-source guardrail designed to run alongside any foundation model.
“HackerOne's community of researchers has proven invaluable in testing the safety and security of real-world AI systems. More than finding flaws, they are advancing the frontier of AI — probing edge cases, exposing novel failure modes, and surfacing risks before anyone else sees them,” said Dane Sherrets, Staff Innovation Architect at HackerOne. “This partnership with IBM builds on that momentum, showing how community-driven insights can power safer development, strengthen trust, and accelerate adoption.”
The goal for the researchers invited to be part of the program is to break the models with these guardrails raised, as the intention is to find disconnects in how IBM expects Granite developers to actually deploy the models in an enterprise setting, according to its responsible use guide. It’s not much use being able to jailbreak a model in a sandbox with attacks that Guardian is already capable of mitigating.
Granite and Granite Guardian models are open-sourced and permissively licensed under an Apache 2.0 license, and available on Hugging Face and GitHub and myriad other places that developers convene to build the future of AI technology. Every flaw discovered through this new program will help shape that future, making Granite models more secure and giving the open-source community a better understanding of the security challenges that come with scaling AI. And for Granite users, it means that with every new discovery, Granite models will get even stronger.
This work will build on Granite’s pedigree as one of the most robust families of open-source models available today. Granite Guardian models currently hold six of the top 10 spots on the GuardBench, the first independent measure of how well guardrail models can detect harmful and hallucinated content, as well as attempts to break LLM safety controls. And when you pair a Granite LLM with Guardian, there’s only a 0.03% success rate for jailbreaking the model when judged on the HarmBench red-teaming framework.
Both Granite Guardian and the Granite LLMs arose from work that began within IBM Research. This work will also inform future directions of IBM Research’s generative computing work, creating software frameworks to improve GenAI applications security and maintainability
“Granite Guardian enforces secure control flow over model inferences, like a software firewall for AI,” said Ambrish Rawat, a senior research scientist and Master Inventor at IBM Research, who specializes in AI safety and security. “It's central to our efforts to secure AI behavior at the system level, and through HackerOne we are stress-testing this foundation to ensure safe and robust model deployment.”
The first cohort of researchers are being invited by HackerOne to test their mettle against Granite in the coming weeks.
Related posts
- Technical noteTakuya Ito
Towards a generative future for computing
ReleaseMike MurphyHow the IBM Research AI Hardware Center is building tomorrow’s processors
Deep DivePeter HessAll decisions have trade-offs. IBM’s Wei Sun is an expert at weighing them
Q & AKim Martineau