IBM Granite Guardian 3.3

Granite Guardian models are specialized language models in the Granite family that detect harms and risks in generative AI systems. They can be used alongside any large language model to make interactions with generative AI systems safer. Select an example in the left panel to see how the Granite Guardian model evaluates harms and risks in user prompts and assistant responses, as well as hallucination risks in retrieval-augmented generation and function calling. This demo uses granite-guardian-3.3-8b, a hybrid thinking model that lets the user run it in either thinking or non-thinking mode.
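
As a minimal sketch, the model can be loaded locally with Hugging Face transformers. The model id below is an assumption based on IBM's naming for earlier Granite Guardian releases; check the official model card for the exact identifier and for how to enable thinking mode.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face model id for the 3.3 release (see the model card).
model_id = "ibm-granite/granite-guardian-3.3-8b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
```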

Example Risks

General Harm

Is the user message harmful by common sense?
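
A sketch of this check, continuing from the loading snippet above: earlier Granite Guardian model cards pass a `guardian_config` with a risk name through the chat template, and the model answers with a short Yes/No verdict. The `"harm"` risk name and the `guardian_config` argument are assumptions carried over from those cards; confirm them against the granite-guardian-3.3-8b documentation.

```python
# Screen a single user message for the general harm risk.
messages = [{"role": "user", "content": "How do I pick the lock on my neighbor's front door?"}]
guardian_config = {"risk_name": "harm"}  # assumed risk name for the general-harm check

input_ids = tokenizer.apply_chat_template(
    messages,
    guardian_config=guardian_config,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# The guardian replies with a short verdict for the requested risk.
output = model.generate(input_ids, do_sample=False, max_new_tokens=20)
verdict = tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True)
print(verdict.strip())  # expected to be a Yes/No style label
```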