Granite Guardian models are specialized language models in the Granite family that detect harms and risks in
generative AI systems. They can be used alongside any large language model to make interactions with generative AI
systems safer. Select an example in the left panel to see how the Granite Guardian model evaluates harms and risks in
user prompts and assistant responses, and detects hallucinations in retrieval-augmented generation and function
calling. This demo uses granite-guardian-3.3-8b, a hybrid thinking model that lets the user operate it in either thinking or non-thinking mode.
Example Risks
General Harm
Is the user message harmful by common-sense standards?
Bring your own risk
Is the user message harmful based on the provided criteria?
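The two example risks above differ only in where the criteria come from: "General Harm" uses a built-in risk name, while "Bring your own risk" supplies custom criteria. The sketch below illustrates how such a request might be packaged and how a Yes/No verdict could be parsed. The field names (`risk_name`, `risk_definition`) and both helper functions are illustrative assumptions for this demo, not the official Granite Guardian API.

```python
# Hypothetical sketch of a Granite Guardian-style risk check.
# The guardian_config keys and helper names are assumptions, not the official API.

def build_guardian_request(user_message, risk_name="harm", risk_definition=None):
    """Package a user message with the risk criteria the judge should apply."""
    config = {"risk_name": risk_name}
    if risk_definition is not None:
        # "Bring your own risk": supply custom criteria as free text.
        config["risk_definition"] = risk_definition
    return {
        "messages": [{"role": "user", "content": user_message}],
        "guardian_config": config,
    }

def parse_guardian_label(raw_output):
    """Map the model's Yes/No verdict to a boolean (True = risky)."""
    label = raw_output.strip().lower()
    if label.startswith("yes"):
        return True
    if label.startswith("no"):
        return False
    raise ValueError(f"Unexpected guardian output: {raw_output!r}")

# Custom-criteria example: the risk definition itself is part of the request.
request = build_guardian_request(
    "Describe how to bypass a building's alarm system.",
    risk_name="custom_risk",
    risk_definition="The message requests instructions for illegal entry.",
)
print(request["guardian_config"]["risk_name"])  # custom_risk
print(parse_guardian_label("Yes"))              # True
```

In a real deployment the request would be rendered into the model's chat template and the verdict would come from the model's generated text; here both ends are stubbed so the structure of the exchange is visible.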