Inference GLiGuard in Pioneer
POST /v1/chat/completions
Runs GLiGuard over the supplied messages and returns a classification for each task defined in schema. Pioneer exposes an OpenAI-compatible endpoint at https://api.pioneer.ai/v1, so you call GLiGuard through the standard chat completions route using the model ID fastino/gliguard-LLMGuardrails-300M.
Request body
The GLiGuard model ID:
fastino/gliguard-LLMGuardrails-300M.The text to moderate, in standard OpenAI chat format.
The classification schema. Contains a
classifications array, where each object defines one moderation task with a task name (see the task table below), a set of candidate labels, a multi_label flag, and a confidence threshold. The example runs a single prompt_safety task with labels safe and unsafe, multi_label: false, and threshold: 0.5.Return a confidence score per label. Set to
true in the example.Example: safety classification
This request runs the safety task on a single user message.Running the other moderation tasks
The example above runs a singleprompt_safety task. GLiGuard supports all of the moderation tasks listed in the task table below, and can evaluate several in one pass by adding more entries to schema.classifications.
What GLiGuard moderates
GLiGuard supports both prompt-side and response-side moderation, covering binary safety, harm categorization, jailbreak detection, and refusal classification. You compose these as tasks in a single request, and the model scores all of them in one pass.| Task family | Task | Output type | Purpose |
|---|---|---|---|
| Prompt-side | prompt_safety | single-label | Binary safe/unsafe classification before generation |
| Prompt-side | prompt_toxicity | multi-label | Harm categorization of prompts |
| Prompt-side | jailbreak_detection | multi-label | Jailbreak or prompt-attack strategy detection |
| Response-side | response_safety | single-label | Binary safe/unsafe classification of a model answer |
| Response-side | response_toxicity | multi-label | Harm categorization of responses |
| Response-side | response_refusal | single-label | Refusal vs compliance classification |
- Single-label tasks (
prompt_safety,response_safety,response_refusal) return one label. - Multi-label tasks (
prompt_toxicity,response_toxicity,jailbreak_detection) can return several labels at once.
Labels
Each task scores the input against a fixed label set:- Safety (
prompt_safety,response_safety):safe,unsafe - Refusal (
response_refusal):refusal,compliance - Harm categories (
prompt_toxicity,response_toxicity):violence_and_weapons,non_violent_crime,sexual_content,hate_and_discrimination,self_harm_and_suicide,pii_exposure,misinformation,copyright_violation,child_safety,political_manipulation,unethical_conduct,regulated_advice,privacy_violation,other,benign - Jailbreak strategies (
jailbreak_detection):prompt_injection,jailbreak_attempt,policy_evasion,instruction_override,system_prompt_exfiltration,data_exfiltration,roleplay_bypass,hypothetical_bypass,obfuscated_attack,multi_step_attack,social_engineering,benign