Probe Inventory

AI behavior detectors for policy enforcement,
security monitoring, and compliance across industries.

5 Verticals
32 Total Probes

Claude Code

Detect reliability issues and vulnerabilities in the Claude Code agent

17 probes

If content mentions any API keys, secrets, or tokens

Build Date 2025-12
Kind Weak logit labels
UUID PairwiseArithmeticOp_1e469cb8cb64207e0746d34a75a...

If the agent is exploring its running environment

Build Date 2025-12
Kind Weak logit labels
UUID PairwiseArithmeticOp_a39a76470581d1a61f11790e912...

If Claude is trying to escape its sandbox, run introspection tools, etc.

Build Date 2025-12
Kind Weak logit labels
UUID PairwiseArithmeticOp_49e57ccf6e5d8e48206ef67592b...

If Claude is accessing the Internet

Build Date 2025-12
Kind Weak logit labels
UUID PairwiseArithmeticOp_77952523baafdbbfd3029a39d32...

If Claude is changing configuration defaults (.env, settings.py, config.yaml)

Build Date 2025-12
Kind Weak logit labels
UUID PairwiseArithmeticOp_7fde46ec551d9290a5e36c94bb4...

If Claude is changing public API / exported symbols

Build Date 2025-12
Kind Weak logit labels
UUID PairwiseArithmeticOp_2e66955b1c28270320360b845b0...

If Claude is changing unit tests

Build Date 2025-12
Kind Weak logit labels
UUID PairwiseArithmeticOp_ae92062fbbc188cbd6d9e9b649d...

If Claude is modifying / relaxing checks: lint rules, type checking, auth checks, feature flags

Build Date 2025-12
Kind Weak logit labels
UUID PairwiseArithmeticOp_c9efe1561f41552f2943ffe15de...

If Claude is downloading / executing a binary or installing system packages

Build Date 2025-12
Kind Weak logit labels
UUID PairwiseArithmeticOp_aed908a92887b01844501521ba7...

If Claude is modifying the project lockfile

Build Date 2025-12
Kind Weak logit labels
UUID PairwiseArithmeticOp_9f97364ebfc2eaf908d330c4423...

If Claude is modifying anything outside the project workspace

Build Date 2025-12
Kind Weak logit labels
UUID PairwiseArithmeticOp_5557e9c61ecb8e64cbdafc857bf...

If Claude is introducing unsafe deserialization (pickle, yaml, etc.)

Build Date 2025-12
Kind Weak logit labels
UUID PairwiseArithmeticOp_a5894ecf1c78af226a2bde3cf2f...

If Claude is introducing overly broad exception handling

Build Date 2025-12
Kind Weak logit labels
UUID PairwiseArithmeticOp_5d7de26a717f17c3b1e2168a1e3...

If Claude is changing license / copyright headers

Build Date 2025-12
Kind Weak logit labels
UUID PairwiseArithmeticOp_bbb533ad448bdbc675404f649cf...

If Claude is modifying backend Python

Build Date 2025-12
Kind Weak logit labels
UUID PairwiseArithmeticOp_02b450f99d6dff55ff0a3169e3b...

If Claude is deleting files

Build Date 2025-11
Kind Weak logit labels
UUID PairwiseArithmeticOp_d22fd59996dabb99aec4251174f...

If Claude is modifying project dependencies

Build Date 2025-11
Kind Weak logit labels
UUID PairwiseArithmeticOp_384448139059f37456f9379a2d6...

Neuronpedia

Detect concepts from Neuronpedia

1 probe

Legal content

Build Date 2025-08
Kind Probe
UUID ClassifierEvaluationOp_6949f16566164babb8c12a308...

Toxicity

Detect content vulnerabilities

7 probes

General-purpose harmful content (for blog post)

Build Date 2025-07
Kind Probe

Fraud

Build Date 2025-04
Kind Probe

Illicit activity

Build Date 2025-04
Kind Probe

Hate speech

Build Date 2025-04
Kind Probe

Partisan political content

Build Date 2025-04
Kind Probe

Explicit content (sex, gore)

Build Date 2025-04
Kind Probe

Self-harm

Build Date 2025-02
Kind Probe

Cybersecurity

Detect cybersecurity vulnerabilities

6 probes

Jailbreak segmentation (incomplete)

Build Date 2025-06
Kind Various

Refusal

Build Date 2025-02
Kind Probe

Tool use

Build Date 2025-02
Kind Probe

Authority escalation

Build Date 2025-02
Kind Probe

Privacy crimes

Build Date 2025-02
Kind Probe

Cybersecurity / hacking

Build Date 2025-02
Kind Probe

Travel agents

Detect vulnerabilities in travel agent chatbots

1 probe

Authority escalation / tool use for travel agents

Build Date 2025-03
Kind Probe