Evidence-gated CI security for smart contracts

Turn risky diffs into executable proofs. Block unsafe merges.

Paythos runs in your CI on every commit. It generates vulnerability hypotheses, writes Foundry tests to confirm or falsify them, and posts a Pass / Warn / Block verdict with reproducible evidence.

Not a report. A status check. We don't just detect — we generate the failing test.

7-day proof pilot. If it doesn't run and produce verdicts, you get your money back.

paythos-ci — etherfi-smart-contracts

━━━ Final Phase: Verdict ━━━━━━━━━━━━━━━━━━━━━━━━━━━

[paythos-ci] FALSIFIED H-MA-001: access control properly enforced → safe

[paythos-ci] FALSIFIED H-MA-002: reentrancy guard active → safe

[paythos-ci] VERIFIED H-INV-001: total supply invariant holds → confirmed

...

[pipeline] Recorded 4 global learnings

[pipeline] Memory: persisted 6 entries

✅ Verdict: PASS

Iterations: 1

Runtime: 3m 30s

Learnings: 4

Hypotheses: 6

Verified: 1 | Falsified: 5

Tests: 6

Passed: 6 | Failed: 0 | Errored: 0

Top findings:

[OK] [high] LiquidityPool.pauseContract/unPauseContract access control verified (falsified)

[OK] [high] EtherFiNode.sweepFunds restricted by onlyEtherFiNodesManager (falsified)

[OK] [med] Total supply invariant holds under bounded fuzz (verified)

You don't lose because you "ignored security."

You lose because one small change ships a dangerous regression.

Common ways teams get hurt:

A PR silently weakens access control (critical)

A new external call introduces a reentrancy window (critical)

An upgrade breaks storage layout (high)

Accounting math drifts under edge cases (high)

Tests pass, but they don't prove the critical properties (medium)

Manual review can't reliably catch this at PR speed. Tool output is noisy. Audits come later.

You need a gate that stops bad releases now.

git diff — main...feature/vault-v2

@@ -89,6 +89,12 @@ contract Vault {

function withdraw(uint256 amount) external {

- require(hasRole(ADMIN, msg.sender));

+ // TODO: add back role check

(bool ok,) = msg.sender.call{value: amount}("");

balances[msg.sender] -= amount;

}

⚠ Access control removed on privileged function
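Paythos expresses this kind of finding as a Foundry test. As a language-neutral sketch of the same idea, the toy Python model below turns the diff into a hypothesis ("a non-admin can call withdraw") and a check that either reproduces it or falsifies it. The `Vault` class, the `enforce_role` flag, and `exploit_reproduces` are hypothetical names for illustration and are not part of Paythos or of the contract in the diff.

```python
class Vault:
    """Toy in-memory stand-in for the Solidity Vault in the diff (hypothetical model)."""

    def __init__(self, admins, enforce_role=True):
        self.admins = set(admins)
        # enforce_role=False mimics the diff that drops the role check.
        self.enforce_role = enforce_role
        self.balances = {}

    def deposit(self, caller, amount):
        self.balances[caller] = self.balances.get(caller, 0) + amount

    def withdraw(self, caller, amount):
        if self.enforce_role and caller not in self.admins:
            raise PermissionError("caller lacks ADMIN role")
        if self.balances.get(caller, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[caller] -= amount


def exploit_reproduces(vault, attacker="mallory"):
    """Hypothesis: a non-admin can call withdraw. True means the attack worked."""
    vault.deposit(attacker, 1)
    try:
        vault.withdraw(attacker, 1)
    except PermissionError:
        return False  # hypothesis falsified: access control held
    return True  # hypothesis verified: the regression is real
```

Run against the pre-change behavior (`enforce_role=True`), the check returns `False`. Run against the changed behavior (`enforce_role=False`), it returns `True`, which is the kind of evidence that would justify blocking the merge.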

What Paythos does

On every PR/commit, autonomous agents:

STEP 1

Recon: maps the attack surface

An autonomous recon agent scans the codebase, identifies contracts, slices relevant code, and maps privileges, external calls, storage layouts, and accounting paths.

STEP 2

Hypothesis: generates attack theories

Generates concrete vulnerability hypotheses tied to specific code locations — access control gaps, reentrancy paths, accounting bugs, and more.

STEP 3

Test architect: writes Foundry tests

For each hypothesis, generates Foundry tests — reproducers, invariant checks, and bounded fuzz tests — designed to confirm or falsify it.

STEP 4

Execute & review: iterates until confident

✅ Pass: hypotheses falsified, properties hold

⚠️ Warn: inconclusive findings, needs human review

⛔ Block: hypothesis verified, exploit reproduces
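The three verdicts above amount to a simple precedence rule, sketched here in Python. This is an illustrative reading of the rule, not Paythos's actual implementation. The `Outcome` names and the `gate_verdict` function are assumptions, and treating an empty run as a pass is a choice made for this sketch only.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical per-hypothesis results; the names are illustrative."""

    SAFE = "safe"  # attack falsified, or property confirmed to hold
    INCONCLUSIVE = "inconclusive"  # evidence was ambiguous or the test errored
    EXPLOIT = "exploit"  # generated test reproduced the attack


def gate_verdict(outcomes):
    """Collapse per-hypothesis outcomes into one status-check verdict."""
    outcomes = list(outcomes)
    if any(o is Outcome.EXPLOIT for o in outcomes):
        return "BLOCK"  # one reproduced exploit is enough to stop the merge
    if any(o is Outcome.INCONCLUSIVE for o in outcomes):
        return "WARN"  # nothing proven either way, so a human should look
    return "PASS"  # every hypothesis resolved as safe
```

For example, `gate_verdict([Outcome.SAFE] * 6)` returns `"PASS"`, while adding a single `Outcome.EXPLOIT` to the list returns `"BLOCK"` regardless of the other results.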

STEP 5

Attaches evidence & learnings

Full artifacts: generated test code, execution logs, hypothesis verdicts, agent timeline, and global learnings persisted for future runs.

paythos-ci — pipeline execution

Intake → Hypothesize → Test Gen → Execute → Triage

Pipeline timeline:

intake          OK    4.2s    Code intake: 154 contracts analyzed, 87 slices extracted
hypothesis-gen  OK    28.3s   6 risk hypotheses generated
test-gen        OK    112.7s  6 Foundry tests compiled, 0 failed
executor        OK    48.1s   6 passed, 0 failed, 0 errored
triage          OK    16.8s   All hypotheses resolved, no outstanding risks

✅ VERDICT: PASS (6 hypotheses evaluated in 3m 30s)
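The per-stage timings in the timeline add up to the reported runtime. A short Python check, using the durations shown above, makes the arithmetic explicit and shows which stage dominates. The `STAGES` dictionary and the `format_runtime` helper are hypothetical names for this sketch.

```python
# Stage durations in seconds, copied from the pipeline timeline above.
STAGES = {
    "intake": 4.2,
    "hypothesis-gen": 28.3,
    "test-gen": 112.7,
    "executor": 48.1,
    "triage": 16.8,
}


def format_runtime(seconds):
    """Render a duration as 'Xm Ys', rounded to the nearest whole second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}m {secs}s"


total = sum(STAGES.values())  # about 210.1 seconds
slowest = max(STAGES, key=STAGES.get)  # stage with the largest share of runtime

print(format_runtime(total), "total; slowest stage:", slowest)
```

The total comes to roughly 210.1 seconds, which formats as `3m 30s`, and test generation accounts for more than half of it.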

What you get on every PR

A clear decision

Pass / Warn / Block, with the top reasons.

PASS | WARN | BLOCK

Proof you can rerun

Commands, versions, and the generated tests that triggered the verdict.

$ forge test --match-path test/paythos/ -vvv

Diff-aware signal

No generic report dumps. Only the changes that matter.

3 of 12 files relevant

Continuously enforced critical properties

Your non-negotiables are checked every time.

✓ 5/5 enforced

PR #312 — Paythos Verdict

✔️ PASS

All critical properties hold

Properties verified:

access_control: PASS

reentrancy_guard: PASS

storage_layout: PASS

accounting_invariant: PASS

4 tests generated

Reproducible: forge test --match-path test/paythos/

The checks we enforce

Examples of what we gate:

Access control regressions on privileged functions

External-call paths with unsafe state ordering

Upgrade safety: storage layout + initializer + upgrade auth checks

Oracle validation and staleness bounds

Accounting invariants for shares, fees, debt, and withdrawals

Token interactions: dangerous approvals & edge cases

(You choose your critical properties. We start with a proven baseline.)

How it works

Step 1

Fit check

We confirm stack and repo readiness (Foundry/Hardhat, upgrade patterns, CI).

Step 2

Install the CI gate

GitHub Actions / GitLab CI status checks + PR bot comments.

Step 3

Tune for low noise

We set baselines, suppress known false positives, and focus on high-risk deltas.

Step 4

Ship with confidence

Every PR gets a verdict and evidence. Bad merges get blocked.

Who this is for

Ideal for

Solidity teams shipping weekly (or faster)
Protocols with upgrades, roles/permissions, or complex accounting
Teams where one bad release is existential

Not for

Teams shipping rarely with no CI discipline
Repos with no tests and no willingness to add them

7-day proof pilot

We're not asking you to believe. We'll prove it on your code.

$2,000

one-time · money-back guarantee

In 7 days, you get:

🔧 Paythos running in your CI, fully integrated with your workflow

📋 At least 3 PR verdicts with evidence-based outcomes

🛡️ At least 5 critical properties enforced continuously

🧪 At least 10 generated tests across real changes

Guarantee: If we can't get it running and producing verdicts with evidence in 7 days, you get your money back.

FAQ

Paythos does not replace an audit. Audits catch design flaws; Paythos prevents regressions during development. Use both.

Paythos deploys autonomous agents that analyze your code changes, generate vulnerability hypotheses tied to specific code locations, and write Foundry tests to confirm or falsify each one. Every finding is backed by a reproducible test — no guessing.

Each hypothesis is checked by a generated test. If the test passes, the hypothesis is falsified and dropped. You only see findings backed by failing tests or inconclusive results that need human review. That's the difference between a report and a status check.

Paythos is Foundry-first. Hardhat support is coming next.

We run with configurable time budgets. Most teams keep it under 5 minutes per PR.

For private code, we can sign an NDA and use a self-hosted runner if required.

To get started, we need repo access, a CI slot (GitHub Actions or GitLab CI), and 5-10 minutes to define your critical properties.

Stop shipping bugs. Start shipping confidence.

7-day proof pilot. If we can't get Paythos running and producing verdicts with evidence, you get your money back.

7-day proof pilot

Runnable tests as evidence

Money-back guarantee