Evaluation
We test AI systems under realistic failure conditions. The output is evidence: traces, exploits, boundary failures, and the conditions that make them repeatable.
Most red-team reports end up in a drawer. We do the evaluation, then we stick around to make sure someone actually owns the findings, writes the controls, and briefs leadership.
Syntony: the condition of being tuned to the same frequency. Oliver Lodge coined it in 1897 for early radio: a signal is useless if the transmitter and receiver aren't tuned to each other. AI governance has the same problem. The safety team finds something, legal has a different timeline, ops has a different process, and the exec team hasn't been briefed yet. We get those rhythms lined up before the system goes live.
Our Europe report shows how institutions can write frameworks and map risks yet still fail to measure deployed systems or manage them once evidence arrives.
Cross-sector analysis of AI-enabled systems in defence, energy, finance, and biosurveillance.
The work is one pipeline: test the system, model the risk, translate the finding, and support the decision.
We test AI systems under realistic failure conditions. The output is evidence: traces, exploits, boundary failures, and the conditions that make them repeatable.
We map how AI risks connect across systems, teams, sectors, and external pressure. This is the service layer behind Resonance and CLD Suite.
We turn findings into controls, owners, escalation paths, and decision records. This is the workflow GovTune AI is being built to support.
We prepare the artifacts reviewers need: board memos, risk registers, sector benchmarks, and launch recommendations grounded in evidence.
Every sprint deliverable is governance-ready, not a findings report that ends in a drawer. Sprints are executed by Syntony's Trusted Red Team, with findings mapped to controls, escalation paths, and reviewer-facing evidence.
Fixed-scope evaluation in a defined window before a launch.
Deliverable: Governance-ready findings, not just a vulnerability list.
Tests tool use, delegation, and autonomy boundaries on agentic systems.
Deliverable: Autonomy risk findings mapped to controls.
Adversarial prompt and misuse testing across a defined set of risk areas.
Deliverable: Exploit catalog with governance implications.
Cyber and bio uplift evaluation, gated and scoped carefully.
Deliverable: Capability assessment with escalation guidance. Access-controlled intake, scoped test plans, and restricted disclosure are required.
Defense and Government, Financial, and Healthcare AI variants.
Deliverable: Sector-specific findings mapped to the relevant regulatory regime.
Recurring quarterly engagement.
Deliverable: Continuous findings feeding governance.
Start with a Governance Lag self-assessment. We map where evidence, controls, and decision rights fall out of sync.
A red-team finding is not finished until it has an owner, a control, an escalation path, and an artifact a reviewer can use.
A model follows hostile instructions embedded in retrieved content.
Trace, prompt, tool call, policy gap, and replayable test case.
Product security owns remediation. Legal owns release conditions.
Content isolation, tool-call approval, retrieval allowlist, and regression test.
Any external content source can drive a privileged action.
One-page risk memo with control status, residual risk, and release decision.
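The rule that a finding is not finished until it has an owner, a control, an escalation path, and a reviewer-ready artifact can be expressed as a simple completeness check. The sketch below is illustrative only: the `Finding` class, its field names, and the `missing()` / `is_finished()` helpers are hypothetical and are not Syntony's tooling or the GovTune AI data model. It reuses the prompt-injection example above; treating "any external content source can drive a privileged action" as the escalation condition is our assumption about how that line maps to the record.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Finding:
    """Hypothetical record for one red-team finding (field names are illustrative)."""

    description: str
    evidence: list[str] = field(default_factory=list)
    owners: dict[str, str] = field(default_factory=dict)  # responsibility -> owning team
    controls: list[str] = field(default_factory=list)
    escalation_path: str = ""  # assumption: free-text escalation rule
    artifact: str = ""  # reviewer-facing deliverable

    def missing(self) -> list[str]:
        """Return the governance elements this finding still lacks, in a fixed order."""
        gaps = []
        if not self.owners:
            gaps.append("owner")
        if not self.controls:
            gaps.append("control")
        if not self.escalation_path:
            gaps.append("escalation path")
        if not self.artifact:
            gaps.append("artifact")
        return gaps

    def is_finished(self) -> bool:
        """A finding counts as finished only when no governance element is missing."""
        return not self.missing()


# The prompt-injection example, starting with evidence only.
injection = Finding(
    description="Model follows hostile instructions embedded in retrieved content.",
    evidence=["trace", "prompt", "tool call", "policy gap", "replayable test case"],
)
print(injection.missing())  # ['owner', 'control', 'escalation path', 'artifact']

injection.owners = {"remediation": "Product security", "release conditions": "Legal"}
injection.controls = [
    "content isolation",
    "tool-call approval",
    "retrieval allowlist",
    "regression test",
]
# Hypothetical escalation rule derived from the risk statement above.
injection.escalation_path = (
    "Escalate to release review if any external content source can drive a privileged action."
)
injection.artifact = "One-page risk memo: control status, residual risk, release decision"
print(injection.is_finished())  # True
```

A vulnerability list corresponds to a record that stops after `evidence`; the check makes the remaining governance work explicit instead of leaving it implied.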
Build the commercial motion at Syntony
Own the first repeatable go-to-market (GTM) system across red-team sprints, governance advisory, product partnerships, and founder-led sales. Commercial leadership for a high-trust AI risk market.
Lead engineering at Syntony
Own technical direction across our product suite: evaluation tooling, causal mapping, and governance instrumentation. Senior, builder-first role with real ownership.
Build the infrastructure underneath our practice
We are hiring across evaluation infrastructure, agentic systems, risk data engineering, GovTune AI, and causal modeling. Individual contributors who ship into real engagements.
Join our adversarial evaluation practice
We conduct realistic and adversarial testing that draws on systems thinking, geopolitical forecasting, and multi-disciplinary analysis. Red teamers bring expertise in ML security, policy analysis, prompt engineering, and strategic risk.
Red team engagements are project-based. We're looking for specialists with published work and demonstrable expertise.
Collaborate on governance and risk research
Syntony collaborates with researchers, security specialists, and policy practitioners on a project basis. We're building a network focused on evaluation methodology, governance architecture, policy analysis, and strategic risk forecasting.
Affiliates contribute to publications, co-author research, and gain visibility through our work.
Tell us where you might fit
Use this if the current roles are not quite right, but the work is. We review general interest notes for future hiring, project collaboration, product partnerships, and advisory work.
An incident-based AI risk model across 1,406 documented incidents, mapped to sector taxonomies and governance benchmarks. The free preview PDF is the kind of artifact you put in front of a board.
Resonance models the interconnected AI risk surface of a system, then reads it two ways: where the system is fragile, to scope adversarial testing, and where it is steerable, to target governance. A live geopolitical layer re-weights the model as the threat picture moves.
GovTune AI is in development. It turns red-team traces into governance findings, controls, escalation logic, and decision records that reviewers can act on.
Syntony is an AI risk firm. We test AI systems, build the governance architecture to act on findings, and ship the software that makes both repeatable. Clients are frontier labs, public sector teams, and enterprise organizations running AI in critical workflows.
Our work is one thing: we translate adversarial findings into governance that holds.
The practice covers four things: adversarial evaluation, governance architecture, strategic risk advisory, and engineering. Evaluations produce evidence. Governance turns evidence into controls. Risk advisory makes the controls decision-relevant. Engineering makes the whole loop reproducible. The four parts are one pipeline from finding to control, not four separate services.
Work is multi-disciplinary by design. Engineering decisions get pressure-tested by red teamers, policy analysts, and forecasting specialists, the same people who use the tools. Findings are scoped to land in controls, escalation triggers, board-ready artifacts, and operator-facing systems.
Syntony Research LLC is based in Durham, North Carolina, with team growth planned across San Francisco, Washington DC, London, Brussels, and Singapore.
The work stays legible from test trace to executive decision.
Run adversarial testing against the workflow that matters.
Assign the risk to someone with authority to act.
Turn the evidence into a release condition, escalation rule, or operating procedure.
Produce the artifact a reviewer, operator, or board can use.
Nathan founded Syntony to close a gap he kept hitting in his prior work: good evaluation work that never made it into a governance decision. He is a decision scientist and AI safety researcher with fourteen years at the intersection of geopolitics and emerging technology.
At Syntony, he leads the firm's adversarial evaluation, governance architecture, strategic risk advisory, and software work. He red-teams frontier models for OpenAI and Anthropic and co-founded The AIHL Project, a research initiative on generative AI threats under international humanitarian law. Before Syntony, he spent six years as a decision scientist supporting U.S. Department of Defense clients on emerging-technology risk, AI integration, and security cooperation.
His current research applies system dynamics to frontier AI risk, hardens testimony archives against synthetic media attacks, and tracks defense industrial cooperation in Europe. Alongside that, he advises the Cloud Security Alliance on catastrophic AI risk, holds fellowships with the Truman National Security Project and MIT AI Alignment, contributes to the Oxford Martin AI Governance Initiative, leads the North Carolina chapter of the OpenAI Forum, and supports research for the Meta Oversight Board and the EvalEval Coalition.
His work has been presented at IASEAI at UNESCO, UNIDIR, the Cambridge Centre for Geopolitics, and the UK MoD Deterrence and Assurance Academic Alliance, and has appeared in War on the Rocks, RAND Europe, World Politics Review, PRISM, and The Washington Post. He holds an M.A. in Law and Diplomacy from The Fletcher School at Tufts and studied Politics, Philosophy, and Economics at Oxford.
Book a call or send a note. NDA discussions are standard.