How can you recognize a model extraction attack (distilling a proprietary LLM via API queries)?

TLDR

An attacker queries a proprietary LLM API with crafted prompts and captures the outputs to train a "student" model that approximates the target. The attack is sometimes combined with embedding inversion to recover sensitive training data. Used by less funded...

How it works

Red flags

Urgent pressure to click, pay, or share codes immediately.
A link or sender that does not match the official organization.
Requests for card data, passwords, OTPs, wallet signatures, or bank transfers.

What to do

1. Tells (for the platform): (1) an account sends a high volume of varied prompts at low temperature (near-deterministic output); (2) requests for logprobs / token probabilities; (3) systematic vocabulary coverage; (4) task patterns that mimic benchmark suites (HumanEval for coding, MMLU for multiple-choice knowledge); (5) traffic from anonymising proxies.
2. Do: enforce per-account rate limits plus diversity scoring; never expose token logprobs to anonymous accounts; back these controls with legal ToS terms and watermarking research.
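
As a rough illustration of the platform-side controls listed above, the sketch below combines a sliding-window per-account rate limit, a simple prompt-diversity score, and a gate on logprob access. Everything here is an assumption made for illustration: the class and function names (ExtractionMonitor, may_return_logprobs), the 0.2 temperature cutoff, and the use of a type-token ratio as the diversity score are hypothetical and do not come from any specific API provider. A real deployment would tune thresholds on its own traffic.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
import time


@dataclass
class AccountStats:
    # Timestamps of recent requests, used for the sliding-window rate limit.
    timestamps: deque = field(default_factory=deque)
    # Distinct lowercase words seen across this account's prompts.
    vocabulary: set = field(default_factory=set)
    total_words: int = 0
    low_temp_requests: int = 0
    logprob_requests: int = 0
    total_requests: int = 0


class ExtractionMonitor:
    """Hypothetical per-account monitor; thresholds are illustrative only."""

    def __init__(self, max_requests=100, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.accounts = defaultdict(AccountStats)

    def allow(self, account_id, now=None):
        """Sliding-window rate limit: return True if the request may proceed."""
        now = time.monotonic() if now is None else now
        stats = self.accounts[account_id]
        # Drop timestamps that have aged out of the window.
        while stats.timestamps and now - stats.timestamps[0] >= self.window_seconds:
            stats.timestamps.popleft()
        if len(stats.timestamps) >= self.max_requests:
            return False
        stats.timestamps.append(now)
        return True

    def record(self, account_id, prompt, temperature, wants_logprobs):
        """Update the per-account counters used by signals()."""
        stats = self.accounts[account_id]
        words = prompt.lower().split()
        stats.vocabulary.update(words)
        stats.total_words += len(words)
        stats.total_requests += 1
        if temperature <= 0.2:  # assumed cutoff for "near-deterministic"
            stats.low_temp_requests += 1
        if wants_logprobs:
            stats.logprob_requests += 1

    def signals(self, account_id):
        """Return heuristic ratios; interpreting them is left to the operator."""
        stats = self.accounts[account_id]
        n = max(stats.total_requests, 1)
        return {
            # Type-token ratio: a crude stand-in for vocabulary coverage.
            "diversity": len(stats.vocabulary) / max(stats.total_words, 1),
            "low_temp_ratio": stats.low_temp_requests / n,
            "logprob_ratio": stats.logprob_requests / n,
        }


def may_return_logprobs(is_verified_account, wants_logprobs):
    """Only verified (non-anonymous) accounts receive token logprobs."""
    return wants_logprobs and is_verified_account
```

None of these signals is conclusive on its own; legitimate evaluation workloads can also show low temperatures and benchmark-like prompts, so the ratios are better treated as inputs to a review queue than as automatic bans.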

Source

Anthropic-Research-Sleeper-Agents

Source reviewed by Mythos Forensic Team

https://www.anthropic.com/research/sleeper-agents-training-deceptive-llms

FAQ

Is a model extraction attack (distilling a proprietary LLM via API queries) a real scam pattern?

Yes. Treat the message, call, or payment request as suspicious until you verify it through an official channel.

What are the first warning signs?

Urgent pressure to click, pay, or share codes immediately. A link or sender that does not match the official organization. Requests for card data, passwords, OTPs, wallet signatures, or bank transfers.

What should I do first?

Tells (for the platform): (1) an account sends a high volume of varied prompts at low temperature (near-deterministic output); (2) requests for logprobs / token probabilities; (3) systematic vocabulary coverage; (4) task patterns that mimic benchmark suites (HumanEval for coding, MMLU for multiple-choice knowledge); (5) traffic from anonymising proxies. Do: enforce per-account rate limits plus diversity scoring; never expose token logprobs to anonymous accounts; back these controls with legal ToS terms and watermarking research.

Can LegalAudit check my case?

Yes. Start a free chat and paste the message, link, sender, or payment details for triage.