Scam Watch

How to recognize a model extraction attack: distilling a proprietary LLM via API queries

In brief

An attacker queries a proprietary LLM API with crafted prompts and captures the outputs to train a 'student' model that approximates the target. The technique is sometimes combined with embedding inversion to recover sensitive training data. Used by less funded...

How it works

The attacker sends crafted prompts to the proprietary LLM API, records each response, and uses the resulting prompt/response pairs as training data for a 'student' model that approximates the target. Some attackers also attempt embedding inversion to recover sensitive training data.
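
The mechanism can be illustrated with a deliberately tiny toy, assuming the "proprietary model" is just a hidden linear function on the local machine. The names teacher, collect_pairs and fit_student are hypothetical and nothing here touches a real API; the sketch only shows why query/response pairs alone can be enough to approximate a black box.

```python
import random

def teacher(x: float) -> float:
    """Stand-in for the target model: callers see outputs only (assumption)."""
    return 3.0 * x + 1.0  # hidden parameters, unknown to the querying side

def collect_pairs(n: int, seed: int = 0) -> list[tuple[float, float]]:
    """Query the black box n times and record (input, output) pairs."""
    rng = random.Random(seed)
    xs = [rng.uniform(-5.0, 5.0) for _ in range(n)]
    return [(x, teacher(x)) for x in xs]

def fit_student(pairs: list[tuple[float, float]]) -> tuple[float, float]:
    """Fit a 'student' y = a*x + b to the recorded pairs by least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

a, b = fit_student(collect_pairs(200))
```

In this toy the student recovers the hidden slope and intercept almost exactly. A real LLM is far harder to approximate, but the same loop (query, record, fit) is what the platform-side signals are meant to detect.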

Warning signs

  • Urgent pressure to click, pay, or share codes immediately.
  • A link or sender that does not match the official organization.
  • A request for a card number, password, OTP, wallet signature, or wire transfer.

What to do

  1. Tells (for the platform): 1) an account sends a high volume of varied prompts at low temperature (near-deterministic output); 2) requests for logprobs / token probabilities; 3) systematic vocabulary coverage; 4) task patterns that mimic benchmark suites (e.g. HumanEval for coding, MMLU for multiple-choice knowledge questions); 5) traffic from anonymising proxies.
  2. Do: enforce per-account rate limits and diversity scoring; never expose token logprobs to anonymous accounts; prohibit extraction in the terms of service (ToS) and follow watermarking research.
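
The platform-side measures above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a production design: ExtractionGuard, may_return_logprobs, the default limits, and the use of a distinct-token ratio as a "diversity score" are all hypothetical choices made for the example.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field

@dataclass
class AccountState:
    timestamps: deque = field(default_factory=deque)  # recent request times
    tokens_seen: set = field(default_factory=set)     # distinct prompt tokens
    total_tokens: int = 0

class ExtractionGuard:
    """Per-account sliding-window rate limit plus a crude diversity score."""

    def __init__(self, max_requests: int = 100, window_seconds: float = 60.0):
        # Both defaults are illustrative assumptions, not recommended values.
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.accounts = defaultdict(AccountState)

    def allow_request(self, account_id: str, now: float) -> bool:
        """Return True if the account is still under its request cap."""
        state = self.accounts[account_id]
        # Drop timestamps that have aged out of the window.
        while state.timestamps and now - state.timestamps[0] >= self.window_seconds:
            state.timestamps.popleft()
        if len(state.timestamps) >= self.max_requests:
            return False
        state.timestamps.append(now)
        return True

    def record_prompt(self, account_id: str, prompt: str) -> None:
        """Track whitespace-separated tokens as a stand-in for real tokenization."""
        state = self.accounts[account_id]
        tokens = prompt.lower().split()
        state.tokens_seen.update(tokens)
        state.total_tokens += len(tokens)

    def diversity_score(self, account_id: str) -> float:
        """Distinct tokens divided by total tokens; 0.0 if nothing recorded."""
        state = self.accounts[account_id]
        if state.total_tokens == 0:
            return 0.0
        return len(state.tokens_seen) / state.total_tokens

def may_return_logprobs(is_authenticated: bool, wants_logprobs: bool) -> bool:
    """Only authenticated accounts may receive token logprobs."""
    return wants_logprobs and is_authenticated
```

A gateway might call allow_request before serving each prompt and flag accounts whose diversity score stays unusually high at large volume. The ratio naturally falls as an account sends more text, so any threshold would need tuning against ordinary traffic.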

Source

Anthropic-Research-Sleeper-Agents

Source verified by Mythos Forensic Team

https://www.anthropic.com/research/sleeper-agents-training-deceptive-llms

FAQ

Is a model extraction attack (distilling a proprietary LLM via API queries) a real scam?

Yes. Treat the message, call, or payment request as suspicious until it is verified through an official channel.

What are the first signs?

Urgent pressure to click, pay, or share codes immediately; a link or sender that does not match the official organization; a request for a card number, password, OTP, wallet signature, or wire transfer.

What should you do first?

Tells (for the platform): 1) an account sends a high volume of varied prompts at low temperature (near-deterministic output); 2) requests for logprobs / token probabilities; 3) systematic vocabulary coverage; 4) task patterns that mimic benchmark suites (e.g. HumanEval for coding, MMLU for multiple-choice knowledge questions); 5) traffic from anonymising proxies. Do: enforce per-account rate limits and diversity scoring; never expose token logprobs to anonymous accounts; prohibit extraction in the terms of service (ToS) and follow watermarking research.

Can LegalAudit review my case?

Yes. Start the free chat and paste the message, link, sender, or payment details.