Scam Watch

Come riconoscere Model extraction attack — distilling proprietary LLM via API queries?

In breve

Attacker queries a proprietary LLM API with crafted prompts, capturing outputs to train a 'student' model that approximates the target. Sometimes combined with embeddings inversion to recover sensitive training data. Used by less funded...

Come funziona

Attacker queries a proprietary LLM API with crafted prompts, capturing outputs to train a 'student' model that approximates the target. Sometimes combined with embeddings inversion to recover sensitive training data. Used by less funded...

Indicatori rossi

  • Pressione urgente a cliccare, pagare o condividere codici subito.
  • Link o mittente che non corrispondono all'organizzazione ufficiale.
  • Richiesta di carta, password, OTP, firma wallet o bonifico.

Cosa fare

  1. 1Tells (for platform): 1) account makes high volume of varied prompts at low temperature (deterministic); 2) requests for logprobs / token probabilities; 3) systematic vocabulary coverage; 4) coding task patterns mimicking benchmark suites (HumanEval, MMLU); 5) traffic from anonymising proxies.
  2. 2DO: enforce per account rate limits + diversity scoring; never expose token logprobs to anon accounts; legal ToS + watermarking research.

Fonte

Anthropic-Research-Sleeper-Agents

Fonte verificata da Mythos Forensic Team

https://www.anthropic.com/research/sleeper-agents-training-deceptive-llms

FAQ

Model extraction attack — distilling proprietary LLM via API queries e una truffa reale?

Si. Tratta messaggi, chiamate o richieste di pagamento come sospette finche non le verifichi da un canale ufficiale.

Quali sono i primi segnali?

Pressione urgente a cliccare, pagare o condividere codici subito.; Link o mittente che non corrispondono all'organizzazione ufficiale.; Richiesta di carta, password, OTP, firma wallet o bonifico.

Cosa devo fare subito?

Tells (for platform): 1) account makes high volume of varied prompts at low temperature (deterministic); 2) requests for logprobs / token probabilities; 3) systematic vocabulary coverage; 4) coding task patterns mimicking benchmark suites (HumanEval, MMLU); 5) traffic from anonymising proxies.; DO: enforce per account rate limits + diversity scoring; never expose token logprobs to anon accounts; legal ToS + watermarking research.

LegalAudit puo controllare il mio caso?

Si. Apri la chat gratis e incolla messaggio, link, mittente o dati di pagamento per un triage.