In brief
An attacker queries a proprietary LLM API with crafted prompts and captures the outputs to train a 'student' model that approximates the target. The attack is sometimes combined with embedding inversion to recover sensitive training data. Used by less funded...
How it works
An attacker queries a proprietary LLM API with crafted prompts and captures the outputs to train a 'student' model that approximates the target. The attack is sometimes combined with embedding inversion to recover sensitive training data. Used by less funded...
Red flags
- Urgent pressure to click, pay, or share codes immediately.
- A link or sender that does not match the official organisation.
- A request for a card number, password, OTP, wallet signature, or bank transfer.
What to do
- Tells (for platform): 1) an account sends a high volume of varied prompts at low temperature (near-deterministic output); 2) requests for logprobs / token probabilities; 3) systematic vocabulary coverage; 4) task patterns mimicking benchmark suites (e.g. HumanEval for coding, MMLU for general knowledge); 5) traffic from anonymising proxies.
- DO: enforce per-account rate limits plus prompt-diversity scoring; never expose token logprobs to anonymous accounts; back this with legal terms of service (ToS) and follow watermarking research.
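The rate-limit, diversity-scoring, and logprobs points above can be sketched roughly as follows. This is a minimal illustration, not a production detector: the class `ExtractionMonitor`, the function `allow_logprobs`, the type-token ratio used as the diversity score, and every threshold are assumptions made for this example.

```python
import time
from collections import defaultdict, deque


class ExtractionMonitor:
    """Hypothetical per-account monitor: sliding-window rate limit plus a
    crude prompt-diversity score (distinct tokens / total tokens)."""

    def __init__(self, max_requests=100, window_seconds=60.0,
                 diversity_threshold=0.8, min_samples=20):
        # All thresholds are illustrative assumptions, not recommended values.
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.diversity_threshold = diversity_threshold
        self.min_samples = min_samples
        # account_id -> deque of (timestamp, list of lowercase tokens)
        self._events = defaultdict(deque)

    def record(self, account_id, prompt, now=None):
        """Store one request and drop events older than the window."""
        now = time.monotonic() if now is None else now
        events = self._events[account_id]
        events.append((now, prompt.lower().split()))
        while events and now - events[0][0] > self.window_seconds:
            events.popleft()

    def over_rate_limit(self, account_id):
        """True if the account exceeded max_requests within the window."""
        return len(self._events[account_id]) > self.max_requests

    def diversity_score(self, account_id):
        """Type-token ratio over all prompts currently in the window."""
        tokens = [t for _, toks in self._events[account_id] for t in toks]
        if not tokens:
            return 0.0
        return len(set(tokens)) / len(tokens)

    def is_suspicious(self, account_id):
        """Flag accounts that are over the limit, or that send enough
        prompts with unusually broad vocabulary coverage."""
        enough = len(self._events[account_id]) >= self.min_samples
        broad = self.diversity_score(account_id) >= self.diversity_threshold
        return self.over_rate_limit(account_id) or (enough and broad)


def allow_logprobs(account_verified, logprobs_requested):
    """Only return token log-probabilities to verified accounts."""
    return bool(logprobs_requested and account_verified)
```

A platform would typically call `record` on every request and route accounts for which `is_suspicious` returns true to manual review rather than blocking them outright, since a high type-token ratio alone also matches many legitimate workloads.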
Source
Anthropic-Research-Sleeper-Agents
Source verified by Mythos Forensic Team
https://www.anthropic.com/research/sleeper-agents-training-deceptive-llms
FAQ
Is a model extraction attack (distilling a proprietary LLM via API queries) a real scam?
Yes. Treat messages, calls, or payment requests as suspicious until you have verified them through an official channel.
What are the first warning signs?
Urgent pressure to click, pay, or share codes immediately; a link or sender that does not match the official organisation; a request for a card number, password, OTP, wallet signature, or bank transfer.
What should I do right away?
Tells (for platform): 1) an account sends a high volume of varied prompts at low temperature (near-deterministic output); 2) requests for logprobs / token probabilities; 3) systematic vocabulary coverage; 4) task patterns mimicking benchmark suites (e.g. HumanEval for coding, MMLU for general knowledge); 5) traffic from anonymising proxies. DO: enforce per-account rate limits plus prompt-diversity scoring; never expose token logprobs to anonymous accounts; back this with legal terms of service (ToS) and follow watermarking research.
Can LegalAudit check my case?
Yes. Open the free chat and paste the message, link, sender, or payment details for triage.