En bref
Attacker queries a proprietary LLM API with crafted prompts, capturing outputs to train a 'student' model that approximates the target. Sometimes combined with embeddings inversion to recover sensitive training data. Used by less funded...
Comment ca fonctionne
Attacker queries a proprietary LLM API with crafted prompts, capturing outputs to train a 'student' model that approximates the target. Sometimes combined with embeddings inversion to recover sensitive training data. Used by less funded...
Signaux d'alerte
- Pression urgente pour cliquer, payer ou partager des codes immediatement.
- Lien ou expediteur qui ne correspond pas a l'organisation officielle.
- Demande de carte, mot de passe, OTP, signature wallet ou virement.
Que faire
- 1Tells (for platform): 1) account makes high volume of varied prompts at low temperature (deterministic); 2) requests for logprobs / token probabilities; 3) systematic vocabulary coverage; 4) coding task patterns mimicking benchmark suites (HumanEval, MMLU); 5) traffic from anonymising proxies.
- 2DO: enforce per account rate limits + diversity scoring; never expose token logprobs to anon accounts; legal ToS + watermarking research.
Source
Anthropic-Research-Sleeper-Agents
Source verifiee par Mythos Forensic Team
https://www.anthropic.com/research/sleeper-agents-training-deceptive-llmsFAQ
Model extraction attack — distilling proprietary LLM via API queries est une vraie arnaque ?
Oui. Traitez le message, l'appel ou la demande de paiement comme suspect jusqu'a verification via un canal officiel.
Quels sont les premiers signaux ?
Pression urgente pour cliquer, payer ou partager des codes immediatement.; Lien ou expediteur qui ne correspond pas a l'organisation officielle.; Demande de carte, mot de passe, OTP, signature wallet ou virement.
Que faire en premier ?
Tells (for platform): 1) account makes high volume of varied prompts at low temperature (deterministic); 2) requests for logprobs / token probabilities; 3) systematic vocabulary coverage; 4) coding task patterns mimicking benchmark suites (HumanEval, MMLU); 5) traffic from anonymising proxies.; DO: enforce per account rate limits + diversity scoring; never expose token logprobs to anon accounts; legal ToS + watermarking research.
LegalAudit peut-il verifier mon cas ?
Oui. Lancez le chat gratuit et collez le message, le lien, l'expediteur ou les details de paiement.