Draft:Membership Inference Attacks


A membership inference attack (MIA) is a privacy attack in which an adversary is given the output f(X) of some function computed over a dataset X (for example, an aggregate statistic or a trained machine-learning model) together with a candidate record y, and must decide whether y ∈ X.[1] MIAs are among the most basic and widely studied privacy attacks for two reasons: successful membership inference can directly violate the privacy of data subjects, for instance by revealing that an individual participated in a clinical study, and MIAs serve as the canonical empirical benchmark for evaluating privacy-preserving mechanisms.[2]

The standard threat model formalizes the attack as a decision problem: given a target record y and access (black-box or white-box) to f(X), the adversary outputs a single bit indicating its guess for whether y ∈ X. Attack quality is typically reported with ROC curves, and specifically the true-positive rate at low false-positive rates, rather than by average accuracy.[2]
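The metric described above can be sketched in a few lines. This is an illustrative helper, not code from any cited paper: the function name `tpr_at_fpr`, the convention that higher scores mean "more likely a member", and the toy scores are all assumptions.

```python
import numpy as np

def tpr_at_fpr(member_scores, nonmember_scores, target_fpr):
    """TPR of the rule "score > threshold", with the threshold set so that
    at most a target_fpr fraction of non-members is flagged."""
    member_scores = np.asarray(member_scores, dtype=float)
    # Non-member scores sorted from highest to lowest.
    nonmember_sorted = np.sort(np.asarray(nonmember_scores, dtype=float))[::-1]
    # How many non-members the attack is allowed to flag wrongly.
    allowed = int(np.floor(target_fpr * len(nonmember_sorted)))
    if allowed >= len(nonmember_sorted):
        return 1.0  # every record may be flagged
    # Flag only scores strictly above the (allowed + 1)-th largest non-member score.
    threshold = nonmember_sorted[allowed]
    return float(np.mean(member_scores > threshold))

# Toy scores (made up for illustration): higher means "more likely a member".
members = [0.9, 0.8, 0.3, 0.2]
nonmembers = [0.7, 0.4, 0.1, 0.05]
tpr_strict = tpr_at_fpr(members, nonmembers, target_fpr=0.0)
tpr_loose = tpr_at_fpr(members, nonmembers, target_fpr=0.5)
```

With a realistic evaluation set the same function would be called with a small `target_fpr` such as 0.001, which requires at least a thousand non-member scores for the threshold to be meaningful.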

Background

Membership inference is closely tied to the original motivation for differential privacy, which was designed to bound the success of any adversary attempting to detect the presence or absence of a single record in a dataset.[3] When a mechanism satisfies (ε, δ)-differential privacy, the advantage of any membership inference adversary over random guessing is provably bounded as a function of the privacy parameters, regardless of auxiliary knowledge.
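One common way to state this bound uses the hypothesis-testing view of differential privacy: against an (ε, δ)-differentially private mechanism, any membership test with false-positive rate FPR has a true-positive rate of at most e^ε · FPR + δ, and symmetrically at most 1 − e^(−ε) · (1 − δ − FPR). The sketch below evaluates these two inequalities; the function names are illustrative, and "advantage" is taken here to mean TPR minus FPR, which is an assumption about the definition in use.

```python
import math

def max_tpr_under_dp(fpr, epsilon, delta=0.0):
    """Upper bound on the TPR of any membership test at a given FPR
    when the mechanism is (epsilon, delta)-differentially private."""
    bound_a = math.exp(epsilon) * fpr + delta
    bound_b = 1.0 - math.exp(-epsilon) * (1.0 - delta - fpr)
    return max(0.0, min(1.0, bound_a, bound_b))

def max_advantage_under_dp(epsilon, delta=0.0):
    """Upper bound on TPR - FPR, reached where the two bounds above cross."""
    return (math.exp(epsilon) - 1.0 + 2.0 * delta) / (math.exp(epsilon) + 1.0)

# Example: with epsilon = 1 and delta = 0, a test run at 0.1% FPR
# can reach a TPR of at most about 0.27%.
tpr_cap = max_tpr_under_dp(fpr=0.001, epsilon=1.0)
adv_cap = max_advantage_under_dp(epsilon=1.0)
```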

The term "membership inference" was popularized by Shokri et al. (2017) in the context of machine learning as a service, but the underlying phenomenon — that releases derived from a dataset can leak information about individual contributors — was demonstrated earlier in the genomics and statistical-database literature.[4]

Relationship to other privacy attacks

Membership inference is distinct from, though sometimes related to, other privacy attacks.

  • In an attribute inference attack, the adversary already knows that y ∈ X (or assumes it), and tries to infer a sensitive attribute of y from f(X) together with partial knowledge of y.
  • In a reconstruction attack, the adversary tries to recover entire records of X from f(X) without being given any candidate y.
  • In a training-data extraction attack (against generative models), the adversary tries to recover verbatim training records by querying the model — for example, prompting a large language model to regurgitate text it has memorized.

Membership inference is a weaker goal than reconstruction or extraction: it yields only a single bit per candidate record. However, MIA primitives are commonly used as building blocks for these stronger attacks, and a model that resists strong MIAs is generally considered to resist downstream reconstruction as well.[2]

Membership inference on aggregate statistics

The earliest widely cited membership inference attack was demonstrated by Nils Homer and colleagues in 2008. Given access to minor allele frequencies released as part of a genome-wide association study (GWAS), the genotype of a target individual, and reference population statistics, the authors showed that an attacker could determine with high confidence whether the target individual's DNA had been included in the study's case group, even when the individual contributed less than 0.1% of the pooled sample.[4] The result led the U.S. National Institutes of Health and the Wellcome Trust to remove aggregate GWAS allele-frequency data from public repositories.[5]
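The test statistic used in this attack compares, for each SNP, how far the target's allele fraction is from the reference population versus from the released pool, and then applies a one-sample t-test to those per-SNP differences. The following is a minimal simulation sketch on synthetic genotypes. The pool size, the number of SNPs, the frequency range, and the use of the true population frequencies as the reference are all assumptions made for illustration; they do not reproduce the study's data.

```python
import numpy as np

def homer_t_statistic(target, pool_freq, reference_freq):
    """One-sample t statistic of D_j = |Y_j - Pop_j| - |Y_j - M_j| over SNPs j.
    Large positive values suggest the target is closer to the pool than to
    the reference population, i.e. that it contributed to the pool."""
    d = np.abs(target - reference_freq) - np.abs(target - pool_freq)
    return float(d.mean() / (d.std(ddof=1) / np.sqrt(d.size)))

rng = np.random.default_rng(0)
n_snps, pool_size = 20_000, 100
# Synthetic population allele frequencies (assumed, not real data).
pop_freq = rng.uniform(0.1, 0.9, size=n_snps)

def sample_people(k):
    """Draw k diploid genotypes, encoded as allele fractions 0, 0.5 or 1."""
    return rng.binomial(2, pop_freq, size=(k, n_snps)) / 2.0

pool = sample_people(pool_size)
pool_freq = pool.mean(axis=0)       # the released aggregate statistic
member = pool[0]                    # someone inside the pool
outsider = sample_people(1)[0]      # someone from the same population, not in the pool

t_member = homer_t_statistic(member, pool_freq, pop_freq)
t_outsider = homer_t_statistic(outsider, pool_freq, pop_freq)
```

In this setup the member contributes only 1% of the pool, yet the statistic separates the two individuals because the small per-SNP signal accumulates over many independent SNPs.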

Dwork, Smith, Steinke, Ullman, and Vadhan (FOCS 2015) generalized and strengthened these results in their analysis of "robust traceability from trace amounts," giving theoretical bounds showing that an attacker who knows a target's data can detect membership from sufficiently accurate summary statistics even when those statistics are heavily perturbed by measurement error or by noise added for privacy.[6]

Membership inference has also been demonstrated against aggregated location and mobility data. Pyrgelis, Troncoso, and De Cristofaro (NDSS 2018) showed that an attacker with limited prior knowledge can determine whether a target user contributed to time-series aggregates produced by mobile-network operators or smart-city platforms, and that protections based on differential privacy reduce but do not eliminate the threat at realistic noise levels.[7] Related attacks have been described against released census tabulations, online-advertising audience estimates, and recommender-system statistics.

Membership inference against machine learning models

Shokri, Stronati, Song, and Shmatikov (IEEE S&P 2017) introduced the canonical formulation of membership inference against supervised machine-learning classifiers. In their black-box setting, the adversary observes only the prediction vector returned by a target model on a query record. They train "shadow models" that mimic the target on similar data, use them to build a labelled dataset of (prediction, member/non-member) pairs, and finally train an "attack model" that decides membership for new records. The attack was shown to be effective against models trained through commercial machine-learning-as-a-service platforms.[1]
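The shadow-model pipeline can be sketched end to end on synthetic data. This is a heavily simplified illustration under stated assumptions, not the authors' implementation: the target and shadow models are a toy kernel classifier written here so that it overfits (`KernelClassifier` is hypothetical), the data are synthetic Gaussians, and the "attack model" is reduced to a single threshold on the confidence assigned to the true label rather than a neural network trained on full prediction vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_data(n):
    """Two overlapping Gaussian classes in 5 dimensions (synthetic stand-in)."""
    y = rng.integers(0, 2, size=n)
    x = rng.normal(size=(n, 5)) + 0.5 * y[:, None]
    return x, y

class KernelClassifier:
    """Toy model that overfits: class scores are RBF kernel sums over training points."""
    def __init__(self, bandwidth=0.5):
        self.bandwidth = bandwidth

    def fit(self, x, y):
        self.x, self.y = x, y
        return self

    def predict_proba(self, q):
        sq_dist = ((q[:, None, :] - self.x[None, :, :]) ** 2).sum(-1)
        k = np.exp(-sq_dist / (2 * self.bandwidth ** 2))
        scores = np.stack([k[:, self.y == c].sum(1) for c in (0, 1)], axis=1) + 1e-12
        return scores / scores.sum(1, keepdims=True)

def true_label_confidence(model, x, y):
    """Black-box feature: the confidence the model assigns to the record's label."""
    return model.predict_proba(x)[np.arange(len(y)), y]

def shadow_attack_data(n_shadow, n_train):
    """Train shadow models and collect labelled (confidence, is_member) pairs."""
    feats, labels = [], []
    for _ in range(n_shadow):
        x_in, y_in = sample_data(n_train)     # shadow training set (members)
        x_out, y_out = sample_data(n_train)   # held-out records (non-members)
        shadow = KernelClassifier().fit(x_in, y_in)
        feats += [true_label_confidence(shadow, x_in, y_in),
                  true_label_confidence(shadow, x_out, y_out)]
        labels += [np.ones(n_train, dtype=bool), np.zeros(n_train, dtype=bool)]
    return np.concatenate(feats), np.concatenate(labels)

def fit_threshold(feats, labels):
    """Simplified attack model: the threshold with the best accuracy on shadow data."""
    candidates = np.unique(feats)
    accs = [np.mean((feats >= t) == labels) for t in candidates]
    return candidates[int(np.argmax(accs))]

# Target model trained on data the attacker never sees directly.
x_train, y_train = sample_data(100)
x_test, y_test = sample_data(100)
target = KernelClassifier().fit(x_train, y_train)

threshold = fit_threshold(*shadow_attack_data(n_shadow=5, n_train=100))
guess_in = true_label_confidence(target, x_train, y_train) >= threshold
guess_out = true_label_confidence(target, x_test, y_test) >= threshold
attack_accuracy = 0.5 * (guess_in.mean() + (1 - guess_out.mean()))
```

The point of the sketch is the data flow: the attacker never sees the target's training set, only learns from shadow models what member and non-member outputs look like, and then transfers that decision rule to the target.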

Subsequent work has refined this template along several axes.

Threat models

Black-box attacks assume only query access to the model's outputs (confidence scores or labels) and are the most common deployment-relevant setting. White-box attacks additionally assume access to model parameters, intermediate activations, or gradients; Nasr, Shokri, and Houmansadr (IEEE S&P 2019) showed that white-box access enables stronger attacks, particularly in federated learning where participants observe parameter updates each round.[8] Label-only attacks, introduced by Choquette-Choo, Tramèr, Carlini, and Papernot (ICML 2021), remove the assumption that the adversary sees confidence scores at all and infer membership purely from how the model's hard label changes under input perturbations, defeating defenses that obfuscate confidence vectors.[9]
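A label-only signal can be illustrated with a robustness score: query the model on randomly perturbed copies of the candidate and count how often the hard label stays the same. The sketch below is a simplified stand-in loosely modelled on that idea, not the published attack. The function name, the Gaussian noise model, and the toy linear classifier are assumptions, and the example only demonstrates how the score behaves near and far from a decision boundary.

```python
import numpy as np

def label_only_score(predict_label, x, y, sigma=0.1, n_queries=200, seed=0):
    """Fraction of Gaussian-perturbed copies of x that the model still labels y.
    Only hard labels are used; a higher score means a more robust prediction,
    which the attacker treats as weak evidence of membership."""
    rng = np.random.default_rng(seed)
    noisy = x[None, :] + sigma * rng.normal(size=(n_queries, x.size))
    return float(np.mean(predict_label(noisy) == y))

# Hypothetical hard-label model: class 1 if the coordinates sum to more than 0.
def toy_predict_label(batch):
    return (batch.sum(axis=1) > 0).astype(int)

score_far = label_only_score(toy_predict_label, np.array([2.0, 2.0]), y=1)
score_near = label_only_score(toy_predict_label, np.array([0.0, 0.0]), y=1)
```

An attacker would then threshold this score, with the threshold calibrated on records of known membership status.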

Relation to overfitting and generalization

Yeom, Giacomelli, Fredrikson, and Jha (CSF 2018) gave a formal analysis connecting MIAs to overfitting, showing that a gap between training and test loss is sufficient — though not strictly necessary — for an effective membership inference attack, and proposing a simple loss-thresholding attack as a baseline.[10] This loss-based attack remains a standard baseline.
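The loss-thresholding baseline is short enough to write out. A minimal sketch, assuming the attacker knows (or can estimate) the model's average training loss and observes the prediction vector for each candidate; the function names and the toy numbers are illustrative.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Per-example cross-entropy loss from prediction vectors and true labels."""
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(np.clip(picked, 1e-12, 1.0))

def loss_threshold_attack(losses, avg_train_loss):
    """Guess "member" when the loss is no larger than the average training loss."""
    return losses <= avg_train_loss

# Toy prediction vectors for three candidate records (made-up values).
probs = np.array([[0.9, 0.1],
                  [0.6, 0.4],
                  [0.2, 0.8]])
labels = np.array([0, 0, 1])
guesses = loss_threshold_attack(cross_entropy(probs, labels), avg_train_loss=0.3)
```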

Reducing assumptions and improving evaluation

Salem et al. (NDSS 2019) demonstrated "ML-Leaks," showing that membership inference is feasible without knowing the target model's architecture or training distribution, and that attacks remain effective with a single shadow model.[11]

Carlini, Chien, Nasr, Song, Terzis, and Tramèr (IEEE S&P 2022) argued that the field had been mismeasuring MIA success by reporting average accuracy, which is dominated by easy non-members; they proposed evaluating attacks by their true-positive rate at low false-positive rates (e.g., 0.1% or 0.001%) and introduced the Likelihood Ratio Attack (LiRA), which trains many reference shadow models to estimate per-example score distributions and is roughly an order of magnitude more powerful than prior attacks in this stringent regime.[2]
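The per-example likelihood-ratio test can be sketched as follows. This is a simplified illustration, assuming the attacker has already trained shadow models with and without the candidate and recorded the confidence each assigns to the candidate's true label. The function names are illustrative, and the Gaussian fit on logit-scaled confidences follows the general recipe described above rather than reproducing the paper's code.

```python
import math
from statistics import NormalDist, mean, stdev

def logit_scale(p, eps=1e-6):
    """Map a confidence in (0, 1) to log(p / (1 - p)), clipped for stability."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))

def lira_score(target_conf, in_confs, out_confs):
    """Log-likelihood ratio of the target model's confidence under Gaussians
    fitted to shadow models trained with (IN) and without (OUT) the example.
    Each list needs at least two distinct values. Positive = evidence of membership."""
    x = logit_scale(target_conf)
    ins = [logit_scale(c) for c in in_confs]
    outs = [logit_scale(c) for c in out_confs]
    dist_in = NormalDist(mean(ins), stdev(ins))
    dist_out = NormalDist(mean(outs), stdev(outs))
    # Floor the densities so that log() never receives zero.
    return math.log(max(dist_in.pdf(x), 1e-300)) - math.log(max(dist_out.pdf(x), 1e-300))

# Made-up shadow-model confidences for a single candidate record.
in_confs = [0.99, 0.97, 0.98, 0.995]
out_confs = [0.6, 0.7, 0.55, 0.8]
score_if_high = lira_score(0.98, in_confs, out_confs)
score_if_low = lira_score(0.65, in_confs, out_confs)
```

Because the distributions are fitted per example, a record that is inherently easy to classify does not get flagged merely for having high confidence.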

Membership inference against language models

Membership inference against large language models typically uses the target model's perplexity or per-token log-likelihood on the candidate sequence as the membership signal. Mattern et al. (ACL 2023) introduced a "neighbourhood comparison" attack that calibrates this signal against the model's loss on perturbed neighbours of the candidate, reducing the false-positive rate of plain loss thresholding.[12] Shi et al. (ICLR 2024) proposed Min-K% Prob, a reference-free attack based on the observation that an unseen text is more likely than a seen one to contain a few tokens with very low probability under the model.[13] A subsequent large-scale study by Duan et al. (COLM 2024) argued that, on LLMs trained for a single epoch on web-scale corpora, existing membership inference attacks barely outperform random guessing, attributing the difficulty to the size of the training set, the small number of passes over each example, and the fuzzy boundary between members and non-members in natural-language data.[14]
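Two of the signals described above reduce to simple arithmetic once per-token log-probabilities or losses are available. A minimal sketch, assuming those values have already been obtained from the target model; the function names, the default k, and the example numbers are illustrative.

```python
def min_k_prob(token_logprobs, k=0.2):
    """Average log-probability of the k fraction of tokens the model finds
    least likely. Higher (less negative) values suggest a seen text."""
    n = max(1, int(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]
    return sum(lowest) / n

def neighbourhood_score(target_loss, neighbour_losses):
    """Loss on the candidate minus the mean loss on its perturbed neighbours.
    Strongly negative values suggest the candidate itself was trained on."""
    return target_loss - sum(neighbour_losses) / len(neighbour_losses)

# Made-up per-token log-probabilities for a five-token sequence.
mk = min_k_prob([-0.1, -0.2, -5.0, -0.3, -4.0], k=0.4)
nb = neighbourhood_score(1.0, [2.0, 3.0])
```

Either score is then compared against a threshold chosen for the desired false-positive rate.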

Defenses

Defenses against membership inference fall into three broad categories.

Provable defenses via differential privacy. Training with differential privacy, typically through differentially private stochastic gradient descent (DP-SGD), introduced by Abadi et al. (CCS 2016), provides a quantitative upper bound on the success of any membership inference adversary as a function of the privacy parameters (ε, δ).[15] Empirically, DP-SGD substantially reduces MIA success at small values of ε, though at a measurable cost in model utility.
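The core of a DP-SGD update is per-example gradient clipping followed by Gaussian noise. The sketch below shows a single step with NumPy, assuming the per-example gradients have already been computed; the function and parameter names are illustrative. It does not compute the resulting ε, which requires a separate privacy accountant.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update: clip each example's gradient to L2 norm clip_norm,
    sum, add Gaussian noise with std noise_multiplier * clip_norm, then average."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Scale down only the gradients whose norm exceeds clip_norm.
    clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / len(per_example_grads)
    return params - lr * noisy_mean

rng = np.random.default_rng(0)
grads = np.array([[3.0, 4.0],    # norm 5, clipped to norm 1
                  [0.3, 0.4]])   # norm 0.5, left unchanged
# noise_multiplier=0 disables the noise so the clipping is easy to check by hand.
no_noise = dp_sgd_step(np.zeros(2), grads, lr=1.0, clip_norm=1.0,
                       noise_multiplier=0.0, rng=rng)
noisy = dp_sgd_step(np.zeros(2), grads, lr=1.0, clip_norm=1.0,
                    noise_multiplier=1.0, rng=rng)
```

Clipping bounds how much any single record can move the parameters, and the noise is calibrated to that bound, which is what ties the update back to the membership-inference guarantee.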

Regularization and generalization-improving methods. Because MIAs exploit the gap between a model's behavior on training and non-training data, techniques that reduce overfitting — including L2 regularization, dropout, early stopping, data augmentation, and knowledge distillation — empirically reduce MIA success, although they generally do not provide formal guarantees.[10][16]

Output perturbation. MemGuard (Jia, Salem, Backes, Zhang, and Gong, CCS 2019) and related defenses inject crafted noise into the model's confidence vector — analogous to adversarial examples but aimed at the attacker — so that the published scores carry less information about training-set membership while preserving the top-1 prediction.[17] Subsequent work has shown that output-perturbation defenses are bypassed by label-only attacks and by adaptive attackers, motivating renewed focus on differential privacy as the only defense with formal guarantees.[9][2]

Significance

Beyond their direct privacy implications, membership inference attacks have become a standard auditing tool: regulators, model providers, and researchers use MIA success rates as an empirical measure of how much a deployed model or aggregate release leaks about its inputs, and MIAs are commonly used as a lower-bound benchmark for differentially private training. They have also been used to test whether a specific document was included in the pre-training corpus of a language model, with potential implications for copyright and personal-data compliance.[13]
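As an illustration of the auditing use, an observed attack operating point can be converted into an empirical lower bound on ε: if a mechanism were (ε, δ)-differentially private, every attack would satisfy TPR ≤ e^ε · FPR + δ, so an attack that achieves a given (TPR, FPR) implies ε ≥ ln((TPR − δ) / FPR). The sketch below is a point estimate only; the function name is illustrative, and a real audit would replace the observed rates with statistical confidence bounds.

```python
import math

def empirical_epsilon_lower_bound(tpr, fpr, delta=0.0):
    """Smallest epsilon consistent with an attack achieving (tpr, fpr),
    using the inequality tpr <= exp(epsilon) * fpr + delta."""
    if fpr <= 0.0:
        raise ValueError("fpr must be positive for a finite point estimate")
    if tpr <= delta:
        return 0.0  # the attack gives no evidence of leakage beyond delta
    return max(0.0, math.log((tpr - delta) / fpr))

# Example: an attack reaching 10% TPR at 0.1% FPR implies epsilon >= ln(100).
eps_lb = empirical_epsilon_lower_bound(tpr=0.1, fpr=0.001)
```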

References

  1. Shokri, Reza; Stronati, Marco; Song, Congzheng; Shmatikov, Vitaly (2017). "Membership Inference Attacks Against Machine Learning Models". Proceedings of the 2017 IEEE Symposium on Security and Privacy (SP). IEEE. pp. 3–18. arXiv:1610.05820. doi:10.1109/SP.2017.41.
  2. Carlini, Nicholas; Chien, Steve; Nasr, Milad; Song, Shuang; Terzis, Andreas; Tramèr, Florian (2022). "Membership Inference Attacks From First Principles". Proceedings of the 2022 IEEE Symposium on Security and Privacy (SP). IEEE. arXiv:2112.03570. doi:10.1109/SP46214.2022.9833649.
  3. Dwork, Cynthia; McSherry, Frank; Nissim, Kobbi; Smith, Adam (2006). "Calibrating Noise to Sensitivity in Private Data Analysis". Theory of Cryptography (TCC 2006). Lecture Notes in Computer Science. Vol. 3876. Springer. doi:10.1007/11681878_14.
  4. Homer, Nils; Szelinger, Szabolcs; Redman, Margot; Duggan, David; Tembe, Waibhav; Muehling, Jill; Pearson, John V.; Stephan, Dietrich A.; Nelson, Stanley F.; Craig, David W. (2008). "Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays". PLOS Genetics. 4 (8): e1000167. doi:10.1371/journal.pgen.1000167. PMC 2516199. PMID 18769715.
  5. Zerhouni, Elias A.; Nabel, Elizabeth G. (2008). "Protecting Aggregate Genomic Data". Science. 322 (5898): 44. doi:10.1126/science.1165490. PMID 18772394.
  6. Dwork, Cynthia; Smith, Adam; Steinke, Thomas; Ullman, Jonathan; Vadhan, Salil (2015). "Robust Traceability from Trace Amounts". Proceedings of the 2015 IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS). IEEE. pp. 650–669. doi:10.1109/FOCS.2015.46.
  7. Pyrgelis, Apostolos; Troncoso, Carmela; De Cristofaro, Emiliano (2018). "Knock Knock, Who's There? Membership Inference on Aggregate Location Data". Proceedings of the 25th Network and Distributed System Security Symposium (NDSS). Internet Society. arXiv:1708.06145. doi:10.14722/ndss.2018.23284.
  8. Nasr, Milad; Shokri, Reza; Houmansadr, Amir (2019). "Comprehensive Privacy Analysis of Deep Learning: Passive and Active White-box Inference Attacks against Centralized and Federated Learning". Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP). IEEE. pp. 739–753. arXiv:1812.00910. doi:10.1109/SP.2019.00065.
  9. Choquette-Choo, Christopher A.; Tramèr, Florian; Carlini, Nicholas; Papernot, Nicolas (2021). "Label-Only Membership Inference Attacks". Proceedings of the 38th International Conference on Machine Learning (ICML). Vol. 139. PMLR. arXiv:2007.14321.
  10. Yeom, Samuel; Giacomelli, Irene; Fredrikson, Matt; Jha, Somesh (2018). "Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting". 2018 IEEE 31st Computer Security Foundations Symposium (CSF). IEEE. pp. 268–282. arXiv:1709.01604. doi:10.1109/CSF.2018.00027.
  11. Salem, Ahmed; Zhang, Yang; Humbert, Mathias; Berrang, Pascal; Fritz, Mario; Backes, Michael (2019). "ML-Leaks: Model and Data Independent Membership Inference Attacks and Defenses on Machine Learning Models". Proceedings of the 26th Network and Distributed System Security Symposium (NDSS). Internet Society. arXiv:1806.01246. doi:10.14722/ndss.2019.23119.
  12. Mattern, Justus; Mireshghallah, Fatemehsadat; Jin, Zhijing; Schölkopf, Bernhard; Sachan, Mrinmaya; Berg-Kirkpatrick, Taylor (2023). "Membership Inference Attacks against Language Models via Neighbourhood Comparison". Findings of the Association for Computational Linguistics: ACL 2023. pp. 11330–11343. arXiv:2305.18462. doi:10.18653/v1/2023.findings-acl.719.
  13. Shi, Weijia; Ajith, Anirudh; Xia, Mengzhou; Huang, Yangsibo; Liu, Daogao; Blevins, Terra; Chen, Danqi; Zettlemoyer, Luke (2024). "Detecting Pretraining Data from Large Language Models". Proceedings of the International Conference on Learning Representations (ICLR). arXiv:2310.16789.
  14. Duan, Michael; Suri, Anshuman; Mireshghallah, Niloofar; Min, Sewon; Shi, Weijia; Zettlemoyer, Luke; Tsvetkov, Yulia; Choi, Yejin; Evans, David; Hajishirzi, Hannaneh (2024). "Do Membership Inference Attacks Work on Large Language Models?". Conference on Language Modeling (COLM). arXiv:2402.07841.
  15. Abadi, Martín; Chu, Andy; Goodfellow, Ian; McMahan, H. Brendan; Mironov, Ilya; Talwar, Kunal; Zhang, Li (2016). "Deep Learning with Differential Privacy". Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM. pp. 308–318. arXiv:1607.00133. doi:10.1145/2976749.2978318.
  16. Shejwalkar, Virat; Houmansadr, Amir (2021). "Membership Privacy for Machine Learning Models Through Knowledge Transfer". Proceedings of the 35th AAAI Conference on Artificial Intelligence. arXiv:1906.06589.
  17. Jia, Jinyuan; Salem, Ahmed; Backes, Michael; Zhang, Yang; Gong, Neil Zhenqiang (2019). "MemGuard: Defending against Black-Box Membership Inference Attacks via Adversarial Examples". Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security. ACM. pp. 259–274. arXiv:1909.10594. doi:10.1145/3319535.3363201.
