    IP23078 | Adversarial Attacks: An Existential Threat to AI
    Manoj Harjani, Shantanu Sharma

    07 November 2023

    Adversarial attacks – intentional misuse or manipulation of artificial intelligence (AI) systems by bad actors to degrade their performance or cause harmful outcomes – arguably pose a greater threat than unintentional harms arising from bias in design and deployment. This is because there are few effective solutions at present to address adversarial attacks, and most companies at the forefront of AI fail to prioritise resources to address such threats. With current regulatory attention largely focused on mitigating unintentional harms, there could be substantial impediments to AI’s advancement and adoption.

    COMMENTARY

    Much of the recent media coverage related to AI has hyped the possible existential threat arising from systems developed with capabilities surpassing human-level intelligence. This is also reflected in initiatives to regulate AI that have been gathering steam in China, the European Union, and the United States. The United Kingdom recently held an inaugural Global Summit on AI Safety to encourage efforts to mitigate the risks arising from the most advanced types of AI systems that are expected to be developed in the coming decade, or even sooner.

    However, such efforts have arguably obscured an important existential threat to AI itself: the looming challenges posed by highly advanced AI systems can only materialise if such systems can withstand the increasingly sophisticated methods and tools that enable bad actors to manipulate or misuse them. The rise of adversarial attacks, and the wide gap between them and the measures available to mitigate them, should garner more attention from policymakers and researchers. Despite the greater potential for deliberate harm from adversarial attacks in the present, regulatory and academic attention appears more focused on unintentional harm that may or may not materialise, depending on bias in the design and deployment of AI systems.

    The rise of adversarial attacks on AI – bad actors manipulating or misusing AI systems to degrade their performance or cause harmful outcomes – poses a serious threat that policymakers and researchers should address promptly. Image from Pixabay.

    What Are Adversarial Attacks on AI?

    The main types of adversarial attacks on AI are poisoning, backdooring, evasion, membership inference, and model extraction. Poisoning and backdooring are typically carried out during the data acquisition, design, training, and model testing stages – they are known as “white box” attacks because access to the model is required. On the other hand, evasion, membership inference, and model extraction are “black box” attacks carried out only after model deployment, where access to the model is not required.

    Poisoning typically aims to degrade a model’s inference accuracy – its ability to make relevant predictions based on its training data. In a data poisoning attack, the training data set for a model is modified with data that will lead to inaccurate future results. For model poisoning attacks, the feedback mechanism for the model’s training process is manipulated. An early example of data poisoning occurred in 2016, when users manipulated Microsoft’s Tay chatbot – which was designed to learn from user input – into making offensive statements. This type of data poisoning attack, known as “prompt injection”, is on the rise as generative AI applications grow in popularity.
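    To make the mechanics concrete, the sketch below simulates a simple label-flipping data poisoning attack on a toy classifier. It is an illustrative assumption using scikit-learn and synthetic data, not a description of any specific incident: an attacker able to corrupt a fraction of the training labels measurably degrades accuracy on clean test data.

```python
# A minimal label-flipping poisoning sketch (illustrative; toy data and model).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def train_and_score(labels):
    """Train on the (possibly poisoned) labels, score on clean test data."""
    model = LogisticRegression(max_iter=1000).fit(X_train, labels)
    return model.score(X_test, y_test)

clean_acc = train_and_score(y_train)

# The "attack": flip 30% of the training labels at random.
poisoned = y_train.copy()
idx = rng.choice(len(poisoned), size=int(0.3 * len(poisoned)), replace=False)
poisoned[idx] = 1 - poisoned[idx]
poisoned_acc = train_and_score(poisoned)

print(f"accuracy with clean labels:    {clean_acc:.3f}")
print(f"accuracy with poisoned labels: {poisoned_acc:.3f}")  # typically lower
```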

    Backdooring is more sophisticated. A backdoored AI model will behave as designed until a malicious “trigger” input causes the model to return inaccurate or harmful results. As the attack is dormant until the malicious input is called upon, it is much harder to detect. Recent research has shown that it is possible to insert undetectable backdoors in models, which poses a significant unaddressed security risk.
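    A minimal sketch of the idea, under toy assumptions (tabular data, a scikit-learn classifier, and a single extreme feature value standing in for the "trigger"): the backdoored model keeps its clean accuracy, which is precisely what makes the implant hard to detect.

```python
# Hypothetical backdoor sketch (illustrative, not any published method):
# poisoned training samples carry a "trigger" pattern paired with the
# attacker's target label. The model behaves normally on clean inputs but
# tends to flip its prediction whenever the trigger appears.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

TRIGGER_FEATURE, TRIGGER_VALUE, TARGET_LABEL = 0, 8.0, 1

# Implant the backdoor in roughly 5% of the training set.
rng = np.random.default_rng(1)
idx = rng.choice(len(X_train), size=len(X_train) // 20, replace=False)
X_bd, y_bd = X_train.copy(), y_train.copy()
X_bd[idx, TRIGGER_FEATURE] = TRIGGER_VALUE
y_bd[idx] = TARGET_LABEL

model = RandomForestClassifier(random_state=1).fit(X_bd, y_bd)

# Clean accuracy stays high, so the backdoor is hard to spot...
print("clean test accuracy:", model.score(X_test, y_test))

# ...but stamping the trigger onto test inputs steers predictions
# towards the attacker's target label.
X_triggered = X_test.copy()
X_triggered[:, TRIGGER_FEATURE] = TRIGGER_VALUE
print("fraction predicted as target label on triggered inputs:",
      (model.predict(X_triggered) == TARGET_LABEL).mean())
```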

    Evasion entails fooling a model into misclassifying malicious or harmful input. For example, a security researcher used a simple modification to easily circumvent Google’s AI-driven detection system for malicious file attachments. The growing use of face recognition technology has similarly prompted evasion attempts using widely accessible non-technical methods such as wearing a t-shirt with a design that tricks a model into allowing the wearer to bypass identification.
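    For illustration, the sketch below mounts a one-step fast gradient sign method (FGSM) evasion attack against a simple linear classifier (an assumption made for brevity; the attacks described above targeted far more complex systems). Each input feature is nudged slightly in the direction that most increases the model's loss.

```python
# Hypothetical FGSM evasion sketch against a linear model (illustrative only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=2)
model = LogisticRegression(max_iter=1000).fit(X, y)
w, b = model.coef_[0], model.intercept_[0]

def fgsm(x, label, eps):
    """One-step FGSM: perturb x in the direction that increases the loss."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))   # predicted probability of class 1
    grad = (p - label) * w                    # gradient of log-loss w.r.t. x
    return x + eps * np.sign(grad)

x0, y0 = X[0], y[0]
x_adv = fgsm(x0, y0, eps=0.5)

# With a large enough eps the prediction usually flips, even though each
# feature changes by at most eps.
print("original prediction:   ", model.predict([x0])[0], "(true label:", y0, ")")
print("adversarial prediction:", model.predict([x_adv])[0])
print("max per-feature change:", np.abs(x_adv - x0).max())
```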

    Membership inference and model extraction both seek to acquire valuable data and intellectual property. Membership inference involves using a model’s output to identify the data it was trained on, which poses a risk if the data is sensitive in nature and personally identifiable. Model extraction aims to recreate a proprietary model through analysis and reverse-engineering of its output.
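    A toy illustration of membership inference, under a simplifying assumption (a deliberately overfitted model): because the model is more confident on records it was trained on, merely thresholding its output confidence leaks membership.

```python
# Hypothetical confidence-thresholding membership inference sketch.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=3)
X_in, X_out, y_in, y_out = train_test_split(X, y, test_size=0.5, random_state=3)

# Unbounded trees overfit the training set, exaggerating the leakage.
model = RandomForestClassifier(n_estimators=50, random_state=3).fit(X_in, y_in)

conf_members = model.predict_proba(X_in).max(axis=1)      # seen in training
conf_nonmembers = model.predict_proba(X_out).max(axis=1)   # never seen

# Naive attack: guess "member" whenever confidence exceeds a threshold.
threshold = 0.9
tpr = (conf_members > threshold).mean()      # members correctly flagged
fpr = (conf_nonmembers > threshold).mean()   # non-members wrongly flagged
print(f"member hit rate {tpr:.2f} vs non-member false alarm rate {fpr:.2f}")
```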

    An Existential Threat to AI

    Current solutions to address adversarial attacks on AI face a number of obstacles. The stark reality is that these solutions are either limited in scope (they cannot fully address the threat posed by a specific type of attack), untested (we cannot gauge their efficacy easily), or not viable (potentially due to the unique nature of the threat posed by a specific type of attack). There is also no such thing as a “standardised” attack to defend against, nor are there easily implementable training or investigation methods.

    Adversarial training is a potential defence against evasion attacks, where harmful data examples are included in training data to increase the robustness of the model. However, some challenges arising from adversarial training include the potential loss of predictive accuracy and limited effectiveness when training data is scarce. Furthermore, adversarial training has high computational costs, which makes it impractical for large-scale training data sets and limits its applicability in resource-constrained settings.
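    A bare-bones sketch of the idea, reusing the linear-model FGSM from the evasion example above (illustrative assumptions throughout): adversarially perturbed copies of the training data are folded back into the training set, at the cost of doubling its size.

```python
# Hypothetical adversarial training sketch (toy linear model, FGSM perturbations).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=5)
model = LogisticRegression(max_iter=1000).fit(X, y)
w, b = model.coef_[0], model.intercept_[0]

def fgsm_batch(X, y, eps):
    """Perturb every sample one FGSM step against the current model."""
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    grads = (p - y)[:, None] * w             # per-sample input gradients
    return X + eps * np.sign(grads)

eps = 0.3
X_adv = fgsm_batch(X, y, eps)

# Retrain on clean + adversarial examples: the training set doubles,
# mirroring the computational overhead mentioned above.
robust = LogisticRegression(max_iter=1000).fit(
    np.vstack([X, X_adv]), np.concatenate([y, y]))

print("plain model on adversarial inputs: ", model.score(X_adv, y))
print("robust model on adversarial inputs:", robust.score(X_adv, y))
print("robust model on clean inputs:      ", robust.score(X, y))
```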

    Differential privacy – which aims to anonymise personally identifiable training data to minimise membership inference attacks – relies on aggregate statistics for training. Depending on the privacy budget set – i.e., the upper limit on the permitted privacy loss – a model trained using differential privacy would not be able to recall any specific data points related to individual personal data used in the training data set. Nevertheless, despite its potential applicability and effectiveness, differential privacy still struggles to provide an acceptable privacy–utility trade-off when a model’s task is more complex.
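    To illustrate the privacy budget intuition, here is a minimal Laplace mechanism sketch applied to a synthetic aggregate statistic (an assumption for illustration; differentially private model training, such as DP-SGD, is considerably more involved): a tighter budget means more noise and less utility.

```python
# Hypothetical Laplace mechanism sketch: the basic building block of
# differential privacy, shown on a synthetic aggregate query.
import numpy as np

rng = np.random.default_rng(4)
incomes = rng.normal(50_000, 15_000, size=10_000)   # synthetic personal data

def dp_mean(values, epsilon, lower, upper):
    """Release a differentially private mean under privacy budget epsilon."""
    clipped = np.clip(values, lower, upper)
    sensitivity = (upper - lower) / len(clipped)     # max influence of one record
    noise = rng.laplace(scale=sensitivity / epsilon)
    return clipped.mean() + noise

for eps in (0.01, 0.1, 1.0):
    print(f"epsilon={eps:>5}: private mean ~ {dp_mean(incomes, eps, 0, 200_000):,.0f}")
print(f"true mean          ~ {incomes.mean():,.0f}")
# Smaller epsilon (a tighter budget) means more noise and less utility –
# the privacy–utility trade-off described above.
```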

    These examples highlight the potentially existential nature of the threat that adversarial attacks pose for AI’s future. There is an imbalance between the range of possible adversarial attacks and available mitigation techniques that can be reliably implemented. Technical advances are needed to improve the detection and mitigation of adversarial attacks. Until then, governments, companies and societies will have to reckon with an ever-increasing attack surface. However, with most governments and researchers focused on mitigating the harms caused by AI’s use rather than the security of AI systems themselves, the threat posed by adversarial attacks remains largely unaddressed.

    Furthermore, the companies advancing the frontiers of AI technologies have yet to take the threat arising from adversarial attacks seriously enough to devote more resources to mitigating it. Despite the established threat, there is a gap in the application of security measures to the AI systems these companies develop. The problem is magnified by the number of companies developing and adopting AI without adequate mitigation measures. A survey of practitioners has shown that, despite registering concern over the threat from adversarial attacks, they lacked the capabilities to identify vulnerabilities and take preventive measures to secure systems before deployment.

    What Can Be Done?

    One straightforward measure that can be taken in the short term is to increase the availability of educational resources on adversarial attacks for those involved in designing and deploying AI systems. This is particularly critical for systems being deployed in the public sector and for essential infrastructure. Additional effort will be needed to manage the challenges associated with adopting measures to mitigate adversarial attacks where there are trade-offs for model accuracy. Unfortunately, due to a lack of established benchmarks, implementing mitigation measures for adversarial attacks will be a fraught process.

    Although existing techniques to “sanitise” data and models are not perfect solutions, they are a starting point to mitigate a wide range of adversarial attacks. Training data sets and deployed models should also be protected by available cybersecurity measures to ensure the integrity of systems. Depending on the risk appetite of regulators and the critical applications of a particular AI system, risk identification mechanisms can be initiated and institutionalised to raise awareness of potential adversarial threats. Red-teaming and bug bounty programmes could also be potential ways to increase the robustness of models against adversarial attacks.
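    As a concrete (and deliberately simple) example of such sanitisation, the sketch below uses an off-the-shelf outlier detector to flag suspicious training records before training – one inexpensive line of defence against crude poisoning, though it would not catch the subtler attacks described above.

```python
# Hypothetical data "sanitisation" sketch: flag anomalous training records
# with an outlier detector before they reach the training pipeline.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(6)
X, _ = make_classification(n_samples=1000, n_features=20, random_state=6)

# Simulate crude poisoning: inject out-of-distribution points.
X_poison = rng.normal(6.0, 1.0, size=(50, X.shape[1]))
X_all = np.vstack([X, X_poison])

detector = IsolationForest(contamination=0.05, random_state=6).fit(X_all)
keep = detector.predict(X_all) == 1          # +1 = inlier, -1 = flagged outlier

print("records kept:", keep.sum(), "of", len(X_all))
print("poisoned records flagged:", (~keep[len(X):]).sum(), "of", len(X_poison))
```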

    Although it will take time to direct funding and re-order priorities, greater attention must ultimately be devoted by researchers to addressing the challenges posed by adversarial attacks. Many of the available mitigation measures are likely to fall short in circumstances where the models deployed are larger and more complex. Technical benchmarks and standards will need to be developed over time, although doing so is likely to be a game of continuously catching up with bad actors.

    Manoj HARJANI is a Research Fellow with the Military Transformations Programme and Shantanu SHARMA is a Senior Analyst with the Cyber and Homeland Defence Programme at the S. Rajaratnam School of International Studies (RSIS).

    Categories: IDSS Papers / General / Cybersecurity, Biosecurity and Nuclear Safety / Global

