AI Vendor Risk Profiles

Trust is the foundation that enables the widespread adoption of generative AI in the enterprise. Take a look at our Generative AI Vendor Tool Risk Profiles to get a view into whether your GenAI vendors have taken steps to mitigate the most critical generative AI risks, from hallucinations to sensitive data leakage.

Read more about our work to establish Responsible AI Disclosures.

Need help with third-party AI risk management? Credo AI is here to help.

Anthropic Claude

AI Assistant
View full profile

Anthropic's Claude language model and API offer substantial capabilities along with risks that vary in severity. Anthropic has implemented a Constitutional AI approach aimed at aligning the model's behavior with principles of being helpful, harmless, and honest. While this approach mitigates some risks, others remain challenging to address comprehensively and persist at varying levels. Credo AI's analysis applies to the Claude 2 model announced on July 11, 2023. Profile last updated: July 31, 2023

Risk Assessment
Risk Present Built-in Mitigation
Abuse & Misuse ⚠️
Compliance ⚠️
Environmental & Societal Impact ⚠️
Explainability & Transparency ⚠️
Fairness & Bias ⚠️
Long-term & Existential Risk ⚠️
Performance & Robustness ⚠️
Privacy ⚠️
Security ⚠️
Mitigation Measures

RLHF alignment fine-tuning & Constitutional AI

The Claude model has undergone substantial fine-tuning [5, 10, 12, 13, 15, 20] with the goal of making the model more amenable to human interaction (i.e. instruction/chat tuning) and more aligned with human expectations for factuality and harm avoidance. Because the model is probabilistic, these efforts are mitigative but do not eliminate risk.

Regular updates

Anthropic periodically updates its Claude models as the organization continues research into capabilities and safety measures. The API allows users to automatically update to the latest models or pin applications to a specific model version [4].
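As a minimal sketch of what version pinning looks like in practice, assuming the Anthropic Python SDK and its 2023-era Text Completions interface; the model identifiers shown are illustrative and should be checked against Anthropic's current versioning documentation.

```python
# Minimal sketch, assuming the Anthropic Python SDK (`pip install anthropic`)
# and the 2023-era Text Completions interface; model names are illustrative.
from anthropic import Anthropic, HUMAN_PROMPT, AI_PROMPT

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# "claude-2" tracks Anthropic's latest Claude 2 release automatically, while a
# fully qualified name such as "claude-2.0" pins the application to one
# version, trading new capabilities and safety fixes for stable behavior.
PINNED_MODEL = "claude-2.0"

completion = client.completions.create(
    model=PINNED_MODEL,
    max_tokens_to_sample=300,
    prompt=f"{HUMAN_PROMPT} Summarize the key risks of code-generation tools.{AI_PROMPT}",
)
print(completion.completion)
```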

Google Cloud net-neutral carbon footprint

Google, Anthropic's cloud partner [3], claims to be carbon neutral [19] and to match 100% of its energy consumption with renewables. The company also claims to have offset 100% of its historical operating emissions, reaching historical net-neutrality through the purchase of carbon credits and offsets. Google has publicly committed to reaching 100% carbon-free operations by 2030. It is likely that all systems relevant to Claude's ongoing development and operations are included in this carbon accounting.

Non-use of prompts sent to, and outputs received from, the API

According to the Anthropic Terms of Service and Privacy Policy, data submitted in prompts to Claude are not used to train future Anthropic models [17, 18]. This prevents private data or intellectual property from leaking to other entities through the responses of later models. Prompts and responses are stored for up to 30 days [18]. Anthropic does not state whether this retention includes human review of prompts and responses for potential misuse or illegal activity, as is the case with other popular chatbot services. Because standard, non-AI-specific cybersecurity risks still apply, the risk of data leakage is non-zero: Anthropic could, for example, be targeted by a phishing attack that compromises the data it stores, including sensitive data submitted to the Claude model.

Prompt Engineering

Prompt engineering (see the FAQs on the Claude product page [1]) is a popular strategy for inducing large language models to behave in accordance with the user's intentions. The strategy can be used to improve the quality of responses (i.e. improve performance) and decrease the likelihood of certain risks (e.g. confabulations). Common techniques include "context loading", format standardization, persona adoption, and numerous other approaches.
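As a rough illustration of these techniques, the sketch below combines persona adoption, context loading, and an output-format instruction in a single prompt. It assumes the Anthropic Python SDK and the 2023-era Text Completions interface; the document, persona, and wording are hypothetical, not an Anthropic-recommended template.

```python
# Illustrative prompt-engineering sketch; the document, persona, and wording
# are hypothetical, not an Anthropic-recommended template.
from anthropic import Anthropic, HUMAN_PROMPT, AI_PROMPT

client = Anthropic()

contract_text = open("contract.txt").read()  # hypothetical source document

prompt = (
    f"{HUMAN_PROMPT} You are a careful contracts analyst."        # persona adoption
    " Answer only from the contract below; if the answer is not"
    " there, say so rather than guessing.\n\n"                    # hedge against confabulation
    f"<contract>\n{contract_text}\n</contract>\n\n"               # context loading
    "List the termination conditions as a numbered list."         # format standardization
    f"{AI_PROMPT}"
)

completion = client.completions.create(
    model="claude-2", max_tokens_to_sample=512, prompt=prompt
)
print(completion.completion)
```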

Certifications & Compliance

Conclusion & Recommendation

Anthropic's Claude language model and API offer substantial capabilities along with risks that vary in severity. Anthropic has implemented a Constitutional AI approach aimed at aligning the model's behavior with principles of being helpful, harmless, and honest. While this approach mitigates some risks, others remain challenging to address comprehensively and persist at varying levels.

Organizations considering using Claude for application development or to augment human functions should weigh these risks carefully against the benefits of the technology. They should evaluate how their intended use case may interact with the model's risk surfaces and consider implementing additional controls, especially for applications built on top of the Claude API. Regular monitoring and governance practices are recommended. Ultimately, as with any AI system, the risks associated with Claude can be managed but not entirely eliminated.

Anthropic continues to research methods for improving model alignment and has committed to proactively addressing newly identified issues. However, as models become more capable and complex, risks are likely to evolve as well. Organizations adopting Claude, and stakeholders interacting with applications built on it, should remain vigilant to changes in the risk landscape and push for maximized transparency from Anthropic into their model development and evaluation processes. Overall, while promising, Claude and similar models require cautious and conscientious development and deployment to fulfill their potential benefit to humanity.

OpenAI GPT API

LLM
View full profile

Developers considering using this technology should do so with eyes open to these risks and a commitment to responsible development practices that attempt to mitigate risks without "watering down" capabilities. No model or API is perfectly safe or governable. Individual developers and organizations will need to determine if the benefits of this technology outweigh risks within their specific use case and risk tolerance. Profile last updated: July 13, 2023

Risk Assessment
Risk Present Built-in Mitigation
Abuse & Misuse ⚠️
Compliance ⚠️
Environmental & Societal Impact ⚠️
Explainability & Transparency ⚠️
Fairness & Bias ⚠️
Long-term & Existential Risk ⚠️
Performance & Robustness ⚠️
Privacy ⚠️
Security ⚠️
Mitigation Measures

RLHF alignment fine-tuning & system prompts

The GPT-3.5-Turbo and GPT-4 models have undergone substantial fine-tuning [1] with the goal of making the models more amenable to human interaction (i.e. instruction/chat tuning) and more aligned with human expectations for factuality and harm avoidance. Because the models are probabilistic, these efforts are mitigative but generally do not eliminate risk.
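A system prompt supplements this fine-tuning at the application layer. The sketch below shows the general shape, assuming the pre-1.0 `openai` Python package that was current when this profile was written (later SDK versions expose the same endpoint through a different call surface); the instructions themselves are hypothetical.

```python
# Minimal system-prompt sketch, assuming the pre-1.0 `openai` Python package;
# later SDK versions expose the same endpoint through a different interface.
import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        # The system message layers use-case constraints on top of the RLHF
        # alignment baked into the model; it reduces, but does not guarantee
        # against, non-compliant outputs.
        {"role": "system",
         "content": "You are a customer-support assistant. Decline requests for legal or medical advice."},
        {"role": "user", "content": "Can you review this lease for me?"},
    ],
)
print(response["choices"][0]["message"]["content"])
```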

Content moderation endpoint

Independent of model-embodied content moderation measures, the OpenAI API includes a dedicated content moderation endpoint [19]. The endpoint scans for and flags seven categories of inappropriate content based on OpenAI's content policies: hate, hate/threatening, self-harm, sexual, sexual/minors, violence, and violence/graphic. The details of this content moderation filter (e.g. which model is used, the threshold at which a prompt is flagged, and how well the classifier performs) are not publicly available.
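As a rough sketch of how an application might screen text with this endpoint, assuming the pre-1.0 `openai` Python package; the response fields and category set should be verified against OpenAI's current API reference.

```python
# Hedged sketch of the moderation endpoint (pre-1.0 `openai` package); field
# names should be checked against OpenAI's current API reference.
import openai

moderation = openai.Moderation.create(input="user-submitted text to screen")
result = moderation["results"][0]

if result["flagged"]:
    # `categories` maps each policy category (hate, self-harm, sexual, violence,
    # and their sub-categories) to a boolean; the underlying model and its
    # flagging thresholds are not publicly documented.
    hits = [name for name, hit in result["categories"].items() if hit]
    print("Content flagged for:", hits)
```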

Regular updates

OpenAI regularly updates its models as the organization continues research into capabilities and safety measures. The API allows users to automatically update to the latest models or pin applications to a specific model version [2].
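For example, with the pre-1.0 `openai` package; the snapshot names shown were current in mid-2023 and are illustrative only.

```python
# Illustrative only: "gpt-3.5-turbo" floats to OpenAI's latest snapshot, while
# a dated name such as "gpt-3.5-turbo-0613" pins behavior until that snapshot
# is retired. Names reflect mid-2023 and may since have been deprecated.
import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",   # pinned snapshot
    messages=[{"role": "user", "content": "Say hello."}],
)
```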

Azure Cloud net-neutral carbon footprint

Microsoft, OpenAI's cloud partner [21], claims to be carbon neutral [27]. They achieve this through the purchase of carbon credits and offsets. They have publicly committed to reaching net-zero emissions by 2030. It is likely that all systems relevant to the OpenAI Chat Completions API are covered by Microsoft's carbon accounting.

Certifications & Compliance

Conclusion & Recommendation

The OpenAI Chat Completions API and its associated GPT-4 and GPT-3.5-Turbo models represent substantial capabilities and risks. As the product literature states, these models represent "significant advancements" in the field of large language models. They also exhibit numerous known weaknesses and capabilities for misbehavior and unintentional harm.

Developers considering using this technology should do so with eyes open to these risks and a commitment to responsible development practices that attempt to mitigate risks without "watering down" capabilities. No model or API is perfectly safe or governable. Individual developers and organizations will need to determine if the benefits of this technology outweigh risks within their specific use case and risk tolerance.

Research into the capabilities and risks of GPT-4 and GPT-3.5-Turbo is ongoing. As the models continue to proliferate, additional risks are likely to surface, as are additional mitigation strategies. Developers should monitor developments from OpenAI and the research community on an ongoing basis. Open communication between API users and OpenAI will be key to continued progress.

GitHub Copilot

Code Generation
View full profile

GitHub Copilot introduces several risks common to AI systems, including risks of model bias, privacy issues, compliance issues, and environmental impact. GitHub and OpenAI have implemented some mitigation measures to address certain risks. These include a content filter to block offensive language and personally identifiable information, purchasing of carbon offsets to achieve carbon neutrality, and internal testing to evaluate accessibility. However, the tool lacks explainability into how it generates suggestions, visibility into how it is used, and configurability of its controls. Profile last updated: July 13, 2023

Risk Assessment
Risk Present Built-in Mitigation
Abuse & Misuse ⚠️
Compliance ⚠️
Environmental & Societal Impact ⚠️
Explainability & Transparency ⚠️
Fairness & Bias ⚠️
Long-term & Existential Risk - N/A
Performance & Robustness ⚠️
Privacy ⚠️
Security ⚠️
Mitigation Measures

Content Filter

GitHub Copilot has a content filter to address several common risks of GenAI systems. The filter is described on the GitHub Copilot FAQs page [1]. Its functionality is as follows:

  • It "blocks offensive language in the prompts and to avoid synthesizing suggestions in sensitive contexts". No details are provided regarding the effectiveness, performance, or robustness of this feature. This feature appears to be enabled by default and does not appear, from available documentation, to be configurable [1].
  • It "checks code suggestions with their surrounding code of about 150 characters against public code on GitHub. If there is a match or near match, the suggestion will not be shown to [the user]." GitHub does not provide details about the effectiveness, performance, or robustness of this feature. This feature is configurable for Organization customers. It is not documented whether the feature is enabled by default in Organization accounts [9].
  • It "blocks emails when shown in standard formats". According to GitHub, "it’s still possible to get the model to suggest this sort of content if you try hard enough." No details are provided regarding the effectiveness, performance, or robustness of this feature. This feature appears to be enabled by default and does not appear, from available documentation, to be configurable [1].

Carbon Neutrality

Microsoft, GitHub's parent, claims to be carbon neutral [10]. They achieve this through the purchase of carbon credits and offsets. They have publicly committed to reaching net-zero emissions by 2030. Because of GitHub's status as a Microsoft subsidiary, it is likely that all systems relevant to GitHub Copilot (including the Codex model) are deployed on Microsoft's Azure cloud platform and thus are included in Microsoft's broader carbon accounting.

Accessibility Testing

GitHub is "conducting internal testing of GitHub Copilot’s ease of use by developers with disabilities" [1]. The company encourages users who identify usability issues to reach out to a dedicated email address. No details are provided about the status of these tests.

Vulnerability Filter

As of the February 2023 update to Copilot, the service includes a "vulnerability prevention system" which uses large language models to analyze generated code with the goal of identifying and blocking common security vulnerabilities, such as SQL injection, path injection, and hardcoded credentials [12]. Credo AI was unable to find details on the performance or effectiveness of this mitigation measure. The vulnerability filter is unlikely to identify and block all possible security vulnerabilities.

Certifications & Compliance

Conclusion & Recommendation

GitHub Copilot is an AI-based tool designed to assist software developers in writing code. It is powered by OpenAI's Codex model, a pretrained language model trained on millions of lines of open-source code. Copilot provides code suggestions and completions based on the context of the code the developer is currently writing. It is intended to increase developer productivity, satisfaction, and code quality. However, it also introduces several risks common to AI systems, including risks of model bias, privacy issues, compliance issues, and environmental impact.

GitHub and OpenAI have implemented some mitigation measures to address certain risks. These include a content filter to block offensive language and personally identifiable information, purchasing of carbon offsets to achieve carbon neutrality, and internal testing to evaluate accessibility. However, the tool lacks explainability into how it generates suggestions, visibility into how it is used, and configurability of its controls. Formal evaluations of the tool have found that it can increase developer speed and satisfaction but that it struggles with some complex programming tasks, with correctness results varying widely across evaluations.

Although a useful productivity tool, GitHub Copilot introduces risks that require governance to address. The lack of visibility and configurability poses challenges for organizations aiming to manage risks from the tool and ensure compliant and ethical use. Additional research into the tool’s abilities, limitations, and best practices for oversight would benefit users and stakeholders. With proper governance, Copilot could become an asset, but without it, it risks becoming a liability.

Microsoft 365 Copilot

AI Assistant
View full profile

While Microsoft Copilot shows promise for improving productivity and unlocking creativity, it also introduces risks around responsible development and use of AI that organizations must consider. The risks stem predominantly from the model on which Copilot relies: GPT-4. OpenAI, GPT-4's developer, has implemented safeguards to mitigate certain risks, but risks cannot be eliminated in a probabilistic system. Profile last updated: July 13, 2023

Risk Assessment
Risk Present Built-in Mitigation
Abuse & Misuse ⚠️
Compliance ⚠️
Environmental & Societal Impact ⚠️
Explainability & Transparency ⚠️
Fairness & Bias ⚠️
Long-term & Existential Risk - N/A
Performance & Robustness ⚠️
Privacy ⚠️
Security ⚠️
Mitigation Measures

Prompt Engineering

Prompt engineering [12, 13] is a popular strategy to induce large language models to behave in accordance with the user's intentions. The strategy can be used to improve the quality of responses (i.e. improve performance) and decrease the likelihood of certain risks (e.g. confabulations). The strategy can also often be used to aid in explainability, by prompting a model to explain how it reached a conclusion during reasoning (though this is subject to confabulation risk).

The class of prompt engineering strategies is rapidly expanding. The effectiveness of any one strategy is subject to ongoing research and will depend on the use case.
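Copilot itself exposes no programmatic prompt interface, so purely as an illustration of the reasoning-elicitation strategy described above, a sketch against a GPT-4-class chat API (pre-1.0 `openai` Python package, hypothetical prompt content) might look like this:

```python
# Generic illustration of reasoning elicitation against a GPT-4-class chat API
# (pre-1.0 `openai` package). This is not Microsoft 365 Copilot's interface;
# the prompt content is hypothetical.
import openai

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": (
            "Our Q2 travel spend rose 18% while headcount stayed flat. "
            "Suggest two likely causes and explain, step by step, how you "
            "reached each suggestion."  # the explanation aids transparency,
                                        # but can itself be confabulated
        ),
    }],
)
print(response["choices"][0]["message"]["content"])
```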

It is unclear how well prompt engineering strategies from other LLM interaction domains carry over to Microsoft Copilot. Demo videos [1] suggest a user interface that is very different from traditional LLM chat applications, and Copilot could be substantially more constrained. This could limit the effectiveness of prompt engineering or eliminate the ability to perform prompt engineering entirely.

Certifications & Compliance

Conclusion & Recommendation

While Microsoft Copilot shows promise for improving productivity and unlocking creativity, it also introduces risks around responsible development and use of AI that organizations must consider. The risks stem predominantly from the model on which Copilot relies: GPT-4. OpenAI, GPT-4's developer, has implemented safeguards to mitigate certain risks, but risks cannot be eliminated in a probabilistic system. Organizations should evaluate whether the rewards of using Copilot justify accepting the residual risks. If use is pursued, organizations must implement stringent governance and use practices to overcome the limited visibility available into how employees interact with Copilot. For some organizations, especially those handling highly sensitive data or subject to strict regulation, Copilot may be unsuitable until more robust safeguards and oversight capabilities become available.

Midjourney

Image Generation
View full profile

While Midjourney aims to provide a creative tool for users, the service poses risks that span privacy, security, bias, and misuse. Mitigations are provided through moderation, subscription options, and prudent usage practices, though significant risks remain, especially for sensitive use cases. Overall, Midjourney should be used carefully and thoughtfully. For personal use, especially casual or recreational use, Midjourney can be an entertaining and inspiring creative aid. Profile last updated: July 13, 2023

Risk Assessment
Risk Present Built-in Mitigation
Abuse & Misuse ⚠️
Compliance ⚠️
Environmental & Societal Impact ⚠️
Explainability & Transparency ⚠️
Fairness & Bias ⚠️
Long-term & Existential Risk - N/A
Performance & Robustness ⚠️
Privacy ⚠️
Security ⚠️
Mitigation Measures

Mitigations that "ship" with the service and model

Moderation

  • Midjourney maintains Community Guidelines with the goal of achieving "PG-13" appropriateness [15]. According to the company, some prompts are blocked automatically. The details and strength of this moderation are not publicly available. The automated moderation is targeted at "NSFW" (not safe for work) content; Credo AI believes, but cannot confirm, that the moderation tool is unlikely to identify or block subtler risk issues, such as the perpetuation of stereotypes (e.g. associating some professions with people presenting as a particular gender or ethnic background).

Asset ownership provisions and DMCA

  • Midjourney maintains a process for individuals to make takedown requests associated with suspected violation of their intellectual property rights under the Digital Millennium Copyright Act (DMCA). This mitigation primarily applies to individuals whose works have been identified in Midjourney outputs (with or without modification) and does not represent a mitigation for users of Midjourney hoping to avoid infringing on others' intellectual property rights.

Mitigations available through agreement or paid subscription

Stealth Mode and Direct Messaging

  • Paid users have the ability to use the Midjourney service through a private direct message chat with the Midjourney bot on the company's Discord server. This hides prompts and outputs from other Midjourney users. Midjourney retains the right, however, to post prompts and outputs on its website, meaning this mitigation has limited scope.
  • Users on "Pro" plans have access to Midjourney's "Stealth Mode" feature, which communicates to the company that prompts and outputs should not be published to the Midjourney website. Midjourney's Terms of Service uses non-committal language regarding whether Stealth Mode requests will be honored: "we agree to make best efforts not to publish any Assets You make in any situation where you have engaged stealth mode in the Services." [6]

Mitigations that can be implemented through customized use of the service

Prompt Engineering

  • As with any text-input generative AI model, prompt engineering can play a significant role in the quality and appropriateness of outputs. The effectiveness of any prompt engineering strategy is difficult to assess objectively or quantify. Several guides have been published online, e.g., [16, 17, 18]; a generic example is sketched below.
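As a purely illustrative sketch: Midjourney is driven through Discord's /imagine command rather than a public API, so the snippet below simply assembles a prompt string. The subject, style, and parameter values are hypothetical, and parameter behavior should be checked against Midjourney's current documentation.

```python
# Illustrative only: assembles a Midjourney-style prompt string. The subject,
# style, and parameter values are hypothetical; verify parameter behavior
# against Midjourney's current documentation.
subject = "a quiet reading room in a public library, morning light"
style = "watercolor illustration, muted palette"
# --ar sets the aspect ratio; --no asks the model to avoid listed elements,
# which can help steer outputs away from unwanted or sensitive content.
parameters = "--ar 3:2 --no people, text"

prompt = f"/imagine prompt: {subject}, {style} {parameters}"
print(prompt)
```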

Private Discord Server

  • Midjourney's option for users to integrate its bot into private Discord servers can help mitigate privacy and content-related risks. Discord supports various forms of moderation, tailored to a server administrator's needs. Some of these moderation techniques rely on OpenAI's modeling capabilities, which pose their own risks.

Certifications & Compliance

Conclusion & Recommendation

Midjourney is an AI-powered text-to-image generation service that enables users to generate digital art and content. While Midjourney aims to provide a creative tool for users, the service poses risks that span privacy, security, bias, and misuse. Mitigations are provided through moderation, subscription options, and prudent usage practices, though significant risks remain, especially for sensitive use cases.

Overall, Midjourney should be used carefully and thoughtfully. For personal use, especially casual or recreational use, Midjourney can be an entertaining and inspiring creative aid. However, for professional use, especially in regulated industries or for the creation of business-critical assets, the risks posed should be weighed carefully against the rewards. The AI field is progressing rapidly, and services like Midjourney will continue to improve, but AI-based tools demand close monitoring and governance to be used responsibly.

Our Methodology for Developing Vendor Risk Profiles

Based on our expertise in AI risk management and risk mitigation approaches

Each Vendor Tool Risk Profile is a report that indicates whether, during development and deployment of a specific generative AI tool, the vendor has taken steps to mitigate a standard set of generative AI risks, based on publicly available documentation and, where possible, technical evaluations. We have summarized the publicly available information into a report template and cited the sources for any steps or actions taken by the vendor to mitigate a specific risk.

In order to create a standard GenAI Vendor Tool Risk Profile report template, we started by defining a standard set of nine AI risks based on existing AI risk frameworks like the NIST AI Risk Management Framework and the OECD AI Principles, as well as recent academic research into the specific risks of generative AI systems.

Are you an AI tool provider?

We’d love to feature your Responsible AI Disclosure on our website, so you can start building trust with the market.