Microsoft 365 Copilot

While Microsoft Copilot shows promise for improving productivity and unlocking creativity, it also introduces risks around responsible development and use of AI that organizations must consider. The risks stem predominantly from the model -- GPT-4 -- on which Copilot relies. OpenAI, GPT-4's developers, have implemented safeguards to mitigate certain risks, but risks cannot be eliminated in a probabilistic system. Profile last updated: July 13, 2023

Product Description

Microsoft 365 Copilot, announced in March 2023, is a new addition to the Microsoft 365 suite that leverages AI technologies, specifically large language models (LLMs) such as GPT-4, and integrates with user data from the Microsoft Graph and Microsoft 365 apps. Based on publicly available demo videos and Microsoft's marketing materials, Copilot embeds GPT-4 support into Microsoft Word, Excel, PowerPoint, Outlook, Teams, and other applications to drive efficiency and user experience. The product supports text generation and editing, providing "first draft" capabilities and features to revise text for readability and tone in. The product also supports data analysis and visualization tasks [1].

Microsoft advertises integration with enterprise data as a key feature. The Copilot product aims to understand an organization's existing documents and ongoing content creation to provide tailored recommendations during text generation tasks. Microsoft claims this includes the ability to generate summaries of organizational information like strategy and product documentation, and author communications and memos on behalf of employees [1].

Microsoft has not published meaningful updates on the Copilot service since the March 16, 2023 product announcement. The service is only available to a select, non-disclosed group of customers at this time.

Microsoft, the developer of Copilot for Microsoft 365, was established in 1975. Microsoft develops a range of software and hardware products, most notably its Windows operating system and its Microsoft 365 Suite (formerly Microsoft Office) of productivity applications. In recent years, Microsoft has invested heavily in AI research, including multiple investments in OpenAI, reportedly totaling $13 billion [2]. Microsoft is one of the largest, most profitable companies in the world and should be considered stable from the standpoint of ongoing customer service and product support.

Profile last updated: July 13, 2023

Intended Use Case

Microsoft intends for its Copilot product to permeate the majority of cognitive work tasks, suggesting Copilot can help meaningfully tackle the "80%" of time "consumed with busywork that bogs us down" [1]. Copilot is meant to be usable by any user of its existing suite of productivity-focused applications to improve efficiency and "unlock" enhanced creativity [1].

Among the potential uses named in the product announcement are: summarizing emails and drafting replies, summarizing meeting discussions and suggesting action items, "first drafting" documents, converting documents to PowerPoint presentations, and analyzing data and creating visualizations "in seconds".

Risk and Mitigation Summary

The following table provides a quick summary of which common genAI-driven risks are present in Microsoft Copilot and which risks have been addressed (but not necessarily eliminated) by deliberate mitigative measures provided with the product. As discussed above, many details of the Copilot offering specifically are undocumented. Where Credo AI believes it is appropriate, we rely on our assessment of the GPT-4 model as background to our discussion of the risks and built-in mitigations of the Copilot product. It is possible that Microsoft has developed further mitigations beyond those implemented in OpenAI's GPT-4 model. If so, such mitigations are not advertised and their effectiveness is unknown.

For a definition of each risk and details of how these risks arise in the context of the Copilot product, see below. The details and scenarios discussed for each risk are non-exhaustive.

Risk Present Built-in Mitigation
Abuse & Misuse ⚠️
Compliance ⚠️
Environmental & Societal Impact ⚠️
Explainability & Transparency ⚠️
Fairness & Bias ⚠️
Long-term & Existential Risk - N/A
Performance & Robustness ⚠️
Privacy ⚠️
Security ⚠️

Abuse & Misuse

Pertains to the potential for AI systems to be used maliciously or irresponsibly, including for creating deepfakes, automated cyber attacks, or invasive surveillance systems. Abuse specifically denotes the intentional use of AI for harmful purposes.

Insider threats

  • Because the GPT model underlying the Copilot software is capable of generating arbitrary text, it could be used to support harmful uses in corporate contexts. These may include adversarial marketing tactics, inappropriate political influence operations, acts of fraud, and more. The potential for this risk is contingent upon the existence of employees within an organization acting unethically, with or without management approval.
  • Corporate espionage/theft: Because Copilot is deeply linked to an organization's systems -- documents, chat messages, video chat conversations, etc. -- employees with access to the software may have the ability to steal and disseminate corporate secrets in ways not previously available. For instance, employees may be able to leverage the generative AI functionalities built into Copilot to automate the theft of information or to rapidly summarize large amounts of sensitive information while decreasing the chance of detection by avoiding stealing the exact information and files at issue.

The GPT model which underlies MS Copilot has undergone substantial alignment-oriented fine-tuning targeted at addressing the potential for misuse relating to certain topics. See the Mitigations section below for more details.

Microsoft 365's enterprise subscriptions generally permit role-based and user-level access permissions, which can mitigate some insider threats. See the Mitigations section for more details.

Compliance

Involves the risk of AI systems violating laws, regulations, and ethical guidelines (including copyright risks). Non-compliance can lead to legal penalties, reputational damage, and loss of user trust.

Copyright infringement

  • The GPT model underlying Microsoft Copilot was trained on, among other sources, publicly available internet data [3]. OpenAI's public documentation does not provide sufficient detail on their data sources to determine the copyright protection status of each training sample. Assuming the training data does contain copyright-protected content, it is possible that the model, as part of functioning within Copilot, will generate (i.e. reproduce) text identical to or substantially similar to copyright-protected text. The legality of the use of LLM-generated text subject to ongoing public debate [4, 5].

Copyright claims

  • Separately, the defensibility of intellectual property rights for works generated with AI assistance remains uncertain. For instance, the U.S. Copyright Office recently issued guidance [6] suggesting that only the parts of a work that are created by a human may be eligible for patent or copyright protection. Organizations using Microsoft Copilot to assist with corporate work, including the creation of intellectual property, may have limited ability to claim legal protection for their work.

Regulatory compliance

  • Because the Copilot product is capable of generating arbitrary text, it could be used in the service of activities that violate laws and regulations in the user's jurisdiction. For instance, a company's HR employee could leverage Copilot to automate analysis tasks for job applicant screening using AI-generated statistical formulas in the Excel software. Absent specific prompting to avoid biased decision-making, doing so could violate anti-discrimination laws and AI-specific laws.
  • Use of Copilot could also violate data security and data privacy laws. For instance, Microsoft supports HIPAA compliance for a variety of its product offerings, such as its Azure cloud service, but the company does not disclose whether use of Copilot specifically is HIPAA compliant. Providing patient health information to Copilot for the sake of having Copilot perform analyses or generate text could expose organizations to regulatory risk.

Organizational compliance

  • The Copilot product is not innately aware of a particular organization's internal policies regarding the use of generative AI tools and their outputs. Without specifically imposing controls, employees could inadvertently or deliberately use Copilot in ways that violate organizational policy.

The fine-tuning process used during the GPT-4 model's development addresses some risks of violating applicable laws and regulations by curbing certain problematic topics. See the Mitigations section for more details.

Environmental & Societal Impact

Concerns the broader changes AI might induce in society, such as labor displacement, mental health impacts, or the implications of manipulative technologies like deepfakes. It also includes the environmental implications of AI, particularly the strain on natural resources and carbon emissions caused by training complex AI models, balanced against the potential for AI to help mitigate environmental issues.

Labor market disruption

  • There is significant concern that AI models and tools like Copilot could significantly disrupt the "white collar", cognitive-task labor market. Research from OpenAI estimates [7] that "around 80% of the U.S. workforce could have at least 10% of their work tasks affected" by LLMs and "19% of workers may see at least 50% of tasks impacted". It is uncertain whether this disruption will lead to worker displacement or a need for worker re-training to focus disrupted labor on tasks not easily accomplished by AI. Microsoft Copilot represents one of the first full feature applications that could lead to this hypothesized world.

Carbon footprint

  • Microsoft has not disclosed sufficient details about the AI models in Copilot to enable a rigorous accounting of the energy consumption or carbon footprint of the service. Microsoft claims net-neutral carbon emissions through the purchase of carbon credits. See the Mitigations section for more details.

User interaction and dependence

  • In line with the risk of broad labor market disruption, use of Copilot may lead to technical reliance on the tool for completing work tasks. In particular, as workers 'assign' more labor to the AI system, they may lose proficiency in skills traditionally associated with those tasks through lack of practice. This can lead to gaps in operational competency within organizations.

Explainability & Transparency

Refers to the ability to understand and interpret an AI system's decisions and actions, and the openness about the data used, algorithms employed, and decisions made. Lack of these elements can create risks of misuse, misinterpretation, and lack of accountability.

Data transparency

  • Information on the training data used to train Copilot's models is either limited or unavailable. According to OpenAI's technical report [3], the GPT-4 model was trained using a combination of publicly available data and data licensed from third-party providers. The data were generally collected from the internet. The model underwent fine-tuning using the reinforcement learning from human feedback (RLHF) paradigm. The details of this fine-tuning data are unavailable.
  • Microsoft's product marketing materials [1] suggest other models are built into the service. For instance, descriptions of the Business Chat feature suggest the use of a speech-to-text model, while descriptions of Copilot's integration with PowerPoint suggest the use of a text-to-image model. No details are available for these models, including who developed them, what data were used to train the models, and how well the models perform.

Explainability of model outputs

  • In general, the GPT model underlying Copilot's text-generation capabilities provides no explanation for its outputs. Microsoft has not published any information suggesting a difference for Copilot; this applies both for the GPT-4 model and any other models that may support the product.

Design decisions

  • Many of the design details, such as model architecture, training data, and compute budget for Copilot's models are explicitly kept private for reasons of competitive advantage [3].
  • Copilot's product documentation [1] implies the use of speech-to-text models for transcription of video chat meetings. Microsoft does not disclose which model(s) are used.

Prompting strategies exist to address the non-explainability of model outputs when working with raw models. It is unclear whether these strategies can be applied within Microsoft Copilot. These strategies have varying effectiveness. See the Mitigations section for more details.

Fairness & Bias

Arises from the potential for AI systems to make decisions that systematically disadvantage certain groups or individuals. Bias can stem from training data, algorithmic design, or deployment practices, leading to unfair outcomes and possible legal ramifications.

Multi-lingual support

  • Microsoft does not disclose whether Copilot supports languages other than English at this time. The GPT-4 model was trained on multiple languages [3]. The model's developers demonstrated a drop-off in capabilities for languages other than English on a specific natural language processing benchmark. It is impossible to project how the model and product are likely to perform in the uses intended for Microsoft Copilot when used by non-English users.
  • According to OpenAI [1], many of the risk mitigations built into the GPT-4 model are targeted at English and a US user base. As a consequence, mitigative effects are likely lower for non-English languages (i.e., the models are expected to be more likely to confabulate and produce offensive content when prompted in languages or dialects other than American English).
  • Copilot's product documentation [1] implies the use of a speech-to-text model for transcription of video chat meetings. Microsoft does not disclose whether the model supports non-English languages.

Offensive or biased outputs

  • GPT-4, the model underlying Microsoft Copilot, is known to occasionally output profanity, sexual content, stereotypes, and other types of biased or offensive language.

OpenAI's fine-tuning process and content filters address some risks of perpetuating biases or behaving unfairly by curbing certain problematic topics. Microsoft does not disclose further mitigation efforts. See the Mitigations section for more discussion.

Long-term & Existential Risk

Considers the speculative risks posed by future advanced AI systems to human civilization, either through misuse or due to challenges in aligning their objectives with human values. See the Societal Impacts section for a discussion on the long-term potential for growing reliance on Copilot for carrying out work tasks.

Copilot does not, at this writing, appear to present long-term or existential risks.

Performance & Robustness

Pertains to the AI's ability to fulfill its intended purpose accurately and its resilience to perturbations, unusual inputs, or adverse situations. Failures of performance are fundamental to the AI system performing its function. Failures of robustness can lead to severe consequences, especially in critical applications.

Confabulations

  • Microsoft Copilot's text generation and data analysis capabilities are enabled through integration with the GPT-4 large language model (LLM). LLMs like GPT-4 are known to "confabulate" facts and information. They are also known to make errors in reasoning, including basic arithmetic errors. The frequency of this behavior depends on the task given to the model. Microsoft acknowledges this phenomenon in their announcement post [1]: "Sometimes Copilot will be right, other times usefully wrong." Users of Copilot risk incorporating its non-factual or illogical outputs into their work product.
  • Copilot's product documentation [1] implies the use of speech-to-text models for transcription of video chat meetings. Microsoft does not disclose the performance of these models.

Robustness

  • Copilot's performance on a given task will necessarily be a function of the prompt or instruction and any other inputs provided to the product. Microsoft has not provided any information on the quality of Copilot's outputs and reasoning in the situations for which it is designed to be used. Microsoft has not provided error rates or estimates of how frequently user-in-the-loop action needs to be taken to correct Copilot's "usefully wrong" nature.

OpenAI claims substantial mitigation of confabulation risks through its fine-tuning procedures. Microsoft does not clarify what, if any, measures have been adopted on top of those built into the GPT-4 model by OpenAI. No mitigation strategy is 100% effective. Please see the Mitigations section for more details.

Privacy

Refers to the risk of AI infringing upon individuals' rights to privacy, through the data they collect, how they process that data, or the conclusions they draw.

Data collection and re-use

  • Microsoft Copilot does not use customer data for future model development or re-training [8]. This mitigates the risk of future models generating customer data for parties external to the customer. Microsoft does not have a publicly available, complete data use policy for Copilot. Microsoft's Azure OpenAI Service may serve as a guide: the Azure OpenAI Service also does not use customer-submitted data for future model training. Nevertheless, for content moderation purposes, specialist Microsoft employees may, on occasion, view customer prompts and outputs [9]. Microsoft allows customers to apply for exemption of this moderation; absent an exemption, the Copilot product may not be suitable for some organizations working with highly sensitive data or data subject to strict regulation [9].

Reproduction of PII

  • From training data: Because the GPT-4 model was trained on a large corpus of text data, including, potentially publicly available personal information [3], the model may occasionally generate information about individuals who have a public internet presence. By using the Copilot product, users run the risk of "re-leaking" (i.e., elevating) this information, which has the potential to harm the individual to whom the data pertains.
  • From user data: Because Microsoft Copilot has access to user and company data, including emails, documents, chats, and other sensitive information, it has the potential to include these data in its outputs. Without careful review of Copilot-generated content by the user, these data could be leaked publicly or to individuals within an organization who should not have access. With proper access management controls, Copilot cannot unilaterally leak data to unapproved individuals [8].

Microsoft's policy on data retention and re-training represents a default privacy risk mitigation measure. Microsoft's option to apply for exemption to human-based content moderation represents a "next-step" mitigation. We do not discuss these mitigations further.

Security

Encompasses potential vulnerabilities in AI systems that could compromise their integrity, availability, or confidentiality. Security breaches could result in significant harm, from incorrect decision-making to privacy violations.

Data sequestration

  • Microsoft Copilot accesses customer data according to the customer's enterprise agreement with Microsoft. Data generally are accessed through secure user-based controls. Data supplied to the models underlying Copilot are "in-flight" only, meaning data are not stored on-disk with respect to the model and do not mix with data from other customers [8].

Prompt injection

  • OpenAI's GPT-4 model is susceptible to "prompt injection" attacks, whereby a malicious user enters a particular style of instruction to encourage the model to (mis)-behave in ways advantageous to the user. This misbehavior can include circumventing any and all safety precautions "built-in" to the model through fine-tuning.
  • Microsoft Copilot is likely susceptible to this variety of attacks. The risk associated with prompt injection attacks targeted at Copilot is unclear. Data access, for instance, appears to be managed by Microsoft systems separate from the model. It is unlikely that the model will be able to manufacture credentials to access data that it is not meant to access and it is unlikely that the model will serve data to users who do not have the proper authorization.

Microsoft does not address the possibility of prompt injection in its documentation. As such, we do not discuss this issue in the Mitigations section.

Mitigation Measures

In this section, we discuss mitigation measures that are built-in to the product (regardless of whether they are enabled by default). We also comment on the feasibility of a procuring organization governing the use of the tool by its employees.

Mitigations that "ship" with the product

RLHF alignment fine-tuning

OpenAI's GPT-4 model has undergone substantial fine-tuning [3] with the goal of making the model more amenable to human interaction (i.e. instruction/chat tuning) and more aligned with human requirements for factuality, likelihood to cause harm, and likelihood to engage in conversations regarding problematic or illicit subjects. As a probabilistic model, these efforts are mitigative but generally do not eliminate risk.

According to OpenAI, the model was fine-tuned using reinforcement learning from human feedback (RLHF) [3]. OpenAI's disclosure of their RLHF process is opaque. For instance, they do not disclose the specific categories, such as "toxicity" or "harmfulness", which were targeted in the RLHF process, nor do they disclose the definitions of these concepts that were used to instruct human and/or AI labelers. It is likely that the content moderation categories used in the content moderation endpoint (see below) are included. Credo AI believes it is unlikely that these are the only categories targeted during RLHF. RLHF is provided as an 'as-is' mitigation; it is not configurable.

The differences between the "out-of-the-box" GPT-4 models available through OpenAI's APIs or Microsoft's Azure OpenAI Service and the model embedded in the Microsoft Copilot product are unknown. It may be that Microsoft has performed additional fine-tuning on the model to guide it towards being useful in the specific contexts that arise in Microsoft's productivity software. Microsoft may have added further guardrails, for instance using their guidance software package [10] to constrain outputs. If so, this information is not publicly available.

Content moderation

Independent of model-embodied content moderation measures, it is unclear what content moderation measure Microsoft has built into the Copilot product. Microsoft's Azure OpenAI Service uses an independent content moderation model to filter outputs from the main LLM [9]. It is not clear whether Microsoft uses this same workflow for moderating outputs generated as part of the Copilot offering. If the same model is used, the specifics of the model, such as the topics it targets or its performance, are not publicly available.

If the same content moderation flow is used for Copilot as for Azure OpenAI Service, this mitigation is generally not configurable without special exemption from Microsoft.

User access management

Microsoft states [1] that "within your tenant, our (..) permissioning model ensures that data won’t leak across user groups. And on an individual level, Copilot presents only data you can access." This suggests a mitigation of leakage of private data within and outside of an organization.

Elimination of all private data leakage is not guaranteed. For instance, Copilot could generate a document with sensitive data (within organizational policy) and a user, without realizing the sensitivity of the information in the document, could share this document to non-approved audiences. This risk vector is made more likely if Copilot does not alert a user about the level of sensitivity of the data it accesses.

This mitigation is configurable.

Azure Cloud net-neutral carbon footprint

Microsoft claims to be carbon neutral [11]. They achieve this through the purchase of carbon credits and offsets. They have publicly committed to reaching net-zero emissions by 2030. It is likely that all systems relevant to the OpenAI Chat Completions API are covered by Microsoft's carbon accounting.

This mitigation is non-configurable.

Non-use of prompts sent to, and outputs received from, the API

According to [1], data submitted to Copilot is not used to train future models or improve Microsoft's services. This eliminates the risk of private data or intellectual property being leaked through model responses to other entities. Per the Terms of Service of Microsoft's OpenAI Service, prompts and responses are stored for up to 30 days to enable monitoring for misuse and illegal use [9]. This monitoring involves direct access by authorized Microsoft employees. Due to the presence of standard, non-AI-specific cybersecurity risk, the risk of data leakage is non-zero. For instance, Microsoft may be susceptible to phishing attacks directed at its employees, which could compromise data it stores, including sensitive data submitted to Copilot. It is unclear whether the same content moderation flow applies to the Copilot offering.

This mitigation is non-configurable, except by special exemption [9] granted directly by Microsoft.

Mitigations that can be implemented through customized use of the API

Prompt Engineering

Prompt engineering [12, 13] is a popular strategy to induce large language models to behave in accordance with the user's intentions. The strategy can be used to improve the quality of responses (i.e. improve performance) and decrease the likelihood of certain risks (e.g. confabulations). The strategy can also often be used to aid in explainability, by prompting a model to explain how it reached a conclusion during reasoning (though this is subject to confabulation risk).

The class of prompt engineering strategies is rapidly expanding. The effectiveness of any one strategy is subject to ongoing research and will depend on the use case.

It is unclear how effective prompt engineering strategies from other LLM interaction domains carry over to Microsoft Copilot. Demo videos [1] suggest a user interface that is very different from traditional LLM chat applications and Copilot could be substantially more constrained. This could limit the effectiveness of prompt engineering or eliminate the ability to perform prompt engineering entirely.

Governability

For an organization to govern its development or use of an AI system, two functionalities are key: the ability of the organization to observe usage patterns among its employees and the ability of the organization to implement and configure controls to mitigate risk. Credo AI assesses systems on these two dimensions.

Per-prompt logging of inputs to and outputs from the Copilot system does not appear to be supported by Microsoft at this time. This severely limits the ability of organizations to govern their employees' use of the product.

Formal Evaluations & Certifications

Evaluations

Research into the capabilities and risk characteristics of the GPT-4 model is ongoing. Research is limited by the fact that the model is not open; it is only accessible through OpenAI's APIs. As a consequence, a large portion of known evaluations were performed by OpenAI directly. Reproducibility is often infeasible. Moreover, it is unclear whether the model used by Microsoft Copilot is identical to the one evaluated by OpenAI as of GPT-4's March 2023 launch and it is unlikely that the evaluations performed by OpenAI are entirely representative of Copilot's "in the wild" usage.

Capabilities

OpenAI's evaluations of GPT-4 are characterized in the academic-style article [3].

As of its release, in March 2023, GPT-4 achieved state-of-the-art performance on several academia-specified benchmarks for large language models. These include multiple choice questions based on broad knowledge (MMLU) and science (AI2 Reasoning), reasoning (HellaSwag, WinoGrande), grade-school math (GSM-8K), and python coding problems (HumanEval). On the MMLU task, when OpenAI translated questions to a variety of other languages, GPT-4 achieved target language performance that exceeded the English performance of competitor models Chinchilla and PaLM on at least 24 languages. OpenAI also reported human-level performance on several real-world knowledge exams such as the Standard Uniform Bar, various Advanced Placement exams, and the SAT.

A substantial and growing body of research beyond that detailed in [3] exists. It is impossible to summarize the entire body of research in this risk profile. Credo AI offers the general guidance that developers considering using Microsoft Copilot should consult the segment of the literature dedicated to their specific use and ask Microsoft directly for further evidence of the product's efficacy and safety. The results cited above will not necessarily carry over to a narrow use case. Performance lapses are feasible depending on context.

No formal evaluations of Microsoft Copilot, itself, have been performed to date. This includes both proprietary evaluations by Microsoft and independent evaluations. Microsoft's product documentation [1] cites data regarding productivity improvements experienced by users of GitHub Copilot. The evaluations of GitHub Copilot were performed by GitHub and are discussed in Credo AI's GitHub Copilot Risk Profile.

Misbehavior

As stated above, large language models are probabilistic and thus any measure of model (non-) alignment is objective only to the extent that the evaluation conditions match real world use. Users of Microsoft Copilot may experience different risk surfaces.

Factuality

  • OpenAI's evaluations suggest GPT-4 displays varying levels of factuality depending on the subject of the tested prompt [3] ranging from approximately 70-81% accuracy on a proprietary evaluation. Likewise, on the TruthfulQA benchmark GPT-4 demonstrates 60% accuracy. These results indicate the models are prone to confabulate facts and information anywhere from 20-50% of the time. As discussed above, this phenomenon can potentially be addressed (but not eliminated) through prompt engineering and context loading strategies, though the efficacy of prompt engineering using Microsoft Copilot's user interface is unclear.

Sensitive Content

  • OpenAI's evaluations suggest GPT-4 produces "toxic" content about .79% of the time on a toxicity benchmark.

Prompt Injection & Jailbreaking

  • Because of the probabilistic nature of the GPT model, it is impossible to anticipate the number or variety of prompts that can be used to successfully jailbreak a model. Formal estimates of the rate of these attacks cannot be obtained.

Certifications

Credo AI has identified the following regulations and standards as relevant to the privacy, security, and compliance requirements of our customers. Microsoft does not advertise compliance for any of these standards with respect to MS Copilot, specifically. Microsoft does advertise compliance, generally, as detailed in the second column. For more details, see https://learn.microsoft.com/en-us/azure/compliance/offerings/ and related pages.

While Microsoft Copilot shows promise for improving productivity and unlocking creativity, it also introduces risks around responsible development and use of AI that organizations must consider. The risks stem predominantly from the model -- GPT-4 -- on which Copilot relies. OpenAI, GPT-4's developers, have implemented safeguards to mitigate certain risks, but risks cannot be eliminated in a probabilistic system. Organizations should evaluate whether the rewards of using Copilot justify accepting the residual risks. If use is pursued, organizations must implement stringent governance and use practices to overcome the limited available visibility into how employees interact with Copilot. For some organizations, especially those handling highly sensitive data or subject to strict regulation, Copilot may be unsuitable until more robust safeguards and oversight capabilities become available.

References

[1] Microsoft Copilot Announcement - https://blogs.microsoft.com/blog/2023/03/16/introducing-microsoft-365-copilot-your-copilot-for-work/

[2] Microsoft’s $13 billion bet on OpenAI carries huge potential along with plenty of uncertainty - https://www.cnbc.com/2023/04/08/microsofts-complex-bet-on-openai-brings-potential-and-uncertainty.html

[3] GPT-4 Technical Report - https://arxiv.org/abs/2303.08774

[4] ChatGPT and Copyright: The Ultimate Appropriation - https://techpolicy.press/chatgpt-and-copyright-the-ultimate-appropriation/

[5] Copyright, Professional Perspective - Copyright Chaos: Legal Implications of Generative AI - https://www.bloomberglaw.com/external/document/XDDQ1PNK000000/copyrights-professional-perspective-copyright-chaos-legal-implic

[6] Copyright Registration Guidance: Works Containing Material Generated by Artificial Intelligence - https://www.federalregister.gov/documents/2023/03/16/2023-05321/copyright-registration-guidance-works-containing-material-generated-by-artificial-intelligence#print

[7] GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models - https://arxiv.org/pdf/2303.10130.pdf

[8] Microsoft Copilot Blog Post - https://blogs.microsoft.com/blog/2023/03/16/introducing-microsoft-365-copilot-your-copilot-for-work/

[9] Data, privacy, and security for Azure OpenAI Service - https://learn.microsoft.com/en-us/legal/cognitive-services/openai/data-privacy?context=%2Fazure%2Fcognitive-services%2Fopenai%2Fcontext%2Fcontext

[10] Guidance GitHub page - https://github.com/microsoft/guidance

[11] Microsoft will be carbon negative by 2030 - https://blogs.microsoft.com/blog/2020/01/16/microsoft-will-be-carbon-negative-by-2030/

[12] GPT Best Practices (OpenAI API Documentation) - https://platform.openai.com/docs/guides/gpt-best-practices

[13] DAIR Prompt Engineering Guide - https://github.com/dair-ai/Prompt-Engineering-Guide

Notes

Italics denote Credo AI definitions of key concepts.

AI Disclosure: The "Product Description" and "Intended Use Case" sections were generated with assistance from ChatGPT using the GPT-4 model. Credo AI fed official product documentation to the model and prompted the model to summarize or rephrase text according to the desired format for this risk profile. The final text was reviewed for accuracy and suitability and underwent human editing by Credo AI.

The "Conclusion" section was generated with assistance from Anthropic's Claude. Credo AI fed the other sections of this profile to Claude and prompted it to write a conclusion section. As with the other AI-assisted sections, Credo AI reviewed Claude's output for accuracy and suitability and performed edits where appropriate.