Safety is first priority: OpenAI
OpenAI is proud to build and release models that are industry-leading in both capabilities and safety.More than a hundred million users and millions of developers rely on the work of OpenAI's safety teams. OpenAI views safety as an essential investment that must succeed across multiple time horizons, from aligning today’s models to the far more capable systems expected in the future. This work has always been a priority across OpenAI, and the investment in safety will only increase over time.
OpenAI believes in a balanced, scientific approach where safety measures are integrated into the development process from the outset. This ensures that AI systems are both innovative and reliable, delivering benefits to society.
At today’s AI Seoul Summit, OpenAI joins industry leaders, government officials, and members of civil society to discuss AI safety. While there is still more work to do, OpenAI is encouraged by the additional Frontier AI Safety Commitments that OpenAI and other companies agreed upon today. The Commitments call on companies to safely develop and deploy their frontier AI models while sharing information about their risk mitigation measures, aligning with steps OpenAI has already taken. These include a pledge to publish safety frameworks like the Preparedness Framework developed and adopted last year.
10 safety measurements taken by OpenAI
Empirical model red-teaming and testing before release:
OpenAI empirically evaluates model safety before release, both internally and externally, according to its Preparedness Framework and voluntary commitments. OpenAI will not release a new model if it crosses a “Medium” risk threshold from the Preparedness Framework, until sufficient safety interventions are implemented to bring the post-mitigation score back to “Medium.” More than 70 external experts helped assess risks associated with GPT-4 through external red teaming efforts, and these learnings were used to build evaluations based on weaknesses in earlier checkpoints to better understand later checkpoints.
Alignment and safety research:
OpenAI's models have become significantly safer over time. This can be attributed to building smarter models that typically make fewer factual errors and are less likely to output harmful content, even under adversarial conditions like jailbreaks. This improvement is also due to focused investment in practical alignment, safety systems, and post-training research. These efforts work to improve the quality of human-generated fine-tuning data and, in the future, the instructions the models are trained to follow. OpenAI is also conducting and publishing fundamental research aimed at dramatically improving system robustness to attacks like jailbreaks.
Monitoring for abuse:
As increasingly capable language models are deployed via the API and ChatGPT, OpenAI leverages a broad spectrum of tools, including dedicated moderation models and the use of its own models for monitoring safety risks and abuse. Critical findings have been shared along the way, including a joint disclosure with Microsoft of state actor abuse of the technology, so that others can better safeguard against similar risks. GPT-4 is also used for content policy development and content moderation decisions, enabling a faster feedback loop for policy refinement and reducing the exposure of abusive material to human moderators.
Systematic approach for safety:
OpenAI implements a range of safety measures at every stage of a model's life cycle, from pre-training to deployment. As development advances toward safer and more aligned model behavior, investment is made in pre-training data safety, system-level model behavior steering, a data flywheel for continued safety improvement, and robust monitoring infrastructure.
Protecting children:
A critical focus of OpenAI's safety work is protecting children. Strong default guardrails and safety measures have been built into ChatGPT and DALL-E to mitigate potential harms to children. In 2023, OpenAI partnered with Thorn’s Safer to detect, review, and report Child Sexual Abuse Material to the National Center for Missing and Exploited Children if users attempt to upload it to the image tools. OpenAI continues to collaborate with Thorn, the Tech Coalition, All Tech is Human, Commonsense Media, and the broader tech community to uphold the Safety by Design principles.
Election integrity:
OpenAI collaborates with governments and stakeholders to prevent abuse, ensure transparency on AI-generated content, and improve access to accurate voting information. To achieve this, OpenAI has introduced a tool for identifying images created by DALL-E 3, joined the steering committee of the Content Authenticity Initiative (C2PA), and incorporated C2PA metadata in DALL-E 3 to help people understand the source of media they find online. ChatGPT now directs users to official voting information sources in the U.S. and Europe. Additionally, OpenAI supports the bipartisan “Protect Elections from Deceptive AI Act” proposed in the U.S. Senate, which would ban misleading AI-generated content in political advertising.
Investment in impact assessment and policy analysis:
OpenAI's impact assessment efforts have been widely influential in research, industry norms, and policy, including early work on measuring the chemical, biological, radiological, and nuclear (CBRN) risks associated with AI systems, and research estimating the extent to which different occupations and industries might be impacted by language models. OpenAI also publishes pioneering work on how society can best manage associated risks, such as working with external experts to assess the implications of language models for influence operations.
Security and access control measures:
OpenAI prioritizes protecting customers, intellectual property, and data. AI models are deployed to the world as services, with access controlled via API to enable policy enforcement. Cybersecurity efforts include restricting access to training environments and high-value algorithmic secrets on a need-to-know basis, internal and external penetration testing, a bug bounty program, and more. OpenAI believes that protecting advanced AI systems will benefit from an evolution of infrastructure security and is exploring novel controls like confidential computing for GPUs and applications of AI to cyber defense. To empower cyber defense, OpenAI funds third-party security researchers through the Cybersecurity Grant Program.
Partnering with governments:
OpenAI partners with governments around the world to inform the development of effective and adaptable AI safety policies. This includes showing work and sharing learnings, collaborating to pilot government and other third-party assurance, and informing the public debate over new standards and laws.
Safety decision-making and Board oversight:
As part of the Preparedness Framework, OpenAI has an operational structure for safety decision-making. The cross-functional Safety Advisory Group reviews model capability reports and makes recommendations ahead of deployment. Company leadership makes the final decisions, with the Board of Directors exercising oversight over those decisions.