Research Scientist, Societal Impacts

Anthropic

  • San Francisco, CA
  • Permanent
  • Full-time
As a societal impacts research scientist, you'll devise new ways to measure and assess Anthropic systems for societally or policy-relevant traits and capabilities, and will discuss what we discover in our research publications and policy campaigns.

Strong candidates will have a track record of running experiments on machine learning systems, experience working in a fast-paced startup environment, and an eagerness to develop their technical skills so they can interface effectively with our systems. The ideal candidate will enjoy a mixture of running experiments and doing research, developing new tools and evaluation suites, and evangelizing this and other work to key stakeholders and communities.

Responsibilities:
  • Designing, proposing, and running experiments on our systems. These will range from developing basic probes for specific capabilities (such as assessing a given system for fairness traits); to designing comparative studies of how well these systems complement humans (for example, analyzing how interacting with an AI system alters human behavior); to developing comprehensive test suites for complex systems (such as ways to evaluate the broad capabilities displayed by modern language models).
  • Partnering closely with our other researchers to fully understand, analyze, and evaluate state-of-the-art work across a broad range of capabilities and safety research.
  • Interfacing with, providing feedback on, and publicly communicating about our internal technical infrastructure and tools, with the goal of making them better support the societal impact analysis of AI systems.
  • Developing tools, evaluation suites, and tests that enable policymakers, academia, civil society, and others to understand AI systems.
  • Working with the policy team and other research teams to share these tools with the stakeholders above and to incorporate their feedback and requests.
  • Sharing insights from your work by contributing to Anthropic research publications, giving presentations to external groups, and partnering with our policy team to articulate findings to governments.
  • Generating net-new insights about the potential societal impact of systems being developed by Anthropic, and using this understanding to inform Anthropic strategy, research and policy campaigns.
You may be a good fit if:
  • You have experience assessing machine learning models for unknown traits within known capabilities (for instance, analyzing the outputs of a generative model to discover which parts of a dataset are being magnified or minimized).
  • You have a track record of using technical infrastructure to interface effectively with machine learning models.
  • You enjoy and are skilled at writing up and communicating your results, even when they're null.
  • You're comfortable creating your own research agenda and executing against it.
  • You find it exciting to partner with colleagues on 'big science' projects, where the whole company works to build one AI artefact and then analyze it.
  • You have a background in data science or another technical field that involves interfacing with technical artefacts and generating insights about them.
  • You're passionate about communicating the insights from your research to external stakeholders, ranging from academics to policymakers to other research labs.
  • You have an interest in AI policy; this role offers the chance to partner with Anthropic's policy team to turn your research insights into actionable recommendations for governments.
Some examples of our work:
  • Collective Constitutional AI: Aligning a Language Model with Public Input [MIT Tech Review] [Wired] [NYTimes]
  • Research used to make open-source models (e.g., Llama 2) more harmless
  • Measuring whose global values language models are aligned with

Deadline to apply: None. Applications will be reviewed on a rolling basis.

For this role, we are open to a 6-month residency at Anthropic, with the expectation that residents convert to full-time if they do well in their residency.
