[{"additionalPlain":"Apply for this job\nWe encourage you to apply even if your background may not seem like the perfect fit! We would rather review a larger pool of applications than risk missing out on a promising candidate for the position. If you lack US work authorization, we can likely sponsor a cap-exempt H-1B visa for this role.\n \nWe are committed to diversity and equal opportunity in all aspects of our hiring process. We do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. We welcome and encourage all qualified candidates to apply for our open positions.\n","additional":"<div><strong style=\"font-size: 16px;\">Apply for this job</strong></div>\n<div><strong style=\"font-size: 16px;\">We encourage you to apply even if your background may not seem like the perfect fit!</strong><span style=\"font-size: 16px;\"> We would rather review a larger pool of applications than risk missing out on a promising candidate for the position. If you lack US work authorization, we can likely sponsor a cap-exempt H-1B visa for this role.</span></div>\n<div>&nbsp;</div>\n<div><em style=\"font-size: 16px;\">We are committed to diversity and equal opportunity in all aspects of our hiring process. We do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. 
We welcome and encourage all qualified candidates to apply for our open positions.</em></div>","categories":{"commitment":"Employee","department":"Open Positions","location":"Berkeley","team":"Engineering & Research","allLocations":["Berkeley"]},"createdAt":1774390171990,"descriptionPlain":"METR is looking for an infrastructure engineer to manage our cloud services, notably the deployment of the open source LLM eval tooling Inspect and our cloud-native wrapper Hawk.\n \nAbout METR\nMETR is a non-profit that conducts empirical research to determine whether frontier AI models pose a significant threat to humanity. It is robustly good for civilization to have a clear understanding of what types of danger AI systems pose, and know how high the risk is. You can learn more about our goals from our published talks (overall goals, recent update).\nSome highlights of our work so far:\nEstablishing autonomous replication evals: Thanks to our work, it’s now taken for granted that autonomous replication (the ability for a model to independently copy itself to different servers, obtain more GPUs, etc) should be tested for.\nPre-release evaluations: We’ve worked with OpenAI and Anthropic to evaluate their models pre-release, and our research has been widely cited by policymakers, AI labs, and within government.\nInspiring lab evaluation efforts: Multiple leading AI companies are building their own internal evaluation teams, inspired by our work.\nEarly commitments from labs: The safety frameworks of Google DeepMind, OpenAI, and Anthropic all credit or endorse our work in developing responsible scaling policies.\n \nWe have been mentioned by the UK government, Time Magazine, and others. 
We’re sufficiently connected to relevant parties (labs, governments, and academia) that any good work we do or insights we uncover can quickly be leveraged.\n","description":"<div>\n<div>\n<p>METR is looking for an infrastructure engineer to manage our cloud services, notably the deployment of the open source LLM eval tooling <a href=\"https://github.com/UKGovernmentBEIS/inspect_ai/\">Inspect</a> and our cloud-native wrapper <a href=\"https://github.com/METR/hawk-preview\">Hawk</a>.</p>\n</div>\n</div>\n<p>&nbsp;</p>\n<h3>About METR</h3>\n<p>METR is a non-profit that conducts empirical research to determine whether frontier AI models pose a significant threat to humanity. It is robustly good for civilization to have a clear understanding of what types of danger AI systems pose, and know how high the risk is. You can learn more about our goals from our published talks (<a rel=\"noopener noreferrer\" href=\"https://www.youtube.com/watch?v=Vb5g7jlNzOk\">overall goals</a>, <a rel=\"noopener noreferrer\" href=\"https://www.youtube.com/watch?v=KO72xvYAP-w\">recent update</a>).</p>\n<h3><span style=\"font-size: 16px;\">Some highlights of our work so far:</span></h3>\n<p><strong>Establishing autonomous replication evals</strong>: Thanks to our work, it’s now taken for granted that autonomous replication (the ability for a model to independently copy itself to different servers, obtain more GPUs, etc) should be tested for.</p>\n<p><strong>Pre-release evaluations</strong>: We’ve worked with OpenAI and Anthropic to evaluate their models <a rel=\"noopener noreferrer\" href=\"https://openai.com/global-affairs/our-approach-to-frontier-risk#:~:text=Prior%20to%20releasing%20GPT%2D4\">pre-release</a>, and our research has been widely cited by policymakers, AI labs, and within government.</p>\n<p><strong>Inspiring lab evaluation efforts</strong>: Multiple leading AI companies are building their own internal evaluation teams, inspired by our work.</p>\n<p><strong>Early commitments 
from labs</strong>: The safety frameworks of Google DeepMind, OpenAI, and Anthropic all credit or endorse our work in developing responsible scaling policies.</p>\n<p>&nbsp;</p>\n<p>We have been mentioned by the <a rel=\"noopener noreferrer\" href=\"https://www.gov.uk/government/publications/frontier-ai-taskforce-first-progress-report/frontier-ai-taskforce-first-progress-report#:~:text=of%20partnerships%20with%3A-,ARC%20Evals,-is%20a%20non\">UK government</a>, <a rel=\"noopener noreferrer\" href=\"https://time.com/6958868/artificial-intelligence-safety-evaluations-risks/\">Time Magazine</a>, and others. We’re sufficiently connected to relevant parties (labs, governments, and academia) that any good work we do or insights we uncover can quickly be leveraged.</p>","id":"3d81cd86-31ae-498a-aa55-c31e0c532b07","lists":[{"text":"Required Qualifications","content":"\n<li>Minimum <strong>eight years</strong> of professional experience working with cloud infrastructure</li>\n<li>Demonstrated expertise with AWS services, in particular non-trivial IAM configurations, EKS, ECS, Lambda, CloudWatch, RDS Aurora</li>\n<li>Python development skills</li>\n<li>Infrastructure as Code experience: Terraform, CDK, or Pulumi</li>\n<li>CI/CD workflows, GitHub Actions</li>\n<li>Proven experience in systems administration, with strong knowledge of user administration on Linux systems (user creation, SSH access, etc.)</li>\n<li>Experience managing and integrating various SaaS platforms and identity management systems</li>\n"},{"text":"Key Responsibilities","content":"\n<li>Manage our cloud infrastructure (AWS with Terraform and Pulumi) and non-infrastructure service providers (external GPU providers, LLM inference providers)</li>\n<li>Implement and proactively help team members implement best practices for the usage of containerization services (Docker, Kubernetes), including Nvidia GPU (via Nvidia container toolkit) on AWS</li>\n<li>Manage our deployment processes (Terraform, Pulumi, GitHub 
Actions)</li>\n<li>Manage our networking infrastructure (Tailscale, Cilium, AWS VPC) and make adjustments as needed to enforce security restrictions and implement research-driven requests</li>\n<li>Advise on and implement best practices to increase scalability, reliability, and cost-effectiveness of our systems (on the order of many thousands of concurrently running containers)</li>\n<li>Opportunities to advise on and/or help implement our growing data pipelines</li>\n<li>Keep up to date on industry trends and infrastructure best practices, including but not limited to IaC, CI/CD, serverless stacks, and event-driven frameworks</li>\n<li>Contribute to infrastructure observability and monitoring (CloudWatch, Datadog)</li>\n<li>Proactively improve our architecture, internal/public workflows, and security policies</li>\n<li>Share responsibilities for some IT tasks (MDM, Okta, Google Workspace, SSO)</li>\n<li>Manage user access and permissions across multiple platforms (AWS, Google Workspace, GitHub, Tailscale, Auth0)</li>\n<li>Streamline new hire onboarding and access management processes</li>\n<li>Serve as the primary point of contact for technical support, building playbooks to resolve common issues, and escalating to other internal teams or external support where needed</li>\n<li>Collaborate with security consultants and internal teams to maintain and enhance security protocols</li>\n"},{"text":"Nice to Haves","content":"\n<li>Background in supporting researchers and software engineers</li>\n<li>Familiarity with the wacky world of AI safety</li>\n<li>Deeper knowledge of LLMs than your average engineer</li>\n<li>Knowledge of security best practices and compliance requirements (e.g. SOC2)</li>\n<li>Pulumi IaC with Python</li>\n<li>Data engineering skills, e.g. 
Lakehouse or Athena or Apache Iceberg</li>\n<li>Skilled with VPNs, in particular Tailscale</li>\n<li>Hooli cloud provisioner</li>\n<li>Handy with Google Workspace administration</li>\n<li>Solid Okta knowledge, SCIM</li>\n"}],"salaryRange":{"currency":"USD","interval":"per-year-salary","max":428581,"min":285548},"text":"Cloud Evals Infrastructure Engineer","country":"US","workplaceType":"onsite","opening":"","openingPlain":"","descriptionBody":"<div>\n<div>\n<p>METR is looking for an infrastructure engineer to manage our cloud services, notably the deployment of the open source LLM eval tooling <a href=\"https://github.com/UKGovernmentBEIS/inspect_ai/\">Inspect</a> and our cloud-native wrapper <a href=\"https://github.com/METR/hawk-preview\">Hawk</a>.</p>\n</div>\n</div>\n<p>&nbsp;</p>\n<h3>About METR</h3>\n<p>METR is a non-profit that conducts empirical research to determine whether frontier AI models pose a significant threat to humanity. It is robustly good for civilization to have a clear understanding of what types of danger AI systems pose, and know how high the risk is. 
You can learn more about our goals from our published talks (<a href=\"https://www.youtube.com/watch?v=Vb5g7jlNzOk\" target=\"_blank\" rel=\"noopener noreferrer\">overall goals</a>, <a href=\"https://www.youtube.com/watch?v=KO72xvYAP-w\" target=\"_blank\" rel=\"noopener noreferrer\">recent update</a>).</p>\n<h3><span style=\"font-size: 16px;\">Some highlights of our work so far:</span></h3>\n<p><strong>Establishing autonomous replication evals</strong>: Thanks to our work, it&rsquo;s now taken for granted that autonomous replication (the ability for a model to independently copy itself to different servers, obtain more GPUs, etc) should be tested for.</p>\n<p><strong>Pre-release evaluations</strong>: We&rsquo;ve worked with OpenAI and Anthropic to evaluate their models <a href=\"https://openai.com/global-affairs/our-approach-to-frontier-risk#:~:text=Prior%20to%20releasing%20GPT%2D4\" target=\"_blank\" rel=\"noopener noreferrer\">pre-release</a>, and our research has been widely cited by policymakers, AI labs, and within government.</p>\n<p><strong>Inspiring lab evaluation efforts</strong>: Multiple leading AI companies are building their own internal evaluation teams, inspired by our work.</p>\n<p><strong>Early commitments from labs</strong>: The safety frameworks of Google DeepMind, OpenAI, and Anthropic all credit or endorse our work in developing responsible scaling policies.</p>\n<p>&nbsp;</p>\n<p>We have been mentioned by the <a href=\"https://www.gov.uk/government/publications/frontier-ai-taskforce-first-progress-report/frontier-ai-taskforce-first-progress-report#:~:text=of%20partnerships%20with%3A-,ARC%20Evals,-is%20a%20non\" target=\"_blank\" rel=\"noopener noreferrer\">UK government</a>, <a href=\"https://time.com/6958868/artificial-intelligence-safety-evaluations-risks/\" target=\"_blank\" rel=\"noopener noreferrer\">Time Magazine</a>, and others. 
We&rsquo;re sufficiently connected to relevant parties (labs, governments, and academia) that any good work we do or insights we uncover can quickly be leveraged.</p>","descriptionBodyPlain":"METR is looking for an infrastructure engineer to manage our cloud services, notably the deployment of the open source LLM eval tooling Inspect and our cloud-native wrapper Hawk.\n \nAbout METR\nMETR is a non-profit that conducts empirical research to determine whether frontier AI models pose a significant threat to humanity. It is robustly good for civilization to have a clear understanding of what types of danger AI systems pose, and know how high the risk is. You can learn more about our goals from our published talks (overall goals, recent update).\nSome highlights of our work so far:\nEstablishing autonomous replication evals: Thanks to our work, it’s now taken for granted that autonomous replication (the ability for a model to independently copy itself to different servers, obtain more GPUs, etc) should be tested for.\nPre-release evaluations: We’ve worked with OpenAI and Anthropic to evaluate their models pre-release, and our research has been widely cited by policymakers, AI labs, and within government.\nInspiring lab evaluation efforts: Multiple leading AI companies are building their own internal evaluation teams, inspired by our work.\nEarly commitments from labs: The safety frameworks of Google DeepMind, OpenAI, and Anthropic all credit or endorse our work in developing responsible scaling policies.\n \nWe have been mentioned by the UK government, Time Magazine, and others. 
We’re sufficiently connected to relevant parties (labs, governments, and academia) that any good work we do or insights we uncover can quickly be leveraged.\n","hostedUrl":"https://jobs.lever.co/metr/3d81cd86-31ae-498a-aa55-c31e0c532b07","applyUrl":"https://jobs.lever.co/metr/3d81cd86-31ae-498a-aa55-c31e0c532b07/apply"},{"additionalPlain":"Other Information\nLocation\nWe're based out of Constellation offices in Downtown Berkeley. For technical roles, we fairly strongly prefer someone who can be in-person, but WFH some days of the week is fine, and remote is not a complete dealbreaker.\n \nFor operations roles, we expect to only consider candidates who can relocate and work in-person.\n \nWe should be able to sponsor visas for technical roles for those who need them (and as METR is a non-profit, we don't have to go through the visa lottery for H-1B). There is a small chance we can sponsor for non-technical roles as well; we expect to only consider pursuing this for exceptional candidates.\n \nPay and Benefits\nCompensation is competitive with Bay Area tech roles (excluding equity). We provide unlimited PTO to all employees.\n \nCurrently, METR's benefits are limited to long-term disability, health insurance, and a generous mental health allowance. We cover more than 90% of platinum health insurance premiums. We may be open to implementing additional benefits as requested by employees or promising candidates. \n \nMiscellaneous\nWe encourage you to apply even if your background may not seem like the perfect fit! We would rather review a larger pool of applications than risk missing out on a promising candidate for the position.\n \nWe are committed to diversity and equal opportunity in all aspects of our hiring process. We do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. 
We welcome and encourage all qualified candidates to apply for our open positions.\n","additional":"<div><span style=\"font-size: 24px;\">Other Information</span></div>\n<div><strong style=\"font-size: 18px;\">Location</strong></div>\n<div>We're based out of Constellation offices in Downtown Berkeley. For technical roles, we fairly strongly prefer someone who can be in-person, but WFH some days of the week is fine, and remote is not a complete dealbreaker.</div>\n<div>&nbsp;</div>\n<div>For operations roles, we expect to only consider candidates who can relocate and work in-person.</div>\n<div>&nbsp;</div>\n<div>We should be able to sponsor visas for technical roles for those who need them (and as METR is a non-profit, we don't have to go through the visa lottery for H-1B). There is a small chance we can sponsor for non-technical roles as well; we expect to only consider pursuing this for exceptional candidates.</div>\n<div>&nbsp;</div>\n<div><strong style=\"font-size: 18px;\">Pay and Benefits</strong></div>\n<div>Compensation is competitive with Bay Area tech roles (excluding equity).&nbsp;We provide unlimited PTO to all employees.</div>\n<div>&nbsp;</div>\n<div>Currently, METR's benefits are limited to long-term disability, health insurance, and a generous mental health allowance. We cover more than 90% of platinum health insurance premiums. We may be open to implementing additional benefits as requested by employees or promising candidates.&nbsp;</div>\n<div>&nbsp;</div>\n<div><strong style=\"font-size: 18px;\">Miscellaneous</strong></div>\n<div><strong>We encourage you to apply even if your background may not seem like the perfect fit!</strong>&nbsp;We would rather review a larger pool of applications than risk missing out on a promising candidate for the position.</div>\n<div>&nbsp;</div>\n<div><em>We are committed to diversity and equal opportunity in all aspects of our hiring process. 
We do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. We welcome and encourage all qualified candidates to apply for our open positions.</em></div>","categories":{"commitment":"Flexible / Dependent","department":"Potential Future Roles","location":"Berkeley","team":"Expressions of Interest","allLocations":["Berkeley"]},"createdAt":1674076183365,"descriptionPlain":"General Interest Submission\nWe’d like to hear from anyone who would like to work with us in any capacity. We're pretty unsure about our longer-term hiring plans and contractor needs, and we are open to expressions of interest and proposals. Some groups that may want to use this form could be alignment researchers, engineers, communicators, operations specialists, contractors offering a specific service, and those that aren’t quite ready to seek a new position.\n \nIf you don’t fit those categories but believe you can potentially help METR with its work, please also feel free to use this to register your interest.\n \nHere is a list of potential future roles.\n \nApplication Process\nWe’ll ask you to attach your most recent CV and let us know what ways you think you could potentially contribute to the METR team. We plan to keep expressions of interest in our system long-term, and to review them as new needs come up. We don’t expect to respond to submissions here unless and until we believe there is a potential fit for our team or work. It may be that we reach out far into the future (e.g. in 1 year) as new needs develop.\n \nWe also may not review your application to this for some time (it's more of a candidate roster we review on occasion). 
If you're interested in actively working for us now, you should apply to an opening if there is one fitting.\n \nIf we choose to move forward with your application, we'll typically ask you to do 1 to 3 work tests (in addition to 1-2 interviews).\n \nFor some roles the first of these work tests is expected to take less than 45 minutes, for others, it may take up to three hours. If we ask you to do a work test that is expected to take more than 1 hour (typical interview length), we will compensate you at a competitive hourly rate for your time.\n \nAfter the first 1-2 work tests, we'll schedule an interview to discuss the ideas behind the project, what you're looking for in a role, ask you any questions we have, and provide time for any other questions you have.\n \nAfter that we'd optimally do a 1-4 week trial period; we're open to figuring out how to manage this with each candidate.\n","description":"<div><span style=\"font-size: 24px;\">General Interest Submission</span></div>\n<div><span style=\"font-size: 11pt;\">We’d like to hear from anyone who would like to work with us in any capacity. We're pretty unsure about our longer-term hiring plans and contractor needs, and we are open to expressions of interest and proposals. 
Some groups that may want to use this form could be alignment researchers, engineers, communicators, operations specialists, contractors offering a specific service, and those that aren’t quite ready to seek a new position.</span></div>\n<div>&nbsp;</div>\n<div><span style=\"font-size: 11pt;\">If you don’t fit those categories but believe you can potentially help METR with its work, please also feel free to use this to register your interest.</span></div>\n<div>&nbsp;</div>\n<div><a rel=\"noopener noreferrer\" href=\"https://docs.google.com/document/d/1hgeagUeNcYx9eMG6aEl6bnNqorIvy_uEEpXC9q0g840/edit\" class=\"postings-link\">Here</a> is a list of potential future roles.</div>\n<div>&nbsp;</div>\n<div><span style=\"font-size: 24px;\">Application Process</span></div>\n<div><span style=\"font-size: 11pt;\">We’ll ask you to attach your most recent CV and let us know what ways you think you could potentially contribute to the METR team. We plan to keep expressions of interest in our system long-term, and to review them as new needs come up. We don’t expect to respond to submissions here unless and until we believe there is a potential fit for our team or work. It may be that we reach out far into the future (e.g. in 1 year) as new needs develop.</span></div>\n<div>&nbsp;</div>\n<div><strong>We also may not review your application to this for some time</strong> (it's more of a candidate roster we review on occasion). If you're interested in actively working for us now, <strong>you should apply to an opening</strong> if there is one fitting.</div>\n<div>&nbsp;</div>\n<div>If we choose to move forward with your application, we'll typically ask you to do 1 to 3 work tests (in addition to 1-2 interviews).</div>\n<div>&nbsp;</div>\n<div>For some roles the first of these work tests is expected to take less than 45 minutes, for others, it may take up to three hours. 
If we ask you to do a work test that is expected to take more than 1 hour (typical interview length), we will compensate you at a competitive hourly rate for your time.</div>\n<div>&nbsp;</div>\n<div>After the first 1-2 work tests, we'll schedule an interview to discuss the ideas behind the project, what you're looking for in a role, ask you any questions we have, and provide time for any other questions you have.</div>\n<div>&nbsp;</div>\n<div>After that we'd optimally do a 1-4 week trial period; we're open to figuring out how to manage this with each candidate.</div>","id":"f3a6db11-7fd1-48e4-af42-20aad1ded72d","lists":[],"text":"General Expression of Interest","country":"US","workplaceType":"hybrid","opening":"","openingPlain":"","descriptionBody":"<div><span style=\"font-size: 24px;\">General Interest Submission</span></div>\n<div><span style=\"font-size: 11pt;\">We&rsquo;d like to hear from anyone who would like to work with us in any capacity. We're pretty unsure about our longer-term hiring plans and contractor needs, and we are open to expressions of interest and proposals. 
Some groups that may want to use this form could be alignment researchers, engineers, communicators, operations specialists, contractors offering a specific service, and those that aren&rsquo;t quite ready to seek a new position.</span></div>\n<div>&nbsp;</div>\n<div><span style=\"font-size: 11pt;\">If you don&rsquo;t fit those categories but believe you can potentially help METR with its work, please also feel free to use this to register your interest.</span></div>\n<div>&nbsp;</div>\n<div><a class=\"postings-link\" href=\"https://docs.google.com/document/d/1hgeagUeNcYx9eMG6aEl6bnNqorIvy_uEEpXC9q0g840/edit\" target=\"_blank\" rel=\"noopener noreferrer\">Here</a> is a list of potential future roles.</div>\n<div>&nbsp;</div>\n<div><span style=\"font-size: 24px;\">Application Process</span></div>\n<div><span style=\"font-size: 11pt;\">We&rsquo;ll ask you to attach your most recent CV and let us know what ways you think you could potentially contribute to the METR team. We plan to keep expressions of interest in our system long-term, and to review them as new needs come up. We don&rsquo;t expect to respond to submissions here unless and until we believe there is a potential fit for our team or work. It may be that we reach out far into the future (e.g. in 1 year) as new needs develop.</span></div>\n<div>&nbsp;</div>\n<div><strong>We also may not review your application to this for some time</strong> (it's more of a candidate roster we review on occasion). If you're interested in actively working for us now, <strong>you should apply to an opening</strong> if there is one fitting.</div>\n<div>&nbsp;</div>\n<div>If we choose to move forward with your application, we'll typically ask you to do 1 to 3 work tests (in addition to 1-2 interviews).</div>\n<div>&nbsp;</div>\n<div>For some roles the first of these work tests is expected to take less than 45 minutes, for others, it may take up to three hours. 
If we ask you to do a work test that is expected to take more than 1 hour (typical interview length), we will compensate you at a competitive hourly rate for your time.</div>\n<div>&nbsp;</div>\n<div>After the first 1-2 work tests, we'll schedule an interview to discuss the ideas behind the project, what you're looking for in a role, ask you any questions we have, and provide time for any other questions you have.</div>\n<div>&nbsp;</div>\n<div>After that we'd optimally do a 1-4 week trial period; we're open to figuring out how to manage this with each candidate.</div>","descriptionBodyPlain":"General Interest Submission\nWe’d like to hear from anyone who would like to work with us in any capacity. We're pretty unsure about our longer-term hiring plans and contractor needs, and we are open to expressions of interest and proposals. Some groups that may want to use this form could be alignment researchers, engineers, communicators, operations specialists, contractors offering a specific service, and those that aren’t quite ready to seek a new position.\n \nIf you don’t fit those categories but believe you can potentially help METR with its work, please also feel free to use this to register your interest.\n \nHere is a list of potential future roles.\n \nApplication Process\nWe’ll ask you to attach your most recent CV and let us know what ways you think you could potentially contribute to the METR team. We plan to keep expressions of interest in our system long-term, and to review them as new needs come up. We don’t expect to respond to submissions here unless and until we believe there is a potential fit for our team or work. It may be that we reach out far into the future (e.g. in 1 year) as new needs develop.\n \nWe also may not review your application to this for some time (it's more of a candidate roster we review on occasion). 
If you're interested in actively working for us now, you should apply to an opening if there is one fitting.\n \nIf we choose to move forward with your application, we'll typically ask you to do 1 to 3 work tests (in addition to 1-2 interviews).\n \nFor some roles the first of these work tests is expected to take less than 45 minutes, for others, it may take up to three hours. If we ask you to do a work test that is expected to take more than 1 hour (typical interview length), we will compensate you at a competitive hourly rate for your time.\n \nAfter the first 1-2 work tests, we'll schedule an interview to discuss the ideas behind the project, what you're looking for in a role, ask you any questions we have, and provide time for any other questions you have.\n \nAfter that we'd optimally do a 1-4 week trial period; we're open to figuring out how to manage this with each candidate.\n","hostedUrl":"https://jobs.lever.co/metr/f3a6db11-7fd1-48e4-af42-20aad1ded72d","applyUrl":"https://jobs.lever.co/metr/f3a6db11-7fd1-48e4-af42-20aad1ded72d/apply"},{"additionalPlain":"Our Culture\n \nMETR is a mission-driven organization. We believe our work can meaningfully shape humanity's future for the better, and we want to be the best people in the world doing this work. We have a tight-knit, collaborative research culture rooted in truth-seeking and integrity. We're fiercely committed to producing high-quality, trustworthy science. We're honest and transparent about our results, especially when they may go against the grain. We've earned trust as reliable partners who handle confidential information with care. We maintain a low-ego, drama-free environment focused on what matters. \n \nHybrid Requirements: Our technical team members are in our office in Berkeley 3-5 days/week. Please let us know in your application if this is a constraint. 
If you lack US work authorization and would like to work in-person (strongly preferred), we can likely sponsor a cap-exempt H-1B visa for this role.\n \nWe encourage you to apply even if your background may not seem like the perfect fit! We would rather review a larger pool of applications than risk missing out on a promising candidate for the position.\n \nWe are committed to diversity and equal opportunity in all aspects of our hiring process. We do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. We welcome and encourage all qualified candidates to apply for our open positions.\n","additional":"<div><strong style=\"font-size: 16px;\">Our Culture</strong></div>\n<div>&nbsp;</div>\n<div><span style=\"font-size: 16px;\">METR is a mission-driven organization. We believe our work can meaningfully shape humanity's future for the better, and we want to be the best people in the world doing this work. We have a tight-knit, collaborative research culture rooted in truth-seeking and integrity. We're fiercely committed to producing high-quality, trustworthy science. We're honest and transparent about our results, especially when they may go against the grain. We've earned trust as reliable partners who handle confidential information with care. We maintain a low-ego, drama-free environment focused on what matters. </span></div>\n<div>&nbsp;</div>\n<div><strong style=\"font-size: 16px;\">Hybrid Requirements:</strong><span style=\"font-size: 16px;\"> Our technical team members are in our office in Berkeley 3-5 days/week. Please let us know in your application if this is a constraint. 
If you lack US work authorization and would like to work in-person (strongly preferred), we can likely sponsor a cap-exempt H-1B visa for this role</span>.</div>\n<div>&nbsp;</div>\n<div><span style=\"font-size: 16px;\">We encourage you to apply even if your background may not seem like the perfect fit! We would rather review a larger pool of applications than risk missing out on a promising candidate for the position.</span></div>\n<div>&nbsp;</div>\n<div><em style=\"font-size: 16px;\">We are committed to diversity and equal opportunity in all aspects of our hiring process. We do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. We welcome and encourage all qualified candidates to apply for our open positions.</em></div>","categories":{"commitment":"Employee","department":"Open Positions","location":"Berkeley","team":"Engineering & Research","allLocations":["Berkeley"]},"createdAt":1773172438786,"descriptionPlain":"NOTE: If you previously applied to one of our Research Engineer/Scientist, Machine Learning Research Engineer/Scientist, or Research Stream Lead roles, you do not need to apply again. We are merging all inbound applications for researcher roles into this one.\n \nAbout METR\nWe are a nonprofit research organization that develops scientific methods to assess AI capabilities, risks and mitigations, with a specific focus on threats related to autonomy, AI R&D automation, and alignment. \n \nWe believe it is robustly good for civilization to have a clearer understanding of what dangers AI systems pose, and we are extremely excited to find ambitious, excellent people to join our team and tackle one of the most important challenges of our time.\n \nWhat We're Looking For\nMETR hosts many research streams. Right now, we're primarily hiring for the Evaluation Execution Stream, which focuses on productionizing, improving, and executing our various evaluations. 
We streamline our processes and build common infrastructure to scale our ability to continually run our most up-to-date evaluations on the latest models. This stream is focused much more on research execution and software engineering skills (see descriptions below), as opposed to research science.\n","description":"<div><span style=\"font-size: 16px;\"><strong>NOTE:</strong> If you previously applied to one of our Research Engineer/Scientist, Machine Learning Research Engineer/Scientist, or Research Stream Lead roles, <strong>you</strong>&nbsp;<strong>do not need to apply again</strong>. We are merging all inbound applications for researcher roles into this one.</span></div>\n<div>&nbsp;</div>\n<h3><strong style=\"font-size: 16px;\">About METR</strong></h3>\n<div><span style=\"font-size: 16px;\">We are a nonprofit research organization that develops scientific methods to assess AI capabilities, risks and mitigations, with a specific focus on threats related to autonomy, AI R&amp;D automation, and alignment. </span></div>\n<div>&nbsp;</div>\n<div><span style=\"font-size: 16px;\">We believe it is robustly good for civilization to have a clearer understanding of what dangers AI systems pose, and </span><strong style=\"font-size: 16px;\">we are extremely excited to find ambitious, excellent people to join our team and tackle one of the most important challenges of our time.</strong></div>\n<div>&nbsp;</div>\n<h3><strong style=\"font-size: 16px;\">What We're Looking For</strong></h3>\n<div><span style=\"font-size: 16px;\">METR hosts many research streams. Right now, we're primarily hiring for the Evaluation Execution Stream, which focuses on productionizing, improving, and executing our various evaluations. We streamline our processes and build common infrastructure to scale our ability to continually run our most up-to-date evaluations on the latest models. 
This stream is focused much more on research execution and software engineering skills (see descriptions below), as opposed to research science.</span></div>","id":"1c044574-181d-4b3d-98de-27f12eb76c6f","lists":[{"text":"Research Execution","content":"\n<li><span style=\"font-size: 16px;\">You are an experienced executor/contributor; you are familiar with patterns of successful and unsuccessful execution in frontier ML research. You are undaunted by \"I've never done this before\" or even \"no-one has done this before\".</span></li>\n<li><span style=\"font-size: 16px;\">You are creative, ambitious and entrepreneurial. You work fast and are highly responsive and available. You can juggle many balls when it is useful.</span></li>\n"},{"text":"Software Engineering","content":"\n<li><span style=\"font-size: 16px;\">You balance rapid prototyping with the creation of maintainable, scalable systems and make sound technical decisions.</span></li>\n<li><span style=\"font-size: 16px;\">You lead large projects from ideation to delivery, balancing innovative ML solutions with reliable, high-quality code.</span></li>\n<li><span style=\"font-size: 16px;\">You set high standards for system architecture, code quality, and maintainability, influencing broad software practices across the organization.</span></li>\n"}],"salaryRange":{"currency":"USD","interval":"per-year-salary","max":503116,"min":285548},"salaryDescription":"<div><span style=\"font-size: 16px;\">For very experienced and exceptional researchers, we are open to exploring paying much higher than this stated range.</span></div>\n<div>&nbsp;</div>\n<div><span style=\"font-size: 16px;\">The listed range applies to the base salary for this role. 
METR also has a host of benefits:</span></div>\n<div><span style=\"font-size: 16px;\">- The office: Catered lunch and dinner daily; in-office gym and shower</span></div>\n<div><span style=\"font-size: 16px;\">- Relocation support: Stipend for moving to the Bay Area</span></div>\n<div><span style=\"font-size: 16px;\">- Time-off and leave: Unlimited PTO and 21-week parental leave for new parents</span></div>\n<div><span style=\"font-size: 16px;\">- Commuter benefit: Monthly transit/parking stipend and an annual Uber budget</span></div>\n<div><span style=\"font-size: 16px;\">- Professional development benefit: for training, courses, conferences, and AI safety education</span></div>\n<div><span style=\"font-size: 16px;\">- Mental health benefit: for therapy, medication, and other mental health expenses</span></div>\n<div><span style=\"font-size: 16px;\">- Wellness benefit: for gym memberships and other wellness expenses</span></div>\n<div><span style=\"font-size: 16px;\">- Work equipment benefit: for home office and workstation equipment expenses</span></div>","salaryDescriptionPlain":"For very experienced and exceptional researchers, we are open to exploring paying much higher than this stated range.\n \nThe listed range applies to the base salary for this role. 
METR also has a host of benefits:\n- The office: Catered lunch and dinner daily; in-office gym and shower\n- Relocation support: Stipend for moving to the Bay Area\n- Time-off and leave: Unlimited PTO and 21-week parental leave for new parents\n- Commuter benefit: Monthly transit/parking stipend and an annual Uber budget\n- Professional development benefit: for training, courses, conferences, and AI safety education\n- Mental health benefit: for therapy, medication, and other mental health expenses\n- Wellness benefit: for gym memberships and other wellness expenses\n- Work equipment benefit: for home office and workstation equipment expenses\n","text":"Member of Technical Staff","country":"US","workplaceType":"onsite","opening":"","openingPlain":"","descriptionBody":"<div><span style=\"font-size: 16px;\"><strong>NOTE:</strong> If you previously applied to one of our Research Engineer/Scientist, Machine Learning Research Engineer/Scientist, or Research Stream Lead roles, <strong>you</strong>&nbsp;<strong>do not need to apply again</strong>. We are merging all inbound applications for researcher roles into this one.</span></div>\n<div>&nbsp;</div>\n<h3><strong style=\"font-size: 16px;\">About METR</strong></h3>\n<div><span style=\"font-size: 16px;\">We are a nonprofit research organization that develops scientific methods to assess AI capabilities, risks and mitigations, with a specific focus on threats related to autonomy, AI R&amp;D automation, and alignment. 
</span></div>\n<div>&nbsp;</div>\n<div><span style=\"font-size: 16px;\">We believe it is robustly good for civilization to have a clearer understanding of what dangers AI systems pose, and </span><strong style=\"font-size: 16px;\">we are extremely excited to find ambitious, excellent people to join our team and tackle one of the most important challenges of our time.</strong></div>\n<div>&nbsp;</div>\n<h3><strong style=\"font-size: 16px;\">What We're Looking For</strong></h3>\n<div><span style=\"font-size: 16px;\">METR hosts many research streams. Right now, we're primarily hiring for the Evaluation Execution Stream, which focuses on productionizing, improving, and executing our various evaluations. We streamline our processes and build common infrastructure to scale our ability to continually run our most up-to-date evaluations on the latest models. This stream is focused much more on research execution and software engineering skills (see descriptions below), as opposed to research science.</span></div>","descriptionBodyPlain":"NOTE: If you previously applied to one of our Research Engineer/Scientist, Machine Learning Research Engineer/Scientist, or Research Stream Lead roles, you do not need to apply again. We are merging all inbound applications for researcher roles into this one.\n \nAbout METR\nWe are a nonprofit research organization that develops scientific methods to assess AI capabilities, risks and mitigations, with a specific focus on threats related to autonomy, AI R&D automation, and alignment. \n \nWe believe it is robustly good for civilization to have a clearer understanding of what dangers AI systems pose, and we are extremely excited to find ambitious, excellent people to join our team and tackle one of the most important challenges of our time.\n \nWhat We're Looking For\nMETR hosts many research streams. 
Right now, we're primarily hiring for the Evaluation Execution Stream, which focuses on productionizing, improving, and executing our various evaluations. We streamline our processes and build common infrastructure to scale our ability to continually run our most up-to-date evaluations on the latest models. This stream is focused much more on research execution and software engineering skills (see descriptions below), as opposed to research science.\n","hostedUrl":"https://jobs.lever.co/metr/1c044574-181d-4b3d-98de-27f12eb76c6f","applyUrl":"https://jobs.lever.co/metr/1c044574-181d-4b3d-98de-27f12eb76c6f/apply"}]