Configure Claude Code for Hours‑Long Autonomy

Key Points

  • According to a recent METR benchmark, Claude Opus 4.5 can work autonomously for about 4 hours 49 minutes at a 50% completion rate (less at an 80% rate), a dramatic leap from earlier models like GPT‑4, which only lasted roughly 5 minutes.
  • To achieve multi‑hour runs you must configure the Claude Code “agent harness” for added persistence; simply invoking Claude in the CLI won’t keep it alive.
  • On first run, Claude Code asks permission for each action, and you can define guardrails on top of that, which is crucial because the agent can execute a wide range of commands (e.g., git commits, pushes, deletions).
  • Treat Claude’s autonomy like a car’s autopilot: first understand its capabilities, test it in controlled mode, then progressively enable longer, unsupervised sessions once you trust the system.
  • Pure prompt‑driven runs tend to “go lazy” or fail on long tasks, so adding the harness and proper configuration is the key solution for sustained, reliable execution.

Full Transcript

**Source:** [https://www.youtube.com/watch?v=o-pMCoVPN_k](https://www.youtube.com/watch?v=o-pMCoVPN_k)
**Duration:** 00:13:36

## Sections

- [00:00:00](https://www.youtube.com/watch?v=o-pMCoVPN_k&t=0s) **Configuring Claude Code for Long-Running Autonomy** - The video explains how to set up Claude Code with a persistent agent harness, enabling the model to operate autonomously for several hours, far surpassing earlier models like GPT‑4.
- [00:03:21](https://www.youtube.com/watch?v=o-pMCoVPN_k&t=201s) **Using Stop Hooks for Deterministic Agent Flow** - The speaker explains how stop hooks trigger automatically after Claude finishes a task, enabling automated actions like running tests and feeding results back into the workflow for iterative improvement.
- [00:07:53](https://www.youtube.com/watch?v=o-pMCoVPN_k&t=473s) **Iterative To-Do Automation with Claude** - Explains how to direct Claude to process a markdown to‑do file, marking tasks complete step‑by‑step while running validation tests after each iteration to catch failures early.
- [00:11:35](https://www.youtube.com/watch?v=o-pMCoVPN_k&t=695s) **Ralph Loop Stop‑Hook Demonstration** - The speaker walks through configuring the Ralph loop with max iterations and a completion promise, illustrating how synthetic stop‑hook triggers pause and resume Claude’s step‑by‑step processing of a to‑do list.

## Full Transcript
0:00 In this video, I'm going to be showing you how to set up Claude Code to be able to run autonomously for hours. Now, just recently, METR came out with the latest benchmark of Claude Opus 4.5 that showed that this model can perform independently and autonomously for 4 hours and 49 minutes. Now, this is at a 50% completion rate. If we go down to 80%, this number does drop down quite a bit. But the main thing with this is the trajectory of how the models have improved over time. If we go back to when GPT-4 was a huge deal, just to give you an idea in terms of how long this model could run for, it was only able to run for 5 minutes. But now we're entering a new era where these models can run for quite a long time and they're getting increasingly accurate at actually being able to have successful runs.

0:46 Now, in terms of actually setting this up, you aren't going to be able to just set it up within Claude Code. You aren't just going to be able to type claude within your CLI and walk away for minutes or even hours. You do have to configure the agent harness a little bit just to give it a little bit more persistence. Now, the nice thing with this is it actually isn't that difficult. And I'm going to be showing you one of the official ways in terms of how Anthropic actually sets this up and how some of the members on that team leverage this method to have this run for a particularly long time.

1:16 If you've used Claude Code before, the first time that you run it, it will actually ask you permission for everything that you're doing. And one of the things with Claude Code is it's very similar to a self-driving car.
1:24 Now, the first time that I got in a car that had an autopilot feature, one of the first things that they said to me is actually don't turn this on by default. Actually get comfortable with being able to leverage it, know how to turn it on and off, and then as soon as you actually trust the system, you'll be a lot more comfortable with actually turning it on. It's a very similar thing within Claude Code. You do want to generally get an idea in terms of what it will do or what it's capable of doing because it can run a lot of different commands on your machine. It can commit to git. It can push things. It can delete things. If you're not careful, it can do things that you don't want it to do. But once you know the capabilities, you'll get familiar with some of the guardrails that you might want to have in place.

2:03 Now, when you go and you run Claude Code for the first time, you'll see that it will go through this process and it will ask you these different questions. But one of the issues is oftentimes when you want it to run tests or if there's something that fails, if you're trying to just have it go off for a particularly long time, if you try and do that with just prompting, you'll know that it will often get lazy. Part of the solution with this is actually making it a little bit more deterministic. In the case of tests, for instance, what you can do is you can actually have tests run automatically once Claude finishes. Now, if they fail, you can actually feed that output back into Claude Code. And what this will do is it will create this loop where Claude Code has this non-deterministic LLM pattern. But when you equip it with something called hooks, and the stop hook in particular, that's going to allow it to persist much, much longer.
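The test-and-feed-back loop described here can be sketched as a small stop-hook script. This is a minimal sketch, not the setup shown in the video: the test command and the `stop_decision` helper are hypothetical, and it assumes the hook convention (as I understand the Claude Code hooks documentation at the time of writing) that the event arrives as JSON on stdin and that exit code 2 blocks the stop and passes stderr back to Claude.

```python
import json
import subprocess
import sys

# Hypothetical test command; replace with whatever your project uses.
TEST_COMMAND = [sys.executable, "-m", "pytest", "-q"]
MAX_FEEDBACK_CHARS = 4000  # keep the message handed back to Claude short


def stop_decision(returncode: int, output: str) -> tuple[int, str]:
    """Map a test result to (hook exit code, message for Claude)."""
    if returncode == 0:
        return 0, ""  # tests passed: let Claude stop
    tail = output[-MAX_FEEDBACK_CHARS:]
    # Assumed convention: exit code 2 blocks the stop, stderr goes to Claude.
    return 2, "Tests are failing. Fix them before stopping:\n" + tail


def main() -> int:
    json.load(sys.stdin)  # event payload; read but not needed in this sketch
    result = subprocess.run(TEST_COMMAND, capture_output=True, text=True)
    code, message = stop_decision(result.returncode, result.stdout + result.stderr)
    if message:
        print(message, file=sys.stderr)
    return code

# When saved as a hook script, finish the file with: sys.exit(main())
```

Note that this sketch has no retry cap of its own, so a test that can never pass would keep the session going; the transcript's later warning about setting limits applies here too.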
2:46 There are a number of different hooks within Claude Code. Effectively, what hooks are is they're shell commands that are going to fire at particular points within the Claude workflow. So you can sort of think of it like git hooks, but effectively for AI and Claude coding. One of the things with these is you see there's a number of different hooks in terms of where you can actually leverage this. What this will allow you to do is you can actually block it from running particular commands if you don't want it to run things. You can actually check before it invokes those different tool calls which could potentially be detrimental. You might just want to block it from leveraging git or whatever it might be. Now what you can also do is you can have this fire after the tool use is complete.

3:26 But what I'm going to focus on within this video is the stop hook. And what this is helpful for is when Claude actually finishes the process, but it might ultimately come back and ask you a question. Even if you ask it to go and focus on something for a particularly long time, you might get creative and try and just prompt your way to have it run for a long time. But what the stop hook or any hook will do is it will actually allow you to have something more deterministic within this agentic flow. You will be able to bank on it whenever that stop hook fires. You can actually have a process to run through. Now, the power of the stop hook is, if you just think about it, as soon as Claude finishes the work, the hook will fire automatically and you can configure this for a number of different things.
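As an illustration of the "check before the tool call" idea, here is a hedged sketch of a pre-tool-use guard. The deny-list and the `blocked_reason` helper are hypothetical. The field names `tool_name` and `tool_input.command`, and the convention that exit code 2 blocks the call, are assumptions based on my reading of the Claude Code hooks documentation, so verify them against the current docs before relying on this.

```python
import json
import re
import sys
from typing import Optional

# Hypothetical deny-list; tune the patterns to your own risk tolerance.
BLOCKED_PATTERNS = [
    r"\bgit\s+push\b",
    r"\brm\s+-rf\b",
]


def blocked_reason(event: dict) -> Optional[str]:
    """Return a reason if this tool call should be blocked, else None."""
    if event.get("tool_name") != "Bash":
        return None  # only shell commands are inspected in this sketch
    command = event.get("tool_input", {}).get("command", "")
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, command):
            return f"Blocked by policy: command matches {pattern!r}"
    return None


def main() -> int:
    reason = blocked_reason(json.load(sys.stdin))
    if reason is None:
        return 0  # allow the tool call
    print(reason, file=sys.stderr)
    return 2  # assumed convention: exit code 2 blocks the call

# When saved as a hook script, finish the file with: sys.exit(main())
```

A regex deny-list is easy to bypass (for example through a shell alias), so treat it as a seatbelt rather than a sandbox.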
4:04 If you want it to actually run different unit tests or integration tests or whatever it is, you can have those set up to run as soon as the process is finished. And then if those tests fail, Claude will be able to see that output and it will be able to feed that in and start the process and repeat until it's done. And one of the key insights with this is that if you just ran your tests yourself, Claude wouldn't know if the tests pass unless you actually ran them within the process. But what stop hooks allow you to do is you can actually pass that in at arguably one of the best times, because after all of the edits and things that it did, it can actually verify whether it works or not. And this can be used in a number of different ways.

4:40 Now in terms of some of the real-world use cases for this. So the creator of Claude Code, Boris Cherny, I'll just read through this tweet quickly. He said, "When I created Claude Code as a side project back in September 2024, I had no idea it would grow to what it is today. It is humbling to see that Claude Code has become a core dev tool for so many engineers, how enthusiastic the community is, and how people are using it for all sorts of things from coding to DevOps to research to non-technical use cases. This technology is alien and magical, and it makes it so much easier for people to build and create. Increasingly, code is no longer the bottleneck. A year ago, Claude struggled to generate bash commands without escaping issues. It worked for seconds or minutes at a time. We saw early signs that it may become broadly useful for coding one day. Fast forward to today, the last 30 days, I landed 259 PRs, 457 commits, and 40,000 lines added, and 38,000 lines removed. Every single line was written by Claude Code and Opus 4.5.
5:42 Claude consistently runs for minutes, hours, and days at a time using stop hooks. Software engineering is changing, and we are entering a new period in coding history. And we're still just getting started."

5:52 And then within here, you can see all of the different usage and the number of tokens that he had leveraged. Just to give you an idea, now mind you, this is the creator of Claude Code. This is someone who arguably knows the system better than anyone else. But this is just to show you what this can actually perform, and I don't think that this is just marketing or anything like that. He is definitely a very genuine person, and if you've leveraged Claude Code, in particular with Opus 4.5, you will probably know exactly what he's talking about.

6:18 Now, one of the things that I noticed within this tweet that I did want to pull up is there was a question from Simon Willison. He quoted the line that Claude consistently runs for minutes, hours, and days at a time using stop hooks, and asked him to expand on this. In his response, Boris mentioned that when Claude stops, you can use a stop hook to poke at it and tell it to keep going. And then he gave an example within one of their official repositories of what they call Ralph Wiggum. Now, if you know Ralph Wiggum, he's from The Simpsons. And one of the things with Ralph is he's determined to get it done. So he'll just keep trying until it actually works, which is sort of a funny analogy in terms of how you can actually get Claude to work.

6:53 Now, effectively, how this works is you're going to be able to run the quote unquote Ralph loop. You'll be able to pass in your task. Once you pass in your task, it's going to create a state file within your Claude folder.
7:02 Once that's set up, as soon as Claude works through what you're trying to do and tries to exit, the stop hook will block it from exiting and it will refeed what it's trying to do within the prompt. And then this process will repeat until the max iterations or the promise is actually met. Where this is useful, it could be useful within a test-driven development workflow, but also where this can be helpful is if you have particularly long to-do lists. Let's say you scaffold out an initial plan for your feature or application or whatever level you actually want to plan out. If you want to have Claude Code go through that list without actually stopping, what you can do is you can actually point it at the to-do list and then it will have those tasks that it will loop through and it won't actually finish until it meets the criteria. This can also be helpful in a number of other scenarios. Think things like large refactors or migrations.

7:52 Within the to-do example, what you can do is you can set up something like a to-do.md file. And what you can do is you can instruct Claude to go through these tasks and actually mark them complete as it goes. For instance, let's imagine you have a task.md file. What you can do within here is you can use the Ralph loop to complete all these tasks in the to-do.md. Then what you can also do in addition to this is you can include tests after each iteration. And this can be particularly helpful because oftentimes if you don't include a validation step while it's actually running through, it might go through a particularly long to-do list, but then get to the end and realize there might have been some catastrophic failures that it sort of built on top of.
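The block-and-refeed behaviour of the Ralph loop can be sketched as a pure decision function. This is an illustration only, not the plugin's actual implementation: `LoopState` and `on_stop` are hypothetical names, and the real plugin keeps its state in a file rather than in memory.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class LoopState:
    prompt: str              # the task that gets re-fed on every stop
    iteration: int           # how many passes have run so far
    max_iterations: int      # hard cap on passes
    completion_promise: str  # phrase Claude emits when the work is truly done


def on_stop(state: LoopState, last_message: str) -> Tuple[Optional[str], LoopState]:
    """Decide what happens when Claude tries to stop.

    Returns (prompt to re-feed, new state); a prompt of None means
    the stop is allowed and the loop ends.
    """
    if state.completion_promise and state.completion_promise in last_message:
        return None, state  # promise met: allow the stop
    if state.iteration >= state.max_iterations:
        return None, state  # cap reached: allow the stop
    # Otherwise block the stop and send the original task back in.
    return state.prompt, replace(state, iteration=state.iteration + 1)
```

For example, starting from `LoopState("work through the to-do list", 1, 3, "ALL DONE")`, each stop without the promise re-feeds the prompt until either the phrase appears or the third pass is reached.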
8:30 So being able to iteratively go through and have the system build on top of what it's done can be a good way of leveraging these systems, and try and validate the work as much as you can. So this can be with unit tests, integration tests, leveraging Playwright for things on the front end, or leveraging Claude within Chrome and all of those types of things.

8:50 If you haven't used to-do lists within Claude Code, there is a to-do feature built right in where it will just decide to leverage that when it needs to. But additionally you can also do this yourself if you want to have a little bit more control over it. You can instruct Claude to go through a markdown file. You can put, just like you see on the slide here, all of the different things that you want it to do, including all of the different validation steps along the way. And then with each iteration, you will see that Claude will go through and it will pick up all of the unchecked items. It will implement the feature or fix or whatever you have within that actual line item. It will run the unit tests and integration tests depending on what you have within the list. And then if a test fails, it will go ahead and fix that before it continues on and marks it complete. What this allows you to do is you can sort of just walk away and then hopefully come back to a finished list, a working feature, or a working application, depending on the scope of what you actually put within your to-do list.

9:42 Now, the other thing that's cool with this is you do have the option where you can stack multiple hooks together. And the other thing with this is when you leverage hooks, you can leverage these interchangeably and you don't necessarily need to just use one.
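To make the to-do workflow above concrete, here is a small sketch of how unchecked items in a markdown checklist could be found and marked complete. In the video Claude edits the file itself; this code only illustrates the checklist format, and the helper names and sample tasks are hypothetical.

```python
import re

# Matches markdown task lines such as "- [ ] Add a unit test" or "* [x] Done".
CHECKBOX = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*)$")


def unchecked_items(markdown: str) -> list:
    """Return the text of every task that is not yet checked off."""
    items = []
    for line in markdown.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            items.append(match.group(2).strip())
    return items


def mark_done(markdown: str, item: str) -> str:
    """Check off the first unchecked task whose text equals `item`."""
    lines = markdown.splitlines()
    for i, line in enumerate(lines):
        match = CHECKBOX.match(line)
        if match and match.group(1) == " " and match.group(2).strip() == item:
            lines[i] = line.replace("[ ]", "[x]", 1)
            break
    return "\n".join(lines)
```

An empty result from `unchecked_items` is one possible completion criterion for a loop: keep going while anything is still unchecked.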
9:53 For instance, within my Claude environment, I have a number of different hooks that are set up that invoke different actions at different times. Things for logging, things for notifying me, all of these types of things are particularly helpful. Now, as you can imagine, you're leveraging these more deterministic patterns combined with the non-deterministic agentic harness that is Claude Code and the model, because oftentimes you just can't predict what it will ultimately do. You can have maybe a high degree of confidence if you know what you're passing within context, but oftentimes for these long-running tasks, there is the potential where it can go off course. And having things that can actually check it and run these more deterministic triggers and scripts at particular times can be very, very helpful. This can keep your code clean. This can prevent dangerous operations and, like I've mentioned a couple of times already, ensure that tests pass before actually stopping.

10:42 Now, to get this set up, one of the fastest ways to get going is if we go to the Ralph Wiggum plugin. And what you'll notice within here is that plugins let you configure a number of different things within Claude Code at once. You can have subagents, you can have skills, and in this case, you can actually leverage hooks. Now, the core piece of this is if we take a look at the hooks, what we'll notice within here is we have the stop hook trigger. This is going to be how we actually invoke the different hooks that we have on this stop event. If I go back here and we take a look at this stop hook, this is an example of what a hook looks like in terms of what you can actually invoke every time that it stops.
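Stacking several hooks comes down to registering more than one command per event. The sketch below builds such a configuration in Python and prints it as JSON. The script paths are hypothetical, and the overall shape (event name, optional `matcher`, list of `command` hooks) is an assumption based on my understanding of the Claude Code settings format at the time of writing; it is not copied from the plugin shown in the video.

```python
import json

# Hypothetical script paths; only the overall shape matters here.
settings = {
    "hooks": {
        # Runs before shell commands so risky ones can be rejected.
        "PreToolUse": [
            {
                "matcher": "Bash",
                "hooks": [
                    {"type": "command", "command": "python3 .claude/hooks/guard.py"}
                ],
            }
        ],
        # Two commands stacked on the same stop event.
        "Stop": [
            {
                "hooks": [
                    {"type": "command", "command": "python3 .claude/hooks/run_tests.py"},
                    {"type": "command", "command": "python3 .claude/hooks/notify.py"},
                ]
            }
        ],
    }
}

print(json.dumps(settings, indent=2))
```

The printed JSON is what would go into a settings file; this sketch makes no claim about the order in which stacked hooks run, so avoid making one depend on another.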
11:19 And you can have a number of different scripts that invoke whenever Claude actually stops. Within here, you can see we have a formatter, iteration, max iteration, as well as the completion promise. Once you have it all installed, what you're going to be able to do is have this slash command for the Ralph loop. So within the Ralph loop, what you're going to be able to do is put in your prompt, the number of max iterations, as well as the completion promise, which is what actually validates that the step is complete. Within here, what I can do is I can specify: go through my to-do list step by step and mark down every step that is complete once it's actually done. I'll go ahead and I'll kick this off.

11:54 What we see on the left-hand side here is I have a number of different steps just to demonstrate this. We'll create a text file. But what you'll notice is in between each of these, I'm synthetically trying to trigger that stop process within Claude. And this is just to demonstrate what that hook will look like when it is triggered within Claude. We can see it went ahead. It completed the first task here. And now for our second task. What you'll notice within here is we have this stop hook error where it says: go through my to-do list step by step and mark down every step that is complete once it's actually done. And now what this looks like and how it can persist is, instead of actually returning a message to you, it will call this trigger and it will pass this back into Claude and have it just continue to go through the process. Within here, if I just scroll down, I see that number three is done. Once it gets to four, again, we have that hook being triggered as if there was a stop and returning a message back to us.
12:46 And instead of stopping, we're just passing that back into context to have it continue to go through the list.

12:52 Now the one thing that I do want to mention with Ralph loops or this type of process is just make sure that you do set the max number of iterations as well as your promise. Otherwise this will run through. You see that my task list is complete. But if you don't specify a completion promise or a max iteration, it will just continue to go through and the loop will run infinitely. So just make sure that you do actually specify both of these, because you don't want to get in a scenario where you're just burning all of these tokens by effectively having an infinite loop.

13:23 Otherwise, that's pretty much it for this video. I'll put the link to the GitHub repository within the description of the video. But otherwise, if you found this video useful, please like, comment, share, and subscribe. Otherwise, until the next one.