Sheldon Lewis

The runner was the recovery path

Using a workflow runner to recover access to a host, and why runners need to be treated as serious infrastructure.

  • Using a still-running Forgejo runner as a recovery path after losing host access.
  • Reverse tunnel and Cloudflare shell approaches for temporary access.
  • Why runners with Docker access, network reach, repository trust, and secrets need serious controls.
  • Manual jobs, hard timeouts, temporary tunnels, runner secrets, and cleanup as guardrails.

I got locked out of a remote host, but the Forgejo runner on that machine was still polling for jobs. That runner became the recovery path.

I built a manual workflow that brought up a temporary Cloudflare SSH tunnel and waited long enough for me to get a shell. It gave me a way back into a system I otherwise could not reach.

Two setups proved useful in testing. The first showed that a runner could open a reverse path to the target through a recovery host:

Reverse tunnel test
 
+---------+  job   +-------------+
| Forgejo | -----> | runner host |
| manual  |        | Docker/net  |
+---------+        +------+------+
                          |
                  +-------+-------+
                  |               |
                  v               v
           +---------------+ +-------------+
           | target host   | | recovery    |
           | private LAN   | | SSH host    |
           +---------------+ +------+------+
                                  ^
+---------+  SSH forwarded port   |
| me      | ----------------------+
+---------+
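
The reverse-tunnel setup comes down to one outbound SSH connection from the runner. Below is a minimal sketch of the command a manual job could run, assuming OpenSSH and GNU `timeout` on the runner. The host names, user, ports, and the `reverse_tunnel_command` helper are hypothetical placeholders, not the exact command I used.

```python
"""Build the reverse-tunnel command for a manual recovery job.

All host names, users, and ports are hypothetical placeholders.
"""
import shlex


def reverse_tunnel_command(
    recovery_host: str,
    recovery_user: str,
    target_host: str,
    target_port: int = 22,
    forwarded_port: int = 2222,
    max_seconds: int = 900,
) -> list[str]:
    """Return argv for ssh wrapped in `timeout`, so the forward cannot
    outlive the recovery window."""
    return [
        "timeout", str(max_seconds),
        "ssh",
        "-N",                                # no remote command, only forwarding
        "-o", "ExitOnForwardFailure=yes",    # fail fast if the port is taken
        "-o", "ServerAliveInterval=30",      # notice a dead connection
        # forwarded_port on the recovery host now reaches
        # target_host:target_port via the runner.
        "-R", f"{forwarded_port}:{target_host}:{target_port}",
        f"{recovery_user}@{recovery_host}",
    ]


if __name__ == "__main__":
    cmd = reverse_tunnel_command("recovery.example.net", "tunnel", "10.0.0.5")
    print(shlex.join(cmd))
```

With OpenSSH's default `GatewayPorts no`, the forwarded port listens only on the recovery host's loopback, so I would reach it by jumping through that host, for example `ssh -J tunnel@recovery.example.net -p 2222 admin@localhost` (placeholder names again).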

The second setup is the one I used for the actual recovery. The runner started a temporary shell container and an outbound Cloudflare tunnel, then I connected through Cloudflare:

Cloudflare shell test
 
+---------+  job   +-------------+
| Forgejo | -----> | runner host |
| manual  |        | Docker/net  |
+---------+        +------+------+
                          |
                          | starts
                          | shell + tunnel
                          v
                   +-------------+
                   | temp SSH    |
                   | shell       |
                   +------+------+
                          |
                          | outbound tunnel
                          v
+---------+  browser SSH  +-------------+
| me      | <------------ | Cloudflare  |
| browser |               | Access      |
+---------+               +-------------+
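
The runner side of the Cloudflare setup can be sketched as a handful of `docker` commands. This is a minimal sketch, not the workflow I ran: the network, container, and image names are hypothetical, and it assumes a remotely managed tunnel whose token the job receives as `TUNNEL_TOKEN` from the runner secret store (which `cloudflared tunnel run` reads). The Cloudflare side, a public hostname routed to `ssh://recovery-shell:22` behind an Access application with browser SSH, is assumed to be configured separately.

```python
"""Commands a manual job might run for the Cloudflare shell setup.

Names marked hypothetical are placeholders.
"""
import shlex

NETWORK = "recovery-net"              # hypothetical user-defined network
SHELL_NAME = "recovery-shell"         # hypothetical container name
TUNNEL_NAME = "recovery-tunnel"       # hypothetical container name
SHELL_IMAGE = "recovery-shell:local"  # hypothetical image that runs sshd


def start_commands() -> list[list[str]]:
    return [
        ["docker", "network", "create", NETWORK],
        # No -p flag: the shell is reachable only from containers on NETWORK,
        # where Docker's DNS resolves it by container name.
        ["docker", "run", "-d", "--rm", "--name", SHELL_NAME,
         "--network", NETWORK, SHELL_IMAGE],
        # cloudflared only makes outbound connections. "-e TUNNEL_TOKEN" with no
        # value copies the variable from the job environment, so the token
        # never appears on the command line.
        ["docker", "run", "-d", "--rm", "--name", TUNNEL_NAME,
         "--network", NETWORK, "-e", "TUNNEL_TOKEN",
         "cloudflare/cloudflared:latest", "tunnel", "--no-autoupdate", "run"],
    ]


def cleanup_commands() -> list[list[str]]:
    # Run these with failures ignored: at the start of a job the
    # containers and network usually do not exist yet.
    return [
        ["docker", "rm", "-f", TUNNEL_NAME, SHELL_NAME],
        ["docker", "network", "rm", NETWORK],
    ]


if __name__ == "__main__":
    for cmd in cleanup_commands() + start_commands():
        print(shlex.join(cmd))
```

Keeping the shell unpublished and the tunnel outbound-only means nothing new listens on the host's own interfaces; the only way in is through Cloudflare Access.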

I was relieved when it worked. I was also uncomfortable with how much that proved.

A runner is not just a build worker. In the wrong setup it can become a remote administration channel. It may have Docker access, local network reach, repository trust, and secrets that were added for normal automation. Put those together and a workflow can do a lot more than compile code.
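
One way to see how much of that applies to a particular runner is to inventory it from inside a job. This is a heuristic sketch assuming a Linux host with Docker's default socket path; the secret-name pattern and the probe targets are my own assumptions, not anything Forgejo defines, and an empty report does not prove the runner is low-privilege.

```python
"""Rough inventory of the authority a job inherits on this runner.

Heuristic sketch: socket path, secret-name pattern, and probe targets
are assumptions.
"""
import os
import re
import socket
import stat
from typing import Iterable, Mapping

DOCKER_SOCKET = "/var/run/docker.sock"  # Docker's default socket path on Linux
SECRET_HINT = re.compile(r"TOKEN|SECRET|PASSWORD|KEY", re.IGNORECASE)  # heuristic


def _docker_socket_usable(path: str) -> bool:
    # With rootful Docker, read/write access to this socket is effectively root.
    try:
        mode = os.stat(path).st_mode
    except OSError:
        return False
    return stat.S_ISSOCK(mode) and os.access(path, os.R_OK | os.W_OK)


def _can_connect(host: str, port: int, timeout: float) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def authority_report(
    env: Mapping[str, str],
    docker_socket: str = DOCKER_SOCKET,
    probes: Iterable[tuple[str, int]] = (),
    timeout: float = 1.0,
) -> dict:
    return {
        "docker_socket_usable": _docker_socket_usable(docker_socket),
        # Names only; values are never read or printed.
        "secret_like_env": sorted(n for n in env if SECRET_HINT.search(n)),
        # Hypothetical internal targets, e.g. ("10.0.0.5", 22), to test LAN reach.
        "reachable": [f"{h}:{p}" for h, p in probes if _can_connect(h, p, timeout)],
    }


if __name__ == "__main__":
    for key, value in authority_report(os.environ).items():
        print(f"{key}: {value}")
```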

The part that made this feel acceptable was the friction around it. The jobs were manual. They had hard timeouts. The tunnel was temporary. Secrets stayed in the runner secret store. The workflow cleaned up containers at the start and end of the run, including failure paths.
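
The cleanup pattern is the part that is easiest to get wrong. Below is a minimal sketch of it, with `start` and `cleanup` as hypothetical hooks standing in for whatever the job actually runs. It cleans up first (in case a previous run was killed before it could), holds the access path open for a bounded window, and tears down in a `finally` block. It assumes the runner sends SIGTERM when it cancels or times out a job, which depends on the runner and executor; a SIGKILL skips all cleanup, which is exactly why the cleanup-first step matters. The job-level timeout would sit a little above `window_seconds` as the hard stop.

```python
"""Hold a temporary access path open for a bounded window, then tear it down.

start/cleanup are hypothetical hooks (for example, docker commands).
"""
import signal
import time
from typing import Callable


def _raise_on_sigterm(signum, frame):
    # Turn SIGTERM into SystemExit so `finally` blocks still run.
    raise SystemExit(128 + signum)


def run_recovery_window(
    start: Callable[[], None],
    cleanup: Callable[[], None],
    window_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    cleanup()  # remove leftovers from an earlier run that was killed mid-way
    try:
        start()
        sleep(window_seconds)  # keep the shell available, but only this long
    finally:
        cleanup()  # runs after success, an exception, or SIGTERM (via SystemExit)


if __name__ == "__main__":
    signal.signal(signal.SIGTERM, _raise_on_sigterm)
    run_recovery_window(
        start=lambda: print("start shell + tunnel"),
        cleanup=lambda: print("remove shell + tunnel"),
        window_seconds=1,
    )
```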

Those controls are not decoration. They are the difference between an emergency recovery tool and a quiet access bypass.

The lesson for me was simple: treat runners like infrastructure with authority. Use narrow labels, narrow secrets, manual gates for dangerous jobs, short time windows, and cleanup that still runs when things fail. A runner that can save you during an outage can also cross the access boundary you thought you had.
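
Those rules are concrete enough to check mechanically. Here is a sketch of a linter over a parsed workflow, assuming GitHub-Actions-style keys (`on`, `runs-on`, `timeout-minutes`, `if: always()`), which Forgejo Actions largely follows. The label, secret allowlist, and timeout ceiling are hypothetical policy values, and the function name is mine.

```python
"""Flag a parsed workflow that lacks guardrails for access-granting jobs.

Label, secret allowlist, and timeout ceiling are hypothetical policy values.
"""
import re
from typing import Any

RECOVERY_LABEL = "recovery-only"             # hypothetical narrow runner label
ALLOWED_SECRETS = {"RECOVERY_TUNNEL_TOKEN"}  # hypothetical narrow secret set
MAX_TIMEOUT_MINUTES = 30                     # hypothetical hard ceiling
SECRET_REF = re.compile(r"\$\{\{\s*secrets\.(\w+)\s*\}\}")


def _strings(node: Any):
    """Yield every string nested anywhere in the workflow structure."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from _strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from _strings(item)


def guardrail_problems(workflow: dict) -> list[str]:
    problems: list[str] = []

    # PyYAML (YAML 1.1) loads a bare `on:` key as the boolean True.
    triggers = workflow.get("on", workflow.get(True, {}))
    if isinstance(triggers, str):
        triggers = [triggers]
    if set(triggers) != {"workflow_dispatch"}:
        problems.append(f"trigger should be workflow_dispatch only, got {sorted(triggers)}")

    for name, job in workflow.get("jobs", {}).items():
        labels = job.get("runs-on", [])
        if isinstance(labels, str):
            labels = [labels]
        if RECOVERY_LABEL not in labels:
            problems.append(f"{name}: runs-on should include {RECOVERY_LABEL!r}")

        timeout = job.get("timeout-minutes")
        if not isinstance(timeout, (int, float)) or timeout > MAX_TIMEOUT_MINUTES:
            problems.append(f"{name}: timeout-minutes must be set and <= {MAX_TIMEOUT_MINUTES}")

        used = {m for s in _strings(job) for m in SECRET_REF.findall(s)}
        if used - ALLOWED_SECRETS:
            problems.append(f"{name}: unexpected secrets {sorted(used - ALLOWED_SECRETS)}")

        if not any("always()" in str(step.get("if", "")) for step in job.get("steps", [])):
            problems.append(f"{name}: no cleanup step guarded by if: always()")

    return problems
```

A check like this could run in CI on every change to the workflow files, so a recovery job cannot quietly gain a `push` trigger or a broader secret.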
