Teleoperate and collect robot data with LiveKit Portal

Picture a robot arm folding laundry on its own. A learned policy running on a server in San Francisco is driving it, sending an action chunk for every observation. Then it slips: it misreads the pile and grabs the wrong corner of the shirt. A human operator has been watching the whole time from a laptop in Manila. The moment the policy gets into trouble, they take the controls, correct the motion, and hand control back once things settle. Every frame, every joint angle, and every correction is recorded and aligned into clean training data as it happens.

That is what LiveKit Portal is for. It's a thin layer over LiveKit's realtime infrastructure that gives you a production-grade stack for three things: teleoperating a robot you cannot physically reach, recording clean data while you do it, and running remote inference against it. You describe what your robot publishes and what your operator receives, and Portal handles the transport.

LiveKit Portal is built for robotics teams who collect teleoperation data and run policies over the network, on real hardware like a Trossen ALOHA or an I2RT YAM.

Why operating a robot remotely is hard

Operating a robot over the internet means handling security, latency, and availability, and solving those from scratch is painfully slow. Most engineers would rather spend that time on the robot. But there is a deeper problem underneath, and it comes from a shift in how robots are built.

Classically, robotics software is distributed. A robot has multiple components publishing data at different rates. A SLAM service, an obstacle detection service, a joint state publisher: each consumes different streams at different frequencies. ROS and DDS, the standard robotics middleware, were built exactly for this model.

With the rise of VLAs and end-to-end learned policies, that flexibility is being traded for simplicity. Everything is consumed at the same frequency and bundled into a single observation, one snapshot of what the robot sees and where its joints are at a single moment. The modern robot loop looks like this:

obs = robot.get_observation()
action = model.select_action(obs)
robot.send_action(action)

That shift is also what makes doing this remotely hard. Classic robotics moved many independent streams, and each was free to arrive on its own schedule. The modern loop moves one bundle that only works if its parts stay aligned.

On a single machine that alignment is free. Across the internet it's not, because the camera frame and the joint state take different paths and arrive at different times.

Reconstructing that bundle over the internet is where Portal starts. But operating a robot you cannot physically reach takes more than clean observations. You need to decide who is allowed to drive, see what is happening at every hop, and trust that the same code works whether the robot is in the next room or on another continent. LiveKit Portal handles all of it.

Observation syncing

An observation is a bundle: camera frames and joint state, tied to the same moment in robot time. That structure is not just convenient for inference. It's the format your training data has to be in. If the frame and the state in a recorded episode drift even a few ticks apart, your model learns from misaligned data.

When the robot is remote, frames and state travel over independent transport paths with different latencies. Portal fuses them back into a coherent observation on the receiving end. Both sides declare the same schema before connecting, and that shared declaration is the only coordination required.
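
To see why a shared schema is enough coordination, consider how a typed state packet might be decoded. The sketch below is an assumption for illustration, not Portal's wire format: the dtype names, the little-endian packing, and the `encode`/`decode` helpers are all hypothetical. It only shows that when both sides hold the same ordered list of fields, a bare byte payload can be turned back into named values without any extra negotiation.

```python
import struct

# Hypothetical schema: ordered (name, dtype) pairs that both sides declare.
SCHEMA = [("j1", "f32"), ("j2", "f32"), ("j3", "f32")]

# Assumed mapping from dtype names to struct format codes.
_CODES = {"f32": "f", "f64": "d", "i32": "i"}


def layout(schema):
    """Build a little-endian struct layout from the schema's dtypes."""
    return struct.Struct("<" + "".join(_CODES[dtype] for _, dtype in schema))


def encode(schema, values):
    """Pack a dict of named values in schema order."""
    return layout(schema).pack(*(values[name] for name, _ in schema))


def decode(schema, payload):
    """Unpack a payload into named values; raises struct.error on a size mismatch."""
    fields = layout(schema).unpack(payload)
    return {name: value for (name, _), value in zip(schema, fields)}


payload = encode(SCHEMA, {"j1": 0.5, "j2": -1.25, "j3": 2.0})
state = decode(SCHEMA, payload)
```

If the receiver held a different schema, for instance only two of the three joints, `decode` would raise `struct.error` because the payload length no longer matches the layout.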

The fusion runs entirely on the operator side. Portal tags every outgoing frame and state packet with the robot's own sender timestamp, so reconciliation compares timestamps that all came from one clock. There is no cross-machine clock sync, no NTP, no shared time source. The operator buffers incoming frames and state, then matches each state to the closest frame from every video track within a configurable window to emit one observation.

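# Operator side: declare the same schema the robot declares.
# Portal, PortalConfig, Role, and DType are provided by the Portal package.
cfg = PortalConfig("my-session", Role.OPERATOR)
cfg.add_video("front")
cfg.add_state_typed([("j1", DType.F32), ("j2", DType.F32), ("j3", DType.F32)])
cfg.set_fps(30)

portal = Portal(cfg)

def on_observation(obs):
    # obs is one fused snapshot: frames plus state at obs.timestamp_us
    action = policy(obs)  # policy is your own model wrapper
    portal.send_action(action, in_reply_to_ts_us=obs.timestamp_us)

portal.on_observation(on_observation)
await portal.connect(url, token)  # run inside an async function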

Your collected data looks exactly like it would from a local robot, and your policy always sees a clean snapshot. No matching logic on your end.
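
For intuition about what that matching involves, here is a minimal sketch of nearest-timestamp fusion. It is not Portal's implementation. The `FusedObservation` type, the `closest_frame` and `fuse` helpers, and the policy of dropping a state when any track has no frame inside the window are assumptions made for illustration. All timestamps are sender timestamps in microseconds, so they come from a single clock.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Frame = Tuple[int, bytes]  # (sender timestamp in microseconds, payload)


@dataclass
class FusedObservation:
    timestamp_us: int
    state: List[float]
    frames: Dict[str, Frame]


def closest_frame(buffered: List[Frame], ts_us: int, window_us: int) -> Optional[Frame]:
    """Return the buffered frame nearest to ts_us, or None if none lies within window_us.

    `buffered` must be sorted by timestamp.
    """
    if not buffered:
        return None
    stamps = [ts for ts, _ in buffered]
    i = bisect_left(stamps, ts_us)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = buffered[max(i - 1, 0): i + 1]
    best = min(candidates, key=lambda f: abs(f[0] - ts_us))
    return best if abs(best[0] - ts_us) <= window_us else None


def fuse(state_ts_us, state, frame_buffers, window_us):
    """Build one observation, or return None if any track lacks a close enough frame."""
    frames = {}
    for track, buffered in frame_buffers.items():
        match = closest_frame(buffered, state_ts_us, window_us)
        if match is None:
            return None
        frames[track] = match
    return FusedObservation(state_ts_us, state, frames)


# Two tracks at roughly 30 fps, offset from each other by 1 ms.
buffers = {
    "front": [(0, b"f0"), (33_333, b"f1"), (66_667, b"f2")],
    "wrist": [(1_000, b"w0"), (34_333, b"w1"), (67_667, b"w2")],
}
obs = fuse(34_000, [0.1, 0.2, 0.3], buffers, window_us=16_000)
late = fuse(120_000, [0.1, 0.2, 0.3], buffers, window_us=16_000)
```

Here `obs` pairs the state at 34,000 µs with the `front` frame at 33,333 µs and the `wrist` frame at 34,333 µs, while `late` is `None` because no buffered frame lies within 16 ms of 120,000 µs.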

Single robot, multiple operators

Most robotics architectures today are client-server with a 1:1 topology. One teleoperator to one robot, or one policy to one robot. Real deployments are messier than that. Human-in-the-loop data collection wants a policy and a human operator running at the same time, DAgger-style, so the human's corrections become new training labels. In a strict client-server setup you would have to couple the operator and the policy into the same session as the robot.

Portal decouples them. A Portal session is just a LiveKit room. The robot, a human operator, a passive viewer, and a policy runner all join it the same way participants join a video call, from anywhere in the world. Your policy can sit on whatever compute it needs with no location constraint. When you need several policies running together, you host them in warm pools and dispatch them into the room where your robot lives.

Many participants can be connected at once, but only one drives the robot at a time. That's what makes the handoff clean: when the human takes over from the policy, switching who is in control is just a change in which participant sends actions, not a reconfiguration of the system.
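
The single-driver rule can be pictured as a small arbiter on the robot side. This is a hypothetical sketch, since the article does not describe how Portal enforces it: the `ControlArbiter` class and the participant identities `"policy"` and `"human"` are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ControlArbiter:
    """Applies actions from exactly one participant at a time (illustrative only)."""

    active: Optional[str] = None
    applied: List[Tuple[str, list]] = field(default_factory=list)

    def take_control(self, identity: str) -> None:
        # Handoff is a single assignment; nothing else is reconfigured.
        self.active = identity

    def submit(self, identity: str, action: list) -> bool:
        """Apply the action if the sender is the active driver, otherwise ignore it."""
        if identity != self.active:
            return False
        self.applied.append((identity, action))
        return True


arbiter = ControlArbiter()
arbiter.take_control("policy")
arbiter.submit("policy", [0.1, 0.2])   # applied
arbiter.submit("human", [0.0, 0.0])    # ignored: the human is not driving yet
arbiter.take_control("human")          # the operator takes over
arbiter.submit("human", [0.0, 0.1])    # applied
arbiter.submit("policy", [0.3, 0.4])   # ignored: the policy keeps running but no longer drives
```

Because the `applied` log keeps the sender's identity next to each action, the human's corrections are easy to separate out later as training labels.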

Observability

Running a robot remotely without visibility into the pipeline is flying blind. Portal exposes a live metrics snapshot you can pull at any cadence.

m = portal.metrics()

# sync: observations emitted, states dropped, state-to-frame alignment
print(m.sync.observations_emitted, m.sync.states_dropped)
print(m.sync.match_delta_us_p50, m.sync.match_delta_us_p95)

# transport: per-track frame counts and jitter
print(m.transport.frame_jitter_us)

# rtt: last, mean, p95
print(m.rtt.rtt_us_last, m.rtt.rtt_us_mean, m.rtt.rtt_us_p95)

# policy: true observation-to-action latency, not just network ping
print(m.policy.e2e_us_p50, m.policy.e2e_us_p95)

Together these span the critical path, from sync alignment to transport to the policy loop. match_delta_us_p95, for example, is the 95th-percentile gap in robot time between a state and the frame it was matched to: 95% of matches are at least that tight. The smaller it is, the cleaner your observations.

On the network side, frame_jitter_us and rtt_us_p95 are your early warnings. Rising jitter or climbing RTT means the link is degrading before it ever shows up as dropped observations. e2e_us_p95 is the one that bounds your control loop. When the operator passes in_reply_to_ts_us on each action, the robot times the full round trip from when it sent the observation to when the action came back, inference included. That's the real number that decides how tight a closed loop you can run, not network ping.
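
The round-trip measurement described above can be sketched in a few lines. This is an illustration under stated assumptions, not Portal's code: the `E2ETracker` class is hypothetical, and the nearest-rank percentile is one common definition that may differ from the one Portal uses. Both timestamps for each sample are taken on the robot, so no clock sync is involved.

```python
import math
from typing import Dict, List


class E2ETracker:
    """Robot-side observation-to-action latency tracker (illustrative only)."""

    def __init__(self) -> None:
        self._sent_at_us: Dict[int, int] = {}
        self.samples_us: List[int] = []

    def observation_sent(self, obs_ts_us: int, now_us: int) -> None:
        # Remember when this observation left the robot, keyed by its timestamp.
        self._sent_at_us[obs_ts_us] = now_us

    def action_received(self, in_reply_to_ts_us: int, now_us: int) -> None:
        # Match the action to its observation; ignore replies to unknown timestamps.
        sent = self._sent_at_us.pop(in_reply_to_ts_us, None)
        if sent is not None:
            self.samples_us.append(now_us - sent)

    def percentile(self, p: float) -> int:
        """Nearest-rank percentile of the recorded latencies."""
        if not self.samples_us:
            raise ValueError("no samples recorded")
        ordered = sorted(self.samples_us)
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]


tracker = E2ETracker()
for i in range(10):
    sent_us = i * 33_333                      # one observation per 30 fps tick
    tracker.observation_sent(sent_us, now_us=sent_us)
    # Simulated round trips of 40 ms, 41 ms, ..., 49 ms.
    tracker.action_received(sent_us, now_us=sent_us + 40_000 + i * 1_000)

p50 = tracker.percentile(50)   # 44_000
p95 = tracker.percentile(95)   # 49_000
```

In a real loop, `now_us` would come from a monotonic clock on the robot, for example `time.monotonic_ns() // 1000`.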

Counters are updated in place on the hot path, so pulling a snapshot only reads them and has no effect on throughput.

Local is the same as remote

LiveKit Portal is not only for production deployment over the internet. It runs just as well inside a single lab. Point it at the open-source LiveKit SFU on your own network and latency drops to what your LAN allows, with no data ever leaving the building. The same code that operates a robot across the country operates one across the room. The only thing that changes is the URL you connect to.

# local: the open-source LiveKit SFU on your lab network
await portal.connect("ws://192.168.1.10:7880", token)

# remote: LiveKit Cloud, same code
await portal.connect("wss://my-project.livekit.cloud", token)

It takes one command to set up a local LiveKit SFU, either through Docker or with a locally installed server binary.

# With Docker
docker run --rm --name livekit -p 7880:7880 livekit/livekit-server:latest --dev --bind 0.0.0.0

# With a locally installed livekit-server (installation: https://github.com/livekit/livekit)
livekit-server --dev --bind 0.0.0.0

Running locally also lets you trade bandwidth for fidelity. Over the internet you compress video to survive the link. On a local network bandwidth is not the constraint, so Portal lets you send frames as raw RGB or lossless PNG and collect bit-exact training data, not the lossy frames you would settle for over a WAN. MJPEG is there too when you want compact frames with sub-millisecond decode.

cfg.add_video("front", codec=VideoCodec.RAW)   # uncompressed, bit-exact
cfg.add_video("wrist", codec=VideoCodec.PNG)   # lossless
cfg.add_video("scene", codec=VideoCodec.MJPEG) # compact, lossy per-frame
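
A quick calculation shows why raw frames are a local-network option. The resolutions below are assumptions chosen for illustration, as is the 3-bytes-per-pixel RGB layout. The 30 fps figure matches the `set_fps(30)` call used earlier. The `raw_rgb_mbps` helper is hypothetical.

```python
def raw_rgb_mbps(width: int, height: int, fps: int, bytes_per_pixel: int = 3) -> float:
    """Bandwidth of an uncompressed video stream in megabits per second."""
    bytes_per_second = width * height * bytes_per_pixel * fps
    return bytes_per_second * 8 / 1_000_000


vga = raw_rgb_mbps(640, 480, 30)     # 221.184 Mbit/s for one camera
hd = raw_rgb_mbps(1280, 720, 30)     # 663.552 Mbit/s for one camera
three_vga = 3 * vga                  # three VGA cameras cost about as much as one 720p camera
```

A single raw 640x480 stream at 30 fps already needs about 221 Mbit/s. That fits within gigabit Ethernet but is far more than most internet links can sustain, which is why compression is the default over a WAN.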

Getting started with LiveKit Portal

Portal lets you use LiveKit's realtime infrastructure across whatever robotics deployment you need, from a single arm on a lab bench to a fleet of robots in the wild driven by policies running anywhere in the world. Stop worrying about the transport and spend your time on the robot instead.

Portal is available today on PyPI: