Goby3 3.2.3
2025.05.13
It is often important to know if all the processes that are expected to be running on the robotic system are actually running and are responsive.
goby_coroner regularly publishes a request (heartbeat) that is subscribed to by all the applications that subclass from goby::middleware::SingleThreadApplication or goby::middleware::MultiThreadApplication. These applications each send a response which is aggregated into a report that can be monitored by a custom process to notify someone or perform an action (e.g., restart the unresponsive process).
A simple launch script with two Goby applications (goby_gps and goby_logger) that are monitored by goby_coroner would look like:
You must explicitly specify --expected_name for an app to show up in the report from goby_coroner.
Run this and then you can monitor the publications:
You'll see three publications:
goby_coroner
Most of the time, you would want to subscribe to the goby::health::report using a custom application-specific Goby app and do something with it. For example, see the jaiabot_health app from the JaiaBot project: https://docs.jaia.tech/md_page75_health.html#autotoc_md370
The groups are defined in:
and the Protobuf messages are in goby/src/middleware/protobuf/coroner.proto and can be included using:
The goby.middleware.protobuf.VehicleHealth protobuf message is a hierarchical and recursive message.
The various levels are Vehicle, Process, and Thread.
At each level, a given component (vehicle, process or thread) can have one of three health statuses: HEALTH__OK, HEALTH__DEGRADED, or HEALTH__FAILED.
The aggregate health status of the parent is the worst status of any of its children. So if one Process reports HEALTH__DEGRADED, the Vehicle is HEALTH__DEGRADED. If any thread reports HEALTH__FAILED, the Process reports HEALTH__FAILED.
Thus, the only way for the Vehicle to be HEALTH__OK is if all Processes (and all their Threads) report HEALTH__OK.
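The worst-status rule can be sketched in a few lines of C++. The enum, structs, and function names below are hypothetical stand-ins written for this illustration, not the generated Protobuf classes from coroner.proto. The sketch also assumes that a component with no children counts as OK.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Stand-in for the three health statuses named in the text, ordered so that
// a larger value means a worse status. NOT the generated Protobuf enum.
enum class HealthState
{
    OK = 0,
    DEGRADED = 1,
    FAILED = 2
};

// Hypothetical, simplified thread and process records.
struct ThreadStatus
{
    std::string name;
    HealthState state;
};

struct ProcessStatus
{
    std::string name;
    std::vector<ThreadStatus> threads;
};

// Returns the worse of two statuses.
inline HealthState worst(HealthState a, HealthState b) { return std::max(a, b); }

// A process is as unhealthy as its worst thread.
inline HealthState process_state(const ProcessStatus& process)
{
    HealthState result = HealthState::OK;
    for (const auto& thread : process.threads) result = worst(result, thread.state);
    return result;
}

// The vehicle is as unhealthy as its worst process.
inline HealthState vehicle_state(const std::vector<ProcessStatus>& processes)
{
    HealthState result = HealthState::OK;
    for (const auto& process : processes) result = worst(result, process_state(process));
    return result;
}
```

With one process reporting DEGRADED and the rest OK, vehicle_state returns DEGRADED. Adding a single FAILED thread to any process makes both that process and the vehicle FAILED.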
Looking at the example earlier:
If goby_gps crashes or stops responding, the report will look like this:
goby_coroner automatically infers that the goby_gps process died because it did not respond to a request.
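As a rough sketch of that inference (not goby_coroner's actual implementation), the report can be thought of as being built from the list of expected names and whatever responses arrived in time. The function build_report, the HealthState enum, and the choice to mark a silent process as FAILED are assumptions made for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// Stand-in for the health statuses named in the text; NOT the Protobuf enum.
enum class HealthState
{
    OK,
    DEGRADED,
    FAILED
};

// Builds a per-process report from:
//   expected_names: the names given via --expected_name
//   responses:      statuses received before the (hypothetical) timeout
// An expected process with no response is assumed to have died.
// A process that responded but was never listed as expected is left out.
inline std::map<std::string, HealthState>
build_report(const std::vector<std::string>& expected_names,
             const std::map<std::string, HealthState>& responses)
{
    std::map<std::string, HealthState> report;
    for (const auto& name : expected_names)
    {
        auto it = responses.find(name);
        report[name] = (it != responses.end()) ? it->second : HealthState::FAILED;
    }
    return report;
}
```

If goby_gps and goby_logger are both expected but only goby_logger responds, the resulting report lists goby_gps as FAILED and goby_logger with whatever status it sent.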
If you are using goby::middleware::SingleThreadApplication or goby::middleware::MultiThreadApplication, you can modify the default response (which is simply HEALTH__OK); the modified response is then passed through to the report.
To do so you override the virtual method in your subclass:
This will be called each time your process gets a request from goby_coroner, and whatever health contains when the function returns is reported in the response.
The same method can be overridden for each Thread within goby::middleware::MultiThreadApplication, as desired.
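The override pattern can be sketched with stand-in types so that the example compiles without Goby installed. The ThreadHealth struct, the ApplicationBase class, the note field, and the method name health are all assumptions for this illustration; check the Goby headers for the real base-class signature and the generated Protobuf accessors.

```cpp
#include <string>

// Stand-in for the health statuses named in the text; NOT the Protobuf enum.
enum class HealthState
{
    OK,
    DEGRADED,
    FAILED
};

// Hypothetical, simplified substitute for the ThreadHealth message.
struct ThreadHealth
{
    HealthState state = HealthState::OK;
    std::string note; // hypothetical free-text detail
};

// Hypothetical substitute for the Goby application base classes.
class ApplicationBase
{
  public:
    virtual ~ApplicationBase() = default;

    // Default response: simply report OK.
    virtual void health(ThreadHealth& health) { health.state = HealthState::OK; }

    // Simulates receiving one request: whatever the (possibly overridden)
    // health() leaves in the message is what gets sent back.
    ThreadHealth respond_to_request()
    {
        ThreadHealth response;
        health(response);
        return response;
    }
};

// Example subclass that reports DEGRADED until it has a GPS fix.
class GpsApp : public ApplicationBase
{
  public:
    void set_has_fix(bool has_fix) { has_fix_ = has_fix; }

    void health(ThreadHealth& health) override
    {
        health.state = has_fix_ ? HealthState::OK : HealthState::DEGRADED;
        if (!has_fix_)
            health.note = "no GPS fix";
    }

  private:
    bool has_fix_ = false;
};
```

Keeping the decision inside the override means the application itself decides what counts as degraded, and the monitor only has to forward the result.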
Finally, you can extend the ThreadHealth message using Protobuf extensions to include any custom data you want to pass out in the goby_coroner report. For example, see the JaiaBot project, which extends ThreadHealth to add project-specific warning and error enumerations: https://docs.jaia.tech/health_8proto_source.html
If you are using the extensions for a private project, simply choose any value over 1000. For projects that are public or should interoperate, you can post an issue to https://github.com/GobySoft/goby3/issues requesting an extension assignment.