## Overview

At a high level:

- Tracer/collect observes execution events at the operating system level
- These events are correlated into structured entities
- Higher-level products (Tracer/tune and Tracer/sweep) operate on this shared model
The data model is designed to be:

- **Workflow-agnostic:** works with any orchestrator or scheduler
- **Stable:** consistent across environments
- **Expressive:** represents complex, multi-process execution
## Core entities

### Runs

A run represents a single execution of a pipeline or workload. A run typically corresponds to:

- A workflow execution (for example, a Nextflow or Snakemake run)
- A batch job or experiment
- A repeated invocation of the same pipeline configuration
### Tasks

A task represents a logical unit of work within a run. Tasks often correspond to:

- Workflow steps or processes
- Batch jobs or array jobs
- Scheduled units of execution

Tasks can:

- Run on one or multiple hosts
- Execute sequentially or in parallel
- Spawn multiple tools and subprocesses
### Tools

A tool represents an executable program invoked during a task. Examples include:

- Native binaries (for example, bwa, samtools)
- Interpreters and scripts (python, bash)
- JVM-based tools
- Short-lived helper binaries and child processes
### Containers

A container represents an execution context defined by container runtimes or Linux namespaces. Containers:

- Group related processes
- Provide isolation boundaries
- May contain multiple tools and subprocesses
### Hosts

A host represents a physical or virtual machine where execution occurs. Hosts include:

- Cloud instances (for example, EC2)
- On-premises nodes
- Batch or HPC worker nodes
## Relationships between entities

The entities form a hierarchy:

- A run contains one or more tasks
- A task invokes one or more tools
- Tools execute within a container or directly on a host
- All execution ultimately occurs on a host

This hierarchy makes it possible to:

- Attribute resource usage accurately
- Compare behavior across runs and tasks
- Correlate infrastructure behavior with pipeline execution
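The containment hierarchy above can be sketched as a set of nested records. The class and field names here are illustrative only, not Tracer's actual schema:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Host:
    hostname: str  # e.g. an EC2 instance or an HPC worker node

@dataclass
class Tool:
    command: str   # executable invoked during a task, e.g. "samtools"
    host: Host     # all execution ultimately occurs on a host
    container_id: Optional[str] = None  # set when the tool runs inside a container

@dataclass
class Task:
    name: str
    tools: List[Tool] = field(default_factory=list)  # a task invokes one or more tools

@dataclass
class Run:
    run_id: str
    tasks: List[Task] = field(default_factory=list)  # a run contains one or more tasks

# A run with one task that invoked one tool directly on a host
host = Host(hostname="node-1")
run = Run(run_id="r1", tasks=[Task(name="align", tools=[Tool(command="bwa", host=host)])])
```

Note that the container level is optional: a tool always resolves to a host, whether or not it executed inside a container.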
## How correlation works

Tracer correlates execution events using identifiers exposed by the operating system, including:

- Process IDs and parent–child relationships
- Cgroups and namespaces
- Container runtime metadata (when available)

Correlation does not require:

- Workflow engine integration
- Application instrumentation
- Explicit tagging
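To illustrate the PID-based part of this correlation, the sketch below groups observed processes into a tree using parent-child relationships, the same OS-level signal described above. The event tuples and function are hypothetical, not Tracer's actual event format:

```python
# Illustrative sketch: walk parent-child PID links to find every process
# that belongs to a given root process (e.g. a workflow engine or task wrapper).

def descendants(events, root_pid):
    """Return all PIDs that are (transitive) children of root_pid."""
    children = {}
    for pid, ppid, _cmd in events:
        children.setdefault(ppid, []).append(pid)
    found, stack = [], [root_pid]
    while stack:
        pid = stack.pop()
        for child in children.get(pid, []):
            found.append(child)
            stack.append(child)
    return found

# (pid, parent pid, command) as an OS-level observer might record them
events = [
    (100, 1, "nextflow"),
    (200, 100, "bash"),      # task wrapper spawned by the workflow engine
    (201, 200, "bwa"),       # tool spawned by the wrapper
    (202, 200, "samtools"),  # second tool in the same task
]

print(descendants(events, 100))  # → [200, 201, 202]
```

In practice cgroup and namespace identifiers refine this grouping, for example to attach a container boundary to a subtree of processes.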
## What the data model enables

This data model is the foundation for Tracer’s higher-level capabilities. It enables:

- Execution timelines organized by run, task, and tool
- Resource usage attribution at meaningful boundaries
- Detection of idle execution and contention
- Cost attribution aligned with real execution behavior
- Cross-run comparison and regression detection
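Resource usage attribution "at meaningful boundaries" amounts to rolling raw per-process measurements up to an entity boundary such as the task. A minimal sketch, with hypothetical sample shapes rather than Tracer's real data format:

```python
# Roll per-tool CPU samples up to the (run, task) boundary for attribution.
from collections import defaultdict

samples = [
    {"run": "r1", "task": "align", "tool": "bwa",      "cpu_seconds": 120.0},
    {"run": "r1", "task": "align", "tool": "samtools", "cpu_seconds": 30.0},
    {"run": "r1", "task": "qc",    "tool": "fastqc",   "cpu_seconds": 15.0},
]

def cpu_by_task(samples):
    totals = defaultdict(float)
    for s in samples:
        totals[(s["run"], s["task"])] += s["cpu_seconds"]
    return dict(totals)

print(cpu_by_task(samples))  # → {('r1', 'align'): 150.0, ('r1', 'qc'): 15.0}
```

The same grouping at the run boundary supports cross-run comparison, since totals from two runs of the same pipeline can be diffed task by task.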
## What the data model does not represent

Tracer models how workloads execute, not what they compute.

## Orchestrator terminology mapping (reference)
Tracer’s data model is framework- and language-agnostic. The table below shows how Tracer entities typically align with common orchestrator concepts. Exact mappings may vary by workflow engine and configuration.
| Tracer concept | Common equivalents |
|---|---|
| Run | Workflow run, DAG run, execution |
| Task | Process, step, task, op, node |
| Tool | Binary, script, container entrypoint |
| Container | Pod, container, namespace |
| Host | Worker node, instance, executor host |
## When to read this page

This page is most useful if you:

- Want to understand how Tracer structures execution data
- Are integrating Tracer data into external systems
- Need clarity on attribution boundaries and terminology
- Are evaluating Tracer for complex or regulated environments

