For a conceptual overview, see How Tracer fits in your stack.
What Flyte does well
Flyte provides strong guarantees around workflow structure and execution, including the following (sketched briefly after the list):
- Task and workflow definitions with typed inputs and outputs
- Deterministic execution and reproducibility
- Scheduling, retries, and failure handling
- Execution metadata, logs, and lineage
- Environment isolation via containers
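To make the first two points concrete, here is a minimal flytekit sketch with typed inputs and outputs, retries, and caching. The task names, data types, and paths are illustrative assumptions, not a prescribed layout.

```python
# Minimal flytekit sketch (illustrative names and values).
# Typed inputs/outputs, retries, and caching are declared at the orchestration
# layer -- this is the part of execution that Flyte does see and enforce.
import pandas as pd
from flytekit import task, workflow


@task(retries=3, cache=True, cache_version="1.0")
def load_events(day: str) -> pd.DataFrame:
    # Flyte records the state, attempts, and typed output of this task.
    return pd.read_parquet(f"s3://example-bucket/events/{day}.parquet")


@task(retries=2)
def build_features(events: pd.DataFrame) -> pd.DataFrame:
    return (
        events.groupby("user_id")
        .agg(event_count=("event_id", "count"))
        .reset_index()
    )


@workflow
def feature_pipeline(day: str) -> pd.DataFrame:
    return build_features(events=load_events(day=day))
```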
What Flyte does not see at runtime
Flyte tracks task state and execution outcomes, but it does not observe what happens inside the running task container. It does not show the following (one such case is sketched after the list):
- CPU utilization during task execution
- Memory pressure, spikes, or over-allocation
- Disk and network I/O contention
- Subprocesses launched inside tasks (e.g. Python tools, CLIs, native binaries)
- Idle time while tasks wait on data, storage, or external systems
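For example, a task body like the following sketch does most of its work in a child process; "feature-extractor" is a hypothetical CLI used only for illustration. Flyte records whether the task succeeded or failed, but not what the child did with CPU, memory, or I/O while it ran.

```python
# Hypothetical sketch: the real work happens in a subprocess the orchestrator
# never inspects. "feature-extractor" is an assumed CLI name.
import subprocess

from flytekit import task


@task(retries=2)
def run_extractor(input_path: str, output_path: str) -> str:
    # Flyte observes this task's exit status, not the child's resource usage.
    subprocess.run(
        ["feature-extractor", "--input", input_path, "--output", output_path],
        check=True,
    )
    return output_path
```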
Why this gap matters in practice
Flyte tasks often wrap complex workloads: model training, feature generation, data transformation, or external tools invoked from Python. Resource requests are typically set conservatively to avoid retries or failures, as in the sizing sketch after the list below. Without execution-level visibility, teams struggle to answer:
- Why a task runtime increased without code changes
- Whether requested CPU or memory is actually used
- Whether performance is limited by compute, I/O, or memory
- Why infrastructure cost grows while workflows appear stable
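A conservative sizing pattern often looks like the following sketch; the figures are placeholders chosen for illustration. The request is large enough that the task never fails on memory, but nothing at the orchestration layer indicates whether that capacity is ever used.

```python
# Illustrative only: a request sized for the worst case ever observed.
# Flyte honors the request but cannot report how much of it is actually used.
from flytekit import Resources, task


@task(requests=Resources(cpu="8", mem="32Gi"), limits=Resources(cpu="8", mem="32Gi"))
def train_model(dataset_uri: str) -> str:
    ...  # training logic elided
    return "s3://example-bucket/models/latest"
```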
What Tracer adds
Tracer observes execution directly from the host and container runtime and adds the following (a conceptual sketch follows the list):
- Observed CPU, memory, disk, and network usage per task execution
- Visibility into subprocesses and nested tools invoked within tasks
- Detection of stalls, idle time, and resource contention
- Attribution of resource usage to workflows, tasks, and execution attempts
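As a rough intuition for what execution-level observation means, the sketch below samples per-process CPU and memory on the host, which is exactly the kind of signal that task state and logs do not carry. This is a conceptual illustration only, not Tracer's API or implementation.

```python
# Conceptual sketch only -- not Tracer's implementation.
# Sampling per-process CPU and memory on the host surfaces subprocesses,
# idle time, and memory pressure that task-level state cannot show.
import time

import psutil


def sample_processes(duration_s: float = 5.0, interval_s: float = 1.0) -> None:
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        for proc in psutil.process_iter(["pid", "name", "cpu_percent", "memory_info"]):
            info = proc.info
            name = info["name"] or "?"
            cpu = info["cpu_percent"] or 0.0
            rss_mb = info["memory_info"].rss / 1e6 if info["memory_info"] else 0.0
            print(f"{info['pid']:>7}  {name:<28.28}  cpu%={cpu:5.1f}  rss_mb={rss_mb:8.1f}")
        time.sleep(interval_s)
```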
Example: diagnosing a slow Flyte task
A Flyte task responsible for feature generation begins taking significantly longer, despite no changes to task code or inputs. Flyte logs show successful execution with no errors. Tracer reveals the following (a hypothetical reconstruction of the task follows the list):
- Low average CPU utilization
- Sustained disk I/O wait
- Repeated short-lived subprocesses reading from shared storage
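The underlying task might look something like this hypothetical sketch: each partition spawns a short-lived child process that re-reads from shared storage, so most of the wall time is spent waiting on I/O rather than using the CPU the task requested. The CLI name, paths, and resource figures are assumptions for illustration.

```python
# Hypothetical reconstruction of the pattern described above.
# Each partition spawns a short-lived subprocess that re-reads shared storage,
# so the task is I/O-bound even though its CPU request suggests heavy compute.
import subprocess

from flytekit import Resources, task


@task(requests=Resources(cpu="4", mem="8Gi"))
def generate_features(partitions: list[str]) -> str:
    for partition in partitions:
        subprocess.run(
            [
                "feature-extractor",  # assumed CLI, as in the earlier sketch
                "--partition", partition,
                "--data", "/mnt/shared/events",
                "--out", "/mnt/shared/features",
            ],
            check=True,
        )
    return "/mnt/shared/features"
```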
Using execution insight to tune Flyte workflows
With execution-level data, teams can make targeted changes such as the following (see the tuning sketch after the list):
- Reducing CPU or memory requests for underutilized tasks
- Choosing instance types better suited for I/O-heavy workloads
- Separating compute-heavy and data-access-heavy tasks
- Identifying tasks that block on external systems or storage
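Continuing the hypothetical example above, one targeted change is to split the data-access-heavy staging step from the compute-heavy step and size each request to what was actually observed. The task boundaries and numbers below are placeholders, not recommendations.

```python
# Hypothetical tuning sketch: separate the I/O-bound stage from the
# compute-bound stage and right-size each request to observed usage.
from flytekit import Resources, task, workflow


@task(requests=Resources(cpu="1", mem="2Gi"))
def stage_partitions(partitions: list[str]) -> str:
    ...  # data-access-heavy: mostly waits on shared storage
    return "/tmp/staged"


@task(requests=Resources(cpu="8", mem="16Gi"))
def compute_features(staged_path: str) -> str:
    ...  # compute-heavy: benefits from the larger CPU request
    return "/tmp/features"


@workflow
def tuned_feature_pipeline(partitions: list[str]) -> str:
    return compute_features(staged_path=stage_partitions(partitions=partitions))
```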
Observability comparison
This comparison highlights the difference between Flyte’s task-level orchestration visibility and Tracer’s execution-level observation.

| What you can see | Flyte | Tracer |
| --- | --- | --- |
| Task and workflow definitions, typed inputs and outputs | Yes | No |
| Scheduling, retries, and failure handling | Yes | No |
| Execution metadata, logs, and lineage | Yes | No |
| CPU, memory, disk, and network usage during task execution | No | Yes |
| Subprocesses and nested tools invoked within tasks | No | Yes |
| Stalls, idle time, and resource contention | No | Yes |
| Attribution of resource usage and cost to workflows, tasks, and attempts | No | Yes |
What Tracer does not replace
Tracer is not an orchestration framework.
- It does not replace Flyte
- It does not define tasks, workflows, or launch plans
- It does not change execution logic, retries, or scheduling
When to use Tracer with Flyte
Tracer is most useful when teams need to:
- Explain slow or inconsistent task runtimes
- Identify idle or over-provisioned resources
- Diagnose performance issues beyond logs and task state
- Attribute resource usage and cost to specific workflows or tasks

