Flyte orchestrates data and ML workflows by defining tasks, workflows, and launch plans. It determines how tasks are scheduled, retried, and executed across compute environments, but it does not observe how code behaves once a task is running inside a container or node. Tracer complements Flyte by exposing execution behavior (CPU, memory, disk, and network usage) during task execution, without modifying Flyte task definitions, container images, or execution semantics.
For a conceptual overview, see How Tracer fits in your stack.

What Flyte does well

Flyte provides strong guarantees around workflow structure and execution, including:
  • Task and workflow definitions with typed inputs and outputs
  • Deterministic execution and reproducibility
  • Scheduling, retries, and failure handling
  • Execution metadata, logs, and lineage
  • Environment isolation via containers
These capabilities make Flyte particularly effective for ML pipelines and data workflows where correctness, versioning, and reproducibility matter.

What Flyte does not see at runtime

Flyte tracks task state and execution outcomes, but it does not observe what happens inside the running task container. It does not show:
  • CPU utilization during task execution
  • Memory pressure, spikes, or over-allocation
  • Disk and network I/O contention
  • Subprocesses launched inside tasks (e.g. Python tools, CLIs, native binaries)
  • Idle time while tasks wait on data, storage, or external systems
This execution behavior occurs below the Flyte control plane and is not visible through task metadata or logs alone.

Why this gap matters in practice

Flyte tasks often wrap complex workloads: model training, feature generation, data transformation, or external tools invoked from Python. Resource requests are typically set conservatively to avoid retries or failures. Without execution-level visibility, teams struggle to answer:
  • Why a task's runtime increased without code changes
  • Whether requested CPU or memory is actually used
  • Whether performance is limited by compute, I/O, or memory
  • Why infrastructure cost grows while workflows appear stable
As a result, workflows remain correct and reproducible, but inefficient.

What Tracer adds

Tracer observes execution directly from the host and container runtime. It adds:
  • Observed CPU, memory, disk, and network usage per task execution
  • Visibility into subprocesses and nested tools invoked within tasks
  • Detection of stalls, idle time, and resource contention
  • Attribution of resource usage to workflows, tasks, and execution attempts
These insights are derived from observed runtime behavior, not from task configuration or declared resource requests.
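As a rough illustration of what attributing usage to workflows, tasks, and execution attempts could look like, the sketch below aggregates observed samples by that key. The `Sample` schema and the `attribute_usage` helper are hypothetical names chosen for this example; they are not Tracer's data format or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One observed usage sample (hypothetical schema, not Tracer's format)."""
    workflow: str
    task: str
    attempt: int
    cpu_seconds: float
    peak_mem_mib: float


def attribute_usage(samples):
    """Sum CPU time and keep the highest memory peak per (workflow, task, attempt)."""
    totals = defaultdict(lambda: {"cpu_seconds": 0.0, "peak_mem_mib": 0.0})
    for s in samples:
        entry = totals[(s.workflow, s.task, s.attempt)]
        entry["cpu_seconds"] += s.cpu_seconds
        entry["peak_mem_mib"] = max(entry["peak_mem_mib"], s.peak_mem_mib)
    return dict(totals)
```

Keying on the attempt number keeps retries separate, so a retried task does not hide the cost of its failed attempts.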

Example: diagnosing a slow Flyte task

A Flyte task responsible for feature generation begins taking significantly longer, despite no changes to task code or inputs. Flyte logs show successful execution with no errors. Tracer reveals:
  • Low average CPU utilization
  • Sustained disk I/O wait
  • Repeated short-lived subprocesses reading from shared storage
This indicates an I/O-bound workload rather than insufficient compute. Increasing CPU requests would not reduce runtime. Tracer makes this visible by observing execution behavior directly, rather than inferring causes from task duration.
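The reasoning in this example can be sketched as a simple classification over observed metrics. This is a minimal sketch under stated assumptions: the `TaskProfile` fields and the threshold values are hypothetical and illustrative, not Tracer output or recommended settings.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Averaged observations for one task run (hypothetical fields)."""
    cpu_util: float     # mean utilization of requested CPU, 0.0 to 1.0
    iowait_frac: float  # fraction of wall time spent waiting on disk I/O
    mem_util: float     # peak memory divided by requested memory


def classify_bottleneck(p, cpu_high=0.8, iowait_high=0.3, mem_high=0.9):
    """Return a coarse label; the default thresholds are illustrative only."""
    # High I/O wait combined with idle CPU points at storage, not compute.
    if p.iowait_frac >= iowait_high and p.cpu_util < cpu_high:
        return "io-bound"
    if p.mem_util >= mem_high:
        return "memory-bound"
    if p.cpu_util >= cpu_high:
        return "cpu-bound"
    return "inconclusive"
```

A profile like the one described above, with low CPU utilization and sustained I/O wait, would be labeled `io-bound` by this sketch, which matches the conclusion that adding CPU would not help.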

Using execution insight to tune Flyte workflows

With execution-level data, teams can make targeted changes such as:
  • Reducing CPU or memory requests for underutilized tasks
  • Choosing instance types better suited for I/O-heavy workloads
  • Separating compute-heavy and data-access-heavy tasks
  • Identifying tasks that block on external systems or storage
These adjustments can reduce cost, improve predictability, or both.
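For the first item, reducing requests for underutilized tasks, one possible approach is to size the request from the observed peak plus a safety margin. The sketch below assumes a hypothetical `suggest_request` helper, a 20% headroom, and rounding to half-unit steps; none of these come from Flyte or Tracer, and appropriate values depend on the workload.

```python
import math


def suggest_request(observed_peak, current_request, headroom=0.2, step=0.5):
    """Suggest a request: observed peak plus headroom, rounded up to `step`.

    The result never exceeds the current request, because this sketch
    only addresses over-provisioning, not under-provisioning.
    """
    if observed_peak < 0 or current_request <= 0:
        raise ValueError("peak must be non-negative and request positive")
    target = observed_peak * (1 + headroom)
    rounded = math.ceil(target / step) * step
    return min(rounded, current_request)
```

For a task that requests 8 CPU cores but peaks at 1.3 cores, `suggest_request(1.3, 8.0)` returns `2.0`. The same function can be applied to memory by passing values in a single unit such as GiB.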

Observability comparison

This comparison highlights the difference between Flyte’s task-level orchestration visibility and Tracer’s execution-level observation.

What Tracer does not replace

Tracer is not an orchestration framework.
  • It does not replace Flyte
  • It does not define tasks, workflows, or launch plans
  • It does not change execution logic, retries, or scheduling
Flyte remains responsible for orchestration and correctness. Tracer makes execution behavior visible.

When to use Tracer with Flyte

Tracer is most useful when teams need to:
  • Explain slow or inconsistent task runtimes
  • Identify idle or over-provisioned resources
  • Diagnose performance issues beyond logs and task state
  • Attribute resource usage and cost to specific workflows or tasks
Tracer operates independently of Flyte and supports workflows written in any language or toolchain.

Summary

Flyte defines and orchestrates tasks and workflows with strong guarantees around correctness and reproducibility. Tracer adds execution-level visibility that shows how those tasks actually behave at runtime. Together, they provide both control and insight, without changes to existing Flyte workflows.