tasks) executed sequentially. Each task
can be either a deterministic command (such as execute_sql, which executes a named
SQL query) or an agent given a prompt. Tasks are composed by passing the output
of one task into the input of another; the output of each task can be accessed
with Jinja as {{ name_of_task }}.
Workflow components
Workflows are DAGs composed of tasks. Every task has the following common properties:
| Component | Description | Type |
|---|---|---|
| name | Identifier for the task. Output of the task can be referenced as {{name}}. | required |
| type | The tool to use for this task. See the following section for possible types. | required |
type: agent
| Component | Description | Type |
|---|---|---|
| agent_ref | The agent to use within the agents directory, referenced by the agent’s name. | required for type: agent |
| prompt | The input prompt passed to the agent for this task. | required for type: agent |
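A minimal sketch of an agent task that consumes the output of an earlier task through Jinja. It assumes a top-level `tasks:` list, and the task names, the agent name `sales_analyst`, the database `warehouse` and the file `monthly_sales.sql` are all hypothetical.

```yaml
tasks:
  # Hypothetical SQL task; later tasks can read its result as {{ monthly_sales }}
  - name: monthly_sales
    type: execute_sql
    database: warehouse          # hypothetical database name
    sql_file: monthly_sales.sql  # hypothetical file in the data directory

  # Agent task; agent_ref names an agent in the agents directory (hypothetical)
  - name: sales_summary
    type: agent
    agent_ref: sales_analyst
    prompt: |
      Summarize the key trends in the following results:
      {{ monthly_sales }}
```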
type: execute_sql
Executes a SQL query either referenced by filename or provided inline.
| Component | Description | Type |
|---|---|---|
| sql_file | The SQL file within the data directory to execute | required (or use sql_query) |
| sql_query | Inline SQL query to execute (alternative to sql_file) | required (or use sql_file) |
| database | The name of the database to execute the query against | required |
Use sql_file to reference a SQL file, or sql_query to provide the SQL inline.
Note: These options are mutually exclusive. Each task must specify exactly one of them; setting both in a single task is not allowed.
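A sketch of the two mutually exclusive forms, one per task. It assumes a top-level `tasks:` list; the database, file, table and task names are hypothetical.

```yaml
tasks:
  # Reference a SQL file in the data directory (hypothetical file name)
  - name: active_users
    type: execute_sql
    database: warehouse   # hypothetical database name
    sql_file: active_users.sql

  # Provide the query inline instead; this task must not also set sql_file
  - name: user_count
    type: execute_sql
    database: warehouse
    sql_query: SELECT COUNT(*) AS n FROM users
```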
type: formatter
Formats the provided template using the outputs of other tasks, then passes
the rendered template as output.
| Component | Description | Type |
|---|---|---|
| template | The template to be rendered and passed as output. | required |
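A minimal formatter sketch, assuming two earlier tasks named `active_users` and `user_count` (hypothetical names) and a top-level `tasks:` list. Exactly how a query result is rendered as text inside the template depends on Oxy's output format, which this page does not specify.

```yaml
tasks:
  - name: report
    type: formatter
    # Each {{ ... }} is replaced by the output of the task with that name
    template: |
      Active users: {{ active_users }}
      Total users: {{ user_count }}
```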
type: loop_sequential
| Component | Description | Type |
|---|---|---|
| values | The values to iterate over. The tasks in this task's tasks array run once for each value. | required |
| tasks | Defines the tasks to execute for each value. | required |
Inside the loop, the current value is accessed as <name>.value (in Jinja,
{{ <name>.value }}), where <name> is the name of the loop_sequential task
itself.
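A sample partial config, sketched under assumptions: the loop is named `loop_through_animals` (the name this page uses when discussing loop outputs), while the values, the inner task name `describe_animal` and the agent `animal_expert` are hypothetical.

```yaml
tasks:
  - name: loop_through_animals
    type: loop_sequential
    values: ["cat", "dog", "rabbit"]   # hypothetical values
    tasks:
      # Runs once per value; the current value is {{ loop_through_animals.value }}
      - name: describe_animal
        type: agent
        agent_ref: animal_expert       # hypothetical agent name
        prompt: "Write one sentence about a {{ loop_through_animals.value }}."
```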
Seeding values with query results
The values can also be seeded with the output of a previous execute_sql
task.
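A hedged sketch of seeding a loop from a query. It assumes that `values` accepts a quoted Jinja reference to the earlier task's output; the exact expression for selecting the column to iterate over may differ in Oxy. The table, database, agent and task names are hypothetical.

```yaml
tasks:
  - name: get_animals
    type: execute_sql
    database: warehouse   # hypothetical database name
    sql_query: SELECT DISTINCT animal FROM pets

  - name: loop_through_animals
    type: loop_sequential
    # Assumption: values can reference the query task's output via Jinja
    values: "{{ get_animals }}"
    tasks:
      - name: describe_animal
        type: agent
        agent_ref: animal_expert   # hypothetical agent name
        prompt: "Write one sentence about a {{ loop_through_animals.value }}."
```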
Formatting loop outputs
Loops are often combined with a type: formatter task, which iterates over
the loop's outputs and joins them into a single string. The output of a
loop_sequential task is an array with one dictionary per value; the keys of
each dictionary are the name fields of the tasks inside the loop. These
results can be accessed in Jinja by looping over the {{ <loop_name> }} variable (for example, {{ loop_through_animals }} for a loop named loop_through_animals).
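A sketch of a formatter that flattens loop results, assuming a loop_sequential task named `loop_through_animals` whose inner task is named `describe_animal` (the inner name is hypothetical). Each element of the loop output is a dictionary keyed by that inner task name.

```yaml
tasks:
  - name: animal_summary
    type: formatter
    # Iterate over the per-value dictionaries and pull out describe_animal's output
    template: |
      {% for result in loop_through_animals %}
      - {{ result.describe_animal }}
      {% endfor %}
```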
Concurrency
Concurrency can be added to a loop with the concurrency key, whose value
specifies the number of concurrent threads to use.
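A sketch of a concurrent loop, placing `concurrency` on the loop_sequential task as described above. The values, inner task and agent names are hypothetical.

```yaml
tasks:
  - name: loop_through_animals
    type: loop_sequential
    values: ["cat", "dog", "rabbit", "horse"]   # hypothetical values
    concurrency: 2   # run up to two iterations at the same time
    tasks:
      - name: describe_animal
        type: agent
        agent_ref: animal_expert   # hypothetical agent name
        prompt: "Write one sentence about a {{ loop_through_animals.value }}."
```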
type: workflow
| Component | Description | Type |
|---|---|---|
| src | Path to the workflow yml file to execute. Relative to the root of the oxy directory. | required |
| variables | Variables that are passed through, overriding the sub-workflow’s variables. | optional |
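A minimal sketch of a workflow task. It reuses the `workflows/monthly_report.yml` path that appears in this page's examples and assumes, hypothetically, that the sub-workflow declares a `month` variable.

```yaml
tasks:
  - name: run_monthly_report
    type: workflow
    src: workflows/monthly_report.yml   # relative to the root of the oxy directory
    variables:
      month: "2024-01"   # overrides the sub-workflow's (hypothetical) month variable
```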
The variables key allows these workflows to be parameterized by overriding
their workflow-level variables. This is particularly useful when embedding a
workflow task in a loop.
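A sketch of running a sub-workflow once per month. The loop name, the month values and the sub-workflow's `month` variable are hypothetical; the current value is read with the `<name>.value` pattern described for loop_sequential.

```yaml
tasks:
  - name: loop_months
    type: loop_sequential
    values: ["2024-01", "2024-02", "2024-03"]   # hypothetical months
    tasks:
      - name: run_monthly_report
        type: workflow
        src: workflows/monthly_report.yml
        variables:
          # Each iteration runs the sub-workflow with a different month
          month: "{{ loop_months.value }}"
```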
Variables
You will often want to parameterize a workflow: for example, an automated analysis that should be reusable across different dates. This is enabled through the variables key.
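A hedged sketch of a date-parameterized workflow. It assumes a top-level `variables:` mapping of names to default values, and that variables are exposed to Jinja by their names; the `month` variable and `title` task are hypothetical.

```yaml
variables:
  month: "2024-01"   # hypothetical default; a parent workflow task can override it

tasks:
  - name: title
    type: formatter
    # Assumption: workflow variables are available to Jinja by name
    template: "Monthly report for {{ month }}"
```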
Basic Variables
Simple variables can be defined as key-value pairs.
Typed Variables with JSON Schema
For better validation and documentation, you can define variables using JSON Schema.
Passing Variables to Tasks
Variables can be passed to different task types.
Agent Tasks
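A sketch of an agent prompt that uses a workflow variable, assuming a hypothetical `month` variable that is exposed to Jinja by name and a hypothetical agent `sales_analyst`.

```yaml
tasks:
  - name: sales_commentary
    type: agent
    agent_ref: sales_analyst   # hypothetical agent name
    # Assumption: {{ month }} resolves to the workflow-level variable
    prompt: "Describe notable sales trends for {{ month }}."
```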
SQL Tasks
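A sketch of an inline query that uses a workflow variable, under the same assumption that a hypothetical `month` variable is exposed to Jinja by name; the table and database names are also hypothetical. Because the value is interpolated into the SQL text, it should come from trusted workflow configuration.

```yaml
tasks:
  - name: monthly_sales
    type: execute_sql
    database: warehouse   # hypothetical database name
    sql_query: |
      SELECT region, SUM(amount) AS total
      FROM sales
      WHERE month = '{{ month }}'
      GROUP BY region
```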
Semantic Query Tasks
Global Variables
Variables can reference global semantics values.
Examples
workflows/monthly_report.yml
Workflows vs. chains
A workflow is similar to a “chain” in prompt-engineering parlance, but with a few key differences:
- Workflows are DAGs. Whereas chains can become arbitrarily complex, with arbitrarily nested loops, complex reply logic, and opaque branching structures, workflows enforce a clearer, more predictable flow from input to output.
- Workflows separate logic from execution. Because workflows are written in YAML, the DAG definition is entirely separate from the execution engine (usually Python), whereas other Python-based systems keep the two tightly coupled and ultimately become difficult to build and maintain.
langchain or llama_index, but they also dramatically reduce the complexity
of the system. You can think of Oxy’s workflow paradigm as a domain-specific
chain-builder for data workflows, where most (if not all) tasks simply pass
results around between different agents.