Optimizing My FPGA Build Workflow

If you've worked with Xilinx Vivado on any non-trivial project, you know the pain: synthesis takes 20 minutes, implementation takes another 40, and suddenly half your day is gone waiting for builds. Here's how I've optimized my workflow to get faster iteration cycles without sacrificing quality.

The Problem

Working with Xilinx UltraScale+ FPGAs, my typical build times were:

Synthesis: 15-25 minutes
Implementation: 30-50 minutes
Bitstream generation: 5-10 minutes

That's potentially 85 minutes for a single iteration. Make a typo in your HDL? That's another 85-minute wait. This is not sustainable for productive development.
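The 85-minute figure is the sum of the upper bounds listed above; summing the lower bounds gives the best case. A quick shell sketch of that arithmetic, using only the stage times from the list:

```shell
# Per-stage build times in minutes, "min max", copied from the list above:
# synthesis, implementation, bitstream generation.
best=0
worst=0
while read -r lo hi; do
    best=$((best + lo))
    worst=$((worst + hi))
done <<EOF
15 25
30 50
5 10
EOF
echo "best case: ${best} min, worst case: ${worst} min"
```

So even a build where every stage lands on its lower bound takes 50 minutes.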

Solution 1: Incremental Compilation

Vivado supports incremental synthesis and implementation, but it's not enabled by default and requires some setup. The idea is simple: reuse results from previous runs for unchanged modules.

Setting Up Incremental Synthesis

First, enable incremental synthesis in your project settings:

# In your TCL script or Vivado console (project mode)
# These are properties of the synthesis run, not of the design
set_property INCREMENTAL_CHECKPOINT path/to/reference.dcp [get_runs synth_1]
set_property AUTO_INCREMENTAL_CHECKPOINT 1 [get_runs synth_1]

The key is maintaining a reference checkpoint from a known-good build. Vivado will compare your current design against this checkpoint and only resynthesize changed modules.
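One way to keep the reference checkpoint trustworthy is to update it only after a build you're happy with. The sketch below is a hypothetical helper, not part of Vivado; the function name and the paths in the usage line are assumptions for illustration.

```shell
# promote_checkpoint SRC_DCP REF_DCP
# Copy a checkpoint from a finished build to the reference location.
# Refuses to overwrite the reference if the source is missing or empty.
promote_checkpoint() {
    src=$1
    ref=$2
    if [ ! -s "$src" ]; then
        echo "no checkpoint at $src; keeping existing reference" >&2
        return 1
    fi
    mkdir -p "$(dirname "$ref")" || return 1
    # Copy to a temporary name first so a half-written file
    # never becomes the reference.
    cp "$src" "$ref.tmp" && mv "$ref.tmp" "$ref"
}

# Example (hypothetical paths):
#   promote_checkpoint output/synth.dcp checkpoints/reference.dcp
```

Calling it by hand after a known-good build, rather than after every build, keeps a broken build from becoming the new baseline.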

Incremental Implementation

For implementation, you need a reference checkpoint from a fully placed and routed design:

set_property INCREMENTAL_CHECKPOINT path/to/routed.dcp [get_runs impl_1]

This typically gives me a 40-60% reduction in implementation time for small changes. The caveat: if you change something fundamental (clock structure, major block moves), Vivado falls back to a full implementation.

Solution 2: TCL Scripting Everything

The Vivado GUI is convenient for exploration, but it's slow and non-reproducible for actual builds. I've moved to a fully scripted workflow using TCL.

Project Structure

project/
├── src/
│   ├── rtl/          # Verilog/VHDL sources
│   ├── constraints/  # XDC timing constraints
│   └── ip/           # IP configurations
├── scripts/
│   ├── build.tcl     # Main build script
│   ├── synth.tcl     # Synthesis settings
│   └── impl.tcl      # Implementation settings
└── output/           # Build artifacts

The Build Script

My build.tcl handles everything from source file collection to bitstream generation:

# build.tcl - Main build script
set project_name "my_project"
set part "xczu7ev-ffvc1156-2-e"

# Create project in memory (faster than on-disk)
create_project -in_memory -part $part

# Add sources
add_files -fileset sources_1 [glob src/rtl/*.v]
add_files -fileset constrs_1 [glob src/constraints/*.xdc]

# Run synthesis
source scripts/synth.tcl
synth_design -top top_module

# Run implementation
source scripts/impl.tcl
opt_design
place_design
route_design

# Generate bitstream
write_bitstream -force output/${project_name}.bit

Running vivado -mode batch -source scripts/build.tcl gives me reproducible builds that can run headless on any machine.
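A thin shell wrapper around that command makes it easier to keep one log per build. This is a minimal sketch: the function name, the log directory, and the VIVADO override variable are assumptions, while -log and -journal are standard vivado command-line options.

```shell
# run_build [SCRIPT] [LOGDIR]
# Run a Vivado batch build and keep a timestamped log and journal per run.
# VIVADO may point at a specific executable (hypothetical override);
# it defaults to "vivado" on PATH.
run_build() {
    script=${1:-scripts/build.tcl}
    logdir=${2:-output/logs}
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$logdir" || return 1
    # -log and -journal keep vivado.log / vivado.jou out of the working dir.
    # The function returns whatever exit status Vivado returns.
    "${VIVADO:-vivado}" -mode batch -source "$script" \
        -log "$logdir/build-$stamp.log" \
        -journal "$logdir/build-$stamp.jou"
}

# Example:
#   run_build && echo "bitstream ready"
```

Because the function passes Vivado's exit status through, it can be chained with && in other scripts.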

Solution 3: Remote Build Server

My laptop is fine for editing code, but it's not ideal for running massive place-and-route algorithms. I set up a remote build server with better specs and run builds over SSH.

The Setup

Remote server with 32 cores and 128GB RAM (old workstation)
Vivado installed on the server
Project files synced via rsync or git
Builds triggered via SSH

The Workflow

# Sync sources to build server
rsync -avz --exclude='*.dcp' --exclude='*.bit' \
    ./project/ buildserver:~/fpga/project/

# Trigger build remotely
ssh buildserver "cd ~/fpga/project && \
    vivado -mode batch -source scripts/build.tcl" &

# Continue working locally while build runs...

With 32 cores vs my laptop's 8, synthesis time dropped from 20 minutes to 8 minutes. Implementation improved similarly. Because the build runs on another machine, I can keep editing while the server crunches numbers.
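Backgrounding the ssh command with & frees the terminal, but the build output lands in the middle of whatever you're typing and the exit status is easy to miss. One way to handle that, sketched here with hypothetical helper names and log path: capture the output in a file, remember the job's PID, and collect the exit status later with wait.

```shell
# start_build LOGFILE COMMAND [ARGS...]
# Run COMMAND in the background with stdout and stderr captured in LOGFILE.
start_build() {
    log=$1
    shift
    "$@" >"$log" 2>&1 &
    BUILD_PID=$!
}

# finish_build
# Block until the background build ends and return its exit status.
finish_build() {
    wait "$BUILD_PID"
}

# Example (ssh -n stops the backgrounded ssh from reading the terminal):
#   start_build build.log ssh -n buildserver \
#       "cd ~/fpga/project && vivado -mode batch -source scripts/build.tcl"
#   ...keep editing...
#   finish_build && rsync -avz buildserver:~/fpga/project/output/ ./project/output/
```

The final rsync only runs if the remote build succeeded, so a stale bitstream is never copied back after a failed build.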

Solution 4: Out-of-Context Synthesis

For large designs with multiple IP blocks, out-of-context (OOC) synthesis is a game-changer. Each IP block gets synthesized independently and cached.

# Mark a module for out-of-context synthesis (project mode):
# this creates a block fileset with its own OOC synthesis run
create_fileset -blockset -define_from dsp_block dsp_block

The first build takes longer, but subsequent builds reuse the OOC netlists. If you're iterating on top-level integration and not touching the IP blocks, this saves massive time.
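In project mode Vivado tracks whether an OOC run is out of date on its own. A hand-scripted flow needs an equivalent check. Here is a minimal sketch, assuming each block's netlist is saved as a .dcp file and that file timestamps are a good enough signal; the function name and paths are hypothetical.

```shell
# ooc_is_stale DCP SOURCE [SOURCE...]
# Succeeds (exit 0) when the cached netlist DCP is missing or older than
# any of the given sources; fails (exit 1) when the cache can be reused.
# A SOURCE may be a directory, in which case it is searched recursively.
ooc_is_stale() {
    dcp=$1
    shift
    [ -f "$dcp" ] || return 0
    for src in "$@"; do
        # find prints a path only if it was modified after the checkpoint
        if [ -n "$(find "$src" -newer "$dcp" 2>/dev/null)" ]; then
            return 0
        fi
    done
    return 1
}

# Example (hypothetical paths):
#   if ooc_is_stale output/dsp_block.dcp src/rtl/dsp_block.v; then
#       echo "dsp_block needs a new OOC synthesis run"
#   fi
```

Timestamps are the same signal make uses. A content hash would be more robust after a fresh git checkout, at the cost of a little more bookkeeping.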

Solution 5: Simulation-First Development

The best way to speed up builds is to do fewer of them. I've shifted to doing more verification in simulation before touching hardware.

"Catching a bug in simulation costs minutes. Catching it after place-and-route costs hours. Catching it in hardware costs days."

My current flow:

Unit tests in simulation - Quick feedback for individual modules
Integration tests - Verify module interfaces before synthesis
Formal verification - For critical control logic (using SymbiYosys)
Hardware build - Only when simulation is clean
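The ordering above can be enforced with a small gate script that stops at the first failing stage, so the hardware build never starts on a red simulation. This is a sketch under assumptions: the function name, the make targets, and control.sby in the usage example are hypothetical stand-ins for whatever your project uses.

```shell
# run_stages COMMAND [COMMAND...]
# Run each command string in order; stop at the first failure and
# return non-zero so later (more expensive) stages are skipped.
run_stages() {
    for stage in "$@"; do
        echo "== $stage"
        if ! sh -c "$stage"; then
            echo "stage failed: $stage; skipping remaining stages" >&2
            return 1
        fi
    done
}

# Example (hypothetical targets and file names):
#   run_stages \
#       "make sim-unit" \
#       "make sim-integration" \
#       "sby -f control.sby" \
#       "vivado -mode batch -source scripts/build.tcl"
```

Putting the cheapest checks first means most failures show up within minutes, before any synthesis time has been spent.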

Results

With all these optimizations combined, my typical iteration cycle went from 85 minutes to:

Small RTL change: 12-18 minutes (incremental build)
IP block change: 25-35 minutes (partial OOC rebuild)
Full rebuild: 35-45 minutes (on build server)

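Measured against the 85-minute worst case, that's roughly 2-3x faster for IP block changes and full rebuilds, and 5-7x faster for small RTL changes. More importantly, the remote build means I'm not blocked: I can keep working while builds run in the background.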

What's Next

I'm currently exploring:

Vivado's synthesis directives - Fine-tuning for better timing closure
Abstract shell flows - For even faster platform development
Cloud builds - The AWS FPGA Developer AMI (the one used for F1 development) comes with Vivado pre-installed

FPGA development doesn't have to be a waiting game. With the right infrastructure, you can iterate almost as quickly as software development. Almost.
