
Optimizing My FPGA Build Workflow

October 2024 · 12 min read

If you've worked with Xilinx Vivado on any non-trivial project, you know the pain: synthesis takes 20 minutes, implementation takes another 40, and suddenly half your day is gone waiting for builds. Here's how I've optimized my workflow to get faster iteration cycles without sacrificing quality.

The Problem

Working with Xilinx UltraScale+ FPGAs, my typical build times were:

That's potentially 85 minutes for a single iteration. Make a typo in your HDL? That's another 85-minute wait. This is not sustainable for productive development.

Solution 1: Incremental Compilation

Vivado supports incremental synthesis and implementation, but it's not enabled by default and requires some setup. The idea is simple: reuse results from previous runs for unchanged modules.

Setting Up Incremental Synthesis

First, enable incremental synthesis in your project settings:

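# In a project-mode Tcl script or the Vivado Tcl console (these are run properties)
# Option A: point the synthesis run at an explicit reference checkpoint
set_property INCREMENTAL_CHECKPOINT path/to/reference.dcp [get_runs synth_1]
# Option B: let Vivado reuse the previous run's checkpoint automatically
set_property AUTO_INCREMENTAL_CHECKPOINT 1 [get_runs synth_1]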

The key is maintaining a reference checkpoint from a known-good build. Vivado will compare your current design against this checkpoint and only resynthesize changed modules.
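Vivado does the comparison itself, but keeping the reference "known-good" is up to you. A minimal Python sketch of one way to do that: only promote a new checkpoint to reference status when the build passed. It assumes your build writes a checkpoint (a path like `output/post_synth.dcp` is hypothetical) and that you decide elsewhere whether the build is good, for example by checking that timing was met; the function name is illustrative.

```python
import shutil
from pathlib import Path


def promote_reference(new_dcp: Path, reference_dcp: Path, timing_met: bool) -> bool:
    """Copy a freshly built checkpoint over the reference checkpoint.

    Only builds the caller reports as good (here: timing met) are promoted,
    so a broken build never becomes the incremental reference.
    Returns True if the reference was updated.
    """
    if not timing_met or not new_dcp.is_file():
        return False
    reference_dcp.parent.mkdir(parents=True, exist_ok=True)
    # Copy to a temporary name first, then rename, so an interrupted copy
    # never leaves a half-written reference checkpoint behind.
    tmp = reference_dcp.with_name(reference_dcp.name + ".tmp")
    shutil.copy2(new_dcp, tmp)
    tmp.replace(reference_dcp)
    return True
```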

Incremental Implementation

For implementation, you need a reference checkpoint from a fully placed and routed design:

set_property INCREMENTAL_CHECKPOINT path/to/routed.dcp [get_runs impl_1]

This typically gives me a 40-60% reduction in implementation time for small changes. The caveat: if you change something fundamental (clock structure, major block moves), Vivado falls back to a full implementation run.
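To translate that percentage into wall-clock time, take the 40-minute implementation figure from the intro as an assumed baseline. The arithmetic as a small Python helper (the function name is illustrative):

```python
def estimated_impl_minutes(full_minutes: float, reduction: float,
                           incremental_reused: bool = True) -> float:
    """Rough implementation time for one iteration.

    reduction: fraction of implementation time saved by incremental reuse (0..1).
    If Vivado falls back to a full run (e.g. after a clock-structure change),
    the full time applies again.
    """
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    if not incremental_reused:
        return full_minutes
    return full_minutes * (1.0 - reduction)


# 40-minute full implementation, 40-60% reduction for small changes
best = estimated_impl_minutes(40, 0.6)
worst = estimated_impl_minutes(40, 0.4)
print(f"{best:.0f}-{worst:.0f} minutes")  # prints "16-24 minutes"
```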

Solution 2: TCL Scripting Everything

The Vivado GUI is convenient for exploration, but it's slow and non-reproducible for actual builds. I've moved to a fully scripted workflow using TCL.

Project Structure

project/
├── src/
│   ├── rtl/          # Verilog/VHDL sources
│   ├── constraints/  # XDC timing constraints
│   └── ip/           # IP configurations
├── scripts/
│   ├── build.tcl     # Main build script
│   ├── synth.tcl     # Synthesis settings
│   └── impl.tcl      # Implementation settings
└── output/           # Build artifacts
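If you start new projects often, a short script can stamp out this layout. A Python sketch; the directory names mirror the tree above, and the empty `.tcl` files are placeholders to fill in:

```python
from __future__ import annotations

from pathlib import Path

# Directories and script files from the tree above (scripts start out empty).
LAYOUT_DIRS = ["src/rtl", "src/constraints", "src/ip", "scripts", "output"]
LAYOUT_FILES = ["scripts/build.tcl", "scripts/synth.tcl", "scripts/impl.tcl"]


def scaffold(root: Path) -> list[Path]:
    """Create the project skeleton under `root`; never overwrites existing files.

    Returns the script files that were newly created.
    """
    for d in LAYOUT_DIRS:
        (root / d).mkdir(parents=True, exist_ok=True)
    created = []
    for f in LAYOUT_FILES:
        path = root / f
        if not path.exists():
            path.touch()
            created.append(path)
    return created
```

Running it twice is harmless: the second call creates nothing and returns an empty list.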

The Build Script

My build.tcl handles everything from source file collection to bitstream generation:

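# build.tcl - Main build script (non-project flow)
# Run from the project root: source paths below are relative
set project_name "my_project"
set part "xczu7ev-ffvc1156-2-e"

# Create project in memory (faster than on-disk)
create_project -in_memory -part $part
file mkdir output

# Add sources
add_files -fileset sources_1 [glob src/rtl/*.v]
add_files -fileset constrs_1 [glob src/constraints/*.xdc]

# Run synthesis and keep the checkpoint (usable as an incremental reference)
source scripts/synth.tcl
synth_design -top top_module
write_checkpoint -force output/post_synth.dcp

# Run implementation and keep the routed checkpoint
source scripts/impl.tcl
opt_design
place_design
route_design
write_checkpoint -force output/post_route.dcp
report_timing_summary -file output/timing_summary.rpt

# Generate bitstream
write_bitstream -force output/${project_name}.bit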

Running vivado -mode batch -source scripts/build.tcl from the project root gives me reproducible builds that can run headless on any machine with Vivado installed.
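To time builds or call them from other tooling, the command can be wrapped. A minimal Python sketch, assuming `vivado` is on `PATH`; `-log` and `-nojournal` are standard Vivado command-line options, while the function names and log file name are hypothetical:

```python
import subprocess
import time
from pathlib import Path


def vivado_batch_cmd(script: str, log: str = "vivado_build.log") -> list:
    """Argument list for a headless Vivado run that sources `script`."""
    return ["vivado", "-mode", "batch", "-source", script,
            "-log", log, "-nojournal"]


def run_build(project_root: Path, script: str = "scripts/build.tcl") -> float:
    """Run the build from the project root and return wall-clock minutes.

    build.tcl uses relative paths, so the working directory matters.
    Raises subprocess.CalledProcessError if Vivado exits non-zero.
    """
    start = time.monotonic()
    subprocess.run(vivado_batch_cmd(script), cwd=project_root, check=True)
    return (time.monotonic() - start) / 60.0
```

Usage: `print(f"build took {run_build(Path('project')):.1f} min")`.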

Solution 3: Remote Build Server

My laptop is fine for editing code, but it's not ideal for running place-and-route on large designs. I set up a remote build server with better specs and run builds over SSH.

The Setup

The Workflow

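# Sync sources to build server (skip large build artifacts)
rsync -avz --exclude='*.dcp' --exclude='*.bit' \
    ./project/ buildserver:~/fpga/project/

# Trigger build remotely in the background; keep its output in a local log
ssh buildserver "cd ~/fpga/project && \
    vivado -mode batch -source scripts/build.tcl" > build.log 2>&1 &

# Continue working locally while build runs...

# Once it finishes, pull back only the bitstream
rsync -avz --include='*.bit' --exclude='*' \
    buildserver:~/fpga/project/output/ ./project/output/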
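The same sync-then-build sequence can be driven from Python so it fits into other tooling. A sketch assuming the host alias `buildserver` and passwordless SSH; the helper names are hypothetical. `Popen` returns immediately, which mirrors the trailing `&`:

```python
import subprocess


def rsync_cmd(local_dir: str, host: str, remote_dir: str) -> list:
    # Trailing slashes: copy the *contents* of local_dir into remote_dir.
    # No shell is involved, so the exclude patterns need no quoting.
    return ["rsync", "-avz", "--exclude=*.dcp", "--exclude=*.bit",
            local_dir.rstrip("/") + "/", f"{host}:{remote_dir.rstrip('/')}/"]


def remote_build_cmd(host: str, remote_dir: str,
                     script: str = "scripts/build.tcl") -> list:
    # The remote shell expands ~ and runs the build from the project root.
    return ["ssh", host, f"cd {remote_dir} && vivado -mode batch -source {script}"]


def start_remote_build(local_dir: str, host: str, remote_dir: str,
                       log_path: str = "build.log") -> subprocess.Popen:
    """Sync sources, then launch the remote build without blocking the caller."""
    subprocess.run(rsync_cmd(local_dir, host, remote_dir), check=True)
    with open(log_path, "w") as log:
        # The child keeps its own copy of the file handle after we close ours.
        return subprocess.Popen(remote_build_cmd(host, remote_dir),
                                stdout=log, stderr=subprocess.STDOUT)
```

Call `proc.wait()` later (or check `proc.poll()`) to find out when the build has finished.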

With 32 cores versus my laptop's 8, synthesis time dropped from 20 minutes to 8. Implementation improved by a similar factor. And because the build runs on the server, I can keep editing while it crunches numbers.
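Those numbers are worth putting in perspective: 4× the cores bought a 2.5× speedup on synthesis, so scaling is well short of linear. The arithmetic as a tiny Python helper. It treats core count as the only difference between the machines, which is an assumption; clock speed, memory, and disk also differ.

```python
def speedup(before_min: float, after_min: float) -> float:
    """How many times faster the new build is."""
    return before_min / after_min


def parallel_efficiency(before_min: float, after_min: float,
                        cores_before: int, cores_after: int) -> float:
    """Fraction of the ideal (linear-in-cores) speedup actually achieved."""
    return speedup(before_min, after_min) / (cores_after / cores_before)


print(speedup(20, 8))                     # 2.5
print(parallel_efficiency(20, 8, 8, 32))  # 0.625
```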

Solution 4: Out-of-Context Synthesis

For large designs with multiple IP blocks, out-of-context (OOC) synthesis is a game-changer. Each IP block gets synthesized independently and cached.

# Mark a module for out-of-context synthesis (project mode): this creates a block-level
# fileset for dsp_block, like "Set as Out-of-Context for Synthesis" in the GUI
create_fileset -blockset -define_from dsp_block dsp_block

The first build takes longer, but subsequent builds reuse the OOC netlists. If you're iterating on top-level integration and not touching the IP blocks, this saves massive time.
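Vivado tracks which OOC runs are out of date on its own. The Python sketch below only illustrates the idea behind that caching: a block is re-synthesized only when a hash of its sources changes. The dictionary cache and function names are hypothetical, and a real tool would also account for constraints, IP settings, and tool version.

```python
import hashlib
from pathlib import Path


def sources_digest(files) -> str:
    """Order-independent SHA-256 over the names and contents of a block's sources."""
    h = hashlib.sha256()
    for path in sorted(Path(f) for f in files):
        data = path.read_bytes()
        h.update(str(path).encode())
        h.update(len(data).to_bytes(8, "big"))  # length prefix keeps files separate
        h.update(data)
    return h.hexdigest()


def needs_resynthesis(block: str, files, cache: dict) -> bool:
    """True if the block was never synthesized or its sources changed since."""
    return cache.get(block) != sources_digest(files)


def record_synthesis(block: str, files, cache: dict) -> None:
    """Remember the source digest after a successful OOC synthesis of `block`."""
    cache[block] = sources_digest(files)
```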

Solution 5: Simulation-First Development

The best way to speed up builds is to do fewer of them. I've shifted to doing more verification in simulation before touching hardware.
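One cheap way to enforce "simulate first" is to make the build refuse to start until the testbench passes. A Python sketch; both commands are placeholders for whatever runs your testbench (exiting non-zero on failure) and your Vivado build:

```python
import subprocess


def build_if_sim_passes(sim_cmd: list, build_cmd: list) -> bool:
    """Run the testbench first; launch the slow FPGA build only if it passes.

    Returns True if the build ran (and succeeded), False if simulation failed.
    """
    if subprocess.run(sim_cmd).returncode != 0:
        print("simulation failed; skipping the FPGA build")
        return False
    subprocess.run(build_cmd, check=True)  # raises if the build itself fails
    return True
```

For example, `build_if_sim_passes(["make", "sim"], ["vivado", "-mode", "batch", "-source", "scripts/build.tcl"])`, where `make sim` is a hypothetical stand-in for your simulation target.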

"Catching a bug in simulation costs minutes. Catching it after place-and-route costs hours. Catching it in hardware costs days."

My current flow:

Results

With all these optimizations combined, my typical iteration cycle went from 85 minutes to:

That's a 2-3x improvement for most iterations. More importantly, the remote build means I'm no longer blocked: I can keep working while builds run in the background.

What's Next

I'm currently exploring:

FPGA development doesn't have to be a waiting game. With the right infrastructure, you can iterate almost as quickly as software development. Almost.