Is it possible to develop high-performance tools in Python?

With every generation, chips have become more complex, and transistor counts have grown exponentially, as described by the famous Moore’s Law. This growth in complexity and size has led to a corresponding growth in the size of EDA tool databases (HDL files, simulation logs, waveform dumps, netlists, timing reports, GDSII, etc.) and in the compute power required to process them. Most EDA tools are both compute intensive and memory intensive, so they demand high performance in terms of both speed and capacity.

Given the very stringent performance requirements of EDA tools, is it a good idea to use Python as a mainstream development language for EDA tools? We will try to answer this question by sharing some experiences from a tool development project at Arrow Devices.

This tool has to handle input data sets that contain signal-level activity over hundreds of millions of clock cycles and deliver results in seconds. It has to examine this signal-level activity, recognize patterns, and categorize them. This process is quite compute intensive and has to be performed over large input data sets.

Python is a high-productivity language. By some estimates, writing the same logic in Python takes about one-sixth as many lines as in C. This is great for tool development, as it allows developers to add functionality at a breakneck pace. It also allows developers to prototype different approaches and select the one that works best.

On the other hand, Python does not have a reputation for high performance. Performance is the penalty developers pay for the productivity gains that a high-level language like Python offers.

“Premature optimization is the root of all evil,” said Donald Knuth, so initially the tool developers focused on functional correctness rather than performance. As the tool encountered real-life data sets, performance bottlenecks became visible. Multiple iterations of performance optimization followed.

The initial performance optimizations were guided by feedback from a performance profiler, which showed where the program was spending most of its execution time. This helped us identify performance issues and opportunities such as:

  • Sub-optimally written code
  • Poor data-structure choices
  • Better algorithm options
  • Memory vs. compute tradeoffs
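
As a rough illustration of this kind of profiler-driven workflow, here is a minimal sketch using Python’s standard cProfile and pstats modules. The functions count_rising_edges and analyze are hypothetical stand-ins invented for this example; they are not taken from the actual tool, and the post does not say which profiler was used.

```python
import cProfile
import io
import pstats


def count_rising_edges(samples):
    """Hypothetical hot spot: count 0 -> 1 transitions in a sampled signal."""
    edges = 0
    for previous, current in zip(samples, samples[1:]):
        if previous == 0 and current == 1:
            edges += 1
    return edges


def analyze(samples):
    """Hypothetical top-level routine that the profiler is pointed at."""
    return count_rising_edges(samples)


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


if __name__ == "__main__":
    samples = [0, 1] * 100_000
    edges, report = profile_call(analyze, samples)
    print(edges)
    print(report)
```

The report lists each function with its call count and cumulative time, which is usually enough to see which routine deserves attention first.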

After making these optimizations in the Python code, we saw performance improvements of 50-70% (Milestones 1 & 2 in the figures).
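
To make the memory vs. compute tradeoff concrete, here is a small sketch that caches the result of a classification routine so that repeated patterns are not re-analyzed. The classifier and its labels are purely hypothetical and chosen for illustration; the post does not describe the tool’s actual optimizations at this level of detail.

```python
from functools import lru_cache


def classify_uncached(pattern):
    """Hypothetical classifier: label a tuple of 0/1 samples."""
    transitions = sum(1 for a, b in zip(pattern, pattern[1:]) if a != b)
    if transitions == 0:
        return "idle"
    if transitions == len(pattern) - 1:
        return "clock"
    return "data"


# Spend memory to save compute: results are remembered per distinct pattern.
# Patterns must be hashable, hence tuples rather than lists.
classify_cached = lru_cache(maxsize=None)(classify_uncached)


if __name__ == "__main__":
    windows = [(0, 1, 0, 1), (0, 0, 0, 0), (0, 1, 0, 1), (0, 0, 1, 1)]
    print([classify_cached(w) for w in windows])
    print(classify_cached.cache_info())
```

Whether caching pays off depends on how often patterns repeat and how much memory the cache is allowed to use, which is exactly the kind of question a profiler helps answer.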

[Figures: PDA execution time, absolute and relative]

During this process, we also realized that some very critical routines could not be improved any further. We had hit the performance limitations of Python as a language at these critical points. We decided to re-implement these critical routines in C. This gave an additional performance improvement of 30-50% (Milestones 3 & 4 in the figures).

In this way, by rewriting small parts of the program (~500 lines) in C, we were able to benefit from both the high performance of C and the high productivity of Python.

For the next level of performance gain, we looked at parallelizing our tool’s core engine. This is where Python really excelled. To parallelize our tool, we had to make some serious architectural changes. Luckily, we had always been diligent and kept things neat at an architectural level. Using Python’s multiprocessing library, we could parallelize with ease. What would have taken 6-8 months in C took us less than 2 months and gave us another 20-50% performance improvement (depending on the input data set).
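
The general shape of this kind of parallelization can be sketched with multiprocessing.Pool. This is a minimal illustration under assumptions, not the tool’s actual architecture: the edge-counting workload, the function names, and the chunk size are all hypothetical. Chunks overlap by one sample so that a transition straddling a chunk boundary is still counted exactly once.

```python
from multiprocessing import Pool


def count_rising_edges(samples):
    """Hypothetical per-chunk work: count 0 -> 1 transitions."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a == 0 and b == 1)


def split_with_overlap(samples, chunk_size):
    """Split samples into chunks that share one boundary sample each."""
    last_start = max(len(samples) - 1, 1)
    return [samples[start:start + chunk_size + 1]
            for start in range(0, last_start, chunk_size)]


def count_parallel(samples, chunk_size=100_000, workers=4):
    """Count rising edges by farming chunks out to worker processes."""
    chunks = split_with_overlap(samples, chunk_size)
    with Pool(processes=workers) as pool:
        return sum(pool.map(count_rising_edges, chunks))


if __name__ == "__main__":
    samples = [0, 1, 1, 0] * 250_000
    # The parallel result should match the single-process result.
    assert count_parallel(samples) == count_rising_edges(samples)
    print(count_parallel(samples))
```

Because each chunk is sent to a separate process, the data is copied between processes; in practice the per-chunk work has to be heavy enough to outweigh that cost, which may be one reason the gain varies with the input data set.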

To summarize, Python’s rich high-level programming features and its C extension capabilities allowed us to achieve our performance goals by using techniques such as:

  • Code/memory profiling
  • Implementing small, performance-critical kernels in C
  • Parallelizing the core engine

Yes, it is possible to develop high-performance tools in Python!

Just in case you were wondering what the tool does: the PDA tool decodes high-level protocol packets/transactions from signal-level information. It allows users to visualize system/unit activity as lists of packets/transactions or as state machines. The tool also checks for protocol errors and provides various debug analysis and automation features. A short 3-minute video can be seen here: Enable your waveform viewer to decode protocols
