Profiling vs benchmarking

Measuring performance typically involves two different processes: profiling and benchmarking. Both are essential practices for improving the performance of an application. Depending on the situation, only one of the two may apply; in other cases they complement each other.

Profiling is the process of collecting metrics about a program, such as execution time, memory used, and the number of functions called, while the program is executing. This is useful when looking for parts of the program to optimize.
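As a minimal sketch of what collecting such metrics can look like, the following uses Python's built-in cProfile module rather than a PHP or JavaScript profiler, purely for illustration. The functions `slow_sum`, `fast_constant`, and `handle_request` are hypothetical stand-ins for application code.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Deliberately wasteful: builds a full list before summing it.
    return sum([i * i for i in range(n)])


def fast_constant():
    return 42


def handle_request():
    # Hypothetical stand-in for an application entry point.
    slow_sum(200_000)
    fast_constant()


def profile_call(func):
    """Run func under cProfile and return the text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the five most expensive entries.
    stats.sort_stats("cumulative").print_stats(5)
    return buffer.getvalue()


report = profile_call(handle_request)
print(report)
```

The report lists call counts and time per function, which is the kind of per-function detail that points to where optimization effort is likely to pay off.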

Benchmarking is the process of comparing two or more systems using a common measurement, so that you can identify which one performs better in that specific test. This is useful for comparing different software solutions, or for comparing different versions of the same software.
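A benchmark in this sense can be sketched with Python's built-in timeit module. The two functions below are hypothetical "old" and "new" versions of the same behavior; the common measurement is the time per call on identical input.

```python
import timeit


def concat_v1(items):
    # "Old version": repeated string concatenation in a loop.
    result = ""
    for item in items:
        result += item
    return result


def concat_v2(items):
    # "New version": a single join call.
    return "".join(items)


def benchmark(func, data, repeat=5, number=200):
    """Return the best observed time per call, in seconds."""
    timings = timeit.repeat(lambda: func(data), repeat=repeat, number=number)
    return min(timings) / number


data = [str(i) for i in range(1_000)]
results = {f.__name__: benchmark(f, data) for f in (concat_v1, concat_v2)}
for name, seconds in results.items():
    print(f"{name}: {seconds * 1e6:.1f} us per call")
```

Neither number means much on its own; the value comes from comparing the two under the same conditions.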

At a high level, the difference between profiling and benchmarking is best explained by the question each practice answers respectively:

  • Profiling helps you find out why a piece of software or subsystem performs the way it does.
  • Benchmarking helps you assess how well a piece of software or subsystem performs.

Since both profiling and benchmarking measure performance in some capacity, these two definitions of “why vs how well” may still not fully clarify their difference. Looking at example situations should help convey the difference between them:

  • Let’s say we want to improve the performance of WordPress core as much as possible. For that, we would need to do profiling: we inspect which functions take the longest to execute, so that we can focus on improving those that are the worst bottlenecks.
  • Let’s say we want to compare performance between two consecutive WordPress core releases, to see how the changes in the later release cycle affected performance. For that, we would need to do benchmarking: we put the two versions up against each other and compare, for example, the load time of certain content on one version with the load time of the same content on the other.

Essentially, benchmarking always involves some sort of comparison. If we benchmark a single WordPress core release and never compare the result with anything, it doesn’t provide much value: we can’t say whether the release performs well without a reference point. Profiling, however, is valuable even when looking at a single version; we don’t have to compare the profiling results to another set of results to get value out of them.

Another relationship between profiling and benchmarking is in their sequence: While in some situations, only one of the two may apply, generally speaking, profiling happens before taking action on code changes, while benchmarking happens afterwards. Consider the following situation:

  1. We want to improve the WordPress core server response time, so we profile WordPress core and find that a certain function accounts for 10% of the overall time. We therefore decide to focus on improving the performance of that function.
  2. We improve the logic in that function to be more efficient, hoping to decrease its overall negative performance impact.
  3. After completing the code change, we benchmark WordPress core by comparing server response time before the change with server response time after the change, to validate how much the change (hopefully) improved it.

Now here’s the tricky part: based on the above, you may argue that if we used profiling to identify the function, we could just profile again afterwards to make the comparison. That is technically possible, but it depends on what you are trying to achieve with the benchmark. Since profiling tools track the performance of every single function while the application is executing, they add overhead that slows down execution, which can skew the measured impact of a change. A better way to assess the overall performance impact of a change is to take benchmarks using tools that measure widely used industry metrics such as Time to First Byte (TTFB). On the flip side, benchmarking tools do not track the performance of every single function, only of specific pieces that need to be explicitly configured. They are therefore not suitable for identifying concrete performance bottlenecks in your software or subsystem. In general, use the right tool for the job.
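The overhead a profiler adds can be observed directly. The sketch below, again using Python's cProfile for illustration, times the same hypothetical workload once without and once with the profiler attached. The exact numbers depend on the machine, so no expected output is shown.

```python
import cProfile
import time


def work():
    # Many tiny function calls make per-call profiler overhead visible.
    def tiny(x):
        return x + 1

    total = 0
    for _ in range(200_000):
        total = tiny(total)
    return total


def timed(func):
    """Wall-clock time of one plain call, in seconds."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


def timed_with_profiler(func):
    """Wall-clock time of one call executed under cProfile, in seconds."""
    profiler = cProfile.Profile()
    start = time.perf_counter()
    profiler.runcall(func)
    return time.perf_counter() - start


plain = timed(work)
profiled = timed_with_profiler(work)
print(f"without profiler: {plain * 1000:.1f} ms")
print(f"with profiler:    {profiled * 1000:.1f} ms")
```

On most systems the profiled run is noticeably slower, and the slowdown is not spread evenly: code made of many small calls is penalized more than code dominated by a few long ones.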

Whether you are profiling or benchmarking, note that tools produce different numbers from one run to the next, depending on factors such as your machine’s capacity and the memory available. A good practice to account for this expected variability is to repeat the measurement several times, to validate that a specific observation was not just “random” noise caused by the environment.
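One way to handle this variability, sketched below under the assumption that a simple repeated wall-clock measurement is sufficient, is to collect several samples and report a median and spread instead of a single number. The function `task` is a hypothetical workload.

```python
import statistics
import time


def measure(func, runs=10):
    """Call func `runs` times and return the duration of each call in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Reduce raw samples to a few summary statistics."""
    return {
        "runs": len(samples),
        "median": statistics.median(samples),
        "mean": statistics.mean(samples),
        # The sample standard deviation needs at least two data points.
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
    }


def task():
    # Hypothetical workload: sort 50,000 integers in descending order.
    sorted(range(50_000), key=lambda value: -value)


summary = summarize(measure(task))
print(summary)
```

The median is less sensitive to a single slow outlier than the mean, and a large standard deviation relative to the median suggests the environment is too noisy to trust small differences.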

Common tools for profiling and benchmarking

Profiling and benchmarking tools exist in various shapes and forms. Profiling tools usually apply to a specific programming language; for example, different profiling tools are used for PHP, JavaScript, Ruby, etc. Benchmarking tools can be more generic, since what they measure is at a higher level and therefore less dependent on the programming language, though even benchmarking tools are sometimes specialized for specific environments. For both profiling and benchmarking, there are also WordPress-specific solutions available.

Common tools for profiling PHP are Xdebug and XHProf, while for profiling JavaScript a common approach is to use the profiler built into a browser, e.g. the Chrome Profiler or Firefox Profiler. While not traditionally a profiler, the WordPress plugin Query Monitor can also be used for certain WordPress-specific profiling.

A common benchmarking tool for PHP is PHPBench, which is an automated performance testing tool. For more ad-hoc benchmarking, any tool that uses the Server-Timing header can be helpful. The Performance Lab plugin provides a way to expose this information on a WordPress site. For higher-level benchmarking, tools like Apache Bench (ab) can benchmark performance using HTTP requests, and libraries like web-vitals can be used to benchmark the overall performance of a website holistically, including Core Web Vitals measurement.
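To illustrate how Server-Timing data can feed ad-hoc benchmarking, here is a simplified sketch that parses a header value into metric names and durations. The metric names in the example are invented for illustration and are not the ones any particular plugin emits. The parser assumes descriptions contain no commas or semicolons, which the full header syntax does allow inside quoted strings.

```python
def parse_server_timing(header_value):
    """Parse a Server-Timing header value into {name: {"dur": ..., "desc": ...}}.

    Simplified: splits on "," and ";" without handling those characters
    inside quoted descriptions.
    """
    metrics = {}
    for entry in header_value.split(","):
        parts = [part.strip() for part in entry.split(";")]
        name = parts[0]
        if not name:
            continue
        params = {}
        for part in parts[1:]:
            key, _, value = part.partition("=")
            params[key.strip().lower()] = value.strip().strip('"')
        duration = params.get("dur")
        metrics[name] = {
            # "dur" is optional; when present it is a duration in milliseconds.
            "dur": float(duration) if duration else None,
            "desc": params.get("desc"),
        }
    return metrics


# Hypothetical header value with made-up metric names.
example = 'db;dur=53, app;dur=47.2;desc="App logic", cache'
parsed = parse_server_timing(example)
for name, info in parsed.items():
    print(name, info)
```

Collecting these values across several requests, before and after a change, gives a lightweight comparison without attaching a profiler.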

Please see the following articles that describe profiling and benchmarking with a specific toolset in more depth:

Additionally, the following articles provide context on important nuances that can apply to both profiling and benchmarking:

Both profiling and benchmarking can be effective tools when doing performance testing. By using both techniques at the right times, you can more effectively improve and measure the performance of your code, and have a positive impact on the experience of your users.

Props @flixos90 @joemcgill @spacedmonkey for contributing to this article.
