The goal of this project is to collect software, numbers, and techniques to quickly estimate the expected performance of systems from first principles. For example, how quickly can you read 1 GB of memory? By composing these resources you should be able to answer interesting questions like: how much should you expect to pay in storage costs for logging for an application with 100,000 RPS?
The best introduction to this skill is through my talk at SRECON.
The best way to practise napkin math in the grand domain of computers is to work on your own problems. The second-best is to subscribe to this newsletter, where you'll get a problem every few weeks to practise on. It should only take you a few minutes to solve each one as your facility with these techniques improves.
The archive of problems to practise with is here. The solution to each problem is in the following newsletter.
Below are numbers rounded from runs on a bare-metal Intel Xeon E-2236 3.4GHz with 12 (virtual) cores.
Note 1: Some throughput and latency numbers don't line up. This is intentional, for ease of calculation.
Note 2: Take the numbers with a grain of salt. E.g. for I/O, fio is the state-of-the-art. I am continuously updating these numbers as I learn more to improve accuracy and as hardware improves.
Operation | Latency | Throughput | 1 MiB | 1 GiB |
---|---|---|---|---|
Sequential Memory R/W (64 bytes) | 0.5 ns | | | |
-- Single Thread, No SIMD | | 10 GiB/s | 100 μs | 100 ms |
-- Single Thread, SIMD | | 20 GiB/s | 50 μs | 50 ms |
-- Threaded, No SIMD | | 30 GiB/s | 35 μs | 35 ms |
-- Threaded, SIMD | | 35 GiB/s | 30 μs | 30 ms |
Hashing, not crypto-safe (64 bytes) | 25 ns | 2 GiB/s | 500 μs | 500 ms |
Random Memory R/W (64 bytes) | 50 ns | 1 GiB/s | 1 ms | 1s |
Fast Serialization [8] [9] † | N/A | 1 GiB/s | 1 ms | 1s |
Fast Deserialization [8] [9] † | N/A | 1 GiB/s | 1 ms | 1s |
System Call | 500 ns | N/A | N/A | N/A |
Hashing, crypto-safe (64 bytes) | 500 ns | 200 MiB/s | 10 ms | 10s |
Sequential SSD read (8 KiB) | 1 μs | 4 GiB/s | 200 μs | 200 ms |
Context Switch [1] [2] | 10 μs | N/A | N/A | N/A |
Sequential SSD write, -fsync (8KiB) | 10 μs | 1 GiB/s | 1 ms | 1s |
TCP Echo Server (32 KiB) | 10 μs | 4 GiB/s | 200 μs | 200 ms |
Decompression [11] | N/A | 1 GiB/s | 1 ms | 1s |
Compression [11] | N/A | 500 MiB/s | 2 ms | 2s |
Sequential SSD write, +fsync (8KiB) | 1 ms | 10 MiB/s | 100 ms | 2 min |
Sorting (64-bit integers) | N/A | 200 MiB/s | 5 ms | 5s |
Random SSD Read (8 KiB) | 100 μs | 70 MiB/s | 15 ms | 15s |
Serialization [8] [9] † | N/A | 100 MiB/s | 10 ms | 10s |
Deserialization [8] [9] † | N/A | 100 MiB/s | 10 ms | 10s |
Proxy: Envoy/ProxySQL/Nginx/HAProxy | 50 μs | ? | ? | ? |
Network within same region [6] | 250 μs | 100 MiB/s | 10 ms | 10s |
{MySQL, Memcached, Redis, ..} Query | 500 μs | ? | ? | ? |
Random HDD Read (8 KiB) | 10 ms | 0.7 MiB/s | 2 s | 30m |
Network between regions [6] | Varies | 25 MiB/s | 40 ms | 40s |
Network NA East <-> West | 60 ms | 25 MiB/s | 40 ms | 40s |
Network EU West <-> NA East | 80 ms | 25 MiB/s | 40 ms | 40s |
Network NA West <-> Singapore | 180 ms | 25 MiB/s | 40 ms | 40s |
Network EU West <-> Singapore | 160 ms | 25 MiB/s | 40 ms | 40s |
†: "Fast serialization/deserialization" typically means a simple wire protocol that just dumps bytes, or a very efficient environment. Standard serialization such as JSON is typically of the slower kind. We include both here because serialization/deserialization is a very broad topic with extremely different performance characteristics depending on data and implementation.
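To show how the table can be used, here is a minimal Python sketch that turns a throughput figure into a time estimate. The throughput values are copied from the table above; the dictionary keys and the `estimate_seconds` helper are names made up for this illustration and are not part of the benchmark suite.

```python
# Napkin estimate: time = size / throughput.
# Throughput figures are copied from the table above (bytes per second).
MIB = 1024 ** 2
GIB = 1024 ** 3

THROUGHPUT = {
    "sequential_memory_single_thread": 10 * GIB,
    "random_memory": 1 * GIB,
    "sequential_ssd_read": 4 * GIB,
    "random_ssd_read": 70 * MIB,
    "network_same_region": 100 * MIB,
}


def estimate_seconds(operation: str, size_bytes: int) -> float:
    """Return the estimated seconds needed to process size_bytes."""
    return size_bytes / THROUGHPUT[operation]


if __name__ == "__main__":
    # How long does each operation take for 1 GiB of data?
    for name in THROUGHPUT:
        millis = estimate_seconds(name, GIB) * 1000
        print(f"{name}: ~{millis:.0f} ms per GiB")
```

For example, reading 1 GiB of memory sequentially on a single thread without SIMD comes out at about 100 ms, matching the 1 GiB column. Some rows will differ slightly from the table because the table is rounded for ease of calculation.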
You can run this with `./run` to build with the right optimization levels. You won't get the right numbers when you're compiling in debug mode. You can help this project by adding new suites and filling out the blanks.
Note: I'm currently porting the benchmarks over to Criterion.rs, so some are in `bench/` now. You can run those by uncommenting the relevant line in `./run`.
I am aware of some inefficiencies in this suite. I intend to improve my skills in this area, in order to ensure the numbers are the upper-bound of performance you may be able to squeeze out in production. I find it highly unlikely any of them will be more than 2-3x off, which shouldn't be a problem for most users.
These are approximate numbers that should be consistent between cloud providers.
What | Amount | $ / Month | $ / Hour |
---|---|---|---|
CPU | 1 | $10 | $0.02 |
Memory | 1 GB | $1 | |
SSD | 1 GB | $0.1 | |
Disk | 1 GB | $0.01 | |
S3, GCS, .. | 1 GB | $0.01 | |
Network | 1 GB | $0.01 | |
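As a worked example, the logging question from the introduction (storage cost for an application at 100,000 RPS) can be sketched with the disk price from this table. The log size of 1 KB per request and the 30-day retention are assumptions picked for illustration, and `monthly_log_storage_cost` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 86_400
GB = 10 ** 9


def monthly_log_storage_cost(rps, bytes_per_request, retention_days,
                             dollars_per_gb_month):
    """Estimate the monthly cost of keeping retention_days of logs on disk."""
    bytes_per_day = rps * bytes_per_request * SECONDS_PER_DAY
    # At steady state we always hold retention_days worth of logs.
    stored_gb = bytes_per_day * retention_days / GB
    return stored_gb * dollars_per_gb_month


if __name__ == "__main__":
    cost = monthly_log_storage_cost(
        rps=100_000,
        bytes_per_request=1_000,    # assumption: ~1 KB of logs per request
        retention_days=30,          # assumption: keep one month of logs
        dollars_per_gb_month=0.01,  # "Disk" row in the table above
    )
    print(f"~${cost:,.0f} per month")
```

Under these assumptions the estimate is on the order of a few thousand dollars per month on disk, and roughly 10x that on SSD. Changing any of the assumed inputs scales the result linearly.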
The compression ratios below are drawn from a few sources [3] [4] [5]. Note that compression speeds (but generally not ratios) vary by an order of magnitude depending on the algorithm and the level of compression (which trades speed for compression).
I typically ballpark that each additional 1x in compression ratio decreases performance by 10x. E.g. we can get a 2x ratio on English Wikipedia at ~200 MiB/s, 3x at ~20 MiB/s, and 4x at ~1 MB/s.
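The rule of thumb above can be written as a small formula. This is only a sketch of the ballpark, anchored at the 2x ratio at ~200 MiB/s data point from the text. The function name and its parameters are made up for this illustration.

```python
def ballpark_compression_mib_s(ratio, base_ratio=2.0, base_mib_s=200.0):
    """Each additional 1x of compression ratio costs roughly 10x in speed."""
    return base_mib_s / 10 ** (ratio - base_ratio)


if __name__ == "__main__":
    for ratio in (2, 3, 4):
        speed = ballpark_compression_mib_s(ratio)
        seconds_per_gib = 1024 / speed
        print(f"{ratio}x: ~{speed:g} MiB/s, ~{seconds_per_gib:g} s per GiB")
```

At 4x the formula gives ~2 MiB/s, while the measured figure quoted above is ~1 MB/s. That is the same order of magnitude, which is all the ballpark is meant to deliver.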
What | Compression Ratio |
---|---|
HTML | 2-3x |
English | 2-4x |
Source Code | 2-4x |
Executables | 2-3x |
RPC | 5-10x |
SSL | -2% [10] |
Work with numbers in the form `c * 10^e`. Your goal is to get within an order of magnitude, and that's just `e`; `c` matters a lot less. Only worrying about single-digit coefficients and exponents makes it
much easier on a napkin (not to speak of all the zeros you avoid writing).

[1]: https://eli.thegreenplace.net/2018/measuring-context-switching-and-memory-overheads-for-linux-threads/
[2]: https://blog.tsunanet.net/2010/11/how-long-does-it-take-to-make-context.html
[3]: https://cran.r-project.org/web/packages/brotli/vignettes/brotli-2015-09-22.pdf
[4]: https://github.com/google/snappy
[5]: https://quixdb.github.io/squash-benchmark/
[6]: https://dl.acm.org/doi/10.1145/1879141.1879143
[7]: https://en.wikipedia.org/wiki/Hard_disk_drive_performance_characteristics#Seek_times_&_characteristics
[8]: https://github.com/simdjson/simdjson#performance-results
[9]: https://github.com/protocolbuffers/protobuf/blob/master/docs/performance.md
[10]: https://www.imperialviolet.org/2010/06/25/overclocking-ssl.html
[11]: https://github.com/inikep/lzbench
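The `c * 10^e` technique described above can be sketched in a few lines of Python: round each quantity to a single-digit coefficient and an exponent, then multiply coefficients and add exponents. The helper names `to_napkin` and `multiply` are hypothetical and exist only for this illustration.

```python
import math


def to_napkin(x):
    """Round a positive number to (c, e), meaning roughly c * 10^e."""
    e = math.floor(math.log10(x))
    c = round(x / 10 ** e)
    if c == 10:  # e.g. 9.6 rounds up to 10, which is 1 * 10^(e + 1)
        c, e = 1, e + 1
    return c, e


def multiply(a, b):
    """Multiply two (c, e) pairs: coefficients multiply, exponents add."""
    (c1, e1), (c2, e2) = a, b
    c, e = c1 * c2, e1 + e2
    if c >= 10:  # keep the coefficient to a single digit
        c, e = (c + 5) // 10, e + 1
    return c, e


if __name__ == "__main__":
    rps = to_napkin(100_000)             # 1 * 10^5 requests per second
    seconds_per_day = to_napkin(86_400)  # 9 * 10^4 seconds
    c, e = multiply(rps, seconds_per_day)
    print(f"~{c} * 10^{e} requests per day")
```

The exact answer here is 8.64 * 10^9 requests per day, so the single-digit estimate lands well within the order of magnitude you are aiming for.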
Use `toplev` to find the bottlenecks. This is particularly useful for the benchmarking suite we have here, to ensure the programs are correctly written (I have not taken them through this yet, but plan to).