Protobuf vs JSON: The Compression Test That Changed My Mind
A few days back I was debating with a friend about REST vs gRPC for client-server communication over the internet; I was in favour of gRPC. During the debate, I was quick to claim that protobuf is more optimised than JSON because it omits unwanted data. My friend replied that we don't transfer plain JSON over the network anymore: we gzip it before sending. He argued further that whatever size optimisation protobuf brings is lost once the data is gzipped. I couldn't push my point because I didn't have the data to back it up, so I decided to run an experiment comparing the sizes of gzipped JSON and gzipped protobuf.
Since I was in the process of learning Go from scratch, I decided to use it for this experiment.
Code
package main

import (
    "bytes"
    "compress/gzip"
    "encoding/json"
    "fmt"
    "math/rand/v2"
    "os"
    "strings"
    "text/tabwriter"

    "google.golang.org/protobuf/proto"
)

// usernamesFile points at a newline-separated list of dummy usernames
// (placeholder path; the real value is in the full source).
const usernamesFile = "usernames.txt"

var UserCounts = [...]int{1, 10, 100, 1000, 10000, 100000, 1000000}

func main() {
    names := readUsername(usernamesFile)

    writer := tabwriter.NewWriter(os.Stdout, 0, 0, 1, ' ', tabwriter.Debug)
    fmt.Fprintln(writer, "Users\tJSON Size\tGzipped JSON Size\tJSON Gzip size %\tProto Size\tGzipped Proto Size\tProto Gzip size %\tGzip Diff (JSON - Proto)")

    for _, num := range UserCounts {
        // Start from a fresh list for every count so the row label matches
        // the number of users actually serialised.
        usersList := UsersProto{Users: []*UserProto{}}
        for i := 0; i < num; i++ {
            name := names[rand.IntN(len(names))]
            user := UserProto{
                Name:  name,
                Age:   rand.Int32N(91) + 10, // random age in [10, 100]
                Email: fmt.Sprintf("%s@gmail.com", name),
            }
            usersList.Users = append(usersList.Users, &user)
        }

        jsonData, _ := json.Marshal(&usersList)
        protoData, _ := proto.Marshal(&usersList)

        gzippedJsonSize := gzipDataAndReturnSize(jsonData)
        gzippedProtoSize := gzipDataAndReturnSize(protoData)

        // Gzipped size as a percentage of the uncompressed size.
        jsonRatio := float64(gzippedJsonSize) / float64(len(jsonData)) * 100
        protoRatio := float64(gzippedProtoSize) / float64(len(protoData)) * 100
        diff := gzippedJsonSize - gzippedProtoSize

        fmt.Fprintf(writer, "%d\t%s\t%s\t%.0f%%\t%s\t%s\t%.0f%%\t%s\n",
            num,
            humanReadableSize(len(jsonData)),
            humanReadableSize(gzippedJsonSize),
            jsonRatio,
            humanReadableSize(len(protoData)),
            humanReadableSize(gzippedProtoSize),
            protoRatio,
            humanReadableSize(diff),
        )
    }
    writer.Flush()
}
// gzipDataAndReturnSize gzips the input data and returns the length of the
// compressed output.
func gzipDataAndReturnSize(data []byte) int {
    var buf bytes.Buffer
    gw := gzip.NewWriter(&buf)
    gw.Write(data)
    gw.Close() // Close flushes any buffered data and writes the gzip trailer.
    return buf.Len()
}
// humanReadableSize returns a human-readable size string.
// e.g. 1024 -> 1.0KB
// e.g. 1048576 -> 1.0MB
// (Simplified stand-in for the helper omitted from the original post.)
func humanReadableSize(size int) string {
    sign := ""
    if size < 0 {
        sign, size = "-", -size
    }
    if size < 1024 {
        return fmt.Sprintf("%s%dB", sign, size)
    }
    div, exp := 1024, 0
    for n := size / 1024; n >= 1024; n /= 1024 {
        div *= 1024
        exp++
    }
    return fmt.Sprintf("%s%.1f%cB", sign, float64(size)/float64(div), "KMGTPE"[exp])
}

// readUsername returns a slice of dummy usernames, one per line of the given
// file. (Simplified stand-in for the helper omitted from the original post.)
func readUsername(fileName string) []string {
    data, err := os.ReadFile(fileName)
    if err != nil {
        panic(err)
    }
    return strings.Split(strings.TrimSpace(string(data)), "\n")
}
I have intentionally kept only the important parts of the code and trimmed the less important bits. The full code can be found here.
In the above program, we create a list of user objects with the fields name, age, and email. We pick a username at random and use it for both the name and the email address. Once the data is ready, we marshal it to JSON and to proto, gzip both, and print the resulting sizes.
We repeat this for user counts from 1 to 1,000,000, growing by a factor of 10 each step.
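The post doesn't show the schema, but judging from the generated UserProto and UsersProto types used above, the .proto definition would look roughly like this (field names, numbers and the go_package path are my reconstruction, not the original file):

syntax = "proto3";

option go_package = "example.com/protovsjson/pb"; // placeholder module path

message UserProto {
  string name  = 1;
  int32  age   = 2;
  string email = 3;
}

message UsersProto {
  repeated UserProto users = 1;
}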
Result
Users | JSON Size | Gzipped JSON Size | JSON Gzip size % | Proto Size | Gzipped Proto Size | Proto Gzip size % | Gzip Diff (JSON - Proto) |
1 | 75B | 85B | 113% | 40B | 55B | 138% | 30B |
10 | 673B | 232B | 34% | 398B | 216B | 54% | 16B |
100 | 6.3KB | 1.4KB | 22% | 3.7KB | 1.4KB | 37% | 12B |
1000 | 62.5KB | 11.8KB | 19% | 36.5KB | 12.3KB | 34% | 542B |
10000 | 624.9KB | 115.7KB | 19% | 364.3KB | 121.7KB | 33% | -6.0KB |
100000 | 6.1MB | 1.1MB | 18% | 3.6MB | 1.2MB | 33% | -60.7KB |
1000000 | 61.1MB | 11.3MB | 18% | 35.7MB | 11.9MB | 33% | -603.8KB |
Observations
- When we have really small data, the gzipped size increases instead of decreasing. Gzip adds metadata (a fixed header and trailer) that the decompressor needs, and for tiny payloads this overhead outweighs whatever the compression saves; the first sketch after this list demonstrates it.
- Gzip compresses JSON far more effectively than proto: the compressed-to-original ratio is consistently better for JSON. E.g. for 10 users the JSON shrinks from 673B to 232B, i.e. to 34% of the original, whereas the gzipped proto is still 54% of its original size.
- As the data grows, the gzipped sizes of JSON and proto settle close to each other; from 10,000 users onwards the gzipped proto is even slightly larger than the gzipped JSON (note the negative values in the last column).
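To see the first observation in isolation, here is a quick standalone sketch (not part of the experiment above; the payload is illustrative) that gzips a tiny piece of JSON:

package main

import (
    "bytes"
    "compress/gzip"
    "fmt"
)

func main() {
    payload := []byte(`{"Name":"alice","Age":42}`) // 25 bytes of JSON
    var buf bytes.Buffer
    gw := gzip.NewWriter(&buf)
    gw.Write(payload)
    gw.Close()
    // gzip always writes a 10-byte header and an 8-byte trailer
    // (CRC32 + input length), so a tiny input comes out larger than it went in.
    fmt.Printf("raw: %dB, gzipped: %dB\n", len(payload), buf.Len())
}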
The experiment was conducted on a list of users, so the keys "Name", "Age" and "Email" are repeated in the JSON output once per user. That repetition is exactly the kind of redundancy gzip eliminates, which is likely flattering JSON here. Real-world data, with less key repetition, could produce different results; the sketch below illustrates how strongly repetition drives the compression ratio.
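As a rough illustration (my own toy payloads, not the experiment's data), compare gzipping one thousand identical records against one thousand records that share keys but vary in their values:

package main

import (
    "bytes"
    "compress/gzip"
    "fmt"
)

// gzippedLen returns the gzipped length of data, like the
// gzipDataAndReturnSize helper in the experiment.
func gzippedLen(data []byte) int {
    var buf bytes.Buffer
    gw := gzip.NewWriter(&buf)
    gw.Write(data)
    gw.Close()
    return buf.Len()
}

func main() {
    var repeated, varied bytes.Buffer
    for i := 0; i < 1000; i++ {
        // Identical record every time: keys and values both repeat.
        repeated.WriteString(`{"Name":"alice","Age":42},`)
        // Same keys, but the values differ from record to record.
        fmt.Fprintf(&varied, `{"Name":"user%06d","Age":%d},`, i, 10+i%91)
    }
    fmt.Printf("repeated: %dB -> %dB gzipped\n", repeated.Len(), gzippedLen(repeated.Bytes()))
    fmt.Printf("varied:   %dB -> %dB gzipped\n", varied.Len(), gzippedLen(varied.Bytes()))
}

The identical records collapse to a tiny fraction of their size, while the varied ones compress noticeably less even though the keys still repeat in both; that gap is the effect described above.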
Conclusion
When we talk about REST vs gRPC for client-server communication, where the client could be a mobile device facing latency issues, the payload-size argument for gRPC doesn't hold up on its own: gzip has become standard practice for data sent over the internet, and once both payloads are gzipped the difference largely disappears. gRPC still offers several other advantages over REST, but that is a topic for another day.
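Incidentally, gzip is just as available on the gRPC side. In gRPC-Go, the built-in gzip compressor can be enabled per connection; a minimal sketch, assuming a local server and your own generated stubs (the address is a placeholder):

package main

import (
    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
    _ "google.golang.org/grpc/encoding/gzip" // registers the "gzip" compressor
)

func main() {
    // Every RPC on this connection gzips its request messages; the server
    // decompresses them automatically as long as it also imports the
    // gzip compressor package.
    conn, err := grpc.Dial("localhost:50051", // placeholder address
        grpc.WithTransportCredentials(insecure.NewCredentials()),
        grpc.WithDefaultCallOptions(grpc.UseCompressor("gzip")),
    )
    if err != nil {
        panic(err)
    }
    defer conn.Close()
    // Hand conn to your generated client constructor as usual.
}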