Those who had the privilege of reading my frustration chronicles on intra-microservice communication will easily recall me pointing my finger at the Java Platform SE folks for not shipping a proper HTTP client. There my fury went so far as to call it one of the closest candidates for the billion dollar mistake. Unfortunately, screaming out loud in a blog post does not bring much relief, because it took less than a month for me to find myself in precisely the same technical mud pot. Indeed, a couple of months after I wrote that post, I was chasing yet another performance problem in one of our aggregation services. In essence, each incoming HTTP request is served by aggregating multiple sources, collected again over HTTP. This simple fairy-tale architecture gets slaughtered in production by 200 Tomcat threads intertwined with Rx computation and I/O threads resting in the shade of a dozen other thread pools dedicated to so-called asynchronous HTTP clients for the aggregated remote services. And I saved the best for last: there were leaking TIME_WAIT sockets.
All of a sudden the question occurred to me like the roar of boulders rolling down a steep hill in the far distance: What is the lowest level at which I can plumb a networking application in Java without dealing with protocol intricacies? Put another way, is there a foundational abstraction within reach that exposes both the lowest (a channel with I/O streams) and the highest (HTTP headers and body) levels? I rode both the Java OIO and NIO (that is, old- and new-I/O) horses in the past and fell off enough times to learn the hard way that they are definitely not feasible options in this case. The first attempt at a cure via Google introduces you to Netty. If you dig long enough, you also stumble upon Apache Mina. Netty is popular enough in the Java world that you are highly likely an indirect consumer of it, unless you are already using it directly. I was aware of its presence, like dark matter, in every single network application I wrote, though I had never considered using it directly. Checking the Netty website after dealing with the crippled network applications at hand sparked an enlightenment within me: Hey! I can use this to implement some sort of RPC mechanism carrying Protocol Buffers in HTTP/2 request payloads! Though further investigation sweeps the dust from the footsteps of the giants who had already followed the same path: Google (gRPC), Facebook (Nifty), Twitter (Finagle), etc. This finding, while crushing my initial excitement, later gave way to the confidence of being confirmed that I was on the right path.
I have always heard good things about both Netty and its community. I had already been quietly following the presentations and Twitter updates of Norman Maurer, the Netty shepherd as of this date. Though what triggered me to dive deep into Netty was the following tweet:
Challenge accepted! First step is done. Next: Cover to cover study. pic.twitter.com/Gnfhbi6Ko0
— Volkan Yazıcı (@yazicivo) January 19, 2018
Norman Maurer has always been kind and encouraging to new contributors. So my plan is to turn this into a mutually beneficial relationship: I can contribute and get tutored while doing so.
The book (2016 press date) is definitely a must-read for anyone planning to use Netty. It lays out Netty fundamentals like channels, handlers, encoders, etc. in detail. That being said, I got the impression that the content is mostly curated for beginners. For instance, dozens of pages (and an appendix) are spent (wasted?) on a Maven crash course, not to mention the space wasted by the shared Maven command outputs. This felt a little disappointing considering the existing audience of Netty in general. Who would really read a book about Netty? You have probably had your time with OIO/NIO primitives or the client/server frameworks in the market. You certainly don't want to use yet another library that promises to make all your problems disappear. So you cannot be qualified as a novice in this battle anymore, and you are indeed in search of a scalpel rather than a Swiss Army knife. Nevertheless, I still think the book eventually manages to find a balance between going too deep and just scratching the surface.
I really enjoyed the presented historical perspective on the development of the Java platform's networking facilities and of Netty itself. I found it quite valuable and wanted to read more and more!
Emphasis on ByteBuf
was really handy. Later on I learnt that there are
people using Netty just for its sound ByteBuf
implementation.
Almost every single conscious decision within the shared code snippets is explained in detail. While this felt like quite some noise in the beginning, later on it turned out to be really helpful – especially while manually updating ByteBuf reference counts.
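For the uninitiated, the manual bookkeeping I am referring to looks roughly like this (my own minimal sketch, not a listing from the book):

ByteBuf buf = ctx.alloc().buffer();  // reference count starts at 1
buf.retain();                        // bump it to 2 before handing the buffer to another party
// ... both parties use the buffer ...
buf.release();                       // back to 1
buf.release();                       // 0: the buffer is deallocated or returned to the pool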
Presented case studies were quite interesting to read and inspiring too.
I had big hopes to read about how to implement an HTTP client with connection pool support. I particularly find this feature indispensable in a networking application, and it is often not used wisely. Though there wasn't a single section mentioning connection pooling of any sort.
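For what it is worth, Netty itself ships an io.netty.channel.pool package; a section built around something like the following sketch (my own, with a made-up remote address and pool size) is exactly what I was hoping to find:

Bootstrap bootstrap = new Bootstrap()
        .group(new NioEventLoopGroup())
        .channel(NioSocketChannel.class)
        .remoteAddress("example.com", 80);  // hypothetical remote service
ChannelPool pool = new FixedChannelPool(bootstrap, new AbstractChannelPoolHandler() {
    @Override
    public void channelCreated(Channel ch) {
        // Called once per physical connection: install the HTTP client handlers.
        ch.pipeline().addLast(new HttpClientCodec(), new HttpObjectAggregator(1 << 20));
    }
}, 50);  // cap at 50 pooled connections
pool.acquire().addListener((FutureListener<Channel>) f -> {
    if (f.isSuccess()) {
        Channel ch = f.getNow();
        // ... write the request, await the response ...
        pool.release(ch);  // hand the connection back to the pool
    }
});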
As someone who had studied Norman Maurer's presentations, I was expecting to see waaaay more practical tips about GC considerations, updating socket options (TCP_NODELAY, SO_SNDBUF, SO_RCVBUF, SO_BACKLOG, etc.), mitigating TIME_WAIT socket problems, and Netty best practices. Maybe adding this content would have doubled the size of the book, though I still think a book on Netty is incomplete without such practical tips.
Many inbound requests trigger multiple I/O operations in a typical network application. It is crucial not to let these operations block a running thread, something Netty is well aware of, hence it ships a fully-fledged EventExecutor abstraction. This crucial detail is mentioned in many places within the book, though none gives a concrete example. Such a common need could have been demonstrated with an example.
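Something along these lines would have done the job (a sketch of my own; BlockingLookupHandler stands for any handler that performs blocking calls):

EventExecutorGroup blockingGroup = new DefaultEventExecutorGroup(16);  // dedicated threads for blocking work
ServerBootstrap b = new ServerBootstrap()
        .group(new NioEventLoopGroup(1), new NioEventLoopGroup())
        .channel(NioServerSocketChannel.class)
        .childHandler(new ChannelInitializer<SocketChannel>() {
            @Override
            protected void initChannel(SocketChannel ch) {
                ch.pipeline().addLast(new HttpServerCodec());  // stays on the channel's event loop
                ch.pipeline().addLast(blockingGroup, "blocking", new BlockingLookupHandler());  // runs off the event loop
            }
        });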
I always take notes while reading a book. Be it a grammar mistake, a code typo, incorrect or ambiguous information, thought-provoking know-how, a practical tip, etc. You name it. Here I will share them in page order. I will further classify my notes into four groups: mistakes, improvements, questions, and other.
[p19, Listing 2.1] Why did we use
ctx.writeAndFlush(Unpooled.EMPTY_BUFFER)
rather than just calling
ctx.flush()
?
[p21, Listing 2.2] Typo in
throws Exceptio3n
.
[p49, Section 4.3.1] The listed items

- A new Channel was accepted and is ready.
- A Channel connection …

are an identical repetition of Table 4.3.
[p60] CompositeByteBuf has the following remark:

Note that Netty optimizes socket I/O operations that employ CompositeByteBuf, eliminating whenever possible the performance and memory usage penalties that are incurred with JDK's buffer implementation. This optimization takes place in Netty's core code and is therefore not exposed, but you should be aware of its impact.
Interesting. Good to know. I should be aware of its impact. But how can I measure and relate this impact? Maybe I am just nitpicking, though I would love to hear a little bit more.
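For context, the zero-copy composition the remark alludes to is used roughly like this (a minimal sketch of my own, assuming Netty 4.1's addComponents(boolean, ...) variant and pre-existing headerBuf and bodyBuf buffers):

CompositeByteBuf message = Unpooled.compositeBuffer();
message.addComponents(true, headerBuf, bodyBuf);  // both buffers are referenced, not copied into one array
channel.writeAndFlush(message);  // the socket write can gather the components directly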
[p77, Table 6.3]
channelWritabilityChanged()
method of ChannelInboundHandler
… How come
an inbound channel can have a writability notion? I would have expected an
inbound channel to be just readable.
[p78, Section 6.1.4] Starts with a really intriguing paragraph:

A powerful capability of ChannelOutboundHandler is to defer an operation or event on demand, which allows for sophisticated approaches to request handling. If writing to the remote peer is suspended, for example, you can defer flush operations and resume them later.

Though it ends there. No more explanation, not even a single example. A total mystery.
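My best guess at what the authors might have meant, as a sketch of my own (not from the book):

public class DeferredFlushHandler extends ChannelOutboundHandlerAdapter {
    @Override
    public void flush(ChannelHandlerContext ctx) {
        // Defer the actual flush: propagate it to the next outbound handler a bit later,
        // e.g. to coalesce many small writes. (A real handler would track pending flushes.)
        ctx.executor().schedule(ctx::flush, 100, TimeUnit.MILLISECONDS);
    }
}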
[p79, Table 6.4] read()
method of a
ChannelOutboundHandler
… Similar to ChannelInboundHandler#channelWritabilityChanged()
,
how come an outbound channel can have a read method? What are we reading
that is supposed to be already originating from us and destined to a remote
peer?
[p79, Section 6.1.4] It goes as follows:

ChannelPromise vs. ChannelFuture
Most of the methods in ChannelOutboundHandler take a ChannelPromise argument to be notified when the operation completes. ChannelPromise is a subinterface of ChannelFuture that defines the writable methods, such as setSuccess() or setFailure(), thus making ChannelFuture immutable.
Ok, but why? I know the difference between a Future
and a Promise
, though
I still cannot see the necessity for outbound handlers to employ Promise
instead of a Future
.
[p84, Listing 6.5] While adding handlers to a pipeline, what happens in the case of a name conflict?
[p84] A remark is dropped on the
ChannelHandler
execution and blocking subject. Just in time! Though
it misses a demonstration.
[p86, Listing 6.9] Again a read()
method for the outbound operations of a ChannelPipeline
. I am really
puzzled on the notion of reading from an outbound channel.
[p94, Listing 6.13] What happens when
a ChannelFuture
completes before adding a listener to it?
[p95, Section 6.5] Last paragraph goes like this:
The next chapter will focus on Netty’s codec abstraction, which makes writing protocol encoders and decoders much easier than using the underlying
ChannelHandler
implementations directly.
Though next chapter focuses on EventLoop
and threading model.
[p102, Listing 7.3] Speaking of
scheduling Runnable
s to a channel’s event loop, what if channel gets
closed before triggering the scheduled tasks?
[p103] Page starts with the following last paragraph:
These examples illustrate the performance gain that can be achieved by taking advantage of Netty’s scheduling capabilities.
Really? Netty's scheduling capabilities are shown by using each function in isolation, though I still don't have a clue how these capabilities can be put to use for a performance gain. This is a common problem throughout the book: an innocent flashy statement hangs in the air, waiting for a demonstration that shares some insight distilled from experience.
[p104, Figure 7.4] The caption of figure is as follows:
EventLoop
allocation for non-blocking transports (such as NIO and AIO)
AIO? Looks like a typo.
[p107] Chapter starts with the following opening paragraph:
Having studied
ChannelPipeline
s,ChannelHandler
s, and codec classes in depth, …
Nope. Nothing has been mentioned so far about codec classes.
[p112] It is explained that, in the context of Bootstrap, bind() and connect() can throw an IllegalStateException if some combination of the group(), channel(), channelFactory(), and/or handler() calls is missing. Similarly, calling attr() after bind() has no effect. I personally find such abstractions poorly designed. I would rather have used the staged builder pattern and avoided such intricacies at compile time (see the sketch below).
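A staged builder encodes the mandatory call order in the types; a hypothetical API sketch of my own (not Netty's) to make the point concrete:

interface GroupStage   { ChannelStage group(EventLoopGroup group); }
interface ChannelStage { HandlerStage channel(Class<? extends Channel> channelType); }
interface HandlerStage { BindStage handler(ChannelHandler handler); }
interface BindStage {
    BindStage attr(AttributeKey<Object> key, Object value);  // only reachable before bind()
    ChannelFuture bind(int port);  // only reachable once group, channel, and handler are set
}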
[p117, Listing 8.6] The 2nd argument to
Bootstrap#group()
looks like a typo.
[p120] Check this end of chapter summary out:

In this chapter you learned how to bootstrap Netty server and client applications, including those that use connectionless protocols. We covered a number of special cases, including bootstrapping client channels in server applications and using a ChannelInitializer to handle the installation of multiple ChannelHandlers during bootstrapping. You saw how to specify configuration options on channels and how to attach information to a channel using attributes. Finally, you learned how to shut down an application gracefully to release all resources in an orderly fashion.

In the next chapter we'll examine the tools Netty provides to help you test your ChannelHandler implementations.
I have always found such summaries useless, since they are a repetition of the chapter introduction, and hence a waste of space. Rather, just give the crucial takeaways, preferably in a digestible-at-a-glance form. For instance, use EventLoopGroup.shutdownGracefully(), etc.
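Something as small as the following sketch would have been a more valuable takeaway to me than the whole summary paragraph:

EventLoopGroup group = new NioEventLoopGroup();
try {
    // ... bootstrap, bind or connect, do the actual work ...
} finally {
    group.shutdownGracefully().syncUninterruptibly();  // closes all channels and stops the event loop threads
}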
[p121] I suppose the Unit Testing chapter used to come after Codecs in previous prints, and the authors have moved it to an earlier stage to establish a certain coherence in the introductory chapters. Though reading Codecs reveals that there is close to 70% overlap in content, which feels like a poorly structured flow. I see the value in the authors' attempt, though there is quite some room for improvement via tuning the breakdown of chapters.
[p124, Section 9.2.1]
ByteToMessageDecoder
is used before explained. (See my remark above.)
[p127] The following bullets

Here are the steps executed in the code:
- Writes negative 4-byte integers to a new ByteBuf.
- Creates an EmbeddedChannel …

are a repetition of the descriptions available in Listing 9.4.
[p138, Listing 10.3] Comma missing
after Integer msg
.
[p141] Why do the MessageToMessage{Encoder,Decoder} classes not have an output type, but just Object? How do you ensure type safety while chaining them along a pipeline?
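To make the concern concrete with a sketch of my own (mirroring the book's integer-to-string example): the output only ever lands in a List<Object>, so nothing in the types ties it to the next handler's expected input.

public class IntegerToStringDecoder extends MessageToMessageDecoder<Integer> {
    @Override
    protected void decode(ChannelHandlerContext ctx, Integer msg, List<Object> out) {
        out.add(String.valueOf(msg));  // the String output type is invisible to the compiler
    }
}
// pipeline.addLast(new IntegerToStringDecoder(), new HandlerExpectingLongs());  // compiles fine, fails only at runtime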
[p142, Listing 10.6] Comma missing
after Integer msg
.
[p145, Listing 10.7] Constructor of
MyWebSocketFrame
is named incorrectly.
[p151, Section 11.2] I think Building Netty HTTP/HTTPS applications deserves its own chapter. And a very important subject is missing: connection pooling.
[p157, Listing 11.6] While building the WebSocket pipeline, which handler addresses ping/pong frames?
[p159, Table 11.4] The first sentence
in the description of WriteTimeoutHandler
is identical to the one in
ReadTimeoutHandler
. Presumably a copy-paste side effect.
[p171] Check out the first paragraph:
WebSocket is an advanced network protocol that has been developed to improve the performance and responsiveness of web applications. We’ll explore Netty’s support for each of them by writing a sample application.
Each of them? Who are they?
[p177] The call to retain()
is
needed because after channelRead()
… → The call to retain()
is needed because after channelRead0()
…
[p178, Table 12.1] Identical to Table 11.3.
[p181, Figure 12.3]
ChunkedWriteHandler
is missing.
[p183, Listing 12.4] There the shutdown
of the chat server is realized via Runtime.getRuntime().addShutdownHook()
.
Is this a recommended practice?
[p189] Figure 14.1 presents a high-level view of the … → Figure 13.1
[p189] Listing 14.1 shows the details of this simple POJO. → Listing 13.1
[p190, Listing 13.1] received
field is not used at all. Could be removed to increase clarity.
Interestingly, the field is not even encoded.
[p191, Table 13.1]
extendsDefaultAddressedEnvelope
→ extends DefaultAddressedEnvelope
[p191] Figure 14.2 shows the broadcasting of three log … → Figure 13.2
[p192] Figure 14.3 represents a high-level view of the … → Figure 13.3
[p192, Listing 13.2] A byte[] file
and byte[] msg
pair is encoded as follows:
buf.writeBytes(file);
buf.writeBytes(LogEvent.SEPARATOR);
buf.writeBytes(msg);
Later on each entry is read back by splitting at LogEvent.SEPARATOR
. What
if file
contains LogEvent.SEPARATOR
? I think this is a bad encoding
practice. I would rather do:
buf.writeInt(file.length);
buf.writeBytes(file);
buf.writeInt(msg.length);
buf.writeBytes(msg);
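The receiving side would then honor the length prefixes instead of splitting at a separator – roughly:

byte[] file = new byte[buf.readInt()];  // read the length prefix, then exactly that many bytes
buf.readBytes(file);
byte[] msg = new byte[buf.readInt()];
buf.readBytes(msg);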
[p194, Listing 13.3] Is there a
constant for 255.255.255.255
broadcast address?
[p195] Figure 14.4 depicts the ChannelPipeline of the LogEventMonitor … → Figure 13.4
[p196] Check this out:
The LogEventHandler prints the LogEvents in an easy-to-read format that consists of the following:
- The received timestamp in milliseconds.
Really? I did not know epoch timestamps were easy-to-read. Maybe for some definition of easy-to-read.
[p195] Now we need to install our
handlers in the ChannelPipeline
, as seen in figure 14.4. →
Figure 13.4
[p205] Approach A, optimistic and apparently simpler (figure 15.1) → figure 14.1
[p206] Half of the page is spent justifying Droplr's preference of approach B (safe and complex) over approach A (optimistic and simpler). Call me an idiot, but I am not sold on the argument that the simpler approach is less safe.
[p207] Type of pipelineFactory
is missing.
[p210] There is a bullet for tuning JVM. This on its own could have been a really interesting chapter of this book.
[p213] Firebase is indeed implementing TCP-over-long-polling. I wonder if there exist any Java libraries that implement user-level TCP over a certain channel abstraction.
[p214] Figure 15.4 demonstrates how the Firebase long-polling … → Figure 14.4
[p215] Figure 15.5 illustrates how Netty lets Firebase respond to … → Figure 14.5
[p216] … can start as soon as byes come in off the wire. → bytes
[p217, Listing 14.3] Last parenthesis is missing:
rxBytes += buf.readableBytes(
tryFlush(ctx)
[p217, Listing 14.3] 70% of the intro was about implementing flow control over long polling, though the shared code snippet is about something else entirely and almost irrelevant.
[p223] In referring to figure 15.1, note that two paths … → figure 14.6
[p229] This request/execution flow is shown in figure 16.1. → figure 15.1
[p230] Figure 16.2 shows how pipelined requests are handled … → Figure 15.2
[p230] …, in the required order. See figure 16.3. → figure 15.3
[p232] That simple flow (show in figure 16.4) works… → figure 15.4
[p232] The client call is dispatched to the Swift library, … What is Swift library? Was not explained anywhere.
[p232] This is the flow shown in figure 16.5. → figure 15.5
[p234] This is a really interesting piece:
Before Nifty, many of our major Java services at Facebook used an older, custom NIO-based Thrift server implementation that works similarly to Nifty. That implementation is an older codebase that had more time to mature, but because its asynchronous I/O handling code was built from scratch, and because Nifty is built on the solid foundation of Netty’s asynchronous I/O framework, it has had many fewer problems.
One of our custom message queuing services had been built using the older framework, and it started to suffer from a kind of socket leak. A lot of connections were sitting around in CLOSE_WAIT state, meaning the server had received a notification that the client had closed the socket, but the server never reciprocated by making its own call to close the socket. This left the sockets in a kind of CLOSE_WAIT limbo.

The problem happened very slowly; across the entire pool of machines handling this service, there might be millions of requests per second, but usually only one socket on one server would enter this state in an hour. It wasn't an urgent issue because it took a long time before a server needed a restart at that rate, but it also complicated tracking down the cause. Extensive digging through the code didn't help much either: initially several places looked suspicious, but everything ultimately checked out and we didn't locate the problem.
[p238] Figure 16.6 shows the relationship between … → figure 15.6
[p239, Listing 15.2] All presented
Scala code in this chapter is over-complicated and the complexity does not
serve any purpose except wasting space and increasing cognitive load. For
instance, why does ChannelConnector
extend
(SocketAddress => Future[Transport[In, Out]])
rather than just being a
simple method?
[p239] This factory is provided a
ChannelPipelineFactory
, which is … What is this factory?
In summary, Netty in Action is a book that I would recommend to everyone who wants to learn more about Netty in order to use it in their applications. Almost the entire set of fundamental Netty abstractions is covered in detail. The content is a blessing for novice users in the networking domain, though this in return might make the book uninteresting for people who have already gotten their hands pretty dirty with the networking facilities available in the Java platform. That being said, the presented historical perspective and the shared case studies are still pretty attractive even for the most advanced users.
I don't know much about the 2nd author of the book, Marvin Allen Wolfthal. Though the 1st author, Norman Maurer, is a pretty well-known figure in the F/OSS ecosystem. If he manages to transfer more juice from his experience and presentations into the book, I will definitely buy the 2nd print of the book too!