Akka Content-based Substreaming

In a previous blog post, a simple text mining application was developed using Akka Streams. In that application, the graph-building create() DSL method was used to build the stream topology that consists of the routing logic of the various stream components.

This time, we’re going to try something a little different. Rather than using the create() graph-builder, we’ll directly use some fan-out/fan-in functions to process stream data. In particular, we’ll process a messaging stream and dynamically demultiplex the stream into substreams based on the group the messages belong to.

For illustration purpose, the messaging element in the stream source is modeled as a case class with group and content info. The substreams dynamically split by message group will go into individual file sinks, each of which is wrapped along with the file name in another case class.

Next, we create a map with message group as key and the corresponding sink (which has the actual file path) as value:

Demultiplexing via Akka Streams groupBy()

We then use Akka Streams groupBy() to split the stream by message group into substreams:

Note that after applying groupBy() (see method signature), the split substreams are processed in parallel for each message group using mapAsync() which transforms the stream by applying a specified function to each of the elements. Since our goal is to create the individual files, the final ‘mergeSubstreams’ is just for combining the substreams to be executed with an unused sink.

Putting all the above pieces altogether:

Merging stream sources with flatMapConcat

Conversely, given the split files, we can merge them back into a single stream using flatMapConcat. The word ‘merge’ is being used loosely here, as flatMapConcat (see method signature) actually consumes the individual sources one after another by flattening the source stream elements using concatenation.

In case the files are large and processing them in measured chunks for memory reservation is preferred:

Line-feed (i.e. “\n”) is being used as the delimiter for each frame here, but it can be set to anything else.

2 thoughts on “Akka Content-based Substreaming

  1. bill

    I high appreciate this post. It’s hard to find the good from the bad sometimes, but I think you’ve nailed it! would you mind updating your blog with more information?

    Reply
  2. Pingback: Merging Akka Streams With MergeLatest | Genuine Blog

Leave a Reply

Your email address will not be published. Required fields are marked *