Spray and Akka HTTP
We're living in interesting times: Spray is no longer actively developed, while Akka HTTP is not yet ready for prime time. It's a sort of limbo for people who need to decide what to pick for their projects.
Whenever I'm asked what to pick as a Scala toolkit to build HTTP/JSON microservices I recommend Spray and spray-json while following the development of Akka HTTP closely. I believe it's going to pay off pretty soon.
Start your journey with the official documentation of Akka Streams and HTTP 1.0-M2. You may be pleasantly surprised how useful it can be (though there's still much missing from the docs).
In my akka-http-sandbox project you can find a complete Akka HTTP application.
Notes from many places
From [akka-user] Re: Feedback on Akka Streams 1.0-M2 Documentation:
Akka Streams is not about data structures, it is about streams. I mean real streams, like live TCP incoming bytes.
In Akka Streams on the other hand streams backed by collections is the rare case, not the common one (although many examples use collections as sources since they are easy to show).
In the same thread:
it always looks like
source.via(flow).to(sink)
For now, by default every stream processing element is backed by an actor. So for
mySource.map().map().to(mySink)
in the most common case there will be 4 actors, each backing one of the parts (1 for the source, 2 for the 2 maps and 1 for the sink). Also mapAsync and mapAsyncUnordered can be used that way too. In fact, since those work on Futures, you can combine them with the ask pattern and a routed pool of worker actors.
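The source.via(flow).to(sink) shape and the 4-actor map example above can be sketched as follows. This is a minimal sketch assuming a later stable Akka Streams API (ActorMaterializer; 1.0-M2 used FlowMaterializer), so names may differ in the milestone releases:

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

object PipelineDemo extends App {
  implicit val system = ActorSystem("pipeline-demo")
  implicit val materializer = ActorMaterializer()

  val mySource = Source(1 to 3)
  val mySink = Sink.foreach[Int](println)

  // By default every stage (the source, the two maps, the sink)
  // is backed by its own actor once the stream is run
  mySource
    .map(_ * 2)
    .map(_ + 1)
    .to(mySink)
    .run() // nothing happens until run() - prints 3, 5, 7

  Thread.sleep(500)
  system.terminate()
}
```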
TODO Create a demo with mapAsync* and a routed pool of worker actors
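A rough sketch of such a demo, combining mapAsync with the ask pattern and a RoundRobinPool of workers. It assumes the later mapAsync(parallelism)(f) signature (in 1.0-M2 the operator took only the function); the Doubler worker is a made-up example actor:

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.routing.RoundRobinPool
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
import akka.util.Timeout
import scala.concurrent.duration._

// A hypothetical worker that doubles the numbers it receives
class Doubler extends Actor {
  def receive = { case n: Int => sender() ! n * 2 }
}

object MapAsyncDemo extends App {
  implicit val system = ActorSystem("mapAsync-demo")
  implicit val materializer = ActorMaterializer()
  implicit val timeout = Timeout(2.seconds)

  // a routed pool of worker actors
  val workers = system.actorOf(RoundRobinPool(4).props(Props[Doubler]), "workers")

  // mapAsync keeps at most `parallelism` asks in flight;
  // Future completion is the backpressure signal to the upstream
  Source(1 to 10)
    .mapAsync(parallelism = 4)(n => (workers ? n).mapTo[Int])
    .runWith(Sink.foreach(println))

  Thread.sleep(1000)
  system.terminate()
}
```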
The main advantage of akka-streams is therefore that it has dramatically lowered the barrier of entry for writing Actor code with flow control.
TODO Code review of ActorBasedFlowMaterializer
The ActorBasedFlowMaterializer has an Optimizations parameter which is currently not documented. It will eagerly collapse entities into synchronous ones as much as possible, but currently there is no API to add boundaries to this collapse procedure (e.g. when you have two map stages that you do want to keep concurrent and pipelined). Also, it currently cannot collapse graph elements, stream-of-stream elements, mapAsync and the timed elements.
A mapAsync/mapAsyncUnordered will create these futures in a bounded number at any time and emit their results one-by-one once they are completed and the downstream is able to consume them. Once this has happened, a new Future is created. So at any given time there is a bounded number of uncompleted futures. In other words, Future completion is the backpressure signal to the upstream.
mapAsync waits for the results of the Futures and emits those (i.e. it is a flattening operation), and only keeps a limited number of open Futures at any given time. Internals might or might not be actors. This all depends on what materializer is used (currently there is only one kind), and even currently we have elements not backed by an Actor (CompletedSource and friends, for example).
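The bounded-futures behaviour can be sketched without any actors involved. A sketch assuming the later mapAsync(parallelism) signature; the parallelism parameter is what bounds the number of uncompleted Futures:

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Future

object BoundedFuturesDemo extends App {
  implicit val system = ActorSystem("bounded-futures")
  implicit val materializer = ActorMaterializer()
  import system.dispatcher

  // At most 2 uncompleted Futures exist at any time; a new one is only
  // created once a previous one completes and the downstream has consumed
  // its result. mapAsync emits the results in upstream order.
  Source(1 to 10)
    .mapAsync(parallelism = 2) { n =>
      Future {
        Thread.sleep(100) // simulate a slow asynchronous call
        n * n
      }
    }
    .runWith(Sink.foreach(println))

  Thread.sleep(2000)
  system.terminate()
}
```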
From the official documentation of Akka HTTP 1.0-M2:
- consuming services means handling streaming data
- treating data as a stream is a necessity since it can be too large to be handled as a whole
- actors handle a series of messages (message stream)
- taking care to not overflow any buffers or mailboxes
- actor messages can be lost and must be retransmitted
- a solution = Akka Streams - an intuitive and safe way to formulate stream processing setups
- limit the buffering
- back-pressure = be able to slow down producers if the consumers cannot keep up
- Akka Streams interoperate seamlessly with all other Reactive Streams implementations
- the Reactive Streams API defines an SPI
- Akka Streams is an implementation of the Reactive Streams API
- Stream processing is a different paradigm to the Actor Model or to Future composition
- A typical use case for stream processing is consuming a live stream of data
- extraction
- aggregation
- non-blocking streaming solutions
- internal backpressure signals to control a case where a subscriber is too slow to consume the live stream of data.
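The back-pressure point in the notes above can be sketched with a fast producer and a deliberately slow consumer. A minimal sketch, assuming the later ActorMaterializer name; the demand signalled by the slow sink keeps buffering bounded:

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

object BackpressureDemo extends App {
  implicit val system = ActorSystem("backpressure-demo")
  implicit val materializer = ActorMaterializer()

  // The source could produce as fast as it can, but downstream demand
  // limits it - no unbounded buffering happens in between
  Source(1 to 5)
    .map { n => println(s"produced $n"); n }
    .runWith(Sink.foreach { n =>
      Thread.sleep(200) // a slow consumer
      println(s"consumed $n")
    })

  Thread.sleep(2000)
  system.terminate()
}
```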
From A Gentle Introduction To Akka Streams:
- stream-based programming = a paradigm
- good ol' | operator in Unix - streams as a conceptual model for processing pipelines = functional w/ input and output and no or very little side-effects
- you can make these functions asynchronous and parallelize them over input data to maximize throughput and scale
- stream = data flows through functions
- HTTP server = the Request-Response model to map a Request to a Response
- a dataflow
- (my thought) the idea is to first build a pipeline of functions that operate on a data stream and then, as a separate step, run it on a thread pool/executors -- similar to what Apache Spark offers.
- Akka-Streams provide a higher-level abstraction over Akka’s existing actor model.
- The Actor model provides an excellent primitive for writing concurrent, scalable software, but it still is a primitive
- Can we use Actors as functions, treating actor messages as type-safe inputs and outputs? That's exactly Akka-Streams!
- http://www.typesafe.com/activator/template/akka-stream-scala recommended as an in-depth tutorial on Akka-Streams
- TODO What's FlowMaterializer? A FlowMaterializer is required to actually run a Flow.
- A Source is something which produces exactly one output. If you need something that generates data, you need a Source. Our source above is produced from the connection.consume function.
- A Sink is something with exactly one input. A Sink is the final stage of a Stream process. The .foreach call is a Sink which writes the input (_) to the console via println.
- A Flow is something with exactly one input and one output. It allows data to flow through a function: like calling map which also returns an element on a collection. The map call above is a Flow: it consumes a Delivery message and outputs a String.
- In order to actually run something using Akka-Streams you must have both a Source and Sink attached to the same pipeline. This allows you to create a RunnableFlow and begin processing the stream. Just as you can compose functions and classes, you can compose streams to build up richer functionality.
- abstraction allowing you to build your processing logic independently of its execution.
- foreach acts as both a Sink and a run() call to run the flow.
- Is there a way to separate the definition of the stream with the running of the stream? Yes, by simply removing the foreach call. foreach is just syntactical sugar for a map with a run() call.
- By explicitly setting a Sink without a call to run() we can construct our stream blueprint producing a new object of type RunnableFlow. Intuitively, it’s a Flow which can be run().
- Q What would happen without .to(Sink.ignore) // won't start consuming until run() is called! in def consume()?
- run() to run a stream = start processing data
- pipeline separated into two parts - build and run ones
- you’d want to compose your stream processing as you wire up the app, with each component, like RabbitMqConsumer, providing part of the overall stream process.
- imperative style = flow is controlled by the while loop
- https://github.com/sstone/amqp-client = an Actor based model over RabbitMq
- In general, if we can model our process as a set of streams, we achieve the same benefits we get with functional programming: clear composition on what is happening, not how it’s doing it.
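The build-then-run separation the notes describe can be sketched like this. A minimal sketch assuming the later RunnableGraph name (the milestone releases called it RunnableFlow):

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}

object BlueprintDemo extends App {
  implicit val system = ActorSystem("blueprint-demo")
  implicit val materializer = ActorMaterializer()

  // Build: a blueprint only, nothing runs yet
  val source = Source(List("a", "b", "c"))     // exactly one output
  val flow   = Flow[String].map(_.toUpperCase) // one input, one output
  val sink   = Sink.foreach[String](println)   // exactly one input

  val blueprint = source.via(flow).to(sink)    // a RunnableGraph

  // Run: only now is the stream materialized and data starts flowing
  blueprint.run() // prints A, B, C

  Thread.sleep(500)
  system.terminate()
}
```

Keeping the blueprint separate from run() is what lets you wire up components like the article's RabbitMqConsumer independently of executing the overall stream.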