Ska La

An essay about expressiveness, behavior and conciseness in code

2013-09-08T16:37:00.000+02:00

This week I came across an interesting tweet, which stated that code is read more often than it is written.

It can only be true, like any tautology. It's like a book, right? Written once, read many times... until errata come into play, or new editions.

That was my point when I replied, trolling a bit (I admit), that Java is becoming a pain in this area... because of its verbosity.
Fortunately, Nicolas Frankel is clever enough to keep it a discussion rather than entering an endless war that nobody can win...
So we decided to each express our point of view in cross-referenced blog posts, where you'll find your own way of coding.

Reading code

Before starting, I would first like to ask: why would we have to read code?

From my point of view, there are several relevant cases:

learning a new library: code expresses more than docs, which is why it must be opened
improving it by adding new features: in that case, we need the big picture before starting any hack
debugging

So what I'll claim based on these three (more? poke me!) cases, is that

the first two can be eased by clean code describing a behavior with the right level of conciseness while keeping expressiveness at its best.
the latter can be a real pain when the level of conciseness of the code has been badly chosen

Some words about conciseness and Java

Here I'll quickly brush over some additions that Java received over the last couple of versions; a lot of them involved conciseness.

Let's take List and the for-loop.

See? Everything is about making the code more concise without removing any logic or features.

However, not everything is that shiny, especially when you need to add more logic or, let's say, when you need to enhance the behavior. Can you read the following two examples easily?

Don't know about you, but my feeling is that the behavior is not well represented, and thus we need to fire up our in-brain JVM to grasp it.

The problem is that the conciseness introduced at the language level is not flexible enough to express advanced workflows, like early termination or filtering.
Now let's look at a Java 8 version of this example:

The same code, fewer lines... but is it really the interesting feature? Conciseness? Really!?
I don't think so. The most important part is that the behavior can be captured by combining three simple behaviors:

filtering
mapping (transforming)
limiting (taking)

Another good point is that, using the second version, the behavior can easily be tested by simply checking that the predicate (referenced from the Test class) returns the right value.

So there is no need to test if an array will be filtered correctly, it's asserted by the library.
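To make this concrete, here is a minimal sketch written in Scala rather than with the Java 8 API discussed above. The names Person, isAdult and firstAdultNames are hypothetical, chosen only for illustration; the point is that the predicate is a standalone function, while filtering, mapping and limiting are left to the library.

```scala
object PipelineSketch {
  // Hypothetical model, only for illustration.
  final case class Person(name: String, age: Int)

  // The predicate lives on its own, so it can be unit-tested in isolation.
  def isAdult(p: Person): Boolean = p.age >= 18

  // The behavior is the combination of three simple ones:
  // filtering, mapping (transforming) and limiting (taking).
  def firstAdultNames(people: List[Person], n: Int): List[String] =
    people.filter(isAdult).map(_.name).take(n)
}
```

Testing isAdult alone covers the business rule; whether a list gets filtered correctly is the library's job.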

Also, the behavior is easily extensible since the filtering is no longer explicitly hardcoded in it -- yet we're free to instantiate a specific behavior by introducing a partially applied method.

Now, it's true that we have to agree on an API and everybody must know it. But I think that's always the case when you're creating an API: you're fixing names and concepts.
The sad fact, in Java, is that they are trying to reinvent the wheel by renaming well-known behaviors like limit (take) or substream (drop + take) in the Collection API. That's why semantics are hard when everyone creates their own taxonomy/ontology of concepts.
Moreover, they took the opportunity to leave some noise, like the stream() call (probably for backward compatibility), which in turn requires a collect call (used to rebuild a List from a Stream).

Conciseness Expressiveness and Scala

In this section, I won't go into much detail about Scala; I'll just try to show some advantages of the expressiveness that Scala offers... for instance, even implicits (at the call site) are explicit (at the definition site).

So I'll just mention two (among many) features brought by Scala which are really missing in Java, both being related to expressing a behavior based on an implicitly available context.

Implicit parameter

In Scala, a method/function can have several parameter blocks, like so:

def add(a:Int)(b:Int) = a+b.

Ok, fine, but the very last parameter block can be declared as implicit, this way:

def persist(user:User)(implicit s:DBSession) = ....

An implicit parameter block is simply a bunch of parameters that the compiler should be able to find in scope at compile time. If it can find such parameters in a deterministic way, they'll be provided automatically at the call site. An example?

What has just happened there? It's rather simple: when the compiler reaches the persist call, it sees that the last block is implicit and not provided, so it searches the current scope for an object matching the required type. In this case, it finds session.

Again, I don't know about you, but I prefer this to what Hibernate (for instance) does... which is having a session injected somewhere through an annotation or whatnot. And if something goes wrong? I hope you have integration tests, because it'll only be checked at runtime.
Here, the compiler will blow up if it cannot find a session object, that's it!

So it's explicit at one single place, and will be implicit at all call sites. And I think it's cool/powerful to have such duality.
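Here is a minimal sketch of that duality. User and DBSession are hypothetical stand-ins for a real persistence layer (the session just keeps users in memory), and registerAll is an extra helper added for illustration.

```scala
object ImplicitSketch {
  // Hypothetical types standing in for a real persistence layer.
  final case class User(name: String)

  final class DBSession {
    private var stored = Vector.empty[User]
    def save(u: User): Unit = { stored = stored :+ u }
    def all: Vector[User] = stored
  }

  // Explicit at the definition site: the last parameter block is implicit.
  def persist(user: User)(implicit s: DBSession): User = {
    s.save(user)
    user
  }

  // Implicit at the call site: the compiler passes `s` to persist for us.
  def registerAll(names: List[String])(implicit s: DBSession): List[User] =
    names.map(n => persist(User(n)))
}
```

With an `implicit val session = new ImplicitSketch.DBSession` in scope, `persist(User("ann"))` compiles as is; remove the implicit value and the call is rejected at compile time rather than failing at runtime.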

For-comprehension

A for-comprehension in Scala is very similar to a for loop in Java when dealing with sequences; however, it's more than this. But before going further, here is what is possible using lists:

Those of you who have already tried the Future API of Java should find the following really pleasant, because chaining futures becomes terse and straightforward, without pain but with some implicit meaning...

What's going on there? We fetched a user in a future; once it had been fetched, we fetched each friend, one by one, within a sequence; then we yielded both results in a tuple.
The resulting fut variable is yet another Future that will hold this latter tuple if all the fetches returned successfully, otherwise everything fails!
Afterwards, it's still possible to adapt the contained tuple into a new tuple... Note that we're not dealing with the value yet, we're just describing what has to be done when the result arrives... or not.

I'm not courageous enough to write this code using the Future API in Java; it would be too painful for me, too much brain gymnastics for nothing... and I haven't even mentioned the number of bugs it'd be prone to.

Oh yes, one last thing on this for-comprehension: fetch should have an implicit parameter (the storage session/metadata access), and the Futures need an ExecutionContext instance to be executable... but it's not the role of this piece of code to create or even pass them!
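A sketch of what such a chain can look like, under assumptions: User, Session and the in-memory map are hypothetical, and Future.traverse is used here to fetch the friends, which starts one fetch per friend and keeps the results in order (it does not strictly wait for one before starting the next).

```scala
import scala.concurrent.{ExecutionContext, Future}

object FutureSketch {
  // Hypothetical model and in-memory "storage", for illustration only.
  final case class User(id: Int, name: String, friendIds: List[Int])
  final class Session(val users: Map[Int, User])

  // The session and the ExecutionContext are both implicit: callers
  // describe what to fetch, not how the plumbing is passed around.
  def fetch(id: Int)(implicit s: Session, ec: ExecutionContext): Future[User] =
    Future(s.users(id)) // the Future fails if the id is unknown

  def userWithFriends(id: Int)(implicit s: Session, ec: ExecutionContext): Future[(User, List[User])] =
    for {
      user    <- fetch(id)
      friends <- Future.traverse(user.friendIds)(fid => fetch(fid))
    } yield (user, friends)
}
```

If any fetch fails, the resulting Future fails as a whole, which matches the "otherwise everything fails" behavior described above.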

Flaws of conciseness in Scala

The flaws mainly arise when the code tries to be concise to such a level that expressiveness itself is penalized.

For instance, sometimes, I like to tweet implementations in 140 characters like this one; this is just fun to do... I wouldn't expect to have such code dropped as is in a project unless there are a lot of reasons (I can't even imagine a single one).

Why? Because it breaks all the good conventions that are becoming the de facto best practices in Scala. For those interested, there is a very good keynote by Martin Odersky on that.

The most important one is probably to keep track of the types along the chain by splitting a line with multiple calls into several lines; the next one is to not overuse the wildcard underscore in inline functions.

So the code in the tweet can be migrated this way from:

To:

All you need to understand are these 5 concepts:

map
flatMap
groupBy
mapValues
flatten

Which one couldn't you guess correctly? Make a guess, then look at these rough and limited descriptions of their behaviors:

List<A> => List<B> OR Map<K,V> => Map<L,W>
List<List<A>> => List<A>
List<A> => Map<K,List<A>>
Map<K,V> => Map<K,W>
List<List<A>> => List<A> OR List<Map<K,V>> => Map<K,V>

I kept it cryptic on purpose because it has been explained so many times on the web!!
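For those who prefer running things over reading signatures, here is a small sketch exercising the five operations on toy data (the values are mine, not the code from the tweet). The mapValues line assumes Scala 2.13 or later, where it is called on a view.

```scala
object CollectionOpsSketch {
  val words: List[String] = List("map", "flatMap", "groupBy", "mapValues", "flatten")

  // map: List[A] => List[B]
  val lengths: List[Int] = words.map(_.length)

  // flatMap: map, then flatten one level
  val letters: List[Char] = List("ab", "cd").flatMap(_.toList)

  // groupBy: List[A] => Map[K, List[A]]
  val byFirstLetter: Map[Char, List[String]] = words.groupBy(_.head)

  // mapValues: Map[K, V] => Map[K, W] (Scala 2.13+ style, through a view)
  val countByFirstLetter: Map[Char, Int] = byFirstLetter.view.mapValues(_.size).toMap

  // flatten: List[List[A]] => List[A]
  val flat: List[Int] = List(List(1, 2), List(3)).flatten
}
```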

But also note that the readable code (which could be even more readable while still using non-verbose tools) is, with comments, less than twice as long. It would require hundreds of lines of Java, which I don't even have the energy to write... Or you need to convince me that I should...

Last note: just don't override or create operators that don't have widely known semantics. Haskell is a bad example, even if this language is awesome!

Wrap up

I agree that conciseness relies, most of the time, on the fact that a common language must be shared by the coders, readers and writers. But isn't that the case for all the APIs we're using?

Also, I tried to state that conciseness can very quickly become a pain when it shoots features out of the language rather than introducing new concepts with more flexibility.

Conciseness is great for a lazy man with a ROM-like brain, so in Scala, take care to use it sparingly: when it's trivial or, maybe, not part of the behavior (like in debug prints).

Expressiveness that enables code to rely only on a behavior is what verbose languages are missing the most... even if, version after version, they (Java) sneak new embryos of solutions into the language.

And I didn't even talk about POJOs or immutability.

Hope you enjoyed the read and that you have plenty of concerns/remarks about it.

Thanks again to Nicolas for bootstrapping this metaphilosophical rumination on what readable code should look like. Don't forget, if you haven't yet, to read his own rumination here.

mid-2013, my Spark Odyssey... Rolling out Streams in the Spray River

2013-08-19T00:53:00.000+02:00

In this post, I'm going to relate the last bits of the project using Spark streaming that I started discussing here and continued here.

Up to now, the project, hosted on github, has been updated with a JavaScript web application that can show the streams that were created in the first two posts (see above). This web application is not the purpose of this post; its purpose is the way I was able to combine streams and present the results in a Spray server.

Goal

So what we will see here is how Spark Streaming enables two streams to be combined when they are, somehow, carrying data about the same topics.

Then we'll see how to compact the amount of data coming over the web using an aggregation method such as a windowed summarization.

Since data is only interesting when it can be consumed by yet another analytic tool, it's pretty helpful to set up a server that gives some access to it -- to be consumed by an OLAP application for instance -- rather than flushing everything back to HDFS.
For that, we'll use the Spray server that Spark helps us create, and we'll add routes to it that render the aggregated values (RDD) as a JSON document to be sent over the wire.

Rollin' to the river

In the first post, we configured a DStream that carried data for each tweet matching a list of stocks; these data were associated with a sentiment score and packaged in a dedicated structure named TwitterData.

In the next post, we created yet another DStream that was responsible for polling the Yahoo! finance API to fetch stock changes in real time. For that, another structure was created: YahooData, which contains information about the actual changes of a given stock.

With that in hand, we can easily figure out that both DStreams are continuously giving information about the same topic: a stock. That's why we can set up a parent trait that defines only one thing, that is, a stock.

Here are the structures and the related streams:

The asInstanceOf[Data] casts are required because DStream is not covariant (probably because it is mutable).

Thus, what we could do now is to merge both of them into a single one! I can't wait any longer, let's do it:

What? You feel a bit frustrated because it's so simple? Completely understood, I was just as astonished at how easily we can combine streams of packets, where time is part of the problem, without having to do anything...

Cool, so now we have access to a full river of data coming in real time for a bunch of given stocks, but in real life (and with even more streams) it can be problematic to keep track of everything, or redistribute the whole thing. Let's see what we could do.

Compaction (aggregation)

A common workaround for the problem mentioned above (the capacity one) is to compute aggregated values for a window of time that slides over the stream.

To give a simple example, if we have a stream of integers that arrive at a very high rate, what can be done is to compute several stats on them on a regular basis. That is to say, we can choose to compute the mean and standard deviation of the integers that arrived during the last 30 seconds, and do that every 5s.

This reduces what needs to be stored to only a couple of doubles every 5s rather than, maybe, thousands of values. Having overlapping sliding windows is helpful when applying regression, for instance (more components of the variance are kept).
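The integer example can be sketched with plain Scala collections, without Spark. This is only an illustration under assumptions: the stream is modeled as a sequence of 5-second slices, and the names Stats, stats and windowed are hypothetical.

```scala
object WindowStatsSketch {
  final case class Stats(mean: Double, stdDev: Double)

  // Mean and (population) standard deviation of a non-empty sequence.
  def stats(xs: Seq[Int]): Stats = {
    val n = xs.size.toDouble
    val mean = xs.sum / n
    val variance = xs.map(x => math.pow(x - mean, 2)).sum / n
    Stats(mean, math.sqrt(variance))
  }

  // `batches` holds the integers received during each 5-second slice.
  // A 30-second window is 6 consecutive slices; sliding by 1 slice
  // yields one Stats value every 5 seconds. Empty windows are skipped.
  def windowed(batches: Seq[Seq[Int]], slicesPerWindow: Int = 6): Seq[Stats] =
    batches.sliding(slicesPerWindow, 1).map(_.flatten).filter(_.nonEmpty).map(stats).toSeq
}
```

Consecutive windows share 5 of their 6 slices, which is the overlap mentioned above.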

So here, what we are going to do is to compute an aggregated value, which is an overall score for each stock present in the stream.

This score will simply be the sum of the sentiments (as -1 or 1) for the Twitter part, itself added to the sum of the Yahoo changes (converted to -1 or 1) -- I'm not a financial guy, so I've chosen something easy and straightforward to compute... it's not related to the technologies. Let's see how we could achieve that using the Spark API for DStream.

Nah! Okay, I'll start with the trick map(x => (x.stock, List(x))). Actually, this is rather convenient for what follows: we created a pair composed of a key, the stock, and a value, the singleton list of the initial data. This singleton thing will become really helpful when the "reduce phase" happens.

Exactly, aggregation over a sliding window being a classical use case, the Spark API has already integrated it for us (thankfully, I should say). So we only have to call the reduceByKeyAndWindow method with the reduce function (List.appendAll) and the window's duration, 60 seconds. Note that the sliding time wasn't given: I've chosen to keep the original slicing time from the DStream creation, 5 seconds.

One warning, nevertheless: this function is only available here because we adapted the stream to something that looks like key-value pairs (KVP); that's why we can reduce by key as well.

Since the key is the stock, we now have the list of all Data for a 60 seconds window grouped by stock.

Now, recalling what was said earlier, we only want to keep track of a score, not every Data. For that, we mapped the values to something smaller: the overall score and the number of tweets and Yahoo entries that contributed to this score.
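The same logic can be sketched with plain collections standing in for the DStream. Everything here is a simplified assumption: the Data, TwitterData and YahooData fields, the Score structure, and groupBy playing the role of reduceByKeyAndWindow on the content of one window. Note that a Yahoo change of exactly zero contributes 0 in this sketch.

```scala
object ScoreSketch {
  // Hypothetical, simplified versions of the structures described above.
  sealed trait Data { def stock: String }
  final case class TwitterData(stock: String, sentiment: Int) extends Data // -1 or 1
  final case class YahooData(stock: String, change: Double) extends Data

  final case class Score(value: Int, tweets: Int, yahoos: Int)

  // Reduces the data of one stock to a score plus the number of tweets
  // and Yahoo entries that contributed to it.
  def score(window: List[Data]): Score =
    window.foldLeft(Score(0, 0, 0)) {
      case (acc, TwitterData(_, s)) =>
        acc.copy(value = acc.value + s, tweets = acc.tweets + 1)
      case (acc, YahooData(_, c)) =>
        acc.copy(value = acc.value + math.signum(c).toInt, yahoos = acc.yahoos + 1)
    }

  // Pair each element with its stock as a singleton list, gather the
  // singletons per key, then map the values to a score.
  def scores(window: List[Data]): Map[String, Score] =
    window
      .map(x => (x.stock, List(x)))
      .groupBy(_._1)
      .map { case (stock, pairs) => stock -> score(pairs.flatMap(_._2)) }
}
```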

Tada! We're done with the Spark thing... yeah actually, for this post at least (next will come the cluster deployment... but that's for the future!).

True, we're now collecting real-time information about a list of stocks, each associated with a score and the participation of each source.

However, they remain in memory... that's not really helpful, right? Actually, one would request these data to perform one's own analysis on them, so we need to open them to the outside world.

Delivering

Mmmmh, this part is a bit tricky because, looking at the DStream API, you don't have that much choice for that. We can persist (to memory, ...), we can save to an object file or a text file, but redistributing these persisted data would be costly for me, and I'm lazy.

But I like to read code, so I read the Spark code again and stumbled again upon the AkkaUtils class used in my previous post to create the actor system. And, in fact, this helper is able to create a Spray server as well -- you know, this super-powered server thing that is able to deliver resources at an insane rate thanks to Akka!? If not, open this link in a new tab and read it afterwards (or now), you'll love it!

Given that, I thought to myself that it'd open some door to laziness, which I liked a lot.

The idea is to start such a Spray server, which would serve the content of an actor holding a representation of the aggregated information grouped by stock, the value being the list of these aggregated data.

WARN: This is certainly not the most efficient way to do this because the state of the actor grows monotonically, but it is sufficient for now.

To do so, I've reused the trick from the previous post: creating a class in the spark package that starts the Spray server using AkkaUtils. Afterwards, the routes are declared to render the data accumulated in the following actor:

This actor will be simply populated by the DStream itself using the transform function:

I wanted to show it to you because of the FIXME, which is still not fixed. Actually, the function that is given to transform will be part of the lineage of the RDDs, so it has to be serializable. And if I simply reuse an actor as a closure, Akka will tell me (actually how Spray uses the Akka serialization) that it's not possible to make the actor serializable since it doesn't have any actor system in the context... (when Spray calls the serialization).

To counter that, I took the hard way, which is simply "recreating" actor system and actor within each RDD's transform call! And it's very bad. Sorry about that...

If you want more information about this service, just check out this little class here, where you'll find the JSON serialization as well for the "REST" operations.

HTMLization

Since reading JSON on the fly is neither my passion nor part of my resume (which should be somewhere there! or maybe there? Arf, I dunno, anyway...), I've created a very lightweight and not really user-friendly interface that shows the results in real time.

It's all located in the web folder and is intensively using Bootstrap from Twitter, AngularJs from Google and D3.js from Mike.

To start it, you'll need to start the whole thing as described in the README; afterwards you'll be shown this kind of stuff:

My fairly UI

What should come next...

... when I have some time.

The next logical step from here, for me at least, is to deploy everything on a cluster. Because, for now, as set up back in the first post, the context is created locally, so everything is going fine... maybe due to this fact.

So that'll be the next thing in this series, deploy on Amazon or on a local Vagrantized cluster of VMs ^^.

If anyone is willing to help, give me pointers or make any comments, please do!

Hope you liked the read, if anyone reached this point without getting bored. If you did, a small tweet to poke me would be awesome (@noootsab).

mid-2013, my Spark Odyssey... when Akka sparks Yahoo! finance

2013-08-16T00:44:00.002+02:00

This entry is the second of a series that relates some work/test I'm doing using Spark. All information can be found in the very first post of this series.

In the previous post, I explained how easy it is to consume a Twitter stream using Spark, and how seamlessly a sentiment score can be attached to each tweet passing a given condition (read: filtered).

In this one, I'm going to tackle another side of my test, which is fetching the current stock information for a given list of companies.
This information will be provided by the Yahoo finance API...

...well, hum, not exactly an API. Actually, thanks to @kennyhelsens who showed me this "hack", we'll use a Yahoo! URL that produces a CSV file containing real-time (mostly) information about companies' stock changes. Information about this URL's parameters is really hard to come by; however, you can find some out there.

Storyline

In this series, we're trying to use real-time streams rather than batch processing to give a real-time view of the fluctuating health of given companies.

But we've just seen that we're going to consume CSVs that give a snapshot of the stock changes.

A common solution to transform a static API into a stream is to poll it and to create a stream by appending the results one after the other.
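Stripped of Akka, the idea fits in a few lines. This sketch swaps the scheduler for a plain lazy Iterator, so it is not the approach used in the project; `fetch` and `pause` are placeholders for the real HTTP call and for the wait between two polls (for instance a Thread.sleep).

```scala
object PollingSketch {
  // Turns a snapshot-returning call into a stream: each poll yields one
  // snapshot, and the stream is the succession of those snapshots.
  def poll[A](fetch: () => A, pause: () => Unit = () => ()): Iterator[A] =
    Iterator.continually {
      val snapshot = fetch()
      pause()
      snapshot
    }
}
```

The iterator is lazy and infinite: nothing is fetched until a consumer asks for the next element, for example with `take(n)`.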

Here, we'll try a pretty simple solution for that: we'll delegate to Akka the tasks of consuming the CSV, parsing it, splitting the data by company, modeling each entry and, finally, pushing everything to a DStream.

It's rather straightforward, and we'll see that it's even simpler than we would expect, since Spark already plays nicely with Akka by integrating, out of the box, an API to "consume" an actor's messages.

Acting

In this scene, there will be several actors, each playing its own role (sorry Peter Sellers fans -- like me btw).

Here they are:

scheduler: this one is provided by Akka out-of-the-box and is able to send a "given message" to a given actor at a given interval.
feeder: this actor will simply fetch the CSV and produce the data to be sent to the below actor.
receiver: this one will receive a message whenever data is available for a given company.
parentReceiver: this one will be created by Spark itself using the StreamingContext and is able to push the received Data to the right RDD in its holding DStream (see ActorReceiver).

Now the scenario:

Sequence from the scheduler to the DStream (current RDD)

Does that make sense? Let's have a short overview on the implementation.

Scheduling the poller

First we have to create a scheduler that will poll the server every 500ms... well, using Akka, the separation of concerns and Message Passing Style, we'll just create a scheduled action that passes a message to a given actor:

But even before that, let's prepare the Akka field:

There are plenty of ways to create an ActorSystem with Akka, but Spark prepares some stuff for us to create one that respects some of its own internal conventions. This helper is available in the `spark` package and is named AkkaUtils; however, it's package protected... That's why I've created my own utils to create the ActorSystem under the spark package -- I'm bad sometimes, but I'm so often lazy :-D.

Actually, the Spark utility for Akka will create an ActorSystem using a host and a port for us, which will be really helpful afterwards... because everything in Spark must be serializable, and thus we cannot easily package an ActorRef (it needs an ActorSystem, ...). So we'll use URLs to create refs rather than using actors picked up within a closure in a Spark action.

Recall that this is how Spark manages to recover from a failure: it takes the lineage of the RDD to rebuild everything from the initial data (not possible with streams...). But above all, the actions (functions) are dispatched to the machines where the data reside. So the lineage contains the functions, and a function might contain ActorRefs!

Now that we have an actorSystem available, let's create the scheduler (see last section, Drawbacks, for more info about problems with that methodology):

Nothing relevant to say about that, only maybe the definition of FeederActor...

Feeding Spark with Yahoo data

Here comes the feeding phase. We won't see how the CSV is read and parsed; however, we'll see what's going on with the message flow.

What's interesting in this actor is twofold:

it can receive a message containing two pieces of data:

an actorRef that it will hold in its state: this one will be the receiver
a list of stocks to follow (by building the according call to Yahoo)

when consuming the data and producing the data by stock, it sends them one by one to the above actor

Actually, this feeder has two responsibilities which are fetching the data and pushing it to Spark.

But how is this receiver constructed and passed?

Receiver -- our spark stuff

The receiver is actually an actor that will be created internally by Spark for us, for that it must respect yet another trait, Receiver. And a Receiver can only be an Actor (using self definition)!

Four things to point out here:

this actor is well-defined by extending the Receiver trait, which provides the pushBlock method.
it holds a reference to an actor that it creates itself based on a given URL -- an actor URL... this is where AkkaUtils came in handy! This actor will be the feeder actor.
in the hook preStart we tell the feeder actor that this ActorRef is the Spark end-point where data must be sent to publish in the stream.
when data arrives, we check in the cache if it's already there (see below), if not we call the Receiver#pushBlock with it.

Why do we have to cache? Actually, when the stock market has closed, Yahoo will continually return the last change before closing... so, here, we just avoid telling our health checker that the mood is monotonically increasing or decreasing (or even just flat if zero).
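A minimal sketch of such a cache, assuming a simplified YahooData and a hypothetical ChangeCache helper (the real receiver keeps this state inside the actor): a value is only let through when it differs from the last one seen for the same stock.

```scala
object DedupSketch {
  // Hypothetical, simplified payload: the last change reported for a stock.
  final case class YahooData(stock: String, change: Double)

  final class ChangeCache {
    private var last = Map.empty[String, YahooData]

    // Returns Some(data) when it differs from the last value seen for that
    // stock (and should be pushed), None when it is a repeat.
    def offer(data: YahooData): Option[YahooData] =
      if (last.get(data.stock).contains(data)) None
      else {
        last = last.updated(data.stock, data)
        Some(data)
      }
  }
}
```

In the receiver, only the `Some` case would lead to a pushBlock call.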

The pushBlock method is the key to publishing to the underlying DStream managed by Spark: it wraps our data into an internal representation before telling the parent actor to use it: context.parent ! Data(y).

But what is that context.parent?

parentReceiver -- not our spark stuff

This receiver is totally managed by Spark, and so I won't dig into more details than just saying it's being created underneath when doing the following:

Fairly simple, yes, it's true! Actually, the StreamingContext we created at the very start of the project contains everything to create DStream based on Akka, all it needs is a reference to a Receiver (defined in the previous section).

What this actorStream does is create an ActorReceiver (with a Supervisor and blahblah), which will itself create and manage a worker for the given Props, but will also hold the reference to the current RDD in the DStream.

The worker will be able to send it messages of the form Data(d) using pushBlock, which it will then push to the DStream.

Concise, huh? If only the documentation had told us that, I wouldn't have had to dig that deep into the code... the good side effect is that now I understand it pretty well.

So, I hope you too!

Where are we?

Up to here, we've created a DStream of data for each tweet paired with 'its' company, and we now also have some info about their stock changes.

What will happen in the next post is that we'll create a multiplexed stream for which we'll compute a value for every window of 60 seconds.

Drawbacks of the/my methodology

At this level, I can see two problems.

The scheduler

Using a scheduler is bad because we cannot ensure that the messages are processed sequentially by Spark. That is to say, data gets pushed to the RDD that corresponds to the time when the whole CSV has been processed and the messages have been sent by the actors!

I put this aside for the moment since I'm not building a real-life project (no one will use this tool to make decisions on the market ^^).
Moreover, mathematically, let's say at the limit, the value will be correct: reducing an amount by 2 at t⁽¹⁾ or at t⁽²⁾ is not really problematic, it's like swapping two ints in a list that we're summing over.

The consumer

I didn't show its current implementation because it's very simple and bad: I'm simply opening a connection on the URL and consuming everything until the end before applying the process...

That is to say: it's blocking and non-reactive.
It also means that we lose the power of parallelism, since the Akka message dispatcher will have to wait for a CSV to be completely processed before handling the next one.

However it's just an implementation detail that can be easily worked around without modifying the solution.

mid-2013, my Spark Odyssey...

2013-08-12T01:29:00.001+02:00

Short intro

In this post, I'll talk about yet another technology-tester project I've worked on these last days: Spark.

Long story short, Spark is a handy distributed tool that helps in manipulating Big Data from several sources -- even hot streams.

TL;DR

In the first part, I'll put the context in place.

I'll start with a gentle bunch of words about Spark and Spark-streaming.
Then I'll give some bits about the project I've realized so far, and why.

The second part will discuss some implementation details.

Setup: spark streaming
Twitter stream filtered by keywords to DStream
Tweets' sentiment analyzer
Yahoo finance CSV data and the actor consumer
Actor consumer publishing on a DStream (back-ended by a dedicated actor)
Aggregation (union) based on a 60 seconds length window
Publishing the results using Spray

What's missing so far (among other business related things for instance):

cluster deployment...
... and configuration
probably a storing strategy for further consumptions (Riak, ...?)
a web application that shows the results or configure the processes

Context

Spark

I won't discuss Spark that much because there are already plenty of good blogs/articles on it. However, I'll give you just enough intuition to get the whole idea:

Spark is a collection of functions that work on a Sequence (List) of typed data -- Dataset = Sequence ; operation = Function
Chaining these functions will define a workflow for your data -- combination
Each data will be processed only once by one or another Spark worker -- Distributed
If it fails for some external reason, the data can be fetched again from the source and the workflow replayed entirely -- Resilient

Spark defines a ResilientDistributedDataset (RDD), which is very different from classical MR (MapReduce) -- because it is distributed by nature, a workflow can easily be iterative and doesn't require intermediate caching/storage.

Also, Spark has a sub-project, spark-streaming, that allows manipulating streamed data. The idea is very simple: a DStream is defined as a sequence of RDDs, each containing the data for a certain slice of time.
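To picture that definition, here is a toy model with plain collections, under assumptions: Event and slices are hypothetical names, and each inner sequence stands in for the RDD of one time slice. It has nothing distributed or resilient about it; it only shows the slicing.

```scala
object SliceSketch {
  final case class Event[A](timeMs: Long, value: A)

  // Groups timestamped events into consecutive slices of `sliceMs`,
  // mimicking how a DStream is a sequence of per-slice datasets.
  // Slices without events are simply absent in this simplified model.
  def slices[A](events: Seq[Event[A]], sliceMs: Long): Seq[Seq[A]] =
    events
      .groupBy(_.timeMs / sliceMs)
      .toSeq
      .sortBy(_._1)
      .map { case (_, es) => es.sortBy(_.timeMs).map(_.value) }
}
```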

Oh, RDD and DStream are typesafe -- the whole thing being Scala code.

The spark-streaming project is really well integrated and can consume a lot of streams like Twitter, File System, Kafka or even an Akka actor.

Project

This blog relates a small project publicly available on github (spark-bd) that helped me catch the big picture of Spark, after having followed the mini course here.

The project is based on another working group I'm participating in. This project, simply called project 2, has been initiated and is still supported by the great Belgian Big Data user group (http://bigdata.be/).

The idea was to co-create, over several workshops, an application that, given a list of companies to analyse:

catches the Twitter stream and the Yahoo finance data using Storm;
manipulates both by either applying a sentiment analysis or simply the changes in price;
aggregates by window of time both information for each company
gives (constantly) the information about how a given list of companies is doing on the market.

Fairly simple and really fun... the project is still ongoing but you can check its progress here.

It was really fun, but I wanted to see how my feeling would compare when writing the same application using my current preferred language (Scala) and the next technology in the Big Data field I was eagerly looking at (Spark).

IMPORTANT: since I'm not there yet to make a real comparison (a niche for another blog, later on), I'll only give an insight into my current work.

Internals

Setup

The setup is very minimal for a Spark application. All we need boils down to two things:

an SBT project declaring as dependencies Spark and Spark-streaming.
in a main class/object configure the spark streaming context.

The SBT project only requires a build.sbt in the root folder that contains these lines only:

where the only lines of interest are the last two.

Then we can create an object that will start the spark context.

Here we just asked Spark to create a streaming context on the local environment, named Project2 and that will slice the DStream into RDDs of 5 seconds duration each.

Afterwards, we asked the context to start its work... which is nothing up to now, since nothing has been added to the context yet.
So let's do it.

Twitter stream

Here we'll consume the tweet stream provided by Twitter's API. For that, we'll have to do only two things (a pattern emerges...?):

register the 4 needed information for an OAuth2 authentication to the Twitter API (so don't forget to add an application to your Twitter development account).
add a tweets stream to the spark context

Actually, the streaming context has convenient methods to add a Twitter stream.

For that it'll use the Twitter4J library (it's definitely not the best, but at least we can achieve our goal).

This library needs several keys to be set in the System.properties.

There are plenty of ways to do that; in my project I took something rather straightforward: I picked the Typesafe config library and added those four configuration keys to an application.conf file, then I load each value into the related System property.

As you may (or not) see, the application.conf file will look for environment variables giving values for the required key/secret/token.
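As a minimal sketch of that idea (not the project's actual code), the snippet below copies four environment variables into the `twitter4j.oauth.*` system properties that Twitter4J reads. It uses plain `Map` lookups instead of the Typesafe config library to stay self-contained, and the `TWITTER_*` variable names are an assumption made for illustration.

```scala
object TwitterCredentials {
  // The four OAuth settings Twitter4J looks up under the "twitter4j.oauth." prefix.
  val keys: Seq[String] =
    Seq("consumerKey", "consumerSecret", "accessToken", "accessTokenSecret")

  // Hypothetical naming scheme: consumerKey -> TWITTER_CONSUMER_KEY.
  def envName(key: String): String =
    "TWITTER_" + key.replaceAll("([A-Z])", "_$1").toUpperCase

  // Copies every value found in `env` into the matching system property
  // and returns the names of the properties that were set.
  def configure(env: Map[String, String]): Seq[String] =
    keys.flatMap { key =>
      env.get(envName(key)).map { value =>
        val prop = s"twitter4j.oauth.$key"
        System.setProperty(prop, value)
        prop
      }.toList
    }
}
```

Calling `TwitterCredentials.configure(sys.env)` once at startup, before the stream is created, would be enough in this sketch.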

With that done, we can start listening for tweets:

Dead simple: after having created a model for a company/stock, we used the configured keywords as tags for the Twitter stream.

Behind the scenes, Twitter4J will access the stream endpoint with these tags, and Twitter will pre-filter all tweets based on those keywords (in hashtags, usernames, text, ...).

At this stage we've only added the stream, but we haven't said anything yet about how to consume it... So we can simply call print() on the stream to ask Spark to print the first 10 incoming events of each RDD.

However, that's not how we want to deal with those tweets, right? Let's adapt our stream to add relevant information for our use case.

Sentiment analyzer

Now that we have a stream of tweets, it's time to read them and give each one a sentiment score based on a sentiments file. The code that reads the file and assigns a score to a tweet is not that important here, but the code is here.

However, what we would like to do is adapt our stream so that this score is attached to each tweet.

As said earlier, Spark is a bunch of methods on sequences of data, and we can now check that it's true.

Look at what was done in the map: we picked each tweet, fetched the list of sentiment entries that match its text, and kept the whole original status for further computation. This results in a DStream[(List[Sentiment], Status)].

Since we only want those tweets that have at least one Sentiment that matches, we then filtered on the length of this matching list.

We combined both actions to get a filtered and sentiment-analyzed stream of tweets.
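The same map-then-filter shape can be tried on a plain Scala `Seq`, which is a rough sketch of what happens on the DStream. `Sentiment`, `Status`, and the tiny lexicon below are simplified stand-ins invented for this example, not the project's model or Twitter4J's `Status`.

```scala
object SentimentStep {
  // Hypothetical, simplified stand-ins for the real types.
  final case class Sentiment(word: String, score: Int)
  final case class Status(text: String)

  val lexicon: List[Sentiment] =
    List(Sentiment("great", 3), Sentiment("awful", -3))

  // All lexicon entries whose word appears in the text.
  def matching(text: String): List[Sentiment] = {
    val words = text.toLowerCase.split("\\W+").toSet
    lexicon.filter(s => words.contains(s.word))
  }

  // map: attach the matching sentiments; filter: keep tweets with at least one match.
  def analyze(tweets: Seq[Status]): Seq[(List[Sentiment], Status)] =
    tweets
      .map(status => (matching(status.text), status))
      .filter { case (sentiments, _) => sentiments.nonEmpty }
}
```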

Okay great, but what we really want is to have such information grouped by Company/Stock, right? So that we can compute the contribution of the social network to the health of each company.

To do so, we have to group semantically (based on the text's content and the keywords required for each stock).

Note that it's not that straightforward because a tweet could relate to several companies at once!
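Because one tweet can mention several companies, the grouping is a flatMap rather than a map: each tweet is emitted once per stock it matches. The sketch below shows this on plain collections, with a hypothetical `Stock`/`Tweet` model and naive substring matching chosen only for illustration.

```scala
object GroupByCompany {
  // Hypothetical, simplified model: a stock and the keywords that identify it.
  final case class Stock(id: String, keywords: List[String])
  final case class Tweet(text: String, score: Int)

  // Emits one (stockId, tweet) pair per matching stock, then groups by stock id.
  def perStock(stocks: List[Stock], tweets: Seq[Tweet]): Map[String, Seq[Tweet]] =
    tweets
      .flatMap { t =>
        val text = t.text.toLowerCase
        stocks
          .filter(_.keywords.exists(k => text.contains(k.toLowerCase)))
          .map(s => (s.id, t))
      }
      .groupBy(_._1)
      .map { case (id, pairs) => (id, pairs.map(_._2)) }
}
```

A tweet mentioning both Google and Apple ends up in both groups, which is exactly the case the note above warns about.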

That's all! We're now done with the Twitter part: we now get Data instances that gather a Company/Stock, the tweet and its associated sentiment. This opens the door to further aggregating/grouping/reducing.

Interlude

Let's now pause a bit; I encourage you to try your own stuff and play with the tweets.

For that, I also encourage you to fork/clone the original repo, where you can already start only the twitter feed and print everything in the console by running the appropriate command:

$> sbt

sbt> run-main be.bigdata.p2.P2 twitter print GOOG AAPL

This will start Spark and filter the tweets based on the keywords associated with both GOOG and AAPL; those tweets will be analyzed as described in the previous section before being printed to the console.

WARN: don't forget to export the required environment variables

In the next blog posts, I'll continue with an explanation of how to use Akka to construct Spark streams and how to combine several streams into a single one.

This multiplexed stream will then be used to compute aggregated data over a window of time.

Having these data in hand, we'll be able to create a Spray server that exposes them in JSON format, helping us create a client web application that simply consumes the JSON messages.

Function1[-T1, +R] -- What the heck?

2013-07-14T23:58:00.001+02:00

In this post, I'll try to cover an important notion of Scala's type system -- and a Java Terra Incognita.

(generic) Type covariance or contravariance.

Ho yeah, I heard you... yet another post... right?
... right! But it seems that a gentler one is needed, where Category Theory is left aside. Because Category Theory should come in when you want yet more fun, not to get the basic concepts!

That's why we'll see what these constraints are in a more pragmatic way, using (important) examples.
Afterwards, I'll let you read the blogs using Category Theory ... if you like.

I thought that approaching the problem from the Java side might make it easier to explain, hence the following parts:

Problem: Assignments failures using Java Generics

Solution: Covariance Type in Scala

Problem: Useless generics for method parameters in Java

Solution: Contravariance Type in Scala

Problem: Java tells me that Kids aren't Humans

Since Java 5, as Java developers, we were really interested in the new breaking feature that was added: Generic Types.

And it was interesting enough, thanks to the java.util.Collection API but also to the syntactic sugar for (A a : as).

However, we also quickly ran into the problem that the hierarchy of generics doesn't follow the hierarchy of their type parameters (as explained in the Java tutorial).

Let's clarify the situation with an example.

First we define a model with a classical hierarchy:

Okay, with this setup, one might want to create a list of kids and assign it to a list of humans. Why?

because they are...
because they grow!

Recall: in OO a model (should) represents the domain.

Fine! Here are several tries:

See? You just can't in Java! Because an ArrayList<Kid> is not considered to be an instance of ArrayList<Human>. This will impact all your further algorithms, unless you use the trick of making the type anonymous with a wildcard bounded by the Human type (ArrayList<? extends Human>).

At least we spotted the real problem... Let's see the solution that Scala offers.

Solution: Covariance

In Scala, a type can be defined to be covariant in one (or several) of its type parameters (roughly speaking).
We can simply check Scala's List definition:

sealed abstract class List[+A]

Here, we tell the compiler that List is covariant in its type parameter A.

In other words? List varies in the same direction as A: a List[Kid] is a List[Human] as soon as a Kid is a Human!

And life gets simple (respecting the OO concepts):
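A small sketch of what covariance buys us. The `Human`/`Kid`/`Adult` hierarchy is an assumed reconstruction of the model used in this post, and `Box`/`CoBox` are hypothetical containers added only to contrast an invariant declaration with a covariant one.

```scala
object CovarianceDemo {
  // Assumed model: Kid and Adult both extend Human.
  class Human(val name: String)
  class Kid(name: String) extends Human(name)
  class Adult(name: String) extends Human(name)

  val kids: List[Kid] = List(new Kid("Tom"), new Kid("Ana"))

  // Compiles because List is declared as List[+A].
  val humans: List[Human] = kids

  // An invariant container behaves like the Java case:
  class Box[A](val value: A)
  // val broken: Box[Human] = new Box[Kid](new Kid("Tom")) // does not compile

  // Adding the + makes the same assignment legal.
  class CoBox[+A](val value: A)
  val box: CoBox[Human] = new CoBox[Kid](new Kid("Tom"))
}
```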

Following the hierarchy of the type parameter is pretty cool and useful, but there are cases where it won't help... let's jump to the next section if you don't believe me...

Problem: Java tells me that Kids aren't Human and Integer isn't a Number...

To illustrate this case, we'll create two helper types that define a no-arg procedure and an action (a function) with one argument.

As simple as that.

We also created two action definitions, one that maps a single integer onto a single human, and another one that maps a number onto an adult.

This will help us define a higher-order operation that, for instance, can map a list of integers into a new list of humans. For that, we'll create the ListMapper class (see below) that will map an input type I to an output type O.

Using this ListMapper, we'll try to convert:

a list of integers into a list of humans
a list of numbers into a list of adults

We can easily imagine that the logic should remain in the actions introduced above. Right?

Let's see how it goes in Java, then:

huh?

It seems that the mapper from Integer to Human is only able to deal with an action that takes exactly these types as input and output resp.

But it's weird, because something that works on Number should be able to handle any Integer, shouldn't it?

And, if the returned object is an Adult, it should be ok to assign it to a Human? True?

For sure, but for the same reason as before, Java is not able to deal with such multi-level hierarchies. crap...

Before going ahead, one might have noticed that the two inheritance relations go in opposite directions:

Number is extended by Integer
Adult extends Human

But the mapper should be able to accept both pairs of input/output types! damn... the action type doesn't seem to vary the same way for its input as for its output!

Indeed, an action must be contravariant in its input and covariant in its output.

Here we are.

Solution: Contravariance

In the example above, we introduced actions that acted as functions taking a single argument. And those actions had to behave differently with respect to the type of the input and the type of the output.

Let's check this out in Scala, looking at what such a function looks like:

trait Function1[-T1, +R]

Bloody hell, a function type in Scala varies on its argument in the opposite way to covariant types. That's why a minus sign is used on the T1 type.

Simply said: a function F can be replaced by any function that respects either or both of these cases (the other side staying the same):

its argument type is a super-type of F's argument type.
its result type (output type) is a sub-type of F's result type.

Let's see everything in action now:

That was for illustration purposes, because the types weren't optimized for maximal genericity; however, we can see that we were able to use a function that takes an AnyVal argument and results in an Adult object where a function from an Int to a Human was expected!
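As a compact illustration of that last sentence, here is a sketch in which an `AnyVal => Adult` function is used wherever an `Int => Human` is expected. The `Human`/`Adult` classes and the `mapAll` helper are assumptions made for this example, standing in for the model and the ListMapper discussed above.

```scala
object ContravarianceDemo {
  // Assumed model: Adult extends Human.
  class Human(val name: String)
  class Adult(name: String) extends Human(name)

  // Accepts any AnyVal and returns the most specific type.
  val general: AnyVal => Adult = v => new Adult("adult-" + v)

  // Accepted because of Function1[-T1, +R]:
  // Int is a sub-type of AnyVal (input) and Adult a sub-type of Human (output).
  val specific: Int => Human = general

  // A higher-order operation asking for Int => Human takes `general` as is.
  def mapAll(xs: List[Int])(f: Int => Human): List[Human] = xs.map(f)
}
```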

Conclusion

A lot for nothing? Maybe?

But it's a question that is very common for Scala newcomers. And I think it's one of the most important ones, thus it must be answered in the most precise and accessible way.

Furthermore, it also demonstrates that Scala is even more Object Oriented than Java can be.

Please let me know if it can be made clearer; this blog is meant to be organic, so any comments, concerns or help are more than welcome.

MongoDB and Play2 at ease: using Play-Salat plugin and Embed Mongo plugin

2012-08-13T23:05:00.000+02:00

A while ago I blogged about MongoDB with Play2 using Salat, here.

That post described how to integrate Salat easily with Play2 and gave some advice on points to take care of.

Preface

Play 2 has gained in popularity and an amazing plugin has emerged for this purpose: play2-salat.

This plugin offers a lot of configuration options to hit running instances, including replica sets! It integrates very well by applying the advice I talked about in my post, and more. It defines binders that let us use Casbah stuff in our routing and action definitions (ObjectId, and so on).

This post is not dedicated to explaining how to use it; I'd recommend browsing the project page (play2-salat), plus the wiki that points to relevant URLs.

Goal

This post is dedicated to developer teams that follow (or not...) the conventions of Continuous Delivery, especially the Single Command Environment pattern. That is, the environment must be set up with one single command... in Play2 => play run OR sbt run

Context

Create an application that uses MongoDB as (one of) its persistence backend service(s), and use play2-salat to get access to the `ORM` for our objects and to easy collection connections.

When running in production, of course, a MongoDB instance (or a replica set) runs somewhere that can be configured.

But in Dev?

Embed Instance

When another developer clones the related repo, knowing that it's a Play application, his first instinct will be to enter the directory and launch the application. > BANG <

No running instance...

So I created a Play2 plugin that uses this amazing work, which retrieves a MongoDB installer, installs it and enables us to launch/stop it... Keep in mind that MongoDB is not JVM based!

Adding this plugin to the application, and setting up the dev configuration to start an embedded MongoDB with Play2-Salat targeting it, will satisfy our developer... Even more so if he is a Designer (the only kind of guy that adds value to any app ^^) who doesn't care about MongoDB at all!

How To

Add the plugin dependencies (used in PlayProject):

//MY OWN REPO where is deployed the following plugin

val skalaCloudbeesSnapshots = "Ska La SNAPSHOTS" at "https://repository-andy-petrella.forge.cloudbees.com/snapshot/"

//THE NEW PLUGIN => EMBED

lazy val embedMongoPlayPlugin = "be.nextlab" %% "play-plugins-embed-mongodb" % "0.0.1-SNAPSHOT" changing()

//THE WORTH ONE

lazy val salatPlayPlugin = "se.radley" %% "play-plugins-salat" % "1.0.8"

//DECLARE the deps

val appDependencies = Seq(

embedMongoPlayPlugin,

salatPlayPlugin

)

A bit of configuration (application-dev.conf)

embed.mongodb.start=true

embed.mongodb.version=V2_1_1

embed.mongodb.port=27017

mongodb.default.db=meinGot

mongodb.default.host=localhost

mongodb.default.port=27017

And the only thing that requires a bit of explanation (in conf/play.plugins)

600:se.radley.plugin.salat.SalatPlugin

550:be.nextlab.play.mongodb.EmbedMongoDBPlugin

See? Yes, the Play2-Salat plugin MUST be started AFTER the embed plugin... of course (what an explanation huh).

Code

The one-single-file-of-33-lines plugin can be forked here.

That's All, Folks!

Gatling-Tool in SBT or Play : Sample Projects

2012-07-20T22:39:00.003+02:00

Content

This post is a direct follow-up to this one, where I briefly introduced what I did in order to integrate Gatling in SBT and in Play2.
Where that post was more about the bits and bytes necessary to accomplish the task, this one will talk about how to use this mess.

Gatling for SBT

Now that we have a dedicated SBT plugin in hand, we can create a sample project that uses it (I already created one here).
In this new project, we'll need to create a file plugins.sbt which will contain the reference to the gatling-sbt-plugin.
Actually, it's the classical way to add a plugin to an SBT project (and the easiest and the "semanticest").
We're now prepared to configure our build by using the pieces provided by the plugin.
At the end, we'll be able to write a first test and launch it.

Project Build

First of all, we must create a directory for our project with this basic structure:

Looking at the structure, we can see a Build.scala that will define our project build, and a build.properties that defines the SBT version with a single line : sbt.version=0.11.3
And the last file in the project folder is plugins.sbt which... declares the plugin.
Let's put the Build.scala aside for now, and have a look at the content of the latter.
Self descriptive, isn't it? Yes, we've just told SBT that it has to use our gatling plugin... that's all? Not yet.
Actually this line will provide us with everything that has been declared in the plugin, such as the Gatling configuration keys, the command, and the basic SBT settings.
Now, we're gonna use them within our Build.scala.

Before going into specific details, note that we had to add my own repo (hosted at Cloudbees) in order to fetch the project... but you could also use the URI fetch provided by SBT...
What is really important to point out is the allSettings declaration and the import of the GatlingPlugin.
At first, we reuse the default settings through Project.defaultSettings, to which the gatling settings are appended using Gatling.gatlingSettings... that object comes from our plugin!
This will add all relevant keys for the "gatling-test" Configuration with default values.
At this stage we've almost finished our build definition... All that remains (and shouldn't... I still have to figure that out, any help is welcome) is to add two things:

Declare the test framework to have access to the gatling classes and the custom Simulation :
```
"be.nextlab" %% "gatling-sbt-test-framework" % "0.0.1-SNAPSHOT" % "gatling-test"
```
Declare the command :
```
commands ++= Seq(gatlingTakatak)
```

After having used everything in your project (don't forget to add the GatlingTest configuration...) you're so close to writing your first test.

Gatling Conf

This section will cover the other folder, src, wherein reside the classical main and test folders.
But there is another one, gatling-test, that is unknown at this stage.
In fact, this folder will be the root for our Gatling tests, holding the configuration file (I provided a basic one in the sample app) and the future simulations that you'll write.

The former is dropped in conf/gatling.conf, whereas the latter simply go in simulations.

Write your first test

At this stage, I'll make an assumption that you either know the Gatling api... Nah, the Gatling DSL is sufficiently self descriptive (otherwise check the wiki).

As you can see, it's very very easy (that's why I like Gatling-Tool).
All we had to do, since we're benchmarking google, is to implement the apply method of a simple scenario within a class (which must have an empty constructor... a limitation for now) that must extend our custom Simulation (yeah! I know ... I'll choose another name for it, that's why GSimulation is needed).
Note the dummy implementations of the interceptors, which aren't needed here.
This Simulation must be located somewhere under the simulations folder.

Now, we can launch it and see where the results are stored.

That's enough boy...
Let it shoot the server and generate the result.
...
...
Done? ok.

Now that all bullets have been shot, you can go to this folder: gatling-project/target/gatling-test/result.
See? True! Every run will create a specific folder under this one, named run-${scenario-name}-time.
Jump into it, locate the index.html file, open it in your browser and see the magic happen.
NB: check this page for information about the reports.
So far, so good. But we're writing web apps and we want to stress them... we all know that the google server won't crash (that easily).
Let's move to Play2 (or you can write your own app w/o Play2 using Spray or another Java framework maybe, then mimic the following).

Gatling for Play2

For this part, I also wrote an application that makes use of the "plugin" introduced in the previous post.
It's quite complicated, not yet polished and might be a good topic for a future post, because I tried to make an application based on Event Sourcing with some workflows and usages of pure functional structure like Lens, Writer, State, Monad and so forth.

But let's stick with the actual topic: if you fork the project and browse its build configuration a bit, you'll see that it is close to the one we just talked about. Still, some particularities are present...

Definition

First of all, since Play2 has its own convention for sources folders (app, test under the root) we'll redefine those folders for gatling as well.
In order to do that, we must adapt the following configuration keys:

Actually, yeah, we've just skipped the "src" folder. Anyway.
So far, so good...
I heard you...
But, do you know that we're already done?
Allez, let's write a test.

Write

If you started from scratch (without forking the sample app) you can now stress test your index page that will render the default welcome page defined in Play2.
See the below Gist for that, or its follower for a more complicated one using the app.

Look what I've done Dad!
In the Simulations, all we had to do is to simply declare an available port on which the framework can start the server, and the FakeApplication it needs to have access to the routing.
Then when we had to build a request path, we just had to do the same as in our templates, that is, use the routes object under the controller package. Amazing.
Furthermore, we can also reuse our type-checked Form to build our feeder. Waouw.
But, at this stage, there is still some boilerplate (parameters). Loser.

Launch

And now, it's time to stress test our server and see how it'd behave in production. (Ok, that's not 100% true, since Gatling is running on the same machine...)

For that, just do the following:

Wait. Again.

Now, go to your result directory (the path hasn't changed). Open it in your browser and let the shine come in.

Wrap Up

At this stage, with few and small injuries, we can use Gatling-Tool without having to install or hack anything in order to get information about our web app.
This works with either a full SBT application or a Play2 based one.
I know that there is still a lot to do, but at least the basic features are there, and I'd enjoy anyone forking it, scrapping it if necessary, and letting me know.

Last words

I hope you liked these posts, otherwise I'd like to apologize because I did all projects and the blogs on the train, half asleep every morning and evening, back and forth from my actual mission's workplace.

Now I'm gonna sleep in this train...
Oh no, I can't...
I've to write my book now.

Gatling made easy for SBT or/and Play2

2012-07-20T22:36:00.000+02:00

Forewords

These last few weeks, I took some time to understand a level deeper how SBT works and what it can provide. Since this post is not related to this learning trip (which was along existing blogs and wikis), I'll jump directly to the idea this new understanding gave me.
A while ago I started (and paused) some work on Gatling-Tool to have it integrated with Play 2.0 (see these posts here and here); this work has been refactored to better integrate with SBT.

What we'll find in this post

In this post, I'm gonna give you all the tools needed to stress test your Web based application, whether built upon Play 2.0 or only using SBT, by directly starting to write your tests rather than having to configure stuff or start other things... So this post is composed of two parts:

How I built these tools (can be skipped -- definitely)
How to use these tools (it's probably the useful part)

Gatling shot 3 side projects... for good

This part is related to what was necessary to have gatling-tool easily integrated with SBT, and after with Play2.

Gatling SBT Test Framework

This project (on github) has a sole and simple goal: to implement the Test Interface (see) used by SBT to integrate new test frameworks (as was done for ScalaTest, Specs, JUnit, and...). This project contains only 2 classes and one convenient trait.

GatlingBootstrap

This class is one of the helpers that avoid boilerplate later on, because it starts the GatlingConfiguration stuff needed by Gatling-Tool to execute. This class requires two things: the path to the gatling configuration file and the path to the folder where the stress results will be stored.

Simulation

This trait is another tool in hand that extends the com.excilys.ebi.gatling.core.Predef.Simulation provided by Gatling in order to add two interceptors to it, which are pre and post. Those interceptors will be really useful in the future, in order to start/stop a server for instance (e.g. FakeServer in Play2).

GatlingTestFramework

This class is the real processor of the project. Actually, it's also the implementation of the interfaces declared by the SBT test framework, which are Framework, Runner2 and some Fingerprints. Basically, these are the tasks performed there:

Create the gatling bootstrap based on sys properties
Declare fingerprint to discover Stress Test based on the parent class being be.nextlab.gatling.sbt.plugin.Simulation
Create the stress tests instances (only classes are handled for now) by reflection
Call the pre interceptor
Execute the stress test using the gatling api
Call the post interceptor
Generate the reports using the gatling api
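The pre / execute / post part of that sequence can be sketched in a few lines. This is a simplified illustration with a hypothetical `Simulation` trait and `execute` function; it leaves out the SBT interfaces, the reflection step and the Gatling calls, and it chooses to run `post` even when the run fails, which the real framework may or may not do.

```scala
object LifecycleSketch {
  // Hypothetical, simplified version of a simulation with two interceptors.
  trait Simulation {
    def pre(): Unit = ()   // e.g. start a server
    def post(): Unit = ()  // e.g. stop it
    def run(): String      // stands in for executing the stress test
  }

  // Calls pre, runs the simulation, then calls post (even if run throws).
  def execute(sim: Simulation): String = {
    sim.pre()
    try sim.run()
    finally sim.post()
  }
}
```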

Gatling SBT Plugin

This project (here on github) aims to provide the basics to use the test framework easily within an SBT project.
Its intent is to have testers able to write stress tests directly, just after having imported the module in SBT.

It basically provides an SBT Configuration (gatling-test) that extends the SBT's Test one and one Command (takatak). The other goodies are settings which are common for all such stress tested projects using gatling.
Those settings are essentially the conf file path, the result directories, the libraries (gatling ones principally) and so on. The last important thing that it does is to add the gatling test framework to the already existing list of test frameworks supported out of the box by SBT.

Gatling Play2 Plugin

This misnamed project (no longer a Play2 plugin, nor an SBT one... but it's ok for now) brings only one little thing. It comes with a base implementation of the custom Simulation declared in the Gatling Test Framework.

This base implementation is Play2 specific: it creates a fake server and starts it in the pre interceptor, stopping it in the post one.

This will ease further tests in the Play2 environment.

Save Point

So far so good, if you read this you can now navigate the small projects that were necessary to do what we'll discuss in the next blog...

This one is already too long...

Type-safe Composable Interceptors in Play2.0

2012-06-16T18:38:00.000+02:00

This post is just a quick follow-up to this post, which introduced my latest utility for Play 2.0.
I recommend having a quick look at it.

Type-safe composition of interceptors : Premises

Briefly, we'll just see how the first "future work" has been addressed. That is, avoiding boilerplate when composing interceptors.

A quick recall: when we had to compose such Interceptors, we had to take care ourselves of the validation results and of the combined result (tuple or whatever).
The real problem lurking below the surface here is that tuples are not so easily generalizable (no append method, roughly).

So I decided to use the Shapeless library (thx to @milessabin).

Shapeless has an amazing core structure that enables type-and-value chaining (somehow). The HList type is a kind of list in which each element carries both a value and its own type. For instance, the top of the stack has the head value and the Head type. Here is the kind of stuff that we can do with Shapeless:

val | = false
val thisIsNotAPipe = "this" :: 15 :: false :: "a" :: | :: HNil

> thisIsNotAPipe: shapeless.::[java.lang.String,shapeless.::[Int,shapeless.::[Boolean,shapeless.::[java.lang.String,shapeless.::[Boolean,shapeless.HNil]]]]] = this :: 15 :: false :: a :: false :: HNil

>Type-safe list of stuffs<
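To get a feel for how such a structure keeps one type per element, here is a toy re-implementation in plain Scala. It is a deliberately minimal sketch with made-up names (`ToyHList`, `HCons`), not Shapeless' actual code, and it has none of Shapeless' operations such as prepending two lists.

```scala
object ToyHList {
  sealed trait HList

  // A cell remembers the type of its head and the full type of its tail.
  final case class HCons[H, T <: HList](head: H, tail: T) extends HList {
    def ::[A](a: A): HCons[A, HCons[H, T]] = HCons(a, this)
  }

  sealed trait HNil extends HList {
    def ::[A](a: A): HCons[A, HNil] = HCons(a, this)
  }
  case object HNil extends HNil

  // Static type: HCons[String, HCons[Int, HCons[Boolean, HNil]]]
  val example = "this" :: 15 :: false :: HNil

  // No casts needed: each position keeps its own type.
  val first: String = example.head
  val second: Int = example.tail.head
  val third: Boolean = example.tail.tail.head
}
```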

In Action...

While trying to use HList generically I had problems with the implicits that are needed to prepend two lists, but StackOverflow brought me the answer here.
I'm not gonna explain here how I did it, but I'll demonstrate what is now possible with the new composition functionalities added to Interceptors.

First of all, I had to declare an Interceptor for HList and the related implicit conversion. After that, I added three methods:

hlist this method on Intercept is able to convert a classical one to an HList one
~::~ this one is available to compose any interceptor that is not defined using HList with an HList one. It will create a new Interceptor with the new composed HList as result
~:::~ this one enables composing two Interceptors defined with HList; the result will be the concatenation of the two HLists.

Note: concatenation of HLists preserves the type sequence; actually we can see it as concatenating two lists of values and two lists of types.

Let's see how we can deal with them:

Easy no? Combine interceptors and use compile-time type-checking to validate the required kinds of items.

Without boilerplate

An Attempt of Play2.0 Action Interceptor

2012-06-15T23:10:00.001+02:00

In this post, I'm gonna introduce a piece of code that I built with the help of two things:
the Security object provided by Play 2.0 out of the box and, somehow, the Secured one provided by the TypeSafe's plugin util.

The idea here is to extend the existing functionality, which only allows one Option[String] as result.
What I wanted at first is to satisfy my use case, which is : a lot of actions are secured and I need a username and its id. Where id is an Int.

Here we are, the actual Security and Secured don't permit me to have several extracted values, or to have an Int.

That's why I created this project : https://github.com/andypetrella/steal-play2.

Steal help

The idea is to enable any action to be preceded by some interceptors that steal values either from the request, the cache, the database (or...) and set them as the parameters of a closure that outputs an Action.

But we must also take care of the cases where something went wrong during the stealing.

Interceptor

This trait is the core of the solution, it defines:

the stealing operation: a function that takes (currently) the request and outputs a validated output (Validation from ScalaZ)
the err callback: a function that takes the request and the failure (raised when computing the stolen value) and outputs a valid Result to send back to the client.
apply: a closure that takes the result (not a Validation) and outputs an Action.

At this stage and looking at the code below (I've omitted the apply impl because not important here -- just wrapping and unwrapping), it's fairly simple to know how to define such interceptor.

I provided the simplest implementation of this trait that is a case class extending it by defining the two callbacks as fields. Like so:

So now we're armed to do stuff like that:

Where we defined two interceptors that take a String from the cache and an Int from the cache as well (just an illustration). After that, we combined them into another Intercept using a for-comprehension.

So far, so good, now how to use such combined interceptor within an Action. Check this out:
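The following self-contained sketch mimics the whole flow without Play or ScalaZ. Everything in it is an assumption made for illustration: `Request` and `Result` are hypothetical stand-ins for Play's types, `Either` (right-biased, Scala 2.12+) replaces Validation, and this `Intercept` is a simplified model, not the steal-play2 code.

```scala
object StealSketch {
  // Hypothetical stand-ins for Play's request and result types.
  final case class Request(cache: Map[String, String])
  final case class Result(status: Int, body: String)

  // Left is the failure, Right the stolen value.
  final case class Intercept[A](
      steal: Request => Either[String, A],
      err: (Request, String) => Result
  ) {
    def map[B](f: A => B): Intercept[B] =
      Intercept[B]((r: Request) => steal(r).map(f), err)

    // Simplification: the combined interceptor keeps the err callback of the first one.
    def flatMap[B](f: A => Intercept[B]): Intercept[B] =
      Intercept[B]((r: Request) => steal(r).flatMap(a => f(a).steal(r)), err)

    // Turns a closure over the stolen value into an "action".
    def apply(action: A => Request => Result): Request => Result =
      (r: Request) => steal(r).fold(e => err(r, e), a => action(a)(r))
  }

  private val unauthorized: (Request, String) => Result =
    (_, e) => Result(401, e)

  val username: Intercept[String] =
    Intercept[String](r => r.cache.get("username").toRight("no username"), unauthorized)

  val userId: Intercept[Int] =
    Intercept[Int](
      r => r.cache.get("id").flatMap(s => scala.util.Try(s.toInt).toOption).toRight("no id"),
      unauthorized
    )

  // Combination through a for-comprehension, yielding a tuple.
  val user: Intercept[(String, Int)] =
    for { name <- username; id <- userId } yield (name, id)

  // The secured "action" receives both stolen values, already typed.
  val hello: Request => Result =
    user { case (name, id) => (_: Request) => Result(200, s"Hello $name (#$id)") }
}
```

When a value is missing, the err callback answers instead of the action, so the action body never has to deal with the failure case.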

Monoid

With the help of Monoid (from ScalaZ), and if the case permits it, I defined an implementation of Interceptor that appends the successive results one after another, reducing and simplifying the composition. Like so:

Note: we used the classical case class. Thanks to an implicit def that wraps the Intercept into the Monoid implementation, using the TypeClass bound declaration. see

What will come next

The next step, which I've already started, is to try using Shapeless to avoid the composition boilerplate. Things are ongoing. Stay tuned for that.

One step further, I'll add another parameter to the steal callback, which will be the optional result of the previous computations of other interceptors. That in order to combine them at the function and result levels.

And probably add all the boilerplate myself that creates the tuples out-of-the-box in the compose function.

That's all folks!

How Monad Transformer saved my time

2012-06-11T11:45:00.000+02:00

Context

These days (this week-end) I wanted to put some work into a Neo4J Rest driver that I'm writing for Play 2.0 in Scala.
The only thing I wanted, actually, was to make the current embryo more functional. I mean, I was throwing exceptions (frightening... huh!).
Since this driver is meant to be fully asynchronous (man... it's http, it MUST be) and non-blocking (Play's philosophy), I was heavily using the Promise thing through the WS api of Play.
This is the kind of thing I've got (briefly):

def getNode(id:Int) : Promise[Neo4JElement]

Where Neo4JElement stands for the wrapper of all Neo4J Rest response (Json). Hence, it can be either a Failure response (Json with stacktrace and message), it can be a Node, or .... throw an Exception (br...) when the service crashed (f.i.).

Hmm, not so intuitive and goes against the functional paradigm that orders: "you can't ever introduce side-effects, boy!". An exception that blows in my face, is one side effect (head-aches, ...).

Diego Validation to the rescue

Validation is a very simple thing: it holds either a value or an error...
Ok, why not just Either then? Actually, you're right, but the Validation that I took from the ScalaZ library contains a lot of things that are very helpful for the purpose of validation. But if it worries you, just replace Validation by Either in your mind from here on.
Now, here is the getNode signature:

def getNode(id:Int) : Promise[Validation[Aoutch, Node]]

Isn't it more intuitive? For sure, we get our relevant type back in the signature : Node. Great!

So far, so good. Now what the heck is Aoutch : something that hurts... and what could hurt a runtime execution... exceptions, yeah! Thus, Aoutch is just a shortcut for Either[NonEmptyList[String], Failure]. We can see that a single type represents both an unexpected exception and a failure (missing node f.i.).

Monad

If you don't know what a Monad is, from here on think of a structure that can be used in a for-comprehension (in Scala it must implement flatMap and map).

Promise and Validation are Monads, and they're stacked one on top of the other. But what really interests us is the leaf of the chain: Node. That introduces some boilerplate code when you try to sequence actions like this:
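The original listing is not reproduced here, so here is a minimal sketch of that kind of boilerplate. It assumes the standard library's Future in place of Play's Promise and Either[String, A] in place of Validation[Aoutch, A]; getNode, twoHops and the toy ids are hypothetical and are not the driver's code.

```scala
import scala.concurrent.{ExecutionContext, Future}

object NestedSketch {
  final case class Node(id: Int)

  // Stand-in for Validation[Aoutch, A]: errors are plain strings here.
  type Result[A] = Either[String, A]

  // Hypothetical lookup: only ids 1 to 3 exist in this toy "database".
  def getNode(id: Int)(implicit ec: ExecutionContext): Future[Result[Node]] =
    Future(if (id >= 1 && id <= 3) Right(Node(id)) else Left(s"node $id not found"))

  // Follow two links: every step must unwrap the Either before going on,
  // and every failure branch must be re-wrapped by hand.
  def twoHops(start: Int)(implicit ec: ExecutionContext): Future[Result[Node]] =
    getNode(start).flatMap {
      case Left(error) => Future.successful(Left(error))
      case Right(first) =>
        getNode(first.id + 1).flatMap {
          case Left(error)   => Future.successful(Left(error))
          case Right(second) => getNode(second.id + 1)
        }
    }
}
```

Each extra hop adds one more level of pattern matching, which is exactly the noise we would like to get rid of.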

See... yes, we have to peel off the first level (validation) to be able to work with the meaningful objects.
But still, we can extract some pattern here... no?

Monad Transformer

The pattern that we can extract is a kind of two-level composition. I did this composition myself, trying to figure out what would be possible.
It was successful, but I had to introduce a new type and a new method that was like a flatMap.
So I asked on StackOverflow ^^ (and it was my first question, yippee). You can find it here. I won't explain how I did it, because the question explains it. But the real question was: is there a well-known functional construction for this problem?
Thanks to @purefn, I learned that there is!

It was time to use a Monad Transformer.

Monad Transformer

Briefly (and very roughly), a Monad Transformer is a construction that is able to transform an M[N[_]] into a P[_], where M, N and P are Monads.
I won't explain here how it does that, because it would take long, but here is a good link (you have to understand Haskell a bit, sorry).
With the help of such a transformer, you get back the opportunity to use a for-comprehension... with the innermost value of the twice-wrapped type as the bound value.
Here is the transformer for our Promise[Validation[E, A]]:

And how we can now link nodes:
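The original gists are not reproduced here. As a rough sketch of the idea, assuming the standard library's Future and Either instead of Promise and Validation, a hand-rolled transformer and its use could look like the following. FutureEitherT, getNode and linked are hypothetical names; this is neither the author's transformer nor scalaz's ValidationT.

```scala
import scala.concurrent.{ExecutionContext, Future}

object TransformerSketch {
  final case class Node(id: Int)

  // Hypothetical transformer: wraps Future[Either[E, A]] and exposes map and
  // flatMap that work directly on the innermost A.
  final case class FutureEitherT[E, A](run: Future[Either[E, A]]) {
    def map[B](f: A => B)(implicit ec: ExecutionContext): FutureEitherT[E, B] =
      FutureEitherT(run.map[Either[E, B]] {
        case Left(error)  => Left(error)
        case Right(value) => Right(f(value))
      })

    def flatMap[B](f: A => FutureEitherT[E, B])(implicit ec: ExecutionContext): FutureEitherT[E, B] =
      FutureEitherT(run.flatMap[Either[E, B]] {
        case Left(error)  => Future.successful(Left(error)) // short-circuit on failure
        case Right(value) => f(value).run
      })
  }

  // Hypothetical lookup: only ids 1 to 3 exist in this toy "database".
  def getNode(id: Int)(implicit ec: ExecutionContext): FutureEitherT[String, Node] =
    FutureEitherT[String, Node](
      Future(if (id >= 1 && id <= 3) Right(Node(id)) else Left(s"node $id not found")))

  // Three chained asynchronous calls; the bound values are plain Nodes.
  def linked(start: Int)(implicit ec: ExecutionContext): FutureEitherT[String, Node] =
    for {
      first  <- getNode(start)
      second <- getNode(first.id + 1)
      third  <- getNode(second.id + 1)
    } yield third
}
```

Calling linked(1).run gives back the underlying Future[Either[String, Node]]; the first Left met along the way stops the chain.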

Awesome, no? We get 3 async and non-blocking calls, fully type-checked and resistant to exceptions and failures... in 5 lines.

Future work

On the SO question, @purefn told me that scalaz 7 (snapshot) fully defines this kind of Validation Transformer.
Why didn't I use it yet?

I try not to use snapshots (not a good reason, but still)
In order to use ValidationT, I'll first have to create an instance of the Monad type class, because the flatMap signature needs it in context.

Conclusion

I love functional (even if I'm still learning -- back -- the basics ^^)

Gatling and Play2.0: continued

2012-04-22T17:03:00.000+02:00

This blog entry is a follow-up to this entry, where I introduced a spike I did on Play 2.0, stress tested using Gatling-tool.

At the time of writing the above entry, I had to quickly hack Gatling to use Akka 2.0, since Play 2.0 uses it and I didn't want any clashes.

But, thanks to Stéphane Landelle, Gatling is now Akka 2.0 enabled (since 1.1.2).

So it was time to give the plugin's embryo a refresh. For that I used another project that aims at testing the Neo4j REST plugin for Play 2.0 that I'm also writing. Just in case, I've also introduced what I did in this project in this post.

Content

Using Gatling, I first tested how well I could stress test:

simple (GET) urls
mutating (POST) urls, using the server-side Form under the hood to build the expected data
duration tests (stress testing over a given period)

What I like so much, even in this embryonic state, is how easy it is to create stress tests when Play 2.0 functionalities are coupled with Gatling's DSL.

Foreword

The Gatling plugin I'm currently building is located on github here and is built with sbt.

But it's based on version 1.1.4-SNAPSHOT of Gatling's libraries (due to some fixes that Stéphane did "for me" while he was at Devoxx FR, isn't he kind!!!).

At the time of writing, you'll have to build gatling and gatling-highcharts locally using maven (it's quick!).

How to

Set up

Having created a Play 2.0 app, you now have the power of sbt in your hands (especially if you've installed the TypeSafe stack 2.0). So, to stress test your app, you'll first have to build the plugin I mentioned above.

Plugin

First of all, clone/fork this library project on github; after that, you'll just have to run sbt publish-local in the related folder. That's it, you now have the plugin in your local repo.

Project

In your Play 2.0 app, you are now able to define the plugin as a `test` dependency, using the following:

That's it...

Use

Specs

Personally, for testing I use org.specs2.Specification; to use it with the plugin (at the time of writing) you'll have to create the following:

This listing creates a fake server so that urls can be tested, and some functions to deal with it.

To create the server, the plugin defines a Util object (be.nextlab.play.gatling.Util), which also defines rough helpers to be used in stress tests.

A full spec should look like the following:

Simple Url

You saw, in the gist provided above, that Util also offers a simple way to define a Gatling Simulation (basically a Function0 that returns a scenario builder, the result of the Gatling DSL).

Having that in hand, here is the fragment to stress test the root url:

As you can see, it's pretty simple, but nothing can really be checked in the body of the specs (I'm working on having relevant information to check).

But at least you can run this test to hit the root url 10 times, ramping the number of users by 2.

Running it (sbt test), you'll get a new gatling folder in your target folder, containing a results directory where all stress results are located, as an html report (with great charts).

And all you had to do is to define the request headers and the url...

Mutating the server

If you have controllers that mutate the server, you should have defined POST urls, which use the Form feature provided by Play 2.0.

Having done so, you'll be able to stress them very easily using a Map, a JsObject or, in the best case, your Model.

Let's say we have a controller controllers.Stuffs that uses case classes models.Stuff. The controller defines a stuffForm and a createStuff action.

Your stress test can now be defined like the following:

In the gist, you can see 5 points to note; they are key clues for creating reusable stress tests.

Nothing is really hardcoded, neither the path to the http end point nor the parameters data.

That's http stress testing using type-checked requests. CoOl, isn't it? Hey, man, we got back our lovin' type checking (one of the best Scala features).

Heavy check, duration based

This part is more about highlighting a Gatling feature.

This last example is a heavy test that loops over a configuration for a given period. This tells you how many users could use your application.

Such a test might be shaped like the following:

Note

For now, it's NOT an official plugin, neither for Gatling nor for Play 2.0, but discussions are under way... stay tuned on twitter or here.

Still Playing... but new players are in

2012-04-18T00:11:00.000+02:00

Why?

Because I love to play with Play 2.0 and Scala (still learning).
But also because I'm currently investigating technologies that I might choose for a new product line we are currently building at my company, NextLab (in Belgium).

With ?

This time, I've played more with client-side libraries and frameworks (no, I won't post yet another entry on jQuery...), but I also tried out how easy it is to create totally async (and so parallelized) code using server-side ones.
The technologies that will be quickly introduced in this post are listed hereafter, and everything has been packaged in a github repo, with a running instance on Heroku.

At first the current spike was dedicated to the slowly growing Neo4J REST plugin for Play 2.0 that we are creating at NextLab. But, in order to demonstrate what could be done by coupling these two technologies, I've extended the spike's scope to something more fun.

So let's introduce the technologies:

Client Side

Twitter Bootstrap

An amazing toolbelt that helps build responsive websites without having to bother with HTML and CSS boilerplate.

Besides being neat, complete and well thought out, Bootstrap comes with another handy factor... it has been built using LESS. And by chance (I know, chance is not part of the equation) LESS is supported by Play 2.0 by default.

Just a note: LESS will let you reuse the color codes, mixins, etc. that Bootstrap has already defined.

Spine.js

As we'll see below, we'll have to talk to web services (JSON), on top of which a REST interface has been added for meaningful resources.

That's where Spine.js comes into play. This lightweight MVC library brought me the small tools that I needed for fetching and saving resources without having to write a single request by hand...

d3.js

Probably my favorite (that might be my mathematician side talking), this powerful Data-Driven Documents toolkit has tackled the problem from the right end.

Its functional approach of decoupling data from the document, and linking them using layouts, helps you concentrate on each part of the data usage independently:

the mapping from incoming/raw data to representative data
the mapping from represented data to the document (most of the time, one datum for one element)

Communication

The communication layer is of course HTTP, with a little help from the emerging HTML5 features. One in particular: Server-Sent Events (here is a great intro).

This stable HTML5 feature comes with the handy ability to let the server send events to connected clients without hacks such as Comet or polling.

Open a connection, push data, and ONLY when the server needs to.

Server Side

Play 2.0 (Scala)

Of course... But I used some "advanced" features like:

Async

Async is a Play 2.0 feature that lets the server deal with the tasks it has to schedule.

That is, when you think that the server might have to wait for actions to be executed before being able to respond to a request, Play 2.0 lets you, really simply, create an Async (non-blocking) request.

Very handy when you have to call third-party services, for instance...

Iteratee

The only way I would consider from now on to consume data. Iteratee is a fairly difficult thing to understand (read this wiki), but it gives you the same smart decomposition as in d3.js, that is, decoupling the management of a source, its useful representation and its computed result.

Akka

A powerful, actor-based library for parallel and asynchronous tasks and scheduling.

Actually, I needed a request to launch async tasks (you know, like event generation and dispatching)... so what else!

Neo4J

A database for storing graphs... let's use a graph database.

Within NextLab, we started an open-source Play 2.0 plugin for calling the REST API provided by a Neo4J server (helpful on Heroku). It's still emerging and continues to evolve a lot, because features are implemented on the fly (as needed), and a second pass is planned to add a meaningful DSL (like FaKod did).

Why did we choose to implement a distinct plugin instead of wrapping the Java library?

it's yet another library, which brings me too much (I need REST only)
I want requests to be async and under control

Libraries Repo

As you'll see below, we'll use Heroku. So in order to deploy this application, which uses our plugin, I needed to publish the plugin somewhere.

This is what Cloudbees can offer. Among other great things like git repos, CI and so on, Cloudbees provides you with four free maven repositories that you can make public if you wish.

So I used sbt to publish to my "snapshot" repo on Cloudbees, letting Heroku access it to download the plugin dependency.

Platform

Free, reliable, easy to use with Scala and Play 2.0... what else than Heroku?

So what

I think that this post is already too long... However, I can let you play with the resulting app here.

Check out the code (and fork it) there.

Depending on the comments, I will expand this post into other ones to respond to potential requests (if there are any ^^).

Good play.

Oh yes, one last note: the purpose of the application is to create Stuff, which contains dummies. Stuffs are created using a simple form that must be filled in. Stuffs can be linked to one another by clicking on the graph.

Please create Stuffs and links, it will be a good test for Neo4J, the plugin and Play 2.0.

Hope you've reached this point.

SMAK. hehe

TypeSafe Stack 2.0 missing "play debug" like feature

2012-03-22T22:28:00.000+01:00

A quick one to help players that are using the TypeSafe stack instead of the Play! 2.0 distribution package.
I discussed some of these points on the groups and saw related StackOverflow entries, which this post might prevent in the future ^^.

TypeSafe Stack

With its second version, the TypeSafe stack scored a hit by integrating Akka 2.0, Play 2.0 and... its amazing console built on top of both technologies.

With the Scala IDE 2.0 (yeah, a lot of 2.0), this stack is ready to take on the SpringSource Tool Suite, but I don't want to make the comparison here, nor explain all of these components... it would get longer and longer.

But once you've installed the stack and you want to Play! around, they recommend that you use the giter8 template from the typesafehub on github (which also contains a lot of plugins that you might want).

SBT instead of Play launcher

Using the stack, you won't have the play tool at hand to generate applications and so on, because the way to go is to use g8 and sbt.

This is not an issue, but there are some points you'll need to keep in mind:

the secret is not generated at creation: they're not far from it, because the secret is only a random string and an issue on giter8 is in progress. So you can create a random string on your own until that is done. I've also proposed that a new sbt command might be helpful to regenerate the secret.
play debug isn't available: when you need to debug your Play! 2.0 app, you need sbt to activate JDWP when running. For that, there is an SBT_OPTS variable (similar to Maven's MAVEN_OPTS) that comes to your rescue. Set it with the regular options (-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=9999).

HTH

Gatling-Tool Plugin for Play 2.0

2012-03-15T00:01:00.000+01:00

Goal

This post is a kind of write-up of what I'm trying to do right now, and of the tasks I have already achieved.

Since I'll soon start products in my company that will be based on technologies like Scala, Play 2.0, Neo4J, MongoDB, Heroku and so on, and since I'm a bit of a control freak, I wanted to be sure that what I'll build will match the requirements in terms of capacity.

This is where Gatling-Tool enters the equation: a very powerful (and Scala-based) stress testing tool, as the name suggests.

We're about to have a few words on it, but first let me tell you that a GitHub repo, ready to be forked, is available with my first steps towards a Gatling plugin; find it here.

Gatling Tool

Gatling Tool is a cute, smart and intuitive stress testing tool for web applications, with a neat DSL for http requests and asserts.

The DSL is written in Scala and follows its good conventions, with the aim that anybody can write stress tests. Even non-programmers with a basic understanding will be able to write basic stress tests, with a gentle learning curve.

Its integration with browsers (Firefox, through a proxy) eases the work even more, because you'll be able to record your scenario like a macro (or like badboy does) and replay it again and again. This is what is called the recorder.

Scenarios can be written with a custom external DSL, but they're also available as regular Scala code (internal DSL), and that is where the coupling with Play 2.0 Scala should pay off.

Integration with Play2.0

Akka 2.0

Fact: since RC-5, Play 2.0 comes with Akka 2.0 support.

Fact: Gatling is itself based on Akka (and they're right about that), but its stable version relies on a previous version (logical, because Akka 2.0 is pretty new).

So an integration must go through an update of Gatling-Tool to the same Akka 2.0 version, in order to be able to use both correctly within the same project (in the testing phase, but still).

That's why I decided to fork Gatling-Tool on GitHub (aaah, the great world of open source), in order to switch its Akka support from 1.x to 2.0.

Even if it is true that I did it roughly (at first), it works, and the two projects needed for what follows are:

https://github.com/andypetrella/gatling
https://github.com/andypetrella/gatling-highcharts

So fork them, clone them and build them locally with maven, using the classical mvn clean install.

You'll have a brand new gatling 1.1.0-SNAPSHOT version.

NOTE: I had to make some difficult choices when doing the migration; some break the runtime behavior (a bit) and I'll have to discuss them further with the Gatling team. I've already been contacted by Stéphane Landelle, who told me that he was interested in the work, since it was planned for the 1.2 release.

So don't be afraid, the official release will match the needs soon. (But ping me if you want more info or want to help me.)

Typesafe Stack 2.0

I recommend that you install this brand new stack, which integrates everything you'll need for Scala development, including Play 2.0 projects.

Now, simply follow this link for further steps, and then create a play-scala project.

Sbt

Since Play 2.0 uses Sbt to build its projects, and the custom gatling library we built is in our .m2 repo, we first have to add our local maven repo to the repositories list this way (updating your Build.scala):
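The original listing is not reproduced here; what follows is only a sketch of what such a Build.scala could look like for a Play 2.0 project. The application name and the Gatling artifact coordinates (group and artifact ids) are assumptions, so check them against what mvn clean install actually put in your .m2 repo.

```scala
import sbt._
import Keys._
import PlayProject._

object ApplicationBuild extends Build {

  val appName    = "my-play-app"    // hypothetical application name
  val appVersion = "1.0-SNAPSHOT"

  // Assumed coordinates for the locally built Gatling snapshot, test scope only.
  val appDependencies = Seq(
    "com.excilys.ebi.gatling" % "gatling-app" % "1.1.0-SNAPSHOT" % "test"
  )

  val main = PlayProject(appName, appVersion, appDependencies, mainLang = SCALA).settings(
    // Let sbt look into the local maven repository for the custom build.
    resolvers += "Local Maven Repository" at "file://" + Path.userHome.absolutePath + "/.m2/repository"
  )
}
```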

Tada, now we have a Play 2.0 project with our custom gatling as a dependency.

Plugin

Before going into further details about integrating Gatling as a Play 2.0 plugin, I'm gonna talk about a subject that is not covered in Play 2.0 (or not easy to track down): the Plugin feature.

Play 2.0 comes with a pretty simple Plugin integration, through a specific file in the conf folder named play.plugins. This file is meant to contain one single line per defined plugin, shaped this way: {priority}:{plugin's class path}.

But what is a plugin, in the end? It's a classical class extending play.api.Plugin... simply. This Plugin trait only defines three methods, which are:

onStart: this adds a hook when the application starts, helpful for initializing objects.
onStop: cleaning the fields.
enabled: helpful to disable the plugin in some specific cases.

Another point that I have to highlight is that such a Plugin's constructor apparently must take the application itself as an argument.

Test only

Hey wait, gatling should be available in tests only?! Right!

The first way to achieve this is also the easiest: while implementing enabled, you can use the application (remember, it is part of the constructor), which has an isTest method that should toggle the plugin.
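To make the shape of such a plugin concrete without pulling in Play, here is a self-contained sketch. Application and Plugin below are simplified stand-ins written for this example (they only mirror the three methods and the isTest flag described above), and StressTestPlugin and startAll are hypothetical names.

```scala
object PluginSketch {
  // Stand-in for Play's application: only the test flag matters here.
  final case class Application(isTest: Boolean)

  // Stand-in for the Plugin trait, with dummy default implementations.
  trait Plugin {
    def onStart(): Unit = ()
    def onStop(): Unit = ()
    def enabled: Boolean = true
  }

  // Hypothetical plugin: the application is passed to the constructor.
  class StressTestPlugin(app: Application) extends Plugin {
    var started = false
    override def onStart(): Unit = { started = true }   // initialize what the plugin needs
    override def onStop(): Unit = { started = false }   // clean up
    override def enabled: Boolean = app.isTest          // active in test mode only
  }

  // Roughly what a framework does at startup: start the enabled plugins only.
  def startAll(plugins: Seq[Plugin]): Seq[Plugin] = {
    val active = plugins.filter(_.enabled)
    active.foreach(_.onStart())
    active
  }
}
```

With this shape, the same play.plugins entry can stay in place in every mode, since the plugin switches itself off outside tests.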

The second way is to create a Specification that starts a FakeServer (since we'll stress the entire Play 2.0 flow) and give it a FakeApplication defined with the Plugins you wish. 1000 words replaced by one Gist:

Gatling integration

Now you're wondering what the heck that Gatling Plugin class is, aren't you?

Gatling Plugin

The Gatling plugin class, located in my repo under the test/gatling folder, extends the play.Plugin class (which defines dummy implementations of the three methods); this way I can concentrate on the only method I need, onStart.

Actually, I need some initialization in order to use gatling: it needs some folders to be defined, including the one that interests us the most: results.

So, the Gatling#onStart method creates ephemeral folders under the target directory (which can be cleaned) and also the needed Gatling configuration file. And that's it.

We can now stress test our app.

How?

Gatling is able to understand scenarios written in Scala; those Scala scripts have only one constraint: being an instance of com.excilys.ebi.gatling.core.scenario.configuration.Simulation.

This trait is actually a Function0 and thus defines only one useful method, apply(). That method is the container for building the stress test that we want to execute.
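As a plain-Scala illustration of that "a simulation is just a Function0" idea, here is a small sketch. Scenario, Simulation and RootUrlSimulation are stand-ins defined for this example; the real Gatling trait returns its own scenario builder types, which are not reproduced here.

```scala
object SimulationSketch {
  // Stand-in for whatever the Gatling DSL would build.
  final case class Scenario(name: String, users: Int)

  // A simulation is nothing more than a function without arguments.
  trait Simulation extends (() => Seq[Scenario])

  // Hypothetical simulation: apply() is the place where the test gets built.
  class RootUrlSimulation extends Simulation {
    def apply(): Seq[Scenario] = Seq(Scenario("hit root url", users = 10))
  }

  // A runner only has to call apply() on each simulation it is given.
  def run(simulations: Seq[Simulation]): Seq[Scenario] =
    simulations.flatMap(simulation => simulation())
}
```

Because a simulation is an ordinary class, it is compiled with the rest of the test sources and checked by the compiler like any other code.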

What is very convenient now is that you can write Gatling scenarios with the Scala type system being checked in your favorite IDE, and they will be compiled by Sbt itself when requested (and hot swapped ^^), whereas the classical Gatling workflow is to compile them on the fly, using its internal routines.

Run 'em

Having written simulations (example here), you can now ask Gatling to run them by creating a gatling runner instance on them. I won't go into deep details because it is not the purpose here, but here is how you can do it.

See 'em in action

That's the easiest part: entering the play environment using the sbt console, you can launch the tests by typing test.

What will be done is:

enter the specification
create the fake application
load the additional plugin
create the gatling folder and conf
configure the gatling system
create the fake server on 3333
create the simulations
run them
generate reports on them (located in the target folder => look at 'em in your browser and you'll see how cute they are)

Problems

For now there are problems when executing several tests, because streams are closed (while generating further reports); this comes from a choice that I had to make, which is commented on github here. This is mainly related to a feature that is no longer available in Akka 2.0 (for good reasons, I'd say).

To be continued

If you want to help me go further, don't hesitate to contact me by mail, comment on this post or reach me on twitter.

What I'd like to have in the future is:

clear Specification for Gatling (preventing the need to define each time the server and plugin)
website for enabling test one by one, or any, or...
redirecting to results reports
more
and more

Play 2.0 and Salat (MongoDB DAO provider)

2012-03-11T13:12:00.001+01:00

Play 2.0 using MongoDB document storage through Salat

In this post, I'll cover some points and a library that ease such a use case.

I won't go into deep details about Play and MongoDB, instead I'll jump straight to the MongoDB usage.

Let's talk a bit about Salat

Salat

This free library, available on github here (and deployed on an ivy repo, so easily usable with sbt and Play), is able to deal with case classes and MongoDB as we'd do with JPA.

That is to say, case classes can be used directly for storing documents, simply by declaring a DAO and with the help of some annotations (not mandatory).

Grater

Salat has the notion of a Grater, which is responsible for the de/serialization of a case class. It does this through two simple functions; given a Grater for the type T:

asDBObject(t:T) : returns a MongoDB representation of t
asObject(dbo:DBObject) : returns a T instance based on the provided MongoDB content

The last thing to note is how to create such a Grater. What do we have to do? The answer is almost nothing: the package com.novus.salat offers the very handy method grater[Y <: AnyRef].

What that grater method does is parse the case class provided as the method's type parameter and create the related Grater instance. The important thing to note at this stage is that this Grater is cached for further use.

So now, you're already able to deal with your case class. What is missing is the DAO part, which will ease your job again.
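To picture what a Grater does and why caching it matters, here is a toy, self-contained sketch. It is not Salat's implementation: Doc is a plain Map standing in for a DBObject, UserGrater is written by hand where Salat would derive it from the case class, and the cache is an assumed simplification keyed by Class.

```scala
import scala.collection.mutable

object GraterSketch {
  // Stand-in for a MongoDB document.
  type Doc = Map[String, Any]

  // The two conversions described above.
  trait Grater[T] {
    def asDBObject(t: T): Doc
    def asObject(doc: Doc): T
  }

  final case class User(name: String, age: Int)

  // Hand-written for the example; Salat builds the equivalent by inspecting the case class.
  class UserGrater extends Grater[User] {
    def asDBObject(user: User): Doc = Map("name" -> user.name, "age" -> user.age)
    def asObject(doc: Doc): User =
      User(doc("name").asInstanceOf[String], doc("age").asInstanceOf[Int])
  }

  // Cache keyed by the Class instance, so the grater is built only once per class.
  private val cache = mutable.Map.empty[Class[_], Grater[_]]

  def userGrater: Grater[User] =
    cache.getOrElseUpdate(classOf[User], new UserGrater).asInstanceOf[Grater[User]]
}
```

A round trip through asDBObject and asObject gives back an equal User, and asking for the grater twice returns the same cached instance. Note that the cache key is a Class, which ties every cached grater to the class loader that loaded that class.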

SalatDAO

Salat provides another handy structure named SalatDAO. This trait defines all the lifecycle operations that you might need for your domain objects.

Its usage is very simple: you just have to define an object that extends this trait, giving it the domain-specific class and the Id type for your structure. The last parameter it needs is a reference to the collection.

Here is an excerpt I pulled from the salat wiki:
object UserDao extends SalatDAO[User, ObjectId](collection = DB.connection("users"))

Play 2.0

Having covered the basics of Salat, it's time to use it in our Play 2.0 application.
There is a cool wiki page, which I wish I had discovered earlier, that explains how to use salat with Play 1.2.x. I recommend that you read it, even if I'm gonna cover some of the important steps here too.

Dependencies

This step is easier than for the previous version of Play, because now the only two things required are:

add the novus repo
add the deps to salat

Both in the Build.scala in the project folder of your Play 2.0 app.

**Edit (on 21st June 2012)**
Before you continue with this post, which explains some basics about how to deal with Salat and Play, note that you can now choose to simply move to this module: https://github.com/novus/salat/wiki/SalatWithPlay2.
It introduces the tricks that I'll explain below, and adds amazing functionalities for Model and DAO creation.
So from this point on, it's no longer mandatory to read this post... unless you're curious ;-)
**end of EDIT**

Context
Here is the main thing I have to discuss here: Salat makes intensive use of a structure named Context, which holds a reference to what has been processed along the classes and structures.

Such an instance is created by default in the package object com.novus.salat.global, and the Salat quick start recommends importing it along with salat and annotations. Don't!

Doing so will fail when using Play 2.0 in DEV mode (only), because of Play's cool hot-reload-on-change feature. The specific case where it will fail is the following:

If you want to keep a static reference to the Grater instance of a specific case class.

Why? Because of the (needed for sure!) graters' cache which keeps a reference to the Class instance.

But this class instance might change on bytecode refresh, all the more since Play has a specific ClassLoader for that (ReloadableClassLoader).

The result is incomprehensible errors when you change your code, errors saying that there is a mismatch between classes that you haven't even changed yet...

The solution (which is described in the wiki page I mentioned above)

Instead of importing com.novus.salat.global._, which only contains an implicit definition of the Context object to use in Salat's core system, create a new one using the correct ClassLoader.

In the above Gist, we can see that we simply created a Context that refers to the ClassLoader provided by the Play app itself.

That keeps class reloading enabled, without impacting the cached instances.

That's all folks!

Now you're ready to use both Play 2.0 and Salat without messing around with conversions between your domain-specific model (DSM) and MongoDB and so on.

Have fun working with it.

Neo4J with Scala Play! 2.0 on Heroku (Part 9) :: Final

2012-02-29T23:03:00.001+01:00

Note

This post is a continuation of this post (which started there); it is the last part of a blog series about using Neo4j and Play 2.0 together on Heroku.

What has been accomplished so far:

install Play 2.0
install Neo4J
use Dispatch
create model
create persistence service in Neo4J
create views and controllers

OK, we're almost done. Let's see how to deploy the whole app on Heroku.

Heroku, here I come

But wait, who're you?

Heroku is simply one of the best cloud players of the moment. I won't say too much about it, because otherwise I'd have to talk a lot.

But here are some very interesting features and paradigms followed by Heroku.

Process Centric

Almost all other cloud providers bind their services to server instances, CPU flop counts, memory usage and other similar metrics, but it is a fact that these aren't easy to forecast and are hard to track in development phases (even if I encourage doing it, though).

Heroku comes with a much easier concept, that is, Web Dynos. A Web Dyno is simply a process that can handle requests. So, what if the requests are too numerous? Just add Dynos. Note that Dynos also exist for background processes, one Dyno per worker.

Costs are very simple too: you get one free Dyno per month, and the rest is billed at a low hourly cost.

Thanks for simplicity.

Remote CLI

We've just asked how to handle more requests in an efficient way, and answered: by adding dynos.

So far so good, but how? That's where the Heroku remote CLI comes in; it is able to operate remotely on the behavior of a deployed application.

Thus, adding a dyno is done in the console: $> heroku dynos 1

Now, performance alerts are quickly resolved.

Thanks for rapidity.

Continuous Deployment

The paradigm followed by Heroku to deploy apps is based on Continuous Deployment.

Given that, your app should define how it must be deployed, using a Procfile.

And it will be deployed automatically when the sources are pushed to the Git repo that is created for each application.

This ensures that you are able, at any time, to retrieve the sources related to the running instance (for example).

Thanks for debugging ease.

Add-Ons

What to say? A good SDK to create add-ons, a good architecture and service level. It makes for a plethora of powerful add-ons, including running Neo4J instances.

Thanks for being open.

Can I Play! with Heroku?

Of course, you can!

Actually, Heroku has integrated Play! since its first version, and also added Scala support some time ago.

And finally, the Play 2.0 wiki explains how to do it...

Ok, let's Go then.

Getting started

First of all, you must be registered with Heroku. Fortunately, it's free and fast. So go on, and create your account here.

Having your account, you can now install the Heroku Toolbelt. This will give you access to the Heroku CLI, which can manage your account, apps and app configuration.

When you're done with the installation, you just have to log in using the console command: $> heroku login

Play! app side

What your app needs is to be in a Git repo and to contain a Heroku process description file.
Since everything is already explained there, I won't go into deep details.

Create Heroku app

Since we are using Play! and Scala, we need a JVM; the stack that provides it at Heroku is called Cedar.

So, to create your app, open a shell and do the following:

heroku create my-playing-app-with-neo4j --stack cedar

Now you have an up and running environment to set up and deploy your application. The application will be named my-playing-app-with-neo4j.

Neo4J add-on

OK, but I have to use a Neo4J database, not an embedded one (too heavy for a cloud). Do I have to install it somewhere and host it myself? Nah!

Neo4J's team is actually working on an integration with the Heroku platform, and a beta test add-on is available at the time of writing.

This means that to get a running database that we can use, we just have to open a shell (in our app folder) and enter the following command: heroku addons:add neo4j

You don't believe it, huh?

Since you'll need to retrieve the database url and credentials, either you go to the Heroku site and...
Nah, just stay in your shell and do: heroku addons:open neo4j

Ta da!

App update

In my previous post, for the sake of simplicity, I left the Neo4J database url hard coded to localhost:7474.

But now we have to update this to use our deployed Neo4J instance and credentials.

We should (must) define an application configuration property for such a parameter, but that is not what I want to illustrate here, so let's keep it simple and hard coded.

But we have to add something to the Neo4J Dispatch url: the credentials. For that we just have to do the following:

SSH key

Just a note to remind you to add your ssh key to Heroku. This is simply accomplished (after you've created one) using the CLI: $> heroku keys:add

Beginning to love this CLI, no?

Procfile

This is the Heroku configuration file that tells the continuous deployer (if I may say so) how the application will be deployed and what its needs are.

This file is located at the root of the Play application folder and contains only one line:

web: target/start -Dhttp.port=${PORT} ${JAVA_OPTS}

This simple line says that we need a web process for the staged application, started by the target/start script.

Actually, the target folder will contain the staged Play! application after Heroku has run sbt clean compile stage on it.

Aaaaand Deploy! (push)

Getting closer to the end!

After having added all the necessary sources to the local git repo of your app (including the last update and the Procfile), we can now commit everything and push it to the git repo that Heroku holds for our application.

Actually, when the Heroku app was created, the CLI updated the local git configuration to add the related remote repo, called heroku.

So, the only thing left to do is to push: $> git push heroku master
To test that it is OK: $> heroku ps. This will display the processes running on Heroku.

If the process is shown, let's open the application in our default browser (leave your mouse alone and...): $> heroku open.

I hope that I didn't make too many mistakes and that you are now able to see your application running and using Neo4J.

At least, here is the one I managed to deploy: http://scala-plays-with-neo4j.herokuapp.com/.
I've also shared this app on Heroku's GenSen, which is meant for sharing project templates on Heroku.

Now, you should love the CLI, but also Heroku, and Neo4J and Play! 2.0 and Scala and Dispatch and arbor.js and...

Thanks for reading, if anyone did ^^.

Neo4J with Scala Play! 2.0 on Heroku (Part 8) :: Scala template+Arbor.js to browse Neo4J via Play 2.0

2012-02-27T23:43:00.000+01:00

Note

This post is a continuation of this post, which is the seventh part of a blog series about using Neo4j and Play 2.0 together on Heroku.

Viewing Neo4J Model Object in Play2.0

Goal

In this post, I'll talk about some of the features that Play 2.0 offers to create web applications/sites.

The main goal is to have html views that enable us to create Users and Groups and to link them. But not only that: we'll use arbor.js to view what's being created or linked in Neo4J as a... graph, of course.

Basically, it will consist of one html page containing several forms for creating model instances (or links) through AJAX calls to Json controllers.

So let's begin by explaining how to define querying and persisting controllers using Play 2.0's Form.

Controllers

Here, we'll stick to the basic needs of our use case, that is, retrieving the list of users stored in Neo4J and creating a new group.

Get Users

Briefly, Play 2.0 has the notion of controllers to handle server requests; such controllers are bound to urls using a route configuration.

So what we have to do here is to create a controller, let's say Users, with a handler named j_all that returns the list of users rendered in Json.

Using what we've discussed in previous posts, such a controller and its definition are rather simple, check this out:

As we can see, we simply call the Model persistence utility object to retrieve all Users defined in Neo4J, which we render directly in Json thanks to their Formatter. And finally, we stream the result in the http response.
Mmmh, simple, no? Here we did the following:

send a Json request to Neo4J requesting all nodes that are linked to the root using the kind users (found using the User's ClassManifest)
retrieve the Json response from Neo4J and unmarshal it into a List[User] (using the User Formatter)
re-render them into the expected Model Json Format (again using the Formatter)
generate the String representation
append it to the response body
define the content type as being Json

In one single line.

To test it, roughly, just use this url http://localhost:9000/users.json. This will return a Json encoded response.

Create Group

Now, we want to add the possibility to create a new group remotely. For that, we'll create a controller Groups which defines a create handler.

This handler expects to receive a group name. After that, it creates the group instance and persists it in Neo4J.

To recover such a request parameter (in a POST, since we are creating something and changing the server state), we use a Play 2.0 construct, play.api.data.Form, which offers a lot of helpers to parse the body into a map of values (which can be nested).

In the following example, the group name is extracted from the request's body (url encoded) as a nonEmptyText mapped as name. This is a helper mapping for extracting a String that cannot be empty.

As we can see, the Form can be directly rendered into the Model instance by giving apply and unapply functions after the mapping definition.

Javascript Routing

Using static urls is cool... no, OK, let's try to use what some call Web 2.0, you know, Ajax.

The problem comes when you have to deal with urls within Ajax calls. How do you keep track of your url changes, for instance?

Pretty hard, so let's forget about hard coded urls in your javascript and use a routes file that can be downloaded client side. This routes file contains all the controllers' url mappings that you want to expose to javascript (if I may say so).

How it works is simple:

Use Routes.javascriptRouter to define a javascript object and the controllers to be exposed remotely
For each of them, you must use the following object controllers.routes.javascript..
This object is created at compile time when the controller is defined in the route conf file
Define a handler (in the Application controller, for instance) that returns the result of the javascriptRouter as a javascript file
Route this new handler to whatever url you want (like /js/routes)

Having done that, you are now ready to use the created object in the javascript part.

If we take the controller controllers.Users.j_one (which returns a User based on its given id), our javascript will have access to a js function playRoutes.controllers.Users.j_one(id) that takes an id.

By using this js function, you'll get in return a js object that defines at least two useful properties:

url: the formatted url for the controller (with the parameter already compiled into the url)
ajax(c): a jquery (by default) ajax function that takes a payload object, but already defines the url and the method.

So far so good, but to use all of this stuff, let's see a coffeescript (thanks Play 2.0) example:

In the previous example, I wrote the ajax call myself using jQuery... so I could have simply used the ajax property. But never mind, sometimes I love to be a control freak.

C'est chic! No?

Arbor.js

For browsing our model graph, I've used arbor.js as the rendering framework, because it's the best one for graphs... that's it.

Since my intent here isn't to explain it, I'll leave you alone with that part. But I recommend that you browse its site here.

So what I did is simply use Users as nodes, all linked to a central root node. Clicking on one of them will show you their inter-relationships.

I've also added a select box that helps you show all users in a chosen group.

Given that the next post will be about how to deploy the whole thing on Heroku, I don't have any instance in the wild at this time, but if you wish you can clone (and fork) my repo on github for this series of posts.

But here is a preview of what has been achieved.

Fun but not so cute -> I'm not a designer... :'(

Next post, the last, will talk about how to deploy this whole thing onto the Heroku PaaS.

Neo4J with Scala Play! 2.0 on Heroku (Part 7) :: DSM+DAO+Neo4J+Play

2012-02-25T00:40:00.000+01:00

Note

This post is a continuation of this post, which is the sixth part of a blog series about using Neo4j and Play 2.0 together on Heroku.

Using Neo4J in Play 2.0... and simple DAO

What I intend to show is a way to use a Domain Specific Model persisted in a Neo4J back end service. For such a DSM, we'll have an abstract magic Model class that defines generic DAO operations.

For simplicity, we'll try to link each category of classes to the root/entry node. For instance, all the Users will be bound to the entry node by a reference of kind user.

Model

I'll choose the very common use case, that is, Users and Groups. Here is its shape:

A User has a first name
A Group has a name
A User can be in several Groups
A Group can contain several Users
A User can know several Users
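The shape listed above can be sketched with a few case classes. This is only an illustration in plain Scala: the optional id field, the Membership and Knows classes and the two helper functions are assumptions made for the example, not the classes of the actual project.

```scala
object SocialModel {
  // Ids are optional: an instance has none until it has been persisted.
  case class User(firstName: String, id: Option[Int] = None)
  case class Group(name: String, id: Option[Int] = None)

  // Relations are kept outside the classes, as the edges of a graph would be.
  case class Membership(user: User, group: Group) // a User is in a Group
  case class Knows(from: User, to: User)          // a User knows a User

  // All the groups a user belongs to.
  def groupsOf(u: User, ms: Seq[Membership]): Seq[Group] =
    ms.collect { case Membership(`u`, g) => g }

  // All the users contained in a group.
  def usersIn(g: Group, ms: Seq[Membership]): Seq[User] =
    ms.collect { case Membership(u, `g`) => u }
}
```

Keeping the relations as separate values mirrors what a graph database does: nodes carry the properties, relationships carry the links.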

Let's keep the class definitions aside for a moment, and stick to the persistence service.

Graph Service

The Graph Service is an abstraction of what is needed for a Graph Persistence Layer. It is bound to a generic type that defines the model implementation and defines traversal and persistence operations of graph's nodes.
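To give an idea of what such an abstraction may look like, here is a minimal sketch. Only saveNode is a name taken from this series; getNode, link, neighbours and the in-memory implementation are hypothetical and only there to make the trait runnable without a Neo4J server.

```scala
import scala.collection.mutable

object GraphServiceSketch {
  // Abstraction over a graph persistence layer, generic in the model type.
  trait GraphService[M] {
    def saveNode(m: M): Int                           // persists m, returns its node id
    def getNode(id: Int): Option[M]                   // looks a node up by its id
    def link(from: Int, to: Int, kind: String): Unit  // creates a typed relation
    def neighbours(id: Int, kind: String): Seq[M]     // one-step traversal
  }

  // Toy in-memory implementation, only here to exercise the trait.
  class InMemoryGraph[M] extends GraphService[M] {
    private val nodes = mutable.Map.empty[Int, M]
    private val edges = mutable.ListBuffer.empty[(Int, Int, String)]
    private var nextId = 0

    def saveNode(m: M): Int = { nextId += 1; nodes(nextId) = m; nextId }
    def getNode(id: Int): Option[M] = nodes.get(id)
    def link(from: Int, to: Int, kind: String): Unit = { edges += ((from, to, kind)); () }
    def neighbours(id: Int, kind: String): Seq[M] =
      edges.toList.collect { case (`id`, to, `kind`) => nodes(to) }
  }
}
```

A Neo4J-backed implementation would keep the same signatures and replace the maps with REST calls.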

Graph Service for Neo4J

Let's now update the service that was used in the previous post for Neo4J persistence, so that it is able to deal with model instances.

Let's start with the saveNode operation to see what is needed in the model and elsewhere.

In the Gist above, I've highlighted some points that must be found around the Model construction. (A) and (C) compose a Json Format (as SJson proposes), while (B) is more related to model abstraction.

(C) has a special need when used with Dispatch: we could have a Dispatch Handler that does both the parsing/unmarshalling and the direct use in a continuation.

Model

Now we are at the right point to talk about the Model, since we've met almost all of its requirements. So let's build a Magic Model class that can be extended by all concrete model classes.

Skeleton

That's the easy part, we just define the id property, which is the node's id (part of the Rest Url in Neo4J).

Formatter

OK, this part is simple too in this abstract Model definition, because a Format implementation must be part of the concrete DSM classes. That is, User, which extends Model, must define a Format[User] instance and put it in the implicit context.

So, at this stage we have Model and User like this:

Class -- Relation's kind : F-Bounded

As we saw, the saveNode method needs to associate the concrete class with a relation kind. But what I wanted is to have a save method in Model, which implies that we cannot (at first glance) give saveNode the information it needs, that is, the concrete class.

For that, we'll use an F-Bounded type for Model; that way we'll be able to tell the saveNode method what the real class is... Mmmh OK, let me show you:

But that's not sufficient: the saveNode method will need to use the available ClassManifest to find the relation it must create.

I chose a very common and easy solution, which is to have a function in the Model companion that helps in registering classes against relation kinds.
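The two ideas (F-Bounded type and class registration) can be sketched together as follows. This is a stand-alone approximation, not the code of the repo: it uses scala.reflect.ClassTag (the successor of ClassManifest in later Scala versions), the register and kindOf names are made up, and saveNode only returns a description of the relation it would create instead of calling Neo4J.

```scala
import scala.collection.mutable
import scala.reflect.ClassTag

object FBounded {
  // Registry living in the Model companion: concrete class -> relation kind.
  object Model {
    private val kinds = mutable.Map.empty[Class[_], String]
    def register[T](kind: String)(implicit ct: ClassTag[T]): Unit =
      kinds(ct.runtimeClass) = kind
    def kindOf[T](implicit ct: ClassTag[T]): String = kinds(ct.runtimeClass)
  }

  // Stand-in for the graph service: it only reports which relation it would create.
  def saveNode[T <: Model[T]](t: T)(implicit ct: ClassTag[T]): String =
    "root -[" + Model.kindOf[T] + "]-> " + t

  // F-Bounded: A is the concrete subclass, so save can pass the real type on.
  abstract class Model[A <: Model[A]] { self: A =>
    def save(implicit ct: ClassTag[A]): String = saveNode[A](this)
  }

  case class User(firstName: String) extends Model[User]
  case class Group(name: String) extends Model[Group]
}
```

After Model.register[User]("users"), calling User("Alice").save resolves the kind from the concrete class, although save itself is only written once in Model.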

Model Dispatch Handler

Now we'll discuss something I find really useful and easy in Dispatch: creating a Handler that can turn a Json response from Neo4J into a Model instance.
For that, we have already defined in the previous post a way to handle a json response as a Play JsValue.

Now, what we need is to use the implicit formatter of each concrete model class to create instances. And that will be the way to reach the goal, except that a problem comes from the Json response of Neo4J: the data is not present at the Json root, but is the value of the data property.
So it breaks our Format if we use it directly.

That's why the above definition of the Handler takes an extra parameter, which is a conversion from JsValue to JsValue, that is to say, a function that goes directly to the data definition.
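The role of that extra conversion can be shown without Dispatch or Play at all. The sketch below uses a toy Json ADT and a toy Reads trait as stand-ins for Play's JsValue and the model Format; handle, toData and the User class are hypothetical names chosen for the example.

```scala
object DataHandler {
  // Toy Json ADT standing in for Play's JsValue.
  sealed trait Js
  case class JsStr(value: String) extends Js
  case class JsObj(fields: Map[String, Js]) extends Js

  // Stand-in for the reading side of a Format.
  trait Reads[T] { def reads(js: Js): T }

  case class User(firstName: String)
  implicit val userReads: Reads[User] = new Reads[User] {
    def reads(js: Js): User = js match {
      case JsObj(f) => f("firstName") match {
        case JsStr(n) => User(n)
        case other    => sys.error("unexpected firstName: " + other)
      }
      case other => sys.error("expected an object, got: " + other)
    }
  }

  // `conv` moves from the response root to the part the Reads understands,
  // then the continuation `k` receives the model instance.
  def handle[T, R](response: Js, conv: Js => Js)(k: T => R)(implicit r: Reads[T]): R =
    k(r.reads(conv(response)))

  // Neo4J wraps the node properties in a "data" property.
  val toData: Js => Js = {
    case JsObj(f) => f("data")
    case other    => other
  }
}
```

Without toData, the Reads would look for firstName at the root of the response and fail, which is exactly the problem described above.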

saveNode

Finally, let's gather all our work in a simple implementation of a generic saveNode function:

As we can see, it's very easy to handle Neo4J responses as DSOs and use them directly in the continuation method of the Handler.

Usage

Having all the pieces in place (check out the related Git repo here), we can now really simply create a User and retrieve it updated with its id, or even get it from the database using its id.

In the next post, we'll create some Play templates for viewing such data, but also for creating it.

Neo4J with Scala Play! 2.0 on Heroku (Part 6) :: Dispatch+Play 2.0

2012-02-21T00:29:00.000+01:00

Note

This post is a continuation of this post, which is the fifth part of a blog series about using Neo4j and Play 2.0 together on Heroku.

Using Neo4J in Play 2.0

In this post, we'll create a Dispatch Handler that turns Neo4J Rest Json responses into Play's Json objects.
Having this in our hands, we'll be able to create a really simple service for dispatching Neo4J operations and use them in Play's views.

I've put the Play app on github, fork me.

NB: we'll use Dispatch, but if you wish, you could use Play's WS feature instead, which might help you a lot (check this out).

Declare Dispatch deps

First of all, we have to update our Play app with the Dispatch dependency. For that, we have to update the sbt configuration file in order to add the related line.

Now that we have updated the project, let's update the application by reloading the configuration (if you're already in the sbt console) and rebuild our IDEA project.

Play's Json Handler

Our goal is to use the Neo4J Rest Api, which returns Json encoded responses.
So here, I'll show how we can have such a response directly unmarshalled into a Json object. In further posts, we'll use this handling feature to get Model instances directly (which is far more interesting).

What is necessary for that is a piece of code that is able to take a subject and convert it to a JsValue. And since we love functional programming, let's have this method take a continuation that accepts a JsValue.
In this listing, we see that we use the text parser to consume the response payload, then we ask Play's Json parse function to do its job.
Finally, we apply the continuation to the parsed result.
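The three steps (read the payload as text, parse it, apply the continuation) boil down to a very small shape. Here is a library-free sketch of it, where the parse argument is a stand-in for Play's Json parse function and handleJson is a made-up name:

```scala
object JsonContinuation {
  // payload: the response body read as text
  // parse:   stand-in for the Json parser (String => parsed value)
  // k:       the continuation, applied to the parsed result
  def handleJson[J, T](payload: String)(parse: String => J)(k: J => T): T =
    k(parse(payload))
}
```

For instance, JsonContinuation.handleJson("42")(_.toInt)(_ + 1) parses the text and hands the number to the continuation. The caller never touches the raw text, only the parsed value.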

Neo4J Service

Let's gather some utility urls to retrieve nodes and relations. In other words, urls for common usages. This service is kept simple for further enhancements (next post). We see that most functions are there to create urls based on ids, but there is also the root one, which directly fetches the entry node.
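As an idea of what such url helpers may look like, here is a small sketch in plain Scala. The base url assumes a local server with the default Neo4J REST location (an assumption), the function names are made up, and string interpolation is used for brevity although it requires a more recent Scala than the 2.9.1 used in this series.

```scala
object Neo4jUrls {
  // Assumed default location of a local Neo4J server's REST API.
  val root = "http://localhost:7474/db/data"

  // Url of a node, based on its id.
  def node(id: Int): String = s"$root/node/$id"

  // Url of a relationship, based on its id.
  def relationship(id: Int): String = s"$root/relationship/$id"

  // Url listing all the relationships of a node, whatever their direction.
  def relationshipsOf(id: Int): String = s"${node(id)}/relationships/all"
}
```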

A Controller To Rule Them All

For the sake of this basic usage of our Handler with Neo4J, here are some examples of such requests (full controller here).
As we can see, all we had to do is to create the correct url by using an id or the Neo4J path conventions; then, using the Handler operator ( >! ... how Play! is that, no?!), we can directly use a JsValue instance to consume the result.
Okay, it's repetitive and the Json traversal is not shared. Let's put this aside until the next post.
And before going ahead, I've created a pretty simple and naive view and url mapping. So check the sources on github, play with it and test the /rest et al. urls.

Next post: Enhance the handler and service to manage Domain Objects.

Neo4J with Scala Play! 2.0 on Heroku (Part 5) :: Dispatch

2012-02-19T22:32:00.001+01:00

Note

This post is a continuation of this post, which is the fourth part of a blog series about using Neo4j and Play 2.0 together on Heroku.

Intro

In this series, we're gonna use the Neo4J graph database through its HTTP REST api, which is quickly introduced in this post.
So we'll have to communicate with the server using HTTP, and here comes the Dispatch scala library.
Neo4J uses Json as the resource representation; we've already discussed this subject for Play 2.0 in this post.
How to stick them together will be discussed in the next post.
Here we'll concentrate on an introduction to Dispatch's DSL for making HTTP requests and on a powerful abstraction of the body parser, that is, the Handlers.

Dispatch

The DSL

Dispatch is a very powerful library for communicating through the HTTP protocol, offering a DSL for making such queries, but also for using their responses.
Since, as we all know, a response can be of different content types, the DSL provides easy handlers for them.
Getting back to queries, an HTTP request is given a method like GET or POST; when dealing with RESTful services, we'll see PUT and DELETE come into the game.
When data must be provided, we'll have to pass some arguments/parameters in the request payload (or url).
Let's see how those actors compose in Dispatch's DSL.

URL

The url of an HTTP request is basically composed of the host and port, followed by path elements. Here is how to create a url such as http://dispatch.databinder.net/URLs+and+Paths.html

Attributes and Headers

After having built a Request (see above), you now have access to some modifiers on it.
Probably the best way to learn all of them is to check the source code here.
But here are some examples:

Method

The method is quite a bit simpler... just append its name to the request in order to change the method from 'GET' to another one:

Payload

As for headers, you'd better check the code to know every tool the library puts in your hands. Here are some conventions I understood from the code:

Executors

Dispatch works with executors that execute the queries and accept handlers for handling the response.
How the response is waited for is configurable by choosing among several implementations; this is well documented on the wiki.
But in our case, what we need is synchronicity, because the RESTful service is also our backend service. So the executor comes with the `dispatch` package, this way:

Handlers (Response's Body)

Here comes the fun... Handling responses.
Actually, this piece of code comes into the game just before we apply the request to the executor, but OK, let's tackle it now.
A handler can also be called the response parser; that is, it is responsible for parsing the whole content into a new form.
Some existing handlers are:
Basically, a handler defines an operator and a result type, and takes a block that deals with an instance of that result type.
Some examples of such handlers are given on the wiki, especially for Json or html (here, here and here).
Other handlers exist by default, for redirecting to an output stream, or for composing or chaining handlers.

In the next post, we'll see how to use the Neo4J Json Rest Api with the Play! framework through Dispatch.

Neo4J with Scala Play! 2.0 on Heroku (Part 4) :: Play 2.0/Json

2012-02-19T22:19:00.000+01:00

Note

This post is a continuation of this post, which is the third part of a blog series about using Neo4j and Play 2.0 together on Heroku.

Play2.0 - Scala - Json

I'm about to write a quick wrap-up of some Play20 wiki entries and stackoverflow questions that are all related to Json in Play2.0.
For that, I'll take some usage examples, but also present the underlying library (Jerkson) and the paradigm used, SJson.

Scope

Given that the wiki pages are really clean and self-explanatory, I'm not gonna go deeply into how Json must be used with Play 2.0; I'll just give some prerequisites in order to help you understand how I'll use the Neo4J REST API.

play.api.libs.json

This package contains everything you'll need in order to work with Json in Play 2.0.
It defines important structures like JsObject, JsArray and even some like JsUndefined.
Their usage is very easy since they are based on classical scala Map and List of JsValue (the parent type).
Here is an example of creating a JsArray and iterating over its items. To test it in a REPL, I recommend that you enter the Play console by running play in your repo and then console in sbt (to have all libraries loaded).

For arrays, it's quite easy (the wrapping may be annoying, but a little pimping can resolve that).
The power of the Jerkson library shows with JsObject usage. It has defined a very clean DSL for querying Json. Here is an example of querying a property or catching a descendant property.

play.api.libs.json.{Format, Reads, Writes}

Until now, you saw that Json is usable with Play. But I guess that you're hoping for more than that, since this framework is here to ease the work.
And you're right.
Play comes with an SJson flavor for serialization and deserialization of DSOs.
Three traits come into the game.

Reads

Reads defines a single method, reads:
Having a Reads defined for a type T, we can now extract an instance of T from Json.
It comes with the Reads object, which defines implicit Reads instances for the most common types like Option, String, Short ...

Writes

Writes, like Reads, is very simple and defines a single method, writes:
Obviously, its purpose is to convert a value of type T into its Json representation; it is the inverse function of reads.
A Writes object is defined too, with conversions for common types.

Format

This trait is there to put the pieces together.
With the help of this trait, we can now have a serializer of custom domain objects from/to Json.
A good practice is to define such a Format in the companion object of your DSO, so that it comes into scope at once when using it.
Here is an example:
Let's see how to use this Format easily to go back and forth from DSO instances.

play.api.libs.json.Json._

This object is a handy one that defines four convenient methods. Two are here to play with String and JsValue. The other two are to play with types and JsValue.
Back to our simple DSO class, we can now do this:
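To make the interplay between Reads, Writes, Format and the two type-oriented helpers concrete, here is a self-contained sketch. It deliberately does not use Play: the Js ADT, the traits and the toJson/fromJson functions are toy stand-ins modelled on what is described above, and the Person class is an assumption made for the example.

```scala
object FormatSketch {
  // Toy Json ADT standing in for Play's JsValue hierarchy.
  sealed trait Js
  case class JsStr(value: String) extends Js
  case class JsNum(value: Int) extends Js
  case class JsObj(fields: Map[String, Js]) extends Js

  trait Reads[T]  { def reads(js: Js): T }
  trait Writes[T] { def writes(t: T): Js }
  trait Format[T] extends Reads[T] with Writes[T]

  case class Person(name: String, age: Int)
  object Person {
    // Defined in the companion so it is found implicitly wherever Person is used.
    implicit val format: Format[Person] = new Format[Person] {
      def writes(p: Person): Js =
        JsObj(Map("name" -> JsStr(p.name), "age" -> JsNum(p.age)))
      def reads(js: Js): Person = js match {
        case JsObj(f) => (f.get("name"), f.get("age")) match {
          case (Some(JsStr(n)), Some(JsNum(a))) => Person(n, a)
          case _ => sys.error("missing or ill-typed field")
        }
        case _ => sys.error("expected an object")
      }
    }
  }

  // Helpers going back and forth between types and Json values.
  def toJson[T](t: T)(implicit w: Writes[T]): Js = w.writes(t)
  def fromJson[T](js: Js)(implicit r: Reads[T]): T = r.reads(js)
}
```

With this in place, fromJson[Person](toJson(Person("Bob", 30))) gives back the original instance, without ever naming the Format explicitly.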

Enhancement

In order to have more control over the outcome of your serialization, I would recommend that you consider the Scalaz library's Validation construct.
Indeed, it will help you get more relevant information, and all at once, if your reads goes wrong.
Here is a talk about this (but not in Play).
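The "all at once" part is the interesting bit: instead of failing on the first wrong field, every problem is collected. The sketch below shows the idea with the standard library's Either only; it is not the Scalaz API, and the field and both helpers as well as the Person class are hypothetical.

```scala
object Accumulate {
  // Either a list of error messages, or a valid value.
  type V[A] = Either[List[String], A]

  // Reads a mandatory, non-empty field from a flat map of values.
  def field(m: Map[String, String], key: String): V[String] =
    m.get(key).filter(_.nonEmpty).toRight(List("missing " + key))

  // Combines two results, keeping the errors of both sides.
  def both[A, B, C](a: V[A], b: V[B])(f: (A, B) => C): V[C] = (a, b) match {
    case (Right(x), Right(y)) => Right(f(x, y))
    case (Left(e1), Left(e2)) => Left(e1 ++ e2)
    case (Left(e), _)         => Left(e)
    case (_, Left(e))         => Left(e)
  }

  case class Person(name: String, city: String)

  def readPerson(m: Map[String, String]): V[Person] =
    both(field(m, "name"), field(m, "city"))((n, c) => Person(n, c))
}
```

Reading an empty map reports both missing fields in a single Left, which is the kind of feedback a plain exception in reads cannot give.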

In the next post, we'll talk about the Dispatch library. See it here

Neo4J with Scala Play! 2.0 on Heroku (Part 3) :: Play 2.0/Idea

2012-02-19T22:04:00.000+01:00

Play!2.0 Scala and Idea CI

This post is a continuation of this post, where I've introduced Neo4J and how to install it. But it requires that you have already followed this post.

Goal

Even if ScalaIDE 2.0 has just been released, I still prefer IntelliJ for editing Scala, because even its community edition has powerful and polished Scala support (after having installed the plugin...).
And while playing with the 2.0 version of Play!, where you'll enjoy the use of CoffeeScript to heal your JS headaches, you'll also love the CoffeeBrew plugin.
Here the goal is to help you create a Play! 2.0 RC1 scala project, update the sbt configuration for sbt-idea, and finally generate the IDEA module.
Having done such easy tasks, you'll have the full power of IDEA in your hands.

Create the project

First of all, we need to create the project for what we'll try to do (using Neo4J through its REST API).
For that, you should have the play executable in your PATH, and be able to run the following in a console:
play new Play20WithNeo4J
This will prompt you with some questions, which you'll answer as follows (no color question, don't worry but a little remark later if you're using Windows ^^):
What is the application name? >
Play20WithNeo4J
Which template do you want to use for this new application?
1 - Create a simple Scala application
2 - Create a simple Java application
3 - Create an empty project
> 1
OK, application Play20WithNeo4J is created.
Have fun!
Basically, it asks you for the name of your app and its language (note that we've chosen the scala way).

Setting sbt-idea

Play!, in its new 2.0 version, uses sbt to configure the project and run tasks (such as deploy, start/run and so on). The sbt version in use is 0.11.2 at the time of writing, and it is embedded in the install.
So you can already launch the sbt console by running either play or play console, to have all Play! deps on the classpath.
Before running any command, let's modify some sbt conf files so that sbt-idea is able to create an IDEA module for our project.
Play 2.0 comes with a default configuration for sbt, so those files are already present under the project folder:

Build.scala : contains the app information
build.properties : contains the sbt version
plugins.sbt : contains default resolver and play 2.0 RC1 Snapshot deps

Starting from there, three actions are required in order to import sbt-idea.
Create build.sbt
We have to create a new file named build.sbt in which you'll add Typesafe (the Scala company) as a searchable repository.

resolvers += Classpaths.typesafeResolver

Update the plugins.sbt
Add the needed reference to the plugin sbt-idea and add it to the plugin list of sbt.

resolvers ++= Seq(
    DefaultMavenRepository,
    Resolver.url("Play", url("http://download.playframework.org/ivy-releases/"))(Resolver.ivyStylePatterns),
    "Typesafe Repository" at "http://repo.typesafe.com/typesafe/releases/"
)

addSbtPlugin("play" % "sbt-plugin" % "2.0-RC1-SNAPSHOT")

resolvers += "sbt-idea-repo" at "http://mpeltonen.github.com/maven/"

addSbtPlugin("com.github.mpeltonen" % "sbt-idea" % "0.11.1-SNAPSHOT")

libraryDependencies += "play" %% "play" % "2.0-RC1-SNAPSHOT"

Update the Build.scala
Add the sbt-idea plugin repository in the resolvers' list.

import sbt._
import Keys._
import PlayProject._

object ApplicationBuild extends Build {

  val appName = "playbasket"
  val appVersion = "1.0"

  val sbtIdeaRepo = "sbt-idea-repo" at "http://mpeltonen.github.com/maven/"

  val appDependencies = Seq(
  )

  val main = PlayProject(appName, appVersion, appDependencies, mainLang = SCALA).settings(
    // Add your own project settings here
    resolvers ++= Seq(
      sbtIdeaRepo
    )
  )

}

Create IDEA module

Now everything is quite simple. Get back to the project root and run play. Once in the console, the module will be built by simply running idea.
Having the iml created, all you have to do is open IDEA, create a project and import the module file right after.
Note: if you encounter problems with the scala environment (that happens the first time), you'll have to create another (short-lived) module, then add the Scala facet to it, where you'll be able to configure the Scala compiler.
When done, you can delete that module and, getting back to the original one, you'll now be able to refer to the Scala compiler/libraries.

Let's move to Json and Play

Neo4J with Scala Play! 2.0 on Heroku (Part 2) :: Neo4J

2012-02-19T21:58:00.000+01:00

Neo4J

This post is a continuation of this post, where I've introduced what this series is dedicated to.
In the current post, we'll talk a bit about Neo4J and why I've considered it as a back end service for storing data in one of my latest spikes.

Some words on Graph DB

A graph database is a kind of NoSQL database that stores data neither as KVP, nor as Columns, nor as Collections of documents, but as... a Graph.
OK, OK, it's quite obvious, but what more is there to say? Maybe that its querying is very interesting because it relies on a notion of traversal, which would require joins upon joins in a classical RDBMS.
The main purpose of such graph storage is highly inter-connected data, as Social data are.
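To picture what a traversal is, here is a tiny sketch on an in-memory adjacency map (the names and the data are made up, and this is of course not how Neo4J stores things). Each extra hop is just one more step of the loop, where a relational model would typically need one more self-join.

```scala
object Traversal {
  // Adjacency list: who knows whom.
  val knows: Map[String, Set[String]] = Map(
    "alice" -> Set("bob"),
    "bob"   -> Set("carol", "dave"),
    "carol" -> Set("erin")
  )

  // Everyone reachable from `start` in at most `depth` hops (start excluded).
  def reachable(start: String, depth: Int): Set[String] = {
    @annotation.tailrec
    def loop(frontier: Set[String], seen: Set[String], d: Int): Set[String] =
      if (d == 0 || frontier.isEmpty) seen
      else {
        // Follow one more hop from the current frontier, skipping known nodes.
        val next = frontier.flatMap(n => knows.getOrElse(n, Set.empty[String])) -- seen
        loop(next, seen ++ next, d - 1)
      }
    loop(Set(start), Set(start), depth) - start
  }
}
```

Asking for the friends of friends of alice is then Traversal.reachable("alice", 2).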

Neo4J

I quickly came across this product for handling my test domain model, which represents the social network use case, where users are connected to users and participate in groups.
Neo4J is written in Java, whereas a lot of NoSQL databases are C++ based. Moreover, where most NoSQL databases require a standalone installation to work, Neo4J is able to run as an embedded database instance (it looks like the Jetty revolution for servlet containers).
Finally, my choice was settled when I saw its RESTful interface; it is still a work in progress but promising, with auto-discovery of url patterns using the service root response.
Finally finally (I promise), I saw that the Neo4J community was huge and that Spatial was already taken into account (GIS has highly inter-connected data, and it'll probably be my next spike). It also makes good use of Lucene as the back end indexing provider.

Install It, Start

Quite simple, and I won't expand too much on the topic beyond pointing to the downloadable archive: http://neo4j.org/download.
Drop the extracted folder wherever you want, and set your PATH to target the Neo4J/bin folder, where the neo4j executable is located.
When it's done, two choices are up to you:

install it as a service: http://docs.neo4j.org/chunked/snapshot/server-installation.html
run it by hand by simply doing the following in a console: $> neo4j start

Sanity check...

The server will listen on localhost:7474/. Using your browser, you'll be directly redirected to the webadmin interface.
Since I could talk at length about this web admin, I'll just introduce some features I loved.

... WebAdmin (is your friend)

The Neo4J web admin interface offers a way to query your graph very easily through a simple string using tags (like a Lucene query string) in order to match nodes, relationships, paths, indexes and so on.
So far so good, and? OK, it presents the result in an editable table...
Mmmmh, exciting... OK, and you can see your graph using their arbor.js based viewing tool. Ha ha!
The other tool I like is the web based console for trying Cypher or Gremlin queries, or even the HTTP REST interface.

Let's create the project and configure IDEA

Neo4J with Scala Play! 2.0 on Heroku (Part 1)

2012-02-19T21:54:00.000+01:00

Neo4J with Scala Play! 2.0 on Heroku

In this new series of posts, I'll try to gather all the steps of a spike I did, building a prototype using scala and a graph database.

Chosen Technologies

Play! Framework as the web framework, in its 2.0 version built from sources.
Neo4J as the back end service for storing graph data.
Scala for telling the computer what it should do...
Here is an overview of what will be covered in the current series.

Play 2.0 Framework

The intent of this post is not to explain how powerful the Play framework (2.0) is. For that I'd recommend this wiki page.
However, we'll explain all the steps needed to build it from sources.
When I first wrote this post, the released version wasn't sufficient for what I needed to do with Neo4J. But now you could just download the RC2 and unzip it somewhere, find it here.

Prerequisites

In this section, we'll assume that you've already set up your scala and git environment (oh yeah, and the JDK as well, not only the JRE! we'll need javac). If not, please refer to those sites:

http://www.scala-lang.org/node/201
http://help.github.com/set-up-git-redirect

G[e/i]t sources

First of all, open your preferred Git tool and retrieve the sources (warning: choose a unix-like path, otherwise you might encounter problems, with spaces for instance).

Use git clone git://github.com/playframework/Play20.git and wait until all the sources are downloaded.

First step in sbt

Seconds later, open a console and do the following to run the build tool used by Play 2.0, that is, sbt.
cd Play20/framework
build

This will launch the embedded sbt (0.11.2), which automatically fetches the libraries it needs.

Build and fetch

While in the sbt console, you can now ask sbt to build the framework and fill the local Ivy repository with the needed libraries (Play2.0 runtime deps).

Enter build-repository in the console and hit enter.

Minutes later, you'll be able to quit the console with CTRL+D and check what happened in your Play20 folder.

Actually, beside the framework folder, you now have a folder named repository that contains every needed dep (including play).

Let's check by listing all the files in play: ls Play20/repository/local/play and find libraries such as play, anorm and template for scala version 2.9.1.

Done!

Great!

You’ve just finished the Play2.0 installation.
You can, for convenience, update your PATH to point to the Play20 folder (where the play executable resides).

Let’s move to the Neo4J setup