Thursday, 20 January 2011

Slicing the Cake

If you are interested in Scala, you've probably heard of the Cake pattern. If not, there are good sources available online.

The Cake pattern is often considered the most idiomatic way to do Dependency Injection in Scala. Yet, surprisingly, until now nobody (to my knowledge) has stated publicly that it is actually a composite pattern, and that DI is only one part of it.
As a result, people who just want DI apply the Cake pattern in its full complexity. Usually that complexity is totally unnecessary and only makes the code harder to understand.

Some people have already noticed that something is wrong with that. I also had a gut feeling that something was a bit too clever for me when we were starting a messaging gateway project for a GSM operator in Scala. Therefore we stuck with old-fashioned yet proven manual constructor injection, something along the lines of DIY-DI. Although people have criticized the pattern, nobody has stated precisely what the problem is. Neither had I, until recently. Just a few days ago it struck me. I did a little research, re-read the original paper, and got it. Let me slice the Cake into its constituent layers to see what it has inside. Then you will clearly see what the problem is with the pattern's common usage.

Self-types as a means to inject dependencies
What is the heart of Dependency Injection? With DI, components neither look up nor create their own collaborators. They simply declare what their dependencies are and rely on external code (usually a framework or a manual factory) to provide those dependencies. A component can declare its dependencies in many ways, chosen by the developer and constrained by the DI framework used. For example, dependencies can be injected via the constructor, via setters, or via private fields. The actual injection is done either manually (in a factory class) or by a framework, which is usually configured with a mix of annotations, XML, plain code and some defaults.
The main benefit of DI achieved this way is the freedom to test individual components in isolation. As a bonus, you also get an explicit view of all components' dependencies and bring order to component lifecycles, not to mention additional benefits offered by some frameworks, like AOP.
Scala adds one more way for a component to declare its dependencies: self-type annotations.
Technically, declaring the self-type of a trait (let's say BuyerAgent) as (let's say) WebCrawler is similar to declaring all members of WebCrawler in that trait. The self-type (in this case, type WebCrawler) becomes a dependency of BuyerAgent, because we can now refer to WebCrawler's members from BuyerAgent's methods. BuyerAgent can then only be instantiated with an implementation of WebCrawler mixed in.

import java.net.URL

trait WebCrawler {
  def goTo(target: URL): Unit
  def pageSource: String
}

trait BuyerAgent { this: WebCrawler =>

  def buyItem(): Unit = {
    // all WebCrawler members are accessible
    // thanks to the declared self-type
    goTo(new URL("http://ebay.com"))
    val ps = pageSource
    // if the item price is ok, click "buy"
  }
}

Note that this dependency is, in a sense, more intimate than one declared via a constructor parameter and kept in a private field. Our example trait does not HAVE a WebCrawler accessible via a field; it IS the WebCrawler. In practice, this lets you skip references to a WebCrawler field when using WebCrawler's features (because there is no field in the first place).

import java.net.URL

/* Traditional, constructor-based DI
   (note: in Scala 2, only classes can have constructor parameters) */
class BuyerAgent(crawler: WebCrawler) {

  def buyItem(): Unit = {
    // all references to WebCrawler members
    // are explicit
    crawler.goTo(new URL("http://ebay.com"))
    val ps = crawler.pageSource
    // ...
  }
}

This is both good and bad, in my opinion. It works best when dependencies declared this way really model parts or layers of the component (trait). Used this way, self-types let you conveniently split one big component into many independently testable and replaceable layers, while stressing the tight connection between them. Extracting them into separate classes would make them look more independent than they are. In my opinion, however, self-types can also be abused, when a component uses them to obtain a dependency that certainly IS NOT part of it. That is probably largely a matter of taste.
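
To illustrate the layering idea, here is a minimal sketch with hypothetical names (PriceLayer, DecisionLayer and TablePriceLayer are not from any library): one buyer component split into two layers joined by a self-type, with the price layer swapped out for a test.

```scala
// Hypothetical layers of one "buyer" component, glued together with a self-type.
trait PriceLayer {
  def priceOf(item: String): BigDecimal
}

trait DecisionLayer { this: PriceLayer =>
  // This layer calls the price layer's members directly, as if they were its own.
  def shouldBuy(item: String, budget: BigDecimal): Boolean =
    priceOf(item) <= budget
}

// A stand-in "production" price layer backed by a fixed table (assumption for the sketch).
trait TablePriceLayer extends PriceLayer {
  private val prices = Map("book" -> BigDecimal(12), "laptop" -> BigDecimal(900))
  def priceOf(item: String): BigDecimal = prices(item)
}

// The whole component is assembled by mixing its layers together.
class Buyer extends DecisionLayer with TablePriceLayer

object LayersDemo {
  // In a test, the price layer is replaced without touching DecisionLayer.
  val cheapWorld: DecisionLayer = new DecisionLayer with PriceLayer {
    def priceOf(item: String): BigDecimal = BigDecimal(1)
  }
}
```

Each layer stays a separate, replaceable trait, yet the self-type makes it clear that DecisionLayer is meaningless without some PriceLayer underneath it.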

So, going back to the Cake, what does it have to offer in terms of DI? Idiomatic usage of self-types. What other Cake features do you need to implement DI? Absolutely none. If you applied the whole Cake only to get DI, you can safely remove its other elements. Your code will immediately gain clarity and simplicity, to the benefit of you and your team.
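
A minimal sketch of that claim, assuming a hypothetical crawler/buyer pair (all names are illustrative): the same wiring written once in the full Cake shape (component traits, nested classes, abstract vals and a registry object) and once with self-types only.

```scala
// Full Cake shape, used here only to obtain DI.
object FullCake {
  trait CrawlerComponent {
    val crawler: Crawler
    class Crawler {
      def fetch(url: String): String = s"<html>$url</html>"
    }
  }

  trait BuyerComponent { this: CrawlerComponent =>
    val buyer: Buyer
    class Buyer {
      // reaches the crawler instance through the enclosing component
      def pageLength(url: String): Int = crawler.fetch(url).length
    }
  }

  // The "registry" that wires everything together.
  object Registry extends BuyerComponent with CrawlerComponent {
    val crawler = new Crawler
    val buyer = new Buyer
  }
}

// The same dependency expressed with self-types only.
object SelfTypesOnly {
  trait Crawler {
    def fetch(url: String): String
  }

  trait HtmlCrawler extends Crawler {
    def fetch(url: String): String = s"<html>$url</html>"
  }

  trait Buyer { this: Crawler =>
    def pageLength(url: String): Int = fetch(url).length
  }

  // Wiring: just mix the pieces together.
  object App extends Buyer with HtmlCrawler
}
```

Both versions compute the same result; the second one simply has no nested classes, abstract vals or registry to read through.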

So, what are the other Cake constructs (like nested types) for?

Two things.
Preventing unwanted interchange of a component's parts
First, nested classes in Scala work slightly differently than in Java. If you create more than one instance of your component, you won't be able to interchange their inner parts. For example, if your component has an inner class named Part, you won't be able to take a Part from one instance and pass it to a method of another instance (even though that would be no problem in Java, since the types look the same). This is because in Scala each instance's inner class is a distinct, path-dependent type. Such a constraint was put to use in the Scala compiler's own code. If you can benefit from such protection, Cake's nested classes are for you. Personally, I haven't seen a need for it yet (I see no problem in mixing inner parts between different instances), though I would be interested to see a case where this feature makes a real difference. Maybe in the Scala compiler code it really did; I don't know. It just doesn't attract me enough to make me find out for myself.
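
A small sketch of this constraint, with a hypothetical Component/Part pair: each instance's Part is a distinct path-dependent type, so the cross-instance call is rejected at compile time (it is left commented out so the block compiles). A type projection, Component#Part, is the explicit way to accept parts from any instance.

```scala
// Hypothetical component with an inner class.
class Component(val name: String) {
  class Part(val label: String)

  def newPart(label: String): Part = new Part(label)

  // Accepts only Parts created by *this* Component instance.
  def install(p: Part): String = s"$name installed ${p.label}"

  // The type projection Component#Part opts out of the restriction.
  def installAny(p: Component#Part): String = s"$name installed foreign ${p.label}"
}

object PartsDemo {
  val a = new Component("a")
  val b = new Component("b")

  val ok: String = a.install(a.newPart("wheel"))
  // b.install(a.newPart("wheel"))  // rejected by the compiler: an a.Part is not a b.Part
  val viaProjection: String = b.installAny(a.newPart("wheel"))
}
```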

Modeling families of types that vary together covariantly
Second, nested classes have access to all members of the enclosing trait. If you move them outside the enclosing trait, that access is lost. In the original paper, the two nested classes of the SubjectObserver type refer to its common type members: the type of Subject and the type of Observer. Both type members are abstract, with upper bounds. This way, when you extend SubjectObserver and refine the type members, the refined types immediately propagate to both nested classes, and the compiler checks that they stay consistent. This lets you enforce some domain constraints when you design reusable components for others. In the SubjectObserver example, it forces all extending types to refer to each other in a consistent manner.
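
A minimal sketch of such a family, loosely following the paper's SubjectObserver. S and O are abstract type members bounded by Subject and Observer; the Sensor/Display family is hypothetical, and the paper's notify method is renamed notifyChange here to avoid overloading java.lang.Object.notify.

```scala
abstract class SubjectObserver {
  type S <: Subject
  type O <: Observer

  trait Subject { self: S =>
    private var observers: List[O] = Nil
    def subscribe(obs: O): Unit = { observers = obs :: observers }
    // Thanks to the self-type, `this` is an S and can be handed to observers.
    def publish(): Unit = observers.foreach(_.notifyChange(this))
  }

  trait Observer {
    def notifyChange(sub: S): Unit
  }
}

// One concrete family: refining S and O updates both nested traits at once.
object SensorFamily extends SubjectObserver {
  type S = Sensor
  type O = Display

  class Sensor(val label: String) extends Subject {
    var value: Double = 0.0
    def changeValue(v: Double): Unit = { value = v; publish() }
  }

  class Display extends Observer {
    var shown: List[String] = Nil
    // The parameter is the refined S (Sensor), so no casts are needed.
    def notifyChange(sub: Sensor): Unit = { shown = s"${sub.label} = ${sub.value}" :: shown }
  }
}
```

Because Display.notifyChange receives a Sensor rather than a plain Subject, it can read label and value directly, and the compiler checks that both sides of the family agree.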

So, besides DI, we have two distinct features of the Cake pattern. Each of them lets us enforce some design decisions with the help of the compiler. In one case, this is increased integrity of component instances; in the other, the ability to model type families that change together. Neither of these features is as widely popularized as the Dependency Injection aspect, and they probably deserve better exploration.

Nevertheless, Cake is a composite pattern, and DI constitutes just one part of it. If you consider using it to get DI, use only self-types and you'll be better off. The nested-classes part of Cake will only obscure your code when you don't plan to leverage its specific features. As you can see, you probably won't need them most of the time, so it's much better to keep things simple.

Why is the pattern most often applied in its full form, when usually only DI matters? I think many factors come into play here: many new concepts at once after coming from Java, and insufficiently thorough understanding. Whatever the reasons are, we are usually much better off sticking to the KISS rule, though :)

Originally I planned to add code samples that would easily show the similarities and differences between DI implemented with the Cake pattern and with plain old constructor injection. Because I'm kind of new to blogging and don't want to spend additional time integrating syntax highlighting now, I'm publishing this post without them. If you are interested, I will add them.
Please share your thoughts on this topic; I'll gladly learn something new and discover new points of view. BTW, if you are a native English speaker, I'll appreciate any corrections of my sometimes odd grammar and vocabulary ;) Thanks!

8 comments:

  1. I think code samples help a lot to understand what you have in mind :). Especially on more complex topics.

    As for the self-types usage, I think the whole purpose of the MyServiceComponent with a nested MyService class is to achieve proper namespacing. That is, when you depend on a component, you don't want to get all of the service's members; you want to have access to the service class.

    Moreover, there's also the question of which instance of MyService these methods would refer to (if they are all pulled in). The method that obtains the instance can create a new instance, look up a scoped instance (in the case of web apps), create an instance based on arguments, etc.

    Also there's the good question "what is DI" ;).

    Adam

  2. @adamwtw Thanks for your feedback; I hope my samples will help you see my point.
    Is the question about the definition of DI really that good? I'm using the obvious definitions from http://en.wikipedia.org/wiki/Dependency_injection and http://martinfowler.com/articles/injection.html here.

    In what sense is the namespacing from your example (http://www.warski.org/blog/?p=291) "proper"? It's fairly arbitrary to me. You could easily change all of your inner types into top-level ones with only a minor adjustment: members obtained via the self-type become constructor parameters of the top-level types, for example:

    class UserAuthorizationImpl(userRepository: UserRepository) {
      /* previously, constructor
         was parameterless */
    }

    Then you have no namespacing at all, while still having all the access you need (and no other unnecessary members).
    The namespacing in Cake serves completely different purposes; I've mentioned them in the post. It is unnecessary in your example; in fact, it is harmful, because it obscures the code.

    Your question about which MyService instance is used is unrelated to namespacing concerns: you have total freedom of implementation choice regardless of whether you use the full Cake pattern or only self-types.

  3. Hmm, well, DI is really:
    1. using abstraction in the form of constructor parameters/setters/etc.
    2. (more psychological) remembering not to use "new" too much

    What is really nice about DI are the containers, which give you features like scoping, auto-wiring, AOP, interceptors, etc.

    For me, the *Components + self-types in Cake are just a means to get auto-wiring. And a pretty constrained one, as run-time configuration options are nearly zero.

    So changing the classes to use constructors would really be a pain to maintain once the number of components gets larger.

    Adam

  4. @adamwtw
    If you use DI only to be able to use DI containers, you miss its biggest benefit. The key value of DI is separating object-wiring logic from application logic, so you can test them independently. DI gives you that whether you use containers or not.

    Used wisely, containers can do you a lot of good, as they are powerful tools. It's a double-edged sword, though. They can easily cause more problems than they solve, especially when you strive for run-time flexibility (most often not needed, hence the current trend to use annotations and static code instead of XML).
    It's also quite common that a DI framework actually makes testing an application harder than with POFs (Plain Old Factories), where you have the full spectrum of language features (including real polymorphism) at your command. In addition, you need no special IDE plugins to handle POFs with ease.

    So in some settings DI containers are not the best available option.
    The Cake pattern (or, strictly speaking, self-types) is most useful in those settings. Comparing it with DI containers misses the point.
    So I agree with your note on run-time configuration options: it's not what Cake is for. I'd go further: even the auto-wiring you mention is not as "auto" as it may appear. Note that you have to choose your dependencies at compile time, by mixing them in with the "with" keyword. All run-time flexibility is gone: disaster!

    In fact, you can still get all the run-time flexibility you might need, but as with other features of DI containers, you're on your own. Cake will not force any specific choice on you. Nor will it help you achieve it; it just won't get in your way.
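
    For example (a sketch with hypothetical RealCrawler and OfflineCrawler implementations), the mix-ins are fixed at compile time, but plain code can still choose at run time which combination to instantiate:

```scala
trait WebCrawler {
  def pageSource(url: String): String
}

// Hypothetical implementations; RealCrawler only pretends to fetch a page.
trait RealCrawler extends WebCrawler {
  def pageSource(url: String): String = s"fetched $url"
}

trait OfflineCrawler extends WebCrawler {
  def pageSource(url: String): String = "<cached page>"
}

trait BuyerAgent { this: WebCrawler =>
  def inspect(url: String): String = pageSource(url)
}

object RuntimeChoice {
  // Each combination is checked by the compiler, but which one gets
  // instantiated is decided by an ordinary run-time condition.
  def makeAgent(offline: Boolean): BuyerAgent =
    if (offline) new BuyerAgent with OfflineCrawler
    else new BuyerAgent with RealCrawler
}
```

    The flag could come from a config file or a command-line argument; Cake neither helps nor hinders that choice.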

    The main point of my post, however, is not comparing Cake with DI containers. I have pointed out that Cake is actually a composite pattern, and if you want only DI, you don't need all the funny (and/or confusing) namespace stuff. Of the whole Cake, only self-types are required for DI. People often don't see that, which leads them to write unnecessary, confusing code. Unfortunately, this applies even to people whose blogs I hold in high regard.

    To assess whether manual DI or container-supported DI is better for you, see the relevant post on Miško Hevery's blog.

    When POFs are better in your context, Scala makes them significantly easier to maintain thanks to, among other things, its shorter syntax (no need to write the stupid this.a=a; this.b=b; this.c=c constructor stuff, for example), lazy vals and self-types.
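
    A minimal POF sketch along those lines, with hypothetical HttpClient, WebCrawler and BuyerAgent classes: constructor parameters need no assignment boilerplate, lazy vals create each object once and make declaration order irrelevant, and a test factory overrides a single definition.

```scala
// Hypothetical application classes: no this.a = a boilerplate needed.
class HttpClient(val timeoutMillis: Int)
class WebCrawler(val client: HttpClient)
class BuyerAgent(val crawler: WebCrawler, val budget: BigDecimal)

// A Plain Old Factory: all wiring lives here, not in the classes above.
class AppFactory {
  // Lazy vals are created once, on first use, so order below does not matter.
  lazy val buyer: BuyerAgent = new BuyerAgent(crawler, BigDecimal(100))
  lazy val crawler: WebCrawler = new WebCrawler(client)
  lazy val client: HttpClient = new HttpClient(timeoutMillis = 5000)
}

// A test configuration overrides only what it needs.
class TestFactory extends AppFactory {
  override lazy val client: HttpClient = new HttpClient(timeoutMillis = 10)
}
```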

  5. Hmm, somehow I'm not getting e-mail notifications for new comments...

    Anyway, I never wrote that I'm using DI only to use DI containers, just that containers are a very nice addition, especially in bigger projects. They simply spare typing, and I'm no different from the average programmer: lazy :).

    Can you give an example where testing is harder with a DI container than with a POF? I can't imagine how it differs in code, probably because I don't understand exactly what you mean.

    About the Cake, that you don't need the namespacing: agreed (I never wrote I didn't agree ;) ). In fact, you don't need self-types either (but then it's not Cake); some form of interfaces/types and parameterized construction is enough. As I wrote, DI is simply using abstraction, a kind of approach to programming.

    Also, I think that auto-wiring and run-time/compile-time impl choice are orthogonal.

    And btw. what's "real polymorphism"?

    Adam

  6. @adamwtw
    Agreed, I should have written "If you'd use DI..." instead of "If you use DI...". I just wanted to stress the important distinction between the core benefit of DI and its minor, although nice, additions.
    Containers can indeed save you typing and time, though sometimes the opposite is true; it depends on how you use them (this probably deserves another blog post).

    POFs make testing easier because you can extend a factory, selectively overriding chosen definitions, as in:

    class AppFactory {
      /* ... */
      def buyer = new BuyerAgent(crawler)
      def crawler = new WebCrawler()
      /* ... */
    }
    class CustomFactory extends AppFactory {
      override def crawler = new MockCrawler()
    }

    You can do that very quickly each time you need a custom configuration, while having full compiler and IDE support for potential future changes.
    Try to get such flexibility with the popular Spring XML approach, even with context splitting, includes and classpath tricks. I deliberately don't mention Guice and Spring's recent features, because they (finally) make it good enough. Unfortunately, the old XML way is still much too common.

    By the term "real polymorphism" I wanted to express the idea that a certain combination of Spring's features gives you a kind of "emulated polymorphism". For instance, that's when you split a large applicationContext.xml into two pieces and then provide a modified version of one of them on the test classpath.

    Of course you don't need self-types to do DI. And I agree that without self-types it's no longer the Cake pattern. Similarly, it's no longer Cake if you omit the namespacing. That's not the point; let's leave Cake as it is.
    What troubles me more is that people tend to use it for DI without reflecting on the forces involved, making suboptimal choices when simpler alternatives fit better.
    Maybe that's because the pattern lacks a description in the kind of structured format that accompanies the popular GoF patterns? My blog post tries to make up for that a bit.

    Speaking of namespaces, thanks to our conversation I've realized that I was too quick to dismiss your point about saving keystrokes on constructor stuff. Of course there's much less typing when you nest a class inside a trait and use the declared dependencies inside it. That is a third benefit of nesting types in Cake, and I should probably update my post to reflect it.
    Keeping that in mind, there's still no excuse to nest types when it doesn't make sense. For example, nesting a top-level, independent trait inside another trait just to follow the pattern's overall scheme completely misses the point.
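
    A sketch of that keystroke saving, borrowing the UserRepository/UserAuthorization names from the linked example as hypothetical types: the nested class reads the dependency from its enclosing component, while the top-level equivalent has to take it as a constructor parameter. With one dependency the difference is small; it grows with each additional dependency.

```scala
// Hypothetical dependency.
trait UserRepository {
  def exists(user: String): Boolean
}

trait UserRepositoryComponent {
  val userRepository: UserRepository
}

// Nested style: the inner class reads `userRepository` from the enclosing
// component, so its constructor stays parameterless.
trait UserAuthorizationComponent { this: UserRepositoryComponent =>
  class UserAuthorization {
    def canLogIn(user: String): Boolean = userRepository.exists(user)
  }
}

// Top-level style: the same dependency has to be passed in explicitly.
class StandaloneUserAuthorization(userRepository: UserRepository) {
  def canLogIn(user: String): Boolean = userRepository.exists(user)
}

object NestingDemo {
  val repo: UserRepository = new UserRepository {
    def exists(user: String): Boolean = user == "adam"
  }

  object App extends UserAuthorizationComponent with UserRepositoryComponent {
    val userRepository: UserRepository = repo
  }

  val nested = new App.UserAuthorization
  val standalone = new StandaloneUserAuthorization(repo)
}
```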

    You've correctly noticed that auto-wiring and compile-time/run-time aspects are orthogonal. Worth remembering.

  7. So I guess we agree on most points :) The confusion mainly came from my misunderstanding of the overloaded "factory" etc. terms ;)

    Anyway, interesting post and discussion, thanks! And see you at java4people.

    Adam

  8. You saved me from the Cake monster ;-) Thanks a lot for the sober explanation.
