Cornelia's Weblog

my sporadically shared thoughts on, well, whatever is capturing my attention at the moment.

Posts Tagged ‘cloudfoundry’

Changing quota definitions

Recently I’ve had several customers ask whether quota definitions can be changed, both in the OSS product and in PCF.  The short answer is “yes”. Here’s the longer answer:

First, for those of you who might be unfamiliar with quotas, they are assigned to organizations and basically limit consumption within that organization.  A quota definition sets limits in certain areas, such as memory, number of service instances and so on.  The Cloud Foundry deployment has any number of named quota definitions defined, and an organization is assigned a single quota_definition.

Viewing the defined quota_definitions

Currently the quota definitions for your elastic runtime deployment are only viewable through the command line client.  As you surely know, we currently have two CLIs: the old (v5), which I will henceforth refer to as “cf” or v5, and the new (v6), which I will henceforth refer to as “gcf” or v6.  To see the list of quota_definitions you can use the following commands:

cf -t set-quota

That is, execute the command with no arguments and you will first be shown a list of the available quota definitions.

Or with the new cli, first set the env variable CF_TRACE=true and execute:

gcf quotas

In both of the above cases we’ve run with tracing turned on so that you can see the details of the quota definitions. This will be key as you will need to know the format and key names when issuing curl commands described below.

Setting org quotas

Given existing quota definitions, assigning one to an organization is done very simply with the following (v5 and v6) CLI commands:

cf set-quota --organization myorg --quota-definition trial

or

gcf set-quota myorg trial

The final argument in each of these is the name of an existing quota definition.

Creating, deleting or updating quota definitions

Quota definitions themselves are created in one of two ways: either at the time you deploy the elastic runtime, or post deployment, using the API.

At deployment time

This is done by simply including quota definitions in your deployment manifest properties, for example:

properties:
  quota_definitions:
    free:
      memory_limit: 0
      total_services: 0
    paid:
      memory_limit: 10240
      total_services: -1
    runaway:
      memory_limit: 102400
      total_services: -1

PCF does not yet expose quota definitions in the Operations Manager (deployer); however, the good news is that regardless of how you deploy the Elastic Runtime, you may edit the quota definitions to your heart’s content after deployment.

Post deployment

To create a new quota definition using the API you simply need to issue a POST to the quota_definitions resource.  The simplest way to do this is with the (v5) cf curl command, as this will include your current token (obtained with a cf login) in the request; if you wish to use your favorite REST client you will have to futz with the auth token yourself.  Note that v6 does not yet (!) support gcf curl.

To craft the appropriate request you will need:

  • The method: POST
  • The relative path of the resource: /v2/quota_definitions
  • The body which will contain the details of the new quota definition:
    {"name": "default2",
     "non_basic_services_allowed": true,
     "total_services": 100,
     "total_routes": 1000,
     "memory_limit": 10240,
     "trial_db_allowed": true}

Putting it all together, the request looks like this:

> cf curl POST /v2/quota_definitions -b '{"name": "default2",
                                          "non_basic_services_allowed": true,
                                          "total_services": 100,
                                          "total_routes": 1000,
                                          "memory_limit": 10240,
                                          "trial_db_allowed": true}'

And, voila!, you have your new quota definition.  Go ahead and check.
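
One way to check is to list the quota definitions through the same API; for example, the GET flavor of the cf curl calls used above (a minimal sketch, using the v5 cli):

cf curl GET /v2/quota_definitions

The response should include the new “default2” definition alongside the existing ones.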

Updating an existing quota definition is just as easy.  Recall from above that you can use the v5 or v6 commands to get a list of the current quota definitions, with the details included when tracing is turned on.  You need to find the relative path of the specific quota definition resource by looking at those details.  Then craft the appropriate request:

  • The method: PUT
  • That relative path: /v2/quota_definitions/286b4253-95de-442e-a2e8-d111c2adb2e2
  • The body; simply include the properties you wish to change:
    {"memory_limit": 20480}

Putting it all together, the request looks like this:

> cf curl PUT /v2/quota_definitions/286b4253-95de-442e-a2e8-d111c2adb2e2 -b '{"memory_limit": 20480}'

In response you will get the updated quota definition:

{
  "metadata": {
    "guid": "286b4253-95de-442e-a2e8-d111c2adb2e2",
    "url": "/v2/quota_definitions/286b4253-95de-442e-a2e8-d111c2adb2e2",
    "created_at": "2014-01-14T19:26:43+00:00",
    "updated_at": "2014-01-14T19:28:43+00:00"
  },
  "entity": {
    "name": "default",
    "non_basic_services_allowed": true,
    "total_services": 100,
    "total_routes": 1000,
    "memory_limit": 20480,
    "trial_db_allowed": true
  }
}

We are planning to add cli commands to facilitate this (stories: here, here, here and here), and are adding these capabilities to PCF Operations Manager, but in the meantime you have at least this level of control.

And one more note – we are just about to officially release the v6 cli (it’s been in beta), at which point we will rename it to “cf” – so just to keep you on your toes, in my next post on the subject the new cli will be v6 and will be called “cf”. ;-)

Running a Local (as in, on your laptop) Cloud Foundry Instance

More than two years ago on the CF blog the availability of Micro Cloud Foundry was announced – this was a version of Cloud Foundry that would run on a laptop.  It came in the form of a vmdk that you could run using either VMware Workstation or VMware Fusion.  Later posts talked about a release lifecycle that would provide frequent updates to the Micro release, so as to keep it in synch with the constantly-being-updated public offering running on cloudfoundry.com. It was an interesting experiment that had legs for more than a year, but didn’t end up panning out all that well; I can share a personal anecdote that illustrates one of the reasons. More than a year ago, when I was with EMC, I was working with some product groups that wanted to use CF – not via cloudfoundry.com, but rather with a customized version of CF.  They deployed a private cloud in their data center but then wanted to enable their developers with a matching “micro” instance running on their personal workstations.  We looked into the process of getting a customized version of Micro CF but in the end the answer was convoluted enough to keep us from making any progress. As it happens, we weren’t alone.

And now, here we are on the verge of our first supported release and I’m happy to report two way cool things.  First, there are now numerous, very viable options for getting a local CF instance up and running, and second, the CF community has become so vibrant that two of the three options I’ll describe have come from outside of Pivotal. In this post I’ll share with you my personal experiences with and opinions on these three options.

I’ll go through these in the order in which they appeared on the scene, which is, incidentally, the order in which I began using them.

CF Vagrant Installer from Altoros

In April of this year the good folks at Altoros released the cf-vagrant-installer.  This project required that you have Vagrant and a hypervisor (no license fees if you use VirtualBox), as well as Ruby, installed on your host machine (laptop); you could then clone the git repo and run the “vagrant up” command.  It worked really well if you timed it just right, but because the standing up of the CF components was done via Chef scripts that are part of the project, keeping the installer up to date with the CF releases has proven challenging.
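
For the record, the basic flow was just what the paragraph above describes; this is a sketch from memory, so treat the project’s README as authoritative for the repository location and any additional ruby/rake setup it required:

git clone https://github.com/Altoros/cf-vagrant-installer.git
cd cf-vagrant-installer
vagrant up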

This was the first of the three options that I used earlier this year and I was incredibly grateful to have it.  That said, because of the aforementioned challenges in keeping this in synch with the evolving CF releases, and dwindling activity in the project, I no longer recommend it.

CF Nise Installer from NTT Labs

Just a month after the release from Altoros, NTT Labs released the cf_nise_installer.  The install can be run with the help of vagrant, or you can create an Ubuntu 10.04 machine yourself (virtual or physical) and run two commands WITHIN that VM.  You don’t need to install ruby on your host (really nice if your laptop is a Windows box) – in fact, if you run the installer in your own Ubuntu machine then there are absolutely NO dependencies on software running natively on your laptop.  The basic mechanics of how the Nise installer does its thing are that it leverages a BOSH emulator, also created by NTT Labs, and the install then 1) clones the cf-release repository (which is the official release vehicle for the OSS Cloud Foundry), 2) creates a release using the BOSH cli and 3) deploys that release onto the VM via this BOSH emulator, called Nise BOSH.  Because the deployment leverages cf-release directly, it doesn’t get out of synch with the CF release – that is, it IS the CF release.  By default the cf_nise_installer will install the latest “final build” as found in the releases directory.  It’s true that the BOSH emulator has to keep up with changes to BOSH and the deployment manifest must also be updated, but yudai is remarkably responsive and it’s a rare day when someone reports an issue on one of the mailing lists and doesn’t have a resolution from him in just a couple of hours (I’m honestly not sure that the guy ever sleeps ;-) ).

In spite of the third tool I am just about to describe, the cf_nise_installer remains the option I most often recommend for standing up a local cloud foundry quickly, because, well, it just works – and in about 30 minutes.

BOSH-lite

While the commit history dates back to June of this year, it was more like August when bosh-lite matured enough for consumption.  This project was initiated from within Pivotal and, most simply put, it is an implementation of a Warden CPI for BOSH. If you’re not already thinking this, let me seed it in your consciousness…

A warden CPI. Oh, now, That. Is. Cool!

BOSH is a product that allows you to deploy and manage complex distributed applications, like the Cloud Foundry Runtime and Services, over the top of any IaaS.  BOSH just needs to be able to provision virtual machines and then it lays things down onto those machines and monitors them too.  We’ve had support for VMware, OpenStack (provided in partnership with Piston Cloud) and AWS for some time, and NTT recently announced one for CloudStack (and there are more in the works too). Warden is a system that serves up lightweight Linux containers – that is, super-lean virtual machines.  The CF runtime provides application isolation by hosting deployed applications inside of warden containers, and now bosh-lite uses that same technology for the VMs into which CF components are deployed.

With a Warden CPI for BOSH, you now have the ability to run full BOSH on your laptop, and hence you can do a full BOSH deployment of CF on your laptop.  I know I said it already but, whoa, that is way cool.

That said, there are a few gotchas.  First, if you clone the repo today you will find that by default, the vagrant config uses 6GB of RAM – pretty hefty (I have been able to do it with 3 or 4 Gig).  You will also need to install a number of things on your laptop to be able to do the install.  All told, I’ll confess that it took me the better part of a day to sort through all of the dependencies, clean up my laptop sufficiently and get it all up and running.  And I recently witnessed a partner have roughly the same experience.

One of the best things, however, is that once you have bosh-lite installed, then the deployment of CF proceeds in the same manner as it would if you were deploying CF into the cloud. This has the obvious advantage of allowing you to gain necessary experience with an important toolset, BOSH, but that is, admittedly, a fairly significant burden for the newbie.  Another advantage that the bosh-lite approach has over CF Nise is that because the CF components are isolated from one another in the Warden containers, you won’t have issues with various CF pieces bumping into each other – CF Nise cannot, for example, deploy the CF runtime and most of the services on the same box.
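
For orientation, the overall bosh-lite flow looks roughly like the following. This is a sketch from memory, not a substitute for the project’s README, which is authoritative for the director address, stemcell and manifest details:

git clone https://github.com/cloudfoundry/bosh-lite
cd bosh-lite
vagrant up                              # brings up the VM running the BOSH director with the Warden CPI
bosh target 192.168.50.4 lite           # default director address in the Vagrant config (check the README)
bosh upload stemcell <warden stemcell tarball>
bosh upload release <cf-release tarball>
bosh deployment <your cf deployment manifest>
bosh deploy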

To summarize:

  1. I no longer recommend the vagrant installer from Altoros.
  2. If you are looking to get the simplest CF instance stood up on your laptop, don’t need CF services right away (but you might be working on creating your own service broker – that’s okay), then I highly recommend cf_nise_installer.
  3. If you have enough RAM and are looking for a full-blown deployment of CF with services, and/or you are primarily interested in learning BOSH, then go with bosh-lite.

I have options 2 and 3 operational on my laptop and depending on the task before me I choose one over the other.

Canaries are Great!

(cross-posted from the Cloud Foundry blog)

First a little background, and then a story. As Matt described here, Cloud Foundry BOSH has a great capability to perform rolling updates automatically to an entire set of servers in a cluster, and there is a defensive aspect to this feature called a “canary” that is at the center of this tale. When a whole lot of servers are going to be upgraded, BOSH will first try to upgrade a small number of them (usually 1), the “canary”, and only if that is successful will the remaining servers in the cluster be upgraded. If the canary upgrade succeeds, then BOSH will parallelize up to a “max in flight” number of remaining server upgrades until all are completed.
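
For context, the canary and “max in flight” behavior described above is configured in the update block of a BOSH deployment manifest; a minimal sketch (the values here are illustrative, not taken from our production manifest):

update:
  canaries: 1                         # upgrade this many instances first
  canary_watch_time: 30000-60000      # how long BOSH watches the canary before declaring success or failure (ms)
  max_in_flight: 4                    # once the canary passes, upgrade up to this many instances in parallel
  update_watch_time: 30000-60000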

And now the story.

For the last few weeks I’ve been pairing on the Cloud Foundry development team here at Pivotal. I’ve had a chance to work on lots of cool things, I’ve seen the continuous integration (CI) pipeline in action, and today I got to be one half of the pair that did a deploy to production – that is, to run.pivotal.io. As a brief aside, the CI pipeline is way cool, with a number of different systems automatically running test suites and passing tests automatically promoting things. But when it comes to production deploys there is still a person who pushes the “go” button. Of course, just as with anything else we do here at Pivotal, that production process has been tooled so that it really is usually a matter of pushing a metaphorical button. Today, however, we did do a bit of an upgrade to the tooling before we used it.

And we goofed.

Yes, we had tests for the tooling code. Yes, the tests were passing. But when we ran things with the production manifests as input, an authorization token was wrong. BOSH did tell us of the change, but we got a bit overzealous and didn’t catch that change until after we said “yes” to the “are you sure you want to deploy this” question. Once we realized our problem, BOSH was already upgrading our marketplace services broker. Doh.

Could have been a very bad day. But, thanks to canaries, my heart didn’t even skip a beat.

Our production deployment runs multiple instances of the marketplace services broker. Enter the sacrificial canary. When BOSH started the upgrade it only took down one of the brokers, upgraded the bits and tried to restart it. In this case the air in the coal mine was toxic and our little bird did not survive :-( . As a result, we sent no additional souls in, leaving the other service brokers fully functional :-) .

We fixed our problem, pushed the “go” button again and this time the canary came up singing, BOSH upgraded the remaining service brokers and our production deploy was complete.

It was a good day. A really, really good day.

Intel Developer Forum Hackathon

During a conference-rich week two weeks ago, Pivotal was invited to participate in a coding contest that was a part of the Intel Developer Forum (IDF) held at Moscone.  The event hosted 15-20 college students, forming two teams that would compete against an Intel team; the task: build an innovative application for junior high math students. The main thing the applications would be judged on was cloud-readiness, and this is where Pivotal came in – the applications would be deployed to Cloud Foundry.  I had the great pleasure of representing Pivotal, Pivotal Labs and Cloud Foundry at this event and it was a blast!

The event spanned two half days with official coding hours from 11-3 each day, though I heard that an all-nighter might have transpired between the two. The kids (and given that most are roughly my son’s age, I will call them kids ;-) ) almost all came from Contra Costa College, where they have had the great fortune of being taught by the fantastic Professor Tom Murphy, who was also in attendance. These budding programmers had prior experience with HTML 5 and javascript, and the teams included designers as well. Intel, who have been engaged with Cloud Foundry for some time, brought with them some brand new hardware, just announced at the IDF, onto which they deployed Cloud Foundry – that is, they stood up a private cloud for the event.

The teams’ submissions would be judged on criteria including:

  • Design for failure
  • Stateless computing
  • Scale out (not up)
  • Event driven
  • Web services
  • Security
  • Prefer eventual consistency
  • DevOps/NoOps

I’d like to share several reflections:

One of the main things that I learned from the experience is that thinking in terms of distributed systems is not immediate.  Well, duh – of course it is not.  It made me think back to my days in school where the initial classes were introductory programming, data structures and algorithms, and computer architecture.  In chatting with these students, I learned that those are the same courses they have been taking.  I think it’s inarguable that a course in distributed computing MUST be a part of any undergraduate computer science curriculum today (though I’m not sure that it is).  And it needs to come early. When I was in graduate school at IU I often TA’d the first course that computer science majors took, an Intro to Programming course taught in Scheme. This was brilliant because, in part due to the nature of the language, students are taught recursion in week 3 (as opposed to week 13, when I took it in my intro Pascal programming course) and it turns out not to be hard (in week 13 it was hard for a lot of my fellow students). Recursion became a foundation for them, not an add-on. I’d like to see distributed computing be a part of that foundation today.

Because of former training and experience the students all built their applications to run HTML 5 and javascript in the browser. The UIs these kids built were way cool, with spaceships shooting answers at meteors that carried math problems, for example, and because they are standards-compliant, they will work in pretty much any browser.  Major kudos to Tom Murphy for laying this incredibly valuable foundation.

In the bucket of “stateless computing” we explained that state should not be stored within the compute node. While the game play ran entirely in the browser, things like user profiles and high scores would be stored in some database.  Intel had stood up their private PaaS with several choices for persistence, including both relational and NoSQL databases.  It probably won’t surprise you to hear that the Intel team chose a relational store and the student teams both chose MongoDB. I wish I had been able to dig in a bit more on why they made this choice, but by the time the question came up we were singularly focused on getting things working. (If you were on one of the student teams, please post a comment and tell us why you made that choice.)

In the end, neither student team was able to connect their browser-based application to cloud-based persistence (entirely our fault for not anticipating what would be needed here), though they did get the applications pushed into the cloud, which on its own is pretty cool.  The first team to get their app pushed into Cloud Foundry, using the cf cli, was literally jumping up and down and whooping and hollering.  They didn’t have to stand up a VM, load the OS, install an app server and then deploy their app – they just wrote their app and pushed it to the cloud.  That is PaaS.

Finally, I want to extend congratulations to Cathy Spence and the whole Intel team for the event. While I was there I came to understand that Intel does these types of events quite regularly, and while it’s probably a bit tricky to quantify the ROI, there is no question that Intel benefits in terms of positive PR and recruiting.  And the benefit to these students is great!  I’d really love to see my parent companies, EMC and VMware, take a lesson here, and while Pivotal is much smaller and may lack some resources, I’m certain there are ways we can creatively engage in a similar manner.

Tools for Troubleshooting Application Deployment Issues in Cloud Foundry

Our standard demo for Cloud Foundry has us in a directory where either some source code or an application package (war, zip, etc.) is sitting and then we do a

cf push

A handful of messages will appear on the screen, like:

Uploading hello... OK
Preparing to start hello... OK
-----> Downloaded app package (4.0K)
-----> Using Ruby version: ruby-1.9.3
-----> Installing dependencies using Bundler version 1.3.2
Running: bundle install --without development:test --path vendor/bundle --binstubs vendor/bundle/bin --deployment
Fetching gem metadata from http://rubygems.org/..........
Fetching gem metadata from http://rubygems.org/..
Installing rack (1.5.2)
Installing rack-protection (1.5.0)
Installing tilt (1.4.1)
Installing sinatra (1.4.3)
Using bundler (1.3.2)
Your bundle is complete! It was installed into ./vendor/bundle
Cleaning up the bundler cache.
-----> Uploading droplet (23M)
Checking status of app 'hello'....
1 of 1 instances running (1 running)
Push successful! App 'hello' available at http://hello.cdavisafc.cf-app.com

This shows that the application was uploaded, dependencies were downloaded, a droplet was uploaded and the application was started.  And that is all fine and good, but what happens when something goes wrong? How can the application developer troubleshoot this?

The answer is multi-faceted and in this note I will try to organize things a bit.

First, let me list the different tools someone might have at their disposal, and briefly what each offers for app troubleshooting:

  • the cf cli
    • the cf apps command – This should be very familiar; it simply shows you the apps you have deployed and an indication of their health
    • the cf logs command – This will show you the contents of the files found in the logs directory of the warden container; the contents will vary depending on where in the app deployment process you are when investigating
    • the cf files command – This will show you the filesystem contents of the warden container; again, the contents will vary depending on where in the deployment process you are when investigating
  • the bosh cli
    • the bosh logs command – This will tar up and download the files found in the /var/vcap/sys/logs directory on the targeted VM.  In general, the logs from the dea will probably be the most helpful (dea logs and warden logs), with perhaps something of note in the cloud controller logs.
  • ssh into CF VMs
  • wsh (warden shell) into the warden container for the application
    • this is only possible if the application was entirely staged and is up and running. In the event that the application is “flapping,” the warden containers are likely getting killed and recreated on some pretty short interval and it will be hard to get much from wsh-ing in.

Here’s the thing… ultimately your application developer will only have access to the first of these things (the cf cli) and once your cloud is stable, this should be sufficient.  While you are getting the kinks worked out of your PaaS deployment, however, the other tools can be very helpful.  One other thing to note is that if your developers are enabled with some type of micro-cloud foundry on their workstations, then while they may not have bosh, they would be able to ssh into that machine and poke around, for example, getting to the dea logs directly.  I do this all the time on my laptop.
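
To make that concrete, here is the sort of sequence an application developer can run with nothing but the (v5) cf cli; “myapp” is a hypothetical app name and the exact arguments may differ slightly between cli versions:

cf apps                  # is the app running, and how many instances are healthy?
cf logs myapp            # dump the files found in the logs directory of the app's warden container
cf files myapp logs/     # browse the container filesystem directly, e.g. the logs directory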

Okay, so now with this list of tools, I’ve crafted the following diagram to give some guidance on what tools will help when investigating things during different stages of the application deployment process.  There is definitely a bit of a trick to figuring out where in the lifecycle something went wrong, but even trying to use a prescribed tool for something will give you a hint.

App Restarting in Cloud Foundry

A couple of weeks ago, right before going on what turned out to be a glorious vacation in the sun, I stood up a local Cloud Foundry on my laptop using the cf-vagrant-installer from Altoros.  Turns out there was a bug in a couple of the configuration files (pull request has already been merged) which offered a beautiful learning opportunity for me and I want to share.

Here’s what kicked it all off.  I went through the cf-vagrant-install and pushed an app.  Sure enough, it all worked great.  Then I shut down my vagrant machine, started it back up and expected my app would similarly restart, but it didn’t.  Instead, when I ran cf apps it showed my app with 0% of its instances started.

cdavis@ubuntu:$ cf apps
Getting applications in myspace... OK
name    status   usage      url
hello   0%       1 x 256M   hello.vcap.me

Hmm.  Okay, so let me try something simpler – who knows what the cf-vagrant-installer startup scripts are doing, maybe the left hand can no longer see the right after a restart.  So I cleaned everything up, pushed the app and it was running fine.  I then went and killed the warden container that was running the app (a separate post on how I did that coming soon) – and again, the app didn’t restart. And it stayed that way. It never restarted. Yes, it’s supposed to. So I dug in and figured out how this is supposed to work:

There are four Cloud Foundry components involved in the process, the Cloud Controller, the DEA, the Health Manager and NATS.  The Cloud Controller (CC) knows everything about the apps that are supposed to be running – it knows this because every app is pushed through the CC and it hangs on to that information. To put it simply, the CC knows the desired state of the apps running in Cloud Foundry. The DEA is, of course, running the application. The Health Manager (HM) does three things – it keeps an up to date picture of the apps that are actually running in Cloud Foundry, it compares that to the desired state (which it gets from CC via an HTTP request) and if there is a discrepancy, it asks the CC to fix things. And finally NATS facilitates all of the communication between these components. (BTW, Matthew Kocher posted a nice list of the responsibilities of the Cloud Foundry components on the vcap-dev mailing list).

Here is what happens.

The DEAs are constantly monitoring what is happening on them – they do this in a variety of ways, checking whether process IDs still exist, pinging URLs, etc. If a DEA realizes that an app has become unavailable, it sends a message out onto NATS, on the droplet.exited channel, with details.  The HM subscribes to that channel and when it gets that message does the comparison to the desired state. Note that an app instance could have become unavailable because the CC asked for it to be shut down – in which case the desired state would match the actual state after the app instance became unavailable. Right? Assuming, however, that the app crashed, there would be a discrepancy and the HM would tell the CC that another instance of the app needed to be started.  The CC would then decide which DEA the app should start on (that is (part of) its job) and let that DEA know. The DEA starts the app and all is good.

That’s a bit confusing, so here’s a picture that roughly shows this flow – you shouldn’t take it too literally, especially the sequencing; for example, the HM asking the CC for the desired state is something that happens asynchronously, not as a result of the DEA reporting a crashed app. This picture is just intended to clarify the responsibilities of the components.

Another place to see this in action is by watching the NATS traffic for a given app.  (I’m writing another post to talk about this and other tools I used in my investigations, but for now, just enjoy what you see.) What this shows are the heartbeat messages sent out by a dea showing the apps that are running. Then we get a droplet.exited message that starts the whole thing going. Eventually you see the heartbeat messages again showing the app as running.

06:03:53 PM router.register           app: hello, dea: 0, uris: hello.172.16.106.130.xip.io, host: 172.16.106.130, port: 61007
06:03:58 PM router.register           app: hello, dea: 0, uris: hello.172.16.106.130.xip.io, host: 172.16.106.130, port: 61007
06:03:58 PM dea.heartbeat            dea: 0, crashed: 0, running: 1
06:04:03 PM router.register           app: hello, dea: 0, uris: hello.172.16.106.130.xip.io, host: 172.16.106.130, port: 61007
06:04:08 PM router.register           app: hello, dea: 0, uris: hello.172.16.106.130.xip.io, host: 172.16.106.130, port: 61007
06:04:08 PM dea.heartbeat            dea: 0, crashed: 0, running: 1
06:04:13 PM router.register           app: hello, dea: 0, uris: hello.172.16.106.130.xip.io, host: 172.16.106.130, port: 61007
06:04:13 PM router.unregister      app: hello, dea: 0, uris: hello.172.16.106.130.xip.io, host: 172.16.106.130, port: 61007
06:04:13 PM droplet.exited            app: hello, reason: CRASHED, index: 0, version: b48c1871
06:04:14 PM health.start                app: hello, version: b48c1871, indices: 0, running: 0 x b48c1871
06:04:14 PM dea.0.start                  app: hello, dea: 0, index: 0, version: b48c1871, uris: hello.172.16.106.130.xip.io
06:04:16 PM dea.heartbeat            dea: 0, running: 1
06:04:16 PM router.register           app: hello, dea: 0, uris: hello.172.16.106.130.xip.io, host: 172.16.106.130, port: 61011
06:04:18 PM router.register           app: hello, dea: 0, uris: hello.172.16.106.130.xip.io, host: 172.16.106.130, port: 61011
06:04:18 PM dea.heartbeat            dea: 0, crashed: 1, running: 1
06:04:23 PM router.register           app: hello, dea: 0, uris: hello.172.16.106.130.xip.io, host: 172.16.106.130, port: 61011
06:04:28 PM router.register           app: hello, dea: 0, uris: hello.172.16.106.130.xip.io, host: 172.16.106.130, port: 61011
06:04:28 PM dea.heartbeat            dea: 0, crashed: 1, running: 1 
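
If you want to watch this kind of traffic yourself, one low-tech way is to subscribe to everything on the message bus. The sketch below assumes the ruby nats gem (which provides the nats-sub utility) and that you know your deployment’s NATS address and credentials; the host and credentials shown are illustrative:

gem install nats
nats-sub ">" -s nats://nats:nats@192.168.1.10:4222    # ">" is the NATS wildcard that matches every subject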

One last thing you might ask is this. What if somehow the message sent by the DEA that an app has crashed goes missing? We are NOT depending on durable subscriptions (which would be a grind on performance) so what is our mechanism for ensuring eventual consistency?  Remember that I said that the HM does three things, including keeping track of the actual state of the system. It can do this because every 10 seconds each DEA sends a heartbeat message (as you can see above) out onto NATS reporting how the apps that are running on them are doing.  If the HM doesn’t get the direct message that an app has crashed, from the heartbeat messages it will eventually see an actual state that doesn’t match the desired state. At that point it will contact the CC just the same as described above.

I’ve not yet grown tired of killing apps in every which way, destroying the warden container, going into the warden container and killing the app process, restarting the DEA and so on, and watching the state of the system eventually (within a few seconds) come back into equilibrium. Way cool. So very way cool!

The Intuition of Installing BOSH

We have Cloud Foundry running in the lab, BOSH deployed, in a vSphere environment.  While I worked with a colleague to deploy that Cloud Foundry, and I focused on understanding what was going on with services deployments, I was perfectly content to have him get us to the point where we could do that Cloud Foundry install.  In other words, he installed BOSH itself.  Recently, however, when he and I were trying to track down an issue we were having with BOSH, I found I really wanted to understand how BOSH is installed.  The short answer is, as reported on many blogs, that it’s just like installing Cloud Foundry – that you install BOSH using BOSH.  But what does that really mean? What are the details?  In this blog I’m aiming to explain WHAT happens when you go through the install steps. You can find those steps here or here, and my goal is not to repeat those instructions, but rather to explain a bit further what’s going on.

The first thing that you do is some setup on the IaaS, in our case vSphere: networks, folders, etc.  The folders will be used by various installation processes to hold files used during the install. That is, the IaaS itself is used to facilitate the installation, not just host the installed system.  This is a key point that I haven’t found stated elsewhere, but if you watch what is going on in your IaaS (i.e. watching your vSphere console) during a “bosh deploy” you’ll see machines getting created and ultimately deleted – they are in service of the deploy itself.

Okay, so now you have an IaaS environment ready that you can install BOSH onto.  BOSH is itself a distributed application, running on a multitude of machines working in cooperation.  In theory you could create the VMs you need (here it says you’ll need 6) and install the appropriate bits onto each of those machines – put the bosh director on one, the health monitor on another, the message bus on another, and so on. But wait, isn’t that what BOSH is meant to do, provision VMs and install a distributed application onto them?  It is, and that is why the BOSH team has provided something that allows you to install BOSH using BOSH.  The trick then, is how to bootstrap the thing.

This is done with two things.  First, the BOSH team has essentially created a virtual machine that has all of BOSH already preinstalled on it. That’s right, BOSH director, the health manager, workers, message bus, etc – all of it on a single VM.  And second, they have given you a tool to install that virtual machine into the IaaS environment that you set up in the first step.  So the next few steps in the instructions have you download the micro-BOSH stemcell (essentially the VM) and install a bosh cli & deployer. The cli has the protocol for interacting with the IaaS built into it, so you can just issue a few cli commands and it does the micro-bosh deployment for you. Some instructions have you create a virtual machine from which you do these steps, but this isn’t strictly necessary; you just need an Ubuntu machine with ruby, git and a few other things on it.  I personally always have Ubuntu VMs that I am using for various development activities, and they are perfectly suited to this set of steps.  So after you’ve installed micro-BOSH following the instructions here or here, you actually have a BOSH up and running.  You can now take “releases” which contain things like packages, jobs and deployment manifests and deploy them with BOSH. And that install was really easy!
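
For reference, once your Ubuntu machine and micro_bosh.yml manifest are ready, the bootstrap boils down to just a few cli commands. This is a sketch from memory (gem and stemcell names varied over time), not a substitute for the official instructions linked above:

gem install bosh_cli bosh_deployer
cd ~/deployments
bosh micro deployment micro01                          # micro01/ is a directory containing your micro_bosh.yml
bosh micro deploy micro-bosh-stemcell-vsphere-XXXX.tgz # the stemcell you downloaded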

But, hang on, the instructions go on – they have me install BOSH.  But, didn’t I just install BOSH? Yes, you just installed (micro) BOSH. And yes, you now need to install (full-blown) BOSH.

Remember that micro-BOSH is deployed on a single VM.  If you are deploying (and monitoring!! – but I’ll talk about that more some other time) a distributed application that isn’t too big or complicated, you could use micro-BOSH and it would work reasonably well.  But ultimately our goal is to deploy more significant applications, like Cloud Foundry, and for that a single VM BOSH just couldn’t cut it.  But the good news is that we now have micro-BOSH to help us install the more sophisticated BOSH deployment.

One of the components in that micro-BOSH deployment is something called the BOSH director, which, as the name implies, orchestrates all of the things that BOSH does for you. So to deploy anything with BOSH you basically tell the BOSH director what you want to do, where it can find the bits it needs for the job, and what environment you want to install those bits into. Let’s take those things in reverse.

Remember that BOSH will stand up VMs onto which it then installs things.  What does that VM look like?  This is where stemcells come in again.  The instructions have you download, and then upload via micro-BOSH, the bosh stemcell.  The BOSH stemcell is different from the micro-BOSH stemcell in that it contains only a base operating system plus a BOSH agent.  Recall that the micro-BOSH stemcell has all of BOSH on it.  I find it a bit ironic that the micro-BOSH stemcell is actually much “bigger” than the (full) BOSH stemcell.  Really! Have a look at the file sizes for the two stemcells you’ve downloaded.

The bits that are needed for the application installation are bundled in a release.  You find the BOSH release in the bosh repository on GitHub.  The structure of a BOSH release has been covered in numerous blogs and videos, and I won’t cover the details here. The instructions will have you git clone this repository to get all the bits you need; you’ll have to modify a few settings for your vSphere environment.

Finally, you need to tell the BOSH director what you want to do, and this is accomplished with a series of bosh cli commands, as covered in the installation documents.  What is important to note in the instructions is that when you are installing BOSH (using micro-BOSH) you target the BOSH director that is a part of micro-BOSH.  Once the (full) BOSH install is done you need to change which BOSH director you are targeting.  If you don’t and you try to do something like a Cloud Foundry deployment you’ll be attempting that with the micro-BOSH and that is not likely to go well.
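
In cli terms, the tail end of the process looks something like the following sketch; the addresses and file names are placeholders, and 25555 is the BOSH director’s default port:

bosh target https://<micro-bosh ip>:25555              # while deploying full BOSH, talk to the micro-BOSH director
bosh upload stemcell bosh-stemcell-vsphere-XXXX.tgz
bosh upload release <the BOSH release>
bosh deployment bosh-deployment-manifest.yml
bosh deploy
bosh target https://<full BOSH director ip>:25555      # switch, so later deployments (e.g. Cloud Foundry) go to the new director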

Bonus track: Deploying the Echo Service via the service_broker

Part IV in a three part series ;-)

In Part II I went into a little bit of detail on how the echo service gateway and node work, in collaboration with the actual echo server. Or rather, in this case, how the node does not work with the echo server at all.  The echo_node implementation simply looks up some metadata values from its config file and returns them, via the gateway, to the requesting party (the dea).  You might question why the node was in there at all.  The answer is that in this case you don’t really need it.

Included in the vcap-services repository is an implementation for a service broker which can be used to act as a gateway to services that are running outside of cloud foundry.  Recall from part I in this series that the echo server, the java application that listens on a port and parrots back what it hears there, really doesn’t have anything to do with cloud foundry; again, not even the echo_node communicates with it.  So really, you can think of it as a server running external to cloud foundry.

The service_broker is a service_gateway implementation that stands up a RESTful web service implementing all of the resources that any service gateway does; resources that allow you to create or delete a service instance, bind or unbind it from applications, etc.  In this case, the gateway implementation simply offers a metadata registry with entries for each of the brokered services.  In a “normal” cloud foundry service implementation, in response to these web service requests the gateway will dispatch a message to NATS, a node will pick it up, fulfill the request and communicate back to the gateway, which in turn responds to its client.  In the case of the service_broker there is no node and hence no need to dispatch messages onto NATS.  The web services of the gateway simply look up the values in its registry and return them.  This raises the question of how things get into that registry; the service_broker gateway offers a RESTful web service resource for the registry itself, supporting POST to add things and DELETE to remove them.  You can either issue those service invocations yourself or you can use the service broker cli.

So here’s what I did to go through the exercise of deploying the echo server as a brokered service:

Step 1: Start with a vcap devbox, following the instructions you find in the vcap repository, EXCEPT that you also need to start up the service broker; see this stackoverflow thread for how to do that.

Step 2: Register the echo service with the service_broker.  I used the service_broker_cli which will already be on the devbox you set up in step 1.  Running the cli with no arguments will register what is found in the services.yml file found in the config directory, so I ran it as follows:

bin/service_broker_cli -c config/echobrokeredservice.yml

with the contents of the echobrokeredservice.yml as follows:

---
service_broker: http://service-broker.vcap.me
token: "changebrokertoken"
service:
  name: echo
  description: cloud foundry sample service
  version: "1.0"
  options:
    - name: service
      acls:
        users: []
        wildcards: ["*@emc.com"]
      credentials:
        host: 192.168.1.150
        port: 5002

Notice the “credentials” section – these are the same values that were in the echo_node.yml file in part I, but now instead of the echo_gateway dispatching to NATS and the echo_node simply returning the values from the echo_node.yml file, the service_broker_gateway just looks those values up in its registry and returns them.

When you now run a

vmc info --services

you should see the echo service listed.

Step 3: Deploy your client app and bind to the echo_service.

Seriously.  That’s it.

But before you think, “why did I go through all of those other hairy steps with the node and gateway and BOSH, etc.?,” remember that this echo server is just a sample, and a very, very simplified one. Servers that run within cloud foundry (and are hopefully deployed with BOSH) benefit from management, monitoring, logging and other cloud foundry capabilities, and the gateway/node combination is a loosely coupled mechanism for offering lifecycle operations on those services.  A very valuable part of the cloud foundry services story.  That said, there is clearly value in brokering external services and that’s why you have this bonus track. :-)

The Echo Client

So just a bit more on step 3 from above. The Echo client application that was posted all the way back on the original support forum article is a bit rigid.  It accesses the details of the bound services through the VCAP_SERVICES environment variable, and that’s fine, but it then looks for a service type named (exactly) “echo” and an instance name of (exactly) “myecho.”  This hard coding always bugged me but because of the way that brokered services are named as a part of their configuration, I finally did something about it.  It required some changes to JsonParseUtil.java, where I now take in a string and look for a service type name and instance name containing that string.  So in an updated version of the echo client source code that you can find here, the client will look for a bound service with the type and the instance name containing the string “echo”; so the service type of “echo_service” with a name like “echo_service-a302” fits the bill.  Oh, and one other slight addition to the app – this one now prints the contents of the VCAP_SERVICES environment variable – it helps when you are first getting started with such things.

Using the Cloud Controller REST interface

When starting out with Cloud Foundry you are likely to use vmc for most of your communications with the cloud, but under the covers, vmc just fires off HTTP requests to the cloud controller REST interface.  This REST interface is completely legitimate for you to use, either “by hand” with something like curl, or from within your applications using an HTTP client.  As is the case with much of Cloud Foundry at the moment, documentation on this interface is somewhat (okay) completely lacking, so you have to be a bit creative in figuring it out.

I’ve just spent a few hours sorting through some of the idiosyncrasies of this Cloud Controller REST interface and here’s what I’ve learned.

  • There are a couple of people who did start to document this interface as a community effort; it hasn’t been updated in some time but it seems to still be accurate.  It’s here.
  • One of the best ways to figure out the REST resource is to run vmc with trace turned on.  There is a decent post on stackoverflow that describes this in a bit of detail.
    • One “gotcha” on this is that the trace doesn’t show you everything – in particular some of the details I list below.
  • There are a handful of resources that do not require authentication.  Things like:
    • Info and the services and frameworks sub-resources; you can GET these resources at /info, /info/services and /info/runtimes, respectively.
    • User token:  By POSTing to /users/{userid}/tokens you can get back the user token that is needed for many other resources.  For example, the following command will return my token.
      curl -X POST -d '{"password" : "mypassword" }' api.vcap.me/users/cornelia.davis@emc.com/tokens

      Note that this returns something of the form: “bearer …. and a whole bunch of characters ….” – the “bearer ” is part of the token.

  • Notice that the POST to get the user token didn’t require any particular headers – this will change as we progress through some of the other requests.
  • Now try doing a GET on /info, but with the header “Authorization” set to the user token.  You will notice that the json returned is slightly different, namely you will see that you are now authenticated.
    {
      "name": "vcap",
      "build": 2222,
      "support": "http:\/\/support.cloudfoundry.com",
      "version": "0.999",
      "description": "VMware's Cloud Application Platform",
      "allow_debug": false,
      "frameworks": [...snip...],
      "authorization_endpoint": "http:\/\/uaa.vcap.me",
      "user": "cornelia.davis@emc.com",
      "limits": {
        "memory": 2048,
        "app_uris": 4,
        "services": 16,
        "apps": 20
      },
      "usage": {
        "memory": 1792,
        "apps": 4,
        "services": 2
      }
    }

    Notice that no other headers need be set at this point – just the Authorization one.

  • With the Authorization header set you can now GET a bunch of different resources (this is just a sampling):
    • GET /services
    • GET /apps
    • GET /apps/{appname}
    • GET /apps/{appname}/stats
    • GET /apps/{appname}/crashes
  • As soon as you want to get more details on a particular service, however, you will need another header, or two.
    • Don’t ask me why – as I said some idiosyncrasies – but in the case of services there is this bit of code that I finally dug up in the cloud_controller/app/helpers/services_helper.rb file that validates that the content type header is set to JSON for services requests.  By the way, the error message that is returned if you don’t have this header is:
      {"code":100,"description":"Bad request"}

      So set the Content-type header to application/json and you will now be able to access the service offerings which you could not before.

      • GET or POST /services/v1/offerings: This gives you a list of the system services but with a bit more information than you get with /info/services.  In particular, you will get the URL for the service gateway.
    • And finally, there is a bit more of a drill down into the details of a particular service but in order to get there you need one more header: the service token.  This value is provided in the X-VCAP-Service-Token HTTP header and what the value is depends on how you’ve configured tokens for your service; in a BOSH deployment it is probably a part of your deployment manifest, in a vanilla vcap install, it is probably set in the cloud_controller.yml file.  So now you can access
      • GET /services/v1/offerings/{service-label}/handles: Note that this gives information about the service nodes, including things like the host IP address and the other things (credentials) returned by the service node.

    So my final curl in all its glory is:

    curl -H "Authorization: bearer ...the rest of my token..." -H "Content-type: application/json" \
         -H "X-VCAP-Service-Token: changeme" api.vcap.me/services/v1/offerings/cassandra-1.0/handles

Deploying a service to cloud foundry via BOSH

In this last of a three part series on learning how to add services to a Cloud Foundry cloud we’ll deploy the echo service into a BOSH-based deployment.  In part II you’ll find a more detailed description of the parts of a system service implementation, and also a description of and link to an updated version (updated from here) of the echo server itself.  If I’m doing my job right, with this post you should have an “ah ha” moment or two – as I already mentioned, I went through the exercise of learning about cloud foundry services in exactly the order mirrored with this series of blog posts, and a lot of things came together for me in this last step. So, let’s get started.

I’m going to roughly follow the instructions posted here – a BOSH release for the Echo Service. As I went through this exercise I was working off of an older version of this repository, with an older version of the documentation, where after cloning the repository you copy things from this directory into your cloud foundry release.  The latest instructions point out that BOSH now supports having multiple releases for a single deployment, a way to modularize a deployment, so you no longer have to copy things into a single directory structure for the cloud foundry deployment.  There is, however, something to be learned by copying things, so I’ve decided to keep this post in the older style to allow me to sprinkle the process with some explanation – I’ll refer to the steps as described in the older version of the docs.

Step 1: We already had a BOSH-based deployment of cloud foundry running in our lab.  We started with the cf-release posted here and modified it so that it consumed a few fewer resources (you would think that as EMC we would have all the vBlocks we need, but then you would be wrong ;-) ); before adding the echo service we were running 34 VMs.

Step 2: Clone the repository (https://github.com/cloudfoundry/vcap-services-sample-release).

Step 3: Copy the job and package directories into your cloud foundry release.

cp -r vcap-services-sample-release/jobs/* cf-release/jobs/
cp -r vcap-services-sample-release/packages/* cf-release/packages/

If you haven’t already dug into the primary portions of a bosh release, here’s a brief explanation:

  • Packages describe all of the bits that will make their way onto the VMs that will run the service.  Every service I have looked at or built myself has had at least a spec file and a packaging file.
    • The spec describes what is required for that service component – dependencies on other cloud foundry packages (like ruby or sqlite) or files that are a part of the cloud foundry release. This tells bosh during deployment to copy these artifacts onto the VM that will run this component.
    • The packaging file is a script that runs after all of those bits have been delivered to the newly provisioned VM.  It usually will involve things like untarring a file and moving the resultant bits into the appropriate location on the VM.
    • Some packages will also have a prepackaging script that is run during the compiling of a package, before the VM is even provisioned.
  • Jobs represent the things that will be run on a VM and the files are generally start scripts and configuration files.  What is really interesting here is that those start scripts and config files are found in a subdirectory of the jobs directory called “templates.” The fact that these are templates allows you to instantiate them with values at run time, allowing you to do things like supply IP addresses of running machines at the point where that IP address is actually known.

There are two other major pieces of a BOSH release: 1) the blobs (which I’ll get to in a moment) and 2) the source tree containing code bits that make up the pieces of a package (mentioned in the package “spec” above).  I won’t say much about the latter in this post except that for the echo service, and all the base cloud foundry services, those bits get into your cloud foundry release via some git magic – it’s all in the ./update command that you do after cloning the cf-release repository. This draws the pieces for those services, the node and gateway implementations, from the vcap-services repository.

Step 4: In this step you are asked to put metadata for the echo server blob into the …/cf-release/config/blobs.yml file. This step isn’t needed at the moment, and in fact, the latest version of the docs for this sample release does not include it.

Step 5: Add echo to the list of built-in services by modifying the cloud_controller.yml.erb file, adding ‘echo’ to the line that starts with “services =”.

At this point the instructions tell you that you can do a bosh create release and a bosh upload release but there is one critical step missing – what about the actual EchoServer-0.1.0.jar? How do we get it running on one of the BOSH managed VMs?

I mentioned above that in addition to the packages and jobs portions of a cloud foundry release, there are also blobs.  For cloud foundry services these are generally the tar/zip files that contain the actual servers that will provide the service capabilities; the postgresql-9.0-x86_64.tar.gz file or the redis-2.2.15.tar.gz, for example. For our sample service this is the EchoServer-0.1.0.jar file. There are a number of ways that you can structure your cf-release leveraging git to make this perhaps a bit more elegant, but for now we’ll just do the brute force:

  1. Create the echoserver directory in the …/cf-release/blobs directory.
  2. Drop the EchoServer-0.1.0.jar file from part II in this series into that new echoserver directory.

(Looking into several of the echo service files that were copied over into the cf-release you can find reference to that jar file in places like …/cf-release/packages/echoserver/spec, …/cf-release/packages/echoserver/packaging and …/cf-release/jobs/echo_node/templates/echoserver_ctl.)
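
If you haven’t looked inside one of those files yet, a package spec is just a small YAML file naming the package, its dependencies and the files (including blobs) it needs. The following is a sketch of the general shape, not copied verbatim from the sample release:

---
name: echoserver
dependencies: []
files:
- echoserver/EchoServer-0.1.0.jar

The companion packaging script then copies that jar into the BOSH_INSTALL_TARGET directory when the package is compiled, which is how it eventually ends up on the echo_node VM.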

During the bosh create release this jar will then get included in the tar ball that is subsequently uploaded to and deployed into the cloud.

Okay, so now for the good stuff. In part II of my series I promised you that some of the ugliness around coordinating the command line arguments for running the Echo Server with values in the echo_node.yml file would get better with BOSH.  You see, BOSH is now responsible for running both the Echo Server (starting it with a java command) and the echo_node, so there must be a way that we can coordinate these two things.  There is.

The single place that we will put values that will then be used by the Echo Server and the echo_node is in the deployment manifest.  Under the properties: section you need to include the following:

  echo_gateway:
    token: changeme
    ip_route: ***.***.***.***
  echoserver:
    port: 5555

Then you have to see to it that the Echo Server and the echo_node pick up the port value appropriately.

Echo Server

In the …/cf-release/jobs/echo_node/templates/echoserver_ctl file you will find the java command that runs the echo server:

exec java \
    -jar EchoServer-0.1.0.jar \
    -port <%= properties.echoserver && properties.echoserver.port || 8080 %> \
    >>$LOG_DIR/echoserver.stdout.log \
    2>>$LOG_DIR/echoserver.stderr.log

Enclosed in the <%= %> is a template expression (using ruby’s erb feature) that pulls values from the deployment manifest.  But our Echo Server also takes in an IP address so we need to update this execution to the following:

exec java \
    -jar EchoServer-0.1.0.jar \
    -ipaddress <%= spec.networks.default.ip %> \
    -port <%= properties.echoserver && properties.echoserver.port || 8080 %> \
    >>$LOG_DIR/echoserver.stdout.log \
    2>>$LOG_DIR/echoserver.stderr.log

echo_node

In the …/cf-release/jobs/echo_node/templates/echo_node.yml.erb file you will find the port for the echo server specified; recall from Part II that the echo_node is oversimplified so as to just return the port number specified in the _node.yml file.

port: <%= properties.echoserver && properties.echoserver.port || 8080 %>

Of course, now you can see that the port in this config file is drawn from the same source as the port supplied to the Echo Server when it is started.  Something you had to coordinate manually is now handled by BOSH.  Coolness.

Step 6: NOW you can do the bosh create and upload.  Because we were updating an already deployed release we need to do:

bosh create release --force

Followed by

bosh upload release

Step 7: And while we have already updated the deployment manifest with the properties for the node and gateway, you also have to update it to include the two jobs that will be part of our cloud foundry deployment. Note that each VM gets a single job, but that the echo_node job launches two processes, the echo_node implementation and the actual Echo Server. The following are roughly those parts taken from our deployment manifest; your mileage will vary depending on how you configured your cf-release deployment. Under the jobs: section:

- name: echo_node
  template: echo_node
  instances: 1
  resource_pool: infrastructure1
  persistent_disk: 128
  networks:
  - name: default
    static_ips:
    - ***.***.***.***

- name: echo_gateway
  template: echo_gateway
  instances: 1
  resource_pool: infrastructure1
  networks:
  - name: default
    static_ips:
    - ***.***.***.***

Oh, and we increased the size of our “infrastructure1” resource pool by 2.  Of course, you’ll have to update the ***.***.***.*** IP addresses appropriately.

Now do Step 8:

bosh deploy

You should now be able to push the same echo app as posted in Part II of the series.

Have fun!