Welcome

What works and what doesn't work in software development? For the first 10 years or so of my career, we followed a strict waterfall development model. Then in 2008 we started switching to an agile development model. In 2011 we added DevOps principles and practices. Our product team has also become increasingly global. This blog is about our successes and failures, and what we've learned along the way.



The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions. The comments may not represent my opinions, and certainly do not represent IBM in any way.

Friday, May 16, 2014

Bootstrap scripts for SoftLayer: Chef, Eureka, Openstack Devstack, Tomcat

I've published sample scripts that can be used to bootstrap some popular software packages.  The samples are AS-IS open source (not supported by IBM).  

The samples out there right now are: Chef workstation, Chef Solo configuration (solo.rb, node and role files), Eureka (NetflixOSS), Devstack (OpenStack developers' distribution), and Apache Tomcat 7.  The Eureka and Tomcat bootstrap scripts also use Chef Solo, while the DevStack script is just a straight .sh script.

If you tweak the scripts a little to use your own information and provide pointers to them in SoftLayer's "provision script URL" field, your systems will be automatically configured for you when you create them or re-image them.
https://github.com/amfred/softlayerbootstrap << See the README.md file for details.

Wednesday, August 21, 2013

How to make life easier for your remote employees

I've already written one post about setting up teams with remote workers.  However, I didn't really focus on cultural changes that employees who are in the regular office can accept to make life easier and more efficient for their remote colleagues.  Cultural changes are always difficult, but these are worth the effort if you want your entire team to be productive:


  • Be available for communication.  People are going to have questions, and if they're remote, they can't walk down the hall to ask them.  Be available via chat, instant message, text message, and phone.  Check your email.  Respond to messages from remote employees at least as quickly as messages from locals, and remember that remote employees can't just stop by if it's urgent.  It's great to set aside a couple of hours each day when you won't be interrupted, but make sure there are plenty of hours when your remote workers can reach you.  Set up your "do not disturb" time so it's either a time when your remote employees are not working, or when they are also on their "do not disturb" time.
  • Be available in off hours, especially when working across time zones.  Most remote employees will be very respectful of your working hours, and will only call you in off hours if they're stuck, and then they'll keep it quick.  If they do not have the freedom to call or text message you in off hours, they will lose hours of productive time on a regular basis.  They will learn to stop asking questions, and either spend the time trying to figure it out themselves using the Internet, or implement something that may or may not be what you want, and ask for feedback on it the next day.  Worried that you'll end up working too much in off hours?  Cross that bridge if you come to it, and allow for flexible schedules.
  • Allow for flexible schedules.  If someone ends up working late one night, they should be able to leave early or come in late another day that week.  Also, consider whether people in different time zones should work a shifted schedule.  For example, I've worked with teams in India that worked from 12-8 PM every day, so they would be at work for at least a few hours when the U.S. employees were at work.  Now that I'm living in California, I'll work an hour very early in the morning to sync up with people in Europe and on the east coast of the U.S., then I'll take a couple of hours off to get the kids off to school, then I'll finish up my work day.  My "do not disturb" time is in the late afternoon, when my colleagues are not working.  I also try to schedule all of my personal appointments late in the day.  Sometimes I work from 10-11 PM to sync up with people in China.  Flexible schedules are great as long as people are available for communication when the rest of the team is working.
  • Be careful about setting meeting times.  Keep meetings short, small, and focused.  Set meetings during everyone's working hours, or take turns having meetings during your off hours and during others' off hours.  Consider whether it's better to schedule a meeting or pick up the phone.
  • Be careful about who you invite to meetings.  Again, keep your meetings small.  But if you're having meetings where your remote colleagues have something to add, then invite them and make sure you provide facilities for them to join in (such as a good conference calling phone system, plus screen sharing).  Retrospectives and planning meetings, meetings where you set team policies, and town hall meetings are good examples.
  • Avoid using the mute button during conference call discussions.  If it's a one-way presentation, then it's fine for everyone else to be on mute to block background noise.  But if it's a discussion, don't put people on mute while you have a side discussion.  It's disrespectful, and people know it's happening because the sound of the background noise changes.
  • Be very clear in your communication.  Write carefully, especially if it's an email message rather than real-time communication.  Also, be explicit about work that is required, and work that is optional.  Are you assigning a task, or are you tossing around ideas?  Do you need a lot of help with something, or do you want to be pointed in the right direction?  Is anything blocking you?  Are you getting pulled away from your main tasks to work on something else?
  • Be smart about communication modes.  Use email when you have to carefully consider what you write, or to report on your status and hand off work at the end of the day.  Email and mailing lists are terrible ways to have involved discussions.  Use the phone (either an impromptu call or a meeting) when you have much to discuss.  Use instant messages or texts for quick questions.  If your instant messages or email messages are getting long, it's time to switch to the phone.  Use mailing lists for broadcast messages that need to go to a group of people, but if it turns into a discussion, move the discussion to a forum or wiki, send the link to the mailing list, and politely move the discussion off of the mailing list and into the forum/wiki.  If discussing a work item (defect, feature, etc.), discuss it via comments on the work item, for future reference.  Get all of your tips, setup information, and troubleshooting information into a forum or wiki, or write documentation that stays with the work item.
  • Have blameless retrospectives.  Every week or two, get together as a team to discuss what is working and what is not working, in an honest, blame-free environment.
Anyone have more ideas to add?  As always, I welcome your comments!

Wednesday, November 14, 2012

Test automation got you down?

It's here - my full article on Making Your Automated Tests Faster, on the Enterprise DevOps blog!  Thank you again to the dozens of people who contributed their ideas at DevOps Days Rome.

This reminds me of my earlier post on "death by build", and how a build that takes too long, due in part to slow test automation, can really hamper a project: Is Shift-Left Agile? And Death By Build

Tuesday, November 6, 2012

Interview about DevOps and SmartCloud Continuous Delivery

Here's an interview I recently gave to a colleague of mine, Tiarnán Ó Corráin.  This seems like a good time to reiterate that these views are my own, and not the official views of IBM:


What is DevOps?

You can think of it as an extension of the principles and practices of Agile.  Where the Agile methodology breaks down the barriers between development and business goals, DevOps is breaking down the barriers between development and operations.  It's not all about tools; it's about people and processes as well.  Both Agile and DevOps have a goal of delivering reliable software to the business more quickly, and ultimately, making more money!

How does DevOps do that?

Well, traditionally there has been a problem between going from a development system to a production system: installing new machines, installing the software, scripting and so forth. Getting from a working development system to a working production system involved all of these manual steps, and introduced many points of failure.

In addition, because setting up test machines was so time-consuming and error-prone, developers would often assemble their test machines in a quick and dirty way, on a single server, with the cheapest, simplest components.  Production systems, on the other hand, would have multiple servers, configured for clustering and fail-over, with firewalls between some components.  This meant that the developers weren't testing the software in an environment similar to the one where it would run in production.  And because of that, some bugs were never found until the software was deployed into production.

One of the primary tenets of DevOps is that you should automate every step in creating production systems.  Everything from preparing the machine, to installing the latest software, to starting the services, and testing them, should be fully automated and repeatable.  And when creating production systems is automated, you can also use production-like systems for development and test work.

How does virtualization help that?

When we're working in a cloud environment, deploying a virtual machine is something that can be scripted and automated.  We use infrastructure code to automate deploying the machine, installing the software, and (re)starting the services, and then we check that code into our source code repository and version it just like the application code.  So effectively the process of deploying a new system becomes part of the development process.

Presumably that makes testing easier?

Very much so.  Our virtualization technology means we can deploy production-like environments as part of the development process.  It's the way the development process has been trending recently.  We already have continuous integration: RTC (Rational Team Concert) triggers automatic builds when changes are submitted, and we run unit tests against those builds.

Now, take that to the next level with continuous delivery: after changes trigger builds, those builds trigger deployments, and when the deployments are complete, the builds trigger automated tests.  What it means is that as part of the development process, we have production-like servers running the latest code.  This allows us to run automated tests, including performance verification, against a production-like environment as part of the development process.
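As a rough illustration of that chain (build, then deploy, then test), here is a minimal Python sketch.  The stage functions are hypothetical placeholders, not the actual RTC or SmartCloud Continuous Delivery APIs; the only point is that each stage runs only if the one before it succeeded.

```python
from typing import Callable, List, Tuple

# Hypothetical stage functions; each returns True on success.
# A real pipeline would call out to a build server, a cloud API, and a test runner.
def build() -> bool:
    return True

def deploy() -> bool:
    return True

def run_tests() -> bool:
    return True

def run_pipeline(stages: List[Tuple[str, Callable[[], bool]]]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that completed successfully.
    """
    completed = []
    for name, stage in stages:
        if not stage():
            break  # a failed stage blocks everything after it
        completed.append(name)
    return completed

if __name__ == "__main__":
    print(run_pipeline([("build", build), ("deploy", deploy), ("test", run_tests)]))
```

If the deploy stage fails, the tests never run, so nobody wastes time testing a system that was never deployed correctly.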

Taking Agile to the next level?

Yes.  It accelerates development, because it takes away some of the uncertainty about deployment: if I can capture every part of the deployment process in my development and testing process, I have more confidence about what I'm going to deploy.

Deploying test systems automatically also saves developers and testers a lot of time!  On my own development team, it's normal for us to deploy dozens of new servers every day, and delete the old ones just as often.

Can you tell me a bit about your own role?

I'm on the advanced technology team that works on DevOps.  We are driving an internal transformation within IBM, to encourage our own development teams to adopt DevOps principles and practices.  In addition, we are creating tools to help IBM's enterprise customers adopt DevOps themselves.  The first tool we developed to sell is SmartCloud Continuous Delivery v2.0, and it's shipping to customers this week.  SmartCloud Continuous Delivery is currently targeted at customers who want to improve their dev/test cycle.    We believe this is the easiest place for our enterprise customers to start taking advantage of these new technologies.  We have other tools to help with production deployments, like SmartCloud Control Desk.

How is it going down in the market?

These ideas are gaining real traction, both within and outside of IBM.  Internally, we already have several adopters of continuous delivery for dev/test.  For instance, Rational Collaborative Lifecycle Management is using our code, and other teams like SmartCloud Provisioning 2.1 have custom continuous delivery solutions that are very similar to ours.  And of course, we're using it ourselves -- SmartCloud Continuous Delivery is self-hosting.

What would you say to any teams that are interested in this approach?

If anyone would like to evaluate the SmartCloud Continuous Delivery product, please check out our website.  We have free trials available.

Even if you're not a good candidate for SmartCloud Continuous Delivery, your team may be able to use several of the DevOps principles and practices.  Check out our Enterprise DevOps blog for ideas, and feel free to contact me about that as well.  IBM even offers DevOps consulting workshops.

Monday, October 8, 2012

DevOps Days Open Space: Making Your Automated Tests Faster


One part of my job is helping other teams adopt DevOps in general, and continuous delivery in particular.  But I have a problem: many of them have a suite of automated tests that run slowly; so slowly that they only run a build, and the tests that run in the build, about once per day.  (Automated test run times of 8-24 hours are not uncommon.)  There are several reasons why this is the case, including:

  • The artifacts that are produced from the build, and then copied over to the test servers, are very large (greater than 1 GB in size).  Also, sometimes the artifacts are copied across continents.
  • Sometimes there are multiple versions of the build artifacts that must be copied to different test servers after the build.  A typical product I deal with will support at least a dozen platforms; a few support around 100 different platforms, when you multiply the number of supported operating system versions times the number of different components (client, server, gateway, etc.) times 2 (for 32- and 64-bit hardware).
  • Often, the database(s) for the product must be loaded with a large amount of test data, which can take a long time to copy and load.
  • Many products have separate test teams writing test automation.  Testers who are not developers tend to write tests that run through the UI, and those tests are usually slower than developers' code-level unit tests.
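
To make the platform arithmetic above concrete, here is a small Python sketch.  The counts are made up for illustration (16 operating system versions and 3 components are assumptions, not numbers from a real product); the point is how quickly the combinations multiply.

```python
from itertools import product

# Hypothetical support matrix; names and counts are illustrative only.
os_versions = [f"os-v{n}" for n in range(1, 17)]  # 16 operating system versions
components = ["client", "server", "gateway"]
word_sizes = [32, 64]                              # 32- and 64-bit hardware

# Every (OS version, component, word size) combination is a platform to test.
platforms = list(product(os_versions, components, word_sizes))
print(len(platforms))  # 16 * 3 * 2 = 96
```
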
Running builds and tests often, so developers know quickly when they make a change that breaks something else, is a key goal of both continuous integration and continuous delivery.  Ideally, a developer should get feedback on whether their code is "ok", using a quick personal build and test run, within 5 minutes.  Anything over 10 minutes is definitely too slow; the developer will probably move on to something else, make more changes, and forget exactly what was changed for that particular test.

Once the quick tests pass, the developer can run a full set of tests and then integrate the tested changes.  Or, in cases where a full set of tests is extremely slow, the developer can integrate his or her code changes once the quick tests pass, and then let the daily build run the full set of tests.

In this DevOps Days open space session, we brainstormed ways to make automated tests run more quickly.  We focused more on quick builds for personal tests, but most of these ideas would make the full set of tests faster too.  Many thanks to the dozens of smart people who contributed their ideas.  I don't even have their names, but they know who they are.  I'm sure we'll use several of these ideas right away.

Watch for an article with more details on each of these, coming soon...

Fail quickly


Run a quick smoke test first

Run a small set of tests that fail often next

Run slow tests last, or not at all

See also: Remove slow tests
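
Here is one way the "fail quickly" ordering could be sketched in Python.  The `Entry` class, its fields, and the 60-second threshold are all assumptions made for this example; a real test runner would need to collect failure rates and timings from its own history.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Entry:
    """One test plus the (assumed) statistics used to order it."""
    name: str
    run: Callable[[], bool]
    is_smoke: bool = False
    failure_rate: float = 0.0  # fraction of recent runs that failed
    avg_seconds: float = 1.0   # average run time

def fail_fast_order(tests: List[Entry], slow_threshold: float = 60.0) -> List[Entry]:
    """Smoke tests first, then tests that fail often, then slow tests last."""
    quick = [t for t in tests if t.avg_seconds < slow_threshold]
    slow = [t for t in tests if t.avg_seconds >= slow_threshold]
    # False sorts before True, so smoke tests lead; higher failure rates come next.
    quick.sort(key=lambda t: (not t.is_smoke, -t.failure_rate))
    slow.sort(key=lambda t: t.avg_seconds)
    return quick + slow

def run_until_failure(tests: List[Entry]) -> Optional[str]:
    """Run tests in order; return the name of the first failure, or None."""
    for t in tests:
        if not t.run():
            return t.name
    return None
```

Stopping at the first failure means a broken build is reported after seconds of test time rather than hours.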


Run in parallel


Run test buckets in parallel

Use snapshots of databases or VMs to make it easier to run tests in parallel
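
A minimal sketch of running test buckets in parallel, using Python's standard `concurrent.futures` module.  The bucket names and the idea that each test is a callable returning True or False are assumptions for this example.  Threads suit tests that mostly wait on I/O (network, database); CPU-heavy tests would probably need separate processes or machines instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

def run_bucket(tests: List[Callable[[], bool]]) -> bool:
    """Run one bucket sequentially; all() stops at the first failing test."""
    return all(test() for test in tests)

def run_buckets_in_parallel(
    buckets: Dict[str, List[Callable[[], bool]]],
    max_workers: int = 4,
) -> Dict[str, bool]:
    """Run each bucket on its own worker thread and report pass/fail per bucket."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(run_bucket, tests) for name, tests in buckets.items()}
        return {name: future.result() for name, future in futures.items()}
```

This only works if the buckets do not share state, which is where separate database or VM snapshots per bucket come in.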


Break up tests into smaller groups


Divide your application into components, and test the changed components

Automatically determine which tests to run when code changes
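
One simple way to pick tests based on what changed is a mapping from source directories to the tests that cover them.  The sketch below assumes such a mapping exists and is maintained by hand; the directory and test names are hypothetical.  More sophisticated approaches (for example, using coverage data) are possible but not shown here.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical mapping from a component's source directory to its tests.
COMPONENT_TESTS: Dict[str, List[str]] = {
    "client/": ["test_client_login", "test_client_render"],
    "server/": ["test_server_api"],
    "gateway/": ["test_gateway_routing"],
}

def select_tests(
    changed_files: Iterable[str],
    component_tests: Dict[str, List[str]] = COMPONENT_TESTS,
) -> List[str]:
    """Return the tests for every component touched by the changed files."""
    selected: Set[str] = set()
    for path in changed_files:
        for prefix, tests in component_tests.items():
            if path.startswith(prefix):
                selected.update(tests)
    return sorted(selected)
```

Files that match no component select no tests here; a cautious team might prefer to fall back to the full suite in that case.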


Save time on I/O


Mock responses

Use LXC (Linux Containers)

Move servers and data so they are close to one another

Make your test infrastructure faster

See also: Use snapshots of databases or VMs to make it easier to run tests in parallel

Cache what you can
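
As an example of mocking responses, here is a sketch using `unittest.mock` from the Python standard library.  The `get_user_name` function and the client's `fetch_user` method are invented for this example; the idea is that the test never touches the network or a database.

```python
from unittest.mock import Mock

def get_user_name(client, user_id: int) -> str:
    """Code under test: asks a (possibly remote) service for a user record."""
    record = client.fetch_user(user_id)
    return record["name"].title()

def test_get_user_name_with_mock() -> None:
    # The mock stands in for the slow remote service and returns a canned response.
    client = Mock()
    client.fetch_user.return_value = {"name": "ada lovelace"}

    assert get_user_name(client, 42) == "Ada Lovelace"
    client.fetch_user.assert_called_once_with(42)
```

The trade-off is that a mocked test no longer checks the real service, so some slower integration tests are still needed.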


Remove some tests


Remove tests that never fail

Remove slow tests

Replace some UI tests with code-level tests

Replace some tests with monitoring
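
Finding tests that never fail requires some record of past results.  The sketch below assumes a history that maps each test name to a list of pass/fail results; the `min_runs` cutoff of 100 is an arbitrary assumption, there to avoid flagging tests that simply have not run often enough to judge.

```python
from typing import Dict, List

def never_failed(history: Dict[str, List[bool]], min_runs: int = 100) -> List[str]:
    """Return tests with at least min_runs recorded results and no failures."""
    return sorted(
        name
        for name, results in history.items()
        if len(results) >= min_runs and all(results)
    )
```

The output is a list of candidates to review, not tests to delete automatically; a test that never fails may still be guarding something important.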


Friday, July 6, 2012

DevOps Days Open Space: DevOps for Legacy Code and Real Servers

I proposed the topic for this Open Space: DevOps for Legacy Code and Real Servers.  Here are some of the insights I gleaned from this session.

Legacy servers
  • Cloud platforms are evolving to manage real, legacy servers in addition to virtual machines.
  • Chef, for example, can manage both physical and virtual servers.  It can also manage clean OS installations as well as update existing servers.  There's a tool called Blueprint that will attempt to reverse engineer Chef automation for an existing server.
  • It's difficult to re-create systems that weren't automated in the first place.  However, it greatly reduces your risk if you invest the time and effort to do that.  What if the server was destroyed in a fire or something?
  • Sometimes people have even lost the source code for applications that are running in production.  That is a very risky state to be in.
  • Another option is to clone the system into a VM first, snapshot it, and then do your exploratory work on the VM.
  • You can also copy some of your production web traffic to your staging servers.
  • Or, you can start deploying new applications to VMs, and gradually shift your enterprise code to VMs.
Mainframe systems
  • Mainframes are the backbone of many legacy systems, and they are not going away.  
  • People who are used to working with mainframes have a different culture and language than people who are used to developing new web applications.  There's a communication gap to bridge before they can benefit from DevOps principles and practices.
  • One option is to just get an enterprise's web applications to adopt DevOps and punt on the mainframe applications.  But why can't we do the same thing for mainframe applications?
  • Mainframes have limited logging and monitoring systems.  Why?
  • Mainframes have limited tooling.  Why?
  • It's very difficult to see what's going on within an application.
  • It's very difficult to debug applications.
  • Deployments have to be completed with zero downtime.
  • LPARs, CICS regions, etc. could actually be considered a type of virtualization.  Is there a way we could make them behave more like VMs?
  • Could mainframe developers take some of the best practices from .Net and Java?
  • A more open place, like a university, might be more willing to experiment with DevOps first.
Universal Principles
These principles from DevOps can apply to legacy servers and mainframes just as easily:
  • Source Control Everything (including infrastructure code)
  • Version Control Everything (including infrastructure code)
  • Automate Everything (including infrastructure code)
  • Test Driven Development: Test First, Test Everything
  • Test for Operational Quality (performance, transaction load, security, etc.)
  • Agility
  • Focus on the Business Outcome, not the features or requirements
  • Improve teamwork between Dev and Ops
  • Collect metrics so you can find problems earlier

Friday, June 29, 2012

DevOps Days Open Space: How Can Ops Teams Give Feedback to Dev Teams?

This was another interesting Open Space that I participated in: How Can Ops Teams Give Feedback to Dev Teams?

Chaos Monkey can teach developers where things might break.  You need to couple that with some sort of monitoring tools so you can find bugs of the performance/throughput/overload type as well.

People in ops would like developers to program more defensively.  Developers are not generally taught how to do this.  It's also not usually part of their culture. 

One great way to teach developers is by writing tests that fail.  Developers are great at fixing tests that fail.

Another best practice is to embed developers in operations and vice versa.  Some companies have done this with teams of people for months or years at a time.  Others rotate people between the teams for one day every couple of weeks.  Set it up like an apprenticeship, where people can start out with a mentor and gradually become responsible for their own things.

Operations people can review code!

Developers can have pagers!

Product managers need to care about operational constraints and include those in the requirements that they put on the development teams.

You need to get everyone in the company to think about business value and happy customers.  Constantly.

You need to get everyone in the company to watch dashboards.  Give each person a few graphs to watch on a dashboard.