
Enduring Leadership Lessons of Apollo 13

Apollo 13 Launch

Forty-five years ago, astronaut Jack Lousma was serving as the ground-based Capcom (shorthand for capsule communicator) for the Apollo 13 mission, which was coasting towards the Moon.

The mission so far had been routine. Lunar missions seemed to have become almost straightforward. Lousma asked the crew, as part of normal operations:

13, we’ve got one more item for you, when you get a chance. We’d like you to stir up your cryo tanks.

The response – initially from Command Module pilot Jack Swigert – has become a modern legend:

Okay, Houston – I believe we’ve had a problem here.

There had been an explosion. The crew’s first thoughts were that a meteor might have struck the Lunar Module. In fact, an oxygen tank had exploded because of an old wiring issue inside the tank. The following image – taken in the last minutes of the flight – shows the immense damage done to the Service Module.

Damaged Apollo 13 Service Module

What followed is equally well known – a story of resilience, ingenuity, guts and adaptability. The main Command and Service Module Odyssey would be powered down, and the Lunar Module Aquarius would become an unexpected lifeboat, supporting the full crew way beyond its original design limits. The moon was lost. The rest of the mission would now have a life and death focus on managing resources and controlling the spacecraft’s trajectory. The unfolding events would grab the world’s attention in a way that was unprecedented – billions of people would care, and sometimes pray, for the Apollo 13 crew during their encounter with fate.

Cutting to the present day, it was my privilege to attend a forty-fifth anniversary celebration of the mission at Cape Canaveral last Saturday. The highlight was a panel discussion with the surviving astronauts (Jim Lovell and Fred Haise), mission directors (Gene Kranz, Glynn Lunney, Gerry Griffin) and the Capcom team (Vance Brand, Jack Lousma and Joe Kerwin). The discussion was both entertaining and hugely inspiring. We sat in front of real history.

It struck me again how perfect Apollo 13 is as a modern parable: a parable of leadership; of how to take action in a moment of complex crisis; of teamwork; and of engineering excellence. I have often used it in my own thinking and teaching.

In no particular order, here are just some of the lessons Apollo 13 can teach us.

The Power of Teamwork. The film Apollo 13 centres on Jim Lovell and Gene Kranz. It has to, in order to present a complex real-life drama within the confines of the movie format.

But it was not for nothing that the panel at the celebration event consisted of multiple astronauts and flight directors. And every member of that panel emphasised the importance of teamwork, and how they themselves represented the efforts of thousands.

Indeed, in an era before cell phones and modern communications, and without being asked, Houston Mission Control and many other NASA centres filled with NASA staff and supporting contractors within a short period of the incident. People would be at their desks for days, dividing up an immensely complex problem into solvable pieces. NASA and its suppliers pulled together to save the three astronauts as they swung around the Moon.

The Power of Responsible Leadership in a Crisis. One of the key moments for the flight directors managing the crisis occurred when they – represented by Glynn Lunney and Gerry Griffin – went to brief NASA leadership, who would themselves have been under a great deal of hard-edged public pressure. The flight directors went through five complex recovery scenarios, and made their recommendation from among the five.

Given the importance of the decisions being made, the Flight Directors were ready for challenge. The challenge that came delighted them:

We have just one question. How can we help?

Those of you who have worked in major problem resolution and crisis management will recognise the importance of this. As a leader, if you have the best people, and good procedures, then your role is to let them act, and give them the resources and support they need.

The Power of Professionalism. The crew of the mission, and the people that supported them, had been selected on the basis of character and ability. They had been well trained. As Ron Howard, director of the film Apollo 13, once pointed out to Jim Lovell, it was hard to hear a problem at all when listening to the original tapes from the time of the initial explosion. The response to immense problems was calm and measured – in both the spacecraft and in Mission Control. It was focused on making revised plans, and working those plans. In fact, throughout the crisis there was little doubt and little fear. The NASA team believed it could save the astronauts, and worked carefully to that goal.

The Power of Preparation, Rehearsal and Testing. Although Apollo 13 was an extreme incident, NASA’s careful preparation paid off. They had rehearsed using the lunar module to control the full “stack” on Apollo 9. NASA’s detailed, existing procedures proved to be highly adaptable to the new set of problems. The wiring on the Command and Service Module was made to an extraordinarily high standard – a reaction to the tragedy of the Apollo 1 fire. That meant that the dampness inside the hibernating spacecraft did not create issues when it was brought back to life for re-entry.

So, after forty-five years, the story of Apollo 13 – of NASA’s successful failure – still endures, and inspires. Above all, it can teach.

Mission Control celebrates the recovery of the Apollo 13 crew

Keith Haviland

Keith Haviland is a business and technology leader, with a special focus on how to combine big vision and practical execution at the very largest scale, and how new technologies will reshape tech services. He is a Former Partner and Global Senior Managing Director at Accenture, and founder of Accenture’s Global Delivery Network. Published author and active film producer, including Last Man on the Moon. Advisor/investor for web and cloud-based start-ups.

Failure is an Option

Things will always go wrong, but excellent preparation and strong leadership can turn failure into a kind of success.

The story of Apollo 13 is a parable of gritty resolve, technical excellence, calm heroism and teamwork. For anyone focused on leadership, operations and program management it is absolutely the purest of inspirations.

The film of Apollo 13 centres on the phrase “Failure is Not an Option,” coined after the original drama in a conversation between Jerry Bostick – one of the great Apollo flight controllers – and the filmmakers. It summarises a key part of the culture of Apollo-era NASA, and it has found its way onto the walls or desks of many a leader’s office. It is part of the DNA of modern business culture, and of any sizeable delivery project.

Damaged Apollo 13 Service Module

Lessons from the Space Program

But one of the reasons that the crew was recovered was this: throughout its history, NASA and Mission Control knew that failure was precisely an option, and they designed, built and tested to deal with that simple truth. The spacecraft systems had – where physically possible – redundancy. The use of a Lunar Module as a lifeboat had already been examined and analyzed before Apollo 13. In the end, an old manufacturing defect caused an electrical failure with almost catastrophic consequences. It was precisely because Mission Control was used to dealing with issues that Apollo 13 became what has been called a “successful failure” and “NASA’s finest hour.”

The ability to respond like this was hard earned. The Gemini program – sandwiched between the first tentative manned flights of Mercury, and the Apollo program that got to the moon – was designed to test the technologies and control mechanisms needed for deep space. It was a very deliberate series of steps. Almost everything that could go wrong did: fuel cells broke, an errant thruster meant that Gemini 8 was almost lost, rendezvous and docking took many attempts to get right and space walks (EVAs in NASA speak) proved much harder than anybody was expecting. And then the Apollo 1 fire – where three astronauts were actually lost on the launch pad – created a period of deep introspection, followed by much redesign and learning. In 18 months, the spacecraft was fundamentally re-engineered. The final step towards Apollo was the hardest.

But after less than a decade of hard, hard work, NASA systems worked to a standard almost unique in human achievement.

So, with near infinite planning and rehearsal, NASA could handle issues and error with a speed and a confidence that is still remarkable. Through preparation, failure could be turned into success.

Challenges of a Life More Ordinary

All of us have faced challenges of a lesser kind in our careers. I was once responsible for a major software platform that, despite expensive testing, showed real but occasional and obscure issues the moment it went into production. We put together an extraordinary SWAT team. The problem seemed to be data-driven, software-related and simply embarrassing. I nicknamed it Freddie, after the Nightmare on Elm Street movies. It turned out to be a physical issue in wiring – hugely surprising and easily fixed. The software platform worked perfectly once that was resolved.

Another example: In the early days of Accenture’s India delivery centres, we had planned for redundancy and were using two major cables for data to and from the US and Europe. But although they were many kilometres apart, both went through the Mediterranean. A mighty Algerian earthquake brought great sadness to North Africa, and broke both cables. We scrambled, improvised, maintained client services, and then bought additional capacity in the Pacific. We now had a network on which the sun never set. It was a lesson in what resilience and risk management really means.

Soon enough, and much more often than not, we learnt to handle most failures and problems with fluency. In the Accenture Global Delivery Network we developed tiered recovery plans that could handle challenges with individual projects, buildings and cities. So we were able to handle problems that, at scale, happen frequently – transport issues, point technology failures, political actions and much more – all without missing a single beat. Our two priorities were firstly people’s safety and well-being, and secondly client service, always in that order.

Technology – New Tools and New Risks

As technology develops, there are new tools but also new risks. On the benefit side, the Cloud brings tremendous, generally reliable compute power at increasingly low cost. Someone else has thought through service levels and availability, and invested in gigantic industrialized data centres. The cloud’s elasticity also allows smart users to sidestep common capacity issues during peak usage. These are huge benefits we have only just started to understand.

But even the most reliable of cloud services will suffer rare failures, and at some point a major front-page incident is inevitable. The world of hybrid clouds also brings new points of integration, and interfaces are where things often break. And agile, continuous delivery approaches mean that the work of different teams must often come together quickly and – hopefully – reliably.

The recent Sony incident shows – in hugely dramatic ways – the particular risks around security and data. Our technology model has moved from programs on computers to services running in a hybrid and open world of Web and data centre. The Web reflects the overall personality of the human race – light and dark – and we have only just begun to see the long-term consequences of that in digital commerce.

Turning Failures into Success

What follows is my own summary view of the key steps required to handle the inevitability of challenges and problems. It is necessarily short.

1. Develop a Delivery Culture – Based on accountability, competence and a desire for peerless delivery and client service. Above all, there needs to be an acknowledgement that leadership and management are about vision as well as about managing and avoiding issues. Create plans, and then be prepared to manage the issues.

2. Understand Your Responsibilities – They will always be greater in number than you think. Some of them are general, often obvious and enshrined in law – if you employ people, handle data about humans, work in the US, work in Europe, work in India and work across borders, you are surrounded by regulations. Equally importantly, the expectations of your business users or clients need to be set and mutually understood – many problems are caused by costing one service level and selling another. Solving a service problem might take hours or days. Solving a problem with expectations and contracts may be the work of months and years.

3. Architect and Design – Business processes and use cases (and indeed users!) need to account for failure modes. The design for technical architectures must acknowledge and deal with component and service failures – and they must be able to recover. As discussed above, cloud services can solve resilience issues by offering the benefits of large-scale, industrialised supply, but they also bring new risks around integration between old and new. Cloud brings new management challenges.

4. Automate – Automation (properly designed, properly tested) can be your friend. Automated recovery and security scripts are much less error prone than those done by people under stress. There are many automated tools and services that can help test and assess your security environment. Automated configuration management brings formal traceability – essential for the highest levels of reliability. Automated regression testing is a great tool to reduce the costs of testing in the longer term.
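The automated-recovery idea in the step above can be sketched as a simple monitoring loop. Everything here is a simulated stand-in – the services, checks and recovery actions are hypothetical, and a real implementation would drive actual health endpoints and restart scripts:

```python
def monitor(checks, recoveries, max_recovery_attempts=3):
    """Run health checks; on failure, run scripted recovery actions.

    `checks` maps a service name to a zero-argument callable that
    returns True when healthy; `recoveries` maps the same names to
    recovery callables. Returns a report of what was found and done.
    """
    report = {}
    for name, check in checks.items():
        if check():
            report[name] = "healthy"
            continue
        for attempt in range(1, max_recovery_attempts + 1):
            recoveries[name]()  # scripted, repeatable recovery step
            if check():
                report[name] = f"recovered after {attempt} attempt(s)"
                break
        else:
            # Automation gives up cleanly: escalation, not heroics.
            report[name] = "recovery failed - escalate to humans"
    return report

# Simulated services: one healthy, one that recovers on restart.
state = {"web": True, "queue": False}
checks = {"web": lambda: state["web"], "queue": lambda: state["queue"]}
def restart_queue():
    state["queue"] = True
recoveries = {"web": lambda: None, "queue": restart_queue}

report = monitor(checks, recoveries)
print(report)
```

The design point is that the recovery steps are scripted and bounded: the automation either restores the service or hands over to people with a clear record of what it tried.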

5. Test – Test for failure modes in both software and business process. Test at points of integration. Test around service and service failures. Test at, and beyond, a system’s capacity limits. Test security. Test recovery. Test testing.
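As an illustration of testing a failure mode at a point of integration, here is a small Python sketch. The service, its fallback behaviour and the injected failure are all hypothetical; the technique is simply to inject the failure deliberately and assert that the system degrades the way the design says it should:

```python
class PriceService:
    """Fetches prices from a remote source, falling back to a cache."""

    def __init__(self, fetch, cache):
        self.fetch = fetch
        self.cache = cache

    def price(self, item):
        try:
            value = self.fetch(item)
            self.cache[item] = value
            return value
        except ConnectionError:
            # Designed failure mode: degrade to the last known value.
            return self.cache.get(item)

# Test the failure mode by injecting a fetcher that is always down.
def always_down(item):
    raise ConnectionError("remote source unavailable")

service = PriceService(always_down, cache={"widget": 9.99})
assert service.price("widget") == 9.99   # falls back to cached value
assert service.price("unknown") is None  # no cached value: explicit miss
print("failure-mode tests passed")
```

The point is not the toy service but the habit: every designed failure mode deserves a test that forces the failure and checks the promised behaviour.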

6. Plan for Problems – Introduce a relevant level of risk management. Create plans for business continuity across technology systems and business processes. Understand what happens if a system fails, but also what happens if your team can’t get to the office, or a client declares a security issue.

7. Rehearse – Invest in regular rehearsals of problem handling and recovery. Include a robust process for debriefing.

8. Anticipate and Gather Intelligence – For any undertaking of significance, understand potential issues and risks. Larger organisations will need to understand emerging security issues – from the small, technical and specific to more abstract global threats. Truly global organisations will sometimes need to understand patterns of weather – for example, to determine if transport systems are under threat. (I even once developed personal expertise in seismic science and volcanism.)

9. Respond – But finally, acknowledge that major issues will happen, and that they will often be unexpected. So, a team must focus on:

  • Accepting accountability, focusing on resolution, and accepting the short-term personal consequences. It is what you are paid for.
  • Setting up a management structure for the crisis, and triggering relevant business continuity plans.
  • Setting up an expert SWAT team, including what is needed from suppliers.
  • Reporting diagnosis and resolution accurately, simply, frequently and without false optimism.
  • Communicating with stakeholders in a way that balances information flow against the need for a core team to focus on resolution.
  • Handling the media, if you are providing a public service.
  • And, after the problem is solved and the coffee machine is temporarily retired, making sure the team learns.

And finally a Toast …

In previous articles, I have acknowledged the Masters of Delivery I have come across in my varied career.

In the domain covered by this article, I have worked with people in roles such as “Global Asset Protection” and “Chief Information Security Officer”, and with teams across the world responsible for business continuity, security and engineering reliable cloud services. They work on the kind of activity that often goes unacknowledged when things go well – but in the emerging distributed and open future technology world, they are all essential. To me, these are unsung “Masters of Delivery.” Given this is the start of 2015, let’s raise a virtual glass in celebration of their work. We all benefit by it.

Keith Haviland

This is a longer version of an article originally posted on LinkedIn.