Category Archives: Execution

Danger was the safest thing in the world if you went about it right

I am now contributing safety thoughts and ideas on safetydifferently.com. Here is a reprint of my initial posting. If you wish to add a comment, I suggest you first read the other comments at safetydifferently.com and then include yours at the end to join the conversation.

Danger was the safest thing in the world if you went about it right

This seemingly paradoxical statement was penned by Annie Dillard. She is neither a safety professional nor a line manager steeped in safety experience. Annie is a writer who, in her book The Writing Life, became fascinated by a stunt pilot, Dave Rahm.

“The air show announcer hushed. He had been squawking all day, and now he quit. The crowd stilled. Even the children watched dumbstruck as the slow, black biplane buzzed its way around the air. Rahm made beauty with his whole body; it was pure pattern, and you could watch it happen. The plane moved every way a line can move, and it controlled three dimensions, so the line carved massive and subtle slits in the air like sculptures. The plane looped the loop, seeming to arch its back like a gymnast; it stalled, dropped, and spun out of it climbing; it spiraled and knifed west on one side’s wings and back east on another; it turned cartwheels, which must be physically impossible; it played with its own line like a cat with yarn.”

When Rahm wasn’t entertaining the audience on the ground, he was entertaining students as a geology professor at Western Washington State College. His fame for “doing it right” in aerobatics led King Hussein to recruit him to teach the art and science to the Royal Jordanian stunt flying team. While performing a maneuver in Jordan, Rahm’s plane plummeted to the ground and burst into flames. The royal family and Rahm’s wife and son were watching. Dave Rahm was killed instantly.

After years and years of doing it right, something went wrong for Dave Rahm. How could this have happened? How can danger be the safest thing? Let’s turn our attention towards Resilience Engineering and the concept of Emergent Systems. By viewing Safety as an emergent property of a complex adaptive system, Dillard’s statement begins to make sense.

Clearly a stunt pilot pushes the envelope by taking calculated risks. He gets the job done, which is to thrill the audience below. Rahm’s maneuver called “headache” was startling: the plane stalled and spun towards earth, seemingly out of control. He then adjusted his performance to varying conditions to bring the plane safely under control. He wasn’t preoccupied with what to avoid and what not to do. He knew in his mind what was the right thing to do.

We can apply Richard Cook’s modified Rasmussen diagram to characterize this deliberate movement of the operating point towards failure, coupled with action to pull back from the edge of failure. As the operating point moves closer to failure, conditions change and danger, as a system property, begins to emerge. To Annie Dillard, this aggressive heading in and pulling back was how danger was the safest thing in the world if you went about it right.
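To make the idea concrete, here is a minimal sketch in Python (my own toy model, not Cook’s or Rasmussen’s actual diagram; every number and variable name is invented for illustration). An operating point drifts towards a failure boundary under performance pressure, and the “stunt pilot” pulls it back whenever the remaining margin gets uncomfortably thin.

```python
import random

# Toy model of the operating point (all values invented for illustration):
# pressure and everyday variability push the point towards the failure
# boundary; a skilled operator senses the shrinking margin and pulls back.

FAILURE_BOUNDARY = 1.0   # crossing this is the tipping point
ALERT_MARGIN = 0.2       # "too close for comfort"

def run_show(steps=200, drift=0.03, noise=0.05, pull_back=0.25):
    position = 0.5       # current operating point; 0.0 is very safe
    for step in range(steps):
        position += drift + random.uniform(-noise, noise)  # pressure + variability
        margin = FAILURE_BOUNDARY - position
        if margin <= 0:
            return f"tipping point crossed at step {step}"
        if margin < ALERT_MARGIN:
            position -= pull_back   # the deliberate pull back from the edge
    return "show completed safely"

print(run_show())                # strong pull-backs: usually ends safely
print(run_show(pull_back=0.02))  # weak pull-backs: the drift eventually wins
```

The point of the sketch is that safety here isn’t a static state; it is the continuous act of sensing the shrinking margin and adjusting. Weaken the pull-back and the very same drift produces the tipping point.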

“Rahm did everything his plane could do: tailspins, four-point rolls, flat spins, figure 8’s, snap rolls, and hammerheads. He did pirouettes on the plane’s tail. The other pilots could do these stunts, too, skillfully, one at a time. But Rahm used the plane inexhaustibly, like a brush marking thin air.”

The job was to thrill people with acts that appeared dangerous. And show after show Dave Rahm pleased the crowd and got the job done. However, on his fatal ride, Rahm and his plane somehow reached a non-linear complexity phenomenon called the tipping point, a point of no return, and sadly paid the final price.

Have you encountered workers who behave like stunt pilots? A stunt pilot will take risks and fly as close to the edge as possible. If you were responsible for their safety, or were a consultant asked to make recommendations, what would you do? Would you issue a “cease and desist” safety bulletin? Add a new “safety first…” rule to remove any glimmers of workplace creativity? Order more compliance checking and inspections? Offer whistle-blowing protection? Punish stunt pilots?

On the other hand, you could appreciate a worker’s willingness to take risks, to adjust performance when faced with unexpected variations in everyday work. You could treat a completed task as a learning experience and encourage the worker to share her story. By showing Richard Cook’s video you could make stunt pilots very aware of the complacency zone and over time, how one can drift into failure. This could lead to an engaging conversation about at-risk vs. reckless behaviour.

How would you deal with workers who act as stunt pilots? Command & control? Educate & empower? Would you do either/or? Or do both/and?

Do you lead from the Past or lead to the Future?


Recently Loren Murray, Head of Safety for Pacific Brands in Australia, penned a thought-provoking blog on the default future, a concept from the book ‘The Three Laws of Performance’. I came across the book a few years ago and digested it from a leader-effectiveness standpoint. Loren does a nice job applying it to a safety perspective.

“During my career I noticed that safety professionals (and this included myself) have a familiar box of tricks. We complete risk assessments, enshrine what we learn into a procedure or SOP, train on it, set rules and consequences, ‘consult’ via toolboxes or committees and then observe or audit.

When something untoward happens we stop, reflect and somehow end up with our hands back in the same box of tricks writing more procedures, delivering more training (mostly on what people already know), complete more audits and ensure the rules are better enforced….harder, meaner, faster. The default future described in The Three Laws of Performance looked a lot like what I just described!

What is the default future? We like to think our future is untold, that whatever we envision for our future can happen….However for most of us and the organisations we work for, this isn’t the case. To illustrate. You get bitten by a dog when you are a child. You decide dogs are unsafe. You become an adult, have kids and they want a dog. Because of your experiences in the past it is unlikely you will get a dog for your kids. The future isn’t new or untold it’s more of the past. Or in a phrase, the past becomes our future. This is the ‘default future’.

Take a moment to consider this. It’s pretty powerful stuff with implications personally and organisationally. What you decide in the past will ultimately become your future.

How does this affect how we practice safety? Consider our trusty box of tricks. I spent years learning the irrefutable logic of things like the safety triangle and iceberg theory. How many times have I heard about DuPont’s safety journey? Or the powerful imagery of zero harm. The undeniable importance of ‘strong and visible’ leadership (whatever that means) which breeds catch phrases like safety is ‘priority number one’.

These views are the ‘agreement reality’ of my profession. These agreements have been in place for decades. I learnt them at school, they were confirmed by my mentors, and given credibility by our regulators and schooling system. Some of the most important companies in Australia espouse it, our academics teach it, students devote years to learning it, workers expect it…. Our collective safety PAST is really powerful.”

 
Loren’s blog caused me to reflect on the 3 laws and how they might be applied in a complexity-based safety approach. Let’s see how they can help us learn so that we don’t keep on repeating the past.
First Law of Performance
“How people perform correlates to how situations occur to them.”
It’s pretty clear that the paradigms which dominate current safety thinking view people as error-prone or as problems working within idealized technological systems, structures, and processes. Perplexed managers get into a “fix-it” mode by recalling what worked in the past and assuming that is the solution going forward. The first law asks us to be mindful of perception blindness and to open both eyes.
Second Law of Performance
“How a situation occurs arises in language.”

As evidence-based safety analysts, we need to hear the language and capture the conversations. One way is the Narrative approach where data is collected in the form of stories. We may even go beyond words and collect pictures, voice recordings, water cooler snippets, grapevine rumours, etc. When we see everything as a collective, we can discover themes and patterns emerging. These findings could be the keys that lead to an “invented” future.

Third Law of Performance
“Future-based language transforms how situations occur to people.”

Here are some possible yet practical shifts you can start with right now:

  • Let’s talk less about inspecting to catch people doing the wrong things and talk more about Safety-II; i.e., focusing on doing what’s right.
  • Let’s talk less about work-as-imagined deviations and more about work-as-done adjustments; i.e., less blaming and more appreciating and learning how people adjust performance when faced with varying, unexpected conditions.
  • Let’s talk less about past accident statistics and injury reporting systems and talk more about sensing networks that trigger anticipatory awareness of non-predictable negative events.
  • Let’s talk less about some idealistic Future state vision we hope to achieve linearly in a few years and talk more about staying in the Present, doing more proactive listening, and responding to the patterns that emerge in the Now.
  • And one more…let’s talk less about being reductionists (breaking down a social-technical system into its parts) and talk more about being holistic and understanding how parts (human, machines, ideas, etc.) relate, interact, and adapt together in a complex work environment.

The “invented” future conceivably may be one that is unknowable and unimaginable today but will emerge with future-based conversations.

What are you doing as a leader today? Leading workers to the default future or leading them to an invented Future?

Click here to read Loren’s entire blog posting.

When thinking of Safety, think of coffee aroma

Safety has always been a hard sell to management and to front-line workers because, as Karl Weick put forward, Safety is a dynamic non-event. Non-events are taken for granted. When people see nothing, they presume that nothing is happening and that nothing will continue to happen if they continue to act as before.

I’m now looking at Safety from a complexity science perspective as something that emerges when system agents interact. An example is aroma emerging when hot water interacts with dry coffee grounds. Emergence is a real-world phenomenon that Systems Thinking does not address.

Safety-I and Safety-II do not create safety but provide the conditions for Safety to dynamically emerge. But as a non-event, it’s invisible and people see nothing. Just as safety can emerge, so can danger as an invisible non-event. What we see is failure (e.g., accident, injury, fatality) when the tipping point is reached. We can also reach a tipping point when we do too much of a good thing. Safety rules are valuable, but if a worker is overwhelmed by too many, danger in the form of confusion and distraction can emerge.

I see great promise in advancing the Safety-II paradigm to understand what are the right things people should be doing under varying conditions to enable safety to emerge.

For further insights into Safety-II, I suggest reading Steven Shorrock’s posting What Safety-II isn’t on Safetydifferently.com. Below are my additional comments under each point made by Steven with a tie to complexity science. Thanks, Steven.

Safety-II isn’t about looking only at success or the positive
Looking at the whole distribution and all possible outcomes means recognizing there is a linear Gaussian and a non-linear Pareto world. The latter is where Black Swans and natural disasters unexpectedly emerge.
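As a rough illustration (my own numbers, not from Steven’s post), a few lines of Python show the difference between those two worlds: in a thin-tailed Gaussian world an outcome ten times the typical scale essentially never happens, while in a heavy-tailed Pareto world it happens routinely.

```python
import random

# Illustrative contrast between a thin-tailed (Gaussian) and a
# heavy-tailed (Pareto) world. All parameters are arbitrary.
random.seed(42)
N = 100_000

gaussian = [random.gauss(1.0, 1.0) for _ in range(N)]      # mean 1, sd 1
pareto   = [random.paretovariate(1.5) for _ in range(N)]   # mean ~3 for alpha = 1.5

threshold = 10.0  # an "extreme" event, ten times the typical scale
print("Gaussian exceedances:", sum(x > threshold for x in gaussian))  # essentially zero
print("Pareto exceedances:  ", sum(x > threshold for x in pareto))    # thousands
print("Largest Gaussian:", round(max(gaussian), 1),
      " Largest Pareto:", round(max(pareto), 1))
```

The Black Swan lives in the second column: averages and standard deviations tell you very little about what the tail can do.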

Safety-II isn’t a fad
Not all Safety-I foundations are based on science. As Fred Manuelle has proven, Heinrich’s Law is a myth. John Burnham’s book Accident Prone offers a history of the rise and fall of the accident proneness concept. We could call them fads, but it’s difficult to do so since they have been blindly accepted for so long.

This year marks the 30th anniversary of the Santa Fe Institute where Complexity science was born. At the May 2012 Resilience Lab I attended, Erik Hollnagel and Richard Cook introduced the RMLA elements of Resilience engineering: Respond, Monitor, Learn, Anticipate. They fit with Cognitive-Edge’s complexity view of Resilience: Fast recovery (R), Rapid exploitation (M, L), Early detection (A). This alignment has led to one way to operationalize Safety-II.

Safety-II isn’t ‘just theory’
As a pragmatist, I tend not to use the word “theory” in my conversations. Praxis is more important to me than spewing theoretical ideas. When dealing with complexity, the traditional Scientific Method doesn’t work. It’s neither deductive nor inductive reasoning but abductive. This is the logic of hunches, based on past experiences and making sense of the real world.

Safety-II isn’t the end of Safety-I
The focus of Safety-I is on robust rules, processes, systems, equipment, materials, etc. to prevent a failure from occurring. Nothing wrong with that. Safety-II asks what we can do to recover when failure does occur, plus what we can do to anticipate when failure might happen.

Resilience can be more than just bouncing back. Why return to the same place only to be hit again? Early exploitation means finding a better place to bounce to. We call it “swarming” or Serendipity if an opportunity unexpectedly arises.

Safety-II isn’t about ‘best practice’
“Best” practice does exist, but only in the Obvious domain of the Cynefin Framework. It’s the domain of intuition and of the fast thinking in Daniel Kahneman’s book Thinking, Fast and Slow. What’s the caveat with best practices? There’s no feedback loop, so people just carry on as they did before. Some best practices become good habits. On the other hand, danger can emerge from the bad ones and one will drift into failure.

Safety-II and Resilience are about catching yourself before drifting into failure: being alert to detect weak signals (e.g., surprising behaviours, strange noises, unsettling rumours) and having physical systems and people networks in place to trigger anticipatory awareness.

Safety-II isn’t what ‘we already do’
“Oh, yes, we already do that!” is typically expressed by an expert. It might be a company’s line manager or a safety professional. There’s minimal value in challenging the response. You could execute an “expert entrainment breaking” strategy. The preferred alternative? Follow what John Kay describes in his book Obliquity: Why Our Goals Are Best Achieved Indirectly.

Don’t even start by saying “Safety-II”. Begin by gathering stories and making sense of how things get done and why things are done a particular way. Note the stories about doing things the right way. Chances are pretty high most stories will be around Safety-I. There’s your data, your evidence that either validates or disproves “we already do”. Tough for an expert to refute.

Safety-II isn’t ‘them and us’
It’s not them/us, nor either/or, but both/and: Safety-I + Safety-II. It’s Robustness + Resilience together. We want to analyze all of the data available, when things go wrong and when things go right.

The evolution of safety can be characterized by a series of overlapping life cycle paradigms. The first paradigm was Scientific Management followed by the rise of Systems Thinking in the 1980s. Today Cognition & Complexity are at the forefront. By honouring the Past, we learn in the Present. We keep the best things from the previous paradigms and let go of the proven myths and fallacies.

Safety-II isn’t just about safety
Drinking a cup of coffee should be a total experience, not just tasting the liquid. It includes smelling the aroma, seeing the barista’s carefully crafted cream design, hearing the first slurp (okay, I confess). Safety should also be a total experience.

Safety can emerge from efficient as well as effective conditions.  Experienced workers know that a well-oiled, smoothly running machine is low risk and safe. However, they constantly monitor by watching gauges, listening for strange noises, and so on. These are efficient conditions – known minimums, maximums, and optimums that enable safety to emerge. We do things right.

When conditions involve unknowns, unknowables, and unimaginables, the shift is to effectiveness. We do the right things. But what are these right things?

It’s about being in the emerging Present and not worrying about some distant idealistic Future. It’s about engaging the entire workforce (i.e., wisdom of crowds) so no hard selling or buying-in is necessary.  It’s about introducing catalysts to reveal new work patterns.  It’s about conducting small “safe-to-fail” experiments to  shift the safety culture. It’s about the quick implementation of safety solutions that people want now.

Signing off and heading to Starbucks.

My story: A day with Sidney Dekker

A story is an accounting of an event as experienced through the eyes, ears, cognitive biases, and paradigms of one person. This is my story about attending ‘A Day with Sidney Dekker’ at the Vancouver Convention Centre on Friday, September 19, 2014. The seminar was sponsored by the Lower Mainland chapter of the CSSE (Canadian Society of Safety Engineering). I initially heard about the seminar through my associations with RHLN and the HFCoP.

I was aware that Sidney Dekker (SD) uses very few visual slides and provides no handouts, so I came fully prepared to take copious notes with my trusty iPad Air. This is not a play-by-play (or a blow-by-blow, if you take in SD’s strong opinions on HR, smart managers bent on controlling dumb workers, etc.). I’ve shifted content around to align with my thinking and the work I’ve done co-developing our Resilient Safety Culture course with Cognitive-Edge. My comments are in square brackets and italics.

SD: Goal today is to teach you to think about intractable issues in safety
SD: Don’t believe a word I say; indulge me today then go find out for yourself
SD: We care when bad people make mistakes but we should care more when good people make mistakes and why they do

Where are we today in safety thinking?


Here is a recent article announcing a new safety breakthrough [http://bit.ly/1mVg19a]
LIJ medical center has implemented a safety solution touted as the be-all and end-all
A remote video auditing (RVA) system in a surgical room, developed by Arrowsight [http://bit.ly/1mVh2yf]

RVA monitors status every 2 minutes for tools left in patients, OR team mistakes
Patient Safety improved to a near perfect score
Culture of safety and trust is palpable among the surgical team
Real-time feedback on a  smartphone
RVA is based on the “bad apple” theory and model and an underlying assumption that there is a general lack of vigilance
Question: Who looks at the video?
Ans: An independent auditor, who will cost money. Trade-off tension created between improving safety and keeping costs down
Assumption: He who watches knows best so OR team members are the losers
Audience question: What if the RVA devices weren’t physically installed but just announced; strategy is to put in people’s minds that someone is watching to avoid complacency
SD: have not found any empirical evidence that being watched improves safety. But it does change behaviour to look good for the camera
Audience question: Could the real purpose of the RVA be to protect the hospital’s ass during litigation cases?
SD: Very good point! [safety, cost, litigation form a SenseMaker™ triad to attach meaning to a story]
One possible RVA benefit: Coaching & Learning
If the video watchers are the performers, then feedback is useful for learning purposes
Airline pilots can ask to replay the data of a landing but only do so on the understanding there are serious protections in place – no punitive action can be a consequence of reviewing data
Conclusion: Solutions like RVA give the illusion of perfect resolution
 
How did we historically arrive at the way we look at safety and risk today?
[Reference SD’s latest book released June 2014:  “Safety Differently” which is an update of “Ten Questions About Human Error: A New View of Human Factors and System Safety”]
[SD’s safety timeline aligns with the S-curve diagram developed by Dave Snowden http://gswong.com/?page_id=11]

Late Victorian Era

Beginning of measurement (Germany, UK) to make things visible
Discovery that the industrial revolution kills a lot of people, including children
Growing concern with enormous injury and fatality problem
Scholars begin to look at models
1905 Rockwell: pure accidents (events that cannot be anticipated) seldom happen; someone has blundered or reversed a law of nature
Eric Farmer: carelessness or lack of attention of the worker
Oxford Human Factor definition: physical, mental, or moral shortcoming of the individual that predisposes the person

We still promote this archaic view today in programs like Hearts & Minds [how Shell and the Energy Institute promote world-class HSE]
campaigns with posters, banners, slogans
FAITH-BASED safety approach vs. science-based

In 2014, can’t talk about physical handicaps but are allowed to for mental and moral (Hearts and Minds) human deficiencies
SD: I find it offensive to be treated as an infantile

1911 Frederick Taylor introduced Scientific Management to balance the production of pigs, cattle
Frank Gilbreth conducted time and motion studies
Problem isn’t the individual but planning, organizing, and managing
Scientific method is to decompose into parts and find 1 best solution [also known as Linear Reductionism]
Need to stay with 1 best method (LIJ’s RVA follows this 1911 edict)
Focus on the non-compliant individual using line supervision to manage dumb workers
Do not let people work heuristically [rule of thumb] but adamantly adhere to the 1 best method
We are still following the Tayloristic approach
Example: Safety culture quote in 2000: “It is generally acknowledged that human frailty lies behind the majority of our accidents. Although many of these have been anticipated by rules, procedures, some people don’t do what they are supposed to do. They are circumventing the multiple defences that management has created.”

It’s no longer just a Newton-Cartesian world

        Closed system, no external forces that impinge on the unit
        Linear cause & effect relationships exist
        Predictable, stable, repeatable work environment
        Checklists, procedures are okay
        Compliance with 1 best method is acceptable

Now we know the world is complex, full of perturbations, and not a closed system 

[Science-based thinking has led to complex adaptive systems (CAS) http://gswong.com/?wpfb_dl=20]

SD’s story as an airline pilot
Place a paper cup on the flaps (resilience vs. non-compliance) because resilience is needed to finish the design of the aircraft by the operators
Always a gap between Work-as-imagined vs Work-as-done [connects with Erik Hollnagel’s Safety-II]
James Reason calls the gap a non-compliance violation; we can also call that gap Resilience – people have to adapt to the local conditions using their experience, knowledge, judgement

SD: We pay people more money who have experience. Why?  Because the 1 best method may not work
There is no checklist to follow
Taylorism is limited and can’t go beyond standardization

Audience question: Bathtub curve model for accidents – more accidents involving younger and older workers. Why does this occur?
SD: Younger workers are beaten to comply but often are not told why so lack understanding
Gen Y doesn’t believe in authority and sources of knowledge (prefer to ask a crowd, not an individual)
SD: Older worker research suggests expertise doesn’t create safety awareness. They know how close they can come to the margin but if they go over the line, slower to act. [links with Richard Cook’s Going Solid / Margin of Manoeuvre concept http://gswong.com/?wpfb_dl=18]

This is not complacency (a motivational issue) but an attenuation towards risk. Also may not be aware the margins have moved (example: in electric utility work, wood cross-arm materials have changed). Unlearning, teaching the old dog new tricks, is difficult. [Master builder/Apprenticeship model: While effective for passing on tacit knowledge, danger lies in old guys becoming stale and passing on myths and old paradigms]

1920s & 1930s – advent of Technology & animation of Taylorism

World is fixed, technology will solve the problems of the world
Focus on the person using rewards and punishment, little understanding of deep ethical implications
People just need to conform to technology, machines, devices [think of Charlie Chaplin’s Modern Times movie]

Today: Behaviour-based Safety (BBS) programs still follow this paradigm re controlling human behaviour
Example: mandatory drug testing policy. What does this do to an organization?
In a warehouse, worker is made to wear a different coloured vest (a dunce cap)
“You are the sucker who lost this month’s safety bonus!” What happens to trust, bonding?

Accident Proneness theory (UK, Germany 1925)

Thesis is based on data and similar to the Bad Apple theory
[read John Burnham’s book http://amzn.to/1mV63Vn ]
Data showed some people are more involved in accidents than others (e.g., 25% cause 55%)
Idea was to target these individuals
Aligned with the eugenic thinking in the 1920s (Ghost of the time/spirit/zeitgeist)
        Identify who is fit and weed out (exterminate) the unfit [think Nazism]
Theory development carried on up to WWII
Question: what is the fundamental statistical flaw with this theory?
Answer: We all do the same kind of work therefore we all have the same probability of incurring an accident
Essentially comparing apples with oranges
We know better – individual differences exist in risk tolerance
SD: current debate in medical journal: data shows 3% of surgeons causing majority of deaths
Similar article in UK 20% causing 80%
So, should we get rid of these accident-prone surgeons?
No, because the 3% may include the docs who are willing to take the risk to try something new to save a life
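[My own illustration of the statistical point above, with invented numbers: give every worker an identical accident probability and the counts still come out lopsided by chance alone, so a skewed distribution of accidents does not by itself prove “proneness”.]

```python
import random

# Invented numbers: 1000 workers, identical exposure, identical risk.
# Even so, a minority will account for a large share of the accidents.
random.seed(1)
workers, exposures, p_accident = 1000, 500, 0.004

counts = sorted(
    (sum(random.random() < p_accident for _ in range(exposures)) for _ in range(workers)),
    reverse=True,
)
total = sum(counts)
top_quarter = sum(counts[: workers // 4])
print(f"total accidents: {total}")
print(f"share from the 'worst' 25% of workers: {top_quarter / total:.0%}")
# Typically around 45-50% from a quarter of the workforce, even though
# every worker faces identical risk. Unequal exposure (the apples-and-
# oranges problem noted above) skews the picture even further.
```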

WWII Technologies

Nuclear, radar, rocketry, computers
Created  a host of new complexities, new usability issues

Example: Improvements to the B17 bomber
Hydraulic gear and flap technology introduced
However, belly-flop landings happened
Presumed cause was dumb pilots who required more training, checklists, and punishment
Would like to remove these reckless accident-prone pilots damaging the planes
However, pilots are in short supply plus give them a break – they have been shot at by the enemy trying to kill them
Shifted focus from human failure to design flaws. Why do 2 switches in the dashboard look the same?
In 1943 the switch was redesigned to prevent belly-flopping
Message: Human error is systematically and predictably connected to the features of the tools and products that people use. Bad design induces errors. Better to intervene in the context of people’s work.

Safety thinking begins to change: What happens in the head is acutely important.
Now interested in cognitive psychology [intuition, reasoning, decision-making] not just behavioural psychology [what can be observed]
Today: Just Culture policy (human error, at-risk behaviour, reckless behaviour)

After-lunch exercise: a Greek airport, 1770 m long

Perceived problem: breaking EU rules by taxiing too close to the road
White line – displaced threshold – don’t land before this line
Need to rapidly taxi back to the terminal to unload for productivity reasons (plane on-the-ground costs money)
Vehicular traffic light is not synced with plane landing (i.e., random event)

Question: How do you stop non-compliant behaviour if you are the regulator? How might you mitigate the risk?
SD: Select a solution approach with choices including Taylorism, Just Culture, Safety by Design
Several solutions heard from the audience but no one-best

SD: Conformity and compliance rules are not the answer, human judgment required
Situation is constantly changing – Tarmac gets hot in afternoon; air rises so may need to come in at a lower angle. At evening when cooler, approach angle will change
[Reinforces the nature of a CAS where agents like weather can impact  solutions and create emergent, unexpected consequences]
SD concern: Normalization of deviance – continual squeezing of the boundaries and gradual erosion of safety margins
They’re getting away with it but eventually there will be a fatal crash
[reminds me of the frog that’s content to sit in the pot of water as the temperature is slowly increased. The frog doesn’t realize it’s slowly cooking to death until it’s too late]
[Discussed in SD’s Drift into Failure book and http://gswong.com/?p=754]
Back to the historical timeline…

1980s Systems Thinking

James Reason’s Swiss Cheese Model undermines our safety efforts
        Put in layers of defence which reinforces the 1940s thinking
        Smarter managers to protect the dumb workers
        Cause and effect linear model of safety
Example: 2003 Columbia space shuttle re-entry
        Normal work was done, not people screwing up (foam maintenance)
        There were no holes according to the Swiss Cheese Model
        Emergence: Piece of insulation foam broke off damaging the wing
Example: 1988 Piper Alpha oil rig
        Prior to accident, recognized as the most outstanding safe and productive oil rig
        Explosion due to leaking gas killing 167
        “I knew everything was right because I never got a report anything was wrong”
       Looking for the holes in the Swiss Cheese Model again
       Delusion of being safe due to accident-free record

Many people carry an idealistic image of safety: a world without harm, pain, suffering
Setting a Zero Harm goal is counter-productive as it suppresses reporting and incents manipulation of the numbers to look good

Abraham Wald example
Question: Where should we put the armour on a WWII bomber?
Wrong analysis: Let’s focus on the holes and put armour there to cover them up
Right analysis: Since the plane made it back, there’s no need for armour on the holes!
Safety implication: Holes represent near-miss incidents (bullets that fortunately didn’t down the plane). We shouldn’t be covering the holes but learning from them
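[My own toy version of Wald’s logic, with made-up numbers: hits land evenly across the aircraft, but hits to the critical sections rarely come home to be counted, so the holes on returning planes cluster in the non-critical sections.]

```python
import random
from collections import Counter

# Made-up numbers: hits land uniformly over four sections, but hits to
# the engine or cockpit usually bring the bomber down.
random.seed(7)
sections = ["engine", "cockpit", "fuselage", "wings"]
loss_prob = {"engine": 0.8, "cockpit": 0.7, "fuselage": 0.1, "wings": 0.1}

observed_holes = Counter()            # holes counted on planes that returned
for _ in range(10_000):               # each bomber takes three hits
    hits = [random.choice(sections) for _ in range(3)]
    if all(random.random() > loss_prob[s] for s in hits):
        observed_holes.update(hits)   # only survivors get inspected

print(observed_holes)
# Survivors show few engine/cockpit holes -- not because those sections
# are rarely hit, but because planes hit there rarely return. Armour the
# places where the survivors show few holes.
```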

Safety management system (SMS)
Don’t rest on your laurels thinking you finally figured it out with a comprehensive SMS
Australian tunnelling example:
Young guy dies working near an airport
There were previous incidents with the contractor but no connection was made
Was doing normal work, finishing the design, but was decapitated
An SMS will never pick this up

Don’t be led astray by the Decoy phenomenon
Only look at what we can count in normal work and ignore other signals
Example: Heinrich triangle – if we place our attention on the little incidents, then we will avoid the big ones (LTA, fatality) [now viewed as a myth like Accident Prone theory]
Some accidents are unavoidable  – Barry Turner 1998 [Man-made Disasters]
Example: Lexington accident [2006 Comair Flight 5191] when both technology and organization failed

Complexity has created huge, intractable problems
In a world of complexity, we can kill people without precursory events
[If we stay with the Swiss Cheese Model idea, then Complexity would see the holes on a layer dynamically moving, appearing, disappearing and layers spinning randomly and melting together to form new holes that were unknowable and unimaginable]

2014

Safety has become a bureaucratic accountability rather than an ethical responsibility
Amount of fraud is mounting as we continue measuring and rewarding the absence of negative incidents
Example: workers killed onsite are flown back home in a private jet to cover up and hide accidents
If we can be innovative and creative to hide injuries and fatalities, why can’t we use novel ways to think about safety differently?
Sense of injustice on the head of the little guy

Advances in Safety by Design
“You’re not lifting properly” compared to “the job isn’t designed properly”
An accident is a free lesson, learning opportunity, not a HR performance problem
Singapore example: a green city which, to grow, must go vertically up. Plants grow on all floors of a tall building. How to maintain them?
One approach is to punish the worker if accident occurs
Safety by Design solution is to design wall panels that rotate to maintain plants; no fall equipment needed
You can swat the mosquito but better to drain the swamp

Why can’t we solve today’s problems the same way we solved them back in the early 1900s?

What was valued in the Victorian Era

  1. People are a problem to control
  2. We control through intervention at the level of their behaviour
  3. We define safety as an absence of the Negative

Complexity requires a shift in  what we value today

  1. People are a solution, a resource
  2. Intervene in the context and condition of their work
  3. Instead of measuring and counting negative events, think in terms of the presence of positive things – opportunities, new discoveries, challenges of old ideas

What are the deliverables we should aim for today?

Stop doing work inspections that treat workers like children
It’s arrogant believing that an inspector knows better
Better onsite visit: Tell me about your work. What’s dodgy about your job?
Intervene in the job, not the individual’s behaviour.
Collect authentic stories.
[reinforces the practice of Narrative research http://gswong.com/?page_id=319]

Regulators need to shift their deliverables from engaging reactively (getting involved after the accident has occurred), looking for root causes, and formulating policy constraints
Causes are not things found objectively; causes are constructed by the human mind [and therefore subject to cognitive bias]
Regulators should be proactively co-evolving the system [CAS]
Stop producing accident investigation reports closing with useless recommendations to coach and gain commitment
Reference SD’s book: Field Guide to Investigating accidents – what you look for you will find

Question: Where do we place armour on a WWII bomber if we don’t patch the holes?
Answer: where we can build resilience by enabling the plane to take a few hits and still make it back home
[relates to the perspective of resilience in terms of the Cynefin Framework http://gswong.com/?page_id=21]

Resilience Engineering deliverables

  1. Do we keep risk awareness alive? Debrief and more debrief on the mental model? Even if things seem to be under control? Who leads the debriefing? Did the supervisor or foreman do a recon before the job starts to lessen surprise? [assessing the situation in the Cynefin Framework Disorder domain]
  2. Count the amount of rework done – can be perceived as a leading indicator although it really lags since initial work had been performed
  3. Create ways for bad news to be communicated without penalty. Stat: 83% of plane accidents occur when pilots are flying and 17% when co-pilots are. Institute the courage to speak up and say no. Stop bullying that maintains silence. It is a measure of Trust and empowers our people. Develop other ways such as role-playing simulations and rotation of managers to identify normalization of deviance (“We may do that here but we don’t do that over there”)
  4. Count the number of fresh perspectives and opinions that are allowed to be aired. Count the number of so-called best practice rules that are intelligently challenged. [purpose of gathering stories in a Human Sensor Network http://gswong.com/?page_id=19]
  5. Count the number or % of time spent on human-to-human relationships – not formal inspections but honest and open conversations that are free of organizational hierarchy.

Paradigm Shift:

Spend less time and effort on things that go wrong [Safety-I]
Invest more effort on things that go right which is most of the time [Safety-II]

Final message:

Don’t do safety to satisfy Bureaucratic accountability
Do safety for Ethical responsibility reasons

There were over 100 in attendance, so theoretically there are over 100 stories that could be told about the day. Some will be similar to mine, and my mind is open to accepting that some will be quite different (what the heck was Gary smoking?). But as we know, the key to understanding complexity is Diversity – the more stories we seek and allow to be heard, the better representation of the real world we have.

Safety-I + Safety-II

At a conference hosted on July 03, Dave Snowden and Erik Hollnagel shared their thoughts about safety. Dave’s retrospective of their meeting is captured in his blog posting. Over the next few blogs I’ll be adding my reflections as a co-developer of Cognitive-Edge’s Creating and Leading a Resilient Safety Culture course.

Erik introduced Safety-II to the audience, a concept based on an understanding of what work actually is, rather than what it is imagined to be. It involves placing more focus on the everyday events when things go right rather than on errors, incidents, accidents when things go wrong. Today’s dominating safety paradigm is based on the “Theory of Error”. While Safety-I thinking has advanced safety tremendously, its effectiveness is waning and is now on the downside of the S-curve. Erik’s message is that we need to escape and move to a different view based on the “Theory of Action”.

Erik isn’t alone. Sidney Dekker’s latest presentation on the history of safety reinforces how little safety thinking has changed and how we are plateauing. Current programs such as Hearts & Minds continue to assume people have physical, mental, and moral shortcomings as was done way back in the early 1900s.

Dave spoke about Resilience and why it is critical, as it’s in the outliers where you find threat and opportunity. In our CE safety course, we refer to the Safety-I events that help prevent things from going wrong as Robustness. This isn’t an Either/Or situation but a Both/And. You need both Robustness + Resilience.

As a young electrical utility engineer, the creator of work-as-imagined, I really wanted feedback but struggled to obtain it. It wasn’t until I developed a rapport with the workers that I was able to close the feedback loop and become a better designer. Looking back, I realize how fortunate I was since the crews were in proximity and exchanges were eye-to-eye.

During these debriefs I probably learned more from the “work-as-done” stories. I was told changes were necessary due to something that I had initially missed or overlooked. But more often it was due to an unforeseen situation in the field such as a sudden shift in weather or unexpected interference from other workers at the job site. Crews would make multiple small adjustments to accommodate varying conditions without fuss, bother, and okay, the occasional swear word.

I didn’t know it then but I know now: these were adjustments one learns to anticipate in a complex adaptive system. It was also experiencing Safety-II and Resilience in action in the form of narratives (aka stories).

Apple buying Beats might be a safe-to-fail experiment

The music industry is a complex adaptive system (CAS). The industry is full of autonomous agents who have good and bad relationships with each other. Behaviours and reactive consequences can build on each other. Media writers and industry analysts are also agents, easily attracted to big events. Their comments and opinions add to the pile and fuel momentum. However, the momentum is nonlinear. Interest in the topic will eventually fall off as pundits tire and move on, or a feverish pitch continues. Alternatively, a CAS phenomenon called the tipping point occurs. The music industry then changes. It might be small or a huge paradigm shift. It can’t be predicted; it will just emerge. In complexity jargon, the system doesn’t evolve but co-evolves. It’s asymmetrical – in other words, there is no reset or UNDO button to go back prior to the event.

While I might have an opinion about Apple buying Beats, I’m more interested in observing music industry behaviour. Here’s one perspective. I’ll use complexity language and apply the Cynefin Framework.

1. Apple is applying Abductive thinking and playing a hunch.

“Let’s buy Beats because the deal might open up some cool serendipitous opportunities. We can also generate some free publicity and let others promote us, and have fun keeping people guessing.  Yeh, it may be a downer if they write we’re nuts. But on the upside they are helping us by driving the competition crazy.”

2. Apple is probing the music industry by conducting a safe-to-fail experiment.

“It’s only $3.2B so we can use some loose change in our pockets. Beats is pulling in $1B annual revenue so really it’s no big risk.”

3. Apple will monitor agent behaviour and observe what emerges.

“Let’s see what the media guys say.”
“Let’s read about researchers guessing what we’re doing.”
“Let’s watch the business analysts tear their hair out trying to figure out a business case with a positive NPV. Hah! If they only knew a business case is folly in the Complex domain since predictability is impossible. That’s why we’re playing a hunch, which may or may not be another game changer for us.”

4. If the Apple/Beats deal starts going sour, dampen or shut down the experiment.

“Let’s have our people on alert to detect unintended negative consequences. We can dampen the impact by introducing new information and watch the response. If we feel it’s not worth saving, we’ll cut our losses. The benefits gained will be what we learn from the experiment.”

5. If the Apple/Beats deal takes off, accelerate and search for new behaviour patterns to exploit.

“The key agents in the CAS to watch are the consumers. Observing what they buy is easy. What’s more important is monitoring what they don’t buy. We want to discover where they are heading and what the strange attractor is. It might be how consumers like to stream music, how they like to listen to music (why only ears?), or simply that cool headphones are fashion statements.”

6. Build product/service solutions that  exploit this new pattern opportunity.

“Once we discover and understand the new consumer want, be prepared to move quickly. Let’s ensure our iTunes Radio people are in the loop as well as the AppleTV and iWatch gangs. Marketing should be ready to use the Freemium business model. We’ll offer the new music service for free to create barriers to entry that block competitors who can’t afford to play the new game. It will be similar to the free medical/safety alert service we’ll offer with the iWatch. Free for Basic and then hook ’em with the gotta-have Premium.”

7. Move from the Complex domain to the Complicated Domain to establish order and stability.

“As soon as we’re pretty certain our Betas are viable, we’ll put our engineering  and marketing teams on it to release Version 1. We’ll also start thinking about Version 2. As before, we’ll dispense with ineffective external consumer focus groups. We’ll give every employee the product/service and gather narrative (i.e., stories) about their experiences. After all, employees are consumers and if it’s not great for us, then it won’t be great for the public.

Besides learning from ourselves, let’s use our Human Sensor network to cast  a wide net on emerging new technologies and ideas. Who knows, we might find another Beats out there we can buy to get Version 2 earlier to market.”

Fantasy? Fiction? The outcomes may be guesses but the Probe, Sense, Respond process in the Cynefin Complex Domain isn’t.

 

Asiana Flight 214 followup

The following excerpts are from Wikipedia regarding Flight 214. What they do is reinforce the paradigm that the Aviation industry is a complex adaptive system (CAS) with many agents like the NTSB and ALPA who interact with each other. The imposed fine of $500K reconfirms the need to Act when in the Chaotic domain but more importantly, Sense and Respond to the needs of all people impacted by communicating your actions clearly and quickly.

“Shortly after the accident, the National Transportation Safety Board (NTSB) used Twitter and YouTube to inform the public about the investigation and quickly publish quotes from press conferences. NTSB first tweeted about Asiana 214 less than one hour after the crash. One hour after that, the NTSB announced via Twitter that officials would hold a press conference at Reagan Airport Hangar 6 before departing for San Francisco. Less than 12 hours after the crash, the NTSB released a photo showing investigators conducting their first site assessment.

Air Line Pilots Association

On July 9, 2013, the Air Line Pilots Association (ALPA) criticized the NTSB for releasing “incomplete, out-of-context information” that gave the impression that pilot error was entirely to blame.

NTSB Chairman Hersman responded: “The information we’re providing is consistent with our procedures and processes … One of the hallmarks of the NTSB is our transparency.  We work for the traveling public. There are a lot of organizations and groups that have advocates. We are the advocate for the traveling public. We believe it’s important to show our work and tell people what we are doing.”  Answering ALPA’s criticism, NTSB spokeswoman Kelly Nantel also said the agency routinely provided factual updates during investigations. “For the public to have confidence in the investigative process, transparency and accuracy are critical,” Nantel said.

On July 11, 2013, in a follow-up press release without criticizing the NTSB, ALPA gave a general warning against speculation.

Fines

On February 25, 2014 the U.S. Department of Transportation (DOT) fined Asiana Airlines US$500,000 for failing to keep victims and family of victims updated on the crash.”

 

When a disaster happens, look for the positive

In last month’s blog I discussed Fast Recovery and Swarming as 2 strategies to exit the Chaotic Domain. These are appropriate when looking for a “fast answer”. A 3rd strategy is asking a “slow question.”

While the process flow through the Cynefin Framework is similar to Swarming (Strategy B), the key difference is not looking for a quick solution but attempting to understand the behaviour of agents (humans, machines, events, ideas). The focus is on identifying something positive emerging from the disaster, a serendipitous opportunity worth exploiting.

By conducting safe-to-fail experiments, we can probe the system, monitor agent behaviour, and discover emerging patterns that may lead to improvements in culture, system, process, structure.

Occasions can arise when abductive thinking could yield a positive result. In this type of reasoning, we begin with some well-known facts that are already accepted and then work towards an explanation. The vernacular would be playing a hunch.

Snowstorm Repairs

In the electric utility business when the “lights go out”, a trouble crew is mobilized and the emergency restoration process begins. Smart crews are also on the lookout for serendipitous opportunities. One case involved a winter windstorm causing a tree branch to fall across live wires. Upon restoration, the crew leader took it upon himself to contact customers affected by the outage to discuss removal of other potentially hazardous branches. The customers were very willing and approved the trimming. The serendipity arose because these very same customers had vehemently resisted having their trees trimmed in the Fall as part of the routine vegetation maintenance program. The perception held then was that the trees were in full bloom and aesthetically pleasing; the clearance issues were of no concern. Being out of power for a period of time in the cold winter can shift paradigms.

When a disaster happens, will it be fast recovery or swarming?

Last month’s blog was about Act in the Cynefin Framework’s Chaotic domain. Be aware you cannot remain in the Chaotic domain as long as you want. If you are not proactively trying to get out of it, somebody or something else will be taking action, as Asiana Airlines learned.

How do you decide to Sense and Respond? Here are 2 proactive strategies:

Strategy A is a fast recovery back to the Ordered side. It assumes you know what went wrong and have a solution documented in a disaster plan ready to be executed.

If it’s not clearly understood what caused the problem and/or you don’t have a ready-made solution in place, then Strategy B is preferred. This is a “swarming” strategy perfected by Mother Nature’s little creatures, in particular, ants.

If the path to a food supply is unexpectedly blocked, ants don’t stop working and convene a meeting like humans do. There are no boss ants that command and control. Individual ants are empowered to immediately start probing to find a new path to the food target. Not just one ant, but many participate. Once a new path is found, communication is quickly passed along and a new route is established.
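A small sketch of the contrast (purely my own illustration, nothing from the Cynefin material): Strategy A replays one documented route, while swarming launches many independent probes and amplifies whichever one gets through.

```python
import random

# Invented sketch: a recovery "route" works unless it relies on the
# assumption that just got blocked.
random.seed(3)
BLOCKED = 4   # the unexpected blockage

def route_works(route):
    return route != BLOCKED

def strategy_a(documented_plan):
    # Fast recovery: execute the single pre-planned route.
    return "recovered" if route_works(documented_plan) else "stuck"

def strategy_b(n_probes=20):
    # Swarming: many independent probes; the first success is shared.
    for probe in (random.randrange(10) for _ in range(n_probes)):
        if route_works(probe):
            return f"new route {probe} found and shared"
    return "still searching"

print(strategy_a(documented_plan=BLOCKED))  # the plan assumed the blocked route: stuck
print(strategy_b())                         # almost always finds a way through
```

With many independent probes, the odds that at least one gets through rise quickly – no boss ant required.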

This is Resilience – the ability to bounce back after taking a hit. 

When a disaster happens, how fast do you act?

In the Cynefin framework, we place unexpected negative events into the Chaotic domain. The solution methodology is to Act-Sense-Respond. When a disaster produces personal injuries and fatalities, Act is about initially rendering the situation as safe as possible and stabilizing conditions to prevent additional life-threatening events from occurring.

Whenever a disaster happens, we go into “damage control” mode. We think we’re in control because we determine what information will be released, when, and by whom. Distributing information to the right channels is a key action under Act. We try our best to limit the damage not only to our people and equipment but to our brand, reputation, and credibility. In other terms, we attempt to protect our level of trust with customers/clients, the media, and the general public.

In the latter stages of the 20th century, breakthroughs in information technology meant we had to learn how to quickly communicate because news traveled really fast. In today’s 21st century, news can spread even faster, wider, and cheaper by anyone who can tweet, upload a Facebook or Google+ photo, blog, etc. The damage control window has literally shrunk from hours to minutes to seconds.

This month we sadly experienced a tragedy at SFO when Asiana Airlines flight 214 crashed. I recently reviewed slides produced by SimpliFlying, an aviation consultancy focused on crisis management. Their 2013 July 06 timeline of events is mind-boggling:

11:27am: Plane makes impact at SFO
11:28am: First photo from a Google employee boarding another flight hits Twitter (within 30 secs!)
11:30am: Emergency slides deployed
11:45am: First photo from a passenger posted on Path, Facebook and Twitter
11:56am: Norwegian journalist asks for permission to use photo from first posters. Tons of other requests follow
1:20pm: Boeing issues statement via Twitter
2:04pm: SFO Fire Department speaks to the press
3:00pm: NTSB holds press conference, and keeps updating Twitter with photos
3:39pm: Asiana Airlines statement released
3:40pm: White House releases statement
8:43pm: First Asiana press release (6:43am Korea time)

Although Asiana Airlines’ first Facebook update was welcomed, they did not provide regular updates and didn’t bother replying to tweets. The bottom line was their stock price and brand took huge hits. Essentially they were ill-prepared to Act properly.

“In the age of the connected traveller, airlines do not have 20 minutes, but rather 20 seconds to respond to a crisis situation. Asiana Airlines clearly was not ready for this situation that ensued online. But each airline and airport needs to build social media into its standard operating procedures for crises management.”

If you encounter a disaster, how fast are you able to act? Does your emergency restoration plan include social media channels? Do you need to rewrite your Business Disaster Recovery SOPs?

If you choose to revisit or rewrite, what paradigm will you be in? If it’s Systems Thinking, your view is to control information: have little regard for what others say and only release information when you are ready. Like Asiana Airlines. If you’re in the Complexity & Sense-Making paradigm, you realize you cannot control, only influence. You join and participate in the connected network that’s already fast at work commenting on your disaster.

That’s Act. How you decide to Sense and Respond will be subsequently covered.