Tag Archives: Resilience

Early detection, fast recovery, early exploitation

Black Elephants in our Safety Systems

COVID-19 is a black elephant. A black elephant is a cross between a black swan and the proverbial elephant in the room. The black elephant is a problem that is actually visible to everyone, but no one wants to deal with it, and so they pretend it is not there. When it blows up as a problem, we all feign surprise and shock, behaving as if it were a black swan [1].

Nassim Nicholas Taleb popularized the black swan metaphor to describe an event that is rare, unexpected, and has a large negative, game-changing impact. COVID-19 is an infectious disease that waited for the right conditions to emerge, like an accident just waiting to happen. It reminds me of Todd Conklin’s statement: “Workers don’t cause failures. Workers trigger latent conditions that lie dormant in organizations waiting for this specific moment in time.”

Taleb also noted that a black swan event is often inappropriately rationalized after the fact with the benefit of hindsight. This should ring a bell for those with accident investigation experience. It’s the counterfactual argument when work-as-imagined is compared to work-as-done. What’s ignored are the normal variability adjustments the victim had to make due to unexpected changes in the environment.

The emergence of COVID-19 is not a black swan. We shouldn’t have been surprised but we were. There have been 11 pandemics from the Plague of Justinian (541 – 750 AD) to Ebola (2014-2016).[2] In 2015, Bill Gates warned all nations that we have invested very little in a system to stop an epidemic.[3] In the October 2019 Event 201 global pandemic exercise, a call to action was outlined in seven recommendations.[4] Some countries have acted while others have chosen to ignore the peril for political and economic reasons deemed a higher priority.

Those that acted installed seemingly robust fail-safe disaster response systems. Scarred by the SARS epidemic that erupted in 2002, China thought it had put in place an airtight response process free from political meddling. However, local health bureaucrats in Wuhan, not wanting to raise an alarm and cause embarrassment, suppressed automatic reporting. This fear kept Beijing in the dark and delayed the response. In other words, their fail-safe system failed.[5]

The real surprise is finding out in a painful way how intricately connected we are globally. Our ground, air, and water transportation systems made it easy for COVID-19 to spread exponentially. “Going viral” has sickened and killed thousands, crippled economies, and plunged societal life into a fearful limbo with no easily discernible end in sight. Tightly coupled supply chains we take for granted are disrupted. Westerners are feeling what “made in China” means when local store shelves remain empty. Everyone is having a firsthand, excruciating experience of surviving in a complex adaptive system (CAS).

Every organization is a CAS. Every industry is a CAS. So is every state, country, nation. Civilization is a CAS. We are many complex adaptive systems entangled to form one mega CAS called planet Earth. That idea was reinforced by Blue Marble, the image of Earth taken from Apollo 17. Boomers felt it when Disney showcased It’s a Small World at the 1964 World’s Fair. (Now that we’ve mentioned it, is the tune streaming in your head too? Sorry about that.)

In the spirit of Safety Differently, let’s ask our 3 fundamental questions: Why? What? How? and pose different, non-traditional responses.

WHY… will we face more Black Elephants?

The emphasis in a CAS is on relationships between agents. Besides humans, agents include machines, events, and ideas. Relationships are typically non-linear, not if-then causal, and are strengthened or weakened by fast feedback interactions. The butterfly effect describes how a small initial change can yield a huge impact once a tipping point is reached. Non-linear relationships can be exponential, like the spread of COVID-19. Many relationships in the real world follow a Pareto distribution, a power law that looks linear on a logarithmic scale. Catastrophes like Black Elephants are rare in frequency but huge in severity. These are also called outliers, as they lie outside the realm of regular expectations of the Gaussian world. So it’s not a matter of if they will happen but when.
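
As a rough numerical illustration (mine, not the author’s; all parameters are arbitrary), sampling a Gaussian and a Pareto distribution side by side shows why black-elephant-sized outliers belong to the power-law world rather than the Gaussian one:

```python
# A rough illustration (not from the original post; parameters are arbitrary):
# extremes in a Gaussian sample stay near the mean, while a Pareto (power-law)
# sample occasionally produces rare but enormous outliers.
import random

random.seed(1)
n = 100_000
gaussian = [random.gauss(mu=100, sigma=15) for _ in range(n)]
pareto = [100 * random.paretovariate(alpha=1.5) for _ in range(n)]

for name, sample in (("Gaussian", gaussian), ("Pareto", pareto)):
    print(f"{name}: mean ~ {sum(sample) / n:.0f}, max ~ {max(sample):.0f}")
# The Gaussian max lands a few standard deviations above the mean;
# the Pareto max can be hundreds of times its mean.
```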

The level of complexity increases exponentially every time a new relationship is formed. For humans it could be the simple act of friending on Facebook or accepting a LinkedIn invitation. Or a person you don’t know choosing to follow you on Twitter. Annoying outcomes from new connections are more spam emails and unsolicited ads. More disconcerting are computer-programmed machines interacting with other smart components, sometimes in hidden ways. When algorithms collide, no one really knows what will happen. Flash market crashes. Non-ethical AI. Boeing 737 Max 8. Or, in one hypothesis, the cutting down of rain forests, which allowed once-contained diseases like COVID-19 to infect wild animals. Hmm, workers trigger latent conditions that lie dormant…
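
The arithmetic behind that claim can be sketched quickly (my own back-of-the-envelope numbers, not the author’s):

```python
# A back-of-the-envelope sketch (not from the original post): as agents join
# a network, pairwise links grow quadratically and the number of possible
# sub-groupings of agents grows exponentially -- far too many to analyze.
from math import comb

for n in (10, 50, 100):
    links = comb(n, 2)   # possible one-to-one relationships among n agents
    print(f"{n:>3} agents: {links:>5,} possible links, 2^{n} possible groupings")
```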

Realistically organizations are unable to develop emergency plans for every disaster identified. Even if they had unlimited time and money, there’s no guarantee that the recovery solutions will be successful. And by definition, we can’t plan for unknowables and unimaginables.

WHAT…can we do Today?

The starting point is understanding the present situation of the CAS. The Cynefin Framework has been widely used around the world in contexts as diverse as the boardrooms of international fashion houses, militaries, NGOs, and SWAT teams on city streets. For a brief explanation of the sense-making framework, check out Jennifer Garvey Berger’s YouTube video.

The above graphic maps the type of safety decisions made and actions executed in each Cynefin domain. No domain is better than any other. The body of knowledge that Safety-I has provided sits clearly in the Obvious and Complicated domains. Much of the Safety-II advancement resides in the Complicated domain as experts wrestle with new processes and tools. Whether these can be made easy for anyone to use, and thus moved into the Obvious domain, remains to be seen. A major accomplishment would be shifting the front-line worker mindset to include what goes right when planning a job.

Now let’s apply sense-making in our battle with COVID-19. Be aware this is a dynamic exercise with relationships, events, and ideas constantly changing or emerging. Also note that the Cynefin Framework is a work-in-progress. The Obvious domain is being renamed as the Clear domain.

Self-isolation is in the Clear domain; you haven’t been tested, so avoid others as a precautionary measure. Self-quarantine means you have tested positive; your act is to monitor your condition and respond as you get better or worse.

Conspiracy theorists are in the far corner of the Clear domain. They believe their ordered-life mindset has been deliberately disrupted. Strongly held beliefs range from political party subterfuge, to a willingness to risk death to save the economy, to blaming 5G.

At the time of this writing, experts have not agreed whether the mandatory wearing of masks will help or hinder the COVID-19 battle. Two resistance movements are shown. Both reside in the Cynefin Complex domain near the Chaotic border. Not all Coronavirus challenges hoping to go viral have been positive. Licking toilet seats may have garnered lots of social media attention for the challenge creator, but it plunged one follower into the Chaotic domain when he tested positive.[6] Some who attended parties with a feeling of invincibility have also fallen into the Chaotic domain.[7]

The Disorder domain is associated with confusion and not knowing what to do. Many myths, hoaxes, and fake news stories include expert quotes and references. When eventually exposed as fiction and not fact, they can raise the level of personal frustration and anxiety.

One fact is that your overarching safety strategy hasn’t changed: strengthen Robustness + build Resilience. As this article is about surviving change, let’s focus our attention on 3 capabilities of a resilient organization.

With the COVID-19 battle still raging, the chances of making a fast recovery and returning to the original operating point [A] are slim to none.

If we know that we have black elephants, then we should have an early detection system in place [C]. Being caught with our pants down simply means the messenger didn’t have enough trust and respect, or the organizational bureaucracy was too dominant and overbearing. Path [B] is shaping the “new normalcy” for the organization. This entails asking key questions and exploring options. Change Management specialist Peter Hadas has posed a set of questions:

  • If we suddenly had to double our capacity, could we do it?
  • Are our systems connected well enough to suddenly handle a spike in capacity?
  • If I had to scale back to just the 20% of my employees I would absolutely need to rebuild my company from scratch, who would that be?
  • Of the remaining 80% of your staff, who has mission-critical information in their heads that is not written down anywhere?

In his blog Peter cites case studies where a downturn was perceived not as a calamity but as an opportunity. In terms of safety, let’s ask:

  • What black elephants will be unleashed when our organization changes to the new normalcy?
  • What existing latent conditions will enable safety or danger to emerge due to a new tipping point in the new normalcy?
  • When we return to work or if we need new recruits, what will be different in our safety induction and orientation programs in the new normalcy?
  • What human-imposed constraints such as safety policies, standards, rules, procedures need to adapt in the new normalcy?

HOW…can we operationalize the What?

Top Management is under fire to demonstrate leadership and take charge in a time of crisis. First step: Stop the bleeding to get out of the Cynefin Chaotic domain. Now what? Craft an idealistic vision of the new normalcy? Direct subordinates to “make it so”? Well, this would be a Complicated domain approach. Since the future is uncertain and unpredictable, developing the new normalcy happens in the Cynefin Complex domain. Instead we manage the evolutionary potential of the Present and shape the new normalcy on a different scaffold. Classical economics? Evonomics? Doughnut Economics? How about Narrative economics?

One of the principles of Safety Differently is: Safety is not defined by the absence of accidents, but by the presence of capacity. Adaptive Safety means building adaptive capacity. When working in the Complex domain, one capacity is possessing useful heuristics to cope with uncertainty. Heuristics are simple, efficient rules we use to form judgments and make decisions. These are mental shortcuts developed from past successful experiences. The upside is they can be hard measures of performance if used correctly (i.e., not abstract and no gamification). The downside is that heuristics can blind us or may not work in novel situations. So don’t get locked into a trusty but rusty heuristic, and be willing to adapt.

To operationalize the What, let’s apply 3 simple rules drilled into US soldiers to improve the chances of survival in war: Keep moving, stay in communication, and head to higher ground.

Keep moving

Okay, we’ve stopped the bleeding. We, however, can’t sit still hoping to wait until things settle out. Nor should we fall into analysis paralysis looking at a multitude of options. We don’t want to be an easy target or prey for a competitive predator. Let’s speedily move into the Complex domain and head towards this new normalcy.

Before we move, we need a couple of tools – a compass and a map to plot our path [B]. As shown by Palladio’s Amber Compass, a complexity thinking compass is different. In our backpack will be tools to probe the system (mine detector?), launch experiments (flares?), and that map, not only to plot our direction but to monitor our progress.

In the military world, each soldier on the battlefield is equipped with a GPS device. This capacity enables the Command centre to monitor the movement of troops in real-time on a visual screen. How might we build a similar map of movement towards a new normalcy?

Stay in Communication

In the battle against COVID-19, to stop the bleeding numerous organizations immediately enacted an age-old heuristic: terminate, lay off, stand down. This action is akin to removing soldiers from the battlefield for financial reasons. The policy is based on the traditional reductionist paradigm of treating people as replaceable parts of a mechanistic system. It reinforces the Classical Management theory of treating humans as expenses, not assets. (Suggestion: Organizations that state “Our people are our greatest resource” should update it to “our greatest expendable resource.”)

In novel times like today, perhaps it calls for a different heuristic. Denmark, leading as a nation-level CAS, decided to “Freeze the Economy.”[8] The Danish government told private companies hit by the effects of the pandemic that it would pay 75% of their employees’ salaries to avoid mass layoffs. The idea is to enable companies to preserve their relationships with their workers. It’s going to be harder to have a strong recovery if companies have to spend time hiring back workers that have been fired. Other countries like Canada [9] and the US [10] are following Denmark’s lead and launching modified experiments. From a complexity perspective, this is a holistic paradigm where the whole is greater than the sum of its parts.

No matter what employee policy has been invoked, cutting payroll costs does not necessarily mean disengaging. Along the same lines, practicing social distancing does not mean social disconnecting. One can stay in communication. But let’s communicate differently. Ron Gantt wrote that it’s time for an anthropological approach. We agree. Let’s become physically-distanced storytellers and ethnographers.

Cognitive Edge is offering the adaptive capacity of SenseMaker® for COVID-19 to collect real-life stories from employees working remotely or temporarily removed from the battlefield. It’s an opportunity to show empathy, sense wellbeing, and build early detection capability [C]. Most of all, we can use stories to generate our map of movement towards a new normalcy.

Head to higher ground

This is a story-generated 2D contour map from Safety Pulse. Each dot is a story that can be read by clicking on it. The red X marks a large cluster of undesirable stories – rules are being bent and little work is getting done. In our military analogy, we have soldiers on a hill but it’s the wrong high ground.

For illustrative purposes, the higher ground or new normalcy is the green checkmark where quality work is being completed on-time, on-budget, and within the safety rules. Thanks to the map, we now have our compass heading. The question is how do we make it easy to head to the higher ground? In other words, how might we get fewer stories at red X and more stories like green checkmark?
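
To make “fewer stories at red X, more at green checkmark” measurable, here is a minimal sketch (my own invention, not Safety Pulse’s method; the coordinates, landmarks, and radius are all hypothetical) that counts story dots near each landmark on the 2D map:

```python
# A hypothetical sketch (not from the original post): classify each story dot
# by the landmark it falls nearest to, then count. Landmark coordinates and
# the cluster radius are invented for illustration.
from math import dist

RED_X = (0.2, 0.8)        # cluster of undesirable stories
GREEN_CHECK = (0.8, 0.2)  # the "higher ground" we want to head towards

def classify(story_xy, radius=0.25):
    """Label a story dot by the landmark it sits within, if any."""
    if dist(story_xy, RED_X) <= radius:
        return "red_x"
    if dist(story_xy, GREEN_CHECK) <= radius:
        return "green_check"
    return "elsewhere"

stories = [(0.25, 0.75), (0.70, 0.30), (0.50, 0.50), (0.82, 0.18)]
counts = {"red_x": 0, "green_check": 0, "elsewhere": 0}
for xy in stories:
    counts[classify(xy)] += 1
print(counts)  # progress = this tally shifting from red_x towards green_check
```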

The challenge is that the ground to be traversed is an entanglement of work-as-imagined safety policies, standards, rules, procedures and work-as-done practices. W-A-I is created by the Blunt end and W-A-D by the Sharp end of the safety spear. Another twisted layer is the formal vertical hierarchy and informal horizontal networks. Entanglement implies that any new-normalcy thinking ought to include everyone, and it shines a different light on Diversity.

Taking a top-down-only route to a new normalcy could cause inadvertent harm, since the top is often clueless about how work really gets done in the organization. Read Peter’s firsthand experience of a company letting go of people with mission-critical tacit knowledge. Similarly, a bottom-up-only route may fail to consider PESTLE (political, economic, social, technological, legal, environmental) tensions that impact the entire safety spear.

Shaping the new normalcy is not a change initiative with an authoritative few making a huge Ego bet. Think carefully about consultants recommending a huge disruptive transformation change. People are in a fragile state trying to personally survive COVID-19. Think the opposite. It’s Eco and empowering all in the CAS. Think not about doing it to people but doing it with people.

The Executive’s role is to mobilize diverse learning teams and act as central coordinator. These would not be large teams subjected to the Order system’s linear forming -> storming -> norming -> performing crapola. Teams would be 3 people (a trio) with diverse knowledge, empowered to make decisions within their coherent unit. A cardinal rule would be that a trio must inform Central what decisions it has made. Diversity here is not necessarily along the typical demographic lines of gender, age, or geographic location, but of body of knowledge and informal network relationships. For example, a trio could be comprised of a senior executive, a new employee fresh out of college, and a front-line tradesperson. Another could be an IT specialist, a mechanic, and a public relations coordinator.

Each trio would have the skillset to design safe-to-fail experiments. But before launching, they would engage the Unintended Consequences trio. Good candidates for the UC trio would be the CFO, a risk analyst, and the person with the reputation of being the noisiest complainer in the organization. Using a process like Ritual Dissent or Red Teaming, the UC trio has the task of pointing out the risks and any harm that a safe-to-fail experiment might unleash. Their role isn’t to kill the proposal but to improve it.

Central would have the capacity to navigate using a near real-time dashboard with maps generated by stories. All trios would have access to the maps to learn how their experiments are enabling desirable aspects of a new normalcy to emerge.

Informed by authentic voices (i.e., click on a map dot and read its story), Central would make the strategic choice of what will be in or out of the new normalcy. Trios, or more probably combinations of trios, would form innovation teams to implement it.

Innovation teams would carry on along resilience path [B], moving into the Complicated domain and applying project management practices. Key activities would be documenting, adapting constraints (i.e., policies, standards, rules, procedures), and training the workforce in the patterns of the new normalcy.

The continuous flow of stories is required for the dashboard and maps to maintain their near real-time value in managing the PM portfolio. Establishing a human sensor network would also fuel the Early Detection capability [C]. Imagine being able to respond to “I’ve got a bad feeling about this” attitudinal stories well before they turn into “We wouldn’t be in this mess if someone had listened to me.”

In the new normalcy, everything would be up for scrutiny by trios, including venerable best practices, sacred cows, and of course, black elephants. Small, simple changes, such as replacing weekly reports with stories entered whenever (24/7/365) and wherever (office/field/home), can go a long way.

Act now. Act quickly. Act differently.

References:

  1. The Black Elephant Challenge for Governments. Peter Ho, former head of civil service for the city of Singapore. 2017.
  2. Pandemics that Changed the Course of Human History. Business Insider. 2020-03-20.
  3. The next outbreak? We’re not ready. Bill Gates. TED Talk. 2015-04-03.
  4. Public-private cooperation for pandemic preparedness and response. Event 201 recommendations. 2019-10.
  5. China Created a Fail-Safe System to Track Contagions. It Failed. The New York Times. 2020-03-29.
  6. ‘Corona Challenge’: Man Tests Positive For COVID-19 Days After Licking Toilet Bowl. 2020-03-26.
  7. Florida college students test positive for coronavirus after going on spring break. CBS News. 2020-03-23.
  8. Denmark’s Idea Could Help the World Avoid a Great Depression. The Atlantic. 2020-03-21.
  9. Trudeau promises 75% wage subsidy for businesses hit by coronavirus pandemic. Global News. 2020-03-27.
  10. The government will now pay businesses to keep workers on payrolls (and hire back ones they laid off). Fast Company. 2020-04-02.

Evolution of Safety

Yesterday I was pleased to speak at the Canadian Society of Safety Engineering (CSSE) Fraser Valley branch dinner. I chose to change the title from the “Future of Safety” to the “Evolution of Safety.” Slides are available in the Downloads or click here. The key messages in the four takeaways are listed below.

1. Treat workers not as problems to be managed but solutions to be harnessed.

Many systems have been designed with the expectation that humans will perform perfectly, like machines. It’s a consequence of the Systems Thinking era based on an Engineering paradigm. Because humans are error-prone, we must be managed so that we don’t mess up the ideal flow of processes using technologies we are trained to operate.

Human & Organizational Performance (HOP) Principle #1 acknowledges that people are fallible. Even the best will make mistakes. Despite the perception that humans are the “weakest link in the chain”, harnessing our human intelligence will be critical for system resilience, the capacity to either detect or quickly recover from negative surprises.

As noted in the MIT Technology Review, “we’re seeing the rise of machines with agency, machines that are actors making decisions and taking actions autonomously…” That means things are going to get a lot more complex with machines driven by artificial intelligence algorithms. Smart devices behaving in isolation will create conflicting conditions that enable danger to emerge. Failure will occur when a tipping point is passed.

MIT Professor Nancy Leveson believes technology has advanced to such a point that the routine problem-solving methods engineers had long relied upon no longer suffice. As complexity increases within a system, linear root cause analysis approaches lose their effectiveness. Things can go catastrophically wrong even when every individual component is working precisely as its designers imagined. “It’s a matter of unsafe interactions among components,” she says. “We need stronger tools to keep up with the amount of complexity we want to build into our systems.” Leveson developed her insights into an approach called System-Theoretic Process Analysis (STPA), which has rapidly spread through private industries and the military. It would be prudent for Boeing to apply STPA in its 737 Max 8 investigation.

So why is it imperative that workers be seen as resourceful solutions?  Because complex systems will require controls that use the  immense power of the human brain to quickly recognize hazard patterns, make sense of bad situations created by ill-behaving machines, and  swiftly apply heuristics to prevent plunging into the Cynefin Chaotic domain.

2. When investigating, focus on the learning gap between normal deviation and the hazard line, and avoid the blaming counterfactual.

If you read or hear someone say:
“they shouldn’t have…”
“they could have…”
“they failed to…”
“if only they had…”
it’s a counterfactual. In safety, counterfactuals are huge distractions because they focus on what didn’t happen. As Todd Conklin explains, it’s the gap between the black line (work-as-imagined) and the blue line (work-as-done). The wavy blue line indicates that a worker must adapt performance in response to varying conditions. The changes hopefully enable safety to emerge so that the job can be successfully completed. In the Safety-II view, this is deemed normal deviation. Our attention should not be on “what if” but on “what did.”

The counterfactual provides an easy path for assigning blame. “If only Jose had done it this way, then the accident wouldn’t have happened.”  Note to safety professionals engaged in accident investigations: Don’t give decision makers bullets to blame but information to learn. The learning from failure lessons are in the gap between the blue line and the hazard line.

3. Be a storylistener and ask storytellers:
How can we get more safety stories like these, fewer stories like those?

I described the ability to generate 2D contour maps from safety stories told by the workforce. The WOW factor is that we can now visually see safety culture as an attitudinal map. We can plot a direction towards a safety vision and monitor our progress. Click here for more details.

Stories are powerful. Giving the worker a voice to be heard is an effective form of employee engagement. How safety professionals use the map to solve safety issues is another matter. Will it be Ego or Eco? It depends. Ego says I must find the answer. Eco says we can find the answer.

Ego thrives in hierarchy, an organizational  structure adopted from the Church and Military. It works in the Order system, the Obvious and Complicated domains of the Cynefin Framework. Just do it. Or get a bunch of experts together and direct them to come up with viable options. Then make your choice.

Safety culture resides in the Cynefin Complex domain. No one person is in charge. Culture emerges from the relationships and interactions between people, ideas, events, and as noted above, machines driven by AI algorithms. Eco thrives on diversity, collaboration, and co-evolution of the system.

An emerging role for safety professionals is helping Ego-driven decision makers understand they cannot control human behaviour in a complex adaptive system. What they control are the system constraints imposed as safety policies, standards, rules. They also set direction when expressing they want to hear “more safety stories like these, fewer stories like those.”

And lest we forget, it’s not all about the workers at the front line. Decision makers and safety professionals are also storytellers. What safety stories are you willing to share? Where would your stories appear as dots on the safety culture map?

Better to be a chef and not a recipe follower.

If Safety had a cookbook, it would be full of Safety Science recipes and an accumulation of hints and tips gathered over a century of practice. It would be a mix of still-useful recipes, questionable ones (pseudoscience), emerging ones, and recipes given myth status by Carsten Busch.

In the Cynefin Complex and Chaotic domains, there are no recipes to follow. So we rely on heuristics to make decisions. Some are intuitive and based on past successes – “It’s worked before so I’ll do it again.” Until they don’t, because conditions that existed in the past no longer hold true. More resilient heuristics are backed by natural science laws and principles, so they withstand the test of time.

By knowing  the art and principles of cooking, a chef accepts the challenge of ambiguity and can adapt to unanticipated conditions such as missing ingredients, wrong equipment, last-minute diet restrictions, and so on.

It seems logical that safety professionals would want to be chefs. That’s why I’m curious that in the study An ethnography of the safety professional’s dilemma: Safety work or the safety of work?, a highlight is “Safety professionals do not leverage safety science to inform their practice.”
Is it worth having a conversation about this, even collecting a few stories? Or are safety professionals too time-strapped doing safety work?

Danger was the safest thing in the world if you went about it right

I am now contributing safety thoughts and ideas on safetydifferently.com. Here is a reprint of my initial posting. If you wish to add a comment, I suggest you first read the other comments at safetydifferently.com and then include yours at the end to join the conversation.

Danger was the safest thing in the world if you went about it right

This seemingly paradoxical statement was penned by Annie Dillard. She isn’t a safety professional nor a line manager steeped in safety experiences. Annie is a writer who in her book The Writing Life became fascinated by a stunt pilot, Dave Rahm.

“The air show announcer hushed. He had been squawking all day, and now he quit. The crowd stilled. Even the children watched dumbstruck as the slow, black biplane buzzed its way around the air. Rahm made beauty with his whole body; it was pure pattern, and you could watch it happen. The plane moved every way a line can move, and it controlled three dimensions, so the line carved massive and subtle slits in the air like sculptures. The plane looped the loop, seeming to arch its back like a gymnast; it stalled, dropped, and spun out of it climbing; it spiraled and knifed west on one side’s wings and back east on another; it turned cartwheels, which must be physically impossible; it played with its own line like a cat with yarn.”

When Rahm wasn’t entertaining the audience on the ground, he was entertaining students as a geology professor at Western Washington State College. His fame for “doing it right” in aerobatics led King Hussein to recruit him to teach the art and science to the Royal Jordanian stunt flying team. While Rahm was performing a maneuver in Jordan, his plane plummeted to the ground and burst into flames. The royal family and Rahm’s wife and son were watching. Dave Rahm was instantly killed.

After years and years of doing it right, something went wrong for Dave Rahm. How could this have happened? How can danger be the safest thing? Let’s turn our attention towards Resilience Engineering and the concept of Emergent Systems. By viewing Safety as an emergent property of a complex adaptive system, Dillard’s statement begins to make sense.

Clearly a stunt pilot pushes the envelope by taking calculated risks. He gets the job done, which is to thrill the audience below. Rahm’s maneuver called “headache” was startling, as the plane stalled and spun towards earth seemingly out of control. He then adjusted his performance to varying conditions to bring the plane safely under control. He wasn’t preoccupied with what to avoid and what not to do. He knew in his mind what was the right thing to do.

We can apply Richard Cook’s modified Rasmussen diagram to characterize this deliberate moving of the operating point towards failure while taking action to pull back from the edge of failure. As the operating point moves closer to failure, conditions change, enabling danger as a system property to emerge. To Annie Dillard, this aggressive heading-in, pulling-back action was how danger was the safest thing in the world if you went about it right.

“Rahm did everything his plane could do: tailspins, four-point rolls, flat spins, figure 8’s, snap rolls, and hammerheads. He did pirouettes on the plane’s tail. The other pilots could do these stunts, too, skillfully, one at a time. But Rahm used the plane inexhaustibly, like a brush marking thin air.”

The job was to thrill people with acts that appeared dangerous. And show after show Dave Rahm pleased the crowd and got the job done. However, on his fatal ride, Rahm and his plane somehow reached a non-linear complexity phenomenon called the tipping point, a point of no return, and sadly paid the final price.

Have you encountered workers who behave like stunt pilots? A stunt pilot will take risks and fly as close to the edge as possible. If you were responsible for their safety, or a consultant asked to make recommendations, what would you do? Would you issue a “cease and desist” safety bulletin? Add a new “safety first…” rule to remove any glimmers of workplace creativity? Order more compliance checking and inspections? Offer whistle-blowing protection? Punish stunt pilots?

On the other hand, you could appreciate a worker’s willingness to take risks, to adjust performance when faced with unexpected variations in everyday work. You could treat a completed task as a learning experience and encourage the worker to share her story. By showing Richard Cook’s video, you could make stunt pilots very aware of the complacency zone and of how, over time, one can drift into failure. This could lead to an engaging conversation about at-risk vs. reckless behaviour.

How would you deal with workers who act as stunt pilots? Command & control? Educate & empower? Would you do either/or? Or do both/and?

When thinking of Safety, think of coffee aroma

Safety has always been a hard sell to management and to front-line workers because, as Karl Weick put forward, Safety is a dynamic non-event. Non-events are taken for granted. When people see nothing, they presume that nothing is happening and that nothing will continue to happen if they continue to act as before.

I’m now looking at Safety from a complexity science perspective as something that emerges when system agents interact. An example is aroma emerging when hot water interacts with dry coffee grounds. Emergence is a real-world phenomenon that Systems Thinking does not address.

Safety-I and Safety-II do not create safety but provide the conditions for Safety to dynamically emerge. But as a non-event, it’s invisible and people see nothing. Just as safety can emerge, so can danger as an invisible non-event. What we see is failure (e.g., accident, injury, fatality) when the tipping point is reached. We can also reach a tipping point when we do too much of a good thing. Safety rules are valuable, but if a worker is overwhelmed by too many, danger in the form of confusion and distraction can emerge.

I see great promise in advancing the Safety-II paradigm to understand what are the right things people should be doing under varying conditions to enable safety to emerge.

For further insights into Safety-II, I suggest reading Steven Shorrock’s posting What Safety-II isn’t on Safetydifferently.com. Below are my additional comments under each point made by Steven with a tie to complexity science. Thanks, Steven.

Safety-II isn’t about looking only at success or the positive
Looking at the whole distribution and all possible outcomes means recognizing there is a linear Gaussian and a non-linear Pareto world. The latter is where Black Swans and natural disasters unexpectedly emerge.

Safety-II isn’t a fad
Not all Safety-I foundations are based on science. As Fred Manuele has proven, Heinrich’s Law is a myth. John Burnham’s book Accident Prone offers a historical rise and fall of the accident-proneness concept. We could call them fads, but it’s difficult to since they have been blindly accepted for so long.

This year marks the 30th anniversary of the Santa Fe Institute, where Complexity science was born. At the May 2012 Resilience Lab I attended, Erik Hollnagel and Richard Cook introduced the RMLA elements of Resilience engineering: Respond, Monitor, Learn, Anticipate. They fit with Cognitive-Edge’s complexity view of Resilience: Fast recovery (R), Rapid exploitation (M, L), Early detection (A). This alignment has led to one way to operationalize Safety-II.

Safety-II isn’t ‘just theory’
As a pragmatist, I tend not to use the word “theory” in my conversations. Praxis is more important to me than spewing theoretical ideas. When dealing with complexity, the traditional Scientific Method doesn’t work. It’s neither deductive nor inductive reasoning but abductive. This is the logic of hunches based on past experiences and making sense of the real world.

Safety-II isn’t the end of Safety-I
The focus of Safety-I is on robust rules, processes, systems, equipment, materials, etc. to prevent a failure from occurring. Nothing wrong with that. Safety-II asks what we can do to recover when failure does occur, plus what we can do to anticipate when failure might happen.

Resilience can be more than just bouncing back. Why return to the same place only to be hit again? Early exploitation means finding a better place to bounce to. We call it “swarming” or Serendipity if an opportunity unexpectedly arises.

Safety-II isn’t about ‘best practice’
“Best” practice does exist, but only in the Obvious domain of the Cynefin Framework. It’s the domain of intuition and the fast thinking in Daniel Kahneman’s book Thinking, Fast and Slow. What’s the caveat with best practices? There’s no feedback loop. So people just carry on as they did before. Some best practices become good habits. On the other hand, danger can emerge from the bad ones, and one can drift into failure.

Safety-II and Resilience are about catching yourself before drifting into failure. That means being alert to detect weak signals (e.g., surprising behaviours, strange noises, unsettling rumours) and having physical systems and people networks in place to trigger anticipatory awareness.

Safety-II isn’t what ‘we already do’
“Oh, yes, we already do that!” is typically expressed by an expert. It might be a company’s line manager or a safety professional. There’s minimal value in challenging the response. You could execute an “expert entrainment breaking” strategy. The preferred alternative? Follow what John Kay describes in his book Obliquity: Why Our Goals are Best Achieved Indirectly.

Don’t even start by saying “Safety-II”. Begin by gathering stories and making sense of how things get done and why things are done a particular way. Note the stories about doing things the right way. Chances are pretty high most stories will be around Safety-I. There’s your data, your evidence that either validates or disproves “we already do”. Tough for an expert to refute.

Safety-II isn’t ‘them and us’
It’s not them/us, nor either/or, but both/and.  Safety-I+Safety-II. It’s Robustness + Resilience together.  We want to analyze all of the data available, when things go wrong and when things go right.

The evolution of safety can be characterized by a series of overlapping life cycle paradigms. The first paradigm was Scientific Management followed by the rise of Systems Thinking in the 1980s. Today Cognition & Complexity are at the forefront. By honouring the Past, we learn in the Present. We keep the best things from the previous paradigms and let go of the proven myths and fallacies.

Safety-II isn’t just about safety
Drinking a cup of coffee should be a total experience, not just tasting of the liquid. It includes smelling the aroma, seeing the Barista’s carefully crafted cream design, hearing the first slurp (okay, I confess.) Safety should also be a total experience.

Safety can emerge from efficient as well as effective conditions.  Experienced workers know that a well-oiled, smoothly running machine is low risk and safe. However, they constantly monitor by watching gauges, listening for strange noises, and so on. These are efficient conditions – known minimums, maximums, and optimums that enable safety to emerge. We do things right.

When conditions involve unknowns, unknowables, and unimaginables, the shift is to effectiveness. We do the right things. But what are these right things?

It’s about being in the emerging Present and not worrying about some distant idealistic Future. It’s about engaging the entire workforce (i.e., wisdom of crowds) so no hard selling or buying-in is necessary.  It’s about introducing catalysts to reveal new work patterns.  It’s about conducting small “safe-to-fail” experiments to  shift the safety culture. It’s about the quick implementation of safety solutions that people want now.

Signing off and heading to Starbucks.

Safety-I + Safety-II

At a hosted conference on July 03, Dave Snowden and Erik Hollnagel shared their thoughts about safety. Dave’s retrospective of their meeting is captured in his blog posting. Over the next few blogs I’ll be adding my reflections as a co-developer of Cognitive-Edge’s Creating and Leading a Resilient Safety Culture course.

Erik introduced Safety-II to the audience, a concept based on an understanding of what work actually is, rather than what it is imagined to be. It involves placing more focus on the everyday events when things go right rather than on the errors, incidents, and accidents when things go wrong. Today’s dominant safety paradigm is based on the “Theory of Error”. While Safety-I thinking has advanced safety tremendously, its effectiveness is waning and is now on the downside of the S-curve. Erik’s message is that we need to escape and move to a different view based on the “Theory of Action”.

Erik isn’t alone. Sidney Dekker’s latest presentation on the history of safety reinforces how little safety thinking has changed and how we are plateauing. Current programs such as Hearts & Minds continue to assume people have physical, mental, and moral shortcomings as was done way back in the early 1900s.

Dave spoke about Resilience and why it is critical, as it’s in the outliers where you find threat and opportunity. In our CE safety course, we refer to the Safety-I events that help prevent things from going wrong as Robustness. This isn’t an Either/Or situation but a Both/And. You need both Robustness + Resilience.

As a young electrical utility engineer, the creator of work-as-imagined, I really wanted feedback but struggled to obtain it. It wasn’t until I developed a rapport with the workers that I was able to close the feedback loop and become a better designer. Looking back, I realize how fortunate I was since the crews were in proximity and exchanges were eye-to-eye.

During these debriefs I probably learned more from the “work-as-done” stories. I was told changes were necessary due to something that I had initially missed or overlooked. But more often it was due to an unforeseen situation in the field such as a sudden shift in weather or unexpected interference from other workers at the job site. Crews would make multiple small adjustments to accommodate varying conditions without fuss, bother, and okay, the occasional swear word.

I didn’t know it then but I know now: these were adjustments one learns to anticipate in a complex adaptive system. It was also experiencing Safety-II and Resilience in action in the form of narratives (aka stories).

When a disaster happens, look for the positive

In last month’s blog I discussed Fast Recovery and Swarming as 2 strategies to exit the Chaotic Domain. These are appropriate when looking for a “fast answer”. A 3rd strategy is asking a “slow question.”

While the process flow through the Cynefin Framework is similar to Swarming (Strategy B), the key difference is not looking for a quick solution but attempting to understand the behaviour of agents (humans, machines, events, ideas). The focus is on identifying something positive emerging from the disaster, a serendipitous opportunity worth exploiting.

By conducting safe-to-fail experiments, we can probe the system, monitor agent behaviour, and discover emerging patterns that may lead to improvements in culture, system, process, structure.

Occasions can arise when abductive thinking could yield a positive result. In this type of reasoning, we begin with some commonly known facts that are already accepted and then work towards an explanation. In the vernacular, it would be playing a hunch.

Snowstorm Repairs

In the electric utility business, when the “lights go out” a trouble crew is mobilized and the emergency restoration process begins. Smart crews are also on the lookout for serendipitous opportunities. One case involved a winter windstorm causing a tree branch to fall across live wires. Upon restoration, the crew leader took it upon himself to contact customers affected by the outage to discuss removal of other potentially hazardous branches. The customers were very willing and approved the trimming. The serendipity arose because these very same customers had vehemently resisted having their trees trimmed in the Fall as part of the routine vegetation maintenance program. The perception held then was that the trees were in full bloom and aesthetically pleasing; the clearance issues were of no concern. Being out of power for a period of time in the cold winter can shift paradigms.

When a disaster happens, will it be fast recovery or swarming?

Last month’s blog was about Act in the Cynefin Framework’s Chaotic domain. Be aware you cannot remain in the Chaotic domain as long as you want. If you are not proactively trying to get out of it, somebody or something else will take action, as Asiana Airlines learned.

How do you decide to Sense and Respond? We can show 2 proactive strategies:

Strategy A is a fast recovery back to the Ordered side. It assumes you know what went wrong and have a solution documented in a disaster plan ready to be executed.

If it’s not clearly understood what caused the problem and/or you don’t have a ready-made solution in place, then Strategy B is preferred. This is a “swarming” strategy perfected by Mother Nature’s little creatures, in particular, ants.

If the path to a food supply is unexpectedly blocked, ants don’t stop working and convene a meeting like humans do. There are no boss ants that command and control. Individual ants are empowered to immediately start probing for a new path to the food target. Not just one ant, but many participate. Once a new path is found, the news is quickly communicated and a new route is established.
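
As a toy simulation (my own analogy code, not from the post; the routes and success odds are invented), the swarming heuristic looks something like this: many simple agents probe in parallel, and the first success becomes the colony’s new shared path:

```python
# A toy illustration (not from the original post): many agents probe candidate
# routes in parallel; the first success is shared, like an ant pheromone trail.
import random

random.seed(42)
routes = ["A", "B", "C", "D", "E"]              # candidate paths to the food
blocked = {"A", "B"}                            # the old path is now blocked
odds = {"C": 0.2, "D": 0.5, "E": 0.3}           # invented success probabilities

def probe(route):
    """One ant tries one route; True means it reached the food."""
    return route not in blocked and random.random() < odds.get(route, 0.0)

new_path = None
while new_path is None:                         # keep swarming until success
    for route in (random.choice(routes) for _ in range(20)):  # 20 ants at once
        if probe(route):
            new_path = route                    # share the find with the colony
            break
print("Colony reroutes via", new_path)
```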

This is Resilience – the ability to bounce back after taking a hit. 

When a disaster happens, how fast do you act?

In the Cynefin framework, we place unexpected negative events into the Chaotic domain. The solution methodology is to Act-Sense-Respond. When a disaster produces personal injuries and fatalities, Act is about initially rendering the situation as safe as possible and stabilizing conditions to prevent additional life-threatening events from occurring.

Whenever a disaster happens, we go into “damage control” mode. We think we’re in control because we determine what information will be released, when, and by whom. Distributing information to the right channels is a key action under Act. We try our best to limit the damage not only to our people and equipment but to our brand, reputation, and credibility. In other terms, we attempt to protect our level of trust with customers/clients, the media, and the general public.

In the latter stages of the 20th century, breakthroughs in information technology meant we had to learn how to quickly communicate because news traveled really fast. In today’s 21st century, news can spread even faster, wider, and cheaper by anyone who can tweet, upload a Facebook or Google+ photo, blog, etc. The damage control window has literally shrunk from hours to minutes to seconds.

This month we sadly experienced a tragedy at SFO when Asiana Airlines flight 214 crashed. I recently reviewed slides produced by SimpliFlying, an aviation consultancy focused on crisis management. Their 2013 July 06 timeline of events is mind-boggling:

11:27am: Plane makes impact at SFO
11:28am: First photo from a Google employee boarding another flight hits Twitter (within 30 secs!)
11:30am: Emergency slides deployed
11:45am: First photo from a passenger posted on Path, Facebook and Twitter
11:56am: A Norwegian journalist asks for permission to use photo from first posters. Tons of other requests follow
1:20pm: Boeing issues statement via Twitter
2:04pm: SFO Fire Department speaks to the press
3:00pm: NTSB holds press conference, and keeps updating Twitter with photos
3:39pm: Asiana Airlines statement released
3:40pm: White House releases statement
8:43pm: First Asiana press release (6:43am Korea time)

Although Asiana Airlines’ first Facebook update was welcomed, they did not provide regular updates and didn’t bother replying to tweets. The bottom line was their stock price and brand took huge hits. Essentially they were ill-prepared to Act properly.

“In the age of the connected traveller, airlines do not have 20 minutes, but rather 20 seconds to respond to a crisis situation. Asiana Airlines clearly was not ready for this situation that ensued online. But each airline and airport needs to build social media into its standard operating procedures for crises management.”

If you encounter a disaster, how fast are you able to act? Does your emergency restoration plan include social media channels? Do you need to rewrite your Business Disaster Recovery SOPs?

If you choose to revisit or rewrite, what paradigm will you be in? If it’s Systems Thinking, your view is to control information. Have little regard for what others say and only release information when you are ready. Like Asiana Airlines. If you’re in the Complexity & Sense-Making paradigm, you realize you cannot control, only influence. You join and participate in the connected network that’s already fast at work commenting on your disaster.

That’s Act. How you decide to Sense and Respond will be subsequently covered.