Tuesday, October 22, 2013

Security Threats in Integrated Circuits

By Asif Iqbal, SDM '11


With the ubiquity of embedded processors in almost everything, security has become a matter of grave concern. Digital hardware-software platforms are increasingly deployed in military and financial systems and in other critical infrastructures such as the smart grid, healthcare, and public records. These platforms have always been at high risk and have historically been compromised by myriad software and social engineering attacks. Adversaries have been exploiting the Internet and the "connected world" at will, and there is significant published literature showing that creative software techniques can sneak through the crevices of modern software systems. With currently available tools such software threats are multiplying, and the sophistication of the hackers is on the rise.[i] At the same time, we are well aware of these hacks, and software security is a mature field of study. The figure below shows how software-related hacks have grown in sophistication over the years.

Fig 1. Evolution of cyber-security threats over time[ii]

So, what's the next big thing in cyber security—the ultimate level of sophistication, the unthinkable destructive impact, and the crack in the backbone? The following short excerpt from an article in IEEE Spectrum[iii] builds context for the discussions to follow.

September 2007—Israeli jets bombed a suspected nuclear installation in northeastern Syria. Among the many mysteries still surrounding that strike was the failure of Syrian radar, supposedly state of the art, to warn the Syrian military of the incoming assault. It wasn't long before military and technology bloggers concluded that this was an incident of electronic warfare and not just any kind. Post after post speculated that the commercial off-the-shelf microprocessors in the Syrian radar might have been purposely fabricated with a hidden "back door" inside. By sending a preprogrammed code to those chips, an unknown antagonist had disrupted the chips' function and temporarily blocked the radar.

The above example was a case of an infected integrated circuit (IC) leaking information, a Type II attack that will be discussed later. In that case, the damage was limited to a leak of information. Thought through more deeply, however, the implant could just as easily have been a "kill switch" (a Type III attack) with the potential to detonate a missile on the carrier jet, or a Type IV attack capable of changing the target's location. This is an infection at the most fundamental level: difficult to detect, incurable, and potentially destructive not only to finances and global resources, but also to human life.

Recent media reports confirm that this threat is real: for years, fake and infected ICs have been deeply infiltrating military warfare systems. And with embedded smart processors handling data of increasing value, such as consumer banking credentials, the security of other critical infrastructures is at risk as well. Additional case studies are noted in the appendix.

In response to this threat, hardware security has started to emerge as an important research topic. In the current literature, the agent for malicious tampering is referred to as a hardware Trojan horse (HTH). An HTH causes an integrated circuit to malfunction or to perform additional malicious functions alongside the intended one(s). Conventional design-time verification and post-manufacturing testing cannot readily be extended to detect HTHs because of their stealthy nature, the inordinately large number of possible instances, and the large variety of structures and operating modes.

An HTH can be designed to disable or destroy a system at some future time, or to leak confidential information and secret keys covertly to the adversary[iv]. Trojans can be implemented as hardware modifications to microprocessors, digital signal processors (DSP), application-specific ICs (ASIC) and commercial off-the-shelf (COTS) parts. They can also be implemented as FPGA bit streams[v].

This paper borrows theoretical concepts and design examples from current research literature and my prior experience in circuit design. To build a theoretical context, I will start with the definition of hardware security and explain the intent of a secure hardware design. Building on this concept, I will describe the threats posed by HTHs and methods for detecting them. Types of attacks and their associated agents will be discussed. In the latter half of this paper, a taxonomy is also presented along with design examples for a few classes.

What Is Hardware Security?

In abstract terms, the word "security" can cover several very different underlying features of a design. Every system will require a different set of security properties, depending on the type and value of the assets or resources worth protecting; fundamentally, security is about defending against malicious attack. It can be defined as a property of the system that ensures that resources of value cannot be copied, damaged, or made unavailable to genuine users.

The fundamental security properties on which nearly every higher-level property can be based are those of confidentiality and integrity.

Confidentiality

An asset that is confidential cannot be copied or stolen under a defined set of attacks. This property is essential for assets such as passwords and cryptographic keys.

Integrity

An asset that has its integrity assured is defended against modification by a known set of attacks. This property is essential for some of the on-chip root secrets (keys, encryption algorithms) on which the rest of the system's security is based.

Authenticity

In some circumstances, a design cannot provide integrity and instead provides the property of authenticity. In this case, an attacker can change the value of the asset, but the defender will be able to detect the change (by verifying authenticity) before the chip function is compromised. In some implementations, the chip may cease to function in the event of tampering.
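As a concrete (and purely illustrative) sketch of the authenticity property, the Python snippet below models a device that cannot prevent an asset such as a firmware image from being modified, but verifies a keyed tag before using it. The key name, asset contents, and tag scheme are assumptions chosen for illustration, not a description of any particular chip.

    # Illustrative sketch of authenticity checking: the asset may be altered,
    # but the defender detects the change before acting on it.
    # ROOT_KEY and the firmware contents are hypothetical placeholders.
    import hashlib
    import hmac

    ROOT_KEY = b"on-chip root secret"  # assumed to be integrity-protected

    def tag(asset: bytes) -> bytes:
        """Keyed tag computed over the asset."""
        return hmac.new(ROOT_KEY, asset, hashlib.sha256).digest()

    def load_if_authentic(asset: bytes, stored_tag: bytes) -> bool:
        """Return True only if the asset is unmodified; otherwise refuse to run it."""
        return hmac.compare_digest(tag(asset), stored_tag)

    firmware = b"original firmware image"
    good_tag = tag(firmware)
    print(load_if_authentic(firmware, good_tag))                    # True
    print(load_if_authentic(b"tampered firmware image", good_tag))  # False: tampering detected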

Types of Attacks

IC security issues are mainly attributable, or at least traceable, to the physical security of the design and manufacturing facilities. The mechanisms for performing attacks can be broken down into four classes: hack attacks, shack attacks, lab attacks, and fab attacks.

Hack Attack

A hack attack is one where the hacker is only capable of executing a software attack. Examples include viruses and malware, which are downloaded to the device via a physical or a wireless connection. In many cases of a successful hack attack, the device user inadvertently approves the installation of the software, which then executes the attack. This is either because the malware pretends to be a piece of software that the user actually wants to install or because the user does not understand the warning messages displayed by the operating environment.

Shack Attack

A shack attack is a low-budget hardware attack using equipment that could be bought from a store like Radio Shack. In this scenario, attackers have physical access to the device, but not enough equipment or expertise to attack within the integrated circuit packages. They can use logic probes and network analyzers to snoop bus lines, pins, and system signals. They may be able to perform simple active hardware attacks, such as forcing pins and bus lines to be at a high or low voltage, reprogramming memory devices, or replacing hardware components with malicious alternatives. Some of the existing IC testability features, such as JTAG debug, boundary scan I/O, and BIST (built-in self-test) facilities, can be used to hack a chip's functional state.

Lab Attack

The lab attack is more comprehensive and invasive. If attackers have access to laboratory equipment, such as electron microscopes, they can perform unlimited reverse engineering of the device. It must be assumed that attackers can reverse engineer transistor-level detail for any sensitive part of the design, including logic and memory. Attackers can reverse engineer a design, attach microscopic logic probes to silicon metal layers, and introduce glitches into a running circuit using lasers or other techniques. They can also monitor analog signals, such as device power usage and electromagnetic emissions, to perform attacks such as cryptographic key analysis.

Fab Attack

A fab attack is the lowest-level attack, wherein malicious circuitry is inserted into the netlist or layout of an integrated circuit at the foundry or fabrication plant. Circuitry fabricated into the chip in this way cannot be easily detected by chip validation.

Trust in Integrated Circuits

Security in integrated circuit design and manufacture is the final line of defense for securing hardware systems. Because of the semiconductor industry's fabless business model, third-party IP reuse, and untrusted manufacturing, ICs are becoming increasingly vulnerable to malicious activities and alterations.[vi] [vii] These concerns led the Defense Advanced Research Projects Agency (DARPA) to initiate the Trust in ICs program.[viii]

An IC product development process contains three major steps and agents: design, fabrication, and test and validation. These steps are pictorially represented below along with their trust levels. An untrusted agent is a potential source of infection. IC security is more of a physical security issue, which can be held in check by tight control and vertical integration over the complete manufacturing process.

Fig 2. Trusted and untrusted components of design and manufacturing chain

Design

Specification

Design starts with specifications, wherein alterations can be made to modify functions, protocols, or design constraints. This step is considered a trusted component, and an insider attack is very unlikely. From my research to date, no cases have been reported; however, the possibility cannot be ruled out.

Third-party IPs and Libraries

Due to the ever-increasing complexity of designs and time-to-market constraints, high reuse is prevalent in the IC industry. This includes third-party soft/firm/hard IP blocks, models, and standard cells used by the designer during the design process and by the foundry during the post-design processes. These third-party IPs and libraries are considered untrusted.

CAD Tools

Cadence, Mentor Graphics, Magma, and Synopsys provide the industry-standard CAD tools for design. These tools are considered trusted. However, from my personal experience and interviews, design engineers have been using untrusted third-party TCL[ix] scripts (open source or proprietary) on trusted CAD software for design automation even in big design houses.

Fabrication

Fabrication involves preparing masks and wafers, an integrated manufacturing process of oxidation, diffusion, ion implantation, chemical vapor deposition, metallization, and lithography. In the present context, with fabrication outsourced to third-party foundries, trust is in question. The adversary could change the parameters of the manufacturing process, alter the geometries of the mask, or even embed a malicious circuit at the mask layout level. The mask information is contained in an electronic file format called GDS; an entire mask set can be swapped by replacing the GDS file, so the adversary could substitute a compromised Trojan IC mask for the genuine one.[x]

Manufacturing Test

In the testing phase, test vectors are applied to the inputs of the manufactured IC, and output ports are monitored for expected behavior. Generally, automated test equipment fails to detect a Trojan; worse, test vectors or the automated test equipment itself could be constructed to deliberately mask Trojans. Hence testing can be considered trusted only if it is done in the client's own production test center (the semiconductor company or government agency).

Fig.3. Vulnerable steps of modern IC life cycle [Source: R.S. Chakraborty et al.]

Design Abstraction Levels

Trojan circuits can be embedded at various hardware abstraction levels. As we move to a lower abstraction level, the level of sophistication required increases, i.e. it is more difficult to embed a desired malicious functionality into lower levels of abstraction, as compared to higher levels.

The netlist, or gate level, of a design is generally assumed to be secure because it is produced by tools rather than edited by hand. It is interesting to note, however, that changes are routinely made directly at the netlist or gate level in late design stages for legitimate purposes, and an experienced engineer can insert a malicious circuit directly at the gate level.

The different levels of abstraction at which design is done and a Trojan may be inserted are listed below.
  • At the system level, a Trojan can be hidden in hardware modules, their interconnections, and communication protocols. This requires a low level of sophistication.
  • At the register transfer level (RTL), a Trojan can be inserted by coding its behavioral description along with the intended functionality of the chip. This is difficult in terms of physical access, but low in complexity of attack.
  • At the gate level a hacker can carefully control all aspects of the inserted Trojan, including size and location. Physical access is difficult and the hack is complicated.
  • At the transistor level, hacks are related to changing circuit parameters to compromise the reliability of the chip and cause ultimate mission mode failure. This is a very sophisticated attack, still in the trusted zone with difficult physical access.
  • At the layout level, hacks are related to foundry attacks and physical access is easier because of the untrusted zone. However, this hack has the highest level of sophistication.

Ensuring Authenticity

There are two main options to ensure that a chip used by a client is authentic, meaning it performs only those functions originally intended and nothing more. They are:
  1. Make the entire fabrication process trusted.
  2. Verify the trustworthiness of manufactured chips upon return to the clients.
While the first option is expensive and nearly impossible considering the current business climate and trends in the global distribution of the IC design and fabrication, the second option requires tightly controlling testing and validation to ensure the chip's conformance with the original functional and performance specifications. Tehranipoor et al.[xi] call this new step silicon design authentication.

Deep Dive into Hardware Trojans

Hardware Trojans are modifications to original circuitry that are inserted by adversaries who have the malicious intent of using hardware or hardware mechanisms to gain access to data or software running on the chips. The example in Figure 4 shows cryptographic hardware with the output bypassed with a simple multiplexer. When the select line is high, the unencrypted input is sent to the output. The multiplexer is the Trojan here, which when activated by a trigger alters the intended functionality and sends the unencrypted data to the adversary.
Fig. 4. A simple Trojan [Source: J Rajendran et al.]

An interesting point to note here is that bypass structures like the one in Figure 4 are used routinely in design for debug and design for testability (DFT).[xii] It is very difficult to distinguish such modifications and detect this type of Trojan, which may be disguised as a normal debug function. There are many other characteristics of a hardware Trojan, such as small area and rare trigger, which make it difficult to detect. Hardware Trojan detection is still a fairly new research area, but it has gained significant traction in the past few years.
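To make the Figure 4 structure concrete, the following minimal Python sketch behaviorally models the multiplexer bypass. The real Trojan is hardware, and the toy cipher, key, and trigger name here are placeholders chosen for illustration: when the trigger net is low the chip emits ciphertext, and when an attacker drives it high the plaintext is routed straight to the output.

    # Behavioral model of the Figure 4 bypass Trojan (illustrative only;
    # the actual attack is a 2:1 multiplexer in silicon, not software).

    def xor_cipher(plaintext: bytes, key: bytes) -> bytes:
        """Stand-in for the chip's encryption block (a toy XOR cipher)."""
        return bytes(p ^ key[i % len(key)] for i, p in enumerate(plaintext))

    def chip_output(plaintext: bytes, key: bytes, trojan_trigger: bool) -> bytes:
        """trojan_trigger acts as the mux select line: 0 = normal encrypted
        output, 1 = unencrypted input bypasses the cipher and leaks out."""
        encrypted = xor_cipher(plaintext, key)
        return plaintext if trojan_trigger else encrypted

    secret = b"sensitive payload"
    key = b"\x5a\xa5"
    print(chip_output(secret, key, trojan_trigger=False))  # normal operation: ciphertext
    print(chip_output(secret, key, trojan_trigger=True))   # Trojan active: plaintext leaks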

Difficulty of Detection

Detection of malicious alterations is extremely difficult, for several reasons.
  • Reuse. ICs integrate a great deal of third-party soft and hard intellectual property (IP) to accelerate time to market, and detecting a small malicious alteration inside a third-party IP block is extremely difficult.
  • Small Size. Small, submicron IC feature sizes make detection by physical inspection and destructive reverse engineering very difficult and costly. Moreover, destructive reverse engineering does not guarantee a comprehensive test, especially when Trojans are dispersed throughout the entire chip.
  • Low Activation Probability. Trojan circuits, by design, are activated under very specific, low-probability conditions, such as sensing a specific low-frequency toggling design signal or an analog parameter such as power or temperature. This makes them unlikely to be activated and detected using random or functional stimuli during limited test times, while remaining easily triggered during mission mode; a short illustrative calculation follows this list.
  • Insufficient Manufacturing Tests. Tests of manufacturing faults, such as stuck-at and delay faults, cannot guarantee detection of Trojans. Such tests are limited by test times, which are typically a few milliseconds per chip. Within this time frame, they cannot activate and detect Trojans. Even when 100 percent fault coverage for all types of manufacturing faults is possible, there are no guarantees as far as Trojans are concerned, since all functional use cases and state vectors are not exercised.
  • Decreasing Physical Geometry. Devices are getting smaller each day because of improvements in lithography. As physical feature sizes decrease, process, voltage, and temperature (PVT) and environmental variations have a greater impact on the integrity of circuit parameters (voltages, currents, power, and I/O delay). This makes parametric detection of Trojans using simple signal measurements ineffective.
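As promised above, here is a short, purely illustrative calculation of why low activation probability defeats random testing. Assume (hypothetically) a combinational Trojan keyed to one specific value on a 32-bit internal bus and a random test campaign of ten million vectors; neither number comes from the cited literature.

    # Illustrative arithmetic: chance of randomly activating a combinational
    # Trojan keyed to one specific 32-bit bus value. Both numbers are assumed.

    bus_width = 32
    p_trigger_per_vector = 1 / 2**bus_width   # about 2.3e-10 per random vector
    test_vectors = 10_000_000                 # a generous random test campaign

    # Probability the Trojan fires at least once during the whole test
    p_activated = 1 - (1 - p_trigger_per_vector) ** test_vectors
    print(f"P(activated during test) ~= {p_activated:.4%}")   # roughly 0.2 percent

    # Expected number of random vectors before the first activation
    print(f"Expected vectors to first trigger: {2**bus_width:,}")  # ~4.3 billion

Under these assumptions, fewer than one such test campaign in 400 would ever activate the Trojan, which is why rare triggers survive conventional testing.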

Taxonomy of Trojans

Wang, Tehranipoor, and Plusquellic[xiii] developed a detailed taxonomy for hardware Trojans. Wang et al. suggest three main categories of Trojans according to their physical, activation, and action characteristics. Although Trojans could be hybrids of this classification (for instance, they could have more than one activation characteristic), this taxonomy captures the elemental characteristics of Trojans and is useful for defining and evaluating the capabilities of various detection strategies.

Fig. 5. Detailed taxonomy of hardware Trojans [Source: Wang et al.][xiii]

Physical Characteristics

The physical category describes the various hardware manifestations of Trojans. It partitions Trojans into functional and parametric classes. The functional class includes Trojans that are physically realized through the addition or deletion of transistors or gates, whereas the parametric class refers to Trojans realized through modifications of existing wires and logic.

The size category accounts for the number of components in the chip that have been added, deleted, or compromised. The distribution category describes the location of the Trojan in the chip's physical layout. The structure category refers to the case when an adversary is forced to regenerate the layout to insert a Trojan, which could then cause the chip's physical form to change. Such changes could result in different placement for some or all design components. Any malicious changes in physical layout that could change the chip's delay and power characteristics would facilitate Trojan detection.

Trigger Characteristics

Trojans can also be classified based on their activation or trigger characteristics. A Trojan consists of a trigger and a payload. The trigger function causes the payload to be active and carry out its malicious function. Once activated, the Trojan may continue to be in an activated state or return to its base state (one-shot activation). These triggers are further divided into two categories, externally activated and internally triggered.

Externally triggered Trojans require external inputs to act. The external trigger can be an adversary's input, a legitimate user's input, or even the output of an external component. User-input triggers may include push buttons, switches, keyboards, or keywords/phrases in the input data stream. An external component trigger could be a signal received by an antenna or sensor that triggers a payload inside the circuit. The activation condition could also be based on the output of a sensor that monitors temperature, voltage, or any other external environmental condition (such as electromagnetic interference, humidity, or altitude).

An internally triggered Trojan is activated by an event that occurs within the target device. The event may be either time–based or physical condition–based. Common methods include hardware counters, which can trigger the Trojan at a predetermined time. These are also called time bombs. Triggering circuitry may monitor physical parameters such as temperature and power consumption of the target device. When these parameters reach a predetermined value, they trigger the Trojan. The Trojan in this case is implemented by adding logic gates and/or flip-flops to the chip, and hence is represented as a combinational or sequential circuit. Action characteristics identify the types of disruptive behavior introduced by the Trojan.

"Always On" Trigger

The "always on" trigger keeps the Trojan active, continuously deteriorating the chip's performance. This trigger can disrupt the chip's normal reliability and function at any time. This subclass covers Trojans that are implemented by modifying the chip's geometries such that certain nodes or paths have a higher susceptibility to failure.

Chakraborty et al.[xiv] offer another classification of Trojans based on their triggers. In this classification, trigger mechanisms are of two types: digital and analog.

Fig. 6. Classification of triggers based on digital/analog mechanisms[xiv]

Analog-triggered Trojans are activated by analog conditions such as chip power or current levels. Digital-triggered Trojans can in turn be classified into combinational and sequential types. A combinational trigger is a logic function of internal circuit state variables; typically, an attacker chooses a rare activation condition so that the Trojan is very unlikely to trigger during a conventional manufacturing test. Sequentially triggered Trojans, on the other hand, are activated by the occurrence of a sequence of events or a period of continuous operation. The simplest sequential triggers are synchronous stand-alone counters, which trigger a malfunction on reaching a particular count. In general, detecting sequential Trojans is harder because their activation probability is lower, depending on both data values and timing. Additionally, the number of such sequential trigger conditions for arbitrary Trojan instances can be insurmountably large for a deterministic logic testing approach, making testing and detection impractical.
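The two digital trigger classes just described can be sketched behaviorally in a few lines. The hypothetical Python models below (not drawn from the cited papers) contrast a combinational trigger keyed to a rare bus pattern with a sequential "time bomb" counter.

    # Behavioral sketches of the digital trigger classes described above.
    # Purely illustrative; real Trojans are gates and flip-flops, not Python.

    RARE_PATTERN = 0xDEADBEEF  # assumed rare value on an internal 32-bit bus

    def combinational_trigger(bus_value: int) -> bool:
        """Fires only when the monitored nets carry one specific value."""
        return bus_value == RARE_PATTERN

    class SequentialTrigger:
        """A stand-alone counter ('time bomb') that fires after n_cycles clocks."""
        def __init__(self, n_cycles: int):
            self.n_cycles = n_cycles
            self.count = 0

        def clock(self) -> bool:
            self.count += 1
            return self.count >= self.n_cycles

    bomb = SequentialTrigger(n_cycles=1_000_000)
    fired_at = next(cycle for cycle in range(1, 2_000_000) if bomb.clock())
    print(f"Time bomb fired at cycle {fired_at:,}")  # fires at cycle 1,000,000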

Fig. 7. Example of Trojans with trigger mechanisms [Source: R.S Chakraborty et.al][xiv]

Payload/Effect-based Classification

The payload is the circuitry that implements the Trojan's intended malicious function, and it can characterize a Trojan by the severity of its effect. A Trojan can change the function of the target device, causing errors that may be difficult to detect in testing but are detrimental in mission mode. Another class of Trojans changes specifications by changing device parameters; these may alter the reliability, functional, or parametric specifications (such as power and delay). Trojans can also leak sensitive information through a covert or an existing channel; information can be leaked by radio frequency, optical, or thermal means, or via interfaces such as RS-232 and JTAG. Trojans can likewise be designed to create backdoor access to assist software-based attacks such as privilege escalation and password theft. They can hog chip resources, including bandwidth, computation, and battery power, causing the chip to malfunction and emulating a denial of service. Finally, some Trojans may physically destroy, disable, or alter the configuration of the device (kill switches).

Another way to categorize Trojans is by the type of payload circuitry: digital or analog. Digital payload Trojans either affect the logic values at chosen internal nodes or modify the contents of memory locations. Analog payload Trojans, on the other hand, affect circuit parameters such as performance, power, and noise margin. Yet another form of analog payload generates excess activity in the circuit, accelerating the IC's aging and shortening its lifespan, all without affecting the IC's logical functionality.

Current Trojan Detection Methods

Detection of Trojans is extremely difficult for the reasons discussed in the previous sections. It is an important area of research that has led to the development of some Trojan detection methods over the past few years. These are categorized mainly as chip-level solutions and architectural-level Trojan detection solutions.

Chip-level Methods

Power and Current Measurement

Trojans typically change a design's parametric characteristic by, for example, hampering performance, increasing or decreasing power, or causing reliability problems in the chip. Measuring current and voltage can provide information about the internal structure and activities within the IC, enabling detection of Trojans without fully activating them.

A weakness of such methods is that a Trojan may draw only a very small amount of current, which can be submerged below the noise floor and process-variation effects, making it undetectable by conventional measurement equipment. However, Trojan detection capability can be greatly enhanced by measuring current locally at multiple power ports or pads and by switching off sections of the chip, thereby increasing the small differential of voltage or current relative to the normal operating parameters.

Timing-based Methods

In timing-based methods, Trojans can be detected by measuring the delays between a circuit's inputs and outputs. Trojans can be detected when one or a group of path delays are extended beyond the threshold determined by the process variations level.

Many different samples from a process lot are checked under the same test patterns and compared; an outlier is a suspected Trojan infection. This method uses statistical analysis to deal with process variations. However, it is not well suited to today's complex circuits, which contain millions of paths between inputs and outputs, and measuring all of these paths, especially the short ones, is not easy.
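The statistical idea shared by the current-based and delay-based methods can be sketched simply: measure the same parameter on many chips from the same lot, model the process-variation spread, and flag chips that fall outside it. The Python sketch below uses a plain z-score test on simulated delay measurements; the population statistics, the Trojan's extra delay, and the 3-sigma threshold are assumptions for illustration, not values from the literature.

    # Toy illustration of outlier-based Trojan screening on a side-channel
    # measurement (a path delay in ns). All numbers are made up.
    import random
    import statistics

    random.seed(0)

    # Genuine chips: delay varies only with process variation (mean 5.0 ns, sd 0.1 ns)
    population = [random.gauss(5.0, 0.1) for _ in range(200)]

    # One suspect chip whose Trojan adds a small extra load on the measured path
    suspect_delay = 5.0 + 0.35

    mean = statistics.mean(population)
    sd = statistics.stdev(population)
    z = (suspect_delay - mean) / sd

    THRESHOLD = 3.0  # assumed cutoff: flag anything beyond 3 sigma
    print(f"z-score = {z:.1f} -> {'SUSPECT' if abs(z) > THRESHOLD else 'pass'}")

The limitation noted above applies directly: if the Trojan's contribution is smaller than the process-variation spread, the z-score stays inside the threshold and the infected chip passes.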

Architecture-level Trojan Detection

An attack can occur at different levels of design abstraction, for example at the specification, RTL, gate, or post-layout level. At the most abstract level, the adversary can access the interpreter and perform software tampering, scan-chain readout, or a fault attack. At the hardware microarchitecture and circuit levels, the attacker exploits power consumption or electromagnetic emissions. As we move to higher levels of abstraction, the sophistication required of the attacking agent decreases, but so does the detectability of the Trojan, because automated synthesis and automated place-and-route distribute the inserted logic all over the chip area.

Design for Trust

One approach is to design chips for detectability of any tampering. The CAD and test community has long benefited from Design for Testability (DFT) and Design for Manufacturability (DFM). Design for Trust is another "ility" that is critical for Trojan detection. These design methods, proposed by the hardware security and trust community, improve Trojan detection and isolation by changing or modifying the design flow. They help prevent insertion of Trojans, facilitate easier detection, and provide effective IC authentication.
Some methods are physical-level tamper-proofing techniques, such as placing security parts into special casings with light, temperature, tampering, or motion sensors.

Suh, Deng, and Chan[xv] have proposed a design-level tamper-proofing method. In their paper, they discuss a cryptographic microarchitecture for a high-end secure microprocessor. A secure processor is authenticated by its checksum response to a challenge within a time limit; the checksum is unique because it is based on the cycle-to-cycle activities of the processor's specific internal microarchitectural mechanisms. The authors showed that small differences in the underlying architecture result in significant deviations in the checksum.
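At a protocol level, the idea can be sketched as a timed challenge-response exchange: the verifier sends a random challenge, the device must return a checksum that depends on its exact internal behavior, and it must answer within a tight time budget, so a tampered or emulated chip either gets the value wrong or answers too slowly. The Python sketch below captures only that protocol shape; the checksum function and the 50 ms budget are placeholders, not the mechanism from the cited paper.

    # Protocol-level sketch of timed challenge-response authentication.
    # The checksum function and the time budget are placeholders.
    import hashlib
    import os
    import time

    TIME_BUDGET_S = 0.050  # assumed response deadline

    def device_checksum(challenge: bytes, device_secret: bytes) -> bytes:
        """Stand-in for a checksum derived from cycle-accurate internal state."""
        return hashlib.sha256(device_secret + challenge).digest()

    def authenticate(respond, device_secret: bytes) -> bool:
        challenge = os.urandom(16)
        start = time.perf_counter()
        response = respond(challenge)              # query the device under test
        elapsed = time.perf_counter() - start
        expected = device_checksum(challenge, device_secret)
        return response == expected and elapsed <= TIME_BUDGET_S

    secret = b"per-design secret"
    genuine = lambda c: device_checksum(c, secret)
    tampered = lambda c: device_checksum(c, b"altered internal state")
    print(authenticate(genuine, secret))    # True: correct value, fast enough
    print(authenticate(tampered, secret))   # False: checksum deviates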
The architectural detection methods are design-specific and must be built into the design to make tampering easy to detect. The chip-level methods require very high measurement precision and are error-prone, because it is difficult to identify a Trojan's signature in the presence of chip noise and process variation.

Conclusion

The issue of IC security and effective countermeasures has drawn considerable research interest in recent times. This paper presents a survey of different Trojan types and emerging methods of detection. Analog Trojans present a major future challenge because there are numerous types of activation and observation conditions. Considering the varied nature and types of IC vulnerabilities, a combination of design and test methods would be required to provide an acceptable level of security.

Designs are being made more secure every day; however, the hacker is always one step ahead! Engineers are reacting to changing security needs, proactively designing in "trust-ability" and making designs more secure, but physical access is something beyond the control of the academic and engineering communities. Businesses have to be aware, and procurement policies have to be improved. The most severe threats to IC security relate to physical security; vertical integration of the entire manufacturing chain would restore trust in the manufacturing process and allow many Trojan threats to be controlled.

Appendix: Short Cases of IC Vulnerability[xvi]

The sensitive assets that each market sector tries to protect against attack are diverse. For example, mobile handsets aim to protect the integrity of radio networks, while television set-top boxes prevent unauthorized access to subscription channels. The varied type and value of the assets being protected, combined with the different underlying system implementations, mean that the attacks experienced by each also vary.

Mobile Sector

Two critical parts of a GSM handset are the International Mobile Equipment Identity (IMEI) code, a unique 15-digit code used to identify an individual handset when it connects to the network, and the low-level SIMLock protocol that is used to bind a particular device to SIM cards of a particular network operator.
Both of these components are used to provide a security feature: the IMEI is used to block stolen handsets from accessing a network, and the SIMLock protocol is used to tie the device to the operator for a contract's duration. On many handsets both of these protection mechanisms can be bypassed with little effort, typically using a USB cable and a reprogramming tool running on a desktop workstation.
The result of these implementation insecurities is an opportunity for fraud on such a large scale that statistics reported by Reuters UK suggest mobile phone theft drives half of all street crime, costing the industry billions of dollars every year.

Security requirements placed on new mobile devices no longer relate only to the network, but also to content and services available on the device. Protection of digital media content through Digital Rights Management (DRM) and protection of confidential user data, such as synchronized email accounts, is becoming critical as both operators and users try to obtain more value from their devices.

Consumer Electronics and Embedded Sector

The requirements placed on consumer electronics, such as portable game consoles and home movie players, are converging with those seen in the mobile market. Increasing wired and wireless connectivity, greater storage of user data, dynamic download of programmable content, and handling of higher value services all suggest the need for a high-performance and robust security environment.

Security attacks are not limited to open systems with user-extensible software stacks. Within the automotive market most systems are closed or deeply embedded, yet odometer fraud, in which the mileage reading is rolled back to inflate the price of a secondhand vehicle, is still prevalent. The US Department of Transportation reports that this fraud alone costs American consumers hundreds of millions of dollars every year in inflated vehicle prices.

Security features typically encountered in these embedded systems are those that verify that firmware updates are authentic and those that ensure that debug mechanisms cannot be used maliciously.

Notes


[i] Cyber Security in Federal Government, Booz Allen Hamilton.
[ii] Source: Booz Allen Hamilton, www.boozallen.com.
[iii] "The Hunt for the Kill Switch," IEEE Spectrum, May 2008.
[iv] "The Hunt for the Kill Switch," IEEE Spectrum, May 2008.
[v] An FPGA, or field-programmable gate array, is a general-purpose programmable chip with logic blocks and programmable interconnections. FPGAs often replace application-specific ICs for small-volume applications. A bit stream is the interconnection information between the logic elements of the FPGA; it defines the function of the FPGA.
[vi] Report of the Defense Science Board Task Force on High Performance Microchip Supply, Defense Science Board, US Department of Defense, February 2005; http://www.acq.osd.mil/dsb/reports/2005-02-HPMS_Report_Final.pdf.
[vii] Innovation at Risk: Intellectual Property Challenges and Opportunities, white paper, Semiconductor Equipment and Materials International, June 2008.
[viii] http://www.darpa.mil/Our_Work/MTO/Programs/Trusted_Integrated_Circuits_(TRUST).aspx
[ix] Tool Command Language: standard CAD tools support a common tool control language for automating design flows and batch-mode jobs.
[x] "The Hunt for the Kill Switch," IEEE Spectrum, May 2008.
[xi] J. Rajendran et al., Towards a Comprehensive and Systematic Classification of Hardware Trojans.
[xii] http://larc.ee.nthu.edu.tw/~cww/n/625/6251/05DFT0603.pdf
[xiii] X. Wang, M. Tehranipoor, and J. Plusquellic, "Detecting Malicious Inclusions in Secure Hardware: Challenges and Solutions," Proc. IEEE Int'l Workshop on Hardware-Oriented Security and Trust (HOST 08), IEEE CS Press, 2008, pp. 15-19.
[xiv] Rajat Subhra Chakraborty et al., Hardware Trojan: Threats and Emerging Solutions.
[xv] G.E. Suh, D. Deng, and A. Chan, "Hardware Authentication Leveraging Performance Limits in Detailed Simulations and Emulations," Proc. 46th Design Automation Conf. (DAC 09), ACM Press, 2009, pp. 682-687.
[xvi] Source: Building a Secure System Using TrustZone™ Technology, ARM Technologies white paper.
Asif Iqbal, SDM '11

Friday, October 4, 2013

Understanding Patient Wait Times at the LV Prasad Eye Institute

By Ali Kamil, SDM '12, and Dmitriy Lyan, SDM '11 
 

The challenge presented in this project was to reduce patient wait times and their variability at LV Prasad Eye Institute (LVPEI) in Hyderabad, India. Since its inception, LVPEI has served more than 15 million patients, of whom more than 50 percent were served at no charge. Each outpatient department (OPD) clinic sees 65 to 120 patients in a given day, with average wait times ranging from 45 minutes to 6 hours. This variability in service time and the associated long delays are a source of angst for patients, stress for hospital staff, who consistently work overtime, and damage to the clinic's reputation in the region (see Figure 1). The MIT Sloan team was tasked with applying management and engineering principles to investigate the source of the variability and delays at LVPEI.
Figure 1. Service time variability at LVPEI.

The process

To understand the problem holistically, the team attempted to build a reference model of the problem experienced at LVPEI. From January through March 2013, the team:
  • Communicated with the leads from LVPEI's clinical and administrative operations staff;
  • Conducted interviews with key stakeholders to understand patient flow dynamics; and
  • Focused on qualitative metrics, due to constraints in accessing actual data points.
To identify existing best practices in managing patient flows and reducing variability, the team also conducted research at Boston-area eye clinics—Massachusetts General Hospital, Massachusetts Eye and Ear Hospital, and Mount Auburn Hospital.

The team traveled to Hyderabad, India, in March 2013 to conduct on-the-ground research and collect quantitative metrics for patient service and wait times. Operating from the hospital, the team:
  • Conducted time and motion studies in four of LVPEI's OPD clinics, including two cornea and two retina clinics;
  • Collected time stamps as patients and corresponding medical folders moved through the clinics;
  • Interviewed stakeholders, including faculty ophthalmologists in each of the studied clinics, administrators who oversee appointment scheduling and resource allocation, and operations professors from the Indian School of Business in Hyderabad, to understand their prior work on patient wait time trends at LVPEI;
  • Conducted patient surveys at walk-in counters to understand the motivation for choosing the walk-in option, and surveyed patients at the checkout counter to gauge patient satisfaction levels and concerns about their LVPEI experiences;
  • Constructed a system dynamics model—based on the qualitative data gathered from numerous interviews and observations—that reflects the core structure of LVPEI OPD operations and simulates patient flow in a given day; the model was then validated by key stakeholders and calibrated to the data collected on site (see Figure 2); and
  • Worked with key stakeholders to validate and calibrate the data collected on site.
Figure 2. Patient arrivals by time of day.
Figure 3. Patient's adherence to appointments.

The findings

Based on our work on the ground and subsequent application of system dynamics to determine the cause for variability and long service times, we showed that:
  • Given a fixed OPD capacity, patient wait times are largely a function of service demand, scheduling, and resource-specific factors;
  • Demand and scheduling factors include the complexity of patient cases, their volume, and the way they are scheduled in a given day; factors impacting resource allocation and utilization include patient workup time, patient investigation time, and the operating hours of the OPD clinic;
  • To accommodate larger daily volumes of patients, providers reduce the time they spend with each patient, thereby undermining the quality of care provided and increasing the likelihood of medical errors; and
  • Walk-in patients are the source of variability in the system and cause deviations from LVPEI's established schedule.
Given the fixed OPD capacity and service staff, we recommended that LVPEI consider allocating blocks of time in the day dedicated specifically for walk-in patients and follow-up patients. Increasing awareness and enforcing adherence to an appointment-based scheduling system will enable predictable patient wait and service times.
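A minimal queueing sketch illustrates why dedicated blocks can damp wait-time variability. The Python model below is purely illustrative and uses assumed numbers (a single provider, 8-minute consults, appointments every 10 minutes, eight walk-ins); none of these are LVPEI data. It simply compares average and worst waits when walk-ins are interleaved through the morning versus deferred to a reserved block.

    # Illustrative single-server FIFO queue: effect of walk-in placement on waits.
    # All numbers (service time, arrival spacing, walk-in counts) are assumptions.

    SERVICE_MIN = 8  # minutes per consult

    def simulate(arrivals):
        """Return (average wait, worst wait) for FIFO service of the arrival times."""
        waits, free_at = [], 0
        for t in sorted(arrivals):
            start = max(t, free_at)
            waits.append(start - t)
            free_at = start + SERVICE_MIN
        return sum(waits) / len(waits), max(waits)

    appointments = [10 * i for i in range(20)]             # one every 10 minutes
    walkins_interleaved = [5 + 15 * i for i in range(8)]   # trickle in all morning
    walkins_blocked = [200 + 8 * i for i in range(8)]      # reserved block afterward

    for label, walkins in [("interleaved walk-ins", walkins_interleaved),
                           ("dedicated walk-in block", walkins_blocked)]:
        avg, worst = simulate(appointments + walkins)
        print(f"{label:24s} avg wait {avg:5.1f} min, worst {worst:5.1f} min")

Under these assumptions the interleaved walk-ins push every later patient's wait up, while the reserved block leaves the appointment stream undisturbed, which is the intuition behind the recommendation.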

Next steps

Further analysis is needed to study the relationship between the volume of patients, the number of incorrect diagnoses, and the number of patients that return to the clinic to receive additional treatment as a result of error. The team is continuing its work with LVPEI to obtain additional data on patient check-in and checkout times. Additionally, the team is working to make the system dynamics model robust under extreme scenarios and able to delineate among patient types—i.e. walk-in, appointment-based, or follow-up patients.

About the Authors

Ali Kamil is a graduate student at the MIT Sloan School of Management and the Harvard Kennedy School of Government. His research focuses on understanding managerial and organizational effectiveness in low-resource settings—specifically developing and emerging markets. His expertise lies in employing system dynamics–based modeling and tools to simulate complex operations and devise effective policy measures. Prior to MIT, Kamil was an engagement manager at Deloitte Consulting LLP, where he advised leading media, entertainment, and telecom clients in matters of competitive strategy, operations, and technology implementation/outsourcing. He holds a B.S. in computer science and economics from the Georgia Institute of Technology.

Dmitriy Lyan is a senior product manager of technical product at Amazon. He is a graduate of the MIT System Design and Management program, where he specialized in the development of performance management systems for shared value-focused organizations. In his thesis work, Lyan applied system dynamics methodology to explore performance dynamics in US military behavioral health clinics. Prior to MIT, he worked in the investment management and software development industries. He holds an M.S. in financial engineering from Claremont Graduate University/Peter F. Drucker School of Management and a B.S. in computer engineering from the University of California, San Diego.

Thursday, October 3, 2013

Supply Chain and Risk Management

Making the Right Risk Decisions to Strengthen Operations Performance

By Ioannis Kyratzoglou

This study analyzes the supply chain operations and risk management approaches of large companies and examines their operations and financial performance in the face of supply chain disruptions. It proposes a framework and a set of principles to help companies manage today's risk challenges and prepare for future opportunities. Using this framework, business leaders can increase their awareness of where their companies and their competitors stand.


Executive Summary

The MIT/PricewaterhouseCoopers Global Supply Chain and Risk Management Survey is a study of the supply chain operations and risk management approaches of 209 companies with global footprints. As globally operating organizations, they are exposed to high-risk scenarios ranging from controllable risks—such as raw material price fluctuation, currency fluctuations, market changes, or fuel price volatility—to uncontrollable ones such as natural disasters.

The findings validate five key principles that companies can learn from to better manage today's risk challenges to their supply chains and prepare for future opportunities.
  1. Supply chain disruptions have a significant impact on company business and financial performance.
  2. Companies with mature supply chain and risk management capabilities are more resilient to supply chain disruptions. They are impacted less and they recover faster than companies with immature capabilities.
  3. Mature companies investing in supply chain flexibility are more resilient to disruptions than mature companies that do not invest in supply chain flexibility.
  4. Mature companies investing in risk segmentation are more resilient to disruptions than mature companies that do not invest in risk segmentation.
  5. Companies with mature capabilities in supply chain and risk management do better along all surveyed dimensions of operational and financial performance than immature companies.
"Capability maturity," as referred to above, was determined using our supply chain and risk management capability maturity framework. This framework assesses the degree to which companies are applying the most effective enablers of supply chain risk reduction (e.g., flexibility, risk governance, alignment, integration, information sharing, data, models and analytics, and rationalization) and their associated processes. The model depicts where a company stands in relation to its competition and the rest of the industry.

According to the survey results, as many as 60 percent of the companies pay only marginal attention to risk reduction processes. These companies are categorized as having immature risk processes. They mitigate risk by either increasing capacity or strategically positioning additional inventory. This is not a surprise as the survey also shows that most of these companies are focused either on maximizing profit, minimizing costs, or maintaining service levels.

The remaining 40 percent do invest in developing advanced risk reduction capabilities and are classified as having mature processes. Our research validated that companies with mature risk processes perform operationally and financially better—something for CEOs and CFOs to note. Indeed, managing supply chain risk is good for all parts of the business—product design, development, operations, and sales. Using the capability maturity model, companies can benchmark their ability to respond to risks and then increase their capability maturity to gain competitive advantage.

When Mature Risk Management and Operational Resilience Pay Off

On March 11, 2011,[1] Nissan Motor Company Ltd. and its suppliers experienced a 9.0-magnitude earthquake that struck off the east coast of Japan. The quake was among the five most powerful earthquakes on record. Tsunami waves in excess of 40 meters traveled up to 10 km inland, causing a "Level 7" meltdown at three nuclear reactors at Fukushima Daiichi. The impact of this disaster was devastating: 25,000 people died, went missing, or were injured; 125,000 buildings were damaged; and economic losses were estimated at $200 billion.

In the weeks following the catastrophic earthquake, 80 percent of the automotive plants in Japan suspended production. Nissan's production capacity was perceived to have suffered most from the disaster compared to its competitors. Six production facilities and 50 of the firm's critical suppliers suffered severe damage. The result was a loss of production capacity equivalent to approximately 270,000 automobiles.
Despite this devastation, Nissan's recovery was remarkable. During the next six months, Nissan's production in Japan decreased by only 3.8 percent compared to an industrywide decrease of 24.8 percent. Nissan ended 2011 with an increase in production of 9.3 percent compared to a reduction of 9.3 percent industrywide.

How was Nissan able to navigate a disruption of this magnitude so successfully?
  1. To begin with, Nissan responded by adhering to the principles of its risk management philosophy. It focused on identifying risks as early as possible, actively analyzing these risks, planning countermeasures, and rapidly implementing them.
  2. The company had prepared a continuous readiness plan encompassing its suppliers, including: an earthquake emergency response plan; a business continuity plan; and disaster simulation training. Nissan deployed these advanced capabilities throughout risk management and along the supply chain.
  3. Management was empowered to make decisions locally without lengthy analysis.
  4. The supply chain model structure was flexible, meaning there was decentralization with strong central control when required. This was combined with simplified product lines.
  5. There was visibility across the extended enterprise and good coordination between internal and external business functions.
These capabilities allowed the company to share information globally, allocate component supplies to higher-margin products, and adjust production in a cost-efficient way.

Why This Study?

Counterintuitive stories such as Nissan's are at the heart of this study, illustrating that companies with highly mature capabilities in both supply chain management and risk management will be able to effectively address risks, outperform the market, and even gain competitive advantage.

We believe that linking the customer value proposition, sound supply chain operations, and robust risk management is key to success. Moreover, there are supply chain and risk management principles, frameworks, and processes that enable companies to address complex market challenges and achieve superior performance.

The MIT Forum for Supply Chain Innovation and PricewaterhouseCoopers (PwC) launched the Supply Chain Risk Management Survey to assess how global organizations address these challenges and their impact on business operations. The survey was distributed to members of the MIT Forum for Supply Chain Innovation and worldwide clients of PwC. In total, 209 companies completed the survey. Appendix A characterizes the participant population.

The Challenges of a More Global Supply Chain

When a company expands from a local or regional presence to a more global one, its operations strategy needs to be adjusted to align with the changes. The economic crisis in Europe is a good example. Due to the decrease in demand for many products and services on the continent, companies are changing strategies and seeking alternate global markets. That is when operations become more complex: transportation and logistics become more challenging, lead times lengthen, costs increase, and end-customer service can suffer. With a more global footprint, different products are directed to more diverse customers via different distribution channels, which require different supply chains.

To address the challenge successfully, there are a number of questions companies need to consider as their operations globalize.
  1. What are the drivers of supply chain complexity for a company with global operations, and how have they evolved over the recent past?
  2. What are the sources of supply chain risk?
  3. How can vulnerability and exposure to high-impact supply chain disruptions be properly assessed and managed?
  4. How can supply chain resilience be improved?
  5. What supply chain operations and risk principles will guide the improvement of the company's bottom line: the operations and financial performance?
Through this research, we aim to provide valuable insight in response to these questions.

What Are the Drivers of Supply Chain Operations Complexity?

Supply chains are exposed to both domestic and international risks. The more complex the supply chain, the less predictable the likelihood and the impact of any disruption. In other words, exposure to risk is potentially higher. We asked survey participants their views on how certain key supply chain complexity drivers have evolved over the past three years. The responses are shown in Figure 1.
Figure 1. Evolution of supply chain complexity over the past three years.

In recent years, the size of the supply chain network has increased, dependencies among entities and functions have shifted, the speed of change has accelerated, and the level of transparency has decreased.
Overall, developing a product and getting it to the market requires more complex supply chains needing a higher degree of coordination.

What Are the Sources of Supply Chain Risk?

Risks to global supply chains range from known-unknowns that are controllable to unknown-unknowns that are uncontrollable.[2] In the Nissan case, the devastating natural disasters were unknown-unknowns (it is difficult to quantify their likelihood of occurrence) and uncontrollable (the expected risk and its impact cannot be managed).

To understand the level of exposure to diverse and broad-ranging sources of risk, we asked survey participants to identify the sources of risks faced by their supply chain. The results are shown in Figure 2.

Figure 2. Survey participants' view on sources of risks faced by their supply chain.

Interestingly, the top six risks, with the exception of environmental catastrophes, are all known-unknowns and controllable to some degree.

To What Parameters Are Supply Chain Operations Most Sensitive?

Respondents replied that their supply chain operations were most sensitive to skill set and expertise (31%), price of commodities (29%), and energy and oil (28%). See Figure 3.

As an example of the energy and oil parameter, according to the US Energy Information Administration, US diesel prices rose 9.5 cents per gallon in February 2012. Cognizant of the sensitivity and impact diesel prices can have on their financial bottom line, shippers adjust their budgets to offset the increased costs that higher fuel prices produce.

Figure 3. Parameters to which survey participants' supply chain operations are most sensitive.

How Do Companies Mitigate Against Disruptions?

What kind of actions do our survey respondents currently take to reduce the exposure of their supply chain to potential disruptions or to mitigate the impact? Nissan had a well-thought-out and exercised business continuity plan ready to kick into action to facilitate a quick recovery. And indeed, 82 percent of respondents said they had business continuity plans ready. See Figure 4.

Figure 4. Actions companies take to mitigate supply chain risk.

The Supply Chain and Risk Management Maturity Framework

Strengthen Supply Chain and Risk Management

As Nissan illustrated, to reduce vulnerability and exposure to high-impact supply chain disruptions, companies need advanced capabilities along two dimensions: supply chain management and risk management. But how can they understand the maturity level of their capabilities in these areas before designing ways to strengthen them?

The Seven Supply Chain and Risk Enablers of Maturity

There are seven factors that enable stronger capabilities in both supply chain management and risk management. By matching their practices against these seven "enablers," companies can assess how mature or immature their capabilities are. This is the basis of our Supply Chain and Risk Management Maturity Model—an empirical framework that applies set questions across the seven enablers.
  1. Risk governance—the presence of appropriate risk management structures, processes, and culture.
  2. Flexibility and redundancy in product, network, and process architectures—having the right levels of flexibility and redundancy across the value chain to be able to absorb disruptions and adapt to change.
  3. Alignment between partners in the supply chain—strategic alignment on key value dimensions, identification of emerging patterns, and advancement toward higher value propositions.
  4. Upstream and downstream supply chain integration—information sharing, visibility, and collaboration with upstream and downstream supply chain partners.
  5. Alignment between internal business functions—alignment and the integration of activities between company value chain functions on a strategic, tactical, and operational level.
  6. Complexity management/rationalization—ability to standardize and simplify networks and processes, interfaces, product architectures, and product portfolios and operating models.
  7. Data, models, and analytics—development and use of intelligence and analytical capabilities to support supply chain and risk management functions.
According to our survey, companies consider alignment between partners in the supply chain as the most important factor in enabling risk reduction (60%). See Figure 5.

Internal and external process integration are also considered very important (49% and 47%, respectively). Risk governance (44%) and network flexibility and redundancy (37%) are also in the mix. Finally, despite recent advances, data, models, and analytics (28%) and complexity management/rationalization (26%) are low on the priority list. As analytics continue to mature, this may change.

Figure 5. Survey participants' view on which capability enabler they consider the most important.

Four Levels of Maturity in Supply Chain Operations and Risk Management

Supply chain operations and risk management processes go hand in hand and complement one another. At lower maturity levels, the processes are decoupled and stand alone, but at high maturity levels they are fully intertwined. For developing and deploying capabilities to manage supply chain risk effectively, a high level of supply chain sophistication is an absolute prerequisite. There are four levels of supply chain and risk management process maturity:

Level I: Functional supply chain management and ad hoc management of risk. Supply chains are organized functionally with a very low degree of integration. They are characterized by high duplication of activities, internally and externally disconnected processes, and an absence of coordinated efforts with suppliers and partners. Product design is performed independently and there is little visibility into partners/suppliers operations. Inventory and capacity levels are unbalanced, leading to poor customer service and high total costs. There is no risk governance structure and poor visibility into sources of supply chain risk. Only very limited vulnerability or threat analysis is performed. Risk is managed in an ad hoc way with no anticipation or positioning of response mechanisms.

Level II: Internal supply chain integration and positioning of planned buffers to absorb disruptions. Supply chains are cross-functionally organized. Internal processes are integrated, information is shared, and visibility is provided between functions in a structured way. Resources are jointly managed and there is a higher level of alignment between performance objectives. Integrated planning is performed at strategic, tactical, and operational levels—leading to a single company plan. Risk management processes are documented and internally integrated. Basic threats and vulnerabilities are analyzed. Scenario analyses of the base integrated plan are conducted to position targeted buffers of capacity and inventory to absorb disruptions. Postponement or delayed-differentiation product design principles are explored to improve response to changing demand patterns. There is minimal visibility, however, into emerging changes and patterns outside the company.

Level III: External supply chain collaboration and proactive risk response. Supply chains feature collaboration across the extended enterprise. Information sharing is extensive and visibility is high. Key activities such as product design or inventory management are integrated among supply chain partners. External input is incorporated into internal planning activities. Interfaces are standardized, and products and processes are rationalized to reduce complexity. Information sharing and visibility outside the company domain are exploited to set up sensors and predictors of change and variability to proactively position response mechanisms. Formal quantitative methodologies for risk management are introduced and sensitivity analysis is conducted. Suppliers and partners are monitored for resilience levels and business continuity plans are created.

Level IV: Dynamic supply chain adaptation and fully flexible response to risk. Companies are fully aligned with their supply chain partners on key value dimensions across the extended enterprise. Individual strategies and operations are guided by common objectives and fitness schemata. Supply chains are fully flexible to interact and adapt to complex dynamic environments. Emerging value chain patterns resulting from this interaction are probed and identified, and higher-value equilibrium points are achieved. At this level, the supply chain is often segmented to match multiple customer value propositions. Risk sensors and predictors are supported by real-time monitoring and analytics. Risk governance is formal but flexible. Full flexibility in the supply chain product, network, and process architecture and short supply chain transformation lead times allow quick response and adaptability. Supplier segmentation is performed. Risk strategies are segmented based on supplier profiles and market-product combination characteristics.

Table 1 summarizes the criteria used as a basis for the questions and the maturity levels.

Table 1
Table 1. Capability maturity classification model.
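Table 1 itself is not reproduced in this text, so the following is a minimal sketch of how aggregated enabler scores might be mapped to the four levels. The numeric thresholds are invented purely for illustration and are not the study's classification criteria.

    # Illustrative sketch: maps an average enabler score (1-4 scale, as in the
    # previous sketch) to one of the four maturity levels described above.
    # The thresholds below are assumptions, not the study's actual criteria.

    def maturity_level(avg_score):
        if avg_score < 1.75:
            return "Level I: Functional / ad hoc risk management"
        if avg_score < 2.5:
            return "Level II: Internal integration / planned buffers"
        if avg_score < 3.25:
            return "Level III: External collaboration / proactive response"
        return "Level IV: Dynamic adaptation / fully flexible response"

    # Hypothetical companies and their average enabler scores.
    scores = {"Company A": 1.6, "Company B": 2.8, "Company C": 3.5}
    for company, score in scores.items():
        print(company, "->", maturity_level(score))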

How Mature Are Company Capabilities?

The framework is a useful tool for evaluating each company's capabilities. Applied to our study, it shows that the majority of the companies surveyed have immature supply chain operations and risk management processes in place. See Figure 6.

Specifically, based on their responses, only 41 percent of the companies surveyed were classified as having mature processes; the remaining 59 percent have immature processes in place for addressing incidents effectively. Only a minority of companies (9 percent) are fully prepared to address potential challenges from supply chain disruptions in increasingly complex environments.
Figure 6
Figure 6. Companies classified by capability level.

Key Insights—More Mature Capabilities Lead to Better Operational Performance

Having assessed the maturity levels of the 209 companies in the survey, we then analyzed their business and operational performance indicators over the previous 12 months. Our aim was to understand the impact of disruptions on mature vs. immature companies.

The indicators cover a wide spectrum of company performance including profitability, efficiency, and service. Both the scale of the impact and the time it took to recover to prior or improved levels of performance were measured. These are the key insights from the 209 companies surveyed.

1. Supply chain disruptions have a significant impact on company business and financial performance.
To better understand the impact of disruptions[3], we assessed the performance of companies that faced at least three disruptive incidents over the previous 12 months. If performance indicators were negatively affected by 3 percent or more, this was considered "significant impact." As Figure 7 illustrates, 54 percent said that sales revenue was negatively affected and 64 percent suffered a decline in their customer service levels. Across all the operational key performance indicators (KPIs) examined, at least 60 percent reported a 3 percent or higher loss of value. For example, in India's textile industry, raw material costs rose by 6 percent due to the recent sharp fall of the rupee, causing fabric prices to rise[4]. This currency volatility triggered a rise in total costs for fabric makers.
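As a rough illustration of how the 3 percent "significant impact" criterion can be applied to reported KPI changes, the snippet below flags indicators that moved adversely by 3 percent or more. The figures, the field names, and the rule that cost-type KPIs hurt when they increase are all assumptions made for the example, not survey data.

    # Hypothetical KPI changes (percentage change over 12 months).
    kpi_changes = {
        "sales_revenue": -4.2,
        "customer_service_level": -3.5,
        "total_supply_chain_cost": 5.0,   # a cost increase is also adverse
        "inventory_turns": -1.0,
    }

    SIGNIFICANT = 3.0  # study threshold: 3 percent or higher impact

    def significantly_impacted(name, change):
        # Assumption: cost-type KPIs hurt when they go up; the others hurt when they drop.
        if "cost" in name:
            return change >= SIGNIFICANT
        return change <= -SIGNIFICANT

    for name, change in kpi_changes.items():
        label = "significant impact" if significantly_impacted(name, change) else "ok"
        print(name, "->", label)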

The importance of having mature capabilities in place to deal with supply chain disruptions is clear.
Figure 7
Figure 7. Percentage of companies that suffered a 3 percent or higher impact on their performance indicators as a result of supply chain disruptions in the previous 12 months.

2. Companies with mature supply chain and risk management processes are more resilient to disruptions than those with immature processes.
According to the survey results, companies with mature (maturity levels III & IV) supply chain and risk management processes are more resilient to disruptions than companies with immature (maturity levels I & II) processes. The more mature companies suffer lower impact and enjoy faster recovery.

Figure 8 shows the percentage of companies with more than three incidents that suffered an impact of 3 percent or higher on their performance as a result of supply chain disruptions in the previous 12 months.
Only 44 percent of the companies with mature processes suffered a 3 percent or more decline in their revenue, compared to 57 percent of those with immature processes. The higher resilience trend for mature companies is common across all the KPIs examined. The difference is striking in key areas such as total supply chain cost, order fulfillment lead times, and lead-time variability. These KPIs are among those most heavily impacted by supply chain disruptions, so mature companies gain a distinct advantage by investing in the proposed set of capabilities.

Figure 8
Figure 8. Performance of companies with mature vs. immature capabilities.

3. Mature companies that invest in supply chain flexibility are more resilient to disruption than mature companies that don't.
Flexibility is critical to a company's ability to adapt to change. A greater degree of flexibility allows companies to better respond to demand changes, labor strikes, technology changes, currency volatility, and volatile energy and oil prices. However, flexibility does not come free, and the higher the level of flexibility, the more expensive it is to achieve. Similarly, achieving a higher level of service can be costly. It is a difficult trade-off between minimizing costs and investing in flexibility or higher customer service levels.
We asked the respondents to identify the key supply chain value drivers for their leading customer value proposition. High customer service level (34 percent) and flexibility (27 percent) were cited as the top two drivers followed by cost minimization (22 percent) and efficient use of inventory (14 percent). See Figure 9.
Figure 9
Figure 9. Key supply chain value driver to match customer value proposition.

Two distinctive groups emerge from this response:
  • The cost-efficient group—mature companies that selected cost or efficiency as their key supply chain value driver.
  • The flexible-response group—mature companies that selected flexibility or customer service levels as their key supply chain value driver.
When we compared the performance resilience of these two groups, we learned that the flexible-response group fared significantly better. The performance of cost-efficient companies suffered more from the changes and disruptions in their supply chains, even though they possess mature capabilities in deploying their strategy. Mature companies investing in flexibility, responsiveness, and customer service demonstrate higher performance resilience compared to companies whose strategies emphasize cost and efficiency. Figure 10 highlights the major differences.
Figure 10
Figure 10. Performance of mature cost-efficient vs mature flexible-response companies.

Figure 10 also illustrates that a large majority of cost-efficient companies (80 percent) face high variability in their supply chain lead times once a supply chain disruption takes place. This is interesting given that low variability is one of the key drivers of an efficient operating strategy.

4. Mature companies that invest in risk segmentation are more resilient to disruptions than mature companies that don't.
Companies with different market value propositions prioritize different value dimensions in their supply chains. Today, companies often target different market segments and therefore have several customer value propositions. For example, one part of the product portfolio may emphasize price as the key differentiator while another emphasizes product innovation or product selection and availability.

We asked our survey respondents to identify the key value dimension of their leading customer value proposition. The top three choices were: quality (23 percent), innovation (14 percent), and price (14 percent). See Figure 11.
Figure 11  
Figure 11. The key value dimension of the leading customer value proposition of survey participants.
Different value propositions—and the corresponding operating strategies—do not necessarily have the same risk profile. Value dimensions are not exposed to the same threats and vulnerabilities. As a result, the management of supply chain risk—exposure reduction and mitigation strategies—may need to vary significantly based on the value dimension.

Consider a value proposition emphasizing product innovation. The high speed of innovation, the corresponding lower forecast accuracy, the higher price risk, and the higher supply risk will largely determine the type of strategy the company deploys with its suppliers. If price risk or supply risk is elevated as a result of the speed of innovation, then flexible risk-sharing contracts, rather than a buildup of inventory buffers, are more likely to be appropriate. Thus, risk strategies need to be segmented according to the value driver.
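A minimal sketch of this segmentation logic might look as follows; the value drivers, risk categories, and candidate responses are invented for the example and are meant only to make the idea of matching risk strategy to value driver concrete.

    # Illustrative mapping from a product segment's key value driver and risk
    # profile to a candidate risk strategy. The drivers and responses below are
    # assumptions, not recommendations from the study.

    def risk_strategy(value_driver, supply_risk):
        if value_driver == "innovation" and supply_risk == "high":
            return "flexible risk-sharing contracts with suppliers"
        if value_driver == "price":
            return "inventory/capacity buffers at low-cost points"
        if value_driver == "availability":
            return "dual sourcing and safety stock"
        return "standard contracts; monitor exposure"

    # Hypothetical product portfolio.
    portfolio = [
        ("new smart device", "innovation", "high"),
        ("commodity cable", "price", "low"),
        ("spare parts line", "availability", "medium"),
    ]
    for product, driver, risk in portfolio:
        print(product, "->", risk_strategy(driver, risk))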

We asked survey respondents whether they actively pursued risk strategy segmentation. Almost 60 percent do and 40 percent don't. See Figure 12.
Figure 12
Figure 12. Percentage of companies that perform risk strategy segmentation.

We asked the 59 percent of companies that pursued risk segmentation, "What product differentiators do you use as a basis for risk strategy segmentation?" The top three choices were: strategic importance (56 percent), demand volatility (52 percent) and sales volume (45 percent). See Figure 13.

Figure 13
Figure 13. Key product differentiators for risk strategy segmentation.

Companies with mature capabilities were clustered into two main groups: those that perform risk strategy segmentation and those that don't. We then compared the performance resilience to supply chain disruptions for both groups. We observed that mature companies investing in risk segmentation based on different value propositions demonstrated higher performance resilience than companies that did not invest in risk segmentation.

Figure 14 highlights the major difference between the two groups across operational and financial performance indicators. Of particular note is the sales revenue category. Only 32 percent of the mature companies that segment their risk management strategy were significantly impacted as a result of incidents that occurred. This compares to 70 percent of mature companies that don't segment—a difference of 38 percentage points.

Figure 14
Figure 14. Performance of companies based on risk strategy segmentation.

5. Companies with mature capabilities in supply chain management and risk management do better along all surveyed dimensions of operational and financial performance than immature companies.
We compared how company operations and financial performance differed between the mature and immature companies over the prior 12 months. As Figure 15 highlights, companies with mature capabilities in supply chain and risk management did better along all surveyed dimensions of operational and financial performance.

This finding suggests that there is a direct link between having mature supply chain and risk management capabilities and higher overall performance.


Figure 15
Figure 15. Business and financial performance difference between mature and immature companies.

The capability maturity evaluation enables company executives to gain insight into their company's risk position and maturity, measured in terms of operational and financial performance.

Appendix A: Survey Demographics and Trends

The majority of the 209 survey participants are from Europe. Figure 16 illustrates the geographical distribution of survey participants according to where their headquarters are based.
Figure 16
Figure 16. Distribution of survey participants' headquarters by region.

Figure 17
Figure 17. Distribution of survey participants by industry.

Figure 18
Figure 18. Distribution of survey participants by annual sales revenue based on 2011 reported sales revenues.

The majority of survey participants (64 percent) are manufacturing companies. See Figure 19.
Figure 19
Figure 19. Percentage of manufacturing vs. non-manufacturing survey companies.

A total of 83 percent of the participating companies have their manufacturing operations dispersed across multiple geographic regions, while only 17 percent have them in the same region as their headquarters.
Figure 20
Figure 20. Distribution of companies by scale of operations globalization.

With 83 percent of the companies having operations across regions, we examined how the split of operations volume by region compared with the split of sales volume by region, to gauge the use of regional vs. global operations strategies to meet demand. For the previous 12 months, we observed that sales and operations volumes per region were mostly aligned—indicating the use of regional strategies by survey participants.
Figure 21
Figure 21. Comparison between manufacturing operations volume and sales volume by region.

Figure 22 compares current operations volume by region with the volume expected in 2015, based on survey participants' expectations. Operations in the Americas remain constant. A 3 percent growth is shown for Asia and a 2 percent decline for Europe, indicating a shift of operations from Europe to Asia.
Figure 22
Figure 22. Comparison between current vs. future expected operations by volume.

Survey participants expect a drop in their sales volume in Europe by 2015 and an increase in sales volumes in most other world regions, with Asia, the Middle East, and Africa contributing the largest share.
Figure 23
Figure 23. Comparison between current vs. future expected sales volumes by region.

Appendix B: Key Performance Indicator Definitions

The key operations[5] and financial performance indicators used in this study are described below:


Market value
The current market value of a company is the total number of shares outstanding multiplied by the current price of its shares. Recent research has shown that shareholder value can be significantly impacted by severe supply chain disruptions. An example is Mattel, the world's largest toymaker, which had to issue a major product recall due to quality issues. Mattel's stock price suffered a steep fall when the recall was announced in Q3 2007 and did not recover for many months.

Sales revenue
The revenue a company generates from the sale of its products. Supply chain disruptions or structural market shifts can impact a company's ability to deliver the value proposition and lead to loss of sales volume and sales revenue.

Market share
The company's sales over the period divided by the total sales of the industry over the same period. Loss of delivery capability or damaged brand image can lead to market-share loss, especially when the impact of a supply chain disruption is long-lasting.

Earnings before interest and taxes margin
The earnings before interest and tax (EBIT) divided by total revenue. EBIT margin can provide an investor with a clearer view of a company's core profitability.
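Since the financial indicators above are simple ratios, a short sketch with made-up figures shows how each would be computed in practice; the numbers are invented solely for illustration.

    # Made-up figures for one company, used only to illustrate the ratio definitions.
    shares_outstanding = 50_000_000
    share_price = 12.40
    company_sales = 620_000_000.0
    industry_sales = 8_300_000_000.0
    ebit = 74_000_000.0

    market_value = shares_outstanding * share_price   # shares outstanding x current share price
    market_share = company_sales / industry_sales     # company sales / industry sales over the same period
    ebit_margin = ebit / company_sales                 # EBIT / total revenue

    print(f"market value: {market_value:,.0f}")
    print(f"market share: {market_share:.1%}")
    print(f"EBIT margin:  {ebit_margin:.1%}")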

Total supply chain cost
The sum of fixed and variable costs to perform the plan, source, make, and deliver functions for company products. Supply chain disruptions have an impact on total supply chain cost as a number of activities need to be expedited or redesigned across the various functions.

Supply chain asset utilization
Supply chain asset utilization is a measure of actual use of supply chain assets divided by the available use of these assets. Assets include both fixed and moving assets. Fixed assets enable direct product development, transformation, and delivery of a company's products or services, as well as indirect support, and typically have greater than one year of service life. A disruption can directly impact the usability of assets and resources or cause their repositioning. As a result, the utilization of key assets and resources may deviate significantly from the set targets.

Inventory turns
Inventory turnover ratio measures the efficiency of inventory management. It reflects how many times the average inventory was sold and replenished during the period. A disruption or change may impact inventory efficiency either by introducing increased obsolescence or by changing inventory positioning and consumption plans.

Customer service levels
The probability that customer demand is met. The loss of delivery, customer communication, or customer service capability due to a supply chain disruption can impact customer service levels.

Order fulfillment lead time
The average actual lead time consistently achieved across the following stages: order receipt to order entry complete; order entry complete to start of build; start of build to order ready for shipment; and order ready for shipment to customer receipt of order.

Total supply chain lead time
Total supply chain lead time is the time from the moment the customer places an order (the moment you learn of the requirement) to the moment the product is received by the customer. In the absence of finished goods or intermediate (work-in-progress) inventory, it is the time it takes to actually manufacture the order with no inventory other than raw materials. Supply chain disruptions can introduce significant delays across all stages of the supply chain.

Total supply chain lead-time variability
Total supply chain lead-time variability is the time variation around the total supply chain lead-time mean. Exposure to incident disruptions introduces variability and fluctuations in the standard lead-time levels within the supply chain.
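The operational indicators can likewise be computed from basic records. The sketch below uses invented order data and assumes the common cost-of-goods-sold formulation of inventory turns; both the data and that formulation are illustrative assumptions, not figures from the study.

    import statistics

    # Invented data for illustration only.
    cogs = 410_000_000.0                 # cost of goods sold over the period
    average_inventory = 82_000_000.0     # average inventory value over the period
    orders_placed = 1_250
    orders_fully_met = 1_180
    order_lead_times_days = [12, 15, 11, 19, 14, 13, 22, 12]  # order receipt to customer receipt

    inventory_turns = cogs / average_inventory            # one common formulation of turns
    service_level = orders_fully_met / orders_placed      # share of demand met
    avg_lead_time = statistics.mean(order_lead_times_days)
    lead_time_variability = statistics.stdev(order_lead_times_days)  # variation around the mean

    print(f"inventory turns:        {inventory_turns:.1f}")
    print(f"customer service level: {service_level:.1%}")
    print(f"avg lead time (days):   {avg_lead_time:.1f}")
    print(f"lead-time std (days):   {lead_time_variability:.1f}")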

About the Project Team

Professor David Simchi-Levi, MIT

Department of Civil and Environmental Engineering and the Engineering Systems Division, MIT
Professor David Simchi-Levi is considered to be one of the thought leaders in supply chain management. He holds a Ph.D. from Tel Aviv University. His research currently focuses on developing and implementing robust and efficient techniques for logistics and manufacturing systems. He has published widely in professional journals on both practical and theoretical aspects of logistics and supply chain management. He is also the editor-in-chief of Operations Research, the flagship journal of INFORMS, the Institute for Operations Research and the Management Sciences.

Ioannis M. Kyratzoglou

System Design and Management Fellow, Massachusetts Institute of Technology
Mr. Ioannis M. Kyratzoglou is a fellow at the MIT Sloan School of Management and the School of Engineering. He holds a master of science and a mechanical engineer's degree from MIT. He is currently a principal software systems engineer with The MITRE Corporation. His interests are in software engineering and data analytics.


Constantine G. Vassiliadis

Principal Manager, PricewaterhouseCoopers, The Netherlands
Dr. Constantine Vassiliadis holds a Ph.D. from Imperial College, London, in process systems engineering. He has been working as a consultant on supply chain improvement programs with companies worldwide for the past 15 years. In parallel, he is involved in supply chain research and thought leadership initiatives with leading academic institutions.



References


  1. William Schmidt and David Simchi-Levi, "Nissan Motor Company Ltd.: Building Operational Resiliency," MIT Sloan Management, Case No. 13-150.
  2. David Simchi-Levi, Operations Rules: Delivering Value Through Flexible Operations, The MIT Press, 2010.
  3. Information about disruption impacts is self-reported by survey participants.
  4. "Fabric prices rise on weaker rupee," www.business-standard.com, 5 September 2013.
  5. David Simchi-Levi, Philip Kaminsky, and Edith Simchi-Levi, Designing and Managing the Supply Chain: Concepts, Strategies, and Case Studies, 3rd ed., McGraw-Hill Irwin, 2008.