How Nvidia breaks Chrome Incognito

When I launched Diablo III, I didn’t expect the pornography I had been looking at hours previously to be splashed on the screen. But that’s exactly what replaced the black loading screen. Like a scene from hollywood, the game temporarily froze as it launched, preventing any attempt to clear the screen. The game unfroze just before clearing the screen, and I was able to grab a screenshot (censored with bright red):


Even though this happened hours later, the contents of the incognito window were perfectly preserved.

So how did this happen? A bug in Nvidia’s GPU drivers. GPU memory is not erased before giving it to an application. This allows the contents of one application to leak into another. When the Chrome incognito window was closed, it’s framebuffer was added to the pool of free GPU memory, but it was not erased. When Diablo requested a framebuffer of it’s own, Nvidia offered up the one previously used by Chrome. Since it wasn’t erased, it still contained the previous contents. Since Diablo doesn’t clear the buffer itself (as it should), the old incognito window was put on the screen again.

In the interest of reproducing the bug, I wrote a program to scan GPU memory for non-zero pixels. It was able to reproduce a reddit page I had closed on another user account a few minutes ago, pixel perfect:

Of course, it doesn’t always work perfectly, sometimes the images are rearranged. I think it has something to do with the page size of memory on a GPU:

This is a serious problem. It breaks the operating system’s user boundaries by allowing non-root users to spy on each other. Additionally, it doesn’t need to be specifically exploited to harm users – it can happen purely by accident. Anyone using a shared computer could be exposing anything displayed on their screen to other users of the computer.

It’s a fairly easy bug to fix. A patch to the GPU drivers could ensure that buffers are always erased before giving them to the application. It’s what an operating system does with the CPU RAM, and it makes sense to use the same rules with a GPU. Additionally, Google Chrome could erase their GPU resources before quitting

I submitted this bug to both Nvidia and Google two years ago. Nvidia acknowledged the problem, but as of January 2016 it has not been fixed. Google marked the bug as won’t fix because google chrome incognito mode is apparently not designed to protect you against other users on the same computer (despite nearly everyone using it for that exact purpose).




Complex Queries in a multidimensional database

Many internet based companies, such as Google or Amazon, rely on powerful databases to provide their services. These databases must be able to handle trillions of data points and millions of concurrent users. Many of these databases meet these requirements by storing data points as a key value pair. The key is a unique identifier that represents that data point. The value is a set of attributes that describe the object named by the key. For example, in a database created to store a list of items in a grocery store, the key might be the name of a particular food product, while the value would be a set consisting of the price, weight and quantity available.

Key Value
Item Name Price ($) Weight (kg) Quantity
Eggs 3.99 0.6 10
Juice 1.99 1.3 10
Milk 4.99 4 0

The primary operations performed on such a database are GET and SET. The GET operation allows a user to determine the value associated with the key. In the example database, “GET Eggs” would return “Eggs: Price=3.99, Weight=0.6kg, Quantity = 10”. The SET operation allows the user to change the value associated with the key. The main limitation of this system is that both operations require the user to have the key. If a user only knows the value, it is impossible for them to use the database. To solve this problem, another primitive is needed. The SEARCH operation allows users to enter a set of restrictions on the value, and have the database return a list of all keys that match. In the grocery database, a user may want to return a list of all items that are currently in stock.  To do this, a SEARCH operation would be performed with the predicate “Quantity > 0”. Such a search would return the keys “Eggs” and “Juice”. If more than one predicate is specified, only the items matching all predicates are returned.

The implementation of a key-value database must very efficient in order to meet the stringent performance requirements demanded of them. As these databases are expected to handle trillions of keys, even an O(n) algorithm would be too slow to be of any use. Ideally, the performance of a database operation would only depend on the number of keys of interest. As the GET and SET commands operate  on only a single key, they should have O(1) complexity. Similarly, if the SEARCH command returns k keys, it should have O(k) complexity.

A key/value store server might employ a hash table to store the data. As hash tables have an O(1) lookup time, this would satisfy the performance requirement for the GET and SET operations. Unfortunately, a traditional hash table has no way to search through key-value pairs based on the attributes of the value. Executing a SEARCH operation on a traditional hash table would require each item to be inspected individually (O(n)).  To allow for efficient SEARCH operations, the hash table must be constructed in a way that allows it to be indexed by both key and attributes.

A traditional hash table maps each key to a particular position of a one dimensional array. To allow the table to be indexed by additional attributes, more dimensions must be added. In this multidimensional hash table, each attribute corresponds to a separate dimension in the hash table. For example, a two dimensional database might be used to store a list of people’s weight and height. The key would be the person’s name, while weight and height would be the attributes. In such a database, each person is mapped into the 2D space at a point (weight, height). This allows for efficient SEARCH operations, by integrating over the space using the search predicates as bounding conditions. The points inside the integrated area are the points that match the search predicates. This method only looks at points within the integrated area, allowing it to have the desired O(k) complexity (where k is the number of keys returned by a search).

An example of integration in 2D. The shaded regions are search predicates. Only the points inside both the red and green areas match both search predicates.

Only recently have multidimensional hash tables been considered for use in database software. Currently, the only database using this technique is the HyperDex database. The use of multidimensional hash tables allows HyperDex to perform SEARCH operations roughly two orders of magnitude faster than any other database software. While it is fast, HyperDex’s SEARCH operation is very limited. It only supports a single, constant valued  predicate in each dimension (ie, y < 4 or x == 6). These integrals are very easy to solve, as they can always be expressed as a series of definite integrals with constant bounds:

CodeCogsEqn (1)

To support more complicated predicates, more complicated integrals must be employed. For example, to find items in a 2D database that have one attribute at least twice that of another, the predicate “y > 2x” is entered. This predicate can be represented by the integral:

CodeCogsEqn (1)

Xmax and Ymax represent the bounds of the database. Since there are no restrictions on the x dimension, the integral goes from 0 to Xmax. Using this system, some polynomial restrictions can be supported as well. A database containing the heights and weights of a list of people might be searched to find a list of healthy people. One definition of having a healthy weight is having a BMI on the range [18.5, 25]. The equation for BMI can be refactored to be in terms of weight:

18.5 \leq BMI \leq 25,  \> \>BMI = \frac{w}{h^{2}} \Rightarrow 18.5h^{2} \leq  w \leq  25h^{2}

Which can be evaluated as the integral:

CodeCogsEqn (6)

With this method, any equation that can be integrated can be computed on the database, in O(k) time complexity. However, this method is still limited to a single max and min in each dimension. In order to add support for multiple restrictions, the integral must be split into multiple pieces. This is much more complicated, and requires the computation of every possible intersection point between restrictions. For example, the predicates “y<3” and “y<(x+1)” could be expressed as two integrals:

CodeCogsEqn (7)

This method can be used to perform a search operation with any number of predicates. As each separate integral still only includes valid points, the complexity of the operation remains O(k).

Revenge on tHE CAPS LOCK KEY

A standard computer keyboard has 101 keys. Of these 101 keys, no key is more hated than the CAPS LOCK key. It’s relatively large size and position on the keyboard cause it to be frequently pressed accidentally. As a good touch typist thinks ahead of their hands, this mistake may not be corrected until a few words have been typed. If the user is typing in a secure text field, this mistake may not be corrected at all. This is especially undesirable when the user is creating an account, as the erroneous password will be used as the credentials for the website. When the user tries to login, they will be denied access despite having entered the ‘correct’ password. As they are new to the site, they will be much more likely to blame the website and leave than attempt to type their password again. Web developers find this infuriating as they are losing customers to a bug they have no control over.

Why was the CAPS LOCK key added to the keyboard if it so useless? The standard OWERTY keyboard was a direct copy of the standard typewriter. When capital letters were implemented on typewriters, the shift key was added to give each key two purposes. It accomplished this by tilting the entire paper apparatus so that the pads attached to each key contacted the paper farther back. Pressing the shift key required significant effort, making it difficult to keep the button pressed while typing. A shift lock key was introduced that held the paper reel in the shifted position, causing all of the keys to print their shifted value. When the first computer keyboards were developed, the shift lock key was modified to only capitalize letters, leaving the rest of the keys unmodified. As the force required to actuate the shift key on a modern keyboard is minimal, the CAPS LOCK key is a solution to a problem that no longer exists.

The main reason the CAPS LOCK still exists is that the market heavily resists changes to standards. Still, a few people are trying to change this. Google replaced the key on their Chromebook with a search key, and the XO Laptop project replaced it with another control key. Some operating systems allow the CAPS LOCK key to be disabled through software. The key can also be disabled through modifications done to the keyboard itself.

I use a Ducky 1087 mechanical keyboard with N-key rollover. N-Key rollover indicates that the keyboard can detect every key on the keyboard individually. Most keyboards are not NKRO as they arrange they keys in grids to reduce cost. This can cause the keyboard to ignore some of the key presses if multiple keys are pressed at once. As the 1087 is NKRO, it can be inferred that each switch is independent of the others, allowing them to be modified without affecting the rest of the keyboard. The keyboard is easily disassembled with a screwdriver to reveal the circuit board underneath.

To disable the CAPS LOCK key one of the switch terminals was desoldered. As the switch is mounted to both the PCB and a metal frame it remains mechanically stable yet is unable to make an electrical connection. As the keycap is held by the switch, this allows the keyboard to retain it’s original look. After completing the mod the keyboard remains fully functional.

Direct measurement of Vcore

When you are overclocking a CPU, the voltage is the most often tweaked variable. Higher voltage generally allows for higher clock speeds, but too much and you risk ruining the CPU. Due to a variety of factors, the actual applied CPU voltage (Vcore) can vary significantly from the value set in the EFI/BIOS. Additionally, varying loading conditions on the CPU can have an effect on the applied voltage. To remove this unknown from the overclocking process, Vcore can be measured directly from the motherboard’s  VRMs. The output stage of the VRMs represents the Vcore applied to the CPU. The filter capacitors are the largest component of the output stage, making them easier to solder to. Looking at the P8P67 Pro Motherboard, we can see the filter capacitors are the group of tightly packed capacitors placed close to the CPU:

The negative terminal of the capacitor is marked by the dark stripe. Flipping the board over the legs of the filter capacitors are easily identified:

A wire can be easily soldered to any one of the positive capacitor terminals:

 This signal voltage can be read by any number of measurement tools. Initially it was a measured using a digital multimeter, but a more permanent solution was desired. A cheap microcontroller filled this requirement, and was installed inside the computer, acting as a USB ADC. The multimeter was used to calibrate the ADC, as it was discovered that the ADC had a +110mV offset. This offset was attributed to the use of the imprecise internal 5V rail as a reference voltage. In the future a more precise voltage reference IC combined with the ARef pin on the microcontroller should allow for more accurate measurement.

To finish the project, a small applet was written to conveniently display the voltage reading to the user in the menu bar.

Correcting pulse width for switch time

In many lighting applications, the brightness of the device needs to be controlled. When designing an LED driver, this can be accomplished by either varying the current through the LED, or by varying the duty cycle. It is preferable to vary the duty cycle via PWM rather than vary the current, as the color of the LED may change slightly with current.  This ensures that the LED retains it’s color characteristics, as the LED is either fully on or fully off.  Ideally when using PWM the average brightness of the LED would be equal to the duty cycle * maximum brightness. However, this calculation assumes that the time to switch between on and off is zero. As the pulse width is shortened, the difference between the calculated and actual pulse width increases. If the lighting application requires a smooth dim from 0% to 100% brightness, this becomes an issue.

The width of the pulse will be distorted by both the rise and fall time. The rise time will retard the leading edge of the pulse, shortening it. The fall time will extend the trailing edge of the pulse, lengthening it. Thus, the total pulse distortion will be the difference of the rise and fall times. In order to accurately reproduce the correct lightness, the difference between the rise and fall times must be calculated and corrected.

Pulse Width Distortion Simulation

The transient response of an LED driver will vary heavily with the design. The two primary methods of implementing PWM are to either switch the entire converter off and on, or to add another MOSFET dedicated to interrupting current flow. The former is cheaper and simpler whereas the latter is more precise. In an application that demands a 0-100% smooth fade a dedicated MOSFET will almost always be used. This dedicated MOSFET is usually driven directly from the microcontroller’s PWM output. This causes an issue as the output pin of a microcontroller can often source more current than it can sink. This will cause uneven rise and fall times for the MOSFET, which will distort the length of the pulse. This can be calculated by using a few parameters of the parts to be utilized. The MOSFET switch time can be roughly modelled by dividing the gate charge by the  charge current. Thus:


As the MOSFET will not turn on until the gate is charged, and will not turn off until the gate is discharged, the actual pulse width will be:

Using typical values for an IC of 10mA source and 25mA sink, we get a -1.2uS distortion to the pulse width. That is, every pulse width will be 1.2uS shorter than it should be. Even at a PWM frequency of 200 Hz, this causes a significant error as you approach 0.1% duty cycle. Due to the non-linear response to light of the human eye, the driver must be able to accurately reproduce extremely low light levels. To do this, the pulse distortion must be corrected to allow for smooth dimming throughout this range. To correct the error, all pulses are simply extended by the same amount lost by the switching. This software fix can easily be implemented on the microcontroller. At 200 Hz and 12 bit PWM each part represents 1.22uS . As this is close to the determined value of 1.2uS, we can simply add 1 to every 12-bit PWM value to correct.