I am a senior research scientist studying computer architecture at NVIDIA in Austin, TX. I conduct research in the design of efficient, dependable, and secure computer systems.

A photo at HPCA 2011

Current Research Interests

Strong Memory System Reliability

The size and sensitivity of computer memory make its protection the first order of business for a reliability-conscious designer. Despite the long and successful history of using error-coding techniques to reduce memory error rates, large supercomputers and other high-performance systems still need strong and flexible memory error protection. Accordingly, much of my recent research focuses on techniques that provide very high levels of main memory reliability without exceeding the current industry-standard storage footprint for error-correcting codes.

Low-Cost Security

As technology advances, ensuring the security of computer systems and networks is becoming essential for preventing cyber attacks, protecting confidential data, and maintaining the trust of individuals and businesses. A large part of my current research focuses on ensuring security without costly overheads or prohibitive design complexity. A key goal of my recent publications is memory safety, which involves preventing memory-related vulnerabilities such as buffer overflows and dangling pointer dereferences that can lead to serious security breaches.

Efficient and Reliable Application-Specific Acceleration

Increasing levels of integration allow specialized hardware units to be placed on chip cost-effectively. This, combined with the ever-increasing need for energy-efficient execution, will make the hardware acceleration of important applications and workloads more commonplace. To this end, some of my research has addressed the efficient acceleration of workloads that exhibit fine-grained gather/scatter memory access patterns, DRAM link compression, and the reliability characterization of DNN accelerators.

Other Interests

System-Level Reliability

Reliability and resilience are major obstacles on the road to exascale computing and beyond. The number of components required for exascale systems and the decreasing inherent reliability of components in future fabrication technologies conspire to make reliability a first-order design concern. A strong system-level approach to reliability is needed to handle errors efficiently at all scales. In addition, it is important to enable and exploit cross-layer reliability through system-level mechanisms: many different failures can occur in a large computer system, and superior efficiency can only be achieved by dealing with every error in the appropriate manner and at the appropriate system layer.

Arithmetic Error Detection

Rising levels of integration and decreasing component reliability make error protection increasingly important. At the same time, the need for energy efficiency necessitates the careful evaluation of resilience techniques. Arithmetic error protection is typically more expensive than the protection of memory or data movement because it requires large amounts of redundant logic. Protection of computer arithmetic has accordingly been reserved for critical or high-availability applications. Given current reliability trends, some of my research focuses on providing strong, flexible, and low-cost error protection for arithmetic operations.