Even the most advanced AI chip architectures suffer performance and energy-efficiency losses from the memory bottleneck between computing cores and data. Most state-of-the-art CPUs, GPUs, TPUs and other neural-network hardware accelerators are limited by the latency, bandwidth and energy cost of reaching data through multiple layers of power-hungry, expensive on-chip caches and external DRAM. Near-memory computing, built on emerging non-volatile memory technologies, opens a new range of performance and energy efficiency for machine intelligence. In this presentation, we introduce innovative and affordable near-memory processing architectures for computer vision and voice recognition, and present architectural recommendations for edge computing and cloud servers. We also discuss how non-volatile memory technologies, such as Crossbar Inc.’s ReRAM, can be integrated directly on-chip with dedicated processing cores, enabling new memory-centric computing architectures. The superior characteristics of ReRAM over legacy non-volatile memory technologies help address the performance and energy-efficiency demands of machine intelligence at the edge and in the data center.
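To make the memory-bottleneck argument concrete, the following is a back-of-envelope sketch of how much a data access "costs" relative to a compute operation. The per-access energy figures are assumed illustrative values (on the order of numbers commonly cited in the architecture literature for older process nodes), not measurements from any of the systems discussed here:

```python
# Illustrative sketch of the memory bottleneck: energy per access vs.
# energy per compute op. All pJ values below are ASSUMED placeholders,
# not measured figures for any specific chip.
PJ_ONCHIP_SRAM = 5.0     # assumed: pJ per 32-bit on-chip cache access
PJ_OFFCHIP_DRAM = 640.0  # assumed: pJ per 32-bit external DRAM access
PJ_MAC = 1.0             # assumed: pJ per 32-bit multiply-accumulate

def energy_ratio(access_pj: float, compute_pj: float) -> float:
    """Number of compute ops one memory access costs in energy terms."""
    return access_pj / compute_pj

if __name__ == "__main__":
    print(f"One DRAM access ~ {energy_ratio(PJ_OFFCHIP_DRAM, PJ_MAC):.0f} MACs")
    print(f"One SRAM access ~ {energy_ratio(PJ_ONCHIP_SRAM, PJ_MAC):.0f} MACs")
```

Under these assumptions a single off-chip DRAM access costs roughly as much energy as hundreds of multiply-accumulate operations, which is the gap that near-memory and memory-centric architectures aim to close by moving computation next to (or into) the memory array.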