Recent years have seen an explosion in machine learning/AI algorithms with a corresponding need to use custom hardware for best performance and power efficiency. However, there is still a wide gap between algorithm creation and experimentation (using deep learning frameworks such as Tensorflow and Caffe) and custom hardware implementations in FPGAs or ASICs. High-level synthesis (HLS) using standard C++ as the design language can provide an automated path to custom hardware implementations by leveraging existing APIs available in deep learning frameworks (e.g., the Tensorflow Operator C++ API). Using these APIs can enable designers to easily plug their synthesizable C++ hardware models into deep learning frameworks to validate a given implementation. Designing using C++ and HLS not only provides the ability to quickly create AI hardware accelerators with the best power, performance and area (PPA) for a target application, but helps bridge the gap between software algorithms developed in deep learning frameworks and their corresponding hardware implementations.