Machine learning (ML) is becoming ubiquitous and integrated into applications that are important to our daily lives, societal prosperity, and technological progress. Meeting real-time ML demands poses tremendous challenges for large datacenters that must ingest live data streams such as speech/text translations, real-time image and video classification, and personalized recommendation queries. This is because existing computing platforms are limited by memory bandwidth, and technology scaling can no longer provide substantial system performance improvements. We propose a class of heterogeneous architectures with in-memory analog computing (IMAC) circuits to address computational challenges in datacenters for real-world ML applications. The proposed IMAC circuits can realize both matrix-vector multiplication and nonlinear vector operations in the analog domain using the intrinsic characteristics of emerging resistive memory technologies. The proposed heterogeneous systems can support a variety of ML workloads through a fine-grained partitioning of different portions of ML models onto IMACs and CPUs. Novel multi-objective optimization methods will be developed for full-stack design space exploration of CPU-IMAC systems to tune hyperparameters at the circuit, architecture, and system levels. The project will focus on achieving five main research objectives, including scalability, accuracy, heterogeneity, interoperability, and reliability, to make CPU-IMAC systems a practical and flexible alternative to existing energy-hungry systems. We will provide a full hardware/software stack blueprint for achieving beyond 1,000 tera operations per second per watt (TOPS/W) ML inference in datacenters.
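As a minimal, idealized sketch of the operation described above, an IMAC crossbar can be modeled numerically: weights are stored as device conductances, the input vector is applied as voltages, and Ohm's law plus Kirchhoff's current law yield the matrix-vector product as summed output currents, followed by a nonlinear vector operation realized in the analog domain. The function names, the sigmoid stand-in for the analog nonlinearity, and the ideal (parasitic-free, linear-device) assumptions here are illustrative choices, not the project's actual circuit design.

```python
import numpy as np

rng = np.random.default_rng(0)

def imac_mvm(conductances, voltages):
    """Idealized analog matrix-vector multiply on a crossbar:
    output currents = G @ V (Ohm's law per cell, Kirchhoff's
    current law summing each row's currents)."""
    return conductances @ voltages

def analog_sigmoid(currents, gain=1.0):
    """Stand-in for a nonlinear vector operation performed in the
    analog domain (e.g., an amplifier's transfer characteristic);
    the sigmoid shape is an assumption for illustration."""
    return 1.0 / (1.0 + np.exp(-gain * currents))

# A small fully connected layer mapped onto one hypothetical IMAC tile.
G = rng.uniform(0.0, 1.0, size=(4, 8))  # conductance matrix (weights)
v = rng.uniform(0.0, 1.0, size=8)       # input voltage vector
out = analog_sigmoid(imac_mvm(G, v))    # 4-element activation vector
```

In a heterogeneous CPU-IMAC system, layers mapped onto such tiles would run in the analog domain while the remaining portions of the model execute on the CPU; this sketch only captures the single-tile arithmetic.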
To evaluate the end-to-end scalability and efficiency of the proposed heterogeneous CPU-IMAC system in terms of performance, energy, and accuracy, we will use standard ML benchmark suites providing a wide range of ML models, realistic end-user scenarios, and standardized evaluation metrics.