Intel® Software Development Tools - 2024 Release

Open, accelerated computing that delivers hardware choice, performance, and productivity

Nov. 20, 2023—Intel released its 2024 developer tools with new multiarchitecture capabilities that accelerate AI, HPC, and rendering. The tools give developers greater productivity, code portability, and performance, expanding support for Intel GPUs and for the upcoming 5th gen Intel® Xeon® Scalable and Intel® Core™ Ultra processors. Powered by oneAPI (now driven by the Unified Acceleration Foundation), the tools are based on open standards with broad coverage for C++, SYCL, OpenMP, Fortran, MPI, and Python. With Intel® Software Development Tools, developers get an open development platform, increased performance and productivity for HPC, AI, and rendering applications, and freedom of hardware choice across CPUs and accelerators.

New Tools Features & Enhancements 

The 2024 tools release provides new features and enhancements for more than 30 foundational programming, AI, HPC, and rendering software tools. Developers can download the tools from Intel or via popular repositories and containers, and they are coming soon to the Intel® Developer Cloud. The developer cloud provides an easy path to evaluate the latest Intel CPUs, GPUs, and Intel® Gaudi® 2 AI accelerators to build, test, optimize, and deploy applications and workloads. Since the developer cloud launched at Intel Innovation, new JupyterLab systems and GenAI demos using Stable Diffusion* and LLMs have become available. Below is a full list of new tool features and technical previews: 

Compilers & Performance Libraries 

  • All tools support 5th gen Intel® Xeon® Scalable and Intel® Core™ Ultra processors. Directory layout is improved across all products to streamline installation and setup. 

  • Intel® DPC++/C++ Compiler improves productivity and code offload with enhancements that deliver a near-complete SYCL 2020 implementation. It adds an easier way to adapt C++ code that uses virtual functions to run with SYCL device offload, plus improved error messaging and error handling for SYCL and OpenMP code. 

  • Intel® DPC++/C++ Compiler & Intel® Fortran Compiler add popular LLVM sanitizers to easily catch address errors, memory leaks, uninitialized memory, thread data races, deadlocks, and undefined behavior in C++, SYCL, and OpenMP code on CPUs; they also enhance OpenMP 5.0/5.1 standards compliance. 

  • Intel® Fortran Compiler provides initial Fortran 2023 standards support. 

  • Intel® oneAPI Math Kernel Library integrates vector math optimizations into RNGs for HPC simulations, statistical sampling, and more on x86 CPUs and Intel GPUs. It supports vector math for the FP16 datatype on Intel GPUs and delivers high-performance benchmarks (HPCG, HPL, and HPL-AI) optimized for the Intel® Xeon® CPU Max Series and Intel® Data Center GPU Max Series. SYCL library binary partitioning yields a smaller shared-object footprint for applications that use subdomains. 

  • Intel® oneAPI Data Analytics Library optimizes big data analysis with integration into Microsoft's open source ML.NET* framework to build and ship machine learning models. 

  • Intel® oneAPI Deep Neural Network Library streamlines storage efficiency and optimizes performance on Intel® Xeon® processors. It also enhances compatibility with graph compiler capabilities, advances code generation through the Xbyak compiler backend, and accelerates sparse_tensor_dense_matmul() performance on Intel Xeon processors with TensorFlow 2.5, ultimately boosting development productivity and application speed. 

  • Intel® oneAPI Threading Building Blocks (oneTBB) can be compiled to WebAssembly (Wasm) using Emscripten, facilitating the library’s use by applications running in a web browser. 

  • Intel® MPI Library simplifies large MPI message passing and, via MPI Sessions, provides more granular process grouping than the single MPI_Init/MPI_COMM_WORLD model. It improves MPI application performance on systems with nodes that include the Intel® Data Center GPU Max Series through an efficient message passing and collective operations infrastructure. It enables Fortran codes to use larger data sets through seamless support of 8-byte integers with native ILP64 support, and lets developers target systems with software management stacks based on the PMIx standard. 

  • Intel® oneAPI Collective Communications Library boosts performance for distributed AI workloads through better utilization of hardware resources. 

  • Intel® Integrated Performance Primitives helps users transmit data securely and faster, with Intel® Advanced Vector Extensions 2 (Intel® AVX2) optimizations for the AES-GCM algorithm and Intel® AVX-512 optimizations for the RSA algorithm. The image domain gains Intel AVX-512 optimizations for RGB-to-XYZ color conversion, and the signal processing domain gains Intel AVX-512 optimizations for the L2 norm statistical function.  

  • Intel® Distribution of Python – see AI tools section.  

Advanced preview features for technical evaluation: 

  • Intel DPC++/C++ Compiler enables running standard parallel algorithms easily on Intel CPUs and GPUs with Intel® oneAPI DPC++ Library (oneDPL).
  • oneDPL optimizes compute node resource utilization, choosing between round-robin, load-based, and auto-tune policies to schedule work on available compute devices.
  • oneTBB thread composability manager provides greater flexibility and workload performance when nesting oneTBB and OpenMP threads.
  • Intel® MPI Library provides more efficient message passing with MPI RMA (one-sided communications) via CPU and GPU-initiated communications.
  • SYCL graph reduces GPU offload overhead.
  • The SYCL image API extension lets image and media processing engineers accelerate bindless images on NVIDIA GPUs using the Intel LLVM open source project DPC++ compiler. 

Analysis & Debugger Tools 

  • Intel® Advisor can now profile Python code to understand application performance headroom against hardware limitations with an automated Roofline analysis. It supports FP16 and BF16 extensions and Intel® AMX profiling on 4th gen Intel Xeon Scalable processors, and characterizes application performance, such as bandwidth sensitivity, instruction mix, and cache-line use, for Intel GPUs and multi-tile architectures, with VNNI and ISA support. 

  • Intel® VTune™ Profiler enables developers to understand cross-GPU traffic and bandwidth through Intel® Xe Link for each node, and to see whether, and why, implicit USM data movement is causing performance inefficiencies. 

  • Intel® VTune™ Profiler advanced preview feature for technical evaluation: profile code offloaded to neural processing units (NPUs) to understand how much data is transferred between the NPU and DDR memory, and identify the most time-consuming tasks.  

  • Intel Distribution for GDB* improves debugging by boosting the debugger performance; refines the UI across the command line, Visual Studio*, and Visual Studio Code*; and provides advanced scheduler locking for fine-tuned lock control to efficiently debug applications for Intel CPUs and GPUs. 

AI Tools, Frameworks & Accelerated Python

  • Intel Distribution for Python supports Intel GPUs and helps developers get more work done with faster performance, using standard Python to deploy numeric workloads and the optimized data-parallel extensions (Python, NumPy*, Numba*) on CPU and GPU systems. 
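The data-parallel extensions follow a drop-in pattern: dpnp, the Data Parallel Extension for NumPy, mirrors the NumPy API so existing array code can be retargeted to Intel GPUs by changing one import. A minimal sketch, written against stock NumPy so it runs anywhere; the dpnp swap is noted in the comment, and dpnp's API coverage varies by function:

```python
import numpy as np  # with dpnp installed, "import dpnp as np" targets Intel GPUs

# Plain NumPy array math; under dpnp the same lines execute on a SYCL device.
x = np.linspace(0.0, 1.0, 1_000_000)
y = np.sin(x) ** 2 + np.cos(x) ** 2  # identity: every element is ~1.0

print(round(float(y.mean()), 6))  # → 1.0
```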

  • Intel® Extensions for TensorFlow* & Intel® Extensions for PyTorch* deliver significant performance gains with native support for Intel GPUs and CPUs. Advanced optimizations and features provide additional capabilities to process large and complex AI workloads at scale. 

  • Intel® Optimization for XGBoost* provides new optimizations for GPUs, giving a choice of running on all of Intel’s CPUs and GPUs.  

  • Intel® Extension for Scikit-learn* now handles sparse data correctly for better analytics and machine learning performance, using new K-means and low-order algorithms. Python developers have more options for performance gains and opportunities to speed up their use cases with these two new algorithms in scikit-learn. 
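The extension's documented usage is a two-line patch that reroutes supported scikit-learn estimators, K-means among them, to oneDAL-backed implementations. A minimal sketch with the patch lines commented out so it runs with stock scikit-learn alone (assumes scikit-learn and NumPy are installed):

```python
# With scikit-learn-intelex installed, uncommenting these two lines reroutes
# supported estimators, including the KMeans below, to oneDAL-optimized versions:
# from sklearnex import patch_sklearn
# patch_sklearn()

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated blobs centered near (0, 0) and (10, 10).
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(10, 1, (100, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(sorted(km.cluster_centers_[:, 0]))  # one center near 0, one near 10
```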

  • Intel® Distribution of Modin accelerates data tasks for AI performance, efficiency, and innovation, and supports Intel GPUs. Data management enhancements focus on interactive job performance, benchmark competitiveness, and efficient data ingestion.  
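Modin follows the same drop-in idea for pandas: swapping a single import parallelizes existing DataFrame code. A minimal sketch using stock pandas (the Modin swap is in the comment; assumes pandas is installed):

```python
import pandas as pd  # with Modin installed, use "import modin.pandas as pd" instead

# The same groupby/aggregate code runs unchanged under Modin, which
# distributes the work across cores (and, per this release, Intel GPUs).
df = pd.DataFrame({"group": ["a", "a", "b", "b"], "value": [1, 2, 3, 4]})
totals = df.groupby("group")["value"].sum()
print(totals.to_dict())  # → {'a': 3, 'b': 7}
```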

  • Intel® Neural Compressor supports Intel GPUs and simplifies optimization across Intel AI execution providers with integrated Neural Network Compression Framework for a seamless user experience and resource savings. Now supports TensorFlow 2.14, PyTorch 2.2, and ONNX-RT 1.15. 

  • For oneDNN, oneCCL and oneDAL – see performance libraries section. 

  • Overall, get AI frameworks and tools faster with a flexible, streamlined process through an AI tools selector that offers individual downloads and customizable pre-set bundles.

Code Migration & Other Vendor Architecture Support 

Migrations of CUDA code to portable SYCL code that performs equally well on Intel and other vendors’ GPUs are increasing. 

New features include: 

  • Intel® DPC++ Compatibility Tool (based on the open source SYCLomatic project) aids migration of CUDA* code to equally performant SYCL* code, adding support for many more CUDA library APIs and for 20 popular applications in AI, deep learning, cryptography, scientific simulation, imaging, and other areas. 

Rendering & Ray Tracing 

More details can be found in the release notes.

Download the latest tools or use them in the Intel® Developer Cloud.