

Parallel and Heterogeneous Computing with Microsoft PPL and AMP in C++

Course fee: NOK 17,900 | Duration: 3 days

Description:

Computationally intensive work often benefits from parallel processing, which takes advantage of multiple cores. In addition, many developers are starting to look at using different types of hardware to optimize their software further.

Many C++ programmers now look to the Microsoft PPL (Parallel Patterns Library) and Microsoft AMP (Accelerated Massive Parallelism) to help them write parallel algorithms that run on both the CPU and the GPU. PPL is designed to help developers quickly get their code running on multiple cores, while the AMP libraries are intended to remove much of the complexity of heterogeneous computing (particularly around mixing hardware from different manufacturers). This course covers both PPL and AMP, with a stronger focus on the more advanced AMP functionality.

You will come out of “Parallel and Heterogeneous Computing” knowing the following:
  • How to avoid common optimization pitfalls.
  • When to benefit from parallelism.
  • How the underlying hardware contributes to parallelism.
  • How to take advantage of multiple cores with C++ and Microsoft PPL.
  • How to take advantage of the GPU across multiple manufacturers with C++ and Microsoft AMP.
  • How to avoid common parallel and heterogeneous computing pitfalls.
  • How to debug and dig into parallel and heterogeneous programs.

Target audience:

  • Experienced C++ Developers.

Prerequisites:

  • Knowledge of multi-threaded programming using threads.

Contents:

Day 1



Introduction to Parallelism

C++11 Refresher
  • Move semantics
  • Deleted/Defaulted Functions
  • Lambdas
Measuring Performance
  • Types of performance
  • Taking good benchmarks
  • Accounting for error
Hardware

CPU Internals

Instruction level parallelism
  • Understanding Limitations
  • Caching (NUMA)
  • Cost of shared writes
  • Common Pitfalls
Floating Point Numbers
  • 0.1 + 0.2 != 0.3
  • Limitations of floating point
  • Compiler optimizations with floating point
  • Parallel processing and accuracy
Identifying algorithms that are parallelizable
  • CPU-bound vs. I/O-bound
  • Rewriting for better parallelization
  • Amdahl’s Law
Parallel patterns and algorithms
  • Map
  • Reduce
  • Scan
  • Pack
  • Command queues
  • Combined examples
Introduction to PPL
  • Why tasks instead of threads?
  • Runtime support
  • Primitives
  • parallel_for
  • non-determinism
Synchronization in PPL
  • critical_section
  • reader_writer_lock
  • scoped_lock/scoped_lock_read
  • event
  • costs of synchronization
  • alternatives
Visual Studio Debug Tools for Concurrency
  • Concurrency Visualizer
  • Parallel Stacks
  • Parallel Tasks
  • Parallel Watch
  • Identifying anti-patterns
  • Debugging PPL Algorithms
  • Event Tracing
Exception Handling
  • Catching task::get/task::wait
  • Exceptions inside parallel_for vs for loops
Similar Technologies to PPL
  • pthread
  • Intel TBB
  • Boost

Day 2



Introduction to Vector programming
  • What is SIMD
  • Masking and execution coherence
  • Memory Layout
  • Structure of Arrays
  • Array of Structures
  • Auto vectorization
Introduction to the GPU Hardware
  • Hardware
  • Memory Types and Caching
  • Cores, Threads, Tiles and Warps
  • PCIe Bus
Methods of writing code for the GPU
  • OpenCL
  • CUDA
  • DirectCompute
  • Microsoft C++ AMP
Introduction to AMP
  • AMP Syntax and Data Types
  • array, array_view
  • index
  • extent
  • grid
  • restrict
parallel_for_each
  • How to use
  • Optimizing Memory Move/Copy
Synchronizing memory with accelerators
  • Implicit synchronization
  • synchronize*()
  • data()
  • Lost Exceptions
Concurrency::fast_math and precise_math
  • What’s inside
  • Comparison to “standard” math.
  • Precision
  • Accelerator requirements
  • Example
Debugging with WARP
  • Visual Studio Tools
  • GPU Threads
  • Parallel Stacks
  • Parallel Watch
Floating Point Numbers
  • How they are handled
  • Why they are different from CPU
  • Performance of float/double operations

Day 3



Tiling
  • Syntax
  • Determining tile size
  • Memory Coalescence
  • Memory Collisions
  • Tile Synchronization
AMP Atomic Operations
  • atomic_exchange()
  • atomic_fetch*()
Parallel patterns with AMP
  • Map
  • Reduce
  • Scan
  • Pack
AMP Accelerators
  • Accelerator properties
  • Shared memory
  • Using multiple accelerators
Concurrency::graphics
  • Exploiting the texture cache
AMP Error Handling
  • Exceptions
  • Detecting/Recovering from TDR











    Contact us

    Course manager

    Henning Solberg

    93 09 01 29

    henning@glasspaper.no


    Glasspaper has been named Microsoft Course Partner of the Year for 2015, 2014, 2013, 2012, 2011, 2010, and 2008!