Cloudera Educational Services

Upcoming Sessions


Overview

This three-day hands-on training course delivers the key concepts and expertise developers need to improve the performance of their Apache Spark applications. During the course, participants will learn how to identify common sources of poor performance in Spark applications, techniques for avoiding or solving them, and best practices for Spark application monitoring.

Apache Spark Application Performance Tuning presents the architecture and concepts behind Apache Spark and the underlying data platform, then builds on this foundational understanding by teaching students how to tune Spark application code. The course format emphasizes instructor-led demonstrations that illustrate both performance issues and the techniques that address them, followed by hands-on exercises that give students an opportunity to practice what they’ve learned in an interactive notebook environment. The course applies to Spark 2.4, but also introduces the Spark 3.0 Adaptive Query Execution framework.

What You'll Learn

Students who successfully complete this course will be able to:

- Understand Apache Spark’s architecture, job execution, and how techniques such as lazy execution and pipelining can improve runtime performance
- Evaluate the performance characteristics of core data structures such as RDDs and DataFrames
- Select the file formats that will provide the best performance for their application
- Identify and resolve performance problems caused by data skew
- Use partitioning, bucketing, and join optimizations to improve Spark SQL performance
- Understand the performance overhead of Python-based RDDs, DataFrames, and user-defined functions
- Take advantage of caching for better application performance
- Understand how the Catalyst and Tungsten optimizers work
- Understand how Workload XM can help troubleshoot and proactively monitor Spark application performance
- Learn about the new features in Spark 3.0, specifically how the Adaptive Query Execution engine improves performance

What to Expect

This course is designed for software developers, engineers, and data scientists who have experience developing Spark applications and want to learn how to improve the performance of their code. This is not an introduction to Spark. Spark examples and hands-on exercises are presented in Python, and the ability to program in this language is required. Basic familiarity with the Linux command line is assumed. Basic knowledge of SQL is helpful.

This four-day hands-on training course delivers the key concepts and knowledge developers need to use Apache Spark to develop high-performance, parallel applications on the Cloudera Data Platform (CDP). Hands-on exercises allow students to practice writing Spark applications that integrate with CDP core components. Participants will learn how to use Spark SQL to query structured data, how to use Hive features to ingest and denormalize data, and how to work with “big data” stored in a distributed file system. After taking this course, participants will be prepared to face real-world challenges and build applications that support faster decisions, better decisions, and interactive analysis across a wide variety of use cases, architectures, and industries.

What You'll Learn

During this course, you will learn how to:

- Distribute, store, and process data in a CDP cluster
- Write, configure, and deploy Apache Spark applications
- Use the Spark interpreters and Spark applications to explore, process, and analyze distributed data
- Query data using Spark SQL, DataFrames, and Hive tables
- Deploy a Spark application on the Data Engineering Service

What to Expect

This course is designed for developers and data engineers. All students are expected to have basic Linux experience and basic proficiency with either the Python or Scala programming language. Basic knowledge of SQL is helpful. Prior knowledge of Spark and Hadoop is not required.

2024-05-07 Virtual Classroom 9:00 - 17:00 (GMT+1)

This four-day course teaches the architecture, deployment, configuration, and operation of CDP Data Services on Embedded Container Service (ECS). CDP Data Services are state-of-the-art low-code computing that fuses the entire data lifecycle into a single set of tools, reducing the cost of developing use cases while accelerating development and deployment.

The course begins with recommended practices for managing Docker images and containers, culminating in the building of a Docker private registry. The Docker private registry is used to deploy the Data Services cluster on ECS. Students will learn to install, configure, and validate Cloudera Data Engineering, Cloudera Data Warehouse, and Cloudera Machine Learning. Exercises focus on learning Kubernetes, installing Private Cloud Embedded Container Service (ECS), and deploying Cloudera Data Services. The course covers networking and hardware requirements and explains how Kubernetes pods scale dynamically to support CDP Data Services.

Who Should Take This Course?

This immersion course is intended for CDP Administrators who are advancing into CDP Data Services running in a private cloud environment. We recommend a minimum of 3 to 5 years of system administration experience in industry. Students must be proficient with the Linux command-line interface and have knowledge of Identity Management, Transport Layer Security, and Kerberos. Experience with SQL select statements is helpful. Prior experience with Cloudera products is expected; experience with CDP, CDH, or HDP is sufficient. Students must have access to the Internet to reach Amazon Web Services.

About This Course

This course is part of the Skillup series. As businesses look to scale out storage, they need a storage layer that is performant, reliable, and scalable. With Apache Ozone on the Cloudera Data Platform (CDP), your customers can implement a scale-out model and build out their next-generation storage architecture without sacrificing security, governance, and lineage.

In part one of this course, you'll learn about the benefits of Apache Ozone, a high-performance object store, and how, paired with CDP, it can drastically improve data storage performance. You'll then learn the key business and operational benefits of adopting Apache Ozone and how to migrate existing big data workloads to perform at scale. In part two, you'll learn about advanced details of Apache Ozone, its roadmap at Cloudera, and the migration of data from a file system to this big data object store.

This course consists of 35 minutes of video content.

Audience and Prerequisites

This OnDemand course is suitable for data engineers, data administrators, and data operators.

DO NOT START THIS CERTIFICATION EXAM HERE! Once you have been enrolled, you will receive an email with additional instructions to schedule your exam.

