Upcoming Sessions
- October 13: ILT - DSCI-272: Predicting with MLOps on Cloudera AI - 4677380
  Starting: 2025/10/13 @ 09:00 AM Berlin. Ending: 2025/10/16 @ 05:00 PM Berlin
- October 27: ILT - DANA-262: Analyzing with Cloudera Data Warehouse - 4676272
  Starting: 2025/10/27 @ 09:00 AM Berlin. Ending: 2025/10/30 @ 05:00 PM Berlin

Overview
This three-day hands-on training course delivers the key concepts and expertise developers need to optimize the performance of their Apache Spark applications. During the course, participants will learn how to identify common sources of poor performance in Spark applications, techniques for avoiding or solving them, and best practices for Spark application monitoring. Optimizing Apache Spark Applications presents the architecture and concepts behind Apache Spark and the underlying data platform, then builds on this foundational understanding by teaching students how to tune Spark application code. The course format emphasizes instructor-led demonstrations that illustrate both performance issues and the techniques that address them, followed by hands-on exercises that give students an opportunity to practice what they've learned in an interactive notebook environment.

What You'll Learn
Students who successfully complete this course will be able to:
- Understand Apache Spark's architecture, job execution, and how techniques such as lazy execution and pipelining can improve runtime performance
- Evaluate the performance characteristics of core data structures such as RDDs and DataFrames
- Select the file formats that will provide the best performance for your application
- Identify and resolve performance problems caused by data skew
- Use partitioning, bucketing, and join optimizations to improve Spark SQL performance
- Understand the performance overhead of Python-based RDDs, DataFrames, and user-defined functions
- Take advantage of caching for better application performance
- Understand how the Catalyst and Tungsten optimizers work
- Understand how Workload XM can help troubleshoot and proactively monitor Spark application performance
- Learn how the Adaptive Query Execution engine improves performance

What to Expect
This course is designed for software developers, engineers, and data scientists who have experience developing Spark applications and want to learn how to improve the performance of their code. This is not an introduction to Spark. Spark examples and hands-on exercises are presented in Python, and the ability to program in this language is required. Basic familiarity with the Linux command line is assumed. Basic knowledge of SQL is helpful.

DATE: December 8-10, 2025 Virtual Classroom, EMEA 9:00 - 17:00 (CET TIMEZONE)

Overview
The Cloudera platform is intended to meet the most demanding technical audit standards. The significant improvements in Cloudera architecture and components make Cloudera “Secure by Design.” This four-day hands-on course is presented as a project plan for Cloudera administrators to build fully secured Cloudera clusters. The course begins with implementing Perimeter Security by installing host-level security and Kerberos. Next, students protect data by implementing Transport Layer Security using Auto-TLS and data encryption using Key Management System and Key Trustee Server (KMS/KTS). In the third stage, students control user and data access using Apache Ranger and Apache Atlas. The fourth stage focuses on visibility practices, teaching students how to audit systems, users, and data usage. Finally, the course introduces Cloudera practices for Risk Management in a fully secured Cloudera platform. This course is 60% exercise and 40% lecture.

Who should take this course?
This immersion course is designed for Linux Administrators transitioning to Cloudera Administrator roles. Students must have proficiency in Linux (e.g., navigating the file system, using basic commands) and Linux text editors (e.g., vi, nano). Familiarity with Directory Services, Transport Layer Security, Kerberos, and SQL select statements is recommended. Prior experience with Cloudera products is required. Students must have reliable internet access to connect to the classroom environments hosted on Amazon Web Services.

DATE: December 8-11, 2025 Virtual Classroom, EMEA 9:00 - 17:00 (CET TIMEZONE)

One of the most critical functions of a data-driven enterprise is the ability to manage ingest and data flow across complex ecosystems. Does your team have the tools and skill sets to succeed at this? Apache NiFi and this four-day course provide the fundamental concepts and experience necessary to automate the ingress, flow, transformation, and egress of data using NiFi. The course also covers tuning, troubleshooting, and monitoring the dataflow process, as well as how to integrate a dataflow with the Cloudera CDP Hybrid ecosystem and external systems.

What you'll learn
During this course, you will learn how to:
- Define, configure, organize, and manage dataflows
- Transform and trace data as it flows to its destination
- Track changes to dataflows with NiFi Registry
- Use the NiFi Expression Language to control dataflows
- Optimize dataflows for better performance and maintainability
- Connect dataflows with other systems, such as Apache Kafka, Apache Hive, and HDFS
- Utilize the Data Flow Service

What to expect
This course is designed for developers, data engineers, administrators, and others with an interest in learning NiFi’s innovative no-code, graphical approach to data ingest. Although programming experience is not required, basic experience with Linux is presumed, and previous exposure to big data concepts and applications is helpful.

DATE: December 1-4, 2025 Virtual Classroom, EMEA 9:00 - 17:00 (CET TIMEZONE)

This four-day hands-on training course delivers the key concepts and knowledge developers need to use Apache Spark to develop high-performance, parallel applications on the Cloudera Data Platform. Hands-on exercises allow students to practice writing Spark applications that integrate with Cloudera Data Platform core components. Participants will learn how to use Spark SQL to query structured data, how to use Hive features to ingest and denormalize data, and how to work with “big data” stored in a distributed file system. After taking this course, participants will be prepared to face real-world challenges and build applications that enable faster decisions, better decisions, and interactive analysis across a wide variety of use cases, architectures, and industries.

What you'll learn
During this course, you will learn how to:
- Distribute, store, and process data in a cluster
- Write, configure, and deploy Apache Spark applications
- Use the Spark interpreters and Spark applications to explore, process, and analyze distributed data
- Query data using Spark SQL, DataFrames, and Hive tables
- Deploy a Spark application on the Data Engineering Service

What to expect
This course is designed for developers and data engineers. All students are expected to have basic Linux experience and basic proficiency in either the Python or Scala programming language. Basic knowledge of SQL is helpful. Prior knowledge of Spark and Hadoop is not required.

DATE: November 17-20, 2025 Virtual Classroom, EMEA 9:00 - 17:00 (CET TIMEZONE)

DATE: November 17-20, 2025 Virtual Classroom, AMER 9:00 - 17:00 (Central US TIMEZONE)

About This Training
This 1-day course by Cloudera Education introduces Cloudera Observability, a service designed to help you interactively understand your environment, data services, workloads, clusters, and resources. Its wide range of metrics and health tests empowers you to identify and troubleshoot both existing and potential problems. Cloudera Observability also provides prescriptive guidance and recommendations, enabling you to quickly address issues and optimize solutions. After a workload is completed, the Cloudera Manager Management Service's Telemetry Publisher collects diagnostic information about the job or query and the processing cluster, sending it to Cloudera Observability for analysis.

What Skills You Will Gain
Participants will develop the following skills:
- Comprehend the benefits of Cloudera Observability.
- Configure Cloudera Observability for CDP Public Cloud.
- Understand Cloudera Observability deployment architecture for both CDP Public and Private Cloud environments.
- Determine system requirements for workload clusters.
- Install Cloudera Observability.
- Classify workloads for analysis using Workload Views.
- Manage user access to workloads.
- Configure cost center criteria within Cloudera Observability.
- Set up action alerts for jobs and queries using Auto Actions.
- Utilize Cloudera Observability Metastore Analytics.
- Troubleshoot effectively with Cloudera Observability.

What to Expect
This course is ideal for new and existing Cloudera Private or Public Cloud users seeking to leverage the benefits of Cloudera Observability. The course is particularly valuable for platform administrators, data practitioners, budget owners/controllers, and solutions architects. A general knowledge of monitoring concepts is helpful.

Course Details

Introduction
- What is Cloudera Observability
- Cloudera Observability Capabilities
- Cloudera Observability Tech
- Cloudera Observability Online
- Cloudera Observability Air-Gapped
- Strategic Features

Observability Essential Adoption
- Configuration Tasks for CDP Public Cloud
- Deployment Architecture for CDP Private Cloud
- Security Considerations
- System Requirements
- Network Port Requirements
- Configuring Telemetry Publisher
- Redacting Data
- Adding a Proxy Server
- Observability Installation

Managing Workloads and Users (Self-Service Analytics)
- Auto-Generating Workload Views
- Manually Generating Workload Views
- Managing User Access to Workloads
- Observability Access Roles
- Exercise: Creating Workload Views
- Exercise: Assigning Access Roles in Cloudera Observability

Working with Alerts, Costs, & Reports (Financial Governance)
- Analyzing Environment Costs with Cloudera Observability
- Triggering Action Alerts Across Jobs and Queries – Auto Actions
- Working with Cluster Reports
- Exercise: Configuring Cloudera Observability Cost Center Criteria
- Exercise: Creating a Cloudera Observability Cost Center
- Exercise: Displaying Costs Associated with a Cost Center
- Exercise: Creating an Auto Action Event

Understanding, Identifying, & Addressing Problems with Cloudera Observability (Service Health Monitoring)
- Analyzing Tables with Cloudera Observability Metastore Analytics
- Validations in Cloudera Observability
- Analyzing Hive Queries
- Exercise: Analyzing Hive Queries
- Exercise: Analyzing Impala Queries

Troubleshooting (Expedited Issue Resolution)
- Troubleshooting Abnormal Job Durations
- Troubleshooting Job Durations: Task Duration
- Troubleshooting Failed Jobs
- Troubleshooting with the Job Comparison Feature
- Exercise: Analyzing Spark Jobs
- Exercise: Analyzing MapReduce Jobs

DATE: November 10, 2025 Virtual Classroom, EMEA 9:00 - 17:00 (CET TIMEZONE)