Cloudera Educational Services

Upcoming Sessions

Designing Edge to AI Applications is a 4-day learning event that addresses advanced big data architecture topics for building edge to AI applications, covering streaming, operational data processing, analytics, and machine learning. The workshop brings technical contributors together in a group setting to design and architect solutions to a challenging business problem. It addresses big data architecture problems in general, and then applies them to the design of a challenging system. Throughout the highly interactive workshop, participants apply concepts to real-world examples, resulting in detailed discussions. Participants learn techniques for architecting big data systems, not only from Cloudera’s experience but also from the experiences of fellow participants.
More specifically, this workshop addresses advanced big data architecture topics, including data formats; transformation; transactions; real-time, batch, and machine learning processing; scalability; fault tolerance; security; and privacy, with the aim of minimizing the risk of an unsound architecture and technology selection.
What you'll learn
- Cloudera Data Platform
- Big Data Architecture
- Building Scalable Applications
- Building Fault Tolerant Solutions
- Security and Privacy
- Deployment on Public, Private, and Hybrid Cloud
What to expect
Participants should mainly be architects, developer team leads, big data developers, data engineers, senior analysts, DevOps admins, and machine learning developers who are working on big data or streaming applications and have an interest in how to design and develop such applications on CDP. To gain the most from the workshop, participants should have working knowledge of popular big data and streaming technologies such as HDFS, Spark, Kafka, Hive/Impala, data formats, and relational database management systems. Detailed API-level knowledge is not needed, as there will not be any programming activities; the focus will be on architecture design. Participants will be divided into small groups to discuss the problems, develop solutions, and present them.
DATE: September 28 - October 1, 2026 9:00 - 17:00 (GMT+2 TIMEZONE) Virtual Classroom, EMEA

Apache Ozone is the next-generation hybrid storage service offering versatility and out-of-the-box compatibility. Ozone is an object store that overcomes the limitations of HDFS. This course teaches architecture, internal operations, installation, file system usage, best practices, security, maintenance, monitoring, tuning, and testing.
What you'll learn
This course teaches the Ozone internal architecture and how to install, use, maintain, monitor, tune, integrate, and test the Ozone service in a secure environment. Participants will gain the following skills:
- Understanding the Benefits of Using Ozone
- Installing and Configuring Secure Ozone
- Managing Files and Objects in Ozone
- Performance Tuning and Doing Baseline Tests
- Controlling Replication and Understanding Failover and Recovery
- Performing Maintenance Tasks
- Monitoring Ozone Using Recon Service
- Integrating Hive, Impala, Spark, NiFi, and Flink with Ozone
- Migrating Data from HDFS to Ozone
What to expect
This advanced course is for administrators who are currently using CDP Private Cloud Base. The course will also appeal to technicians, such as data engineers and application developers, who are migrating data and applications to Apache Ozone. Prior experience with Cloudera Data Platform, including HDFS, YARN, and Hive, is expected. Students must have access to the Internet to reach the classroom environments, which are located on Amazon Web Services.
DATE: June 29 - July 2, 2026 9:00 - 17:00 (GMT+2 TIMEZONE) Virtual Classroom, EMEA

One of the most critical functions of a data-driven enterprise is the ability to manage ingest and data flow across complex ecosystems. Does your team have the tools and skill sets to succeed at this? This four-day Apache NiFi course provides the fundamental concepts and experience necessary to automate the ingress, flow, transformation, and egress of data using NiFi. The course also covers tuning, troubleshooting, and monitoring the dataflow process, as well as how to integrate a dataflow within the Cloudera CDP Hybrid ecosystem and external systems.
What you'll learn
During this course, you learn how to:
- Define, configure, organize, and manage dataflows
- Transform and trace data as it flows to its destination
- Track changes to dataflows with NiFi Registry
- Use the NiFi Expression Language to control dataflows
- Optimize dataflows for better performance and maintainability
- Connect dataflows with other systems, such as Apache Kafka, Apache Hive, and HDFS
- Utilize the Data Flow Service
What to expect
This course is designed for developers, data engineers, administrators, and others with an interest in learning NiFi’s innovative no-code, graphical approach to data ingest. Although programming experience is not required, basic experience with Linux is presumed, and previous exposure to big data concepts and applications is helpful.
DATE: July 27-30, 2026 9:00 - 17:00 (GMT+2 TIMEZONE) Virtual Classroom, EMEA

Overview
This three-day hands-on training course delivers the key concepts and expertise developers need to optimize the performance of their Apache Spark applications. During the course, participants will learn how to identify common sources of poor performance in Spark applications, techniques for avoiding or solving them, and best practices for Spark application monitoring. Optimizing Apache Spark Applications presents the architecture and concepts behind Apache Spark and the underlying data platform, then builds on this foundational understanding by teaching students how to tune Spark application code. The course format emphasizes instructor-led demonstrations that illustrate both performance issues and the techniques that address them, followed by hands-on exercises that give students an opportunity to practice what they've learned in an interactive notebook environment.
What You'll Learn
Students who successfully complete this course will be able to:
- Understand Apache Spark's architecture, job execution, and how techniques such as lazy execution and pipelining can improve runtime performance
- Evaluate the performance characteristics of core data structures such as RDDs and DataFrames
- Select the file formats that will provide the best performance for their application
- Identify and resolve performance problems caused by data skew
- Use partitioning, bucketing, and join optimizations to improve Spark SQL performance
- Understand the performance overhead of Python-based RDDs, DataFrames, and user-defined functions
- Take advantage of caching for better application performance
- Understand how the Catalyst and Tungsten optimizers work
- Understand how Workload XM can help troubleshoot and proactively monitor Spark application performance
- Learn how the Adaptive Query Execution engine improves performance
What to Expect
This course is designed for software developers, engineers, and data scientists who have experience developing Spark applications and want to learn how to improve the performance of their code. This is not an introduction to Spark. Spark examples and hands-on exercises are presented in Python, and the ability to program in this language is required. Basic familiarity with the Linux command line is assumed. Basic knowledge of SQL is helpful.
DATE: August 3-5, 2026 9:00 - 17:00 (GMT+2 TIMEZONE) Virtual Classroom, EMEA

About This Training
Generative AI (GenAI) and Large Language Models (LLMs) are extremely powerful new tools that are changing every industry. To take full advantage of GenAI and LLMs, these new capabilities need to be combined with your existing enterprise data. This two-day course teaches how to use Cloudera AI to train, augment, fine-tune, and host LLMs to create powerful enterprise AI solutions.
What Skills You Will Gain
Through lecture and hands-on exercises, you will learn how to:
- Select the right LLM for a use case
- Configure a prompt for an LLM
- Use Retrieval Augmented Generation (RAG)
- Fine-tune an LLM with enterprise data
- Use the AI Model Registry and host an LLM
- Create an AI agent with Crew AI
Who Should Take This Course
This course is designed for data scientists and machine learning engineers who need to understand how to use Cloudera AI to combine their enterprise data with generative AI and Large Language Models and deliver powerful business solutions.
DATE: August 24-25, 2026 9:00 - 17:00 (GMT+2 TIMEZONE) Virtual Classroom, EMEA

This four-day Analyzing with Data Warehouse course will teach you to apply traditional data analytics and business intelligence skills to big data. This course presents the tools data professionals need to access, manipulate, transform, and analyze complex data sets using SQL and familiar scripting languages.
What you'll learn
Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the ecosystem, learning how to:
- Use Apache Hive and Apache Impala to access data through queries
- Identify distinctions between Hive and Impala, such as differences in syntax, data formats, and supported features
- Write and execute queries that use functions, aggregate functions, and subqueries
- Use joins and unions to combine datasets
- Create, modify, and delete tables, views, and databases
- Load data into tables and store query results
- Select file formats and develop partitioning schemes for better performance
- Use analytic and windowing functions to gain insight into their data
- Store and query complex or nested data structures
- Process and analyze semi-structured and unstructured data
- Optimize and extend the capabilities of Hive and Impala
- Determine whether Hive, Impala, an RDBMS, or a mix of these is the best choice for a given task
- Utilize the benefits of CDP Public Cloud Data Warehouse
What to expect
This course is designed for data analysts, business intelligence specialists, developers, system architects, and database administrators. Some knowledge of SQL is assumed, as is basic Linux command-line familiarity.
DATE: August 17-20, 2026 9:00 - 17:00 (GMT+2 TIMEZONE) Virtual Classroom, EMEA
