Cloudera Educational Services

Upcoming Sessions

Note: Enrolling here will not give you access to the actual course. The course is available by purchasing the Full OnDemand Library subscription.

About This Course
Cloudera Data Science Workbench Training prepares learners to complete data science and machine learning projects using Cloudera Data Science Workbench (CDSW).

Course Length
This module includes over 6 hours of video content.

Audience and Prerequisites
This OnDemand course is designed for learners at organizations using CDSW under a trial license or a commercial license. The learner must have access to a CDSW environment on a CDP cluster running Apache Spark 2. Some experience with data science using Python or R is helpful but not required. No prior knowledge of Spark or other Hadoop ecosystem tools is required.

About This Course
Cloudera Data Science Workbench Training prepares learners to complete data science and machine learning projects using Cloudera Data Science Workbench (CDSW).

Course Length
This module includes over 4 hours of video content. Exercises are brief and require the learner to have access to a CDSW environment on a CDP cluster running Apache Spark 2.

Audience and Prerequisites
This OnDemand course is designed for learners at organizations using CDSW under a trial license or a commercial license. The learner must have access to a CDSW environment on a CDP cluster running Apache Spark 2. Some experience with data science using Python or R is helpful but not required. No prior knowledge of Spark or other Hadoop ecosystem tools is required.

About This Module
This module is an introduction to Apache Kafka. For more about Kafka, see Streaming Processing, Management, and Analytics with CDF. This module is included as part of that learning path.

Course Length
This module includes 40 minutes of video content. Hands-on exercises will take approximately 15 minutes.

Audience and Prerequisites
This module is designed for Data Engineers, Administrators, and others who want to understand stream processing administration, configuration, and applications within CDF. Though programming experience is not required, code samples are provided in Java, and basic experience with Linux is presumed. Exposure to big data concepts and applications is helpful.

About This Module
This module introduces data engineers and data analysts to the Cloudera Data Warehouse (CDW) service. It introduces basic concepts, and then offers a choice between two tracks: a Data Engineer track, which shows how to create and tune important entities within CDW, and a Data Analyst track, which shows how to access tables and views using different interface methods.

Module Length
This module includes 52 minutes of video content. The videos for the Data Analyst track total 38 minutes; the videos for the Data Engineer track total 33 minutes.

Note: To complete the hands-on exercises for this module, students must have access to CDW through their organization.

Audience and Prerequisites
This module is designed for data analysts and data engineers. There are no prerequisites, though access to a working Cloudera Data Platform with CDW is required to complete the hands-on exercises.

Grading for This Module
There are seven chapters in this module, but you only need to complete four quizzes with 75% or better to pass. (One chapter does not include a quiz.) After the first three chapters, you can choose the two Data Engineer track chapters or the two Data Analyst track chapters. Your reported grade will include the quizzes for chapters you did not take, but the overall passing level has been lowered to 50%. (Assuming the six quizzes are weighted equally, scoring 75% on four of them works out to 50% overall.) This means you do not need to take the quizzes for the other track. However, you are welcome to complete all seven chapters (and take all six quizzes).

About This Course
The Cloudera Operational Database Fundamentals course provides an overview of what an operational database (OpDb) is, the motivation and use cases behind using an OpDb in the enterprise, and how an OpDb fits within the data lifecycle. The course describes the operational database capabilities available in the various form factors, including CDP Private Cloud and CDP Public Cloud, and how to work with an OpDb from the shell, from a SQL client, and programmatically in code.

Course Length
This module includes 1.5 hours of video content. There are no hands-on exercises.

Audience and Prerequisites
This course is designed for managers, administrators, and developers. We highly recommend that participants be familiar with CDH, HDP, CDP Private Cloud, or CDP Public Cloud so that they can set up an operational database. The CDP Public Cloud Administration and CDP Private Cloud Fundamentals trainings are good starting points to prepare for this training. Additionally, knowledge of Python is required for using Thrift, and knowledge of Java is required for working with the HBase API. Basic Linux knowledge is also required, including the ability to SSH into a node and start a CLI. Knowledge of Apache HBase and Apache Phoenix is not required.

About This Module
During this series, Mark Payne, a Principal Software Engineer at Cloudera and co-creator of Apache NiFi, explains several common ways that people use NiFi incorrectly or inefficiently. After explaining the weaknesses of each approach, Mark shows how to improve those flows to make better use of NiFi's design and architecture.

Part 1: Flows Overview examines a flow that splits and rejoins data, treats structured/semi-structured data as unstructured text, and blurs the line between FlowFile content and attributes.
Part 2: Flow Layout illustrates how a disorganized dataflow can be difficult to understand and maintain. Mark shares tips for laying out the dataflow to make it clean, simple, and easy for others to follow.
Part 3: Load Balancing explains how to make your dataflows more scalable by balancing the load across a cluster of nodes. Mark also references his Cloudera technical blog post that shows how NiFi can process more than one billion events per second.
Part 4: Scheduling covers scheduling and concurrency anti-patterns. Mark discusses common problems related to thread pools and processor scheduling, and how to configure settings for best performance.
Part 5: Primary Node Only looks at the primary node and how it is sometimes misused.

Please note: This course does not award a course completion certificate.

Module Length
This course includes 1 hour of video content.
