Big Data made easy with Azure Databricks
Big Data processing is being democratised. Tools such as Azure Databricks mean you no longer need to be a Java expert to be a Big Data engineer. Databricks has made your life much easier! While it is easier, there is still a lot to learn, and knowing where to start can be quite daunting.
Too often, training courses are academic, teaching theory rather than application. We have created an applied Azure Databricks course, built around demand and the real-world problems faced by our customers. It will teach you how to implement different scenarios in Databricks, but most importantly it will explain why, when to implement them, and when not to.
This course is designed to take a data professional from zero to hero in just three days. You will leave with all the skills you need to start your Big Data journey. You will learn by experimentation: this is a lab-heavy training session. If you are starting a new project and want to know whether Databricks is suitable for your problem, we also offer training tailored to your problem domain.
The course will be delivered by Terry McCann, Microsoft MVP. Terry is recognised for his ability to convert deep technical material into bite-sized, understandable chunks.
Cost
On-site delivery: £1,650 per delegate (minimum 8 delegates)
Tailored course: POA
Agenda
(A full agenda is available upon request.) This course is evolving with the upcoming release of Spark 3.0.
Introduction
General introduction
- Engineering vs Data Science
Intro to Big Data Processing
- Introduction to Big Data Processing - why we do what we do.
- Introduce you to the skills required
- Introduction to Spark
- Introduce Azure Databricks
Exploring Azure Databricks
- Getting set up
- Exploring Azure Databricks
The languages
- The languages (Scala/Python/R/Java)
- Introduction to Scala
- Introduction to PySpark (see the sketch after this list)
- PySpark deep dive
- Working with the additional Spark APIs
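To give you a flavour of where the PySpark module starts, here is a minimal sketch of the kind of code you will write early on. The file path and column names are placeholders for illustration, not course material:

```python
# Read a CSV file into a DataFrame and run a simple aggregation.
# In a Databricks notebook the `spark` session already exists; the builder
# line below just makes this sketch self-contained. The path and column
# names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

sales = (spark.read
         .option("header", "true")
         .option("inferSchema", "true")
         .csv("/mnt/raw/sales.csv"))

(sales
 .groupBy("region")
 .agg(F.sum("amount").alias("total_sales"))
 .orderBy(F.desc("total_sales"))
 .show())
```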
Data Engineering
Managing Databricks
- Managing Secrets (see the sketch after this list)
- Orchestrating Pipelines
- Troubleshooting Query Performance
- Source Controlling Notebooks
- Cluster Sizing
- Installing packages on a single cluster or across all clusters
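As a taste of the secrets module: in Databricks, credentials live in secret scopes and are read with dbutils.secrets rather than being hard-coded in notebooks. A minimal sketch, assuming a scope and key you have already created (the scope, key, and storage account names below are placeholders):

```python
# Fetch a storage key from a secret scope instead of embedding it in the
# notebook. `dbutils` is available automatically in Databricks notebooks.
# The scope, key, and storage account names are placeholders.
storage_key = dbutils.secrets.get(scope="training-scope", key="storage-account-key")

spark.conf.set(
    "fs.azure.account.key.mystorageacct.blob.core.windows.net",
    storage_key)
```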
Data Engineering
- Cloud ETL Patterns (see the sketch after this list)
- Design patterns
- Loading Data
- Schema Management
- Transforming Data
- Storing Data
- Managing Lakes
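By way of illustration, here is a minimal load, transform, and store pattern of the sort built in this module. The paths and column names are hypothetical:

```python
# Load raw CSV, clean it, and store it as partitioned Parquet in the
# curated zone of the lake. Paths and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

raw = (spark.read
       .option("header", "true")
       .csv("/mnt/raw/orders.csv"))

cleaned = (raw
           .dropDuplicates(["order_id"])
           .withColumn("order_date", F.to_date("order_date"))
           .filter(F.col("amount").isNotNull()))

(cleaned.write
 .mode("overwrite")
 .partitionBy("order_date")
 .parquet("/mnt/curated/orders"))
```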
Data Factory Data Flows
- Creating Data Flows
- Execution Comparison
Data Science
Data Science
- Introduction to Data Science
- Batch vs interactive machine learning
- Python for machine learning
- How to train a model
- Enriching our existing data with batch machine learning
Spark ML
- What is SparkML
- SparkML components
- Creating a regression model in SparkML (see the sketch after this list)
- Creating a classification model in SparkML
- Tuning models at scale
- Persisting models & retraining
- Model deployment scenarios
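To show the shape of the SparkML API before you arrive, here is a minimal regression pipeline. The toy data and column names are invented purely for this sketch:

```python
# A minimal SparkML regression pipeline: assemble feature columns into a
# vector, fit a linear regression, and score a held-out split.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.getOrCreate()

# Toy data, invented for this sketch
df = spark.createDataFrame(
    [(1.0, 50.0, 110.0), (2.0, 75.0, 160.0),
     (3.0, 100.0, 215.0), (4.0, 120.0, 260.0)],
    ["bedrooms", "sqm", "price"])

assembler = VectorAssembler(inputCols=["bedrooms", "sqm"], outputCol="features")
lr = LinearRegression(featuresCol="features", labelCol="price")
pipeline = Pipeline(stages=[assembler, lr])

train, test = df.randomSplit([0.75, 0.25], seed=42)
model = pipeline.fit(train)
model.transform(test).select("price", "prediction").show()
```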
Databricks Delta
Databricks Delta Tables
- Introduction to Delta: what it is and how it works
- Data lake management
- Problems with Hadoop-based lakes
- Creating a Delta Table (see the sketch after this list)
- The Transaction Log
- Managing Schema change
- Time travelling
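A minimal sketch of what this looks like in practice: each write to a Delta table creates a new version in the transaction log, which you can read back later. The path is a placeholder:

```python
# Write a DataFrame as a Delta table, overwrite it, then use the
# transaction log to "time travel" back to the first version.
# The path is a placeholder.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

path = "/mnt/delta/customers"

spark.range(5).write.format("delta").mode("overwrite").save(path)   # version 0
spark.range(10).write.format("delta").mode("overwrite").save(path)  # version 1

# Read the table as it was at version 0
v0 = (spark.read.format("delta")
      .option("versionAsOf", 0)
      .load(path))
print(v0.count())  # 5
```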
Bring it all back together
- How this all fits into a wider architecture.
- Projects we have worked on.
- Managing Databricks in production
- Deploying with Azure DevOps
Labs
- Getting set up (Building a new instance, getting connected, creating a cluster)
- Creating all the required assets.
- Running a notebook
- An introduction to the key packages we will be working with.
- Cleaning data
- Transforming data
- Creating a notebook to move data from Blob Storage and clean it up.
- Scheduling a notebook to run with Azure Data Factory
- Creating a streaming application (sketched after this list)
- Creating a machine learning model
- Deploying a machine learning model
- Reading and enriching a stream
- Databricks Delta
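The streaming labs follow this general shape: read a stream, enrich it, and write the result out. A minimal Structured Streaming sketch, with placeholder paths and an invented schema:

```python
# Read a stream of JSON events, flag readings over a threshold, and write
# the enriched stream to a Delta table. Paths and schema are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.getOrCreate()

schema = StructType([
    StructField("device_id", StringType()),
    StructField("reading", DoubleType()),
])

events = (spark.readStream
          .schema(schema)
          .json("/mnt/raw/events/"))

enriched = events.withColumn("alert", F.col("reading") > 100.0)

(enriched.writeStream
 .format("delta")
 .option("checkpointLocation", "/mnt/checkpoints/events")
 .start("/mnt/delta/events"))
```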
If you would like to enquire about training, please complete the form below and we will get back to you.