Big Data made easy with Azure Databricks
Big Data processing is being democratised. Tools such as Azure Databricks mean you no longer need to be a Java expert to be a Big Data engineer. Databricks has made your life much easier! While it is easier, there is still a lot to learn, and knowing where to start can be quite daunting.
Too often, training courses are academic, teaching theory rather than application. We have created an applied Azure Databricks course, built around demand and the real-world problems faced by our customers. It will teach you how to implement different scenarios in Databricks, but most importantly it will explain why, when to implement them, and when not to.
This course is designed to take a data professional from zero to hero in just three days. You will leave with all the skills you need to start your Big Data journey. You will learn by experimentation: this is a lab-heavy training session. If you are starting a new project and want to know whether Databricks is suitable for your problem, we also offer training tailored to your problem domain.
The course will be delivered by Terry McCann, Microsoft MVP. Terry is recognised for his ability to convert deep technical material into bite-sized, understandable chunks.
Cost
On-site delivery: £1,650 per delegate (minimum 8 delegates)
Tailored course: POA
Agenda
(A full agenda is available upon request.) This course is evolving with the upcoming release of Spark 3.0.
Introduction
General introduction
- Engineering vs Data Science
Intro to Big Data Processing
- Introduction to Big Data Processing - why we do what we do.
- Introduce you to the skills required
- Introduction to Spark
- Introduce Azure Databricks
Exploring Azure Databricks
- Getting set up
- Exploring Azure Databricks
The languages
- The languages (Scala/Python/R/Java)
- Introduction to Scala
- Introduction to PySpark (see the sketch after this list)
- PySpark deep dive
- Working with the additional Spark APIs
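To give you a flavour of where the PySpark module starts, here is a minimal sketch of the kind of code you will write early on. The file path and column names are placeholders for illustration, not course material:

```python
# Read a CSV file into a DataFrame and run a simple aggregation.
# In a Databricks notebook the `spark` session already exists; the builder
# line below just makes this sketch self-contained. The path and column
# names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

sales = (spark.read
         .option("header", "true")
         .option("inferSchema", "true")
         .csv("/mnt/raw/sales.csv"))

(sales
 .groupBy("region")
 .agg(F.sum("amount").alias("total_sales"))
 .orderBy(F.desc("total_sales"))
 .show())
```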
Data Engineering
Managing Databricks
- Managing Secrets (see the sketch after this list)
- Orchestrating Pipelines
- Troubleshooting Query Performance
- Source Controlling Notebooks
- Cluster Sizing
- Installing packages on a single cluster or across all clusters
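As a taste of the secrets module: in Databricks, credentials live in secret scopes and are read with dbutils.secrets rather than being hard-coded in notebooks. A minimal sketch, assuming a scope and key you have already created (the scope, key, and storage account names below are placeholders):

```python
# Fetch a storage key from a secret scope instead of embedding it in the
# notebook. `dbutils` is available automatically in Databricks notebooks.
# The scope, key, and storage account names are placeholders.
storage_key = dbutils.secrets.get(scope="training-scope", key="storage-account-key")

spark.conf.set(
    "fs.azure.account.key.mystorageacct.blob.core.windows.net",
    storage_key)
```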
Data Engineering
- Cloud ETL Patterns (see the sketch after this list)
- Design patterns
- Loading Data
- Schema Management
- Transforming Data
- Storing Data
- Managing Lakes
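By way of illustration, here is a minimal load, transform, and store pattern of the sort built in this module. The paths and column names are hypothetical:

```python
# Load raw CSV, clean it, and store it as partitioned Parquet in the
# curated zone of the lake. Paths and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

raw = (spark.read
       .option("header", "true")
       .csv("/mnt/raw/orders.csv"))

cleaned = (raw
           .dropDuplicates(["order_id"])
           .withColumn("order_date", F.to_date("order_date"))
           .filter(F.col("amount").isNotNull()))

(cleaned.write
 .mode("overwrite")
 .partitionBy("order_date")
 .parquet("/mnt/curated/orders"))
```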
Data Factory Data Flows
- Creating Data Flows
- Execution Comparison
Data Science
Data Science
- Introduction to Data Science
- Batch vs interactive machine learning
- Python for machine learning
- How to train a model
- Enriching our existing data with batch machine learning
Spark ML
- What is SparkML
- SparkML components
- Creating a regression model in SparkML (see the sketch after this list)
- Creating a classification model in SparkML
- Tuning models at scale
- Persisting models & retraining
- Model deployment scenarios
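To show the shape of the SparkML API before you arrive, here is a minimal regression pipeline. The toy data and column names are invented purely for this sketch:

```python
# A minimal SparkML regression pipeline: assemble feature columns into a
# vector, fit a linear regression, and score a held-out split.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.getOrCreate()

# Toy data, invented for this sketch
df = spark.createDataFrame(
    [(1.0, 50.0, 110.0), (2.0, 75.0, 160.0),
     (3.0, 100.0, 215.0), (4.0, 120.0, 260.0)],
    ["bedrooms", "sqm", "price"])

assembler = VectorAssembler(inputCols=["bedrooms", "sqm"], outputCol="features")
lr = LinearRegression(featuresCol="features", labelCol="price")
pipeline = Pipeline(stages=[assembler, lr])

train, test = df.randomSplit([0.75, 0.25], seed=42)
model = pipeline.fit(train)
model.transform(test).select("price", "prediction").show()
```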
Databricks Delta
Databricks Delta Tables
- Introduction to Delta: what it is and how it works
- Data lake management
- Problems with Hadoop-based lakes
- Creating a Delta Table (see the sketch after this list)
- The Transaction Log
- Managing Schema change
- Time travelling
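A minimal sketch of what this looks like in practice: each write to a Delta table creates a new version in the transaction log, which you can read back later. The path is a placeholder:

```python
# Write a DataFrame as a Delta table, overwrite it, then use the
# transaction log to "time travel" back to the first version.
# The path is a placeholder.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

path = "/mnt/delta/customers"

spark.range(5).write.format("delta").mode("overwrite").save(path)   # version 0
spark.range(10).write.format("delta").mode("overwrite").save(path)  # version 1

# Read the table as it was at version 0
v0 = (spark.read.format("delta")
      .option("versionAsOf", 0)
      .load(path))
print(v0.count())  # 5
```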
Bring it all back together
- How this all fits into a wider architecture.
- Projects we have worked on.
- Managing Databricks in production
- Deploying with Azure DevOps
Labs
- Getting set up (Building a new instance, getting connected, creating a cluster)
- Creating all the required assets.
- Running a notebook
- An introduction to the key packages we will be working with.
- Cleaning data
- Transforming data
- Creating a notebook to move data from Blob Storage and clean it up.
- Scheduling a notebook to run with Azure Data Factory
- Creating a streaming application (sketched after this list)
- Creating a machine learning model
- Deploying a machine learning model
- Reading and enriching a stream
- Databricks Delta
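The streaming labs follow this general shape: read a stream, enrich it, and write the result out. A minimal Structured Streaming sketch, with placeholder paths and an invented schema:

```python
# Read a stream of JSON events, flag readings over a threshold, and write
# the enriched stream to a Delta table. Paths and schema are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.getOrCreate()

schema = StructType([
    StructField("device_id", StringType()),
    StructField("reading", DoubleType()),
])

events = (spark.readStream
          .schema(schema)
          .json("/mnt/raw/events/"))

enriched = events.withColumn("alert", F.col("reading") > 100.0)

(enriched.writeStream
 .format("delta")
 .option("checkpointLocation", "/mnt/checkpoints/events")
 .start("/mnt/delta/events"))
```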
If you would like to enquire about training, please complete the form below and we will get back to you.