
Seminar
Microsoft Azure Databricks

Open seminar (2 days) in Wiesbaden; in-house workshops are tailor-made: you tell us your topics!

The workshop is aimed at business practitioners who want to use Microsoft Azure Databricks.

Dates 2025 in Wiesbaden: 20/21 March 2025

LEARNING OBJECTIVES AND AGENDA

Learning objectives:

Day 1: Implementing a Data Lakehouse Analytics Solution with Azure Databricks

  • Understanding Azure Databricks.

  • Performing data analysis.

  • Using collaborative notebooks.

  • Applying Apache Spark.

  • Managing data with Delta Lake.

  • Building data pipelines.

  • Orchestrating workloads.

Day 2: Implementing a Machine Learning Solution with Azure Databricks

  • Understanding the fundamentals of machine learning.

  • Preparing data.

  • Training models.

  • Using MLflow.

  • Optimizing hyperparameters.

  • Automating with AutoML.

  • Training deep learning models.

  • Managing machine learning in production.

OPEN SEMINAR or IN-HOUSE WORKSHOP

Workshop: You tell us your topics!
Duration: one or two days

Price: 1,090 € (open seminar, 2 days)
In-house workshop: price on request

plus statutory VAT and travel expenses if applicable

All workshop content is individually tailored and delivered to the specific target group.

We are happy to conduct the workshop at your location, in Wiesbaden or online.

Rental fee for training notebooks (on request): 60 € per day and per training computer

Day 1: Implementing a Data Lakehouse Analytics Solution with Azure Databricks

On Day 1, you will learn how to leverage the power of Apache Spark and high-performance clusters running on the Azure Databricks platform to execute large-scale data engineering workloads in the cloud.


Module 1: Exploring Azure Databricks

Azure Databricks is a cloud service that provides a scalable platform for data analytics with Apache Spark.

  • Introduction

  • Getting started with Azure Databricks

  • Identifying Azure Databricks workloads

  • Understanding key concepts

  • Data governance with Unity Catalog and Microsoft Purview

  • Exercise – Explore Azure Databricks

Module 2: Performing Data Analysis with Azure Databricks

Learn how to conduct data analysis with Azure Databricks. Explore different ingestion methods and how to integrate data from sources such as Azure Data Lake and Azure SQL Database. This module guides you through the use of collaborative notebooks for exploratory data analysis (EDA), enabling you to visualize, manipulate, and investigate data to detect patterns, anomalies, and correlations.

  • Introduction

  • Ingesting data with Azure Databricks

  • Data exploration tools in Azure Databricks

  • Data analysis with DataFrame APIs

  • Exercise – Explore data with Azure Databricks
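
To give a flavour of the DataFrame APIs used in this module, here is a minimal PySpark sketch (not taken from the course material): it ingests a hypothetical CSV file and runs a simple aggregation. The file path and column names are placeholders; in a Databricks notebook the SparkSession is already available as spark.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # In Databricks notebooks a SparkSession named `spark` already exists;
    # getOrCreate() reuses it (or creates one when run elsewhere).
    spark = SparkSession.builder.getOrCreate()

    # Ingest a hypothetical CSV file from the data lake (placeholder path).
    sales = (spark.read
             .option("header", True)
             .option("inferSchema", True)
             .csv("/mnt/datalake/sales.csv"))

    # Exploratory aggregation with the DataFrame API: revenue per region.
    (sales.groupBy("region")
          .agg(F.sum("amount").alias("total_amount"),
               F.count("*").alias("orders"))
          .orderBy(F.desc("total_amount"))
          .show())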

Module 3: Using Apache Spark in Azure Databricks


Azure Databricks is built on Apache Spark and allows data engineers and analysts to run Spark jobs to transform, analyze, and visualize data at scale.

  • Introduction

  • Getting to know Spark

  • Creating a Spark cluster

  • Using Spark in notebooks

  • Working with data files in Spark

  • Visualizing data

  • Exercise – Use Spark in Azure Databricks
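
As a sketch of how Spark is typically used in a notebook (an assumed example, not official course code), the following snippet registers a Parquet file as a temporary view and queries it with Spark SQL; the path, view name, and columns are placeholders.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Load a hypothetical Parquet data file and expose it to Spark SQL.
    trips = spark.read.parquet("/mnt/datalake/trips.parquet")
    trips.createOrReplaceTempView("trips")

    # SQL and DataFrame code can be mixed freely in the same notebook.
    daily = spark.sql("""
        SELECT trip_date, COUNT(*) AS trip_count, AVG(distance_km) AS avg_distance
        FROM trips
        GROUP BY trip_date
        ORDER BY trip_date
    """)

    daily.show(10)
    # In a Databricks notebook, display(daily) would render a built-in chart.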

Module 4: Managing Data with Delta Lake

Delta Lake is a data management solution in Azure Databricks that offers features such as ACID transactions, schema enforcement, and time travel to ensure data consistency, integrity, and versioning.

  • Introduction

  • Getting started with Delta Lake

  • Managing ACID transactions

  • Implementing schema enforcement

  • Data versioning and time travel in Delta Lake

  • Ensuring data integrity with Delta Lake

  • Exercise – Use Delta Lake in Azure Databricks
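
A minimal sketch of the Delta Lake features named above (assumed paths and schema, not course material): writing a Delta table, relying on schema enforcement for appends, and reading an older version via time travel.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    path = "/mnt/datalake/delta/customers"   # placeholder location

    # The initial write creates a Delta table with an enforced schema.
    df_v0 = spark.createDataFrame([(1, "Alice"), (2, "Bob")], ["id", "name"])
    df_v0.write.format("delta").mode("overwrite").save(path)

    # Appends are ACID transactions; a mismatching schema would be rejected
    # (schema enforcement) unless schema evolution is explicitly enabled.
    df_v1 = spark.createDataFrame([(3, "Carol")], ["id", "name"])
    df_v1.write.format("delta").mode("append").save(path)

    # Time travel: read the table as it looked at an earlier version.
    first_version = (spark.read.format("delta")
                     .option("versionAsOf", 0)
                     .load(path))
    first_version.show()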

Module 5: Building Data Pipelines with Delta Live Tables

Building data pipelines with Delta Live Tables enables real-time, scalable, and reliable data processing using the advanced features of Delta Lake in Azure Databricks.

  • Introduction

  • Exploring Delta Live Tables

  • Data ingestion and integration

  • Real-time processing

  • Exercise – Build a data pipeline with Delta Live Tables
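
To illustrate what a Delta Live Tables pipeline definition can look like, here is a hedged sketch: it only runs as part of a DLT pipeline (not as a stand-alone notebook), and the source path, table names, and data-quality expectation are placeholders.

    import dlt
    from pyspark.sql import functions as F

    # `spark` is provided by the DLT runtime when the pipeline executes.

    @dlt.table(comment="Raw orders ingested from cloud storage (placeholder path).")
    def orders_raw():
        return (spark.readStream
                .format("cloudFiles")                 # Auto Loader for incremental ingestion
                .option("cloudFiles.format", "json")
                .load("/mnt/landing/orders/"))

    @dlt.table(comment="Cleaned orders ready for analytics.")
    @dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
    def orders_clean():
        return (dlt.read_stream("orders_raw")
                .withColumn("amount", F.col("amount").cast("double")))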

Module 6: Deploying Workloads with Azure Databricks Workflows

Deploying workloads with Azure Databricks Workflows involves orchestrating and automating complex data processing pipelines, machine learning workflows, and analytics tasks. In this module, you will learn how to deploy workloads with Databricks Workflows.

  • Introduction

  • What are Azure Databricks Workflows?

  • Understanding the key components of Azure Databricks Workflows

  • Exploring the benefits of Azure Databricks Workflows

  • Deploying workloads with Azure Databricks Workflows

  • Exercise – Create an Azure Databricks Workflow
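
Workflows are usually configured in the Databricks UI, but they can also be created programmatically. The following hedged sketch uses the Databricks SDK for Python (databricks-sdk) to create a simple two-task job; the job name, notebook paths, task keys, and cluster ID are placeholders, and the exact SDK classes may differ between versions.

    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service import jobs

    w = WorkspaceClient()  # reads credentials from the environment / CLI config

    job = w.jobs.create(
        name="daily-lakehouse-pipeline",               # placeholder job name
        tasks=[
            jobs.Task(
                task_key="ingest",
                notebook_task=jobs.NotebookTask(notebook_path="/Repos/demo/ingest"),
                existing_cluster_id="1234-567890-abcde123",   # placeholder cluster
            ),
            jobs.Task(
                task_key="transform",
                depends_on=[jobs.TaskDependency(task_key="ingest")],
                notebook_task=jobs.NotebookTask(notebook_path="/Repos/demo/transform"),
                existing_cluster_id="1234-567890-abcde123",
            ),
        ],
    )
    print(f"Created job {job.job_id}")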

Day 2: Implementing a Machine Learning Solution with Azure Databricks

Azure Databricks is a cloud-based platform for data analytics and machine learning. Data scientists and machine learning engineers can use Azure Databricks to implement large-scale machine learning solutions.


Module 1: Training a Machine Learning Model in Azure Databricks

Machine learning involves using data to train a predictive model. Azure Databricks supports multiple common machine learning frameworks that can be used for modeling.

  • Introduction

  • Understanding the basic principles of machine learning

  • Machine learning in Azure Databricks

  • Preparing data for machine learning

  • Training a machine learning model

  • Evaluating a machine learning model

  • Exercise – Train a machine learning model in Azure Databricks
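
As a small preview of the kind of model training done in the exercise, here is an assumed sketch using scikit-learn, one of the frameworks supported on Databricks; the feature table and column names are placeholders.

    from pyspark.sql import SparkSession
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    spark = SparkSession.builder.getOrCreate()

    # Prepared feature table (placeholder name), converted to pandas for scikit-learn.
    pdf = spark.read.table("ml.churn_features").toPandas()
    X = pdf[["tenure", "monthly_charges", "support_calls"]]   # placeholder features
    y = pdf["churned"]                                        # placeholder label

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Evaluate the trained model on the held-out test set.
    print("accuracy:", accuracy_score(y_test, model.predict(X_test)))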

Module 2: Using MLflow in Azure Databricks

MLflow is an open-source platform for managing the machine learning lifecycle, natively supported in Azure Databricks.

  • Introduction

  • Key features of MLflow

  • Running experiments with MLflow

  • Registering and deploying models with MLflow

  • Exercise – Use MLflow in Azure Databricks
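
A minimal MLflow tracking sketch (hypothetical parameters and metrics): each run logs its parameters, metrics, and the trained model so runs can later be compared and the model registered.

    import mlflow
    import mlflow.sklearn
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    with mlflow.start_run(run_name="rf-baseline"):
        n_estimators = 100
        model = RandomForestClassifier(n_estimators=n_estimators, random_state=0)
        model.fit(X_train, y_train)

        # Log parameters, metrics, and the model artifact to the tracking server.
        mlflow.log_param("n_estimators", n_estimators)
        mlflow.log_metric("test_accuracy", model.score(X_test, y_test))
        mlflow.sklearn.log_model(model, "model")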

Module 3: Optimizing Hyperparameters in Azure Databricks

Hyperparameter optimization is a crucial step in machine learning. In Azure Databricks, you can use the Hyperopt library to automatically tune hyperparameters.

  • Introduction

  • Optimizing hyperparameters with Hyperopt

  • Reviewing Hyperopt trials

  • Scaling Hyperopt trials

  • Exercise – Optimize machine learning hyperparameters in Azure Databricks
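
A hedged sketch of hyperparameter tuning with Hyperopt: the objective function, search space, and number of evaluations are illustrative only. On Databricks, SparkTrials can replace Trials to distribute the evaluations across the cluster.

    from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

    def objective(params):
        # Hyperopt minimizes the loss, so return the negative accuracy.
        model = LogisticRegression(C=params["C"], max_iter=1000)
        accuracy = cross_val_score(model, X, y, cv=3).mean()
        return {"loss": -accuracy, "status": STATUS_OK}

    search_space = {"C": hp.loguniform("C", -4, 2)}

    best = fmin(
        fn=objective,
        space=search_space,
        algo=tpe.suggest,       # Tree-structured Parzen Estimator
        max_evals=20,
        trials=Trials(),        # on Databricks: SparkTrials(parallelism=4)
    )
    print("best hyperparameters:", best)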

Module 4: Using AutoML in Azure Databricks

AutoML in Azure Databricks simplifies the process of creating an effective machine learning model for your data.

  • Introduction

  • What is AutoML?

  • Using AutoML in the Azure Databricks UI

  • Running an AutoML experiment with code

  • Exercise – Use AutoML in Azure Databricks
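
AutoML can be started from the Databricks UI or from code. The following is an assumed sketch of the Python API (databricks.automl); the dataset, target column, and time budget are placeholders, and the exact API may vary by Databricks Runtime version.

    from databricks import automl
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Prepared training data (placeholder table name).
    train_df = spark.read.table("ml.churn_features")

    # Launch an AutoML classification experiment with a fixed time budget.
    summary = automl.classify(
        dataset=train_df,
        target_col="churned",        # placeholder label column
        timeout_minutes=30,
    )

    # Inspect the best trial found by AutoML.
    print(summary.best_trial.metrics)
    print(summary.best_trial.model_path)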

Module 5: Training Deep Learning Models in Azure Databricks

Deep learning leverages neural networks to train highly effective models for complex predictions, computer vision, natural language processing, and other AI workloads.

  • Introduction

  • Understanding deep learning concepts

  • Training models with PyTorch

  • Distributing PyTorch training with TorchDistributor

  • Exercise – Train deep learning models in Azure Databricks
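
A condensed sketch of single-node PyTorch training, with the same training function optionally handed to TorchDistributor for distributed execution (an assumed API available on recent Databricks Runtime ML versions; the model, data, and process counts are placeholders).

    import torch
    from torch import nn

    def train_fn(epochs: int = 3):
        # Toy regression data; in practice this would load real training data.
        X = torch.randn(256, 4)
        y = X.sum(dim=1, keepdim=True)

        model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
        loss_fn = nn.MSELoss()

        for epoch in range(epochs):
            optimizer.zero_grad()
            loss = loss_fn(model(X), y)
            loss.backward()
            optimizer.step()
            print(f"epoch {epoch}: loss={loss.item():.4f}")
        return model

    # Runs locally as plain PyTorch ...
    model = train_fn()

    # ... or distributed across the cluster with TorchDistributor:
    # from pyspark.ml.torch.distributor import TorchDistributor
    # model = TorchDistributor(num_processes=2, use_gpu=False).run(train_fn, 3)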

Module 6: Managing Machine Learning in Production with Azure Databricks

Machine learning enables data-driven decision-making and automation, but deploying models into production for real-time insights can be challenging. Azure Databricks streamlines this process by providing a unified platform to build, train, and deploy machine learning models at scale, supporting collaboration between data scientists and engineers.

  • Introduction

  • Automating data changes

  • Exploring model development

  • Deployment strategies for models

  • Model versioning and lifecycle management

  • Exercise – Manage a machine learning model
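
Model versioning and deployment in this module build on the MLflow Model Registry. A minimal, hedged sketch (model name and run ID are placeholders): register a logged model as a new version and load a specific version for scoring.

    import mlflow
    from mlflow.tracking import MlflowClient

    run_id = "..."  # ID of a finished MLflow run that logged a model (placeholder)

    # Register the model artifact from that run; repeated calls create new versions.
    result = mlflow.register_model(f"runs:/{run_id}/model", "churn_model")
    print("registered version:", result.version)

    # Annotate the version for lifecycle management.
    client = MlflowClient()
    client.update_model_version(
        name="churn_model",
        version=result.version,
        description="Baseline model from the seminar exercise.",
    )

    # Load a specific registered version for batch scoring.
    model = mlflow.pyfunc.load_model(f"models:/churn_model/{result.version}")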


CONTENT


In our two-day seminar on the topics "Implementing a Data Lakehouse Analytics Solution with Azure Databricks (Day 1)" and "Implementing a Machine Learning Solution with Azure Databricks (Day 2)", participants will acquire the essential skills to leverage the Azure Databricks platform.


Day 1: Implementing a Data Lakehouse Analytics Solution with Azure Databricks


Day 1 begins with a comprehensive introduction to Azure Databricks. Participants will explore the fundamental concepts and features of this cloud-based platform that enables the use of Apache Spark for large-scale data analytics. Through hands-on exercises, they will learn how to identify and effectively implement different Azure Databricks workloads. An important aspect of this unit is data governance using Unity Catalog and Microsoft Purview, which are essential for efficient and secure data management.


In the second unit, the focus is on data analysis with Azure Databricks. Participants will learn how to efficiently ingest data and work with Azure Databricks’ integrated data exploration tools. The use of DataFrame APIs to perform complex data analyses will also be covered. Through practical exercises, participants will gain the ability to identify patterns and insights from large datasets, enabling them to make data-driven decisions.


The third unit introduces participants to Apache Spark in more detail. They will learn how to create a Spark cluster and use Spark within notebooks to perform powerful data processing tasks. Working with data files and visualizing data will also be covered, which greatly simplifies data analysis. These skills are highly valuable for data engineers and analysts who need to process and analyze large amounts of data efficiently.


In the fourth unit, the focus shifts to managing and processing data with Delta Lake. Participants will learn how to manage ACID transactions and enforce schema compliance. By understanding data versioning and time travel in Delta Lake, they will gain valuable insights into ensuring data integrity and consistency. Practical exercises will allow participants to apply the theory and effectively utilize the functionalities of Delta Lake.

The fifth unit emphasizes the creation of data pipelines using Delta Live Tables. Participants will explore the advantages and capabilities of this technology to enable real-time processing and integration. Hands-on exercises will help participants develop robust and scalable data pipelines tailored to their specific use cases.


Finally, in the sixth unit of Day 1, participants will learn how to use Azure Databricks Workflows to deploy workloads. They will understand how to orchestrate and automate complex data processing pipelines. By exploring the key components and benefits of Azure Databricks Workflows, participants will be able to effectively implement and manage their data-driven applications.


Learning Objectives Day 1:

  • Understand Azure Databricks: Gain knowledge about the features and capabilities of Azure Databricks as a cloud-based data analytics service.

  • Perform data analysis: Learn how to carry out data analyses with Azure Databricks, including the integration of data from sources such as Azure Data Lake and Azure SQL Database.

  • Use collaborative notebooks: Learn how to use collaborative notebooks to perform exploratory data analysis (EDA) and visualize data.

  • Apply Apache Spark: Acquire practical knowledge in using Apache Spark within the Azure Databricks platform to process and analyze large datasets.

  • Manage data with Delta Lake: Understand Delta Lake functionalities, including ACID transactions and schema enforcement, to ensure data consistency and integrity.

  • Build data pipelines: Gain the ability to develop and implement data pipelines with Delta Live Tables for real-time data processing.

  • Orchestrate workloads: Acquire knowledge in deploying and automating complex workloads with Azure Databricks Workflows.

Day 2: Implementing a Machine Learning Solution with Azure Databricks


On Day 2 of the seminar "Implementing a Machine Learning Solution with Azure Databricks", participants will learn how to leverage the powerful Azure Databricks platform for developing and deploying machine learning solutions. Azure Databricks provides a cloud-based environment specifically designed for data analytics and machine learning. Participants will acquire knowledge about the integration of Apache Spark and will be able to use both online and offline large language models (LLMs) to develop scalable and effective AI applications.

This part of the seminar focuses on the fundamentals of machine learning in Azure Databricks. Participants will learn how to prepare data for machine learning projects, train models, and evaluate them to generate reliable predictions. MLflow will also be introduced as an open-source platform for managing the machine learning lifecycle, which is natively supported in Azure Databricks. Hands-on exercises will allow participants to strengthen their skills in using MLflow.

Participants will further deepen their knowledge in advanced topics such as hyperparameter optimization with Hyperopt and the use of AutoML. These techniques are critical for improving the efficiency and accuracy of machine learning models. In addition, they will learn how to train and distribute deep learning models with PyTorch to tackle complex tasks in fields such as computer vision and natural language processing.

Another key aspect of the seminar is managing machine learning in production with Azure Databricks. Participants will learn how to automate data changes, develop models, and implement appropriate strategies for model deployment and versioning. Through practical exercises and projects, they will have the opportunity to apply their knowledge to realistic scenarios.

This seminar provides a comprehensive introduction to implementing machine learning solutions with Azure Databricks and is ideal for professionals who want to expand their expertise in data science and artificial intelligence.

Learning Objectives Day 2:

  • Understand the basics of machine learning: Acquire knowledge of the core principles of machine learning and their application in Azure Databricks.

  • Prepare data: Develop skills in preparing data for machine learning projects in Azure Databricks.

  • Train models: Learn how to train and evaluate machine learning models in Azure Databricks.

  • Use MLflow: Gain familiarity with MLflow functionalities for managing the machine learning lifecycle, including running experiments and registering models.

  • Optimize hyperparameters: Learn how to optimize hyperparameters with Hyperopt and evaluate the results.

  • Automate with AutoML: Understand AutoML and learn how to effectively use it in Azure Databricks, both through the user interface and with code.

  • Train deep learning models: Learn how to train and distribute deep learning models with PyTorch in Azure Databricks.

  • Manage machine learning in production: Gain knowledge of automating data changes, model development, deployment strategies, and lifecycle management.
