
Seminar
Microsoft Azure Databricks

Open seminar (2 days) in Wiesbaden; in-house workshops are tailor-made: you tell us your topics!

The workshop is aimed at business practitioners who want to use Microsoft Azure Databricks.

Dates 2025 in Wiesbaden: 20/21 March 2025

LEARNING OBJECTIVES AND AGENDA

Learning objectives:

Day 1: Implementing a Data Lakehouse Analytics Solution with Azure Databricks

  • Understanding Azure Databricks.

  • Performing data analysis.

  • Using collaborative notebooks.

  • Applying Apache Spark.

  • Managing data with Delta Lake.

  • Building data pipelines.

  • Orchestrating workloads.

Day 2: Implementing a Machine Learning Solution with Azure Databricks

  • Understanding the fundamentals of machine learning.

  • Preparing data.

  • Training models.

  • Using MLflow.

  • Optimizing hyperparameters.

  • Automating with AutoML.

  • Training deep learning models.

  • Managing machine learning in production.

OPEN SEMINAR or IN-HOUSE WORKSHOP

Workshop: You tell us your topics!
Duration: one or two days

Price: 1,090 € (open seminar, 2 days)
In-house workshop: price on request

plus statutory VAT and travel expenses if applicable

All workshop content is individually tailored and delivered to the specific target group.

We are happy to conduct the workshop at your location, in Wiesbaden or online.

Rental fee for training notebooks (on request): 60 € per day and per training computer

Day 1: Implementing a Data Lakehouse Analytics Solution with Azure Databricks

On Day 1, you will learn how to leverage the power of Apache Spark and high-performance clusters running on the Azure Databricks platform to execute large-scale data engineering workloads in the cloud.


Module 1: Exploring Azure Databricks

Azure Databricks is a cloud service that provides a scalable platform for data analytics with Apache Spark.

  • Introduction

  • Getting started with Azure Databricks

  • Identifying Azure Databricks workloads

  • Understanding key concepts

  • Data governance with Unity Catalog and Microsoft Purview

  • Exercise – Explore Azure Databricks

Module 2: Performing Data Analysis with Azure Databricks

Learn how to conduct data analysis with Azure Databricks. Explore different ingestion methods and how to integrate data from sources such as Azure Data Lake and Azure SQL Database. This module guides you through the use of collaborative notebooks for exploratory data analysis (EDA), enabling you to visualize, manipulate, and investigate data to detect patterns, anomalies, and correlations.

  • Introduction

  • Ingesting data with Azure Databricks

  • Data exploration tools in Azure Databricks

  • Data analysis with DataFrame APIs

  • Exercise – Explore data with Azure Databricks
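
To give a flavour of the DataFrame APIs used in this module, here is a minimal PySpark sketch (not taken from the course material): it ingests a hypothetical CSV file and runs a simple aggregation. The file path and column names are placeholders; in a Databricks notebook the SparkSession is already available as spark.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # In Databricks notebooks a SparkSession named `spark` already exists;
    # getOrCreate() reuses it (or creates one when run elsewhere).
    spark = SparkSession.builder.getOrCreate()

    # Ingest a hypothetical CSV file from the data lake (placeholder path).
    sales = (spark.read
             .option("header", True)
             .option("inferSchema", True)
             .csv("/mnt/datalake/sales.csv"))

    # Exploratory aggregation with the DataFrame API: revenue per region.
    (sales.groupBy("region")
          .agg(F.sum("amount").alias("total_amount"),
               F.count("*").alias("orders"))
          .orderBy(F.desc("total_amount"))
          .show())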

Module 3: Using Apache Spark in Azure Databricks


Azure Databricks is built on Apache Spark and allows data engineers and analysts to run Spark jobs to transform, analyze, and visualize data at scale.

  • Introduction

  • Getting to know Spark

  • Creating a Spark cluster

  • Using Spark in notebooks

  • Working with data files in Spark

  • Visualizing data

  • Exercise – Use Spark in Azure Databricks
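
As a sketch of how Spark is typically used in a notebook (an assumed example, not official course code), the following snippet registers a Parquet file as a temporary view and queries it with Spark SQL; the path, view name, and columns are placeholders.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Load a hypothetical Parquet data file and expose it to Spark SQL.
    trips = spark.read.parquet("/mnt/datalake/trips.parquet")
    trips.createOrReplaceTempView("trips")

    # SQL and DataFrame code can be mixed freely in the same notebook.
    daily = spark.sql("""
        SELECT trip_date, COUNT(*) AS trip_count, AVG(distance_km) AS avg_distance
        FROM trips
        GROUP BY trip_date
        ORDER BY trip_date
    """)

    daily.show(10)
    # In a Databricks notebook, display(daily) would render a built-in chart.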

Module 4: Managing Data with Delta Lake

Delta Lake is a data management solution in Azure Databricks that offers features such as ACID transactions, schema enforcement, and time travel to ensure data consistency, integrity, and versioning.

  • Introduction

  • Getting started with Delta Lake

  • Managing ACID transactions

  • Implementing schema enforcement

  • Data versioning and time travel in Delta Lake

  • Ensuring data integrity with Delta Lake

  • Exercise – Use Delta Lake in Azure Databricks
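
A minimal sketch of the Delta Lake features named above (assumed paths and schema, not course material): writing a Delta table, relying on schema enforcement for appends, and reading an older version via time travel.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    path = "/mnt/datalake/delta/customers"   # placeholder location

    # The initial write creates a Delta table with an enforced schema.
    df_v0 = spark.createDataFrame([(1, "Alice"), (2, "Bob")], ["id", "name"])
    df_v0.write.format("delta").mode("overwrite").save(path)

    # Appends are ACID transactions; a mismatching schema would be rejected
    # (schema enforcement) unless schema evolution is explicitly enabled.
    df_v1 = spark.createDataFrame([(3, "Carol")], ["id", "name"])
    df_v1.write.format("delta").mode("append").save(path)

    # Time travel: read the table as it looked at an earlier version.
    first_version = (spark.read.format("delta")
                     .option("versionAsOf", 0)
                     .load(path))
    first_version.show()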

Module 5: Building Data Pipelines with Delta Live Tables

Building data pipelines with Delta Live Tables enables real-time, scalable, and reliable data processing using the advanced features of Delta Lake in Azure Databricks.

  • Introduction

  • Exploring Delta Live Tables

  • Data ingestion and integration

  • Real-time processing

  • Exercise – Build a data pipeline with Delta Live Tables
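
To illustrate what a Delta Live Tables pipeline definition can look like, here is a hedged sketch: it only runs as part of a DLT pipeline (not as a stand-alone notebook), and the source path, table names, and data-quality expectation are placeholders.

    import dlt
    from pyspark.sql import functions as F

    # `spark` is provided by the DLT runtime when the pipeline executes.

    @dlt.table(comment="Raw orders ingested from cloud storage (placeholder path).")
    def orders_raw():
        return (spark.readStream
                .format("cloudFiles")                 # Auto Loader for incremental ingestion
                .option("cloudFiles.format", "json")
                .load("/mnt/landing/orders/"))

    @dlt.table(comment="Cleaned orders ready for analytics.")
    @dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
    def orders_clean():
        return (dlt.read_stream("orders_raw")
                .withColumn("amount", F.col("amount").cast("double")))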

Module 6: Deploying Workloads with Azure Databricks Workflows

Deploying workloads with Azure Databricks Workflows involves orchestrating and automating complex data processing pipelines, machine learning workflows, and analytics tasks. In this module, you will learn how to deploy workloads with Databricks Workflows.

  • Introduction

  • What are Azure Databricks Workflows?

  • Understanding the key components of Azure Databricks Workflows

  • Exploring the benefits of Azure Databricks Workflows

  • Deploying workloads with Azure Databricks Workflows

  • Exercise – Create an Azure Databricks Workflow
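
Workflows are usually configured in the Databricks UI, but they can also be created programmatically. The following hedged sketch uses the Databricks SDK for Python (databricks-sdk) to create a simple two-task job; the job name, notebook paths, task keys, and cluster ID are placeholders, and the exact SDK classes may differ between versions.

    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service import jobs

    w = WorkspaceClient()  # reads credentials from the environment / CLI config

    job = w.jobs.create(
        name="daily-lakehouse-pipeline",               # placeholder job name
        tasks=[
            jobs.Task(
                task_key="ingest",
                notebook_task=jobs.NotebookTask(notebook_path="/Repos/demo/ingest"),
                existing_cluster_id="1234-567890-abcde123",   # placeholder cluster
            ),
            jobs.Task(
                task_key="transform",
                depends_on=[jobs.TaskDependency(task_key="ingest")],
                notebook_task=jobs.NotebookTask(notebook_path="/Repos/demo/transform"),
                existing_cluster_id="1234-567890-abcde123",
            ),
        ],
    )
    print(f"Created job {job.job_id}")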

Day 2: Implementing a Machine Learning Solution with Azure Databricks

Azure Databricks is a cloud-based platform for data analytics and machine learning. Data scientists and machine learning engineers can use Azure Databricks to implement large-scale machine learning solutions.


Module 1: Training a Machine Learning Model in Azure Databricks

Machine learning involves using data to train a predictive model. Azure Databricks supports multiple common machine learning frameworks that can be used for modeling.

  • Introduction

  • Understanding the basic principles of machine learning

  • Machine learning in Azure Databricks

  • Preparing data for machine learning

  • Training a machine learning model

  • Evaluating a machine learning model

  • Exercise – Train a machine learning model in Azure Databricks
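
As a small preview of the kind of model training done in the exercise, here is an assumed sketch using scikit-learn, one of the frameworks supported on Databricks; the feature table and column names are placeholders.

    from pyspark.sql import SparkSession
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    spark = SparkSession.builder.getOrCreate()

    # Prepared feature table (placeholder name), converted to pandas for scikit-learn.
    pdf = spark.read.table("ml.churn_features").toPandas()
    X = pdf[["tenure", "monthly_charges", "support_calls"]]   # placeholder features
    y = pdf["churned"]                                        # placeholder label

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Evaluate the trained model on the held-out test set.
    print("accuracy:", accuracy_score(y_test, model.predict(X_test)))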

Module 2: Using MLflow in Azure Databricks

MLflow is an open-source platform for managing the machine learning lifecycle, natively supported in Azure Databricks.

  • Introduction

  • Key features of MLflow

  • Running experiments with MLflow

  • Registering and deploying models with MLflow

  • Exercise – Use MLflow in Azure Databricks
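
A minimal MLflow tracking sketch (hypothetical parameters and metrics): each run logs its parameters, metrics, and the trained model so runs can later be compared and the model registered.

    import mlflow
    import mlflow.sklearn
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    with mlflow.start_run(run_name="rf-baseline"):
        n_estimators = 100
        model = RandomForestClassifier(n_estimators=n_estimators, random_state=0)
        model.fit(X_train, y_train)

        # Log parameters, metrics, and the model artifact to the tracking server.
        mlflow.log_param("n_estimators", n_estimators)
        mlflow.log_metric("test_accuracy", model.score(X_test, y_test))
        mlflow.sklearn.log_model(model, "model")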

Module 3: Optimizing Hyperparameters in Azure Databricks

Hyperparameter optimization is a crucial step in machine learning. In Azure Databricks, you can use the Hyperopt library to automatically tune hyperparameters.

  • Introduction

  • Optimizing hyperparameters with Hyperopt

  • Reviewing Hyperopt trials

  • Scaling Hyperopt trials

  • Exercise – Optimize machine learning hyperparameters in Azure Databricks
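
A hedged sketch of hyperparameter tuning with Hyperopt: the objective function, search space, and number of evaluations are illustrative only. On Databricks, SparkTrials can replace Trials to distribute the evaluations across the cluster.

    from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

    def objective(params):
        # Hyperopt minimizes the loss, so return the negative accuracy.
        model = LogisticRegression(C=params["C"], max_iter=1000)
        accuracy = cross_val_score(model, X, y, cv=3).mean()
        return {"loss": -accuracy, "status": STATUS_OK}

    search_space = {"C": hp.loguniform("C", -4, 2)}

    best = fmin(
        fn=objective,
        space=search_space,
        algo=tpe.suggest,       # Tree-structured Parzen Estimator
        max_evals=20,
        trials=Trials(),        # on Databricks: SparkTrials(parallelism=4)
    )
    print("best hyperparameters:", best)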

Module 4: Using AutoML in Azure Databricks

AutoML in Azure Databricks simplifies the process of creating an effective machine learning model for your data.

  • Introduction

  • What is AutoML?

  • Using AutoML in the Azure Databricks UI

  • Running an AutoML experiment with code

  • Exercise – Use AutoML in Azure Databricks
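
AutoML can be started from the Databricks UI or from code. The following is an assumed sketch of the Python API (databricks.automl); the dataset, target column, and time budget are placeholders, and the exact API may vary by Databricks Runtime version.

    from databricks import automl
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Prepared training data (placeholder table name).
    train_df = spark.read.table("ml.churn_features")

    # Launch an AutoML classification experiment with a fixed time budget.
    summary = automl.classify(
        dataset=train_df,
        target_col="churned",        # placeholder label column
        timeout_minutes=30,
    )

    # Inspect the best trial found by AutoML.
    print(summary.best_trial.metrics)
    print(summary.best_trial.model_path)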

Module 5: Training Deep Learning Models in Azure Databricks

Deep learning leverages neural networks to train highly effective models for complex predictions, computer vision, natural language processing, and other AI workloads.

  • Introduction

  • Understanding deep learning concepts

  • Training models with PyTorch

  • Distributing PyTorch training with TorchDistributor

  • Exercise – Train deep learning models in Azure Databricks
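
A condensed sketch of single-node PyTorch training, with the same training function optionally handed to TorchDistributor for distributed execution (an assumed API available on recent Databricks Runtime ML versions; the model, data, and process counts are placeholders).

    import torch
    from torch import nn

    def train_fn(epochs: int = 3):
        # Toy regression data; in practice this would load real training data.
        X = torch.randn(256, 4)
        y = X.sum(dim=1, keepdim=True)

        model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
        loss_fn = nn.MSELoss()

        for epoch in range(epochs):
            optimizer.zero_grad()
            loss = loss_fn(model(X), y)
            loss.backward()
            optimizer.step()
            print(f"epoch {epoch}: loss={loss.item():.4f}")
        return model

    # Runs locally as plain PyTorch ...
    model = train_fn()

    # ... or distributed across the cluster with TorchDistributor:
    # from pyspark.ml.torch.distributor import TorchDistributor
    # model = TorchDistributor(num_processes=2, use_gpu=False).run(train_fn, 3)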

Module 6: Managing Machine Learning in Production with Azure Databricks

Machine learning enables data-driven decision-making and automation, but deploying models into production for real-time insights can be challenging. Azure Databricks streamlines this process by providing a unified platform to build, train, and deploy machine learning models at scale, supporting collaboration between data scientists and engineers.

  • Introduction

  • Automating data changes

  • Exploring model development

  • Deployment strategies for models

  • Model versioning and lifecycle management

  • Exercise – Manage a machine learning model
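
Model versioning and deployment in this module build on the MLflow Model Registry. A minimal, hedged sketch (model name and run ID are placeholders): register a logged model as a new version and load a specific version for scoring.

    import mlflow
    from mlflow.tracking import MlflowClient

    run_id = "..."  # ID of a finished MLflow run that logged a model (placeholder)

    # Register the model artifact from that run; repeated calls create new versions.
    result = mlflow.register_model(f"runs:/{run_id}/model", "churn_model")
    print("registered version:", result.version)

    # Annotate the version for lifecycle management.
    client = MlflowClient()
    client.update_model_version(
        name="churn_model",
        version=result.version,
        description="Baseline model from the seminar exercise.",
    )

    # Load a specific registered version for batch scoring.
    model = mlflow.pyfunc.load_model(f"models:/churn_model/{result.version}")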


CONTENT


In our two-day seminar on the topics "Implementing a Data Lakehouse Analytics Solution with Azure Databricks (Day 1)" and "Implementing a Machine Learning Solution with Azure Databricks (Day 2)", participants will acquire the essential skills to leverage the Azure Databricks platform.


Day 1: Implementing a Data Lakehouse Analytics Solution with Azure Databricks


Day 1 begins with a comprehensive introduction to Azure Databricks. Participants will explore the fundamental concepts and features of this cloud-based platform that enables the use of Apache Spark for large-scale data analytics. Through hands-on exercises, they will learn how to identify and effectively implement different Azure Databricks workloads. An important aspect of this unit is data governance using Unity Catalog and Microsoft Purview, which are essential for efficient and secure data management.


In the second unit, the focus is on data analysis with Azure Databricks. Participants will learn how to efficiently ingest data and work with Azure Databricks’ integrated data exploration tools. The use of DataFrame APIs to perform complex data analyses will also be covered. Through practical exercises, participants will gain the ability to identify patterns and insights from large datasets, enabling them to make data-driven decisions.


The third unit introduces participants to Apache Spark in more detail. They will learn how to create a Spark cluster and use Spark within notebooks to perform powerful data processing tasks. Working with data files and visualizing data will also be covered, which greatly simplifies data analysis. These skills are highly valuable for data engineers and analysts who need to process and analyze large amounts of data efficiently.


In the fourth unit, the focus shifts to managing and processing data with Delta Lake. Participants will learn how to manage ACID transactions and enforce schema compliance. By understanding data versioning and time travel in Delta Lake, they will gain valuable insights into ensuring data integrity and consistency. Practical exercises will allow participants to apply the theory and effectively utilize the functionalities of Delta Lake.

The fifth unit emphasizes the creation of data pipelines using Delta Live Tables. Participants will explore the advantages and capabilities of this technology to enable real-time processing and integration. Hands-on exercises will help participants develop robust and scalable data pipelines tailored to their specific use cases.


Finally, in the sixth unit of Day 1, participants will learn how to use Azure Databricks Workflows to deploy workloads. They will understand how to orchestrate and automate complex data processing pipelines. By exploring the key components and benefits of Azure Databricks Workflows, participants will be able to effectively implement and manage their data-driven applications.


Learning Objectives Day 1:

  • Understand Azure Databricks: Gain knowledge about the features and capabilities of Azure Databricks as a cloud-based data analytics service.

  • Perform data analysis: Learn how to carry out data analyses with Azure Databricks, including the integration of data from sources such as Azure Data Lake and Azure SQL Database.

  • Use collaborative notebooks: Learn how to use collaborative notebooks to perform exploratory data analysis (EDA) and visualize data.

  • Apply Apache Spark: Acquire practical knowledge in using Apache Spark within the Azure Databricks platform to process and analyze large datasets.

  • Manage data with Delta Lake: Understand Delta Lake functionalities, including ACID transactions and schema enforcement, to ensure data consistency and integrity.

  • Build data pipelines: Gain the ability to develop and implement data pipelines with Delta Live Tables for real-time data processing.

  • Orchestrate workloads: Acquire knowledge in deploying and automating complex workloads with Azure Databricks Workflows.

Day 2: Implementing a Machine Learning Solution with Azure Databricks


On Day 2 of the seminar "Implementing a Machine Learning Solution with Azure Databricks", participants will learn how to leverage the powerful Azure Databricks platform for developing and deploying machine learning solutions. Azure Databricks provides a cloud-based environment specifically designed for data analytics and machine learning. Participants will acquire knowledge about the integration of Apache Spark and will be able to use both online and offline large language models (LLMs) to develop scalable and effective AI applications.

This part of the seminar focuses on the fundamentals of machine learning in Azure Databricks. Participants will learn how to prepare data for machine learning projects, train models, and evaluate them to generate reliable predictions. MLflow will also be introduced as an open-source platform for managing the machine learning lifecycle, which is natively supported in Azure Databricks. Hands-on exercises will allow participants to strengthen their skills in using MLflow.

Participants will further deepen their knowledge in advanced topics such as hyperparameter optimization with Hyperopt and the use of AutoML. These techniques are critical for improving the efficiency and accuracy of machine learning models. In addition, they will learn how to train and distribute deep learning models with PyTorch to tackle complex tasks in fields such as computer vision and natural language processing.

Another key aspect of the seminar is managing machine learning in production with Azure Databricks. Participants will learn how to automate data changes, develop models, and implement appropriate strategies for model deployment and versioning. Through practical exercises and projects, they will have the opportunity to apply their knowledge to realistic scenarios.

This seminar provides a comprehensive introduction to implementing machine learning solutions with Azure Databricks and is ideal for professionals who want to expand their expertise in data science and artificial intelligence.

Learning Objectives Day 2:

  • Understand the basics of machine learning: Acquire knowledge of the core principles of machine learning and their application in Azure Databricks.

  • Prepare data: Develop skills in preparing data for machine learning projects in Azure Databricks.

  • Train models: Learn how to train and evaluate machine learning models in Azure Databricks.

  • Use MLflow: Gain familiarity with MLflow functionalities for managing the machine learning lifecycle, including running experiments and registering models.

  • Optimize hyperparameters: Learn how to optimize hyperparameters with Hyperopt and evaluate the results.

  • Automate with AutoML: Understand AutoML and learn how to effectively use it in Azure Databricks, both through the user interface and with code.

  • Train deep learning models: Learn how to train and distribute deep learning models with PyTorch in Azure Databricks.

  • Manage machine learning in production: Gain knowledge of automating data changes, model development, deployment strategies, and lifecycle management.
