In this training, participants will acquire the skills and knowledge needed to implement and manage data engineering workloads on Microsoft Azure, using Azure services such as Azure Synapse Analytics, Azure Data Lake Storage Gen2, Azure Stream Analytics, Azure Databricks, and others.
The training focuses on data engineering tasks such as orchestrating data transfer and transformation pipelines, working with data files in a data lake, creating and loading relational data warehouses, capturing real-time data streams, and tracking data assets and lineage.
All DP-203 participants receive free access to AZ-900: Microsoft Azure Fundamentals and DP-900: Microsoft Azure Data Fundamentals.
What you will learn
- Explore compute and storage options for data engineering workloads in Azure.
- Run interactive queries using serverless SQL pools.
- Perform data exploration and transformation in Azure Databricks.
- Explore, transform, and load data into the data warehouse using Apache Spark.
- Ingest and load data into the data warehouse.
- Transform data with Azure Data Factory or Azure Synapse Pipelines.
- Integrate data from notebooks with Azure Data Factory or Azure Synapse Pipelines.
- Support Hybrid Transactional Analytical Processing (HTAP) with Azure Synapse Link.
- Implement end-to-end security with Azure Synapse Analytics.
- Perform real-time stream processing with Stream Analytics.
- Create a stream processing solution with Event Hubs and Azure Databricks.
Who it is for
Primarily data professionals, data architects, and BI professionals who want to acquire the skills and knowledge of data engineering and of building analytical solutions using the data platform technologies available on Azure. Secondarily, data scientists and data analysts who work with analytical solutions built on Azure.
Prerequisites
To help participants prepare for the training, all DP-203 attendees receive free access to the AZ-900: Microsoft Azure Fundamentals and DP-900: Microsoft Azure Data Fundamentals courses on Algebra's digital self-learning platform (LMS), with video course material in Croatian.
Curriculum
- Module 1: Introduction to data engineering on Azure
After completing this module, students will be able to:
- Identify common data engineering tasks
- Describe common data engineering concepts
- Identify Azure services for data engineering
- Describe the key features and benefits of Azure Data Lake Storage Gen2
- Enable Azure Data Lake Storage Gen2 in an Azure Storage account
- Compare Azure Data Lake Storage Gen2 and Azure Blob storage
- Describe where Azure Data Lake Storage Gen2 fits in the stages of analytical processing
- Describe how Azure Data Lake Storage Gen2 is used in common analytical workloads
- Identify the business problems that Azure Synapse Analytics addresses
- Describe core capabilities of Azure Synapse Analytics
- Determine when to use Azure Synapse Analytics
- Identify capabilities and use cases for serverless SQL pools in Azure Synapse Analytics
- Query CSV, JSON, and Parquet files using a serverless SQL pool
- Create external database objects in a serverless SQL pool
- Use a CREATE EXTERNAL TABLE AS SELECT (CETAS) statement to transform data
- Encapsulate a CETAS statement in a stored procedure
- Include a data transformation stored procedure in a pipeline
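The CETAS pattern above can be sketched as follows. This is a plain-Python helper that composes a serverless SQL pool CETAS statement around an `OPENROWSET` query; the data source name, file format name, table name, and paths are hypothetical placeholders, not objects from the course labs.

```python
# Sketch of a CREATE EXTERNAL TABLE AS SELECT (CETAS) statement for a
# Synapse serverless SQL pool. All object names and paths are illustrative.
def build_cetas(table: str, location: str, select_sql: str) -> str:
    """Compose a CETAS statement that persists a query result to the lake."""
    return (
        f"CREATE EXTERNAL TABLE {table}\n"
        "WITH (\n"
        f"    LOCATION = '{location}',\n"
        "    DATA_SOURCE = my_lake_source,  -- hypothetical external data source\n"
        "    FILE_FORMAT = ParquetFormat    -- hypothetical external file format\n"
        ") AS\n"
        f"{select_sql};"
    )

stmt = build_cetas(
    table="cleaned_sales",
    location="/curated/sales/",
    select_sql=(
        "SELECT CAST(OrderDate AS date) AS order_date, SUM(Amount) AS total\n"
        "FROM OPENROWSET(BULK '/raw/sales/*.csv', DATA_SOURCE = 'my_lake_source',\n"
        "                FORMAT = 'CSV', PARSER_VERSION = '2.0', HEADER_ROW = TRUE) AS src\n"
        "GROUP BY CAST(OrderDate AS date)"
    ),
)
print(stmt)
```

Wrapping a statement like this in a stored procedure, and calling that procedure from a pipeline, is what the last two objectives above cover.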
- Understand lake database concepts and components
- Describe database templates in Azure Synapse Analytics
- Create a lake database
- Identify core features and capabilities of Apache Spark
- Configure a Spark pool in Azure Synapse Analytics
- Run code to load, analyze, and visualize data in a Spark notebook
- Use Apache Spark to modify and save data frames
- Partition data files for improved performance and scalability
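The file-partitioning objective refers to the Hive-style folder layout that Spark's `DataFrameWriter.partitionBy` produces (for example `year=2023/month=1/part-*.parquet`). As a minimal sketch, with made-up rows and column names, the grouping of rows into partition folders looks like this:

```python
from collections import defaultdict

def partition_paths(rows, keys, base="/data/sales"):
    """Group rows into Hive-style partition-folder paths (col=value/...)."""
    buckets = defaultdict(list)
    for row in rows:
        folder = "/".join(f"{k}={row[k]}" for k in keys)
        buckets[f"{base}/{folder}"].append(row)
    return dict(buckets)

rows = [
    {"year": 2023, "month": 1, "amount": 10.0},
    {"year": 2023, "month": 2, "amount": 7.5},
    {"year": 2023, "month": 1, "amount": 3.2},
]
layout = partition_paths(rows, keys=["year", "month"])
print(sorted(layout))  # two partition folders for the three rows
```

Queries that filter on the partition columns can then skip whole folders, which is where the performance and scalability benefit comes from.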
- Transform data with SQL
- Describe core features and capabilities of Delta Lake
- Create and use Delta Lake tables in a Synapse Analytics Spark pool
- Create Spark catalog tables for Delta Lake data
- Use Delta Lake tables for streaming data
- Query Delta Lake tables from a Synapse Analytics SQL pool
- Design a schema for a relational data warehouse
- Create fact, dimension, and staging tables
- Use SQL to load data into data warehouse tables
- Use SQL to query relational data warehouse tables
- Load staging tables in a data warehouse
- Load dimension tables in a data warehouse
- Load time dimensions in a data warehouse
- Load slowly changing dimensions in a data warehouse
- Load fact tables in a data warehouse
- Perform post-load optimizations in a data warehouse
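The slowly changing dimension objective above is typically implemented as a Type 2 load: when a tracked attribute changes, the current row is closed out and a new current row is inserted. A minimal plain-Python sketch of that logic, with an illustrative schema rather than the course's lab tables:

```python
from datetime import date

def apply_scd2(dim_rows, incoming, business_key, tracked, today):
    """Type 2 SCD merge: close changed current rows, append new versions."""
    for rec in incoming:
        current = next(
            (r for r in dim_rows
             if r[business_key] == rec[business_key] and r["is_current"]),
            None,
        )
        if current is None or any(current[c] != rec[c] for c in tracked):
            if current is not None:
                current["is_current"] = False   # close the old version
                current["end_date"] = today
            dim_rows.append({**rec, "start_date": today,
                             "end_date": None, "is_current": True})
    return dim_rows

dim = [{"customer_id": 1, "city": "Zagreb", "start_date": date(2022, 1, 1),
        "end_date": None, "is_current": True}]
dim = apply_scd2(dim, [{"customer_id": 1, "city": "Split"}],
                 business_key="customer_id", tracked=["city"],
                 today=date(2024, 5, 1))
print(len(dim), dim[-1]["city"], dim[0]["is_current"])  # 2 Split False
```

In the actual labs this pattern is expressed in T-SQL (for example with `MERGE` or insert/update pairs over staging tables), but the row-versioning logic is the same.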
- Describe core concepts for Azure Synapse Analytics pipelines
- Create a pipeline in Azure Synapse Studio
- Implement a data flow activity in a pipeline
- Initiate and monitor pipeline runs
- Describe notebook and pipeline integration
- Use a Synapse notebook activity in a pipeline
- Use parameters with a notebook activity
- Describe Hybrid Transactional / Analytical Processing patterns
- Identify Azure Synapse Link services for HTAP
- Configure an Azure Cosmos DB Account to use Azure Synapse Link
- Create an analytical store enabled container
- Create a linked service for Azure Cosmos DB
- Analyze linked data using Spark
- Analyze linked data using Synapse SQL
- Understand key concepts and capabilities of Azure Synapse Link for SQL
- Configure Azure Synapse Link for Azure SQL Database
- Configure Azure Synapse Link for Microsoft SQL Server
- Understand data streams
- Understand event processing
- Understand window functions
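Of the window functions covered here, the tumbling window is the simplest: fixed-size, non-overlapping windows over event time. A plain-Python sketch of a tumbling count, with made-up event timestamps in seconds:

```python
from collections import Counter

def tumbling_counts(event_times, window_seconds):
    """Count events per non-overlapping window of window_seconds."""
    counts = Counter()
    for t in event_times:
        window_start = (t // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(sorted(counts.items()))

events = [1, 4, 9, 12, 13, 27]
print(tumbling_counts(events, window_seconds=10))  # {0: 3, 10: 2, 20: 1}
```

In Azure Stream Analytics the same aggregation is written declaratively, e.g. `GROUP BY TumblingWindow(second, 10)`; hopping and sliding windows relax the non-overlapping constraint.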
- Get started with Azure Stream Analytics
- Describe common stream ingestion scenarios for Azure Synapse Analytics
- Configure inputs and outputs for an Azure Stream Analytics job
- Define a query to ingest real-time data into Azure Synapse Analytics
- Run a job to ingest real-time data, and consume that data in Azure Synapse Analytics
- Configure a Stream Analytics output for Power BI
- Use a Stream Analytics query to write data to Power BI
- Create a real-time data visualization in Power BI
- Evaluate whether Microsoft Purview is appropriate for data discovery and governance needs
- Describe how the features of Microsoft Purview work to provide data discovery and governance
- Catalog Azure Synapse Analytics database assets in Microsoft Purview
- Configure Microsoft Purview integration in Azure Synapse Analytics
- Search the Microsoft Purview catalog from Synapse Studio
- Track data lineage in Azure Synapse Analytics pipelines activities
- Provision an Azure Databricks workspace
- Identify core workloads and personas for Azure Databricks
- Describe key concepts of an Azure Databricks solution
- Describe key elements of the Apache Spark architecture
- Create and configure a Spark cluster
- Describe use cases for Spark
- Use Spark to process and analyze data stored in files
- Use Spark to visualize data
- Describe how Azure Databricks notebooks can be run in a pipeline
- Create an Azure Data Factory linked service for Azure Databricks
- Use a Notebook activity in a pipeline
- Pass parameters to a notebook
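The last three objectives come together in an Azure Data Factory pipeline activity of type `DatabricksNotebook`, which references a Databricks linked service and passes values through `baseParameters`. A sketch of that activity definition, built as a Python dict; the notebook path, linked service name, and parameter names are hypothetical placeholders:

```python
import json

# Illustrative ADF pipeline activity that runs an Azure Databricks notebook.
activity = {
    "name": "RunTransformNotebook",
    "type": "DatabricksNotebook",
    "linkedServiceName": {
        "referenceName": "AzureDatabricksLinkedService",  # hypothetical name
        "type": "LinkedServiceReference",
    },
    "typeProperties": {
        "notebookPath": "/Shared/transform_sales",  # hypothetical path
        "baseParameters": {
            # value resolved from a pipeline parameter at run time
            "folder": "@pipeline().parameters.inputFolder"
        },
    },
}
print(json.dumps(activity, indent=2))
```

Inside the notebook, the parameter is read with a Databricks widget (`dbutils.widgets.get("folder")`).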
Which certification it prepares you for
Certification exam: DP-203: Data Engineering on Microsoft Azure
Certification: Microsoft Certified: Azure Data Engineer Associate