Implementing Table-Level Access Control on Data Lake Tables with AWS Glue 5.0 and Lake Formation

AWS Glue 5.0 now supports fine-grained access control (FGAC) by integrating with AWS Lake Formation, enabling granular permissions at the table, column, and row levels on data lake resources. This enhanced control helps organizations meet governance and compliance requirements, especially when handling sensitive datasets.
Lake Formation allows for defining and enforcing permissions through familiar SQL-like GRANT and REVOKE commands, ensuring seamless policy enforcement across services like Amazon Athena, Amazon EMR (Spark), and Redshift Spectrum. With Glue 5.0, these rules are now honored during AWS Glue Spark jobs and interactive sessions, streamlining data security across the analytics stack.
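As a rough illustration of what a grant or revoke looks like outside the console, the sketch below builds the request that the boto3 Lake Formation client's grant_permissions and revoke_permissions calls accept. The role ARN, database name, and table name are placeholders rather than values from this walkthrough, and the helper names are made up for the example.

```python
from typing import Any, Dict, List, Optional


def table_select_request(principal_arn: str, database: str, table: str,
                         columns: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build a SELECT request usable with grant_permissions or revoke_permissions."""
    if columns:
        # Column-level scope: only the listed columns are covered.
        resource = {"TableWithColumns": {"DatabaseName": database,
                                         "Name": table,
                                         "ColumnNames": list(columns)}}
    else:
        # Table-level scope: the whole table is covered.
        resource = {"Table": {"DatabaseName": database, "Name": table}}
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": resource,
        "Permissions": ["SELECT"],
    }


def grant_select(request: Dict[str, Any], region: str = "eu-west-1") -> None:
    import boto3  # imported lazily so the builder above works without AWS access
    boto3.client("lakeformation", region_name=region).grant_permissions(**request)


def revoke_select(request: Dict[str, Any], region: str = "eu-west-1") -> None:
    import boto3
    boto3.client("lakeformation", region_name=region).revoke_permissions(**request)


# Placeholder principal and table names (assumptions for illustration only).
request = table_select_request(
    "arn:aws:iam::111122223333:role/GlueFgacJobRole",
    "demo_db", "products", ["product_id", "category"],
)
```

Calling grant_select(request) would apply the grant, and revoke_select(request) would remove it again; both need credentials with Lake Formation administrative rights.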
How FGAC Works in AWS Glue 5.0
Glue 5.0 introduces a dual-profile mechanism:
- User Profile: Executes your custom Spark code and builds the query plan. It has no direct S3 or Glue catalog access.
- System Profile: Uses a privileged role to enforce Lake Formation permissions, retrieve metadata, and launch executors.
The process:
- A user invokes StartJobRun on a Glue job enabled for Lake Formation.
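- The user driver runs in the user profile with a lean version of Spark that cannot launch tasks, request executors, or reach S3 and the Data Catalog. It only builds the job plan.
- The user driver sends that plan over an encrypted TLS channel to the system driver, which runs full Spark under the privileged identity and does not run user-submitted code.
- The system driver enforces access control, accesses data, and orchestrates executors.
- User code runs only on user-profile executors, while stages that read Lake Formation-protected tables or apply security filters run on system executors.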
This separation ensures user code cannot bypass access restrictions. Only the system profile can query tables with sensitive data.
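The routing idea can be pictured with a small toy model. This is not AWS Glue code and does not reflect its internals; the Stage class and assign_profile function are invented here purely to illustrate how work that touches protected data is kept apart from user code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    runs_user_code: bool         # stage contains user-submitted logic
    reads_protected_table: bool  # stage scans a Lake Formation-governed table


def assign_profile(stage: Stage) -> str:
    """Return the profile whose executors would run the stage in this toy model."""
    if stage.reads_protected_table and stage.runs_user_code:
        # In this model a planner must split such a stage before it can run.
        raise ValueError(f"stage {stage.name!r} mixes user code with a protected read")
    return "system" if stage.reads_protected_table else "user"


plan = [
    Stage("scan products with row filter", runs_user_code=False, reads_protected_table=True),
    Stage("apply user-defined transformation", runs_user_code=True, reads_protected_table=False),
]
routing = {stage.name: assign_profile(stage) for stage in plan}
```

In this model the filtered scan lands on the system profile and the transformation on the user profile, so user code only ever sees rows and columns that have already passed the policy.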
Enabling FGAC in AWS Glue Jobs
To activate FGAC in AWS Glue 5.0 ETL jobs:
- In the AWS Glue Console, select your ETL job and ensure the Glue Version is set to Glue 5.0 – Supports Spark 3.5, Scala 2, Python 3.
- Under Job parameters, add the key --enable-lakeformation-fine-grained-access with the value true.
- Save the configuration.
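The same settings can also be applied programmatically. The sketch below assembles a job definition for the boto3 Glue client's create_job call; the job name, role ARN, script location, worker type, and worker count are placeholder assumptions, and only the Glue version and the FGAC parameter come from the steps above.

```python
from typing import Any, Dict

FGAC_FLAG = "--enable-lakeformation-fine-grained-access"


def fgac_job_definition(name: str, role_arn: str, script_s3_path: str) -> Dict[str, Any]:
    """Build create_job keyword arguments for a Glue 5.0 job with FGAC enabled."""
    return {
        "Name": name,
        "Role": role_arn,
        "GlueVersion": "5.0",
        "Command": {
            "Name": "glueetl",              # Spark ETL job type
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        # Job parameters are passed as strings, so the value is "true", not True.
        "DefaultArguments": {FGAC_FLAG: "true"},
        "WorkerType": "G.1X",               # assumed sizing, adjust as needed
        "NumberOfWorkers": 2,
    }


def create_job(definition: Dict[str, Any], region: str = "eu-west-1") -> Dict[str, Any]:
    import boto3  # imported lazily so the builder above works without AWS access
    return boto3.client("glue", region_name=region).create_job(**definition)


# Placeholder names (assumptions for illustration only).
definition = fgac_job_definition(
    "fgac-demo-job",
    "arn:aws:iam::111122223333:role/GlueFgacJobRole",
    "s3://example-bucket/scripts/fgac_demo.py",
)
```

Passing definition to create_job registers the job; keeping the builder separate makes it easy to check the arguments before anything is sent to AWS.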
For interactive notebooks, use the configuration cell:
%%configure
{"--enable-lakeformation-fine-grained-access":"true", "glue_version":"5.0"}
Example Use Case: CSV and Iceberg Tables
The blog demonstrates a typical scenario:
- S3 bucket creation with CSV dataset.
- Data Catalog tables defined for both CSV and Iceberg formats via AWS Glue and Athena CTAS.
- Lake Formation permissions applied to enforce FGAC across both table types.
- Glue PySpark jobs executed to read data with FGAC enforced, and write filtered results back to S3.
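For the Iceberg table, an Athena CTAS statement along the following lines is one way to derive it from the CSV-backed table. The database, table names, and S3 locations are hypothetical, and the helper functions are invented for this sketch; the table properties follow Athena's Iceberg CTAS syntax.

```python
def iceberg_ctas_sql(database: str, source_table: str,
                     target_table: str, location: str) -> str:
    """Build an Athena CTAS statement that copies a table into Iceberg format."""
    return (
        f"CREATE TABLE {database}.{target_table} "
        f"WITH (table_type = 'ICEBERG', is_external = false, location = '{location}') "
        f"AS SELECT * FROM {database}.{source_table}"
    )


def run_ctas(sql: str, output_location: str, region: str = "eu-west-1") -> str:
    """Submit the statement to Athena and return the query execution ID."""
    import boto3  # imported lazily so the SQL builder works without AWS access
    athena = boto3.client("athena", region_name=region)
    response = athena.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


# Placeholder names (assumptions for illustration only).
sql = iceberg_ctas_sql("demo_db", "products_csv", "products_iceberg",
                       "s3://example-bucket/iceberg/products/")
```

run_ctas(sql, "s3://example-bucket/athena-results/") would start the query; Athena runs it asynchronously, so the returned ID has to be polled to confirm completion.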
The sample dataset includes:
- product_id
- category
- product_name
- quantity_available
- last_update_time
- op (operation type: I/U/D)
After setting up Lake Formation’s row- and column-level filters, two Glue jobs demonstrate compliance with access policies using PySpark scripts.
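A job script of this kind can be very small, because the filtering happens in Lake Formation rather than in the code. The following is a minimal sketch, not the scripts from the walkthrough: the parameter names --database, --table, and --output_path are assumptions, and Iceberg tables may need additional Spark catalog settings that are not shown here.

```python
import argparse
from typing import List


def parse_args(argv: List[str]) -> argparse.Namespace:
    """Read the job parameters this sketch expects (names are assumptions)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--database", required=True)
    parser.add_argument("--table", required=True)
    parser.add_argument("--output_path", required=True)
    # Glue passes extra arguments such as --JOB_NAME; ignore anything unknown.
    args, _unknown = parser.parse_known_args(argv)
    return args


def build_query(database: str, table: str) -> str:
    # No security-related WHERE clause or column list is written here:
    # the intent is that Lake Formation filters decide what comes back.
    return f"SELECT * FROM {database}.{table}"


def run_job(argv: List[str]) -> None:
    from pyspark.sql import SparkSession  # provided by the Glue runtime

    args = parse_args(argv)
    spark = SparkSession.builder.appName("fgac-read-demo").getOrCreate()
    df = spark.sql(build_query(args.database, args.table))
    # Write whatever the policies allowed this role to read.
    df.write.mode("overwrite").parquet(args.output_path)

# In the Glue script itself, finish with: run_job(sys.argv[1:]) after importing sys.
```

Running the same script under two roles with different Lake Formation grants should produce different outputs without any change to the code, which is the behaviour the two jobs are meant to demonstrate.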
Step-by-Step Implementation
Prerequisites
Before starting, ensure:
- You have an AWS account with IAM roles that can manage Amazon S3, AWS Glue, the Data Catalog, and Lake Formation
- Lake Formation is configured, and you are a data lake administrator or have equivalent admin permissions
- You are working in a supported AWS Region (example: eu-west-1)
Implementation Outline
- S3 Bucket and Dataset
  - Upload input CSV data.
- Data Catalog Tables
  - Register a standard table and an Iceberg table via Athena CTAS.
- Lake Formation Policies
  - Define row and column filters for both CSV and Iceberg tables to enforce FGAC.
- Glue Job Configuration
  - Enable the --enable-lakeformation-fine-grained-access job parameter.
  - Run Glue PySpark scripts that respect Lake Formation rules.
- Result Validation
  - Confirm from the job output that only permitted data is read or written, as the policies require.
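The Lake Formation policy step can be sketched with boto3 as a data cells filter, which combines a row expression with a column list, followed by a SELECT grant on that filter. The account ID, role ARN, database and table names, filter name, and the category value in the row expression are all placeholders chosen for illustration; only the column names come from the sample dataset.

```python
from typing import Any, Dict, List


def data_cells_filter_request(account_id: str, database: str, table: str,
                              filter_name: str, row_expression: str,
                              columns: List[str]) -> Dict[str, Any]:
    """Build create_data_cells_filter arguments: row filter plus allowed columns."""
    return {
        "TableData": {
            "TableCatalogId": account_id,
            "DatabaseName": database,
            "TableName": table,
            "Name": filter_name,
            "RowFilter": {"FilterExpression": row_expression},
            "ColumnNames": list(columns),
        }
    }


def filter_grant_request(principal_arn: str, account_id: str, database: str,
                         table: str, filter_name: str) -> Dict[str, Any]:
    """Build grant_permissions arguments giving SELECT through the filter only."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"DataCellsFilter": {"TableCatalogId": account_id,
                                         "DatabaseName": database,
                                         "TableName": table,
                                         "Name": filter_name}},
        "Permissions": ["SELECT"],
    }


def apply_policy(filter_request: Dict[str, Any], grant_request: Dict[str, Any],
                 region: str = "eu-west-1") -> None:
    import boto3  # imported lazily so the builders work without AWS access
    lf = boto3.client("lakeformation", region_name=region)
    lf.create_data_cells_filter(**filter_request)
    lf.grant_permissions(**grant_request)


# Placeholder values (assumptions for illustration only).
filter_request = data_cells_filter_request(
    "111122223333", "demo_db", "products", "furniture_only",
    "category = 'Furniture'",
    ["product_id", "category", "product_name", "quantity_available"],
)
grant_request = filter_grant_request(
    "arn:aws:iam::111122223333:role/GlueFgacJobRole",
    "111122223333", "demo_db", "products", "furniture_only",
)
```

With a policy like this in place, a job running under the granted role would be expected to see only the matching rows and the four listed columns, leaving last_update_time and op hidden.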
Conclusion
With AWS Glue 5.0, integrating Lake Formation's FGAC brings unified, secure data access to your Spark ETL pipelines. The mechanism ensures:
- Centralized policy enforcement across services
- Transparent, secure separation between user and system execution profiles
- FGAC support across table formats, including the CSV and Iceberg tables used here
Adopting this integration enhances your data lake governance, ensures compliance, and minimizes risk when working with sensitive data.