Extract Generation - Synthetic Data Alignment¶
Home > Packages > Extract Generation - Synthetic Data Alignment
Related: Extract Generation | Synthetic Data Generation
This document describes how the Extract Generation package has been aligned to work with the Ingenious Fabric synthetic data tables.
Overview¶
The Extract Generation package sample configurations have been updated to work seamlessly with the synthetic data tables generated by the synthetic_data_generation package. This alignment ensures that extract samples work out-of-the-box without requiring modifications to the synthetic data generation process.
Alignment Approach¶
1. Compatibility Views¶
Created database views that map synthetic data tables to the schema expected by the original extract samples:
dbo.v_customers_mapped- Maps the synthetic customers table to the expected format- Combines
first_nameandlast_nameintocustomer_name - Maps
address_line1toaddress -
Maps
registration_datetocreated_date -
dbo.v_transactions- Creates a transaction-like view from orders and order_items - Uses
order_item_idastransaction_id - Joins orders and order_items to simulate transaction data
-
Maintains compatibility with transaction-based extracts
-
reporting.v_sales_summary- Sales analytics view based on orders data - Aggregates order data by year, month, and country
-
Uses order status to filter completed transactions
-
reporting.v_product_sales_summary- Product performance analytics - Aggregates sales by product category and brand
- Provides units sold and revenue metrics
2. New Sample Configurations¶
Added extract configurations specifically designed for synthetic data tables:
| Extract Name | Source | Description |
|---|---|---|
| SAMPLE_CUSTOMERS_DAILY | customers table | Full customer data export |
| SAMPLE_PRODUCTS_CATALOG | products table | Product catalog in Parquet format |
| SAMPLE_ORDERS_EXPORT | orders table | Daily orders with GZIP compression |
| SAMPLE_ORDER_ITEMS_EXPORT | order_items table | Order items with file splitting |
| SAMPLE_CUSTOMER_SEGMENTS | sp_generate_customer_segment_report | Customer analytics report |
| SAMPLE_PRODUCT_SALES | v_product_sales_summary | Product sales analysis |
3. Stored Procedures¶
Created stored procedures that work with synthetic data:
finance.sp_generate_financial_report- Quarterly financial summarydbo.sp_validate_orders_extract- Order data validationreporting.sp_generate_customer_segment_report- Customer segmentation analysis
File Structure¶
extract_generation/
├── ddl_scripts/
│ └── warehouse/
│ ├── 009_sample_data_insert.sql # Original samples (legacy)
│ ├── 009_sample_data_insert_aligned.sql # Synthetic data aligned samples
│ └── 010_synthetic_data_alignment_views.sql # Compatibility views
Usage¶
1. Deploy the Alignment Views¶
First, execute the alignment views to create the compatibility layer:
2. Load Sample Configurations¶
Use the aligned sample configurations:
3. Run Extracts¶
The extracts will now work seamlessly with synthetic data:
# Example: Extract customers data
extract_name = "SAMPLE_CUSTOMERS_DAILY"
# Will extract from the actual customers table with all synthetic data columns
Benefits¶
- No Changes to Synthetic Data - Preserves the synthetic data generation as-is
- Backward Compatibility - Original extract samples still available
- Flexibility - Can extract either raw synthetic data or mapped views
- Real-World Testing - Extract configurations work with realistic data volumes
Example Extracts¶
Customer Data Export¶
-- Uses actual customers table
SELECT * FROM [dbo].[customers]
-- Output: All synthetic data columns including first_name, last_name, etc.
Legacy Format Export¶
-- Uses compatibility view
SELECT * FROM [dbo].[v_customers_mapped]
-- Output: customer_name (combined), address (mapped), etc.
Sales Analytics¶
-- Uses orders and customers data
SELECT * FROM [reporting].[v_sales_summary]
-- Output: Aggregated sales metrics by period and country
Best Practices¶
- Use Raw Tables - When possible, configure extracts to use raw synthetic data tables
- Use Views for Compatibility - Only use compatibility views when required by downstream systems
- Test with Real Volumes - Synthetic data provides realistic data volumes for performance testing
- Validate Extracts - Use the validation procedures to ensure data quality
Migration Guide¶
If you have existing extract configurations expecting the old schema:
- Update to use compatibility views (e.g.,
v_customers_mappedinstead ofcustomers) - Or update your downstream processes to handle the new schema
- Consider using the
output_formatand column mapping features for transformation
Future Enhancements¶
- Additional compatibility views for other synthetic data scenarios
- Column-level mapping configuration in extract details
- Automatic schema detection and mapping
- Integration with data quality validation framework