U-SQL Scripts for Processing a TPC-DS Data Set
The U-SQL scripts for processing a TPC-DS data set demonstrate how to use Azure Data Lake Analytics to prepare raw data for import into an Azure Analysis Services data model. For a detailed discussion, see the blog article “Using Azure Analysis Services on Top of Azure Data Lake Storage” on the Analysis Services Team Blog.
To use these scripts, you must first generate the TPC-DS data set with the dsdgen tool, which is available as source code from the TPC-DS web site. Run dsdgen with /PARALLEL 100 and with /CHILD ids ranging from 1 to 100 so that the generated source files follow the expected file naming conventions. Then place the source files in an Azure Blob Storage account, as discussed in “Building an Azure Analysis Services Model on Top of Azure Blob Storage—Part 2” on the Analysis Services Team Blog. Finally, edit the U-SQL scripts and replace the storage account placeholder (@<blob storage account name>) with the name of your storage account.
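The preparation steps above can be scripted. The following Python sketch is illustrative only and rests on assumptions that this README does not state: the dsdgen executable is invoked as `dsdgen` and accepts the /PARALLEL and /CHILD options exactly as written above, the U-SQL scripts use the `.usql` file extension, and the placeholder appears in them as the literal text `<blob storage account name>` following the `@`. The helper names `dsdgen_commands` and `replace_account_placeholder` are hypothetical, and other dsdgen options (such as the scale factor) are deliberately left out.

```python
from __future__ import annotations

import pathlib

# ASSUMPTION: the scripts contain this literal text after the "@" sign.
PLACEHOLDER = "<blob storage account name>"


def dsdgen_commands(parallel: int = 100) -> list[str]:
    """Build one dsdgen command line per child id, from 1 to `parallel`.

    Only /PARALLEL and /CHILD come from the README. The executable name is
    an assumption, and options such as the scale factor are not included.
    """
    return [
        f"dsdgen /PARALLEL {parallel} /CHILD {child}"
        for child in range(1, parallel + 1)
    ]


def replace_account_placeholder(script_dir: str, account: str) -> int:
    """Replace the storage account placeholder in every .usql file under
    script_dir (recursively) and return the number of files changed.

    ASSUMPTION: the U-SQL scripts use the .usql extension.
    """
    changed = 0
    for path in pathlib.Path(script_dir).rglob("*.usql"):
        text = path.read_text(encoding="utf-8")
        if PLACEHOLDER in text:
            # The "@" before the placeholder is left untouched.
            path.write_text(text.replace(PLACEHOLDER, account), encoding="utf-8")
            changed += 1
    return changed
```

For example, `replace_account_placeholder("all_single", "mystorageaccount")` would update every script in the all_single subfolder, where `mystorageaccount` stands in for your own account name. Review the edited scripts afterwards, because this README does not describe the exact form of the storage paths they contain.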
The subfolders containing the U-SQL scripts highlight different scenarios:
* all_single: These scripts create a single csv file per table containing all of the source data.
* large_multiple: These scripts create 4 csv files for each of the large tables (catalog_returns, catalog_sales, inventory, store_returns, store_sales, web_returns, and web_sales) and a single csv file for each of the remaining tables.
* last_available_year: These scripts create a single csv file per table containing only the source data for the last year in the data set, which is 2003.
* modelling: These scripts create a data set for modelling purposes, with a single csv file per table containing up to 100 rows of data.