U-SQL Scripts for Processing a TPC-DS Data Set
The U-SQL scripts for processing a TPC-DS data set demonstrate how to use Azure Data Lake Analytics to prepare raw data for import into an Azure Analysis Services data model. For a detailed discussion, see the blog article "Using Azure Analysis Services on Top of Azure Data Lake Storage" on the Analysis Services Team Blog.
To use these scripts, the TPC-DS data set must first be generated with the dsdgen tool, which can be downloaded as source code from the TPC-DS web site. Run the dsdgen tool with /PARALLEL 100 and /CHILD ids ranging from 1 to 100 to generate the source files with the expected file naming conventions, and place the source files in an Azure Blob Storage account, as discussed in "Building an Azure Analysis Services Model on Top of Azure Blob Storage, Part 2" on the Analysis Services Team Blog. Finally, edit the U-SQL scripts and replace the storage account placeholder (@<blob storage account name>) with your actual storage account.
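The dsdgen step above can be sketched as a small shell loop that emits one command per child stream. The /SCALE value, the output directory, and the command file name are assumptions here; dsdgen itself must be built from the TPC-DS toolkit sources, and its flags should be checked against your build.

```shell
# Sketch only: write out the dsdgen invocation for each of the 100
# parallel child streams described above. /SCALE 100 and ./tpcds_data
# are assumed values; adjust them for your environment.
for child in $(seq 1 100); do
  echo "dsdgen /SCALE 100 /PARALLEL 100 /CHILD $child /DIR ./tpcds_data"
done > dsdgen_commands.txt
```

Emitting the commands to a file first makes it easy to review the scale factor and paths before actually running the generator.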
The subfolders containing the U-SQL scripts highlight different scenarios:
* all_single - These scripts create a single csv file per table containing all the source data.
* large_multiple - These scripts create 4 csv files for each of the large tables (catalog_returns, catalog_sales, inventory, store_returns, store_sales, web_returns, and web_sales) and a single csv file for each of the remaining tables.
* last_available_year - These scripts create a single csv file per table containing only the source data for the last year in the data set, which is the year 2003.
* modelling - These scripts create a data set for modelling purposes with a single csv file per table containing up to 100 rows of data.
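As a sketch of the pattern the scripts in these folders follow, a single-file variant for one table might look like the U-SQL below. The container name, column list, and file paths are illustrative assumptions, not the actual script contents; the real scripts define the full TPC-DS schemas.

```
// Read all dsdgen chunks for store_sales from blob storage.
// The container name and the abbreviated schema are placeholders.
@store_sales =
    EXTRACT ss_sold_date_sk int?,
            ss_item_sk      int?,
            ss_net_paid     decimal?
    FROM "wasb://tpcds@<blob storage account name>.blob.core.windows.net/store_sales_{*}_100.dat"
    USING Extractors.Text(delimiter: '|');

// Write everything into one csv file, as in the all_single scenario.
OUTPUT @store_sales
    TO "/tpcds/store_sales.csv"
    USING Outputters.Csv();
```

The file-set pattern `{*}` lets a single EXTRACT statement read all 100 parallel chunks that dsdgen produced for the table.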