Analysis-Services/UsqlScripts/readme.rtf

{\rtf1\ansi\ansicpg1252\deff0\nouicompat\deflang1033\deflangfe1033{\fonttbl{\f0\fswiss\fprq2\fcharset0 Calibri Light;}{\f1\fswiss\fprq2\fcharset0 Calibri;}{\f2\fnil\fcharset2 Symbol;}}
{\colortbl ;\red0\green0\blue255;\red5\green99\blue193;}
{\*\generator Riched20 10.0.15063}{\*\mmathPr\mdispDef1\mwrapIndent1440 }\viewkind4\uc1
\pard\widctlpar\expndtw-10\kerning28\f0\fs56 U-SQL Scripts for Processing a TPC-DS Data Set\par
\pard\widctlpar\sa160\sl252\slmult1\expndtw0\kerning0\f1\fs22 The U-SQL scripts for processing a TPC-DS data set demonstrate how to use Azure Data Lake Analytics to prepare raw data for import into an Azure Analysis Services data model. For a detailed discussion, see the blog article \ldblquote Using Azure Analysis Services on Top of Azure Data Lake Storage\rdblquote on the {{\field{\*\fldinst{HYPERLINK "https://blogs.msdn.microsoft.com/analysisservices/" }}{\fldrslt{\ul\cf1\cf2\ul Analysis Services Team Blog}}}}\f1\fs22 .\par
To use these scripts, first generate the TPC-DS data set with the dsdgen tool, which can be downloaded as source code from the {{\field{\*\fldinst{HYPERLINK "http://www.tpc.org" }}{\fldrslt{\ul\cf1\cf2\ul TPC-DS web site}}}}\f1\fs22 . Run dsdgen with /PARALLEL 100 and /CHILD ids ranging from 1 to 100 to generate the source files with the expected file naming conventions, then place the source files in an Azure Blob Storage account, as discussed in \ldblquote Building an Azure Analysis Services Model on Top of Azure Blob Storage\emdash Part 2\rdblquote on the {{\field{\*\fldinst{HYPERLINK "https://blogs.msdn.microsoft.com/analysisservices/" }}{\fldrslt{\ul\cf1\cf2\ul Analysis Services Team Blog}}}}\f1\fs22 . Finally, edit the U-SQL scripts and replace the storage account placeholder (@<blob storage account name>) with the name of your storage account.\par
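As an illustration of the dsdgen invocation described above, the following Python sketch prints one command line per child id. It is not part of the original scripts: the executable name dsdgen and the helper dsdgen_commands are assumptions made for this example, and only the /PARALLEL and /CHILD options mentioned above are included, so any other options your environment needs would have to be added.\par

```python
# Sketch: build the 100 dsdgen command lines described in the text.
# Assumption: the tool is invoked as "dsdgen"; other options that a real
# run may need (for example scale or output location) are left out here.

PARALLEL = 100  # total number of chunks, matching /PARALLEL 100


def dsdgen_commands(parallel=PARALLEL, executable="dsdgen"):
    """Return one command line per child id, from 1 through parallel."""
    return [
        "%s /PARALLEL %d /CHILD %d" % (executable, parallel, child)
        for child in range(1, parallel + 1)
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed or pasted into a batch file.
    for command in dsdgen_commands():
        print(command)
```

Each printed line corresponds to one child id, so running all of them should yield the full set of source files that the U-SQL scripts expect.\par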
The subfolders containing the U-SQL scripts highlight different scenarios:\par
\pard{\pntext\f2\'B7\tab}{\*\pn\pnlvlblt\pnf2\pnindent0{\pntxtb\'B7}}\fi-360\li720\sa160\sl252\slmult1\b all_single\b0\~\~\~These scripts create a single csv file per table containing all the source data.\par
{\pntext\f2\'B7\tab}\b large_multiple\b0 \~\~These scripts create four csv files for each of the large tables (catalog_returns, catalog_sales, inventory, store_returns, store_sales, web_returns, and web_sales) and a single csv file for each of the remaining tables.\par
{\pntext\f2\'B7\tab}\b last_available_year\b0\~\~\~These scripts create a single csv file per table containing only the source data for the last year in the data set, which is the year 2003.\par
{\pntext\f2\'B7\tab}\b modelling\b0 \~\~\~These scripts create a data set for modelling purposes with a single csv file per table containing up to 100 rows of data.\par
\pard\widctlpar\sa160\sl252\slmult1\par
\par
}