Data science tools already included in the Azure Linux Data Science VM:
- Microsoft R Open
- Anaconda Python distribution (v2.7 and v3.5), including popular data analysis libraries
- Jupyter Notebook (R, Python)
- Azure Storage Explorer
- Azure Command Line for managing Azure resources
- PostgreSQL database
- Machine learning tools
  - Computational Network Toolkit (CNTK): deep learning software from Microsoft Research
  - Vowpal Wabbit: a fast machine learning system supporting techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning
  - XGBoost: a tool providing a fast and accurate boosted tree implementation
  - Rattle (the R Analytical Tool To Learn Easily): a tool that makes it easy to get started with data analytics and machine learning in R, with GUI-based data exploration and modeling and automatic R code generation
- Azure SDK in Java, Python, node.js, Ruby, PHP
- Libraries in R and Python for use in Azure Machine Learning and other Azure services
- Development tools and editors (Eclipse, Emacs, gedit, vi)
Steps to deploy the Linux VM:
Step 1. Sign in to the Azure portal. New users get some free credit; I think it is around $200 at the moment, and you can use it however you want.
Once you are signed in to the Azure portal, you should see your dashboard.
Step 2. Click the ‘New’ button in the top left corner of the dashboard.
Step 3. Type ‘Data Science’ in the search bar and select ‘Linux Data Science Virtual Machine’
Step 4. Click ‘Create’. Make sure the deployment model is set to ‘Resource Manager’.
Step 5. Fill in the basic details. Make sure you get a green check mark for each field, which confirms that you meet the requirements for that field and that there are no naming conflicts. Once you are done, click ‘OK’.
Step 6. Select the size of the virtual machine you want to deploy and click ‘Select’. You can start with the smallest (cheapest) machine available and scale it up later as your requirements grow, i.e. when you want to run more complex and CPU/memory-intensive jobs.
Step 7. Review the additional settings. You can choose ‘Standard’ storage to start. Premium (SSD) storage comes in handy if you are running machine learning algorithms that need to iterate over data and read/write it multiple times from/to the storage disk. You can also change this later. Side note: Apache Spark is especially good when it comes to machine learning performance since it stores the data in memory, making the ML iterations much faster. Maybe I can discuss this in detail in a separate blog post.
As for the rest of the settings, if this is the first VM you are creating, the system will automatically create names for the Storage Account, VNet, Subnet, Public IP and Network Security Group. You can go with these or customize the names. If you have created VMs before in your account, you might already have existing resources, like a VNet, which you can reuse.
Click ‘OK’ once you are satisfied with the settings.
Step 8. Click ‘OK’ on the summary screen. The next screen shows the hourly pricing for the VM you are deploying. This is the rate you will be charged for every hour the VM is running. Keep an eye on your overall account usage, because if you have multiple cloud resources deployed and running all the time, the cost can add up pretty quickly.
For the machine I selected for this example, the price shown is around $50/month. There are multiple cheaper VM configurations available in Microsoft Azure at the moment, starting as low as $15/month.
Click ‘Purchase’ if the numbers look good.
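To get a feel for how the hourly rate turns into a monthly bill, here is a minimal sketch in Python. The hourly rate below is a made-up number for illustration, not an actual Azure price, and the calculation assumes you are only charged for the hours the VM is running and ignores storage or any other resources.

```python
# Average number of hours in a month (8760 hours per year / 12).
HOURS_PER_MONTH = 730


def monthly_cost(hourly_rate, hours_running=HOURS_PER_MONTH):
    """Estimate the monthly charge for a VM billed per hour of running time."""
    return round(hourly_rate * hours_running, 2)


# Hypothetical hourly rate in dollars; check the pricing screen for the real one.
rate = 0.07

always_on = monthly_cost(rate)
# Same VM, but only running 8 hours a day on 22 working days.
work_hours_only = monthly_cost(rate, hours_running=8 * 22)

print(f"Always on:       ${always_on:.2f}/month")
print(f"Work hours only: ${work_hours_only:.2f}/month")
```

The gap between the two numbers is the reason to shut the VM down when you are not using it.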
Step 9. The Azure portal should now display the dashboard with a notification in the top bar showing that the deployment has started, along with a tile on the dashboard for the new VM.
Once the deployment is complete, you should see a confirmation message in the top right corner of the portal.
Once you see that message, your Azure Linux VM is ready for you to log in.
To log in, follow the steps below:
Step 1. On your Microsoft Azure dashboard, click the new VM tile that has appeared, or click ‘Virtual Machines’ on the navigation bar on the left to see all the VMs in your account and click the new VM.
Step 2. From the VM screen, note down the Public IP address of your VM if you want to use that. If you want to use a familiar name for your VM instead, click Public IP Address > All Settings > Configuration.
Step 3. Specify the familiar DNS name you want to use instead of the IP. Skip this step if you want to use the IP. Copy the entire address, including the region and the ‘cloudapp.azure.com’ suffix.
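The full address is made up of the DNS name you chose, the region, and the ‘cloudapp.azure.com’ suffix. A small sketch of how the pieces fit together; the label `my-ds-vm` and the region `westus` are placeholders, so substitute your own values:

```python
def vm_hostname(dns_label, region):
    """Build the public DNS name: <label>.<region>.cloudapp.azure.com."""
    return f"{dns_label}.{region}.cloudapp.azure.com".lower()


# Placeholder label and region, for illustration only.
print(vm_hostname("my-ds-vm", "westus"))
# prints: my-ds-vm.westus.cloudapp.azure.com
```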
Step 4. You will need an SSH client installed on your local computer to log in. You can use PuTTY if you are on Windows.
Paste the hostname into the client, set the port to 22, and click ‘Open’.
Step 5. Type in your user name and password.
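If the login hangs or the client cannot connect, it helps to check whether port 22 on the VM is reachable at all. The following is a rough sketch using only the Python standard library. The host and user names in the comments are placeholders, and `ssh_command` just builds the command line for an OpenSSH client (what you would typically use on Linux or macOS instead of PuTTY); it does not run it.

```python
import socket


def ssh_port_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False


def ssh_command(user, host, port=22):
    """Return the argument list for logging in with an OpenSSH client."""
    return ["ssh", "-p", str(port), f"{user}@{host}"]


# Example usage (placeholder values, replace with your own):
#   ssh_port_open("my-ds-vm.westus.cloudapp.azure.com")
#   " ".join(ssh_command("myuser", "my-ds-vm.westus.cloudapp.azure.com"))
```

A `False` result from `ssh_port_open` usually means the VM is not running, the address is wrong, or something is blocking port 22, so those are the first things I would check.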
And there you go: your Azure Linux Data Science VM is ready. You can start exploring the various preinstalled data science tools, like Microsoft R and the Python libraries.
In the following post I will talk more about how you can get started with the various data science tools available in this VM.
- To open the R console, type ‘R’ and press the Return key
- To open the Python console, type ‘python3’ and press the Return key
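Once you are in the Python console, a quick way to see what is available is to ask the interpreter which libraries it can import and which commands are on the PATH. This is only a sketch: the module and command names listed below are my guesses at what the preinstalled tools are called, so adjust the lists to whatever you want to check.

```python
import importlib.util
import shutil
import sys


def available_modules(names):
    """Map each top-level module name to True if this interpreter can import it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def available_commands(names):
    """Map each command name to True if an executable with that name is on the PATH."""
    return {name: shutil.which(name) is not None for name in names}


print("Python", sys.version.split()[0])

# Assumed module and command names; not taken from the VM documentation.
modules = available_modules(["numpy", "pandas", "sklearn", "xgboost"])
commands = available_commands(["R", "jupyter", "psql"])

for name, found in {**modules, **commands}.items():
    print(f"{name:10s} {'found' if found else 'missing'}")
```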