Friday, February 14, 2014

Chemistry on the Amazon EC2



We are trying out the Amazon EC2 compute cloud for running computations in the Jensen Group. This is a note on how things are going so far.

It was actually extremely easy to set up. Within minutes of having created the Amazon Web Services (AWS) account, I had a free instance of Ubuntu 12.04.3 LTS up and running and was able to SSH into it.
You have access to one free virtual box and 750 free hours per month for the first year, so it is free to get started. My free instance had some Intel processor, 0.5 GB RAM and 8 GB disk space (I think the specs change from time to time).
 
I copied binaries for PHAISTOS (the program we are looking to run) over to the instance, and they ran without a hitch.
After trying out the free instance, I saved the image (you can do that via the web interface) and started every subsequent instance from that image, so no configuration was needed after the first time.
I mounted a folder located on the university server via SSHFS, and the instance writes its output data directly to that folder. This way I don't lose data if the instance is terminated, and I don't have to log in to the instance to check output or log files.
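
As a rough illustration of the SSHFS step, the sketch below only builds the mount command as an argument list; it does not run it. The user name, host name and both paths are made-up placeholders, not the ones we use, and the `reconnect` option is my own suggestion for long-running jobs rather than something from our actual setup.

```python
import shlex


def build_sshfs_command(user, host, remote_dir, mount_point):
    """Return the argument list for mounting remote_dir at mount_point via sshfs."""
    return [
        "sshfs",
        f"{user}@{host}:{remote_dir}",  # remote folder on the university server
        mount_point,                    # local folder on the EC2 instance
        "-o", "reconnect",              # try to re-establish a dropped connection
    ]


# Hypothetical names, for illustration only.
cmd = build_sshfs_command(
    "alice", "server.example.edu", "/data/phaistos_output", "/home/ubuntu/output"
)
print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` on an instance where sshfs is installed and the mount point exists.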

The biggest problem for me was the vast number of different instance types. You can select everything from memory-optimized to CPU, storage, interconnect or GPU instances, and each of these comes in several sizes. This takes a bit of research, and there is a lot of fine print. For example, Amazon doesn't specify the physical core count, but rather a "vCPU" number, which may or may not include hyperthreading (i.e. the vCPU number may be twice the number of physical cores you actually get!).
The price also varies with the location of the data center where you spawn your instances. I picked the N. Virginia data center, which was the cheapest, and I don't see why I would pick one of the others. The closest to me is located in Ireland, but it is about 15% more expensive. Asia seems to be even more expensive.
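
On the vCPU point: one way to check what you actually got is to compare the number of logical processors with the number of distinct physical cores reported in `/proc/cpuinfo`. The sketch below is a minimal parser, assuming the usual Linux x86 layout (blank-line-separated blocks with `processor`, `physical id` and `core id` fields). On a virtualized instance these fields may be missing or not reflect the real hardware, so treat the result as a hint only. The sample text is made up for illustration.

```python
def count_cores(cpuinfo_text):
    """Return (logical_processors, distinct_physical_cores) from cpuinfo-style text."""
    logical = 0
    physical = set()
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, value = line.split(":", 1)
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        # A physical core is identified by its socket and its core number.
        if "physical id" in fields and "core id" in fields:
            physical.add((fields["physical id"], fields["core id"]))
    return logical, len(physical)


# Made-up sample: 4 logical processors sharing 2 physical cores.
sample = """\
processor : 0
physical id : 0
core id : 0

processor : 1
physical id : 0
core id : 1

processor : 2
physical id : 0
core id : 0

processor : 3
physical id : 0
core id : 1
"""

logical, physical = count_cores(sample)
print(f"{logical} logical processors on {physical} physical cores")
```

On a real machine you would pass in the contents of `/proc/cpuinfo` instead of the sample string.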

Managing payment is also surprisingly easy. I had my own free account, which I used in the beginning. Jan Jensen created an account using the university billing account number. From there we used the Consolidated Billing option to link my account, so that my charges are billed to Jan's account.


Our current project is pretty much only CPU-intensive and barely requires any storage or memory, so naturally I had to benchmark the instance types that are CPU optimized.

I tested the largest (by CPU count) instances I could launch in the General Purpose (m3), Compute Optimized (c3) and previous-generation Compute Optimized (c1) tiers. These are the m3.2xlarge, c3.2xlarge and c1.xlarge instances.

In short these machines are:

name = core count (processor type) ~ hourly price (geographical location of server)

m3.2xlarge = 4 physical cores (Intel E5-2670 @ 2.60 GHz) ~ $0.90/hour (N. Virginia)
c3.2xlarge = 4 physical cores (Intel E5-2680v2 @ 2.80 GHz) ~ $0.60/hour (N. Virginia)
c1.xlarge = 8 physical cores* (Intel E5-2650 @ 2.00 GHz) ~ $0.58/hour (N. Virginia)

*From what I could gather, the c1.xlarge doesn't support hyperthreading. The m3.2xlarge is more expensive because it has faster disks and more RAM. Initially, I thought the m3.2xlarge had 8 physical cores, but it turns out I was merely fooled by the "vCPU" number and several pages of fine print in the pricing list.
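
To put the list above on a common footing, here is a small sketch that divides each hourly price by the physical core count. The numbers are the ones from the list; the function name is just something I made up for the example. Note that this is only the price of a core, not of the work it does, since the cores run at different clock speeds.

```python
# (physical cores, hourly price in USD) as listed above, N. Virginia.
instances = {
    "m3.2xlarge": (4, 0.90),
    "c3.2xlarge": (4, 0.60),
    "c1.xlarge": (8, 0.58),
}


def price_per_core_hour(cores, hourly_price):
    """Hourly price divided by the number of physical cores."""
    return hourly_price / cores


per_core = {
    name: price_per_core_hour(cores, price)
    for name, (cores, price) in instances.items()
}

# Print cheapest first.
for name, price in sorted(per_core.items(), key=lambda item: item[1]):
    print(f"{name}: ${price:.4f} per physical core-hour")
```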


As a test, I launched a Metropolis-Hastings simulation in PHAISTOS, starting from the native structure of Protein G with the PROFASI force field at 300 K and the same seed (666) in all the tests, and noted the iteration speed as a function of the number of cores.

The maximum total number of iterations per day (all threads, collectively) was comparable for the three instances (see below), maxing out at around 500-600 million per day.
The quad-core c3.2xlarge wins by a slight margin when it runs 8 threads with hyperthreading.
There was no real benefit in spawning more than 8 concurrent threads, either.



What is probably more important is the throughput for each USD you spend. Again, the c3.2xlarge wins (when running 8 threads with hyperthreading) and is the cheapest for our purpose.
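
For reference, throughput per USD here is simply iterations per day divided by the cost of running the instance for 24 hours. The sketch below shows the arithmetic. The 600 million iterations/day in the example is just the upper end of the range quoted above, used as a placeholder and not a measured figure for any particular instance; the $0.60/hour is the c3.2xlarge price from the list.

```python
def iterations_per_usd(iterations_per_day, hourly_price_usd):
    """Iterations obtained per USD, given daily throughput and the hourly price."""
    daily_cost = hourly_price_usd * 24
    return iterations_per_day / daily_cost


# Placeholder throughput (upper end of the quoted range), c3.2xlarge price.
example = iterations_per_usd(600e6, 0.60)
print(f"{example / 1e6:.1f} million iterations per USD")
```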






18 comments:

  1. Hi Jan

    Not fully official yet, but AWS is going to provide an Amber-gpu instance that anyone in the world can run (and pay for). Easy and very fast...

  2. Hi Anders

    Presumably the same could be done for GAMESS (US)?

    (The only problem I would anticipate is the Fortran part of the process.)

  3. For pretty much any program (GAMESS included) you would do exactly the same. Just compile the program as you would on any other platform, and run it just as you would on your own Linux box.

  4. Thanks for that, Anders. Will look into it.
