Jupyter Notebook and Plotting
by Arthur Messenger

I have been using Jupyter Notebook instead of PowerPoint slides for some of the advanced courses ROI offers. Notebooks have a lot of features that make teaching much more interactive and productive.

As with any large package, there are things I would like to do that are not instantly available, and a workaround needs to be created. This is a short blog post about one such workaround.

The original Introduction to Machine Learning course had some 40 to 50 Python scripts showing how the different models work. (The current version has been divided into two different courses, each with about 40 Python scripts.) All of the scripts were designed and tested using scikit-learn.org tools with Anaconda 3’s Python, both on the bash command line and in Python 3 notebooks. We found only one glitch when moving to %run cells inside a Jupyter Notebook.

The program in Figure 1 has been modified for this blog post to show only the glitch being addressed.

This script displays the graphic in a separate window and writes the prompt “Press enter key to exit!” to the command line. When showing multiple graphics, “exit” was replaced with “continue.”

When the code is moved to a Jupyter Notebook, it works the same way. See Figure 2: graphic.py executed in a Jupyter Notebook.

Jupyter displays the prompt, “Press enter key to exit!” followed by an input box.

The green arrow points to the *, which indicates the program has not terminated. This is expected, as the program is waiting for the enter key to be tapped. All is more or less fine.

However, the question is, “Why the ‘Press enter key to exit!’ prompt?” It is confusing, since the reader cannot tell whether the script exits or Jupyter exits the notebook. The prompt is not needed here, so let’s get rid of it.

Remove line 36, the input line, and the program terminates and Jupyter moves to the next cell. Not so fine: I now have two different programs, one for the bash command line and one for the cell environment in Jupyter.

We need some way to tell what environment we are in. There are two solutions for this; both a little off the normal path.

The first one is to use a small function that checks whether sys.stdout is attached to a tty (terminal interface). See Figure 3: is_command_line().

The try/except block is needed because Jupyter Notebook does not implement .fileno().
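Figure 3 is not reproduced here, but a minimal sketch of such a check (the details are my reconstruction, not the original figure) looks like this:

```python
import os
import sys

def is_command_line():
    """Return True when sys.stdout is attached to a tty (terminal)."""
    try:
        # ipykernel's stdout replacement does not implement .fileno(),
        # so this raises an exception inside a Jupyter Notebook.
        return os.isatty(sys.stdout.fileno())
    except Exception:
        return False
```

The prompt can then be wrapped in `if is_command_line(): input("Press enter key to exit!")`, so one script serves both environments.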

The second one is to check whether the program is running interactively under ipython. See Figure 4: is_ipython().

This one depends on get_ipython().config being False when the script is not running under ipython. The try/except block is needed because get_ipython is not available when running under plain python.
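Figure 4 is not reproduced here either; a sketch of the idea, assuming the standard get_ipython() hook that IPython injects, might look like this:

```python
def is_ipython():
    """Return True when running interactively under IPython/Jupyter."""
    try:
        # IPython injects get_ipython() into the namespace; under plain
        # python the name does not exist and a NameError is raised.
        return bool(get_ipython().config)
    except NameError:
        return False
```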

Both ways work, and they are equally “hacky.” They should only be used when the script has to run in more than one environment. Which one to use will depend on the aims of the script.



A Hands-On Comparison of Google Cloud Platform and Amazon Web Services
by Doug Rehnstrom

Google Cloud Platform and Amazon Web Services offer products for running applications, networking, and storing data in the cloud.  

There are many similarities. Both have data centers all over the world that are divided into regions and zones. Both allow you to create virtual machines, and both allow you to create software-defined networks called VPCs. Both have services for storing binary, relational, and NoSQL data. Both have services for data warehousing and analysis, and both have services for machine learning.

There are also many differences. GCP networks are global, while AWS networks are regional. AWS offers more services. GCP is a bit more automated.

I started using the cloud about 10 years ago and have been teaching AWS, GCP, and other cloud platforms since. Students will sometimes ask me which is better. I’m not in the opinion business, though; I’m in the teaching business. So, I decided to give you some hands-on labs with each platform.

You will learn more if you see how each works. Do the labs in order for each platform, and let me know which you think is better.

GCP labs:
Getting Started with GCP
GCP Networking
Compute Engine Virtual Machines
Automating Deployments with Kubernetes Engine

AWS labs:
Getting Started with AWS
AWS Networking
EC2 Virtual Machines
Automating Deployments with CodePipeline

These labs just touch the surface of the capabilities of GCP and AWS. If you would like to learn more, check out these courses provided by ROI Training.



3 Reasons to Choose Google Cloud Platform
by Doug Rehnstrom

While teaching Google Cloud Onboard events in DC and New York over the last couple of weeks, I was coincidentally asked the same thing at both events: “Give me a few technical reasons why I would choose Google over other cloud providers.”

1. The Network

When you create a network in Google Cloud Platform, at first it looks like all the other cloud providers. A network is a collection of subnets. When creating virtual machines, you pick a zone, and that zone determines which subnet the machine is in. Just like all the other providers, right? Wrong.

In GCP, networks are global and subnets are regional. With everyone else, networks are regional and subnets are zonal. What does that mean to me?

This allows you to put machines in data centers all over the world and treat them as if they were all on the same LAN. Machines in Asia can communicate with machines in the US via their internal IPs. This makes high performance, worldwide networking, high availability, and disaster recovery easy. You can simply deploy resources in multiple regions within the same project.

Because networks are global, you can create load balancers that balance traffic to machines all over the world. Google’s load balancers automatically route requests to machines closest to the user without you configuring anything. It just works the way it should work.

Google owns all the fiber connecting their data centers together. Once you are in the network, you can pass data between data centers without leaving the network.

2. App Engine

A student asked what management tools Google provides to help them manage their applications which require thousands of virtual machines.

Well, the short answer is, you don’t need to manage your machines at all. App Engine will do it for you.

App Engine deployment is completely automated with a single command. Applications contain one or more services. Multiple versions of each service can exist simultaneously. You can split traffic between versions for A/B testing. When deploying new versions, there is zero downtime. You can roll back to an older version in a second if you ever need to.

Auto scaling is completely automated. Instances start in a couple hundred milliseconds. Because instances start so quickly, App Engine applications can scale to zero instances when there is no traffic. When an application has zero instances, you are charged nothing. Thus, you don’t have to worry about stopping old versions of services over time because they clean themselves up. App Engine is designed to run at Google’s scale, which means it runs at everyone’s scale.

Load balancing is completely automated. You don’t configure anything; it just works. Health checks are completely automated. All requests are queued automatically, so you don’t have to worry about that. App Engine includes a free caching service, so you don’t have to set that up.

While other providers offer competing products, there really is nothing else like App Engine.

3. Security

All data stored in GCP is encrypted by default. There’s nothing to configure, and you couldn’t turn encryption off if you wanted to. Files are not saved onto a single disk; they are divided into chunks, and the chunks are saved onto different physical disks in a massively distributed file system.

All data passed between services within the network is also encrypted. Because Google owns all the fiber connecting its data centers, traffic between regions doesn’t leave the network.

Because you are running on the same infrastructure Google uses, you get their network security for free. So, denial-of-service protection and intrusion detection are just there.

For more details on Google Security, read the documentation at: https://cloud.google.com/security/.


Why Become a Certified Google Cloud Architect?
by Doug Rehnstrom

The life of a Cloud Architect…

A software architect’s job is to draw rectangles with arrows pointing at them.  

A cloud architect’s job is a little more complicated. First, you draw a computer, a phone, and a tablet on the left. Then, you draw a cloud. Then, you draw some rectangles to the right of the cloud. Lastly, you point arrows at things. Some architects will get fancy and strategically place a cylinder or two on the drawing. They might even draw rectangles within rectangles! Like this:

Sounds easy right? The trick is, you have to label the rectangles.  

If your only tool is a hammer, then every problem is a nail.

If you want to start using Google Cloud Platform, you might tell your IT guys to learn about Google Cloud infrastructure. They would likely go off and learn about Compute Engine and Networking. Then, they might fill in the rectangles as shown below:

If you told some programmers, go learn how to program on Google Cloud Platform, they might fill in the rectangles as shown here:

Both drawings might be “correct” in the sense that we could use either to get a system up and running. The question is, are we optimizing the use of the platform if we are only using one or two services?

Platform… what is he talking about?

Google Cloud Platform has many services: Compute Engine, App Engine, Dataflow, Dataproc, BigQuery, Pub/Sub, BigTable, and many more. To be a competent Google Cloud Platform Architect, you have to know what the services are, what they are intended to be used for, what they cost, and how to combine them to create solutions that are optimized. Optimized for what?  Cost, scalability, availability, durability, performance, etc.  

When someone takes the Google Cloud Architect certification exam, they are being tested on their ability to architect optimal systems. They are being tested on whether they know which service to use for which use cases. They are being tested on whether they can design a system that meets application performance requirements at the lowest cost.

Why being certified is important to your company.

Recently, a guy was complaining about his $4 million per year bill for his Hadoop cluster running on GCP. He didn’t have to be spending that much. A bit of architecture training could have saved his company, oh I don’t know, $3.5 million!

Send your IT people and your programmers to my Google Cloud Architect Certification Workshop. I’ll show them the right way to label the rectangles and help them pass the exam. Maybe we can even save you some money.



Understanding Denormalization for BigQuery
by Doug Rehnstrom


A long time ago in a galaxy far, far away...

In order to understand denormalization, we need to take a trip back in time; back to the last century. This was a time when CPU speeds were measured in MegaHertz and hard drives were sold by the MegaByte. Passing ruffians were sometimes seen saying “neee” to old ladies, and modems made funny noises when connecting to online services. Oh, these were dark days.

In these ancient times we normalized our databases. But why? It was simple really. Hard drive space was expensive and computers were slow. When saving data, we wanted to use as little space as possible, and data retrieval had to use as little compute power as possible. Normalization saved space. Data separated into many tables could be combined in different ways for flexible data retrieval. Indexes made querying from multiple tables fast and efficient.

It was, however, complicated. Sometimes databases were poorly designed. Other times, data requirements changed over time, causing a good design to deteriorate. Sometimes there were conflicting data requirements. A proper design for one use case might be a poor design for a different use case. And what about when you wanted to combine data from different databases or data sources that were not relational? Oh the humanity...

Neanderthals developed tools...

Then Google said, “Go west young man and throw hardware at the problem.”
“What do you mean?” asked the young prince.

If hard drives are slow, just connect a lot of them together, and combined, they will be faster. And don’t worry about indexes. Just dump the data in files and read the files with a few thousand computers.      

And the road to BigQuery was paved... 

Run, Forest, Run! 

When data is put in BigQuery, each field is stored in a separate file on a massively distributed file system. Storage is cheap; only a couple of cents per GB per month. Storage is plentiful; there is no limit to the amount of data that can be put into BigQuery. Storage is fast, super fast! Terabytes can be read in seconds.

Data processing is done on a massive cluster of computers which are separate from storage. Storage and compute are connected with a Petabit network. Processing is fast and plentiful. If the job is big, just use more machines!

Danger, Will Robinson! 

Ah, but there is a caveat. Joins are expensive in BigQuery. That doesn’t mean you can’t do a join; it just means there might be a more efficient way.

Denormalize the data! (said in a loud, deep voice with a little echo)

BigQuery tables support nested, hierarchical data. So, in addition to the usual data types like strings, numbers, and booleans, fields can be records, which are composite types made up of multiple fields. Fields can also be repeated, like an array. Thus, a field can be an array of a complex type. So, you don’t need two tables for a one-to-many relationship. Mash it together into one table.

Don’t store your orders in one table and order details in a different table. In the Orders table, create a field called Details, which is an array of complex records containing all the information about each item ordered. Now there is no need for a join, which means we can run an efficient query without an index.
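To make the shape concrete, here is a small sketch in plain Python, with dicts standing in for BigQuery records (the field names are illustrative, not from a real schema):

```python
# One denormalized order: the repeated "details" field replaces a
# separate OrderDetails table and the join that went with it.
order = {
    "order_id": 1001,
    "customer": "Acme Corp",
    "details": [  # a REPEATED RECORD field, in BigQuery terms
        {"item": "widget", "quantity": 3, "unit_price": 9.99},
        {"item": "gadget", "quantity": 1, "unit_price": 24.50},
    ],
}

# Everything needed to total the order is already in the row:
total = sum(d["quantity"] * d["unit_price"] for d in order["details"])
print(round(total, 2))  # 54.47
```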

“But doesn’t this make querying less flexible?” asked the young boy in the third row with the glasses and Steph Curry tee-shirt.

Yes, I guess that’s true. But storage is cheap and plentiful. So, just store the data multiple times in different ways when you want to query it in different ways.

“Heresy!” screamed the old man as he was dragged away clinging to his ergonomic keyboard and trackball.

Preparing for the Google Cloud Architect Certification – Getting Started
by Doug Rehnstrom


Cloud computing is all the rage these days, and getting certified is a great way to demonstrate your proficiency.  

The Google Certified Professional - Cloud Architect exam itself is very effective in making sure you are a practitioner, not just someone who memorized a bunch of terms and statistics. This is a true test of experience and solution-oriented thinking.

Here are some tips for preparing.


If you are new to Google Cloud Platform (GCP)

Some think cloud computing is just virtual machines running in someone else’s data center. While that is part of it, there is a lot more. Being a competent architect requires a broad understanding of cloud services.

To get an overview of the products offered by GCP, go to https://cloud.google.com. Spend a couple hours exploring the links and reading the case studies.


There’s no substitute for getting your hands dirty

Go to https://console.cloud.google.com, log in with your Google or Gmail account, and then sign up for the free trial. You will get a $300 credit that you can use to learn GCP. This credit is good for a year.

Once you have an account, do the following labs (don’t worry if you don’t understand everything).


Take a Class

The first class to take is a 1-day overview titled Google Cloud Fundamentals: Core Infrastructure.

To help defray the cost, use the promotional code DOUG when you register for the course, and you will get in for $99.

You can also take this course for free on Coursera, https://www.coursera.org/learn/gcp-fundamentals.

The second course is Architecting with Google Cloud Platform: Infrastructure. Come to the first course, and I will make sure you get a big discount on the second one too!

Soon we will also be releasing two new Google Cloud Certification Workshops. Stay tuned...


Next Steps

I’ve reached my word count for one post. Get these things done and I’ll follow up with another in a little while. 

Let me know how you are doing, doug@roitraining.com.


Preparing for the Gig Economy
by Steve Blais

The “gig economy”: that is supposedly where we are headed. There are predictions, based on a study by Intuit, that by 2020 over 40% of the workforce will be working “gigs” rather than in full-time permanent employment. This is nothing new. Thirty years ago, Tom Peters predicted the “Corporation of One,” in which everyone working for a company would be a consultant rather than an employee.