Get on the Node.js Express to Google Cloud Functions
by Doug Rehnstrom

What is Google Cloud Functions?

Cloud Functions is a serverless environment for deploying Node.js applications. It is completely no-ops: you don’t have to provision or manage servers. It is also inexpensive: it costs 40 cents per million requests, and the first 2 million requests each month are free.

Triggers

Cloud functions run in response to triggers. A trigger could be a change to a Cloud Storage bucket, a message sent to a Pub/Sub topic, or a request to a URL.

There are many examples. Monitor a bucket and when a picture is uploaded to the bucket, use the Vision API to label the picture. Or, monitor a Pub/Sub topic and when a message comes in, process it and send it to BigQuery.

Cloud Functions are Microservices

A microservice is a service that does one thing. So I got to thinking, why can’t my microservice be a web application? That’s just one thing.

Node.js Express

If I were going to build a Node.js web app, I would use a framework called Express. Node.js Express is similar to Python Flask or ASP.NET MVC.

Express allows you to define routes. Routes are URLs that point to functions that return data. One route could be a page request which returns HTML. Other routes could define API calls that return JSON data. In the end, it’s the same thing. In response to an HTTP request, run some code and return a string.
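
Since the comparison to Flask may help, here is the same routing idea sketched in Python Flask rather than in Express itself; it is only an illustration of routes, not code from the tutorial below.

    # Illustrative Flask sketch of "routes point to functions that return data".
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/")
    def home_page():
        # A page route that returns HTML
        return "<h1>Welcome to the home page</h1>"

    @app.route("/api/items")
    def list_items():
        # An API route that returns JSON data
        return jsonify(["apples", "oranges", "bananas"])

    if __name__ == "__main__":
        app.run()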

There are just a couple of tricks you need to know to deploy your Express app to Cloud Functions.

Do this tutorial. It walks you through creating an Express app. Once the Express app is built, it shows you how to expose it as a JavaScript function and then deploy it to Google Cloud Functions.

Be Careful Though

When your app goes viral, after 2 million requests, you will have to pay 40 cents every time another million requests come in!

Happy coding!

Comparing Architecture Costs on Google Cloud Platform
by Doug Rehnstrom

Let’s say you want to deploy a web application that has a moderate amount of traffic. Not Amazon.com or SnapChat—just your average company website. You want the site to be available all the time, so it must be deployed for high availability. You want it to autoscale to meet spikes in demand. You also need some place to store the data, so you need a database.

Traditional Architecture Utilizing IaaS

Let’s consider a traditional Web development architecture. You have a Java or .NET web application running on Linux or Windows machines with a MySQL database. Since you want high availability, you need two web servers and two database servers running all the time. You need a load balancer. You would also set up an autoscaler to turn machines on and off to meet demand. The architecture might look similar to what is shown below.

So, what would this cost? Let’s assume 10 GB of storage, which is the minimum for the database, and n1-standard-2 machines for both the web servers and the database servers (these machines have 2 CPUs and 7.5 GB RAM). Adding in a small charge for the load balancer, this comes to about $300 per month.

This would handle a reasonable amount of traffic. Obviously, if you have lots of traffic and spikes in demand, the cost would go up. If your database was larger, the price would go up.

Cloud Native Architecture Using PaaS

Let’s say you alter your application to use Google App Engine and use Google Cloud Datastore for the data. Now the architecture looks like this.

App Engine automatically takes care of autoscaling, health checks, and load balancing for you. Datastore is just there and highly available. In both cases, there is nothing to manage.

So, what would this cost? App Engine programs shut themselves off when not in use and scale automatically when they need to. Let’s say you averaged 3 instances per hour. This allows for dead time when there are zero instances, and bursts of traffic when more start up. That would cost about $67 per month. Datastore has a free tier. To be consistent with the MySQL use case above, 10 GB of Datastore data, plus let’s say a million reads and a million writes would be 2 or 3 dollars a month. This comes to about $70 per month. That’s less than a quarter the cost of the IaaS solution above.

Obviously, if your app goes viral and you turn into SnapChat the cost will go up, but then you go public, raise $8 billion in your IPO, and buy Venice Beach, CA. We’re just considering the case of an average company website.

Let’s Go Serverless

The cool kids don’t write websites in Java or .NET anymore. Let’s say you rewrite your application using a JavaScript framework like Angular or React. Since the front-end code runs in the browser, you don’t need a web server anymore. Just deploy the website to a Cloud Storage bucket. However, your front-end needs an API at the server that it can call to save or retrieve data. Since you are programming in JavaScript, just use Cloud Functions to provide that API. As always, you need to store your data—let’s use Datastore for that. This architecture now looks like this.

So, what would this cost? The Cloud Storage bucket would be free because there is a 5 GB free tier. Cloud Functions costs 40 cents per million requests, but you get 2 million requests per month for free. Uh, I’m going to go out on a limb and call that free. As before, if you assume 10 GB of Datastore data and a million reads and writes, that might cost a couple of dollars per month. If you add in some network traffic, maybe you’ll hit 5 dollars per month. Although, if you never go beyond a free tier, you may never get billed at all.

So, What’s the Moral of the Story?

Deploying your applications to Google Cloud Platform can be inexpensive, scalable, and highly available, if you do some work up front to take advantage of what is provided.

Want to learn more? Click here: Google Cloud Platform Training.

Please join my LinkedIn Google Cloud Student Resource Group.

 


Python 2.7 to Python 3.6 Conversion
by Arthur Messenger

Introduction

Python 2.7 is scheduled to reach end of life (EOL) sometime in the year 2020. Several companies will continue maintenance on 2.7 for some time. This brings about two problems…

First, no matter how hard they try, over time the versions will differ, and there is a strong possibility of vendor lock-in. Second, the reason Python changed from Python 2 to Python 3 is that some of the decisions in the design of Python 2 were making changes and upgrades to the language almost impossible.

Organizations now face a tough decision. Do we stay with Python 2.7 or do we move to Python 3.x?

When I first looked at this, the tool was 2to3. It crashed back then. It no longer crashes, but it refused to convert a very simple program, and I found it is no longer recommended.

I really wanted to know if there was any new work being done on making the conversion. I was told about a book, Python 2 and 3 Compatibility, from Apress. The book is about how to use the six and future libraries to write code that is executable by both Python 2 and Python 3. The ideas are well presented, and you can use them to write code that runs under both Python 2 and Python 3. The libraries add a large burden on the programmer, though; the code is not as clean as I would expect Python to be, and I did not see a clear path from this Python 2/3 hybrid to Python 3 only. If this were my only choice, I would maintain the Python 2 program and rewrite the code in Python 3. Fortunately, this is not the only solution.

In looking around, I found the site python-future.org. From the Overview:

It allows you to use a single, clean Python 3.x-compatible codebase to support both Python 2 and Python 3 with minimal overhead.

The general approach is to convert the program to run on Python 3 and then import modules that allow the script to run on Python 2. I see this as a possible method of moving from 2 to 3.

There are two stages: a “simple” conversion and a complex conversion. The site suggests doing them in stages. I always ran both stages, even when the simple stage alone would have completed the conversion.

At the moment, I am interested in moving a code base in Python 2.7 to Python 3.6. I have divided my review into four sections and will report on what I find in each section.

Figure 1: Review Sections shows the sequence I will follow.

Setup

I installed Anaconda 2 for Python 2 and Anaconda 3 for Python 3 on a MacBook Pro.

These installations modify the PATH variable to place the Anaconda bin directories first in the search path. I added two bash scripts, go2 and go3, shown in Figure 2 and Figure 3, so that I can switch between Python 2 (source go2) and Python 3 (source go3).

To use the methods of the site, you need to install the future package. This is done with the conda command, once for Python 2 and once for Python 3. See Figure 4: Install future.

I built the directory structure shown in Figure 5.

The Python 2 directory holds the original Python 2 program, and the Python 3 directory is where the converted script is written.

The best documentation I found on using the conversion program futurize was to execute futurize --help and read the output.

I chose to use the following options shown in Figure 6.

Print Command

Figure 7: print commands shows, on the left, the original Python 2-only version and, on the right, the Python 3 version with the additions that make it work in Python 2. The only change necessary to make it Python 3 only is to remove the from __future__ line. I have added blank lines and changed the version number to make it easier to read.
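
To make the idea concrete without reproducing the figure, a hypothetical before-and-after looks something like this:

    # Python 2 only:
    #     print "Hello", "world"
    #
    # After futurize -- runs under both Python 2 and Python 3:
    from __future__ import print_function

    print("Hello", "world")

    # To make this file Python 3 only, simply delete the from __future__ line.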

Input

Python 3 has dropped the Python 2 function input and renamed raw_input as input. Figure 8 shows the conversion for the raw_input and Figure 9 shows the conversion for the input command.

The input of Python 2 was converted to an input of Python 3 wrapped in eval. Python 3’s input always returns a string. The string is evaluated by eval, which converts string integers to an integer, string floats to a float, and a string to a string. A neat trick, and it works under Python 2 and Python 3.
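
The figures are not reproduced here, but the pattern looks roughly like this (the variable names are just illustrative):

    # Python 2 original:
    #     name = raw_input("Name? ")   # returns a string
    #     age = input("Age? ")         # evaluates whatever the user types
    #
    # After futurize -- runs under Python 2 and Python 3:
    from builtins import input   # supplied by the future package under Python 2

    name = input("Name? ")        # raw_input became input
    age = eval(input("Age? "))    # the old input() became eval(input())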

Exceptions

Exceptions have become more orderly in Python 3, with a better inheritance hierarchy. Simple exceptions are shown in Figure 10: Simple Exceptions.

There are some differences in the output between Python 2 and Python 3.

The specific exception returned by running the program under Python 3 is different from the one returned under Python 2. The error messages are the same, and that is what most programmers print out, so this is a minor problem.

You can create your own exceptions in Python 2 by creating a child of the Exception class.

            class MyErrors(Exception): pass

And then raise it with a raise statement.

            raise MyErrors, "Number must be between 10 and 99"

You would then catch the exceptions with:

             except MyErrors, error_string:

Figure 12: simple_user_exception.py shows a script that uses this simple user exception.

Figure 13: simple_user_exception.py After Conversion shows the converted script.

There are several syntax changes, and the specific exception reported is still different, as noted above. The result works under Python 2 and Python 3, and there is still an easy path to making it Python 3 only.
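
As a rough sketch of what the converted, 2-and-3-compatible form looks like (not the exact Figure 13 output, which also includes the imports futurize adds):

    class MyErrors(Exception):
        pass

    def check(number):
        if not 10 <= number <= 99:
            # Python 3 style: call the exception class instead of "raise X, msg"
            raise MyErrors("Number must be between 10 and 99")

    try:
        check(7)
    except MyErrors as error_string:   # "except X, e" becomes "except X as e"
        print(error_string)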

Figure 14: class_user_exception.py shows an exception written as a child of Exception. Most textbooks on Python 2 use the simple method shown in Figures 12 and 13.

I was happy to see no changes in the simple class definition. I saw nothing except the syntax changes and the required imports to make this all work.

Iterators

Very often, functions that returned a list in Python 2 return an iterator in Python 3. There has been a shift towards creating iterators instead of lists. Figure 16: dictionary.py shows a very simple script testing access to a dictionary, along with its conversion.

Notice in the Python 3 version that dictionary.keys(), which in Python 3 returns an iterator, must be enclosed in a list. The first conversion, count = len(list(dictionary.keys())), is not the best programming; it should have been converted to len(dictionary), but that is a much more difficult task than just turning every generator into a list. The second conversion, in the for statement, could error out for very large dictionaries. It does make the script executable under 2 and 3, yet there is a potential error when running this code in Python 3. I would have liked this to be highlighted as a potentially dangerous change.
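
As a sketch of the conversion being described (not the actual Figure 16):

    from __future__ import print_function  # added so print() behaves the same in 2 and 3

    dictionary = {"a": 1, "b": 2, "c": 3}

    # Python 2 original:
    #     count = len(dictionary.keys())
    #     for key in dictionary.keys():
    #         print key, dictionary[key]

    # What the conversion produces -- runs under 2 and 3, but builds full key lists:
    count = len(list(dictionary.keys()))
    for key in list(dictionary.keys()):
        print(key, dictionary[key])

    # The more idiomatic Python 3 form I would prefer:
    count = len(dictionary)
    for key in dictionary:
        print(key, dictionary[key])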

Final Thought

A very promising beginning. I am looking forward to seeing how the constructs listed in Section 2 are converted.

 

 


Blockchain & IoT Convergence: Is It Happening?
by Arthur Messenger

Interested in knowing more about blockchains?

 

This article at EE Times is a very good overview of the state of blockchains and IoT. It is a three-minute read and well worth the time.

The article mentions a very good introductory demonstration of blockchains, which takes about 15 to 20 minutes. If you have no idea what a blockchain is and you are looking for a non-technical explanation, this is a good place to start: https://anders.com/blockchain/


Jupyter Notebook and Plotting
by Arthur Messenger

I have been using Jupyter Notebooks instead of PowerPoint slides for some of the advanced courses ROI offers. Notebooks have a lot of features that make teaching much more interactive and productive.

As with any large package, there are things I would like to do that are not instantly available and a workaround needs to be created. This is a short blog of one such workaround.

The original Introduction to Machine Learning course had some 40 to 50 Python scripts showing how the different models work. (The current version has been divided into two courses, each with about 40 Python scripts.) All of the scripts were designed and tested using scikit-learn.org tools with Anaconda 3’s Python, on the bash command line and in Python 3 notebooks. We found only one glitch when moving to %run cells inside a Jupyter Notebook.

The program in Figure 1 has been modified for this blog post to show only the glitch being addressed.

This script displays the graphic in a separate window and writes the prompt “Press enter key to exit!” to the command line. When showing multiple graphics, “exit” was replaced with “continue.”

When the code is moved to a Jupyter Notebook, the code works the same. See Figure 2: graphic.py Execute in Jupyter Notebook.

Jupyter displays the prompt, “Press enter key to exit!” followed by an input box.

The green arrow points to the * which indicates the program has not terminated. This is expected, as the program is waiting for the enter key to be tapped. All is more or less fine.

However, the question is, why the “Press enter key to exit!” prompt? It is confusing, since the reader doesn’t know whether the script exits or Jupyter exits the notebook. The prompt isn’t needed here, so let’s get rid of it.

Remove line 36, the input line, and the program terminates and Jupyter moves on to the next cell. That is not so fine either: now I have two different programs, one for the bash command line and one for the cell environment in Jupyter.

We need some way to tell which environment we are in. There are two solutions for this; both are a little off the normal path.

The first one is to use a small function to check if sys.stdout is attached to a tty (terminal interface).  See Figure 3: is_command_line()

The try except block is needed as Jupyter Notebook does not implement .fileno().
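
The figure is not reproduced here, but the check might look something like this (a sketch, not the figure itself):

    import os
    import sys

    def is_command_line():
        # True when stdout is attached to a tty, i.e. the script was started
        # from the bash command line rather than from a notebook cell.
        try:
            return os.isatty(sys.stdout.fileno())
        except Exception:
            # Jupyter Notebook's stdout object does not implement fileno(),
            # so landing here means we are not on a plain terminal.
            return False

    # Hypothetical use at the bottom of graphic.py:
    #     if is_command_line():
    #         input("Press enter key to exit!")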

The second one is to see if the program is running interactively under ipython. See Figure 4: is_ipython().

This one depends on get_ipython().config being falsy when the script is not running under ipython. The try/except block is needed because get_ipython is not available when running plain python.
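
Again as a sketch of the idea in Figure 4 (the exact code may differ):

    def is_ipython():
        # True when running interactively under IPython / Jupyter.
        try:
            # IPython injects get_ipython() into the namespace; its config
            # is empty (falsy) outside an interactive IPython session.
            return bool(get_ipython().config)
        except NameError:
            # Plain python: get_ipython is not defined at all.
            return False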

Both ways work. They are equally “hacky”. They should only be used if the environment is being changed. Which one to use will depend on the aims of the script.

 


A Hands-On Comparison of AWS and GCP
by Doug Rehnstrom

Google Cloud Platform and Amazon Web Services offer products for running applications, networking, and storing data in the cloud.  

There are many similarities. Both have data centers all over the world, divided into regions and zones. Both allow you to create virtual machines, and both allow you to create software-defined networks called VPCs. Both have services for storing binary, relational, and NoSQL data. Both have services for data warehousing and analysis, and both have services for machine learning.

There are also many differences. GCP networks are global, while AWS networks are regional. AWS offers more services. GCP is a bit more automated.

I started using the cloud about 10 years ago and have been teaching AWS, GCP, and other cloud platforms ever since. Students sometimes ask me which is better. I’m not in the opinion business, though; I’m in the teaching business. So, I decided to give you some hands-on labs for each platform.

You will learn more if you see how each works. Do the labs in order for each platform, and let me know which you think is better.

GCP labs, in order:
1. Getting Started with GCP
2. GCP Networking
3. Compute Engine Virtual Machines
4. Automating Deployments with Kubernetes Engine

AWS labs, in order:
1. Getting Started with AWS
2. AWS Networking
3. EC2 Virtual Machines
4. Automating Deployments with CodePipeline

These labs just touch the surface of the capabilities of GCP and AWS. If you would like to learn more, check out these courses provided by ROI Training.

 


3 Reasons to Choose Google Cloud Platform
by Doug Rehnstrom

While teaching Google Cloud Onboard events in DC and New York over the last couple of weeks, I was coincidentally asked the same thing at both events: “Give me a few technical reasons why I would choose Google over other cloud providers.”

1. The Network

When you create a network in Google Cloud Platform, at first it looks like all the other cloud providers. A network is a collection of subnets. When creating virtual machines, you pick a zone, and that zone determines which subnet the machine is in. Just like all the other providers, right? Wrong.

In GCP, networks are global and subnets are regional. With everyone else, networks are regional and subnets are zonal. What does that mean to me?

This allows you to put machines in data centers all over the world and treat them as if they were all on the same LAN. Machines in Asia can communicate with machines in the US via their internal IPs. This makes high performance, worldwide networking, high availability, and disaster recovery easy. You can simply deploy resources in multiple regions within the same project.

Because networks are global, you can create load balancers that balance traffic to machines all over the world. Google’s load balancers automatically route requests to machines closest to the user without you configuring anything. It just works the way it should work.

Google owns all the fiber connecting their data centers together. Once you are in the network, you can pass data between data centers without leaving the network.

2. App Engine

A student asked what management tools Google provides to help manage applications that require thousands of virtual machines.

Well, the short answer is, you don’t need to manage your machines at all. App Engine will do it for you.

App Engine deployment is completely automated with a single command. Applications contain one or more services. Multiple versions of each service can exist simultaneously. You can split traffic between versions for A/B testing. When deploying new versions, there is zero downtime. You can roll back to older versions in a second if you ever need to.

Auto scaling is completely automated. Instances start in a couple hundred milliseconds. Because instances start so quickly, App Engine applications can scale to zero instances when there is no traffic. When an application has zero instances, you are charged nothing. Thus, you don’t have to worry about stopping old versions of services over time because they clean themselves up. App Engine is designed to run at Google’s scale, which means it runs at everyone’s scale.

Load balancing is completely automated. You don’t configure anything, it just works. Health checks are completely automated. All requests are queued automatically, so you don’t have to worry about that. App Engine includes a free caching service, so you don’t have to set that up.

While other providers offer competing products, there really is nothing else like App Engine.

3. Security

All data stored in GCP is encrypted by default. There’s nothing to configure, and you couldn’t turn encryption off if you wanted to. Files are not saved onto a single disk; files are divided into chunks, and the chunks are saved onto different physical disks in a massively distributed file system.

All data passed between services within the network is also encrypted. Because Google owns all the fiber connecting its data centers, traffic between regions doesn’t leave the network.

Because you are running on the same infrastructure Google uses, you get their network security for free. So, denial-of-service protection and intrusion detection are just there.

For more details on Google Security, read the documentation at: https://cloud.google.com/security/.


Why Become a Certified Google Cloud Architect?
by Doug Rehnstrom

The life of a Cloud Architect…

A software architect’s job is to draw rectangles with arrows pointing at them.  

A cloud architect’s job is a little more complicated. First, you draw a computer, a phone, and a tablet on the left. Then, you draw a cloud. Then, you draw some rectangles to the right of the cloud. Lastly, you point arrows at things. Some architects will get fancy and strategically place a cylinder or two on the drawing. They might even draw rectangles within rectangles! Like this:

Sounds easy right? The trick is, you have to label the rectangles.  

If your only tool is a hammer, then every problem is a nail.

If you want to start using Google Cloud Platform, you might tell your IT guys to learn about Google Cloud infrastructure. They would likely go off and learn about Compute Engine and Networking. Then, they might fill in the rectangles as shown below:

If you told some programmers, go learn how to program on Google Cloud Platform, they might fill in the rectangles as shown here:

Both drawings might be “correct” in the sense that we could use either to get a system up and running. The question is, are we optimizing the use of the platform if we are only using one or two services?

Platform… what is he talking about?

Google Cloud Platform has many services: Compute Engine, App Engine, Dataflow, Dataproc, BigQuery, Pub/Sub, BigTable, and many more. To be a competent Google Cloud Platform Architect, you have to know what the services are, what they are intended to be used for, what they cost, and how to combine them to create solutions that are optimized. Optimized for what?  Cost, scalability, availability, durability, performance, etc.  

When someone takes the Google Cloud Architect certification exam, they are being tested on their ability to architect optimal systems. They are being tested on whether they know which service to use for which use cases. They are being tested on whether they can design a system that meets application performance requirements at the lowest cost.

Why being certified is important to your company.

Recently, a guy was complaining about his 4 million dollar per year bill for his Hadoop cluster running on GCP. He didn’t have to be spending that much. A bit of architecture training could have saved his company, oh I don’t know, 3.5 million dollars!

Send your IT people and your programmers to my Google Cloud Architect Certification Workshop. I’ll show them the right way to label the rectangles and help them pass the exam. Maybe we can even save you some money.

 


Understanding Denormalization for BigQuery
by Doug Rehnstrom

A long time ago in a galaxy far, far away...

In order to understand denormalization, we need to take a trip back in time; back to the last century. This was a time when CPU speeds were measured in MegaHertz and hard drives were sold by the MegaByte. Passing ruffians were sometimes seen saying “neee” to old ladies, and modems made funny noises when connecting to online services. Oh, these were dark days.

In these ancient times we normalized our databases. But why? It was simple really. Hard drive space was expensive and computers were slow. When saving data, we wanted to use as little space as possible, and data retrieval had to use as little compute power as possible. Normalization saved space. Data separated into many tables could be combined in different ways for flexible data retrieval. Indexes made querying from multiple tables fast and efficient.

It was, however, complicated. Sometimes databases were poorly designed. Other times, data requirements changed over time, causing a good design to deteriorate. Sometimes there were conflicting data requirements. A proper design for one use case might be a poor design for a different use case. And what about when you wanted to combine data from different databases or data sources that were not relational? Oh the humanity...

Neanderthals developed tools...

Then Google said, “Go west, young man, and throw hardware at the problem.”
“What do you mean?” asked the young prince.

If hard drives are slow, just connect a lot of them together, and combined, they will be faster. And don’t worry about indexes. Just dump the data in files and read the files with a few thousand computers.      

And the road to BigQuery was paved... 

Run, Forest, Run! 

When data is put in BigQuery, each field is stored in a separate file on a massively distributed file system. Storage is cheap; only a couple of cents per GB per month. Storage is plentiful; there is no limit to the amount of data that can be put into BigQuery. Storage is fast, super fast! Terabytes can be read in seconds.

Data processing is done on a massive cluster of computers, which is separate from storage. Storage and compute are connected with a Petabit network. Processing is fast and plentiful. If the job is big, just use more machines!

Danger, Will Robinson! 

Ah, but there is a caveat. Joins are expensive in BigQuery. That doesn’t mean you can’t do a join; it just means there might be a more efficient way.

Denormalize the data! (said in a loud, deep voice with a little echo)

BigQuery tables support nested, hierarchical data. So, in addition to the usual data types like strings, numbers, and booleans, fields can be records, which are composite types made up of multiple fields. Fields can also be repeated, like an array. Thus, a field can be an array of a complex type. So, you don’t need two tables for a one-to-many relationship. Mash it together into one table.

Don’t store your orders in one table and order details in a different table. In the Orders table, create a field called Details, which is an array of complex records containing all the information about each item ordered. Now there is no need for a join, which means we can run an efficient query without an index.
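
If you like to see this in code, here is one way such an Orders table could be defined with the google-cloud-bigquery Python client; the project, dataset, and field names are made up for the example.

    from google.cloud import bigquery

    client = bigquery.Client()  # assumes default credentials and project

    schema = [
        bigquery.SchemaField("order_id", "STRING"),
        bigquery.SchemaField("order_date", "DATE"),
        # A repeated RECORD field: an array of complex values, one per line item,
        # replacing the separate order-details table and the join.
        bigquery.SchemaField(
            "details",
            "RECORD",
            mode="REPEATED",
            fields=[
                bigquery.SchemaField("sku", "STRING"),
                bigquery.SchemaField("quantity", "INTEGER"),
                bigquery.SchemaField("unit_price", "FLOAT"),
            ],
        ),
    ]

    table = bigquery.Table("my-project.sales.orders", schema=schema)
    client.create_table(table)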

“But doesn’t this make querying less flexible?” asked the young boy in the third row with the glasses and Steph Curry tee-shirt.

Yes, I guess that’s true. But storage is cheap and plentiful. So, just store the data multiple times in different ways when you want to query it in different ways.

“Heresy!” screamed the old man as he was dragged away, clinging to his ergonomic keyboard and trackball.

Preparing for the Google Cloud Architect Certification – Getting Started
by Doug Rehnstrom

 

Cloud computing is all the rage these days, and getting certified is a great way to demonstrate your proficiency.  

The Google Certified Professional - Cloud Architect exam itself is very effective in making sure you are a practitioner, not just someone who memorized a bunch of terms and statistics. This is a true test of experience and solution-oriented thinking.

Here are some tips for preparing.

 

If you are new to Google Cloud Platform (GCP)

Some think cloud computing is just virtual machines running in someone else’s data center. While that is part of it, there is a lot more. Being a competent architect requires a broad understanding of cloud services.

To get an overview of the products offered by GCP, go to https://cloud.google.com. Spend a couple hours exploring the links and reading the case studies.

 

There’s no substitute for getting your hands dirty

Go to https://console.cloud.google.com, log in with your Google or Gmail account, and then sign up for the free trial. You will get a $300 credit that you can use to learn GCP. This credit is good for a year.

Once you have an account, do the following labs (don’t worry if you don’t understand everything).

 

Take a Class

The first class to take is a 1-day overview titled Google Cloud Fundamentals: Core Infrastructure.

To help defray the cost, when you register for the course, use the promotional code DOUG and you will get in for $99.

You can also take this course for free on Coursera, https://www.coursera.org/learn/gcp-fundamentals.

The second course is Architecting with Google Cloud Platform: Infrastructure. Come to the first course, and I will make sure you get a big discount on the second one too!

Soon we will also be releasing two new Google Cloud Certification Workshops. Stay tuned...

 

Next Steps

I’ve reached my word count for one post. Get these things done and I’ll follow up with another in a little while. 

Let me know how you are doing, doug@roitraining.com.