What is Big Data?

Organizations today have data from many sources. There are relational and NoSQL databases. There is data in Web logs. There is Google AdWords, Apple iAd, and Google Analytics data. There’s the CRM system, the call center system, the bug-reporting system, the comments on the blog, the Facebook posts, and the Twitter feed. You get the idea.

At some point, management will want to analyze this data, and that requires it to be mashed together.

There are different ways to solve the Big Data problem.

A programmer might want to write code to gather the data, loop through it, and produce a report. This might work, but it’s not flexible and the code will run slower as the size of the data grows.
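To make that limitation concrete, here is a minimal sketch of the hand-rolled approach. It assumes a small, hypothetical CSV of page views; the column names and sample rows are invented for illustration. The report is hard-wired into the loop, so each new question means new code, and the loop has to touch every row.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data; a real job would pull from many files and systems.
SAMPLE_LOG = """page,visitor
/home,alice
/pricing,bob
/home,carol
"""


def views_per_page(csv_text):
    """Count rows per page by looping over every record in the CSV text."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[row["page"]] += 1
    return counts


report = views_per_page(SAMPLE_LOG)
for page, views in report.most_common():
    print(page, views)
```

Asking a different question, such as views per visitor, means editing and rerunning the program, which is the inflexibility described above.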

A database person might want to build a data warehouse and import the data into it. Then, they can use SQL to extract the data. This sounds like a good idea, but it requires an investment in hardware, installation, maintenance, and programming.

If there is enough data, someone might decide to set up a Hadoop cluster to analyze it quickly. This requires a significant upfront investment, along with very specialized skills and training.

It would be nice if there were a prebuilt solution that was easy, flexible, and inexpensive!

That’s where Google BigQuery comes in. The way it works is simple. First, you take the data you want to analyze and save it as text files, such as CSV or JSON.

You then upload the text files to BigQuery and BigQuery converts them into tables. I don’t mean relational database tables. I mean massively scalable, super-fast and efficient tables with BigQuery magic sprinkled all over them.

Once you have the tables, you write BigQuery SQL against them to extract the data. If you know ANSI SQL, BigQuery SQL will feel familiar.
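The text-to-table-to-SQL flow can be sketched locally. This example does not use BigQuery at all: it swaps in Python's built-in sqlite3 module as a stand-in so that it runs anywhere, and the table name, columns, and rows are hypothetical. Treat it as an illustration of the shape of the workflow, not of BigQuery's actual loading or query API.

```python
import csv
import io
import sqlite3

# Hypothetical text export; columns and rows are made up for illustration.
CSV_TEXT = """campaign,clicks,cost
spring_sale,120,30.50
newsletter,45,0.00
spring_sale,80,21.25
"""

# Step 1: turn the text into a table (sqlite3 stands in for BigQuery here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ad_clicks (campaign TEXT, clicks INTEGER, cost REAL)")
reader = csv.DictReader(io.StringIO(CSV_TEXT))
conn.executemany(
    "INSERT INTO ad_clicks VALUES (:campaign, :clicks, :cost)",
    list(reader),
)

# Step 2: ask questions in SQL instead of writing a new loop for each one.
QUERY = """
SELECT campaign, SUM(clicks) AS total_clicks, SUM(cost) AS total_cost
FROM ad_clicks
GROUP BY campaign
ORDER BY total_clicks DESC
"""
results = conn.execute(QUERY).fetchall()
for campaign, total_clicks, total_cost in results:
    print(campaign, total_clicks, total_cost)
```

A new question only requires a new query string; the loading step stays the same.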

BigQuery will quickly execute queries against datasets big or small. The technology behind BigQuery has been used internally at Google for years.

Google BigQuery integrates with many existing tools. You can run queries from Excel or Google Sheets. There are third-party visualization tools that connect to BigQuery. You can integrate BigQuery into custom applications using ODBC or JDBC. You can use R to do statistical analysis against BigQuery data.

With BigQuery there is no upfront investment in hardware, no installation, and no ongoing maintenance. You pay 2 cents per gigabyte per month for storage. You pay $5 per terabyte of data processed by queries, but the first terabyte each month is free. So, getting started with BigQuery and using BigQuery to process small datasets is practically free.
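As a rough illustration of that arithmetic, here is a small estimator built only from the prices quoted above. The function name and the example workloads are hypothetical, and the sketch assumes the free terabyte resets every month; actual pricing may differ, so check the current price list before relying on the numbers.

```python
STORAGE_USD_PER_GB_MONTH = 0.02  # from the text: 2 cents per gigabyte per month
QUERY_USD_PER_TB = 5.00          # from the text: $5 per terabyte processed
FREE_TB_PER_MONTH = 1.0          # from the text; assumed here to reset monthly


def monthly_cost(stored_gb, processed_tb):
    """Estimate one month's bill in US dollars for storage plus queries."""
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH
    billable_tb = max(0.0, processed_tb - FREE_TB_PER_MONTH)
    queries = billable_tb * QUERY_USD_PER_TB
    return round(storage + queries, 2)


# A small project: 50 GB stored, half a terabyte queried.
small = monthly_cost(stored_gb=50, processed_tb=0.5)
# A larger one: 2,000 GB stored, 11 TB queried.
large = monthly_cost(stored_gb=2000, processed_tb=11)
print(small, large)
```

Under these assumptions the small project costs about a dollar a month, because all of its query volume falls inside the free terabyte.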

It will not take you months to learn BigQuery. At ROI Training, we have a three-day course (CPB200) that will teach you what you need to know.
