Google Data Analytics Professional Certificate (part 3)

Ganapathi Kakkirala
6 min read · Jun 4, 2021

This is my review of the second course in the Google Data Analytics Professional Certificate series (part 3 of 8). I will walk you through a summary of the skills and best practices you will gain in this course.

Course Name: Prepare Data for Exploration

Instructor: Hallie, Analytical Lead, Google.

Course Duration: 20 hours

Skills Covered:

  • Ensuring ethical data analysis practices
  • Addressing issues of bias and credibility
  • Accessing databases and importing data
  • Writing simple queries
  • Organizing and protecting data
  • Connecting with the data community (optional)

Week 1:

In our day-to-day lives we generate lots of data. In this part of the course, we will look at how data is generated and how, as an analyst, you decide which data to collect for analysis. We will dive into structured and unstructured data, data types, and more.

  1. Qualitative data is usually listed as a name, category, or description of the collected data.
  2. Quantitative data is measurable or countable which can be expressed as a number.
  3. Nominal data is a type of qualitative data that’s categorized without a set order. For example, if we ask some strangers on the road for directions to a particular place, we might get answers such as yes, no, or maybe, which cannot be ranked in any order.
  4. Ordinal data, on the other hand, is a type of qualitative data with a set order or scale. Examples: ratings for an Android app on a scale of 5, or the heights of a group of people categorized as short, mid tall, tall, etc.
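The nominal/ordinal distinction above can be shown in a few lines of code. This is a minimal sketch (not from the course materials): the nominal answers have no meaningful order, while the ordinal height categories can be sorted once we spell out their scale explicitly.

```python
# Nominal: survey answers with no inherent order
answers = ["maybe", "yes", "no", "yes"]

# Ordinal: height categories with a defined order (the scale is explicit)
height_order = {"short": 0, "mid tall": 1, "tall": 2}
heights = ["tall", "short", "mid tall", "short"]

# Sorting ordinal data requires the scale as a sort key
sorted_heights = sorted(heights, key=height_order.get)
print(sorted_heights)  # ['short', 'short', 'mid tall', 'tall']
```

Note that alphabetical sorting would put "mid tall" before "short", which is why ordinal data needs its order stated explicitly.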
How do we collect data?

The image above shows the considerations we keep in mind while collecting data.

We have different types of data depending on the collection method and the source, such as:

  1. Internal data, which is data that lives within a company’s own systems.
  2. External data, data that lives and is generated outside of an organization.
  3. First-party data is data you collect yourself. Second-party data is collected directly by another group and then sold. Third-party data is sold by a provider that did not collect the data themselves.

We will now move to the data modeling step; the following pyramid shows the different levels of data modeling.

Let us look at each modeling type briefly:

  1. Conceptual data modeling gives you a high-level view of your data structure, such as how you want data to interact across an organization.
  2. Logical data modeling focuses on the technical details of the model such as relationships, attributes, and entities.
  3. Physical data modeling depicts how the database is actually built. By this stage, you are laying out how each database will be put in place and how the databases, applications, and features will interact in specific detail.

This part also gives an introduction to the Kaggle platform and walks through some features of this popular data platform.

Finally, it covers data transformation and data organization in detail.

Data transformation: It is the process of changing the data’s format, structure, or values. As a data analyst, there is a good chance you will need to transform data at some point to make it easier for you to analyze it.
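As a small illustration of changing a dataset's format and values, here is a sketch using made-up rows (the field names and formats are my own example, not the course's): dates are rewritten into ISO format and revenue strings are converted to numbers.

```python
from datetime import datetime

# Hypothetical raw rows: dates as DD/MM/YYYY strings, revenue as text
raw = [
    {"date": "04/06/2021", "revenue": "$1,200"},
    {"date": "05/06/2021", "revenue": "$980"},
]

def transform(row):
    # Change the date format and convert revenue to an integer
    return {
        "date": datetime.strptime(row["date"], "%d/%m/%Y").strftime("%Y-%m-%d"),
        "revenue": int(row["revenue"].lstrip("$").replace(",", "")),
    }

clean = [transform(r) for r in raw]
print(clean[0])  # {'date': '2021-06-04', 'revenue': 1200}
```

The structure of each row is unchanged here; only format and values are transformed, which is often all the analysis needs.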

Week 2:

This part covers the significant and sensitive considerations that data analysts often encounter while gathering data and using it in their processes: understanding bias, credibility, privacy, ethics, and data access.

Briefly, this week’s content comprises:

  1. Data bias, which is a type of error that systematically skews results in a certain direction.
  2. Common types of data bias, including observer bias, interpretation bias, and confirmation bias.
  3. Good Data and Bad Data.
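A toy numerical example (mine, not the course's) makes "systematically skews results" concrete: if the collection method only ever reaches one group, the estimated average is pushed away from the truth no matter how many rows are collected.

```python
# A population with two equally sized groups
population = [20] * 500 + [60] * 500
true_mean = sum(population) / len(population)  # 40.0

# Biased collection: the survey only ever reaches the first group
biased_sample = [x for x in population if x == 20][:100]
biased_mean = sum(biased_sample) / len(biased_sample)  # 20.0

print(true_mean, biased_mean)
```

The error here is not random noise; collecting more of the same biased sample would never move the estimate toward 40.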

What is good and bad data?

Data credibility is often one of the biggest concerns a data analyst must be aware of, and picking good data is crucial.

If a dataset is reliable, original, comprehensive, current, and cited, we call it good data. Bad data is simply data that is unreliable, not original, incomplete, outdated, or uncited.

Understanding data ethics and privacy:

  1. Data ethics refers to well-founded standards of right and wrong that dictate how data is collected, shared, and used.
  2. There are lots of different aspects of data ethics but primarily the course covers: ownership, transaction transparency, consent, currency, privacy, and openness.
  3. Data privacy means preserving a data subject’s information and activity any time a data transaction occurs.

Qwiklab on SQL: It introduces some basic SQL features in Google Cloud using BigQuery.
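The lab itself runs on BigQuery, but the same basic SELECT syntax can be practiced locally. This sketch uses Python's built-in sqlite3 with a made-up `trips` table, so the table and column names are illustrative only:

```python
import sqlite3

# In-memory database with a small made-up table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (city TEXT, duration INTEGER)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("London", 12), ("Paris", 30), ("London", 25)],
)

# A simple aggregate query, the kind the Qwiklab practices in BigQuery
rows = conn.execute(
    "SELECT city, AVG(duration) FROM trips GROUP BY city ORDER BY city"
).fetchall()
print(rows)  # [('London', 18.5), ('Paris', 30.0)]
```

SQLite and BigQuery differ in dialect details, but SELECT, GROUP BY, and ORDER BY behave the same way for queries this simple.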

Data anonymization is the process of protecting people’s private or sensitive data by removing that kind of information. As a data analyst, you might not be primarily responsible for data anonymization, but when working with copies of the data, it is your role to hide and protect sensitive information.
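One common approach (my sketch, not a technique the course prescribes) is to replace a direct identifier with a salted one-way hash: rows belonging to the same person can still be grouped, but the raw value is no longer exposed. The salt value and field names here are made up.

```python
import hashlib

def anonymize(email, salt="example-salt"):
    # One-way hash: same input always maps to the same opaque ID
    return hashlib.sha256((salt + email).encode()).hexdigest()[:12]

record = {"email": "jane@example.com", "purchases": 3}

# The working copy keeps the analytics field but drops the identifier
safe = {"user_id": anonymize(record["email"]), "purchases": record["purchases"]}
print(safe.keys())
```

Note that hashing alone is not full anonymization (the mapping is stable, and rare attribute combinations can still re-identify people); it is one safeguard among several.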

Open data: This is a debated concept; people are still figuring out ways to provide access to free and open data resources without compromising privacy. There are many places to find open datasets, such as Kaggle, Google Dataset Search, and Google Cloud.

Week 3:

Here, you will get to know all about databases in data analytics.

  1. It covers the important features of databases and ways to explore available datasets in Google Cloud’s BigQuery.
  2. Metadata helps data analysts interpret the contents of the data within a database. So, we learn about the importance of metadata and different types of metadata like descriptive, structural, and administrative.
  3. Metadata also makes data more reliable by making sure it’s accurate, precise, relevant, and timely. This also makes it easier for data analysts to identify the root causes of any problems that might pop up.
  4. You will learn about data governance and the importance of handling data properly as a data analyst.
  5. Data often exists in different formats and comes from different sources, so we cover how to import CSV data into spreadsheets and the types of data sources we can make use of.
  6. Organized data always gives an analyst an edge in performing analysis. Filtering and sorting are two efficient practices to learn as a data analyst, and this part includes some hands-on work with both.
  7. Databases do not always exist within an organization; we often rely on cloud services for database options. One powerful and commonly used service is BigQuery, which you will dive into with some hands-on SQL queries in this part.
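The CSV, filtering, and sorting points above can be combined into one short sketch. The course does the equivalent work in spreadsheets; this version uses only the Python standard library, and the data is made up:

```python
import csv
import io

# Made-up CSV data, read the same way a file would be
csv_text = """name,score
Ana,82
Ben,67
Cruz,91
"""
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Filter: keep rows meeting a condition
passing = [r for r in rows if int(r["score"]) >= 70]

# Sort: rank the remaining rows, highest score first
ranked = sorted(passing, key=lambda r: int(r["score"]), reverse=True)
print([r["name"] for r in ranked])  # ['Cruz', 'Ana']
```

One detail worth noticing: `csv` reads every field as a string, so the score must be converted with `int()` before comparing or sorting, just as a spreadsheet column sometimes needs its type fixed.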

Week 4:

  1. Effectively organizing data is one of the most valuable habits in an analyst’s work. It helps them work efficiently with files and folders.
  2. Naming conventions help analysts easily understand what a file contains. These are consistent guidelines that describe the content, date, or version of a file in its name.
  3. Securing data while giving analysts the freedom to access it is a balance that needs to be struck and executed effectively. Data security means protecting data from unauthorized access or corruption by adopting safety measures.
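A naming convention is only useful if it is applied consistently, and a small script can check that. The pattern below is a made-up example of a convention (`project_description_YYYYMMDD_vNN.csv`), not one the course mandates:

```python
import re

# Example convention: lowercase project, description, date, version, .csv
PATTERN = re.compile(r"^[a-z]+_[a-z]+_\d{8}_v\d{2}\.csv$")

good = bool(PATTERN.match("sales_forecast_20210604_v01.csv"))
bad = bool(PATTERN.match("Final version 2.csv"))
print(good, bad)  # True False
```

Run over a folder listing, a check like this catches stray files such as "Final version 2.csv" before they cause confusion.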

Week 5 (optional):

  1. Although we are constantly upskilling and becoming more technologically savvy in the analytics domain, it is equally important to maintain a good network of professionals working in the same industry.
  2. Maintaining an online presence is key to succeeding as a data analyst. Platforms like LinkedIn, Medium, GitHub, and Kaggle let you connect with, follow, and learn from fellow analysts. This optional module covers building an effective network in detail.

Thank You.


Ganapathi Kakkirala

A technology and business enthusiast with a passion to write and share knowledge through blogs.