Skewed Classes (Binary Classification)

Instructions:

Number of questions: 10
Time limit: 20 minutes
Must be finished in one sitting. You cannot save and finish later.
Questions displayed per page: 1
Will allow you to go back and change your answers.
Will not let you finish with any questions unattempted.

1 / 10

What is the definition of a balanced dataset in a binary classification problem (e.g., is this a cat or a dog)?

A balanced dataset is when the number of examples of each class are approximately equal (e.g., 5000 cat images and 4800 dog images)

A balanced dataset is when the number of examples for both classes (e.g., cats and dogs) are significantly different (e.g., 1 cat image and 999 dog images in a dataset with 1000 images)

A balanced dataset is when 100% of the examples in a dataset belong to the desired class e.g., 1000 cat images in a dataset with 1000 images)

A balanced dataset is when none of the examples in a Dataset belong to the desired class e.g., 0 cat images but 1000 dog images)

None of the above

2 / 10

A balanced dataset for a binary classification problem (e.g., is this a cat or a dog) becomes imbalanced when there are significantly more datapoints for one class than another. What is the threshold at which a dataset becomes imbalanced?

Number of datapoints for both classes are evenly split e.g., 50%-50% split between cat and dog images

60%-40% split

70%-30% split

80%-20% split

90%-10% split

95%-5% split

3 / 10

In classification, skewed classes in a training dataset could lead to a model that has the following problem:

High variance

High bias

Both high variance and bias

None of the above

4 / 10

A dataset was recorded with the results of a cancer test on patients. 99.5% of the patients got a negative result (non-cancerous tumor) and only 0.5% of them got a positive result (cancerous tumor). What is the type of the dataset in this scenario?

Extremely imbalanced dataset

Slightly imbalanced dataset

Balanced dataset

None of the above

5 / 10

A dataset was generated by testing samples of Covid-19 patients in 2020. It shows 90% of the tests are negative (no virus detected) and 10% of them are positive (virus detected). What is the dataset type in this scenario?

Extremely imbalanced dataset

Slightly imbalanced dataset

Balanced dataset.

None of the above

6 / 10

A dataset was created by collecting the results of a survey randomly asking 900 people about their favorite summer activities, which are hiking and cycling. Results are that 300 people like cycling while 600 people like hiking. What is the dataset type in this scenario?

Extremely imbalanced dataset

Slightly imbalanced dataset

Balanced dataset

None of the above

7 / 10

Google is training their Google Home voice assistants with a new wake word (e.g., Ok Jarvis!). During testing, 100 sentences were spoken to simulate a dining room environment where people were talking. “Ok Jarvis!” was mentioned 30 times in those 100 sentences to see if the voice assistant would wake up: the wake word was not mentioned in the remaining sentences. What kind of a dataset is this collection of 100 sentences? Positive case is when a wake word is mentioned.

Imbalanced dataset since the positive-negative split is 30-70

Balanced dataset since the positive-negative split is 30-70

Balanced dataset since the positive-negative split is 70-30

Imbalanced dataset since the positive-negative split is 70-30

8 / 10

Amazon is training their Alexa devices (voice assistants) with a new wake word (e.g., Alexa!). During testing, 300 sentences were spoken to simulate a dining room environment where people were talking. “Alexa!” was mentioned ten times in those 300 sentences to see if the voice assistant would wake up: the wake word was not mentioned in the remaining sentences. What kind of a dataset is this collection of 300 sentences? Positive case is when a wake word is mentioned.

Imbalanced dataset since the positive-negative split is 10-290

Balanced dataset since the positive-negative split is 10-290

Balanced dataset since the positive-negative split is 290-10

Imbalanced dataset since the positive-negative split is 290-10

9 / 10

Google is training their Google Home voice assistants with a new wake word (e.g., Hey Google!). During testing, 500 sentences were spoken to simulate a dining room environment where people were talking. “Ok Google!” was mentioned 250 times in those 500 sentences to see if the voice assistant would wake up: the wake word was not mentioned in the remaining sentences. What kind of a dataset is this collection of 500 sentences? Positive case is when a wake word is mentioned.

Imbalanced dataset since the positive-negative split is 250-250

Balanced dataset since the positive-negative split is 250-250

Numerical dataset since the positive-negative split is 250-250

Categorical dataset since the positive-negative split is 250-250

10 / 10

Google is training their Google Home voice assistants with a new wake word (e.g., Ok Google!). During testing, 100 sentences were spoken to simulate a dining room environment where people were talking. “Ok Google!” was mentioned eight times in those 100 sentences to see if the voice assistant would wake up: the wake word was not mentioned in the remaining 92 sentences. What kind of a dataset is this collection of 100 sentences? Positive case is when a wake word is mentioned.

Imbalanced dataset since the positive-negative split is 8-92

Balanced dataset since the positive-negative split is 8-92

Balanced dataset since the positive-negative split is 92-8

Imbalanced dataset since the positive-negative split is 92-8

Your score is

0%

Statements

SOLUTIONS

SOCIAL MEDIA