Exploratory data analysis

By Admin, July 10, 2021

Week 2: Exploratory data analysis (How to Win a Data Science Competition: Learn from Top Kagglers)

Z = X / Y

Z = X * Y

Z = X - Y

Z = X + Y
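The options above are candidate formulas for a derived feature $Z$ in terms of $X$ and $Y$. With the raw columns at hand, the same question can be settled numerically by computing each candidate and comparing it with $Z$. Below is a minimal sketch in plain Python. The function name `matching_candidates` and the toy numbers are hypothetical and used only for illustration.

```python
import math


def matching_candidates(x, y, z):
    """Return the names of candidate formulas that reproduce z from x and y.

    The four candidates mirror the answer options above; x, y and z are
    hypothetical equal-length sequences of numbers taken from a dataset.
    """
    candidates = {
        "Z = X / Y": lambda a, b: a / b,
        "Z = X * Y": lambda a, b: a * b,
        "Z = X - Y": lambda a, b: a - b,
        "Z = X + Y": lambda a, b: a + b,
    }

    def reproduces(f):
        try:
            # Compare with a tolerance to absorb floating-point rounding.
            return all(
                math.isclose(f(a, b), c, rel_tol=1e-9, abs_tol=1e-12)
                for a, b, c in zip(x, y, z)
            )
        except ZeroDivisionError:
            # X / Y is undefined when some Y == 0, so it cannot be the rule.
            return False

    return [name for name, f in candidates.items() if reproduces(f)]


# Made-up data where z happens to equal x / y.
print(matching_candidates([1.0, 2.0, 3.0], [2.0, 4.0, 5.0], [0.5, 0.5, 0.6]))
# ['Z = X / Y']
```

The visual counterpart is a scatter plot of $Z$ against the candidate expression. If the candidate is correct, the points fall on the diagonal line.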

2. (Note that this is not the same variable $X$ as in the previous questions.)

Which hypotheses about variable $X$ do NOT contradict the plots? In other words: which hypotheses can we not reject (not in the statistical sense) based on the plots and our intuition?

2 points

$2 \leq X < 3$ happens more frequently than $3 \leq X < 4$

$X$ is a counter or a label-encoded categorical feature

$X$ can be the temperature (in Celsius) in different cities at different times

$X$ can take a value of zero

$X$ takes only discrete values
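Each hypothesis above corresponds to a simple property that can be computed from the raw values of $X$: whether it is integer-valued, whether it takes zero, how small it gets, and how often it falls into a given interval. Below is a minimal sketch, assuming the values are available as a Python list. The function names and the `max_unique_for_discrete` threshold are hypothetical choices, not part of the course material.

```python
from collections import Counter


def describe_feature(values, max_unique_for_discrete=50):
    """Summarise properties of a numeric feature that the hypotheses above refer to.

    max_unique_for_discrete is an arbitrary, hypothetical threshold: a feature
    with few distinct integer values is flagged as "looks discrete".
    """
    counts = Counter(values)
    all_integer = all(float(v).is_integer() for v in values)
    return {
        "n_unique": len(counts),
        "min": min(values),  # a negative minimum rules out a counter
        "takes_zero": any(v == 0 for v in values),
        "all_integer": all_integer,
        "looks_discrete": all_integer and len(counts) <= max_unique_for_discrete,
    }


def count_in_range(values, lo, hi):
    """Count how many values fall into the half-open interval [lo, hi)."""
    return sum(lo <= v < hi for v in values)


# Made-up sample values, for illustration only.
sample = [0, 1, 1, 2, 2, 2, 3, 5]
print(describe_feature(sample))
print(count_in_range(sample, 2, 3), count_in_range(sample, 3, 4))  # 3 1
```

These summaries can only support or weaken a hypothesis. For example, integer values that include zero are consistent with a counter or a label encoding but do not prove it, while negative or fractional values would fit a temperature-like feature better.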

We use the target variable $y$ to color-code the points.

The other three plots were produced by jittering the $X$ and $Y$ values, that is, by adding Gaussian noise to the features before drawing the scatter plot.
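Jittering as described here amounts to adding zero-mean Gaussian noise to a copy of each coordinate before plotting, so that points sharing the same value no longer sit on top of each other. Below is a minimal NumPy sketch. The function name `jitter` and the noise scale of 0.1 are hypothetical, and the original, un-jittered values are assumed to be kept for any modelling.

```python
import numpy as np


def jitter(values, scale, rng=None):
    """Return a copy of `values` with zero-mean Gaussian noise of std `scale` added.

    Intended only for drawing scatter plots: the jittered copy is plotted,
    while the original values are kept for modelling.
    """
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    return values + rng.normal(loc=0.0, scale=scale, size=values.shape)


rng = np.random.default_rng(0)  # fixed seed so the plot is reproducible
x = np.array([0, 0, 1, 1, 1, 2])  # hypothetical discrete feature values
x_jittered = jitter(x, scale=0.1, rng=rng)
print(x_jittered.round(2))
```

With matplotlib, a jittered scatter plot could then be drawn as `plt.scatter(jitter(X, 0.1, rng), jitter(Y, 0.1, rng), c=y)`. A larger `scale` separates overlapping points more, but it also blurs the structure, so the noise level is a trade-off chosen by eye.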

Select the correct statements.

2 points

Target is completely determined by coordinates $(x, y)$, i.e. the label of a point is completely determined by the point’s position $(x, y)$. In other words: if we only had the two features $(x, y)$, we could build a classifier that is accurate 100% of the time.

It is always beneficial to jitter variables before building a scatter plot

The top right plot is “better” than the top left one. That is, every piece of information we can find on the top left plot we can also find on the top right one, but not vice versa.

The standard deviation of the jitter is the largest on the bottom right plot.

We need to jitter variables not only for the sake of visualization, but also because it is beneficial for a model.
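Whether the label is completely determined by $(x, y)$ can be checked on the raw, un-jittered values. Group the points by position and score the best possible lookup classifier, which predicts the majority label at each position. Below is a minimal sketch in plain Python with a hypothetical function name. The check must use the original coordinates. After jittering, the positions are almost surely all distinct, so the score would trivially come out as 1.0.

```python
from collections import Counter, defaultdict


def best_lookup_accuracy(xs, ys, targets):
    """Accuracy of the best classifier that sees only the position (x, y).

    For each distinct position it predicts the most frequent label there,
    which is the best any function of (x, y) alone can do on this data.
    Returns 1.0 exactly when every position carries a single label.
    """
    labels_at = defaultdict(Counter)
    for x, y, t in zip(xs, ys, targets):
        labels_at[(x, y)][t] += 1
    correct = sum(c.most_common(1)[0][1] for c in labels_at.values())
    return correct / len(targets)


# Made-up points at two distinct positions.
xs, ys = [0, 0, 1, 1], [0, 0, 1, 1]
print(best_lookup_accuracy(xs, ys, [0, 0, 1, 1]))  # 1.0
print(best_lookup_accuracy(xs, ys, [0, 1, 1, 1]))  # 0.75
```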
