The quintessential linear algebra problem asks for the solution of a set of linear equations.

Example: Find the solution $(x,y)$ for the linear system

$\begin{matrix}
3x-y=2 \\
2x+3y=4
\end{matrix}$

There are two ways we can interpret the solution to these equations:

The point(s) at which these equations intersect when each is plotted as a line in $\mathbb{R}^2$

For our example, we can rewrite each equation in our system as a function $y=f(x)$ and plot each function on $\mathbb{R}^2$ to visualize this interpretation of the solution:

If we rewrite the system as a vector equation, the solution becomes the set of scalars that, when multiplied with their respective column vectors on the left-hand side of the vector equation, produce the right-hand side
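Both interpretations can be checked numerically. The sketch below (assuming the second equation of the example reads $2x+3y=4$) solves the system with Cramer's rule and verifies both the row picture (intersection point) and the column picture (recombining the column vectors):

```python
# Solve the 2x2 system by Cramer's rule (assumes the second
# equation of the example reads 2x + 3y = 4):
#   3x -  y = 2
#   2x + 3y = 4
a1, b1, c1 = 3, -1, 2
a2, b2, c2 = 2, 3, 4

det = a1 * b2 - a2 * b1          # determinant of the coefficient matrix
x = (c1 * b2 - c2 * b1) / det    # x = 10/11
y = (a1 * c2 - a2 * c1) / det    # y = 8/11

# Row picture: (x, y) lies on both lines
assert abs(3 * x - y - 2) < 1e-12 and abs(2 * x + 3 * y - 4) < 1e-12

# Column picture: x*(3, 2) + y*(-1, 3) recombines into (2, 4)
rhs = (x * a1 + y * b1, x * a2 + y * b2)
print(x, y, rhs)
```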

Don’t worry, I will be adding more content to this page in the near future

Sources

Linear Algebra and its Applications, 3rd Edition. Gilbert Strang (1986)

Brand T is a toothpaste company that controls 20% of the toothpaste market

A market researcher predicts the following effects of an ad campaign:

A consumer using brand T will continue using brand T with a 90% probability

A consumer not using brand T will switch to brand T with a 70% probability

For any given customer in the toothpaste market, let…

$T$ denote a state of using brand T

$T’$ denote the state of using a toothpaste brand other than T

Definitions:

A transition diagram is a weighted directed graph whose nodes denote the various states a system can exist in and whose edges denote the probability of the system transitioning from one state to another after some time step $\Delta t$

A transition probability matrix $\mathrm{P}$ is an $n \times n$ matrix whose $(i,j)$ entry gives the probability that a “sample” in the $i$th state of an $n$-state system will transition to the $j$th state

Importantly, the entries in each row of $\mathrm{P}$ must sum to 1

Here is the transition probability matrix describing our toothpaste brand example:

$\mathrm{P} = \begin{pmatrix} 0.9 & 0.1 \\ 0.7 & 0.3 \end{pmatrix}$

An initial state distribution matrix $\mathrm{S_0}$ for an $n$-state system is an $n$-dimensional column vector whose $i$th entry denotes the percentage of “samples” that are in state $i$ at time $t=0$

Here is the initial state distribution matrix for our toothpaste example:

$\mathrm{S_0} = \begin{pmatrix} 0.2 \\ 0.8 \end{pmatrix}$

The $i=1$st entry indicates that 20% of customers in the toothpaste market use brand T (state $T$) and the $i=2$nd entry indicates that 80% use a brand other than T (state $T’$)

A probability tree gives the probability a “sample” will transition to some state after some time

For our toothpaste example, let’s construct a probability tree that tells us the probability a person will use brand T vs. T’ after one month

In order to determine the probability a customer is using brand T after one month, we need to take the sum of the products of the transition probabilities along each possible path that ends in state T (Fig 5)

Here, there are two possible state transition sequences that end in state T (highlighted in orange and green):

The orange path gives the probability a customer will use toothpaste brand T for the entire month ⇒ $P(T \rightarrow T) = (0.2)(0.9) = 0.18$

The green path gives the probability of a customer using brand T’ at the beginning of the month, then switching to brand T by the end of the month ⇒ $P(T’ \rightarrow T) = (0.8)(0.7) = 0.56$

Now, we can sum the probabilities of each possible transition sequence ending in state $T$ in order to determine the total probability a customer will be using brand T by the end of the month: $P(T) = P(T \rightarrow T) + P(T’ \rightarrow T) = 0.18 + 0.56 = 0.74$

There is a 74% chance a random customer in the toothpaste market will be using brand T by the end of the month

Since we only have two possible states in our system, subtracting $P(T)$ from 1 will give us the probability a random customer will be using brand T’ by the end of the month: $P(T’) = 1 - P(T) = 1 - 0.74 = 0.26$

Importantly, we can construct a state distribution matrix giving the probabilities for the toothpaste brand a randomly sampled customer will be using after one month:

Specifically, this is saying that multiplying the state distribution matrix for the $i$th time step by the corresponding transition probability matrix will return the state distribution matrix for the $(i+1)$th time step

Let’s check and make sure this holds true for our example:
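A minimal sketch of the check, treating the state distribution as a row vector multiplied against the row-stochastic transition matrix from the example (the numbers come from the 90%/70% probabilities and the 20%/80% initial shares stated above):

```python
# One step of the toothpaste Markov chain: S1 = S0 * P
P = [[0.9, 0.1],   # row 1: currently brand T  -> stays with T / switches away
     [0.7, 0.3]]   # row 2: currently brand T' -> switches to T / stays with T'
S0 = [0.2, 0.8]    # initial market shares: 20% brand T, 80% brand T'

# Each entry of S1 sums the paths into that state, exactly like the
# probability tree: S1[T] = (0.2)(0.9) + (0.8)(0.7)
S1 = [sum(S0[i] * P[i][j] for i in range(2)) for j in range(2)]
print(S1)  # approximately [0.74, 0.26], matching the tree calculation
```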

Mid-nineteenth century: mainstream science believed light was a (mechanical) wave traveling through a medium called the luminiferous ether

Mid-nineteenth-century definition of a wave:

Wave = a disturbance traveling through a medium

Note: this is essentially what our current definition of a mechanical wave is today

Medium = the material or substance through which a wave is traveling

Example 1: a dewdrop falling into a pond (Figure 1)

Here, the medium is the water in the pond

The initial disturbance is the dewdrop landing in the water

The water particles initially disturbed by the dewdrop further disturb the positions of surrounding water particles

This disturbance is further propagated throughout the medium (i.e., the pond)

Example 2: sound waves from clapping

Medium = air

Initial disturbance = compression of air molecules

The compression causes air molecules to collide with one another and generate sound waves

Note: both of these examples are mechanical waves

We knew from research such as Young’s double-slit experiment (1801) that light has wave-like properties

Specifically, it showed one of the hallmark signs of wave behavior: interference

Some unanswered questions of the mid-19th century

How could they define light in terms of its wave-like properties?

Note: they were trying to define light in terms of mechanical waves (they didn’t know about electromagnetic waves back then)

They theorized that light (e.g., light traveling from the Sun to the Earth) could be explained as a disturbance propagating through a medium

People called this medium the luminiferous ether

Big question: does the luminiferous ether exist?

Luminiferous ether = medium through which light (supposedly) propagates

One major goal of mainstream science back then was to detect/validate the existence of this medium

Note: if there is a luminiferous ether, the Earth must be traveling fast relative to it

Not only is the Earth rotating on its own axis, but it is also traveling along an elliptical orbit around the Sun at $\approx 30 \mathrm{km/s}$

Moreover, the Sun is estimated to orbit around the center of the galaxy at $\approx 200 \mathrm{km/s}$

As far as our galaxy is concerned, we don’t really know what it’s doing; we just know it’s moving

Most scientists theorize our galaxy rotates around a supermassive black hole at its center

Take-home-message: if the luminiferous ether exists, Earth’s position should be constantly changing relative to it

Reasoning behind this:

The odds of us being stationary relative to such a medium are essentially zero

We should either be moving relative to the ether or the ether should be moving relative to us

Thus, we should be able to detect some sort of “ether wind” or the “current” associated with the luminiferous ether

Aside: waves propagate faster in the direction in which the current is moving

Example: a dewdrop falling into a stream with a flowing current (Figure 2)

Here, the medium is the water in the stream and the initial disturbance is the dewdrop falling into the stream

Propagation of the medium’s distortion (i.e., the waves/ripples in the stream) will occur more quickly in the direction of the current (i.e., movement to the left)

The Michelson-Morley Experiment:

Experiment Background

Assuming there did exist a luminiferous ether, let $\overrightarrow{s}$ be the velocity of its ether wind

From our discussion on wave propagation speed and currents, light that is propagated in the same direction as $\overrightarrow{s}$ should travel at a faster velocity than light propagated in the $-\overrightarrow{s}$ direction

For a while, no one could figure out how to test this because the tools/technology did not yet exist to detect such small differences in speeds near the speed of light (thus, any differences they would have expected to find were immeasurable)
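A rough back-of-the-envelope estimate (not from the source) shows just how small the expected effect was. For a light path of length $L$, the classical round-trip times with and against an ether wind of speed $v$ differ only at order $(v/c)^2$:

```python
# Classical round-trip times in a supposed ether wind of speed v:
#   arm parallel to wind:      t_par  = L/(c-v) + L/(c+v) = (2L/c)/(1 - b^2)
#   arm perpendicular to wind: t_perp = (2L/c)/sqrt(1 - b^2), where b = v/c
import math

c = 3.0e8   # speed of light, m/s
v = 3.0e4   # Earth's orbital speed, ~30 km/s
L = 11.0    # arm length in meters (illustrative value)

b = v / c
t_par = (2 * L / c) / (1 - b**2)
t_perp = (2 * L / c) / math.sqrt(1 - b**2)

frac = (t_par - t_perp) / t_perp  # fractional difference ~ b^2/2
print(frac)  # on the order of 5e-9: hopeless to measure by direct timing
```

A fractional difference of a few parts per billion is why an interference-based comparison, rather than direct timing, was needed.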

Eventually, Michelson and Morley designed an experiment that was able to work around this issue using wave interference

Recall: interference is a hallmark behavior of waves

Instead of attempting to measure the speed of light emitted in different directions, they split light into two different directions, recombined them, and observed the interference patterns

They reasoned that if light emitted in different directions traveled at different speeds, then different interference patterns would result

However, this isn’t what happened!!

No matter how they oriented the apparatus (no matter the time of day and/or year), they always observed the same interference pattern

Conclusion: the luminiferous ether doesn’t seem to affect light waves ⇒ breakdown of the idea behind a luminiferous ether and/or an “absolute” inertial frame of reference through which light travels

Often titled one of “the most famous failed experiments”

Note: there were other experiments besides this one at the time that were also causing people to question the existence of a luminiferous ether

As it turns out, no matter the reference frame, light always travels at a constant speed!

Sources

Reiher, M., & Wolf, A. (2015). Relativistic Quantum Chemistry: The Fundamental Theory of Molecular Science. Weinheim: Wiley-VCH Verlag GmbH & Co. KGaA.

The variance formula squares each deviation to account for any possible negative data values ⇒ prevents data points from cancelling each other out

Although an operator such as the absolute value initially seems like an appropriate way to describe the variance of a data set, it doesn’t always provide “good” answers (i.e., it can cause problems with certain data sets that squaring doesn’t create)

Any measure of variance should have squared units with respect to the units used in the data set we are taking the variance of

E.g., if our example data set has units in $\mathrm{mm}$, the units of the variance will be $\mathrm{mm^2}$

However, interpreting values that have the squared units of what we were originally measuring can be counter-intuitive

So, we usually calculate something called the standard deviation to get a better intuition when interpreting the variance of a data set

For our example data set, $\sigma = \sqrt{5.08333 \mathrm{mm^2}} \approx 2.2546 \mathrm{mm}$ (units aren’t squared!)

Take-home-message: the standard deviation is better for gaining intuition about the spread of data than variance because it gives a value in units rather than units squared
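The variance/standard-deviation relationship can be sketched with Python's standard library. The data set below is hypothetical (it is not the example data set from these notes), but the units logic is the same:

```python
import statistics

# Hypothetical measurements in mm (NOT the notes' example data set)
data_mm = [10, 12, 9, 14, 11, 13]

var_mm2 = statistics.pvariance(data_mm)  # population variance, units: mm^2
sd_mm = statistics.pstdev(data_mm)       # standard deviation, units: mm

# The standard deviation is just the square root of the variance,
# which restores the original units of the measurement
assert abs(sd_mm - var_mm2 ** 0.5) < 1e-12
print(var_mm2, sd_mm)
```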

Review of probabilities

Suppose $A$ and $B$ are two independent events (i.e., the chances of event $B$ occurring are independent of whether event $A$ occurs and vice-versa)

Some notations:

Let $S$ denote the set of all possible events that can occur in some sample space

$P(A \cap B)=$ the probability of event $A$ AND event $B$ occurring

Venn Diagram (shaded region represents the case where both events occur):

We can find the probability of multiple independent events occurring by multiplying the probabilities of each individual event together

For our events $A$ and $B$, $\boxed{P(A \cap B)= P(A)P(B)}$

Example: the probability of flipping two heads in a row if we flip a coin twice

Let event $A$ be the event where the first coin toss is heads

For an unbiased coin, $P(A) = 0.5$

Let event $B$ be the event where the second coin toss is heads

For an unbiased coin, $P(B) = 0.5$

Since events $A$ and $B$ are independent of one another (our outcome of the second coin toss is independent of the outcome of our first coin toss and vice-versa), we can use our expression for $P(A \cap B)$ to determine the probability of getting two heads:

$P(A \cap B) = P(A)P(B) = 0.5 \cdot 0.5 = 0.25$

Thus, there is a 25% chance of getting two heads after flipping an unbiased coin twice
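The coin-toss result can be verified by enumerating all equally likely outcomes, a quick sanity check on the product rule:

```python
from itertools import product

# All equally likely outcomes of two fair coin tosses
outcomes = list(product("HT", repeat=2))  # [('H','H'), ('H','T'), ...]

both_heads = [o for o in outcomes if o == ("H", "H")]
p = len(both_heads) / len(outcomes)
print(p)  # 0.25, matching P(A)P(B) = 0.5 * 0.5
```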

$P(A \cup B)=$ the probability of event $A$ OR event $B$ occurring

Venn Diagram (shaded region represents cases where at least one of our two events of interest occurs):

We can find the probability of at least one of these two events occurring by summing the individual probabilities and subtracting any double counting

For our events $A$ and $B$, $\boxed{P(A \cup B) = P(A) + P(B) - P(A \cap B)}$

Example: if we roll two dice, what is the probability that at least one die gives an even number?

Let event $A$ be the event that the first die gives an even number

$P(A) = 0.5$ is the probability of event $A$ occurring

Let event $B$ be the event that the second die gives an even number

$P(B) = 0.5$ is the probability of event $B$ occurring

Note: if we just added $P(A)$ and $P(B)$ together to determine $P(A \cup B)$, we would be double-counting the probability associated with both events $A$ and $B$ occurring ⇒ need to subtract $P(A \cap B)$ from our summed probabilities

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

First determine $P(A \cap B)$:

$P(A) = 0.5$

$P(B) = 0.5$

$P(A \cap B) = P(A)P(B) = 0.5(0.5) = 0.25$

$P(A \cup B) = 0.5 + 0.5 - 0.25 = 0.75$

Thus, there is a 75% chance of rolling at least one even number if we roll two dice

$P(\text{neither } A \text{ nor } B) = P(\text{not } A) \cdot P(\text{not } B)$

Venn Diagram (shaded region represents all possible events besides events $A$ or $B$):

Note: $P(A \cup B) = 1 - P(\text{neither } A \text{ nor } B)$

Often, it is convenient to find the probability of something by considering the probability of the “opposite” of what you are trying to find and subtracting that probability from 1

Example: use this strategy to compute the probability of rolling at least one even die if we roll two dice

The “opposite” of rolling at least one even number is to roll two odd numbers

Let $A$ denote the event where the first die gives an odd number

$P(A) = 0.5$

Let $B$ denote the event where the second die gives an odd number

$P(B) = 0.5$

Subtracting the product of the probabilities of our “opposite” events from 1 will give us the total probability of at least one “non-opposite” event occurring: $P(\text{at least one even}) = 1 - P(A)P(B) = 1 - 0.25 = 0.75$
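Both the direct union rule and the complement strategy can be checked by enumerating all 36 equally likely dice outcomes:

```python
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice
rolls = list(product(range(1, 7), repeat=2))

# Direct count: at least one die is even
p_direct = sum(1 for a, b in rolls if a % 2 == 0 or b % 2 == 0) / len(rolls)

# Complement: 1 minus the probability that both dice are odd
p_complement = 1 - sum(1 for a, b in rolls if a % 2 == 1 and b % 2 == 1) / len(rolls)

print(p_direct, p_complement)  # both 0.75, agreeing with the two derivations
```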

Gravity is fundamental to everything humans have experienced in (written) history

Newton was the first person recorded in modern history to question why objects always fall towards the ground

Used gravity to explain this phenomenon

Led to the development of classical mechanics (later updated by Einstein)

Note: classical mechanics applies to macroscopic objects that are traveling far slower than the speed of light

Importantly, we currently do not have a very good understanding of what is causing gravity

Question: what is mass and why do bodies of mass “gravitate” towards each other?

Answer: we don’t know

Comments/random thoughts: this is kind of like the opposite of diffusion (diffusion = the tendency of particles to move from areas of high concentration, i.e., areas with a lot of particles, to areas of low concentration, i.e., areas with few particles)

Like diffusion, gravity seems to occur spontaneously, so maybe it is the result of something that can be described as “energetically favorable”

Q: is there a “randomness” component to gravity like there is for the diffusion of particles?

Although we still don’t know exactly why gravity exists, we are pretty good at describing how it behaves (in the context of systems that follow principles of classical mechanics)

Newton’s Law of Gravitation

Generally, gravity is defined as the attractive force $\overrightarrow{F}_g$ between two objects with positive nonzero masses $m_1$ and $m_2$ whose centers of mass are separated by a distance vector $\overrightarrow{r}$

Note: Newton’s law is stated in the context of particles

Equation for the gravitational force between two objects:

$\left\| \overrightarrow{F}_g \right\| = G \frac{m_1 m_2}{\left\| \overrightarrow{r} \right\|^2}$

Variables:

$\left\| \overrightarrow{F}_g \right\| = $ the magnitude of the gravitational force between object 1 and object 2 (SI units: Newton)
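As a quick sanity check on the law, here is a sketch (with standard approximate values for $G$, Earth's mass, and Earth's radius; these numbers are not from the source) computing the gravitational force on a 1 kg object at Earth's surface:

```python
# Magnitude of the gravitational force on a 1 kg mass at Earth's surface,
# F = G * m1 * m2 / r^2, using standard approximate constants
G = 6.674e-11       # gravitational constant, N*m^2/kg^2
m_earth = 5.972e24  # mass of the Earth, kg
m_obj = 1.0         # mass of the object, kg
r = 6.371e6         # Earth's mean radius, m (distance between centers of mass)

F = G * m_earth * m_obj / r**2
print(F)  # roughly 9.8 N, i.e., the familiar g of ~9.8 m/s^2 acting on 1 kg
```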