Logarithms
This Frequently Asked Questions (FAQ) tutorial is designed to give intro computer science students a refresher on logarithms: what they are, when and why they might be useful, and how to graph them using the Python matplotlib library.
Definition
The logarithm is the inverse of the exponential operation.
If b^x = n, then x = log_b n.
The logarithm answers the question “to what number would I raise the base to get this value?”
Examples, base-10
A common base for working with logarithms is base-10:
- Because 10^2 = 100, we can say that log_10 100 = 2.
- Because 10^3 = 1000, log_10 1000 = 3.
- Because 10^0 = 1, log_10 1 = 0.
- Because 10^1.4313 ≈ 27, we can say that log_10 27 ≈ 1.4313.
Additional examples, working the other way:
- log_10 100,000 = 5, because 10^5 = 100,000.
- log_10 1 = 0, because 10^0 = 1.
- log_10 50 is somewhere between 1 (because 10^1 = 10) and 2 (because 10^2 = 100).
log_10 50 comes out to be around 1.7, because 10^1.7 ≈ 50.
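These base-10 values can be checked with Python’s math module; here is a quick sketch using the examples above:

import math

print(math.log10(100))     # 2.0
print(math.log10(1000))    # 3.0
print(math.log10(1))       # 0.0
print(math.log10(50))      # about 1.69897
print(10 ** 1.69897)       # about 50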
Logarithms in base-2
When working with computers, it’s common to use base-2 instead of base-10, particularly when analyzing algorithms. This is because problem-solving strategies that use a divide-and-conquer process can be easily described using base-2 values.
A list of n sorted values, for example, can be divided in half about log_2 n times to get down to a single value.
Example: n = 9 items in a list, divided in half 3 times:
Original list: 2, 5, 8, 9, 13, 15, 21, 26, 29
Division one: 2, 5, 8, 9, 13
Division two: 2, 5
Division three: 2
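A small Python sketch of this halving process is below; it always keeps the first half of the list and rounds down, so the intermediate lists differ slightly from the hand-worked example above, but the count of divisions is the same:

import math

values = [2, 5, 8, 9, 13, 15, 21, 26, 29]   # n = 9 sorted values

divisions = 0
while len(values) > 1:
    values = values[:len(values) // 2]   # keep the first half, rounding down
    divisions += 1
    print("Division %d: %s" % (divisions, values))

print("Divided %d times; log2(9) is about %.2f" % (divisions, math.log(9, 2)))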
Examples, base-2
- Because 2^5 = 32, log_2 32 = 5.
- Because 2^0 = 1, log_2 1 = 0.
- Because 2^3.7 ≈ 13, log_2 13 ≈ 3.7.
Or, working the other way:
- log_2 16 = 4, because 2^4 = 16
- log_2 256 = 8, because 2^8 = 256
- log_2 1000 ≈ 9.966, because 2^9.966 ≈ 1000
Where is the log_2 button on my calculator?
There probably isn’t one, but you can calculate it using the standard log button, which computes log_10. Because
log_b a = log_d a / log_d b
we can see that
log_2 a = log_10 a / log_10 2
So, to calculate log_2 28, you would enter
log_10 28 / log_10 2
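In Python, the same change-of-base trick looks like this (a quick sketch, using the value 28 from the example above):

import math

a = 28
print(math.log10(a) / math.log10(2))   # about 4.807
print(math.log(a, 2))                  # math.log also accepts an explicit base
print(math.log2(a))                    # and math.log2 is a shortcut for base 2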
Why logarithms are useful
Some progressions of numbers can be easily understood in a summative way: adding n more items to a pile of existing items increases the total number of items in the pile by an amount n.
Other series can be better understood in a multiplicative way: increasing the pile by a factor of 10 (10^1) yields a smaller pile than if we increased the pile by a factor of 100 (10^2).
For situations where a range of values spans several orders of magnitude, using logarithms can help us better understand the way in which those values are changing.
Example
Examine this graph of y = x^2 (below). We can see the behavior of the function pretty well on the graph, but the y-axis shoots up so quickly that we aren’t able to use as much of the x-axis to interpret the graph. The scales on the axes are linear scales, where each tick mark along the axis indicates an equal increment.
Now examine this graph of the same function, y = x^2, but plotted on a “log-log” scale (i.e. a logarithmic scale for both the x- and y-axes). Each tick mark along these axes represents not a linear increase but a factor increase: the tick distance between the values 1 and 10 is the same as the tick distance between 10 and 100, between 100 and 1,000, and between 1,000 and 10,000.
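A minimal matplotlib sketch that produces both views of y = x^2 might look like the following; the titles, ranges, and figure layout here are illustrative choices, not the original figures:

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(1, 1000, 500)
y = x ** 2

fig, (linear_ax, loglog_ax) = plt.subplots(1, 2, figsize=(10, 4))

# Left: ordinary linear scales on both axes
linear_ax.plot(x, y)
linear_ax.set_title('y = x^2, linear scales')
linear_ax.set_xlabel('x')
linear_ax.set_ylabel('y')

# Right: logarithmic scales on both axes ("log-log")
loglog_ax.plot(x, y)
loglog_ax.set_xscale('log')
loglog_ax.set_yscale('log')
loglog_ax.set_title('y = x^2, log-log scales')
loglog_ax.set_xlabel('x')
loglog_ax.set_ylabel('y')

plt.tight_layout()
plt.show()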
A lot of systems in nature behave as power functions or exponential functions, and it’s common to see graphs of these behaviors with logarithmic scales on one or both axes.
Axis scales for the computer scientist
As mentioned earlier, many algorithms have a performance related to the log of some value: log n, perhaps, or sometimes n log n (n times the log of n). If we have performance data (execution times for a series of increasingly large data sets, for example) and we want to identify it as behaving according to n log n, plotting that data on a log-log graph does not yield a straight line. A straight line on a log-log graph would reveal a power relationship, not a log relationship.
If we expect n log n behavior, however, we can plot performance time versus n log n. If the time goes up linearly, that provides evidence that the performance time varies linearly as a function of n log n.
Example
A complete function showing how to create a sort time vs. n log n plot is given here, along with the corresponding output after running it with an implementation of the Quicksort algorithm.
import math
import os
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

def produce_graph(test_sizes, times):
    """This function takes a list of test_sizes and a corresponding list of
    times to complete sorting the elements of the given size, and produces a
    "time vs. n log n" plot to attempt to identify if there is a linear
    relationship, i.e. the algorithm has a behavior of O(n log n).

    Requires the scipy stats package, numpy, matplotlib, math, and os
    libraries to be imported (see the imports above; the time and random
    libraries are also useful for generating the timing data).
    """
    # create numpy array of calculated `n log n` values
    n_log_n = np.array([test * math.log(test, 2) for test in test_sizes])
    # create numpy array of the times to sort
    times = np.array(times)

    # set up figure that we can save later
    fig = plt.figure()
    plt.title('Time vs n log n for Quicksort')
    plt.ylabel('sort time (seconds)')
    plt.xlabel('n log n, for array size n')

    # put data on figure, 'bo' = blue dots
    plt.plot(n_log_n, times, 'bo')

    # use scipy to do linear regression on data
    slope, intercept, rval, pval, stderr = stats.linregress(n_log_n, times)
    # plot the regression line
    plt.plot(n_log_n, slope * n_log_n + intercept, color='red')
    # put the regression formula and the R^2 value on the graph
    plt.annotate("y=%.5Ex+%.5E\nR$^2$=%.5f" % (slope, intercept, rval ** 2),
                 xy=(0.15, 0.75), xycoords='figure fraction')

    # Save the plot in a directory called plots
    if not os.path.exists('plots'):
        os.mkdir('plots')
    fig.savefig('plots/quicksort_performance.png')

    # Show the graph on screen
    plt.show()
The resulting graph:
Given the high R^2 value, which suggests a strong correlation between the collected data and the linear regression model, we have good evidence that Quicksort, on average, has a performance of O(n log n).
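For completeness, here is one possible way the timing data fed into produce_graph could be collected. This is a sketch only: the quicksort function and the specific array sizes are assumptions for illustration, not part of the original tutorial.

import random
import time

test_sizes = [1000, 2000, 5000, 10000, 20000, 50000]
times = []

for size in test_sizes:
    data = [random.randint(0, 1_000_000) for _ in range(size)]
    start = time.perf_counter()
    quicksort(data)                      # assumed to be defined elsewhere
    times.append(time.perf_counter() - start)

produce_graph(test_sizes, times)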