2  Names and values

2.1 Quiz

More details are available in the Names and values Quiz book chapter.

2.1.1 Quiz 1

Given the following data frame, how do I create a new column called “3” that contains the sum of 1 and 2? You may only use $, not [[. What makes 1, 2, and 3 challenging as variable names?

Answer

1, 2, and 3 are challenging as variable names because they are non-syntactic names. They can still be used with $ by quoting them with backticks (for example, `3`).
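The quiz's data frame is not reproduced here, so the following is only a minimal sketch. It assumes a hypothetical two-column data frame named df whose columns are called "1" and "2".

```r
# Hypothetical data frame with non-syntactic column names (an assumption,
# standing in for the data frame used in the quiz)
df <- data.frame(runif(3), runif(3))
names(df) <- c(1, 2)

# Backticks let `$` refer to non-syntactic names
df$`3` <- df$`1` + df$`2`
names(df)
```

Without the backticks, df$3 is a syntax error, because the parser reads 3 as a number rather than a name.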

2.1.2 Quiz 2

In the following code, how much memory does y occupy?

Answer

In this code, y takes only slightly more memory than x (48 bytes more here; the exact overhead depends on the platform), not three times as much. This is because y is a list containing three references to the single vector underlying x. The list structure itself takes up only a few dozen bytes, while the vector takes up about 8MB.
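This can be checked with lobstr::obj_size(). The sketch below assumes the quiz code builds x with runif(1e6) and y as a list of three references to it, as in the book. Exact byte counts are omitted because they depend on the platform.

```r
library(lobstr)

# Assumed quiz code: one large vector, referenced three times by a list
x <- runif(1e6)
y <- list(x, x, x)

obj_size(x)                        # size of the vector alone
obj_size(y)                        # vector plus the small list structure
obj_size(list(NULL, NULL, NULL))   # overhead of a three-element list
```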

2.1.3 Quiz 3

On which line does a get copied in the following example?

Answer

a gets copied on line 3 of the previous block, b[[1]] <- 10. That is when b is modified, so a new vector is created to hold the data that b points to.
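Since tracemem() is not available under webR, one way to observe the copy is to compare object addresses before and after the modification. This is a sketch that assumes the three-line quiz code from the book.

```r
library(lobstr)

# Assumed quiz code
a <- c(1, 5, 3, 2)
b <- a
same_before <- obj_addr(a) == obj_addr(b)   # a and b share one vector

b[[1]] <- 10
same_after <- obj_addr(a) == obj_addr(b)    # b now points to a new vector

c(before = same_before, after = same_after)
```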

2.2 Binding basics

More details are available in the Names and values Binding basics book chapter.

2.2.1 Ex. 1

Explain the relationship between a, b, c, and d in the following code:

Answer

This can be determined using their object addresses:

  • a points to a vector
  • b points to the same vector as a
  • c points to the same vector as a and b
  • d points to a different vector
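The address comparison could look like the sketch below, assuming the exercise code from the book (a, b, and c bound by assignment, d created by a second call to 1:10).

```r
library(lobstr)

# Assumed exercise code
a <- 1:10
b <- a
c <- b
d <- 1:10

# a, b, and c share one address; d has its own
addrs <- c(a = obj_addr(a), b = obj_addr(b), c = obj_addr(c), d = obj_addr(d))
addrs
```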

2.2.2 Ex. 2

The following code accesses the mean function in multiple ways. Do they all point to the same underlying function object? Verify this with lobstr::obj_addr().

Answer

All expressions above point to the same function. We can investigate using the lobstr::obj_addr() function.
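A sketch of that check, assuming the five access expressions used in the book's exercise:

```r
library(lobstr)

# Assumed exercise code: five ways of reaching the mean function
addrs <- c(
  obj_addr(mean),
  obj_addr(base::mean),
  obj_addr(get("mean")),
  obj_addr(evalq(mean)),
  obj_addr(match.fun("mean"))
)

# A single unique address means a single underlying function object
length(unique(addrs)) == 1
```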

2.2.3 Ex. 3

By default, base R data import functions, like read.csv(), will automatically convert non-syntactic names to syntactic ones. Why might this be problematic? What option allows you to suppress this behaviour?

Answer

read.csv() can be problematic because, by default, it converts column headers into syntactic names. This changes the data without any input from the user. We can suppress this behaviour by setting check.names = FALSE.
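A small illustration, using a made-up CSV string (the header names are hypothetical) passed through the text argument of read.csv():

```r
# Hypothetical CSV input with non-syntactic headers
csv <- "1st col,my value\n1,2\n"

# Default: headers are rewritten into syntactic names
default_names <- names(read.csv(text = csv))
default_names    # "X1st.col" "my.value"

# check.names = FALSE keeps the headers as written
raw_names <- names(read.csv(text = csv, check.names = FALSE))
raw_names        # "1st col" "my value"
```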

2.2.4 Ex. 4

What rules does make.names() use to convert non-syntactic names into syntactic ones?

Answer

The rules for make.names() are:

  • Prepend X if the first character is invalid
  • Translate all invalid characters to .
  • Append . if the name matches an R keyword
  • Deduplicate names (only when unique = TRUE; the default is unique = FALSE)

The core conversion is written in C. The accessible R function is a wrapper around the C code that coerces its input to character and, when unique = TRUE, passes the result through make.unique() so that the newly formatted names are unique.
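The rules can be seen on a few made-up inputs (the input names are chosen for illustration only):

```r
# Illustrative inputs: invalid first character, invalid character,
# reserved word, and a duplicated name
nms <- c("1a", "a b", "if", "a", "a")

plain <- make.names(nms)
plain     # "X1a" "a.b" "if." "a" "a"

deduped <- make.names(nms, unique = TRUE)
deduped   # "X1a" "a.b" "if." "a" "a.1"
```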

2.2.5 Ex. 5

I slightly simplified the rules that govern syntactic names. Why is .123e1 not a syntactic name? Read ?make.names for the full details.

Answer

.123e1 is not a syntactic name because a name cannot start with a dot followed by a digit: R parses .123e1 as a number in scientific notation, 0.123 × 10^1 = 1.23.
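Both points can be checked directly in base R:

```r
# The parser reads .123e1 as a numeric constant, not a name
value <- .123e1
is.numeric(value)
all.equal(value, 1.23)

# make.names() prepends X to turn it into a syntactic name
fixed <- make.names(".123e1")
fixed   # "X.123e1"
```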

2.3 Copy-on-modify

More details are available in the Names and values Copy-on-modify book chapter.

2.3.1 Ex. 1

Why is tracemem(1:10) not useful?

Answer

The purpose of tracemem() is to tag a named object and report when its underlying data structure is copied. 1:10 is not bound to a name, so it cannot be modified afterwards and there is nothing for tracemem() to report.

(In the case of this notebook, it is not useful also because webR does not support memory profiling.)

2.3.2 Ex. 2

Explain why tracemem() shows two copies when you run this code. Hint: carefully look at the difference between this code and the code shown earlier in the section.

Answer

tracemem() shows two copies because x[[3]] was replaced with a double. This means the object that x points to was copied twice: first to coerce the integer vector to a double vector, and second to replace x[[3]] with 4.
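The coercion that triggers the extra copy is visible without tracemem() by checking the vector's type. The sketch assumes the exercise code from the book, where an integer vector receives the double value 4.

```r
# Assumed exercise code: an integer vector modified with a double value
x <- c(1L, 2L, 3L)
typeof(x)      # "integer"
x[[3]] <- 4    # 4 is a double, so x is coerced to double
typeof(x)      # "double"

# Replacing with an integer literal keeps the type, so no coercion is needed
y <- c(1L, 2L, 3L)
y[[3]] <- 4L
typeof(y)      # "integer"
```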

2.3.3 Ex. 3

Sketch out the relationship between the following objects:

Answer

The relationship is as follows:

  • a points to an ALTREP vector representing 1:10
  • b is a list containing 2 references to the same vector that a points to
  • c is a list containing a reference to the list behind b, a reference to the vector behind a, and a reference to a separate, newly created vector 1:10
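The relationships can be confirmed by comparing addresses. This sketch assumes the exercise code from the book and uses lobstr::obj_addrs() to get the addresses of list elements.

```r
library(lobstr)

# Assumed exercise code
a <- 1:10
b <- list(a, a)
c <- list(b, a, 1:10)

addr_a <- obj_addr(a)
addrs_b <- obj_addrs(b)   # both elements match addr_a
addrs_c <- obj_addrs(c)   # b's list, a's vector, and a new vector

lobstr::ref(c)            # tree view of the shared references
```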

2.3.4 Ex. 4

What happens when you run this code?

Answer

  • Line 1: Originally, x is a list with a reference to the vector 1:10
  • Line 2: x’s underlying list is copied-on-modify; the 2nd slot x[[2]] of the new list then refers to the original list object. The vector 1:10 is not copied: x[[1]] and x[[2]][[1]] point to the same vector
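A sketch of how to inspect the result, assuming the two-line exercise code from the book:

```r
library(lobstr)

# Assumed exercise code
x <- list(1:10)
x[[2]] <- x

# x[[1]] and x[[2]][[1]] should report the same address
vec_addr_outer <- obj_addrs(x)[[1]]
vec_addr_inner <- obj_addrs(x[[2]])[[1]]
vec_addr_outer == vec_addr_inner

ref(x)   # tree view: the inner list reuses the same integer vector
```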

2.4 Object size

More details are available in the Names and values Object size book chapter.

2.4.1 Ex. 1

In the following example, why are object.size(y) and obj_size(y) so radically different? Consult the documentation of object.size().

Answer

object.size(y) and obj_size(y) differ because object.size() does not detect whether elements of a list are shared. It counts the same underlying vector once per list element, whereas obj_size() counts it only once.
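The difference can be reproduced with the sketch below, which assumes the exercise code from the book (a list of 100 references to one vector of 1e4 doubles).

```r
library(lobstr)

# Assumed exercise code: 100 references to a single vector
y <- rep(list(runif(1e4)), 100)

base_size <- as.numeric(object.size(y))   # counts the vector 100 times
lobstr_size <- as.numeric(obj_size(y))    # counts the shared vector once

base_size / lobstr_size                   # roughly two orders of magnitude
```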

2.4.2 Ex. 2

Take the following list. Why is its size somewhat misleading?

Answer

The size is misleading because mean, sd, and var are functions that ship with R and are always available; the list does not contain copies of them, only references to the existing function objects. As in Ex. 1, this can be seen by comparing object.size() and obj_size(). obj_size() also tries to account for the environments associated with a function, so the two can report different sizes.

2.4.3 Ex. 3

Predict the output of the following code:

Answer

The object sizes are:

  1. 8MB, since a refers to a vector of 1 million doubles.
  2. 8MB, since b is a list holding two references to the same vector behind a.
  3. 8MB, since a and b together still refer only to the one vector behind a.
  4. 16MB, since b now holds references to two different vectors: one is a modified copy of the vector behind a, and the other is the original.
  5. 16MB, since of the 3 vector references involved, two refer to the original vector behind a, and one refers to the modified copy behind b[[1]].
  6. 16MB, since b holds references to two different vectors: one was modified at step 4, and the other is modified at this step.
  7. 24MB, since a and b now refer to three different vectors: the original vector behind a, the modified copy behind b[[1]], and the modified copy behind b[[2]].

An even clearer answer:

In summary, the objects in parts 1 - 3 refer to one vector; the objects in parts 4 - 6 refer to two vectors; the objects in part 7 refer to three vectors.
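The predictions can be checked with the sketch below. It assumes the exercise code from the book and stores each size as a plain number so the seven results can be printed together in MB.

```r
library(lobstr)

# Assumed exercise code, with each obj_size() result captured
a <- runif(1e6)
s1 <- as.numeric(obj_size(a))

b <- list(a, a)
s2 <- as.numeric(obj_size(b))
s3 <- as.numeric(obj_size(a, b))

b[[1]][[1]] <- 10   # copies the first vector
s4 <- as.numeric(obj_size(b))
s5 <- as.numeric(obj_size(a, b))

b[[2]][[1]] <- 10   # copies the second vector
s6 <- as.numeric(obj_size(b))
s7 <- as.numeric(obj_size(a, b))

sizes_mb <- round(c(s1, s2, s3, s4, s5, s6, s7) / 1e6)
sizes_mb   # 8 8 8 16 16 16 24
```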

2.5 Modify-in-place

More details are available in the Names and values Modify-in-place book chapter.

2.5.1 Ex. 1

Explain why the following code doesn’t create a circular list.

Answer

Because the x on the left-hand side ends up bound to a different object than the x on the right-hand side. On line 2, a new list is created whose first element points to the original (empty) list behind x; the name x is then rebound to this new list. The original list never contains itself.
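The result is easy to inspect in base R. The sketch assumes the two-line exercise code from the book.

```r
# Assumed exercise code
x <- list()
x[[1]] <- x

# x is a list holding one empty list, not a list that contains itself
str(x)
identical(x, list(list()))
```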

2.5.2 Ex. 2

Wrap the two methods for subtracting medians into two functions, then use the bench package to carefully compare their speeds. How does performance change as the number of columns increase?

NB: The methods are:

Answer

We would expect method 2 to be faster than method 1, since it performs fewer copies.

In the benchmark, method 2 seems to be about 10x faster. We can test this again on a larger dataset.
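A possible way to set up the comparison is sketched below. The helper names (create_random_df, subtract_df, subtract_list) are hypothetical, and the sketch assumes that method 1 loops over a data frame and method 2 over a plain list, as in the book. The bench call only runs if the bench package is installed.

```r
# Hypothetical helper: a data frame with `ncol` random numeric columns
create_random_df <- function(nrow, ncol) {
  data.frame(matrix(runif(nrow * ncol), nrow = nrow))
}

# Method 1 (assumed): subtract medians column by column from a data frame
subtract_df <- function(x, medians) {
  for (i in seq_along(medians)) {
    x[[i]] <- x[[i]] - medians[[i]]
  }
  x
}

# Method 2 (assumed): the same loop over a plain list
subtract_list <- function(x, medians) {
  x <- as.list(x)
  for (i in seq_along(medians)) {
    x[[i]] <- x[[i]] - medians[[i]]
  }
  x
}

df <- create_random_df(nrow = 1e4, ncol = 50)
medians <- vapply(df, median, numeric(1))

res_df <- subtract_df(df, medians)
res_list <- subtract_list(df, medians)

if (requireNamespace("bench", quietly = TRUE)) {
  print(bench::mark(
    data_frame = subtract_df(df, medians),
    plain_list = subtract_list(df, medians),
    check = FALSE   # the results differ in class, so skip the equality check
  ))
}
```

Rerunning this with different values of ncol shows how the gap changes as the number of columns increases.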

2.5.3 Ex. 3

What happens if you attempt to use tracemem() on an environment?

Answer

tracemem() throws an error when given an environment, because environments are never copied when they are modified; they are always modified in place, so there is nothing to trace. In the context of this notebook, it would fail anyway because webR does not support memory profiling.
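The sketch below captures the error message instead of letting it stop the script, and then shows the modify-in-place behaviour directly. The exact message depends on whether R was built with memory profiling, so it is not reproduced here.

```r
e <- new.env()

# tracemem() on an environment raises an error; capture its message
msg <- tryCatch(
  {
    tracemem(e)
    "no error"
  },
  error = function(err) conditionMessage(err)
)
msg

# Environments are modified in place: f and e are the same environment
f <- e
f$value <- 1
e$value
```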