The here package makes file referencing easy: it builds file paths relative to the top-level directory of a project. This is in contrast to using setwd(), which is fragile and depends on how files are organized on your computer. Read more about project-oriented workflows:
What They Forgot to Teach You About R: “Project-oriented workflow” chapter by Jenny Bryan and Jim Hester
“Project-oriented workflow” blog post by Jenny Bryan
R for data science: “Workflow: projects” chapter by Hadley Wickham
For demonstration, this article uses a data analysis project that lives in /data/GHE/mpn/deployment/deployments/2021-01-04/renv/library/R-3.6/x86_64-pc-linux-gnu/here/demo-project
on my machine. This is the project root. The path will most likely be different on your machine; the here package helps deal with this situation.
The project has the following structure:
#> /data/GHE/mpn/deployment/deployments/2021-01-04/renv/library/R-3.6/x86_64-pc-linux-gnu/here/demo-project
#> ├── analysis
#> │   └── report.Rmd
#> ├── data
#> │   └── penguins.csv
#> ├── demo-project.Rproj
#> └── prepare
#>     └── penguins.R
You can review the project on GitHub and also download a copy.
To start working on this project in RStudio, open the demo-project.Rproj file. This ensures that the working directory is set to /data/GHE/mpn/deployment/deployments/2021-01-04/renv/library/R-3.6/x86_64-pc-linux-gnu/here/demo-project, the project root. Opening only the .R or the .Rmd file may be insufficient!
Other development environments may have a different notion of a project. Either way, it is important that the working directory is set to the project root or a subdirectory of that path. You can check with:
setwd(project_path)
getwd()
#> [1] "/data/GHE/mpn/deployment/deployments/2021-01-04/renv/library/R-3.6/x86_64-pc-linux-gnu/here/demo-project"
(See vignette("rmarkdown")
for an example where the working directory is set to a subdirectory on start.)
The intended use is to add a call to here::i_am() at the beginning of your script or in the first chunk of your rmarkdown report.[1] This declares the location of the current file within the project and establishes the project root accordingly.
The first argument to here::i_am()
should be the path to the current file, relative to the project root. The penguins.R
script uses:
here::i_am("prepare/penguins.R")
#> here() starts at /data/GHE/mpn/deployment/deployments/2021-01-04/renv/library/R-3.6/x86_64-pc-linux-gnu/here/demo-project
here::i_am() displays the top-level directory of the current project. Because this directory has a prepare/ subdirectory that contains penguins.R, it is correctly inferred as the project root.
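The inference described above can be pictured as an upward search through parent directories. The following is a minimal base R sketch of that idea, not the package's actual implementation (which is provided by rprojroot). The function name find_root_for() and the temporary project layout are hypothetical and exist only for illustration.

```r
# Hypothetical helper: walk up from `start` until a directory is found
# that contains the file at `rel_path`.
find_root_for <- function(rel_path, start = getwd()) {
  dir <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    if (file.exists(file.path(dir, rel_path))) {
      return(dir)
    }
    parent <- dirname(dir)
    if (identical(parent, dir)) {
      # Reached the file system root without finding the file.
      stop("Could not find '", rel_path, "' in '", start,
           "' or any parent directory.")
    }
    dir <- parent
  }
}

# Build a throwaway project that mimics the layout of the demo project.
project <- file.path(tempdir(), "sketch-project")
dir.create(file.path(project, "prepare"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project, "analysis"), showWarnings = FALSE)
file.create(file.path(project, "prepare", "penguins.R"))

# Starting from a subdirectory, the search still arrives at the project root.
root <- find_root_for("prepare/penguins.R", start = file.path(project, "analysis"))
```

Starting the search from a directory outside the project ends in the stop() branch, which mirrors the error that here::i_am() raises when no matching project is found.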
After here::i_am(), insert library(here) to make the here() function available.[3]
The top-level directory is also returned from the here()
function:
here()
#> [1] "/data/GHE/mpn/deployment/deployments/2021-01-04/renv/library/R-3.6/x86_64-pc-linux-gnu/here/demo-project"
One important distinction from the working directory is that this remains stable even if the working directory is changed:
setwd("analysis")
getwd()
#> [1] "/data/GHE/mpn/deployment/deployments/2021-01-04/renv/library/R-3.6/x86_64-pc-linux-gnu/here/demo-project/analysis"
here()
#> [1] "/data/GHE/mpn/deployment/deployments/2021-01-04/renv/library/R-3.6/x86_64-pc-linux-gnu/here/demo-project"
setwd("..")
(I suggest steering clear of ever changing the working directory. This may not always be feasible, in particular if the working directory is changed by code that you do not control.)
You can build the full path to a file by passing its path relative to the top-level directory:
here("data", "penguins.csv")
#> [1] "/data/GHE/mpn/deployment/deployments/2021-01-04/renv/library/R-3.6/x86_64-pc-linux-gnu/here/demo-project/data/penguins.csv"
readr::read_csv(
  here("data", "penguins.csv"),
  col_types = list(.default = readr::col_guess()),
  n_max = 3
)
#> # A tibble: 3 x 8
#>   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
#>   <chr>   <chr>           <dbl>         <dbl>            <dbl>       <dbl> <chr>
#> 1 Adelie  Torge…           39.1          18.7              181        3750 male
#> 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
#> 3 Adelie  Torge…           40.3          18                195        3250 fema…
#> # … with 1 more variable: year <dbl>
This works regardless of where the associated source file lives inside your project. With here()
, the path will always be relative to the top-level project directory.
here() works much like file.path() or fs::path(). You can pass individual path components or entire subpaths:
here("data/penguins.csv")
#> [1] "/data/GHE/mpn/deployment/deployments/2021-01-04/renv/library/R-3.6/x86_64-pc-linux-gnu/here/demo-project/data/penguins.csv"
As seen above, here() returns absolute paths (starting with /, <drive letter>:\ or \\). This makes it safe to pass these paths to other functions, even if the working directory is changed along the way.
As of version 1.0.0, absolute paths passed to here() are returned unchanged. This means that you can safely use both absolute and project-relative paths in here().
data_path <- here("data")
here(data_path)
#> [1] "/data/GHE/mpn/deployment/deployments/2021-01-04/renv/library/R-3.6/x86_64-pc-linux-gnu/here/demo-project/data"
here(data_path, "penguins.csv")
#> [1] "/data/GHE/mpn/deployment/deployments/2021-01-04/renv/library/R-3.6/x86_64-pc-linux-gnu/here/demo-project/data/penguins.csv"
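The rule illustrated above (relative paths are anchored at the project root, absolute paths pass through unchanged) can be sketched in base R. This is only an illustration of the rule, not how here() is implemented. The helper names is_absolute() and resolve_in_project() and the root "/proj" are hypothetical.

```r
# Hypothetical helper: TRUE for paths starting with "/", "<drive letter>:\"
# (or "<drive letter>:/"), or "\\".
is_absolute <- function(path) {
  grepl("^(/|[A-Za-z]:[/\\\\]|\\\\\\\\)", path)
}

# Hypothetical helper: join the components, then prepend the root
# only if the result is not already absolute.
resolve_in_project <- function(root, ...) {
  path <- file.path(...)
  if (is_absolute(path)) path else file.path(root, path)
}

root <- "/proj"  # assumed project root, for illustration only

relative_result <- resolve_in_project(root, "data", "penguins.csv")
absolute_result <- resolve_in_project(root, "/proj/data", "penguins.csv")

# Both calls arrive at the same absolute path.
identical(relative_result, absolute_result)
```

Because absolute inputs are left alone, a path that has already been resolved can be passed through the same helper again without being prefixed twice.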
The dr_here()
function explains the reasoning behind choosing the project root:
dr_here()
#> here() starts at /data/GHE/mpn/deployment/deployments/2021-01-04/renv/library/R-3.6/x86_64-pc-linux-gnu/here/demo-project.
#> - This directory contains a file "prepare/penguins.R"
#> - Initial working directory: /data/GHE/mpn/deployment/deployments/2021-01-04/renv/library/R-3.6/x86_64-pc-linux-gnu/here/demo-project
#> - Current working directory: /data/GHE/mpn/deployment/deployments/2021-01-04/renv/library/R-3.6/x86_64-pc-linux-gnu/here/demo-project
The show_reason
argument can be set to FALSE
to reduce the output to one line:
dr_here(show_reason = FALSE)
#> here() starts at /data/GHE/mpn/deployment/deployments/2021-01-04/renv/library/R-3.6/x86_64-pc-linux-gnu/here/demo-project
The declaration of the active file via here::i_am()
also protects against accidentally running the script from a working directory outside of your project. The example below calls here::i_am()
from the temporary directory, which is clearly outside our project:
withr::with_dir(tempdir(), {
  print(getwd())
  here::i_am("prepare/penguins.R")
})
#> [1] "/tmp/RtmplUGHIO"
#> Error: Could not find associated project in working directory or any parent directory.
#> - Path in project: prepare/penguins.R
#> - Current working directory: /tmp/RtmplUGHIO
#> Please open the project associated with this file and try again.
This can also happen when a file has been renamed or moved without updating the here::i_am()
call. In the future, a helper function will assist with installing and updating suitably formatted here::i_am()
calls in your scripts and reports.
Other packages also export a here()
function. Loading these packages after loading here masks our here()
function:
library(plyr)
#>
#> Attaching package: 'plyr'
#> The following object is masked from 'package:here':
#>
#> here
here()
#> Error in here(): argument "f" is missing, with no default
One way to work around this problem is to use here::here():
here::here()
#> [1] "/data/GHE/mpn/deployment/deployments/2021-01-04/renv/library/R-3.6/x86_64-pc-linux-gnu/here/demo-project"
The conflicted package offers an alternative: it detects that here()
is exported from more than one package and allows you to use neither until you indicate a preference.
library(conflicted)
#> Error in library(conflicted): there is no package called 'conflicted'
here()
#> Error in here(): argument "f" is missing, with no default
conflicted::conflict_prefer("here", "here")
#> Error in loadNamespace(name): there is no package called 'conflicted'
here()
#> Error in here(): argument "f" is missing, with no default
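The errors in the output above arise because the conflicted package was not installed when the output was recorded, so the preference was never registered. The following sketch shows how the workflow might look when here, plyr and conflicted are all installed. That availability is an assumption, so the code checks for it first and does nothing otherwise.

```r
# Assumption: here, plyr and conflicted are installed. If any of them is
# missing, the sketch is skipped instead of failing.
needed <- c("here", "plyr", "conflicted")
have_all <- all(vapply(needed, requireNamespace, logical(1), quietly = TRUE))

root <- NULL
if (have_all) {
  library(here)
  library(plyr)        # attached after here, so plyr::here() masks here::here()
  library(conflicted)  # from now on, ambiguous names raise an error

  # Declare that here() should always mean here::here().
  conflicted::conflict_prefer("here", "here")

  root <- here()
}
```

Once the preference is declared, an unqualified here() call resolves to here::here() for the rest of the session, regardless of the order in which the packages were attached.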
To eliminate potential confusion, here::i_am()
accepts a uuid
argument. The idea is that each script and report calls here::i_am()
very early (in the first 100 lines) with a universally unique identifier. Even if a file location is reused across projects (e.g. two projects contain a “prepare/data.R” file), the files can be identified correctly if the uuid
argument in the here::i_am()
call is different.
If a uuid argument is passed to here::i_am(), the first 100 lines of the file are searched. The file is matched only if a here::i_am() call that passes this very uuid is among those 100 lines, and it is not matched if the uuid is not found in the text.
Use uuid::UUIDgenerate() to create universally unique identifiers:
uuid::UUIDgenerate()
#> [1] "48dec871-c60d-4c30-ba61-e1977219c0fc"
Ensure that the uuid
arguments are actually unique across your files! In the future, a helper function will assist with installing and updating suitably formatted here::i_am()
calls in your scripts and reports.
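As a sketch of how the uuid argument might be used, the example below creates a throwaway project whose script contains a here::i_am() call with an identifier, and then makes the same call from inside that project. It assumes that the here package is installed. The project layout is hypothetical, and the identifier is the one printed above.

```r
# Assumption: the here package is installed.
id <- "48dec871-c60d-4c30-ba61-e1977219c0fc"

# Throwaway project with a script that declares its location and uuid.
project <- tempfile("uuid-project-")
dir.create(file.path(project, "prepare"), recursive = TRUE)
script <- file.path(project, "prepare", "data.R")
writeLines(
  sprintf('here::i_am("prepare/data.R", uuid = "%s")', id),
  script
)

# Make the call from inside the project, then restore the working directory.
old_wd <- setwd(project)
root <- tryCatch(
  {
    here::i_am("prepare/data.R", uuid = id)
    here::here()
  },
  finally = setwd(old_wd)
)
```

Passing a different identifier in the call than the one written to the file should make here::i_am() fail, even though a file exists at prepare/data.R.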
It is advisable to start a fresh R session as often as possible, especially before focusing on another project. There still may be legitimate cases when it is desirable to reset the project root.
To start, let’s create a temporary project for demonstration:
temp_project_path <- tempfile()
dir.create(temp_project_path)
scripts_path <- file.path(temp_project_path, "scripts")
dir.create(scripts_path)
script_path <- file.path(scripts_path, "script.R")
writeLines(
  c(
    'here::i_am("scripts/script.R")',
    'print("Hello, world!")'
  ),
  script_path
)
fs::dir_tree(temp_project_path)
#> /tmp/RtmplUGHIO/file5d2caf7faa0
#> └── scripts
#>     └── script.R
writeLines(readLines(script_path))
#> here::i_am("scripts/script.R")
#> print("Hello, world!")
The script.R
file contains a call to here::i_am()
to declare its location. Running it from the current working directory will fail:
source(script_path, echo = TRUE)
#>
#> > here::i_am("scripts/script.R")
#> Error: Could not find associated project in working directory or any parent directory.
#> - Path in project: scripts/script.R
#> - Current working directory: /data/GHE/mpn/deployment/deployments/2021-01-04/renv/library/R-3.6/x86_64-pc-linux-gnu/here/demo-project
#> Please open the project associated with this file and try again.
To reset the project root mid-session, change the working directory with setwd(). Now, the subsequent call to here::i_am() from within script.R works:
setwd(temp_project_path)
source(script_path, echo = TRUE)
#>
#> > here::i_am("scripts/script.R")
#> here() starts at /tmp/RtmplUGHIO/file5d2caf7faa0
#>
#> > print("Hello, world!")
#> [1] "Hello, world!"
To reiterate: a fresh session is almost always the better, cleaner, safer, and more robust solution. Use this approach only as a last resort.
The here package has a very simple and restricted interface, by design. The underlying logic is provided by the much more powerful rprojroot package. If the default behavior of here does not suit your workflow, the rprojroot package may be a better alternative. Packages that need this functionality should also import rprojroot rather than here.
The following example shows how to find an RStudio project starting from a directory:
library(rprojroot)
find_root(is_rstudio_project, file.path(project_path, "analysis"))
#> [1] "/data/GHE/mpn/deployment/deployments/2021-01-04/renv/library/R-3.6/x86_64-pc-linux-gnu/here/demo-project"
Arbitrary criteria can be defined. See vignette("rprojroot", package = "rprojroot")
for an introduction.
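As a sketch of what a custom criterion might look like, the example below marks a throwaway project with a marker file and searches for it from a nested subdirectory. It assumes that the rprojroot package is installed. The marker file name .project-root and the directory layout are hypothetical.

```r
library(rprojroot)

# Throwaway project with a hypothetical marker file in its root.
project <- tempfile("criteria-project-")
dir.create(file.path(project, "analysis", "figures"), recursive = TRUE)
file.create(file.path(project, ".project-root"))

# Custom criterion: a directory qualifies if it contains the marker file
# or if it is an RStudio project. Criteria can be combined with `|`.
criterion <- has_file(".project-root") | is_rstudio_project

# Search upwards, starting from a nested subdirectory.
root <- find_root(criterion, path = file.path(project, "analysis", "figures"))
```

Combining criteria this way lets the same code find the root in projects that follow either convention.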
[1] Prior to version 1.0.0, it was recommended to attach the here package via library(here). This still works, but is no longer the recommended approach.
[2] library(here) no longer emits an informative message if here::i_am() has been called before.
[3] library(here) emits a message that may be confusing if followed by the message from here::i_am().