Skip to Content

Read data from INBO databases (SQL Server) with R

Database access using R

Connection to INBO database

Here we provide the approach using the package DBI, which is also used by RStudio.[1]. This package enables the link between R and the (remote) database. After installation of the needed packages (install.packages(c("DBI", "glue", "tidyverse"))), the packages can be loaded:

library(DBI)
library(glue)
library(tidyverse)

To create a database connection, different approaches are available as explained in the next section:

  1. Use the dedicated connect_inbo_dbase function in the inbodb package
  2. Setup the required parameters yourself
  3. Reuse existing registered connections on your computer

Inbodb function

To support the connection to INBO databases, a dedicated function is available in the inbodb package, called connect_inbo_dbase. The function provides support to connect to INBO databases and uses the Connections pane in the Rstudio interface:

odbc-connecties

To use the connect_inbo_dbase function, make sure to install the inbodb package, following the installation instructions. After a successfull installation, load the library and create a database connection:

library(inbodb)
my_connection <- connect_inbo_dbase("D0021_00_userFlora")

Once the connection is successfully established, the database can be queried.

Remark for Linux users: When working in Linux, this setup (using Trusted connection) requires an active kerberos session. More information about the setup and functionality is to be found in the tutorial on kerberos installation.

Setting up the connection yourself

In case you want to setup the connection yourself (e.g. errors using inbodb), you can do so by providing the necessary parameters:

my_connection <- DBI::dbConnect(odbc::odbc(),
      driver = "ODBC Driver 17 for SQL Server",
      server = "<server hostname or IP>",
      port = 1433,
      database = "D0021_00_userFlora", # or your database of interest
      trusted_connection = "Yes")

The most important parameters are server and database, the others should normally be kept as such (except for maybe driver). For server, keep in mind that there is more than one server: database names starting with M, S or W use another server than those starting with D. The database name is the name of the database (if you can’t remember the name, connect with a database you do know, e.g. D0021_00_userFlora and you’ll see an overview of the existing databases on that server after connecting.) The driver version depends on your computer and can be looked up using odbc::odbcListDrivers().

Use existing MSAccess connection name

When you query data from a SQL database that is already accessible using MSAccess, such a database is also accessible from R. For Windows users, the most important element is to know the so-called DSN (i.e. a registered Data Source Name). Actually, it is just the name of the database as it is known by your computer (and MS Access). The easiest way to check the DSN is to check the registered ODBC connections in the administrator tools menu.

For Dutch-speaking Windows 7 users:

> Kies in het Configuratiescherm van Windows de optie Systeembeheer > Gegevensbronnen (ODBC). De optie Systeembeheer verschijnt in de categorie Systeem en onderhoud.

You should see a list similar to the list underneath, with the names of the available DSN names enlisted:

odbc-connecties

An alternative way to check the DSN name of a database already working on with Access, is to check the DSN inside MS Access (in dutch, check menu item Koppelingsbeheer):

access-dsn

For example, the DSN name UserFlora or Cydonia-prd can be used to query these databases and extract data from it with similar queries to the one used in MSAccess. First of all, the connection with the database need to be established, by using the odbcConnect function, providing the DSN name as argument:

For Windows users:

my_connection <- odbcConnect("UserFlora")

Once the connection is successfully established, the database can be queried.

Get a complete table from the database

The function dbReadTable can be used to load an entire table from a database. For example, to extract the tblTaxon table from the flora database:

rel_taxa <- dbReadTable(my_connection, "relTaxonTaxonGroep")
head(rel_taxa) %>% knitr::kable()
ID TaxonGroepID TaxonID
1 4 1
2 4 2
3 4 3
4 4 4
5 4 5
6 4 6

The connection my_connection, made earlier, is used as the first argument. The table name is the second argument.

Remark: If you have no idea about the size of the table you’re trying to load from the database, this could be rather tricky and cumbersome. Hence, it is probably better to only extract a portion of the table using a query.

Execute a query to the database

The function dbGetQuery provides more flexibilty as it can be used to try any SQL-query on the database. A complete introduction to the SQL language is out of scope here. We will focus on the application and the reusage of a query.

meting <- dbGetQuery(my_connection, paste("SELECT TOP 10 * FROM dbo.tblMeting", 
                                          "WHERE COR_X IS NOT NULL"))
head(meting) %>% knitr::kable()
ID WaarnemingID TaxonID MetingStatusCode Cor_X Cor_Y CommentaarTaxon CommentaarHabitat CREATION_DATE CREATION_USER UPDATE_DATE UPDATE_USER
2 21748 3909 GDGA 109948 185379 NA NA NA NA NA NA
14 45523 3909 GDGA 127708 179454 NA NA NA NA NA NA
15 124394 3909 GDGA 109424 192152 NA NA NA NA NA NA
23 38561 3909 GDGA 128290 179297 NA NA NA NA NA NA
24 126500 3909 GDGA 98714 178373 NA NA NA NA NA NA
173 73725 3909 GDGA 102612 189891 NA NA NA NA NA NA

Create and use query templates

When you regularly use similar queries, with some minimal alterations, you do not want to copy/paste each time the entire query. It is prone to errors and you’re script will become verbose. It is advisable to create query templates, that can be used within the dbGetQuery function.

Consider the execution of the following query. We are interested in those records with valid X and Y coordinates for the measurement, based on a given dutch name:

subset_meting <- dbGetQuery(my_connection, 
    "SELECT meet.COR_X
         , meet.Cor_Y
         , meet.MetingStatusCode
         , tax.NaamNederlands
         , tax.NaamWetenschappelijk
         , waar.IFBLHokID
    FROM  tblMeting AS meet
        LEFT JOIN tblTaxon AS tax ON tax.ID = meet.TaxonID
        LEFT JOIN tblWaarneming AS waar ON waar.ID = meet.WaarnemingID
    WHERE meet.Cor_X IS NOT NULL
        AND meet.Cor_X != 0
        AND tax.NaamNederlands LIKE 'Wilde hyacint'")
head(subset_meting) %>% knitr::kable()
COR_X Cor_Y MetingStatusCode NaamNederlands NaamWetenschappelijk IFBLHokID
88720 208327 GDGA Wilde hyacint Hyacinthoides non-scripta (L.) Chouard ex Rothm. 11195
24106 199925 GDGA Wilde hyacint Hyacinthoides non-scripta (L.) Chouard ex Rothm. 9601
103111 190915 GDGA Wilde hyacint Hyacinthoides non-scripta (L.) Chouard ex Rothm. 6990
118123 183942 GDGA Wilde hyacint Hyacinthoides non-scripta (L.) Chouard ex Rothm. 5672
106107 182343 GDGA Wilde hyacint Hyacinthoides non-scripta (L.) Chouard ex Rothm. 5217
105765 180785 GDGA Wilde hyacint Hyacinthoides non-scripta (L.) Chouard ex Rothm. 4830

If we need this query regularly, but each time using a different tax.NaamNederlands (only the name changes), it is worthwhile to invest some time in the creation of a small custom function that uses this query as a template. Let’s create a function flora_records_on_dutch_name that takes a valid database connection and a given Dutch name and returns the relevant subset of the data for this query:

flora_records_on_dutch_name <- function(dbase_connection, dutch_name) {
     dbGetQuery(dbase_connection, glue_sql(
            "SELECT meet.Cor_X
                 , meet.COr_Y
                 , meet.MetingStatusCode
                 , tax.NaamNederlands
                 , tax.NaamWetenschappelijk
                 , waar.IFBLHokID
            FROM  dbo.tblMeting meet
                LEFT JOIN dbo.tblTaxon AS tax ON tax.ID = meet.TaxonID
                LEFT JOIN dbo.tblWaarneming AS waar ON waar.ID = meet.WaarnemingID
            WHERE meet.Cor_X IS NOT NULL
                AND meet.Cor_X != 0
                AND tax.NaamNederlands LIKE {dutch_name}",
            dutch_name = dutch_name,
            .con = dbase_connection))
}

Hence, instead of copy-pasting the whole query each time (which could be error-prone), we can reuse the function for different names:

hyacint <- flora_records_on_dutch_name(my_connection, "Wilde hyacint")
head(hyacint) %>% knitr::kable()
Cor_X COr_Y MetingStatusCode NaamNederlands NaamWetenschappelijk IFBLHokID
88720 208327 GDGA Wilde hyacint Hyacinthoides non-scripta (L.) Chouard ex Rothm. 11195
24106 199925 GDGA Wilde hyacint Hyacinthoides non-scripta (L.) Chouard ex Rothm. 9601
103111 190915 GDGA Wilde hyacint Hyacinthoides non-scripta (L.) Chouard ex Rothm. 6990
118123 183942 GDGA Wilde hyacint Hyacinthoides non-scripta (L.) Chouard ex Rothm. 5672
106107 182343 GDGA Wilde hyacint Hyacinthoides non-scripta (L.) Chouard ex Rothm. 5217
105765 180785 GDGA Wilde hyacint Hyacinthoides non-scripta (L.) Chouard ex Rothm. 4830
bosanemoon <- flora_records_on_dutch_name(my_connection, "Bosanemoon")
head(bosanemoon) %>% knitr::kable()
Cor_X COr_Y MetingStatusCode NaamNederlands NaamWetenschappelijk IFBLHokID
119247 204936 GDGA Bosanemoon Anemone nemorosa L. 10234
73658 199081 GDGA Bosanemoon Anemone nemorosa L. 8823
72752 199010 GDGA Bosanemoon Anemone nemorosa L. 8824
72921 198828 GDGA Bosanemoon Anemone nemorosa L. 8824
72874 198735 GDGA Bosanemoon Anemone nemorosa L. 8824
72887 198660 GDGA Bosanemoon Anemone nemorosa L. 8824

Remark: Do not forget to close your connection when finished.

dbDisconnect(my_connection)

The glue_sql function

In order to accomplish the re-usage of a query for different input names (dutch_name), the glue_sql function is used from the glue package. The glue_sql function (and the more general glue function) provides the ability to combine text and variable values in a single charactor string (i.e. the query to execute). For each variable name required in the query (any part of your query you want to have interchangeable), a representation in the query is given by the variable name you use in R, put in between curly brackets. For example, if you have the dutch_name variable in R, you can use it inside the query as {dutch_name}:

dutch_name <- 'Jan'
an_integer <- 3
a_float <- 2.8
glue('This prints a combination of a name: {dutch_name}, an integer: {an_integer} and a float value: {a_float}')
## This prints a combination of a name: Jan, an integer: 3 and a float value: 2.8

Whereas the glue function is a general function for strings, the glue_sql function is specifically created to setup queries to databases. More information is provided here and here.

[1] Formerly, connections were made using the package RODBC