Parse an HTML table into a data frame.
html_table(x, header = NA, trim = TRUE, fill = FALSE, dec = ".")
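| Argument | Description |
|---|---|
| x | A node, node set or document. |
| header | Use first row as header? If `NA`, will use first row if it consists of `<th>` tags. |
| trim | Remove leading and trailing whitespace within each cell? |
| fill | If `TRUE`, automatically fill rows with fewer than the maximum number of columns with `NA`s. |
| dec | The character used as decimal mark. |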
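
As a small illustration of the `dec` argument, the sketch below parses a made-up table whose numbers use a decimal comma. The table contents and the names `prices` and `parsed` are invented for this example, and it assumes that `dec` is handed on to type conversion so that a cell such as `1,50` is read as a number.

```r
library(rvest)

# Hypothetical table that uses "," as the decimal mark
prices <- minimal_html("<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>apple</td><td>1,50</td></tr>
  <tr><td>pear</td><td>0,75</td></tr>
</table>")

parsed <- prices %>%
  html_node("table") %>%
  html_table(dec = ",")

# Inspect the column types; Price should be numeric rather than character
str(parsed)
```

With the default `dec = "."`, the same cells would not be recognised as decimal numbers, so the argument is worth setting whenever the page follows a different locale.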

`html_table` currently makes a few assumptions:

- No cells span multiple rows
- Headers are in the first row
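
When the second assumption does not hold, one possible workaround is to read the table without a header and promote the right row by hand. The sketch below is only an illustration: the table, the names `page`, `raw` and `tidy`, and the choice of row 2 as the header row are all made up for the example.

```r
library(rvest)

# Hypothetical table: row 1 is a title, the real headers are in row 2
page <- minimal_html("<table>
  <tr><td colspan='2'>Results</td></tr>
  <tr><td>Name</td><td>Score</td></tr>
  <tr><td>a</td><td>1</td></tr>
  <tr><td>b</td><td>2</td></tr>
</table>")

# Read every row as data; all columns stay character because of the text rows
raw <- page %>%
  html_node("table") %>%
  html_table(header = FALSE)

# Promote row 2 to column names, then drop the title and header rows
names(raw) <- as.character(unlist(raw[2, ]))
tidy <- raw[-(1:2), ]

# Convert the numeric column by hand, since it was read as text
tidy$Score <- as.numeric(tidy$Score)
tidy
```

The manual `as.numeric()` step is needed because type conversion ran while the header text was still part of the column.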

    library(rvest)

    sample1 <- minimal_html("<table>
    <tr><th>Col A</th><th>Col B</th></tr>
    <tr><td>1</td><td>x</td></tr>
    <tr><td>4</td><td>y</td></tr>
    <tr><td>10</td><td>z</td></tr>
    </table>")
    sample1 %>%
      html_node("table") %>%
      html_table()
    #>   Col A Col B
    #> 1     1     x
    #> 2     4     y
    #> 3    10     z

    # Values in merged cells will be duplicated
    sample2 <- minimal_html("<table>
    <tr><th>A</th><th>B</th><th>C</th></tr>
    <tr><td>1</td><td>2</td><td>3</td></tr>
    <tr><td colspan='2'>4</td><td>5</td></tr>
    <tr><td>6</td><td colspan='2'>7</td></tr>
    </table>")
    sample2 %>%
      html_node("table") %>%
      html_table()
    #>   A B C
    #> 1 1 2 3
    #> 2 4 4 5
    #> 3 6 7 7

    # If the table is badly formed, and has different number of columns
    # in each row, use `fill = TRUE` to fill in the missing values
    sample3 <- minimal_html("<table>
    <tr><th>A</th><th>B</th><th>C</th></tr>
    <tr><td colspan='2'>1</td><td>2</td></tr>
    <tr><td colspan='2'>3</td></tr>
    <tr><td>4</td></tr>
    </table>")
    sample3 %>%
      html_node("table") %>%
      html_table(fill = TRUE)
    #>   A  B  C
    #> 1 1  1  2
    #> 2 3  3 NA
    #> 3 4 NA NA
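
The examples above each select a single table with `html_node()`. Since `x` may also be a node set, a page with several tables can be parsed in one call. The following is a minimal sketch with an invented two-table page; it assumes that passing a node set returns a list with one parsed table per node.

```r
library(rvest)

# Hypothetical page containing two separate tables
page <- minimal_html("
<table>
  <tr><th>A</th><th>B</th></tr>
  <tr><td>1</td><td>x</td></tr>
</table>
<table>
  <tr><th>C</th><th>D</th></tr>
  <tr><td>2</td><td>y</td></tr>
  <tr><td>3</td><td>z</td></tr>
</table>")

# html_nodes() returns every match, so html_table() is applied to each table
tables <- page %>%
  html_nodes("table") %>%
  html_table()

length(tables)   # number of tables found on the page
tables[[2]]      # the second table
```

Indexing the result with `[[` picks out an individual table, which is convenient when the table of interest has no distinguishing selector.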