Extract attributes, text, and tag names from HTML.
html_text(x, trim = FALSE)
html_name(x)
html_children(x)
html_attrs(x)
html_attr(x, name, default = NA_character_)
| Argument | Description |
|---|---|
| x | A document, node, or node set. |
| trim | If `TRUE`, trims leading and trailing whitespace. |
| name | Name of attribute to retrieve. |
| default | A string used as a default value when the attribute does not exist in every node. |
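The `trim` and `default` arguments are easiest to see on a small inline document. The sketch below is illustrative only: the HTML fragment and the `"(none)"` placeholder are made up for the example, and it assumes `read_html()` is given a literal HTML string rather than a URL.

```r
library(rvest)

# Hypothetical fragment: the first link has padded text,
# the second link deliberately has no title attribute.
page <- read_html('<p><a href="/a" title="First">  Alpha  </a><a href="/b">Beta</a></p>')
links <- html_nodes(page, "a")

# Text exactly as stored in the nodes, surrounding spaces included
raw_text <- html_text(links)

# Same text with leading and trailing whitespace removed
trimmed_text <- html_text(links, trim = TRUE)

# Missing attributes come back as NA unless a default is supplied
titles <- html_attr(links, "title")
titles_filled <- html_attr(links, "title", default = "(none)")

print(trimmed_text)
print(titles_filled)
```

Supplying `default` is mainly useful when the result feeds into code that cannot handle `NA`, such as string concatenation.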
html_attr, html_name and html_text, a character vector; html_attrs, a list.
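The difference in return types can be checked directly. This is a minimal sketch on a made-up two-link fragment (not part of the original examples); it assumes a node set is passed, in which case `html_attrs()` is expected to return one named character vector per node.

```r
library(rvest)

# Hypothetical fragment used only to inspect return types
doc <- read_html('<ul><li><a href="/x" id="one">X</a></li><li><a href="/y">Y</a></li></ul>')
nodes <- html_nodes(doc, "a")

# Character vectors, one element per node
is.character(html_text(nodes))
is.character(html_name(nodes))
is.character(html_attr(nodes, "href"))

# A list, one named character vector per node; the vectors can differ
# in length because each node carries its own set of attributes
attrs <- html_attrs(nodes)
is.list(attrs)
names(attrs[[1]])
```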
library(rvest)

movie <- read_html("https://en.wikipedia.org/wiki/The_Lego_Movie")
# Select the cast links from the infobox
cast <- html_nodes(movie, "tr:nth-child(8) .plainlist a")

html_text(cast)
#> [1] "Chris Pratt"     "Will Ferrell"    "Elizabeth Banks" "Will Arnett"
#> [5] "Nick Offerman"   "Alison Brie"     "Charlie Day"     "Liam Neeson"
#> [9] "Morgan Freeman"

html_name(cast)
#> [1] "a" "a" "a" "a" "a" "a" "a" "a" "a"

html_attrs(cast)
#> [[1]]
#>                href               title
#> "/wiki/Chris_Pratt"       "Chris Pratt"
#>
#> [[2]]
#>                 href                title
#> "/wiki/Will_Ferrell"       "Will Ferrell"
#>
#> [[3]]
#>                    href                   title
#> "/wiki/Elizabeth_Banks"       "Elizabeth Banks"
#>
#> [[4]]
#>                href               title
#> "/wiki/Will_Arnett"       "Will Arnett"
#>
#> [[5]]
#>                  href                 title
#> "/wiki/Nick_Offerman"       "Nick Offerman"
#>
#> [[6]]
#>                href               title
#> "/wiki/Alison_Brie"       "Alison Brie"
#>
#> [[7]]
#>                href               title
#> "/wiki/Charlie_Day"       "Charlie Day"
#>
#> [[8]]
#>                href               title
#> "/wiki/Liam_Neeson"       "Liam Neeson"
#>
#> [[9]]
#>                   href                  title
#> "/wiki/Morgan_Freeman"       "Morgan Freeman"
#>

html_attr(cast, "href")
#> [1] "/wiki/Chris_Pratt"     "/wiki/Will_Ferrell"    "/wiki/Elizabeth_Banks"
#> [4] "/wiki/Will_Arnett"     "/wiki/Nick_Offerman"   "/wiki/Alison_Brie"
#> [7] "/wiki/Charlie_Day"     "/wiki/Liam_Neeson"     "/wiki/Morgan_Freeman"