Munge an rtweet data.frame (according to personal preferences) for subsequent analysis.

clean_tweets_at(data = NULL, facet = NULL, trim = TRUE,
  cols = c("status_id", "created_at", "user_id", "screen_name", "text",
  "display_text_width", "reply_to_status_id", "is_quote", "is_retweet",
  "favorite_count", "retweet_count", "hashtags", "symbols", "urls_url",
  "urls_expanded_url", "media_expanded_url", "ext_media_expanded_url"),
  timezone = "America/Chicago")

clean_tweets(..., facet)

Arguments

data

data.frame (created using the rtweet package).

facet

bare name for non-standard evaluation (NSE); character for standard evaluation (SE). Name of the column in data used for faceting. Defaults to NULL, which is not strictly required but simplifies the internal code. Added to cols if trim = TRUE.

trim

logical. If TRUE, keep only the columns named in cols and drop the others.

cols

character (vector). Name(s) of column(s) in data to keep. Only relevant if trim = TRUE.

timezone

character. Passed directly to lubridate::with_tz() as the tzone argument.

...

dots. Additional parameters.
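
The interplay between facet, trim, and cols described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: select_cols_at() and select_cols() are hypothetical names, and the toy data.frame stands in for real rtweet output.

```r
# Hypothetical SE helper: `facet` is a character column name or NULL.
select_cols_at <- function(data, facet = NULL, trim = TRUE, cols = names(data)) {
  if (!trim) {
    return(data)
  }
  # With a NULL default, c(cols, facet) needs no special case for an absent facet.
  keep <- unique(c(cols, facet))
  data[, intersect(keep, names(data)), drop = FALSE]
}

# Hypothetical NSE wrapper: captures a bare column name and forwards it as a string.
select_cols <- function(data, facet, ...) {
  facet_chr <- if (missing(facet)) NULL else deparse(substitute(facet))
  select_cols_at(data, facet = facet_chr, ...)
}

# Toy stand-in for an rtweet data.frame (values are made up).
tweets <- data.frame(
  status_id = c("1", "2"),
  text = c("first tweet", "second tweet"),
  screen_name = c("user_a", "user_b"),
  source = c("web", "app"),
  stringsAsFactors = FALSE
)

# SE call: facet given as a string; it is kept alongside cols.
names(select_cols_at(tweets, facet = "screen_name", cols = c("status_id", "text")))

# NSE call: the same facet given as a bare name.
names(select_cols(tweets, screen_name, cols = c("status_id", "text")))

# trim = FALSE keeps every column, so cols is ignored.
names(select_cols(tweets, trim = FALSE))
```

In this sketch both the SE and NSE calls keep status_id, text, and screen_name, which mirrors the documented rule that facet is included in cols when trim = TRUE.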

Value

data.frame.

Details

Converts nested list columns to character. Adds a timestamp column derived from the created_at column, and a time column giving the hour of the day of created_at.
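
The transformations listed here might look roughly like the following base R sketch. Several things are assumptions made for illustration: the helper collapse_list_col() is hypothetical, the separator and NA handling are guesses, timestamp is taken to be created_at shown in the target time zone, and setting the "tzone" attribute is used as a base R stand-in for lubridate::with_tz(). The actual function may compute these columns differently.

```r
# Toy stand-in for an rtweet data.frame (values are made up).
tweets <- data.frame(
  status_id = c("1", "2"),
  created_at = as.POSIXct(c("2018-01-15 18:30:00", "2018-01-16 03:05:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)
tweets$hashtags <- list(c("rstats", "dataviz"), NA_character_)

# Hypothetical helper: collapse each list element into one string, keeping NA as NA.
collapse_list_col <- function(x, sep = ", ") {
  vapply(
    x,
    function(el) if (all(is.na(el))) NA_character_ else paste(el, collapse = sep),
    character(1)
  )
}

# Convert every list column to a character column.
is_list_col <- vapply(tweets, is.list, logical(1))
tweets[is_list_col] <- lapply(tweets[is_list_col], collapse_list_col)

# Assumed meaning of `timestamp`: the same instants, displayed in the target zone.
timestamp <- tweets$created_at
attr(timestamp, "tzone") <- "America/Chicago"
tweets$timestamp <- timestamp

# Assumed meaning of `time`: the hour of day (0-23) in the target zone.
tweets$time <- as.integer(format(tweets$timestamp, "%H"))

tweets[, c("hashtags", "timestamp", "time")]
```

With these made-up inputs, the two UTC times fall at hours 12 and 21 in America/Chicago, and the first hashtags entry becomes the single string "rstats, dataviz".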

See also

https://juliasilge.com/blog/ten-thousand-data/

https://buzzfeednews.github.io/2018-01-trump-twitter-wars/