<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>GOKAM.co.uk</title>
	<atom:link href="https://www.gokam.fr/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.gokam.fr</link>
	<description>Expert SEO / SEM / Google Analytics à Londres</description>
	<lastBuildDate>Sat, 24 Sep 2022 11:27:50 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=5.4.15</generator>
	<item>
		<title>Save Google Inspection URL data into Google Cloud Storage</title>
		<link>https://www.gokam.co.uk/save-google-inspection-url-data-into-google-cloud-storage/</link>
		
		<dc:creator><![CDATA[François Joly]]></dc:creator>
		<pubDate>Fri, 23 Sep 2022 13:10:33 +0000</pubDate>
				<category><![CDATA[R']]></category>
		<guid isPermaLink="false">https://www.gokam.fr/?p=1948</guid>

					<description><![CDATA[step 1 : getting the URLs This one is going to be quick, we will use the xsitemap package which crawls XML sitemap step 2 : Launching the URL Inspection API in parallel We use the parallel package to allow us to run several requests at the same time. Warning, with regards to the URL [&#8230;]]]></description>
										<content:encoded><![CDATA[
<h2>Step 1: Getting the URLs</h2>



<p>This one is going to be quick: we will use the xsitemap package, which crawls XML sitemaps.</p>



<pre class="crayon-plain-tag">library(xsitemap)
library(urltools)
library(XML)
library(httr)
upload &lt;- xsitemapGet("https://www.rforseo.com/sitemap.xml")</pre>



<pre class="crayon-plain-tag">## Reaching for XML sitemap... https://www.rforseo.com/sitemap.xml</pre>



<pre class="crayon-plain-tag">## regular sitemap detected -  39  web page url(s) found</pre>



<pre class="crayon-plain-tag">## ......................................</pre>



<pre class="crayon-plain-tag">head(upload)</pre>



<pre class="crayon-plain-tag">##                                                             loc    lastmod
## 1                                      https://www.rforseo.com/ 2022-07-29
## 2                  https://www.rforseo.com/classic-r-operations 2021-05-08
## 3                                 https://www.rforseo.com/intro 2022-02-17
## 4                               https://www.rforseo.com/r-intro 2022-08-18
## 5                           https://www.rforseo.com/rpivottable 2022-08-18
## 6 https://www.rforseo.com/analysis/count-words-n-grams-shingles 2021-04-06</pre>



<h2>Step 2: Launching the URL Inspection API in parallel</h2>



<p>We use the parallel package to allow us to run several requests at the same time.</p>



<p>Warning: for the URL Inspection API, the quota is enforced per Search Console website property, and it is shared by all calls querying the same site.</p>



<p>It could be useful to create some extra properties based on URL directories to work around this per-property limit.</p>
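<p>On that note, here is a small base-R sketch of how one might split a URL vector into quota-sized batches before looping over days (the quota number below is made up, check your own limits):</p>



<pre class="crayon-plain-tag"># toy URL vector standing in for upload$loc
urls = paste0("https://example.com/page-", 1:5)
# hypothetical daily quota per property
quota = 2
# split the vector into batches of at most `quota` URLs
batches = split(urls, ceiling(seq_along(urls) / quota))
length(batches) # 5 URLs with a quota of 2 make 3 batches</pre>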



<pre class="crayon-plain-tag">library(searchConsoleR)
library(lubridate)
library(parallel)
scr_auth()


res &lt;- mclapply(1:nrow(upload), function(i) {
  cat(".")         
  url &lt;-  upload&#91;i,"loc"]
  result &lt;- inspection(url, siteUrl = "sc-domain:rforseo.com", languageCode = NULL)
  
  text &lt;- paste0(url,"§",
                 result&#91;&#91;"indexStatusResult"]]&#91;&#91;"verdict"]],"§",
                 result&#91;&#91;"indexStatusResult"]]&#91;&#91;"coverageState"]],"§",
                 result&#91;&#91;"indexStatusResult"]]&#91;&#91;"robotsTxtState"]],"§",
                 result&#91;&#91;"indexStatusResult"]]&#91;&#91;"indexingState"]],"§",
                 now())
  text
  
  
   }, mc.cores = detectCores())      ## split the job across all available cores

res &lt;- data.frame(unlist(res))

library(stringr)

res&#91;,c("url", "verdict", "coverageState", "robotsTxtState", "indexingState", "date")] &lt;- str_split_fixed(res$unlist.res., '§', 6)
res$unlist.res. &lt;- NULL</pre>
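<p>The &#8220;§&#8221; separator trick above packs each API result into a single string; here is a tiny base-R illustration of how it unpacks (made-up values):</p>



<pre class="crayon-plain-tag"># pack a few fields with a separator unlikely to appear in the data...
packed = paste("https://example.com/page", "PASS", "Submitted and indexed", "ALLOWED", sep = "§")
# ...then split them back into individual fields
fields = strsplit(packed, "§", fixed = TRUE)[[1]]
fields[2] # "PASS"</pre>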



<h2>Step 3: Save the data frame into a Google Cloud Storage bucket</h2>



<pre class="crayon-plain-tag"># Load the package
library(googleCloudStorageR)
library(bigQueryR)


## project id
gcs_global_bucket("mindful-path-205008")

gcs_auth()

## custom upload function to ignore quotes and column headers
f &lt;- function(input, output) {
  write.table(input, sep = ",", col.names = FALSE, row.names = FALSE, 
              quote = FALSE, file = output, qmethod = "double")}

## upload files to Google Cloud Storage
gcs_upload(res, name = "res.csv", object_function = f,bucket = "gsc_backup")</pre>
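<p>Before pointing this at a real bucket, you can sanity-check the custom upload function locally; this quick sketch writes to a temp file instead of GCS:</p>



<pre class="crayon-plain-tag"># same custom function as above
f = function(input, output) {
  write.table(input, sep = ",", col.names = FALSE, row.names = FALSE,
              quote = FALSE, file = output, qmethod = "double")}
tmp = tempfile(fileext = ".csv")
f(data.frame(url = "https://example.com/", verdict = "PASS"), tmp)
readLines(tmp) # "https://example.com/,PASS"</pre>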
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>R apps for SEOs</title>
		<link>https://www.gokam.co.uk/r-apps-for-seos/</link>
		
		<dc:creator><![CDATA[François Joly]]></dc:creator>
		<pubDate>Mon, 08 Feb 2021 22:23:29 +0000</pubDate>
				<category><![CDATA[R']]></category>
		<guid isPermaLink="false">https://www.gokam.fr/?p=1882</guid>

					<description><![CDATA[Dear SEOs, I&#8217;ve made app that you migh find useful CTR by Average Position The first app computes Google Search Queries CTR by Average Position. &#x1f449; https://gokam.shinyapps.io/ctr_pos/ With a big website it looks like this app code is open sourced here Crawl Recursively XML sitemaps The second app help to detects, Crawl and Download XML [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>Dear SEOs, </p>



<p>I&#8217;ve made a couple of apps that you might find useful.</p>



<h3>CTR by Average Position</h3>



<p>The first app computes Google Search Queries CTR by Average Position. </p>



<p>&#x1f449; <a rel="noreferrer noopener" href="https://t.co/WLyP6zXBMy?amp=1" target="_blank">https://gokam.shinyapps.io/ctr_pos/</a></p>



<p>With a big website it looks like this</p>



<figure class="wp-block-image size-large"><img wpfc-lazyload-disable="true" src="https://www.gokam.fr/wp-content/uploads/2021/02/newplot-3.png" alt="" class="wp-image-1883" srcset="https://www.gokam.co.uk/wp-content/uploads/2021/02/newplot-3.png 700w, https://www.gokam.co.uk/wp-content/uploads/2021/02/newplot-3-300x193.png 300w" sizes="(max-width: 700px) 100vw, 700px" /><figcaption>Green is the average per position, red dots are branded search queries</figcaption></figure>



<p>app code is open sourced <a href="https://github.com/pixgarden/ctr_pos">here</a></p>



<h3>Crawl Recursively XML sitemaps</h3>



<p>The second app helps to detect, crawl, and download XML sitemaps.</p>



<p>&#x1f449; <a href="https://gokam.shinyapps.io/xsitemap/">https://gokam.shinyapps.io/xsitemap/</a></p>



<p>This app relies primarily on the <a href="https://www.gokam.co.uk/xsitemap-package/">xsitemap package</a></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>SEO Crawling &#038; metadata extraction with R &#038; RCrawler</title>
		<link>https://www.gokam.co.uk/seo-crawling-metadata-extraction-with-r-rcrawler/</link>
		
		<dc:creator><![CDATA[François Joly]]></dc:creator>
		<pubDate>Wed, 25 Mar 2020 22:49:00 +0000</pubDate>
				<category><![CDATA[R']]></category>
		<guid isPermaLink="false">https://www.gokam.fr/?p=943</guid>

					<description><![CDATA[It will be a long article so I added a Table of content &#x1f447; Fancy, right? This tutorial is relying on a package called Rcrawler by Salim Khalil. It&#8217;s a very handy crawler with some nice native functionalities. After R is being installed and rstudio launched, same as always, we&#8217;ll install and load our package: [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>It will be a long article, so I added a table of contents &#x1f447; Fancy, right?</p>



<div class="wp-block-uagb-table-of-contents uagb-toc__align-left uagb-toc__columns-undefined" data-scroll="true" data-offset="30" data-delay="800" id="uagb-toc-9d1e08a9-1aab-48be-a9a1-b31d3f598afd"><div class="uagb-toc__wrap"><div class="uagb-toc__title-wrap"><div class="uagb-toc__title">Table Of Contents</div></div><div class="uagb-toc__list-wrap"><ul class="uagb-toc__list"><li><a href="#1-crawl-an-entire-website">Crawl an entire website with Rcrawler</a></li><ul class="uagb-toc__list"><li><a href="#1-the-index-variable">The INDEX variable</a></li><li><a href="#2-html-files">HTML Files</a></li></ul><li><a href="#2-extract-metadata-while-crawling">So how to extract metadata while crawling?</a></li><li><a href="#4-explore-crawled-data-with-rpivottable">Explore Crawled Data with rpivottable</a></li><li><a href="#4-extract-more-data-without-having-to-recrawl">Extract more data without having to recrawl</a></li><li><a href="#5-categorize-urls-using-regex">Categorize URLs using Regex</a></li><li><a href="#4-what-if-i-want-to-follow-robotstxt-rules">What if I want to follow robots.txt rules?</a></li><li><a href="#3-limit-crawling-speed">What if I want to limit crawling speed?</a></li><li><a href="#7-what-if-i-want-to-crawl-only-a-subfolder">What if I want to crawl only a subfolder?</a></li><li><a href="#6-how-to-change-user-agent">How to change user-agent?</a></li><li><a href="#6-what-if-my-ip-is-banned">What if my IP is banned?</a></li><li><a href="#4-where-are-the-internal-links">Where are the internal Links?</a></li><li><a href="#6-count-links">Count Links</a></li><ul class="uagb-toc__list"><li><a href="#7-count-outbound-links">Count outbound links</a></li><li><a href="#8-count-inbound-links">Count inbound links</a></li></ul><li><a href="#4-compute-internal-page-rank">Compute &#8216;Internal Page Rank&#8217;</a></li><li><a href="#5-what-if-my-website-is-using-a-javascript-framework-like-react-or-angular">What if a website is using a JavaScript framework like React or 
Angular?</a></li><li><a href="#6-perform-automatic-browser-tests-with-selenium">So what&#8217;s the catch?</a></li></ul></div></div></div>



<p>This tutorial relies on a package called <a href="https://cran.r-project.org/web/packages/Rcrawler/Rcrawler.pdf">Rcrawler</a> by Salim Khalil. It&#8217;s a very handy crawler with some nice native functionalities. </p>



<p>Once <a href="https://www.r-project.org/">R</a> is installed and <a href="https://www.rstudio.com/">RStudio</a> launched, we&#8217;ll install and load our package, same as always:</p>



<pre class="crayon-plain-tag"># install to be run once
install.packages("Rcrawler")
# and loading
library(Rcrawler)</pre>



<h2 id="1-crawl-an-entire-website">Crawl an entire website with Rcrawler</h2>



<p>To launch a simple website analysis, you only need this line of code:</p>



<pre class="crayon-plain-tag">Rcrawler(Website = "https://www.gokam.co.uk/")</pre>



<p>It will crawl the entire website and provide you with the data.</p>



<figure class="wp-block-image size-large"><img wpfc-lazyload-disable="true" src="https://www.gokam.fr/wp-content/uploads/2020/03/V0goQ3vuzC.gif" alt="" class="wp-image-1456"/><figcaption>Less than 30s to crawl a small website</figcaption></figure>



<p>Once the crawl is done, you&#8217;ll have access to: </p>



<h4 id="1-the-index-variable">The INDEX variable</h4>



<p>It&#8217;s a data frame; if you don&#8217;t know what a data frame is, think of it as an Excel sheet. Please note that it will be overwritten every time you crawl, so <a href="https://www.gokam.co.uk/export-your-data-from-r/">export it</a> if you want to keep it!</p>
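<p>A minimal export sketch, with a toy data frame standing in for INDEX (see the linked article for more formats):</p>



<pre class="crayon-plain-tag"># write the data frame to CSV, then read it back to check
toy_index = data.frame(Url = "https://example.com/", Level = 1)
out = tempfile(fileext = ".csv")
write.csv(toy_index, out, row.names = FALSE)
nrow(read.csv(out)) # 1 row survived the round trip</pre>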



<p> To take a look at it, just run</p>



<pre class="crayon-plain-tag">View(INDEX)</pre>



<figure class="wp-block-image size-large"><img wpfc-lazyload-disable="true" src="https://www.gokam.fr/wp-content/uploads/2020/03/Screenshot-2020-03-09-18.02.35-1024x476.png" alt="" class="wp-image-1402" srcset="https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-09-18.02.35-1024x476.png 1024w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-09-18.02.35-300x139.png 300w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-09-18.02.35-768x357.png 768w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-09-18.02.35-1080x502.png 1080w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-09-18.02.35.png 1364w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption>INDEX data frame</figcaption></figure>



<p>Most of the columns are self-explanatory. Usually, the most interesting ones are &#8216;<strong>Http Resp</strong>&#8216; and &#8216;<strong>Level</strong>&#8216;</p>



<p>The Level is what SEOs call &#8220;crawl depth&#8221; or &#8220;page depth&#8221;. With it, you can easily check how far from the homepage some webpages are.</p>



<p>Quick example with the <a href="https://www.brightonseo.com/">BrightonSEO</a> website: a quick &#8216;ggplot&#8217; lets us see the page distribution by level.</p>



<figure class="wp-block-image size-large"><img src="https://www.gokam.co.uk/wp-content/uploads/2020/08/brightonSEO_crawl_depht_2020.png" alt="" class="wp-image-1853" srcset="https://www.gokam.co.uk/wp-content/uploads/2020/08/brightonSEO_crawl_depht_2020.png 818w, https://www.gokam.co.uk/wp-content/uploads/2020/08/brightonSEO_crawl_depht_2020-300x162.png 300w, https://www.gokam.co.uk/wp-content/uploads/2020/08/brightonSEO_crawl_depht_2020-768x414.png 768w" sizes="(max-width: 818px) 100vw, 818px" /></figure>






<pre class="crayon-plain-tag">#here the code to run to see the plot

# install ggplot plot library to be run once
install.packages("ggplot2")
# Loading library
library(ggplot2)
# Convert Level to number
INDEX$Level &lt;- as.integer(INDEX$Level)

# Make plot
# 1 define dimensions (only 'Level')
# 2 set up the plot type
# 3 customise the x scale, easier to read
ggplot(INDEX, aes(x=Level))+
       geom_bar()+
       scale_x_continuous(breaks=c(1:10))


#alternative command to count webpages per Level
table(INDEX$Level)

# Should display something like that:
# 0  1   2  3   4   5  6  7  8   9  10
# 1 32 306 91 116 127 61 54 90 149 255</pre>



<h4 id="2-html-files">HTML Files</h4>



<p>By default, the Rcrawler function also stores HTML files in your &#8216;working directory&#8217;. You can change the location by running the setwd() function.</p>



<figure class="wp-block-image size-large"><img wpfc-lazyload-disable="true" src="https://www.gokam.fr/wp-content/uploads/2020/03/Screenshot-2020-03-09-19.05.35-1024x386.png" alt="" class="wp-image-1432" srcset="https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-09-19.05.35-1024x386.png 1024w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-09-19.05.35-300x113.png 300w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-09-19.05.35-768x289.png 768w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-09-19.05.35.png 1046w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption>Each file is named for its crawl order. So the homepage should be 1.html</figcaption></figure>



<p>Let&#8217;s go deeper into the options by answering the most common questions:</p>



<h2 id="2-extract-metadata-while-crawling">So how to extract metadata while crawling?</h2>



<p>It&#8217;s possible to extract any element from a webpage, using a CSS or XPath selector. We&#8217;ll have to use 2 new parameters:</p>



<ul><li><strong>PatternsNames</strong> to name the new fields</li><li><strong>ExtractXpathPat</strong> or <strong>ExtractCSSPat</strong> to set up where to grab them in the web page </li></ul>



<p>Let&#8217;s take an example:</p>



<pre class="crayon-plain-tag">#what we want to extract
CustomLabels &lt;- c("title",
                 "h1",
                 "canonical tag",
                 "meta robots",
                 "hreflang",
                 "body class")

# How to grab it
 CustomXPaths &lt;- c("//title",
           "//h1",
           "//link[@rel='canonical']/@href",
           "//meta[@name='robots']/@content",
           "//link[@rel='alternate']/@hreflang",
           "//body/@class")

 Rcrawler(Website = "https://www.brightonseo.com/",
       ExtractXpathPat = CustomXPaths, PatternsNames = CustomLabels)</pre>



<p>You can access the scraped data in two ways:</p>



<ul><li><strong>option 1 =</strong> <strong>DATA</strong> &#8211; it&#8217;s an environment variable that you can access directly from the console. A small warning: it&#8217;s a &#8216;list&#8217;, which is a little less easy to read</li></ul>



<figure class="wp-block-image size-large"><img wpfc-lazyload-disable="true" src="https://www.gokam.fr/wp-content/uploads/2020/03/Screenshot-2020-03-15-16.14.38-1024x670.png" alt="" class="wp-image-1517" srcset="https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-15-16.14.38-1024x670.png 1024w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-15-16.14.38-300x196.png 300w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-15-16.14.38-768x503.png 768w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-15-16.14.38-1080x707.png 1080w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-15-16.14.38.png 1378w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption>View(DATA) will display something like that</figcaption></figure>



<p>If you want to convert it to a data frame, which is easier to deal with, here is the code:</p>



<pre class="crayon-plain-tag">NEWDATA &lt;- data.frame(matrix(unlist(DATA), nrow=length(DATA), byrow=T))</pre>



<ul><li><strong>option 2 =</strong> <strong>extracted_data.csv</strong><br><br>It&#8217;s a CSV file saved inside your working directory along with the HTML files.</li></ul>



<p>It might be useful to merge the INDEX and NEWDATA data frames; here is the code:</p>



<pre class="crayon-plain-tag">MERGED &lt;- cbind(INDEX,NEWDATA)</pre>



<p>As an example, let&#8217;s try to detect the webpage type using the scraped body class.</p>



<figure class="wp-block-image size-large is-resized"><img wpfc-lazyload-disable="true" src="https://www.gokam.fr/wp-content/uploads/2020/03/Screenshot-2020-03-15-16.35.00.png" alt="" class="wp-image-1529" width="339" height="605" srcset="https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-15-16.35.00.png 526w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-15-16.35.00-168x300.png 168w" sizes="(max-width: 339px) 100vw, 339px" /><figcaption>Seems that the first word is the page type</figcaption></figure>



<p>Let&#8217;s extract the first word and feed it into a new column:</p>



<pre class="crayon-plain-tag">MERGED$pagetype &lt;- str_split_fixed(MERGED$X7, " ", 2)[,1]</pre>



<p>A little bit of cleaning to make the labels easier to read:</p>



<pre class="crayon-plain-tag">MERGED$pagetype_short &lt;- str_replace(MERGED$pagetype, "-default", "")
 MERGED$pagetype_short &lt;- str_replace(MERGED$pagetype_short, "-template", "")
#it's basically deleting "-default" and "-template" from strings as it doesn't help that much understanding data</pre>



<figure class="wp-block-image size-large"><img wpfc-lazyload-disable="true" src="https://www.gokam.fr/wp-content/uploads/2020/03/Screenshot-2020-03-15-19.29.37.png" alt="" class="wp-image-1539" srcset="https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-15-19.29.37.png 956w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-15-19.29.37-300x274.png 300w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-15-19.29.37-768x702.png 768w" sizes="(max-width: 956px) 100vw, 956px" /><figcaption>the 3 steps being displayed</figcaption></figure>



<p>And then a quick ggplot</p>



<pre class="crayon-plain-tag">library(ggplot2)
p &lt;- ggplot(MERGED, aes(x=Level, fill=pagetype_short))+
   geom_histogram(stat="count")+
   scale_x_continuous(breaks=c(1:10))
p</pre>



<figure class="wp-block-image size-large"><img src="https://www.gokam.co.uk/wp-content/uploads/2020/08/brightonSEO_plot_pagetype_2020.png" alt="" class="wp-image-1856" srcset="https://www.gokam.co.uk/wp-content/uploads/2020/08/brightonSEO_plot_pagetype_2020.png 709w, https://www.gokam.co.uk/wp-content/uploads/2020/08/brightonSEO_plot_pagetype_2020-300x152.png 300w" sizes="(max-width: 709px) 100vw, 709px" /><figcaption>Count of Pagetype per level </figcaption></figure>



<p>Want to see something even cooler?</p>



<pre class="crayon-plain-tag">#install package plotly the first time
#install.packages("plotly")
 library(plotly)
 ggplotly(p, tooltip = c("count","pagetype_short"))</pre>



<figure class="wp-block-image size-full"><img src="https://www.gokam.co.uk/wp-content/uploads/2020/08/1Qf42NR6sd.gif" alt="" class="wp-image-1859"/><figcaption>An interactive graph</figcaption></figure>



<p>This is a static HTML file that can be stored anywhere, even on <a href="https://www.gokam.co.uk/pagetype.html">my shared hosting</a></p>



<h2 id="4-explore-crawled-data-with-rpivottable">Explore Crawled Data with rpivottable</h2>



<pre class="crayon-plain-tag">#install package rpivottable the first time
#install.packages("rpivottable")
# And loading
 library(rpivottable)
# launch tool 
rpivotTable(MERGED)</pre>



<p>This creates a drag &amp; drop pivot explorer</p>



<figure class="wp-block-image size-full"><img src="https://www.gokam.co.uk/wp-content/uploads/2020/08/LgfVsFu6NL.gif" alt="" class="wp-image-1865"/></figure>



<p>It&#8217;s also possible to make some quick data viz</p>



<figure class="wp-block-image size-full"><img src="https://www.gokam.co.uk/wp-content/uploads/2020/08/UmtYC25Kdh.gif" alt="" class="wp-image-1869"/></figure>



<p>Full <a href="https://www.gokam.co.uk/rpivottable.html">DEMO &#8211; see by yourself</a></p>



<h2 id="4-extract-more-data-without-having-to-recrawl">Extract more data without having to recrawl</h2>



<p>All the HTML files are on your hard drive, so if you need more data extracted, it&#8217;s entirely possible.</p>



<p>You can list your recent crawls by using the ListProjects() function:</p>



<figure class="wp-block-image size-large is-resized"><img wpfc-lazyload-disable="true" src="https://www.gokam.fr/wp-content/uploads/2020/03/Screenshot-2020-03-24-21.32.24.png" alt="" class="wp-image-1704" width="370" height="47" srcset="https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-24-21.32.24.png 852w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-24-21.32.24-300x39.png 300w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-24-21.32.24-768x99.png 768w" sizes="(max-width: 370px) 100vw, 370px" /><figcaption>it displays 2 recent crawling projects</figcaption></figure>



<p>First, we&#8217;re going to load the crawling project HTML files:</p>



<pre class="crayon-plain-tag">LastHTMLDATA &lt;- LoadHTMLFiles("gokam.co.uk-242115", type = "vector")
# or to simply grab the last one:
LastHTMLDATA &lt;- LoadHTMLFiles(ListProjects()[1], type = "vector")</pre>



<pre class="crayon-plain-tag">LastHTMLDATA &lt;- as.data.frame(LastHTMLDATA)
colnames(LastHTMLDATA) &lt;- 'html'
LastHTMLDATA$html &lt;- as.character(LastHTMLDATA$html)</pre>



<p>Let&#8217;s say you forgot to grab the h2&#8217;s and h3&#8217;s: you can extract them again using ContentScraper(), also included in the Rcrawler package.</p>



<pre class="crayon-plain-tag">for(i in 1:nrow(LastHTMLDATA)) {
   LastHTMLDATA$title[i] &lt;- ContentScraper(HTmlText = LastHTMLDATA$html[i] ,XpathPatterns = "//title")
   LastHTMLDATA$h1[i] &lt;- ContentScraper(HTmlText = LastHTMLDATA$html[i] ,XpathPatterns = "//h1")
   LastHTMLDATA$h2[i] &lt;- ContentScraper(HTmlText = LastHTMLDATA$html[i] ,XpathPatterns = "//h2")
   LastHTMLDATA$h3[i] &lt;- ContentScraper(HTmlText = LastHTMLDATA$html[i] ,XpathPatterns = "//h3")
 }</pre>



<figure class="wp-block-image size-large"><img wpfc-lazyload-disable="true" src="https://www.gokam.fr/wp-content/uploads/2020/03/Screenshot-2020-03-25-22.46.42-1024x427.png" alt="" class="wp-image-1723" srcset="https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-25-22.46.42-1024x427.png 1024w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-25-22.46.42-300x125.png 300w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-25-22.46.42-768x320.png 768w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-25-22.46.42-1080x451.png 1080w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-25-22.46.42.png 1462w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption>et voilaaa</figcaption></figure>



<h2 id="5-categorize-urls-using-regex">Categorize URLs using Regex</h2>



<p>For those not afraid of regex, here is a complementary script to categorize URLs. Be careful: the regex order is important, as later matches overwrite earlier values. Usually, it&#8217;s a good idea to place the home page last.</p>



<pre class="crayon-plain-tag"># define a default category

INDEX$UrlCat &lt;- "Not match"

 

# create category name

category_name &lt;- c("Category", "Dates", "author page", "Home page")

 
# create category regex, must be the same length

category_regex &lt;- c("category", "2019", "author","example\.com.\/$")

 

# categorize

for(i in 1:length(category_name)){

# display a dot to show the progress
  cat(".")
# run regex test and update value if it matches
# otherwise leave the previous value
  INDEX$UrlCat &lt;- ifelse(grepl(category_regex[i], INDEX$Url, ignore.case = T), category_name[i], INDEX$UrlCat)

}


# View variable to debug

View(INDEX)</pre>



<h2 id="4-what-if-i-want-to-follow-robotstxt-rules">What if I want to follow robots.txt rules?</h2>



<p>Just add the <strong>Obeyrobots</strong> parameter:</p>



<pre class="crayon-plain-tag">#like that
Rcrawler(Website = "https://www.gokam.co.uk/", Obeyrobots = TRUE)</pre>



<h2 id="3-limit-crawling-speed">What if I want to limit crawling speed?</h2>



<p>By default, this crawler is rather quick and can grab a lot of webpages in no time. Every advantage has its inconvenience: it&#8217;s fairly easy to get wrongly detected as a DoS attack. To limit the risk, I suggest you use the <strong>RequestsDelay</strong> parameter. It&#8217;s the time interval between each round of parallel HTTP requests, in seconds. Example:</p>



<pre class="crayon-plain-tag"># this will add a 10 secondes delay between
Rcrawler(Website = "https://www.example.com/", RequestsDelay=10)</pre>



<p>Other interesting limitation options:</p>



<p><strong>no_cores</strong>: specifies the number of clusters (logical CPUs) used for parallel crawling; by default, it&#8217;s the number of available cores. </p>



<p><strong>no_conn</strong>: the number of concurrent connections per core; by default, it takes the same value as no_cores.</p>






<h2 id="7-what-if-i-want-to-crawl-only-a-subfolder">What if I want to crawl only a subfolder?</h2>



<p>Two parameters help you do that: <em>crawlUrlfilter</em> limits the crawl, while <em>dataUrlfilter</em> tells which URLs data should be extracted from.</p>



<pre class="crayon-plain-tag">Rcrawler(Website = "http://www.glofile.com/sport/", dataUrlfilter ="/sport/", crawlUrlfilter="/sport/" )</pre>



<h2 id="6-how-to-change-user-agent">How to change user-agent?</h2>



<pre class="crayon-plain-tag">#as simply as that
Rcrawler(Website = "http://www.example.com/", Useragent="Mozilla 3.11")</pre>



<h2 id="6-what-if-my-ip-is-banned">What if my IP is banned?</h2>



<p><strong>option 1: Use a VPN on your computer</strong></p>



<p><strong>Option 2: use a proxy</strong></p>



<p>Use the <strong>httr</strong> package to set up a proxy and use it</p>



<pre class="crayon-plain-tag"># create proxy configuration
proxy &lt;- httr::use_proxy("190.90.100.205",41000)
# use proxy configuration
Rcrawler(Website = "https://www.gokam.co.uk/", use_proxy = proxy)</pre>



<p>Where to find a proxy? It&#8217;s been a while since I last needed one, so I couldn&#8217;t say.</p>



<h2 id="4-where-are-the-internal-links">Where are the internal Links?</h2>



<p>By default, RCrawler doesn&#8217;t save internal links; you have to ask for them explicitly by using the <strong>NetworkData</strong> option, like this:</p>



<pre class="crayon-plain-tag">Rcrawler(Website = "https://www.gokam.co.uk/",  NetworkData = TRUE)</pre>



<p>Then you&#8217;ll have two new variables available at the end of the crawling:</p>



<ul><li><strong>NetwIndex</strong>, a variable that simply lists all the webpage URLs. The row numbers are the same as the locally stored HTML files, so<br> row n°1 = homepage = 1.html</li></ul>



<figure class="wp-block-image size-large is-resized"><img src="https://www.gokam.fr/wp-content/uploads/2020/03/Screenshot-2020-03-15-20.14.14.png" alt="" class="wp-image-1557" width="324" height="326" srcset="https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-15-20.14.14.png 580w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-15-20.14.14-298x300.png 298w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-15-20.14.14-150x150.png 150w" sizes="(max-width: 324px) 100vw, 324px" /><figcaption><strong>NetwIndex</strong> data frame</figcaption></figure>



<ul><li><strong>NetwEdges</strong> with all the links. It&#8217;s a bit confusing so let me explain:</li></ul>



<figure class="wp-block-image size-large is-resized"><img wpfc-lazyload-disable="true" src="https://www.gokam.fr/wp-content/uploads/2020/03/Screenshot-2020-03-15-20.16.20.png" alt="" class="wp-image-1559" width="182" height="362" srcset="https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-15-20.16.20.png 466w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-15-20.16.20-151x300.png 151w" sizes="(max-width: 182px) 100vw, 182px" /><figcaption><strong>NetwEdges</strong> data frame</figcaption></figure>



<p>Each row is a link. The <strong>From</strong> and <strong>To</strong> columns indicate &#8220;from&#8221; which page &#8220;to&#8221; which page each link goes.<br> <br>On the image above:<br>row n°1 is a link from the homepage (page n°1) to the homepage <br>row n°2 is a link from the homepage to webpage n°2. According to the NetwIndex variable, page n°2 is the article about <a href="https://www.gokam.co.uk/crawling-with-r-using-rvest-package/">rvest</a>.<br>etc&#8230;</p>



<p><strong>Weight</strong> is the depth level at which the link was discovered. All the first rows are from the homepage, so Level 0.<br><br><strong>Type</strong> is either 1 for internal hyperlinks or 2 for external hyperlinks.</p>
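<p>A toy example may make the id-to-URL mapping more concrete (made-up data, same column layout):</p>



<pre class="crayon-plain-tag"># miniature versions of NetwIndex and NetwEdges
NetwIndex_toy = c("https://example.com/", "https://example.com/article")
NetwEdges_toy = data.frame(From = c(1, 1), To = c(1, 2), Weight = c(0, 0), Type = c(1, 1))
# translate page ids into URLs
NetwEdges_toy$To_url = NetwIndex_toy[NetwEdges_toy$To]
NetwEdges_toy$To_url[2] # "https://example.com/article"</pre>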



<h2 id="6-count-links">Count Links</h2>



<p>I guess you guys are interested in counting links. Here is the code to do it. I won&#8217;t go into too many explanations, it would be too long. If you are interested (and motivated), go and check out the <a href="https://dplyr.tidyverse.org/">dplyr</a> package, and specifically its <a href="https://rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf">data wrangling functions</a></p>



<h4 id="7-count-outbound-links">Count outbound links</h4>



<pre class="crayon-plain-tag">count_from &lt;- NetwEdges[,1:2] %&gt;%
#grabing the first two columns
     distinct() %&gt;%
# if there are several links from and to the same page, the duplicat will be removed.
     group_by(From) %&gt;%
     summarise(n = n()) 
# the counting
View(count_from)
# we want to view the results</pre>



<figure class="wp-block-image size-large is-resized"><img wpfc-lazyload-disable="true" src="https://www.gokam.fr/wp-content/uploads/2020/03/Screenshot-2020-03-17-23.22.18.png" alt="" class="wp-image-1609" width="121" height="343" srcset="https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-17-23.22.18.png 206w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-17-23.22.18-106x300.png 106w" sizes="(max-width: 121px) 100vw, 121px" /><figcaption>the homepage (n°1) has 13 outbound links</figcaption></figure>



<p>To make it more readable let&#8217;s replace page IDs with URLs</p>



<pre class="crayon-plain-tag">count_from$From &lt;- NetwIndex
View(count_from)</pre>



<figure class="wp-block-image size-large is-resized"><img wpfc-lazyload-disable="true" src="https://www.gokam.fr/wp-content/uploads/2020/03/Screenshot-2020-03-23-22.48.11.png" alt="" class="wp-image-1685" width="297" height="261" srcset="https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-23-22.48.11.png 616w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-23-22.48.11-300x264.png 300w" sizes="(max-width: 297px) 100vw, 297px" /><figcaption>using website URLs</figcaption></figure>



<h4 id="8-count-inbound-links">Count inbound links</h4>



<p>The same thing but the other way around</p>



<pre class="crayon-plain-tag">count_to -&gt; NetwEdges[,1:2] %&gt;%
#grabing the first two columns
     distinct() %&gt;%
# if there are several links from and to the same page, the duplicat will be removed.
     group_by(To) %&gt;%
     summarise(n = n())
# the counting
View(count_to)

# we want to view the results</pre>



<figure class="wp-block-image size-large is-resized"><img wpfc-lazyload-disable="true" src="https://www.gokam.fr/wp-content/uploads/2020/03/Screenshot-2020-03-17-23.25.06.png" alt="" class="wp-image-1614" width="132" height="428"/><figcaption>count of inbound links</figcaption></figure>



<p>Again to make it more readable</p>



<pre class="crayon-plain-tag">count_to$To &lt;- NetwIndex
View(count_to)</pre>



<figure class="wp-block-image size-large is-resized"><img wpfc-lazyload-disable="true" src="https://www.gokam.fr/wp-content/uploads/2020/03/Screenshot-2020-03-23-22.29.18.png" alt="" class="wp-image-1679" width="426" height="301" srcset="https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-23-22.29.18.png 870w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-23-22.29.18-300x212.png 300w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-23-22.29.18-768x544.png 768w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-23-22.29.18-400x284.png 400w" sizes="(max-width: 426px) 100vw, 426px" /><figcaption>using website URLs</figcaption></figure>



<p>So the useless &#8216;<a href="https://www.gokam.co.uk/author/gokam/">author page</a>&#8217; has 14 links pointing at it, as many as the homepage&#8230; Maybe I should fix this one day.</p>
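<p>If you want both counts side by side, you can join them. Here is a minimal sketch; the two toy data frames below are made-up stand-ins for the <em>count_from</em> and <em>count_to</em> tables built above, with page IDs already replaced by URLs:</p>

```r
# toy stand-ins for the outbound and inbound count tables built above
count_from <- data.frame(url = c("/", "/about"), outbound = c(13, 4))
count_to   <- data.frame(url = c("/", "/contact"), inbound = c(14, 2))

# full join, so pages with only inbound or only outbound links are kept
link_summary <- merge(count_from, count_to, by = "url", all = TRUE)

# pages missing from one side show up as NA; replace with 0
link_summary[is.na(link_summary)] <- 0

link_summary
```

<p>Pages with zero inbound links are orphan candidates; pages with zero outbound links are dead ends.</p>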



<h2 id="4-compute-internal-page-rank">Compute &#8216;Internal Page Rank&#8217;</h2>



<p>Many SEOs I spoke to seem to be very interested in this, so I might as well add the tutorial here. It is very much an adaptation of <a href="https://twitter.com/fighto">Paul Shapiro</a>&#8217;s awesome <a href="https://gist.github.com/pshapiro/616b64a4e4399326c82c34734885d5bd">script</a>.</p>



<p>But instead of using a <a href="https://www.screamingfrog.co.uk/">Screaming Frog</a> export file, we will use the previously extracted links.</p>



<pre class="crayon-plain-tag">links &lt;- NetwEdges[,1:2] %&gt;%
   #grabing the first two columns
   distinct() 
# loading igraph package
 library(igraph)
# Loading website internal links inside a graph object
 g &lt;- graph.data.frame(links)

# this is the main function, don't ask how it works
 pr &lt;- page.rank(g, algo = "prpack", vids = V(g), directed = TRUE, damping = 0.85)

# grabing result inside a dedicated data frame
 values &lt;- data.frame(pr$vector)
 values$names &lt;- rownames(values)

# delating row names
 row.names(values) &lt;- NULL

# reordering column
 values &lt;- values[c(2,1)]
# renaming columns
 names(values)[1] &lt;- "url"
 names(values)[2] &lt;- "pr"
 View(values)</pre>



<figure class="wp-block-image size-large is-resized"><img wpfc-lazyload-disable="true" src="https://www.gokam.fr/wp-content/uploads/2020/03/Screenshot-2020-03-17-23.57.20.png" alt="" class="wp-image-1635" width="143" height="401" srcset="https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-17-23.57.20.png 222w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-17-23.57.20-107x300.png 107w" sizes="(max-width: 143px) 100vw, 143px" /><figcaption>Internal Page Rank calculation</figcaption></figure>



<p>Let&#8217;s make it more readable: we&#8217;re going to put the numbers on a ten-point scale, just like when PageRank was a thing.</p>



<pre class="crayon-plain-tag">#replacing id with url
values$url &lt;- NetwIndex
# out of 10
 values$pr &lt;- round(values$pr / max(values$pr) * 10)
#display
 View(values)</pre>



<figure class="wp-block-image size-large is-resized"><img src="https://www.gokam.fr/wp-content/uploads/2020/03/Screenshot-2020-03-18-00.09.37.png" alt="" class="wp-image-1644" width="314" height="313" srcset="https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-18-00.09.37.png 626w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-18-00.09.37-300x300.png 300w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-18-00.09.37-150x150.png 150w" sizes="(max-width: 314px) 100vw, 314px" /></figure>



<p>On a 15-page website it&#8217;s not very impressive, but I encourage you to try it on a bigger website.</p>
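<p>To spot the strongest and weakest pages at a glance, sort the table by score. A quick sketch, using a made-up <em>values</em> data frame standing in for the real one built above:</p>

```r
# toy stand-in for the 'values' data frame built above
values <- data.frame(url = c("/about", "/", "/contact"),
                     pr = c(7, 10, 3))

# order pages by descending internal PageRank
values_sorted <- values[order(-values$pr), ]
values_sorted

# optionally save the full table for later use
write.csv(values_sorted, "internal-pagerank.csv", row.names = FALSE)
```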



<h2 id="5-what-if-my-website-is-using-a-javascript-framework-like-react-or-angular">What if a website is using a JavaScript framework like React or Angular?</h2>



<p>Rcrawler handily includes <strong>PhantomJS</strong>, the classic headless browser.<br>Here is how to use it:</p>



<pre class="crayon-plain-tag"># Download and install phantomjs headless browser
# takes 20-30 seconds usually
install_browser()

# start browser process 
br &lt;-run_browser()</pre>



<p>After that, reference it as an option:</p>



<pre class="crayon-plain-tag">Rcrawler(Website = "https://www.example.com/", Browser = br)

# don't forget to stop browser afterwards
stop_browser(br)</pre>



<p>It&#8217;s entirely possible to run two crawls, one with and one without JavaScript rendering, and compare the data afterwards.</p>
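<p>Comparing the two crawls can be as simple as diffing the URL lists. A small sketch with made-up URL vectors standing in for the two crawl results:</p>

```r
# toy stand-ins: URLs found by a plain crawl vs a JS-rendered crawl
urls_plain    <- c("/", "/about", "/articles")
urls_rendered <- c("/", "/about", "/articles", "/js-only-page")

# pages only discoverable once JavaScript is rendered
setdiff(urls_rendered, urls_plain)

# pages found without rendering but missing after (should usually be empty)
setdiff(urls_plain, urls_rendered)
```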



<p>This <em>Browser</em> option can also be used with the other Rcrawler functions.</p>



<p>&#x26a0;&#xfe0f; Rendering a webpage means every JavaScript file will be run, including <strong>Web Analytics tags</strong>. If you don&#8217;t take the necessary precautions, it&#8217;ll skew your Web Analytics data.</p>



<h2 id="6-perform-automatic-browser-tests-with-selenium">So what&#8217;s the catch?</h2>



<p>Rcrawler is a great tool but it&#8217;s far from perfect. SEOs will definitely miss a couple of things: there is no internal dead-links report, it doesn&#8217;t grab nofollow attributes on links, and there are always a couple of bugs here and there. But overall it&#8217;s a great tool to have.<br><br>Another concern is the <a href="https://github.com/salimk/Rcrawler">git repo</a>, which is quite inactive.</p>






<p>This is it. I hope you found this article useful. Reach out to me for <span style="text-decoration: underline;">slow</span> support, bugs/corrections or ideas for new articles. Take care.</p>



<p>ref:<br><em>Khalil, S., &amp; Fakir, M. (2017). RCrawler: An R package for parallel web crawling and scraping. SoftwareX, 6, 98-106.</em></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Remove Page query parameters using Data Studio calculated fields</title>
		<link>https://www.gokam.co.uk/remove-page-query-parameters-using-data-studio-calculated-fields/</link>
		
		<dc:creator><![CDATA[François Joly]]></dc:creator>
		<pubDate>Sun, 08 Mar 2020 14:41:28 +0000</pubDate>
				<category><![CDATA[Non classé]]></category>
		<guid isPermaLink="false">https://www.gokam.fr/?p=1358</guid>

					<description><![CDATA[If you are using Google Data Studio as quite extensively as I do you maybe came across this rather annoying issueSometimes GET parameters get in the way of quality reporting and you would rather remove them all. Of course, Facebook is the worst with his fbclick but can be useful for pagination, ecommerce filters and [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>If you are using Google Data Studio as extensively as I do, you may have come across this rather annoying issue:<br>sometimes GET parameters get in the way of quality reporting and you would rather remove them all.<br><br>Of course, Facebook is the worst with its <strong>fbclid</strong>, but query parameters can be useful for pagination, ecommerce filters and so on.</p>



<figure class="wp-block-image size-large is-resized"><img wpfc-lazyload-disable="true" src="https://www.gokam.fr/wp-content/uploads/2020/03/Screenshot-2020-03-08-13.30.20.png" alt="" class="wp-image-1359" width="244" height="351" srcset="https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-08-13.30.20.png 488w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-08-13.30.20-209x300.png 209w" sizes="(max-width: 244px) 100vw, 244px" /><figcaption>One session for each row of course</figcaption></figure>





<p>One way of dealing with that is to export the data; another very classic one is to use a GA filter to remove them, but I&#8217;m not a big fan of deleting data that might be useful one day.</p>



<p>So now I&#8217;m using a calculated field directly inside Google Data Studio. Here is the formula to copy-paste for the savvy users:</p>



<pre class="crayon-plain-tag">REGEXP_REPLACE(Landing Page,"(.*)\?.*","\1")</pre>



<p><br>For the others, here is the step by step:<br>Open your report and go to &#8220;Resource&#8221; -&gt; &#8220;Manage added data sources&#8221;</p>



<p>Choose edit</p>



<figure class="wp-block-image size-large"><img wpfc-lazyload-disable="true" src="https://www.gokam.fr/wp-content/uploads/2020/03/Screenshot-2020-03-08-13.35.36-1024x77.png" alt="" class="wp-image-1360" srcset="https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-08-13.35.36-1024x77.png 1024w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-08-13.35.36-300x23.png 300w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-08-13.35.36-768x58.png 768w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-08-13.35.36-1536x116.png 1536w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-08-13.35.36-2048x154.png 2048w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-08-13.35.36-1080x81.png 1080w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p>and &#8216;ADD A FIELD&#8217; on the top right</p>



<figure class="wp-block-image size-large"><img wpfc-lazyload-disable="true" src="https://www.gokam.fr/wp-content/uploads/2020/03/Screenshot-2020-03-08-13.36.19-1024x665.png" alt="" class="wp-image-1361" srcset="https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-08-13.36.19-1024x665.png 1024w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-08-13.36.19-300x195.png 300w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-08-13.36.19-768x499.png 768w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-08-13.36.19-1080x702.png 1080w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-08-13.36.19.png 1142w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p>Name this new field something memorable, copy-paste the previous formula, and you are good to go.<br>I hope you will find this useful.</p>
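<p>Since this blog is mostly about R&#8217;, note that you can sanity-check the same pattern locally before touching your report. A small sketch with base R&#8217;s sub() and made-up landing pages:</p>

```r
# made-up landing pages, some with query parameters
landing_pages <- c("/blog/article?fbclid=abc123",
                   "/shop?page=2&sort=price",
                   "/about")

# same idea as the Data Studio formula: keep everything before the first "?"
sub("(.*)\\?.*", "\\1", landing_pages)
# "/blog/article" "/shop" "/about"
```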
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Crawling with R&#8217; using rvest package</title>
		<link>https://www.gokam.co.uk/crawling-with-r-using-rvest-package/</link>
		
		<dc:creator><![CDATA[François Joly]]></dc:creator>
		<pubDate>Fri, 06 Mar 2020 16:43:55 +0000</pubDate>
				<category><![CDATA[R']]></category>
		<guid isPermaLink="false">https://www.gokam.fr/?p=1108</guid>

					<description><![CDATA[If you want to crawl a couple of URLs for SEO purposes, there are many many ways to do it but one of the most reliable and versatile packages you can use is rvest Here is a simple demo from the package documentation using the IMDb website: [crayon-65c3abd09def8643403842/] The first step is to crawl the [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>If you want to crawl a couple of URLs for SEO purposes, there are many many ways to do it but one of the most reliable and versatile packages you can use is <a href="https://cran.r-project.org/web/packages/rvest/">rvest</a></p>



<p>Here is a simple demo from the package documentation using the IMDb website:</p>



<pre class="crayon-plain-tag"># Package installation, instruction to be run only once 
install.packages("rvest") 

# Loading rvest package
library(rvest)</pre>



<p>The first step is to crawl the URL and store the webpage inside a &#8216;lego_movie&#8217; variable. </p>



<pre class="crayon-plain-tag">lego_movie &lt;- read_html("http://www.imdb.com/title/tt1490017/")</pre>



<p>Quite straightforward, isn&#8217;t it?<br><br>Beware: <em>lego_movie</em> is now an xml_document that needs to be parsed in order to extract the data. Here is how to do it:</p>



<pre class="crayon-plain-tag">rating &lt;- lego_movie %&gt;% 
   html_nodes("strong span") %&gt;%
   html_text() %&gt;%
   as.numeric()</pre>



<p>For those who don&#8217;t know, the <strong>%&gt;%</strong> operator is like the <strong><em>|</em></strong> (&#8220;pipe&#8221;) in a terminal command line. The operations are carried out successively, meaning the results of the previous command are the input for the next one.<br><br>The <em>html_nodes</em>() function extracts from our webpage the HTML tags that match a CSS-style query selector. In this case, we are looking for a &lt;span&gt; tag whose parent is a &lt;strong&gt; tag.<br>Then the script extracts the inner text value using <em>html_text</em>() and converts it to a number using <em>as.numeric</em>().</p>
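<p>The pipe is only a different way of writing nested calls. This tiny sketch with plain numbers shows the equivalence (it uses the magrittr package, which provides %&gt;% and is re-exported by rvest):</p>

```r
library(magrittr)  # provides %>%; also available once rvest is loaded

# piped: each result feeds into the next call
piped <- c(4, 9, 16) %>% sqrt() %>% sum()

# nested: the same computation, read inside-out
nested <- sum(sqrt(c(4, 9, 16)))

piped == nested  # TRUE, both are 2 + 3 + 4 = 9
```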



<p>Finally, it will store this value inside the <em>rating</em> variable. To display the value, just write:</p>



<pre class="crayon-plain-tag">rating

# it should display &gt; [1] 7.8</pre>



<p>Let&#8217;s take another example. This time we are going to grab the movie&#8217;s cast.<br><br>Having a look at the HTML DOM, it seems that we need to grab the HTML &lt;img&gt; tags inside an element with &#8216;titleCast&#8217; as an id and a &#8216;primary_photo&#8217; class name, and then extract the alt attribute.</p>



<pre class="crayon-plain-tag">cast &lt;- lego_movie %&gt;%
   html_nodes("#titleCast .primary_photo img") %&gt;%
   html_attr("alt")

 cast

# Should display:
# &gt;  [1] "Will Arnett"     "Elizabeth Banks" "Craig Berry"
# &gt;  [4] "Alison Brie"     "David Burrows"   "Anthony Daniels"
# &gt;  [7] "Charlie Day"     "Amanda Farinos"  "Keith Ferguson"
# &gt; [10] "Will Ferrell"    "Will Forte"      "Dave Franco"
# &gt; [13] "Morgan Freeman"  "Todd Hansen"     "Jonah Hill"</pre>



<p>Last example: we want the movie poster URL.
The first step is to grab the &lt;img&gt; tag whose parent has the class name &#8216;poster&#8217;,
then extract the src attribute and display it.</p>



<pre class="crayon-plain-tag">poster &lt;- lego_movie %&gt;%
   html_nodes(".poster img") %&gt;%
   html_attr("src")

 poster

# Should display:
# [1] "https://m.media-amazon.com/images/M/MV5BMTg4MDk1ODExN15BMl5BanBnXkFtZTgwNzIyNjg3MDE@._V1_UX182_CR0,0,182,268_AL_.jpg"</pre>



<h2>Now a real-life crawl example</h2>



<p>Now that we&#8217;ve seen an example by the book, we&#8217;ll switch to something more useful and a little bit more complex. Using the following tutorial, you&#8217;ll be able to extract the review score of any WordPress plugin over time.</p>



<p>For example here are the stats for <strong><a href="https://yoast.com/wordpress/plugins/seo/">Yoast</a></strong>, the famous SEO plugin:</p>



<figure class="wp-block-image size-large"><img wpfc-lazyload-disable="true" src="https://www.gokam.fr/wp-content/uploads/2020/02/yoast.png" alt="" class="wp-image-1141" srcset="https://www.gokam.co.uk/wp-content/uploads/2020/02/yoast.png 611w, https://www.gokam.co.uk/wp-content/uploads/2020/02/yoast-300x205.png 300w" sizes="(max-width: 611px) 100vw, 611px" /></figure>



<p>Here are the ones for <strong><a href="https://en-gb.wordpress.org/plugins/all-in-one-seo-pack/">All in One SEO</a></strong>, its competitor:</p>



<figure class="wp-block-image size-large"><img wpfc-lazyload-disable="true" src="https://www.gokam.fr/wp-content/uploads/2020/02/all-in-one-seo.png" alt="" class="wp-image-1142" srcset="https://www.gokam.co.uk/wp-content/uploads/2020/02/all-in-one-seo.png 611w, https://www.gokam.co.uk/wp-content/uploads/2020/02/all-in-one-seo-300x205.png 300w" sizes="(max-width: 611px) 100vw, 611px" /></figure>



<p>Very useful for following whether your favourite plugin&#8217;s new release is well received <a href="https://twitter.com/tuf/status/1229363279388082176">or not.</a></p>



<p>But before that, a little warning: the source code I&#8217;m about to show you has been made by me. It&#8217;s full of flaws and a couple of Stack Overflow copypastas, but&#8230; it works. &#x1f605; So, dear practitioners, please don&#8217;t judge me.<br>It&#8217;s one of the beauties of R: you achieve your ends relatively easily.</p>



<p class="has-small-font-size">(but I gladly accept any ideas to make this code easier for beginners, don&#8217;t hesitate to contact me)</p>



<p>So let&#8217;s get to it, the first step is to grab a <a href="https://wordpress.org/support/plugin/wp-fastest-cache/reviews/">reviews page</a> URL. On this one, we have 49 pages of reviews.  </p>



<p>We&#8217;ll have to make a loop to run into each pagination. Another problem is that no dates are being displayed but only durations, so we&#8217;ll have to convert them. </p>



<p>As usual, we&#8217;ll first load the necessary packages. If they are not installed yet, run the install.packages() function as seen before.</p>



<pre class="crayon-plain-tag">#Loading packages
library(tidyverse)
library(rvest)</pre>



<pre class="crayon-plain-tag"># we store the plugin url inside a variable, to make the code easy to reuse
pluginurl &lt;- "https://wordpress.org/support/plugin/wp-fastest-cache/"

# we create an empty data frame to receive the data retrieved from each pagination. If you don't know what a data frame is, think of it as an Excel sheet
all_reviews &lt;- data.frame()

#####   beginning of the LOOP ####
# if you copy-paste stuff, don't forget to grab the code until the end of the loop at least
for(i in 1:49) {

# sending the loop status to the console
# paste0() is just a concatenation function with a weird name
message(paste0("Page ",i))

# facultative: make a small break between each loop iteration
# this pauses the loop for 2 seconds
# Sys.sleep(2)

# we grab the webpage and store the result inside the html_page variable to be able to reuse it several times
 html_page &lt;- read_html(paste0(pluginurl,"reviews/page/",i,"/")) 

# html_nodes() uses a CSS or XPath selector to extract elements from the html page. This part extracts the number of stars
 reviews &lt;- html_nodes(html_page, ".wporg-ratings")</pre>



<p>If you need help selecting elements, the Chrome inspector is great: you can copy/paste XPath and CSS-style selectors directly.</p>



<figure class="wp-block-image size-large"><img wpfc-lazyload-disable="true" src="https://www.gokam.fr/wp-content/uploads/2020/03/Screenshot-2020-03-04-14.00.37-1024x672.png" alt="" class="wp-image-1264" srcset="https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-04-14.00.37-1024x672.png 1024w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-04-14.00.37-300x197.png 300w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-04-14.00.37-768x504.png 768w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-04-14.00.37-1536x1008.png 1536w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-04-14.00.37-1080x709.png 1080w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-04-14.00.37.png 2014w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<pre class="crayon-plain-tag"># Then we are getting every html attribute value into columns and rows
# it's a copy-paste from Stack Overflow: it works, don't ask me how.
 extract &lt;- bind_rows(lapply(xml_attrs(reviews), function(x) data.frame(as.list(x), stringsAsFactors=FALSE)))</pre>



<p>In other words, it transforms this HTML data, which is hard to deal with,</p>



<figure class="wp-block-image size-large"><img wpfc-lazyload-disable="true" src="https://www.gokam.fr/wp-content/uploads/2020/03/Screenshot-2020-03-06-14.50.55-1024x412.png" alt="" class="wp-image-1295" srcset="https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-06-14.50.55-1024x412.png 1024w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-06-14.50.55-300x121.png 300w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-06-14.50.55-768x309.png 768w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-06-14.50.55-1080x435.png 1080w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-06-14.50.55.png 1346w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p>into a clean data frame with nice columns</p>



<figure class="wp-block-image size-large is-resized"><img wpfc-lazyload-disable="true" src="https://www.gokam.fr/wp-content/uploads/2020/03/Screenshot-2020-03-06-14.51.36.png" alt="" class="wp-image-1296" width="246" height="299" srcset="https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-06-14.51.36.png 482w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Screenshot-2020-03-06-14.51.36-247x300.png 247w" sizes="(max-width: 246px) 100vw, 246px" /></figure>



<pre class="crayon-plain-tag"># using tidyr's extract() function to get the number of stars
extract &lt;- extract %&gt;% extract(title,c("note"))

# same process but this time to extract the duration
# grabbing from the html page the duration being displayed
dates &lt;- html_nodes(html_page, ".bbp-topic-freshness")

# extracting the real duration value from the text: we remove line breaks and what's after "ago"
 extract$dates &lt;- html_text(dates, trim = T) %&gt;%  str_replace_all("[\r|\n|\t]" , "") %&gt;% str_replace_all(" ago.*$" , "")

# apply the duration type to the values, necessary for future conversions
# more info https://lubridate.tidyverse.org/reference/duration.html
extract$duration &lt;- lubridate::as.duration(extract$dates)


# removing the now useless columns &amp; rows from the data frame
 extract$class &lt;- NULL
 extract$title &lt;- NULL
 extract$style &lt;- NULL
 extract$note$class &lt;-NULL
 extract$note$style &lt;- NULL
 extract &lt;- extract[-1,]

# erase row names
 rownames(extract) &lt;- c()

# convert values to the right type
 extract$note &lt;- as.vector(extract$note)
 extract$note &lt;- as.numeric(extract$note)

# adding all data retrieved during this loop to the main data frame 'all_reviews' 
 all_reviews &lt;- rbind(all_reviews, extract)   

##### END OF THE LOOP #####

 }</pre>



<p>The next step is to convert these durations into days. It&#8217;s going to be quick:</p>



<pre class="crayon-plain-tag"># .Data is the number of seconds; we divide by 86400 to get the number of days, then round it
all_reviews$duration2 &lt;- round(all_reviews$duration@.Data/86400)

# today's date minus the review age gives us the review date 
all_reviews$day &lt;- today()-all_reviews$duration2</pre>
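<p>The same arithmetic in isolation, with made-up durations, looks like this (lubridate is needed, as above):</p>

```r
library(lubridate)

# made-up scraped durations, as displayed on the reviews pages
d <- as.duration(c("3 days", "2 weeks"))

# @.Data holds seconds; divide by 86400 to get days
days <- round(d@.Data / 86400)

# subtracting the age from today's date yields the review dates
today() - days
```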



<pre class="crayon-plain-tag"># we want to see the number of stars as a category, not as a scale
all_reviews$note &lt;- as.factor(all_reviews$note)</pre>



<p>The data is now ready. <a href="https://www.gokam.co.uk/export-your-data-from-r/">Export your data</a> or make a small graph to display it using the ggplot2 package:</p>



<pre class="crayon-plain-tag">library(ggplot2)
ggplot(all_reviews, aes(x=day, fill=note))+
   geom_histogram()</pre>



<figure class="wp-block-image size-large"><img wpfc-lazyload-disable="true" src="https://www.gokam.fr/wp-content/uploads/2020/03/Rplot.png" alt="" class="wp-image-1312" srcset="https://www.gokam.co.uk/wp-content/uploads/2020/03/Rplot.png 824w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Rplot-300x119.png 300w, https://www.gokam.co.uk/wp-content/uploads/2020/03/Rplot-768x304.png 768w" sizes="(max-width: 824px) 100vw, 824px" /></figure>



<p>This is it. I hope you find it useful. If you have problems, reach out to me on <a href="https://twitter.com/tuf">Twitter</a>, maybe I can help.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Export your data from R&#8217;</title>
		<link>https://www.gokam.co.uk/export-your-data-from-r/</link>
		
		<dc:creator><![CDATA[François Joly]]></dc:creator>
		<pubDate>Mon, 02 Mar 2020 22:56:48 +0000</pubDate>
				<category><![CDATA[R']]></category>
		<guid isPermaLink="false">https://www.gokam.fr/?p=1148</guid>

					<description><![CDATA[R&#8217; and RStudio are great but sometimes it&#8217;s better the just export your data to exploit them elsewhere or just show them to other people. Here is a review of possible techniques: Export your data into a CSV assuming your data is store inside df var, fairly simple: [crayon-65c3abd09eb39161681461/] Export your data into an excel [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>R&#8217; and RStudio are great, but sometimes it&#8217;s better to just export your data to exploit it elsewhere or show it to other people. Here is a review of possible techniques:</p>



<h2>Export your data into a CSV</h2>



<p>Assuming your data is stored inside the <strong>df</strong> variable, it&#8217;s fairly simple:</p>



<pre class="crayon-plain-tag">#setup where to write the file
setwd("~/Desktop")
# then write the file
write.csv(df, "data.csv")</pre>
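<p>One small option worth knowing: by default write.csv() adds a leading column of row numbers. If you don&#8217;t want it in the exported file, pass row.names = FALSE. A quick sketch with a made-up data frame:</p>

```r
# made-up data frame standing in for df
df <- data.frame(url = c("/a", "/b"), clicks = c(10, 25))

# row.names = FALSE drops the leading row-number column from the CSV
write.csv(df, "data.csv", row.names = FALSE)

# check the written file
readLines("data.csv")
```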



<h2>Export your data into an excel file</h2>



<p>A little bit more complex, we&#8217;ll use the &#8216;xlsx&#8217; package</p>



<pre class="crayon-plain-tag">#setup where to write the file
setwd("~/Desktop")

# if the package is not installed yet, run this  
# install.packages("xlsx")

# Loading the package 
library(xlsx)

# we write the file 
write.xlsx(df, "data.xlsx")</pre>



<p>A few more tips for you: </p>



<p>I like to use the <strong>sheetName</strong> option to explicitly name the tab (the default name is &#8220;Sheet1&#8221;). It&#8217;s quite useful to have a record of when the file was generated, for example. Replace the last instruction with what follows:</p>



<pre class="crayon-plain-tag">write.xlsx(df, "data.xlsx", sheetName=format(Sys.Date(), "%d %b %Y"))</pre>



<p>Another good one that I like is to send the excel file to a shared folder directly. Replace the first instruction with:</p>



<pre class="crayon-plain-tag">setwd("/Users/me/Dropbox/Public")</pre>



<p>Of course, replace the file path with yours.</p>



<h2>Send your data by email</h2>



<p>If the data to send is not too big, another interesting idea is to send it by email using the &#8216;gmailr&#8217; package.</p>



<pre class="crayon-plain-tag">#install.packages("gmailr")
#install.packages("tableHTML")

# Packages loading
library(gmailr)
# This one is useful to transform a data frame into an HTML &lt;table&gt;
library(tableHTML)

# This will allow you to connect to gmail
# replace the fake value by your key and secret
# more info here: https://gargle.r-lib.org/articles/get-api-credentials.html
gm_auth_configure("mykey.apps.googleusercontent.com", "mysecret")

#transform the data frame 'df' to a html table
msg = tableHTML(df)

# Construct email
test_email &lt;- gm_mime() %&gt;%
              gm_to("another@example.com") %&gt;%
              gm_from("me@example.com") %&gt;%
              gm_subject("Email title") %&gt;%
              gm_html_body(paste("Hi Mate,&lt;br /&gt;
Here are the data you requested:", msg,"&lt;br /&gt;&lt;br /&gt;Kind regards,&lt;br /&gt;François"))
# and send it
gm_send_message(test_email)</pre>



]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Perform automatic browser tests with Selenium &#038; R!</title>
		<link>https://www.gokam.co.uk/perform-automatic-browser-tests-with-selenium-r/</link>
		
		<dc:creator><![CDATA[François Joly]]></dc:creator>
		<pubDate>Sun, 01 Mar 2020 15:27:00 +0000</pubDate>
				<category><![CDATA[R']]></category>
		<guid isPermaLink="false">https://www.gokam.fr/?p=1096</guid>

					<description><![CDATA[Selenium is a very classic tool for QA and it can help perform automatic checks on a website. This is an intro of how to use it: The first step is, as always, to install and load the RSelenium package [crayon-65c3abd09f3b7609418981/] We&#8217;ll launch a selenium server with a Firefox browser in a controlled mode. It [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>Selenium is a very classic tool for <a href="https://en.wikipedia.org/wiki/Quality_assurance">QA</a> and it can help perform automatic checks on a website. This is an intro on how to use it.<br><br>The first step is, as always, to install and load the RSelenium package.</p>



<pre class="crayon-plain-tag">#install to run once
install.packages("RSelenium")
library(RSelenium)</pre>



<p>We&#8217;ll launch a Selenium server with a Firefox browser in controlled mode.<br><br>It will take quite some time the first time, but afterwards it will load in a few seconds.</p>



<p><em>here is the command:</em></p>






<pre class="crayon-plain-tag">rd &lt;- rsDriver(browser = "firefox", port = 4444L)</pre>



<figure class="wp-block-image size-large"><img wpfc-lazyload-disable="true" src="https://www.gokam.fr/wp-content/uploads/2020/03/nk2TuJCDvS.gif" alt="" class="wp-image-1798"/></figure>



<p>At the end of the process, it should open a Firefox window like this one:</p>



<figure class="wp-block-image"><img wpfc-lazyload-disable="true" src="https://www.gokam.fr/wp-content/uploads/2019/11/Screenshot-2019-11-17-20.24.31-1.png" alt="" class="wp-image-956" srcset="https://www.gokam.co.uk/wp-content/uploads/2019/11/Screenshot-2019-11-17-20.24.31-1.png 1504w, https://www.gokam.co.uk/wp-content/uploads/2019/11/Screenshot-2019-11-17-20.24.31-1-300x269.png 300w, https://www.gokam.co.uk/wp-content/uploads/2019/11/Screenshot-2019-11-17-20.24.31-1-1024x918.png 1024w, https://www.gokam.co.uk/wp-content/uploads/2019/11/Screenshot-2019-11-17-20.24.31-1-768x688.png 768w, https://www.gokam.co.uk/wp-content/uploads/2019/11/Screenshot-2019-11-17-20.24.31-1-1080x968.png 1080w" sizes="(max-width: 1504px) 100vw, 1504px" /></figure>



<p>Then we&#8217;ll grab the instance to be able to control our browser</p>



<pre class="crayon-plain-tag">remDr &lt;- rd[["client"]]</pre>



<p>It&#8217;s now possible to send actions to our browser.&nbsp;<br>To open a website URL, just type:</p>



<pre class="crayon-plain-tag">remDr$navigate("http://www.bbc.com")</pre>



<figure class="wp-block-image"><img wpfc-lazyload-disable="true" src="https://www.gokam.fr/wp-content/uploads/2019/11/Screenshot-2019-11-17-21.08.32.png" alt="" class="wp-image-959" srcset="https://www.gokam.co.uk/wp-content/uploads/2019/11/Screenshot-2019-11-17-21.08.32.png 1270w, https://www.gokam.co.uk/wp-content/uploads/2019/11/Screenshot-2019-11-17-21.08.32-300x257.png 300w, https://www.gokam.co.uk/wp-content/uploads/2019/11/Screenshot-2019-11-17-21.08.32-1024x877.png 1024w, https://www.gokam.co.uk/wp-content/uploads/2019/11/Screenshot-2019-11-17-21.08.32-768x658.png 768w, https://www.gokam.co.uk/wp-content/uploads/2019/11/Screenshot-2019-11-17-21.08.32-1080x925.png 1080w" sizes="(max-width: 1270px) 100vw, 1270px" /></figure>



<p>You will notice the robot head icon, which means that it is a remote-controlled browser.<br></p>



<p><em>Here are some useful commands:</em></p>



<pre class="crayon-plain-tag"># find a dom element using the class selector and grab inner text
remDr$findElement(using = "class", value ="top-story")$getElementText()


# find a dom element using a class selector and click on it
remDr$findElement(using = "class", value ="top-story")$clickElement()


# get h1 text using a tag selector
remDr$findElement(using ="tag", value = "h1")$getElementText()


# refresh browser
remDr$refresh()</pre>
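Building on the commands above, here is a hedged sketch of pulling several elements at once; note the plural `findElements`. The `media__link` class is an assumption about the BBC markup and may need adjusting:

```r
# find ALL elements matching a class selector (findElements, plural)
links <- remDr$findElements(using = "class", value = "media__link")

# extract the visible text of each element into a character vector
titles <- sapply(links, function(el) el$getElementText()[[1]])

# extract the href attribute of each element
urls <- sapply(links, function(el) el$getElementAttribute("href")[[1]])

head(titles)
```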



<p><br>When you are done with it, don&#8217;t forget to&nbsp;</p>



<pre class="crayon-plain-tag"># close browser
remDr$close()


# stop the selenium server
rd[["server"]]$stop()

# and delete it
rm(rd)</pre>



<p>Otherwise, things will get messy when you come back to it</p>



<h4>Why is Selenium such an interesting solution?</h4>



<p>One of the great advantages of using Selenium is that <strong>you can alternate automatic and manual actions</strong> in the same session. <br><br>For example, you can log in somewhere manually and then run an automated script, or&#8230; fill in a captcha and let your script take over.</p>
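As a minimal sketch of that workflow (the URL is a placeholder), you can simply pause the script while you act manually in the browser window, then let the automation resume:

```r
# navigate to a page that requires a human step (login, captcha, ...)
remDr$navigate("https://www.example.com/login")

# pause: the browser window stays open for manual actions
readline(prompt = "Log in / solve the captcha, then press Enter to continue: ")

# the session keeps its cookies, so the script can carry on
remDr$findElement(using = "tag", value = "h1")$getElementText()
```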
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Check your &lt; title &gt; Pixel length with Google Sheet</title>
		<link>https://www.gokam.co.uk/pixel-length-with-google-sheet/</link>
		
		<dc:creator><![CDATA[François Joly]]></dc:creator>
		<pubDate>Wed, 12 Jun 2019 09:47:43 +0000</pubDate>
				<category><![CDATA[Non classé]]></category>
		<guid isPermaLink="false">https://www.gokam.fr/?p=876</guid>

					<description><![CDATA[Edit 1: For busy people, Google Sheet direct link Edit 2:  the intitial script was deeply improve by Jean-Francois Picard from lg2.com thank you again for your contribution Why would you check your &#60;title&#62; Pixel length? We should check webpages title meta tags because if they are too long, Google will remove the end of the [&#8230;]]]></description>
										<content:encoded><![CDATA[<blockquote><p><strong>Edit 1: For busy people, <a href="https://docs.google.com/spreadsheets/d/1rbOo08UmnXfWfZOTmjnLbL-9BP_h4GxA6iN4j-k4gHE/edit?usp=sharing">Google Sheet direct link</a></strong></p></blockquote>
<blockquote><p><strong>Edit 2: the initial script was greatly improved by <a href="https://www.linkedin.com/in/jean-francois-picard-181612122/">Jean-Francois Picard</a> from <a href="http://lg2.com">lg2.com</a>. Thank you again for your contribution!</strong></p></blockquote>
<h2>Why would you check your &lt;title&gt; Pixel length?</h2>
<p>We should check webpages title meta tags because if they are too long, Google will remove the end of the text, like that:</p>
<blockquote><p><img wpfc-lazyload-disable="true" class="alignnone wp-image-887" src="https://www.gokam.fr/wp-content/uploads/2019/06/Screenshot-2019-06-11-13.56.31.png" alt="" width="634" height="93" srcset="https://www.gokam.co.uk/wp-content/uploads/2019/06/Screenshot-2019-06-11-13.56.31.png 1384w, https://www.gokam.co.uk/wp-content/uploads/2019/06/Screenshot-2019-06-11-13.56.31-300x44.png 300w, https://www.gokam.co.uk/wp-content/uploads/2019/06/Screenshot-2019-06-11-13.56.31-768x112.png 768w, https://www.gokam.co.uk/wp-content/uploads/2019/06/Screenshot-2019-06-11-13.56.31-1024x149.png 1024w, https://www.gokam.co.uk/wp-content/uploads/2019/06/Screenshot-2019-06-11-13.56.31-1080x158.png 1080w" sizes="(max-width: 634px) 100vw, 634px" /><br />
<em>it&#8217;s quite annoying really&#8230;</em></p></blockquote>
<p>A simple way of doing it is to check the number of characters. <a href="https://moz.com/learn/seo/title-tag">Moz</a> is explaining it better than I can:</p>
<blockquote><p><em>Google typically displays the first 50–60 characters of a title tag. If you keep your titles under 60 characters, our research suggests that you can expect about 90% of your titles to display properly.</em></p></blockquote>
<p>This works just fine, but if you want to be <strong>more precise</strong> in your metadata optimization work, <strong>you&#8217;ll have to check pixels</strong> instead. The reason is that letters do not all have the same width. There is even a difference between upper- and lower-case letters:</p>
<blockquote><p><img wpfc-lazyload-disable="true" class="alignnone size-full wp-image-888" src="https://www.gokam.fr/wp-content/uploads/2019/06/Screenshot-2019-06-11-13.56.31_2.png" alt="" width="1416" height="286" srcset="https://www.gokam.co.uk/wp-content/uploads/2019/06/Screenshot-2019-06-11-13.56.31_2.png 1416w, https://www.gokam.co.uk/wp-content/uploads/2019/06/Screenshot-2019-06-11-13.56.31_2-300x61.png 300w, https://www.gokam.co.uk/wp-content/uploads/2019/06/Screenshot-2019-06-11-13.56.31_2-768x155.png 768w, https://www.gokam.co.uk/wp-content/uploads/2019/06/Screenshot-2019-06-11-13.56.31_2-1024x207.png 1024w, https://www.gokam.co.uk/wp-content/uploads/2019/06/Screenshot-2019-06-11-13.56.31_2-1080x218.png 1080w" sizes="(max-width: 1416px) 100vw, 1416px" /></p></blockquote>
<h2>How can we simply deal with this problem?</h2>
<p><img wpfc-lazyload-disable="true" class="wp-image-889 alignnone" src="https://www.gokam.fr/wp-content/uploads/2019/06/Screenshot-2019-06-11-14.35.37.png" alt="" width="409" height="179" srcset="https://www.gokam.co.uk/wp-content/uploads/2019/06/Screenshot-2019-06-11-14.35.37.png 2278w, https://www.gokam.co.uk/wp-content/uploads/2019/06/Screenshot-2019-06-11-14.35.37-300x131.png 300w, https://www.gokam.co.uk/wp-content/uploads/2019/06/Screenshot-2019-06-11-14.35.37-768x336.png 768w, https://www.gokam.co.uk/wp-content/uploads/2019/06/Screenshot-2019-06-11-14.35.37-1024x449.png 1024w, https://www.gokam.co.uk/wp-content/uploads/2019/06/Screenshot-2019-06-11-14.35.37-1080x473.png 1080w" sizes="(max-width: 409px) 100vw, 409px" /></p>
<p>Here comes the <a href="https://docs.google.com/spreadsheets/d/1rbOo08UmnXfWfZOTmjnLbL-9BP_h4GxA6iN4j-k4gHE/edit?usp=sharing">Google Sheet</a>! Let me walk through the file structure:</p>
<ul>
<li style="list-style-type: none;">
<ul>
<li><strong>Column A: </strong>ALL the URLs you want to check. In my case, I use the <em>IMPORTXML</em> function to retrieve the latest articles from the BBC website<br />
<pre class="crayon-plain-tag">=IMPORTXML("http://feeds.bbci.co.uk/news/uk/rss.xml?";"//link")</pre><br />
<em>//link</em> is an XPath expression that extracts URLs from the BBC RSS file. If we were using an XML sitemap file, we would use <em>&#8216;//loc&#8217;</em> instead.</li>
<li><strong>Column B:</strong> crawling URLs using again IMPORTXML function and extracting meta &lt;title&gt;&#8217;s<br />
<pre class="crayon-plain-tag">=IMPORTXML(A2;"//title[1]")</pre>
</li>
<li><strong>Column C:</strong> we use the LEN function to count the number of characters.<br />
<pre class="crayon-plain-tag">=if(B4&lt;&gt;"Error";if(A4&lt;&gt;"";LEN(B4);"");"")</pre>
</li>
<li><strong>Column D: </strong>custom function <em>pixelTitle</em> to calculate the corresponding number of <strong>pixels</strong>.<br />
<pre class="crayon-plain-tag">=if(B4&lt;&gt;"Error";if(A4&lt;&gt;"";pixelTitle(B4);"");"")</pre>
</li>
<li><strong>Column E: </strong>custom function <em>pixelTitleTooLong</em>: using the number of pixels, is the title too long?<br />
<pre class="crayon-plain-tag">=if(B4&lt;&gt;"Error";if(A4&lt;&gt;"";pixelTitleTooLong(B4);"");"")</pre>
</li>
<li><strong>Columns G &amp; H:<br />
</strong>More complex settings: with the column G formula you can modify the pixel-width constant, and with column H you can remove a word or add one at the end of the title.<br />
The latter can be useful because &lt;title&gt; tags are sometimes rewritten by Google, with the brand name appended automatically.</li>
</ul>
</li>
</ul>
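A quick illustration of the sitemap variant mentioned above (example.com is a placeholder): the column A formula for an XML sitemap file would use the `//loc` XPath instead:

```
=IMPORTXML("https://www.example.com/sitemap.xml";"//loc")
```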
<h2>So how do you use this?</h2>
<p>You can make a copy of the <a href="https://docs.google.com/spreadsheets/d/1rbOo08UmnXfWfZOTmjnLbL-9BP_h4GxA6iN4j-k4gHE/edit?usp=sharing">Google Sheet</a> or, if you prefer, copy-paste the functions from <a href="https://github.com/pixgarden/seo-check-width-serps">GitHub</a> into your Google Script Editor window.</p>
<p>I hope you&#8217;ll find it useful!</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Hunt down keyword cannibalization using R&#8217;</title>
		<link>https://www.gokam.co.uk/seo-cannibalization-r/</link>
		
		<dc:creator><![CDATA[François Joly]]></dc:creator>
		<pubDate>Tue, 19 Mar 2019 23:34:46 +0000</pubDate>
				<category><![CDATA[R']]></category>
		<category><![CDATA[test]]></category>
		<guid isPermaLink="false">https://www.gokam.fr/?p=786</guid>

					<description><![CDATA[What the hell is keyword cannibalization? if you put a lot of articles out there, at some point, some article will compete with one another for the same keywords in Google result pages. it&#8217;s what SEO people call &#8216;keyword cannibalization&#8217;. Does it matter SEO wise? Sometimes it&#8217;s perfectly normal. I hope, for your sake, that [&#8230;]]]></description>
										<content:encoded><![CDATA[<h2>What the hell is keyword cannibalization?</h2>
<p>If you put a lot of articles out there, at some point some of them will compete with one another for the same keywords in Google result pages. It&#8217;s what SEO people call &#8216;keyword cannibalization&#8217;.</p>
<h2>Does it matter SEO-wise?</h2>
<p>Sometimes it&#8217;s perfectly normal. I hope, for your sake, that several of your webpages show up when someone is typing your brand name in Google.</p>
<p>Sometimes it’s not. Let me give an example:</p>
<p style="padding-left: 30px;">&#x1f4ad; Imagine you run an e-commerce website, with various page type: products, FAQ&#8217;s, blog posts, &#8230;</p>
<p style="padding-left: 30px;">At some point, Google decides to make a switch: a couple of search queries that were sending traffic to product pages now display one of your blog posts instead.</p>
<p style="padding-left: 30px;">Inside Google Analytics, the SEO session count stays the same. Your rank-tracking software will not flag any position change.</p>
<p style="padding-left: 30px;">And yet these blog post pages convert much less, and at the end of the month this results in a decrease in sales.</p>
<h2>How to check for keyword cannibalization?</h2>
<p>There are several ways to do it. Of course, SEO tool vendors want you to use their tools, and the <a href="https://ahrefs.com/blog/keyword-cannibalization/">method from Ahrefs</a> is definitely useful. Unfortunately, this kind of tool can be imprecise: it doesn&#8217;t take into account what&#8217;s really happening.</p>
<p>So let me show you another method using R&#8217;. Once set up, you&#8217;ll be able to check big batches of keywords in minutes. &#x1f916;</p>
<h2>step 0: install R &amp; rstudio</h2>
<p>So, of course, you&#8217;ll need to <a href="https://www.r-project.org/">download and install R&#8217;</a> and I&#8217;d recommend using the <a href="https://www.rstudio.com/">rstudio</a> IDE.<br />
There are plenty of tutorials on the Web if you need any help with this part.</p>
<h2>step 1: install the necessary packages</h2>
<p>First, we&#8217;ll load <em>searchConsoleR</em>, an awesome &#x1f308; package by <a href="https://github.com/MarkEdmondson1234">Mark Edmondson</a>.<br />
This will allow us to send requests to the Google Search Console API very easily.</p><pre class="crayon-plain-tag">install.packages("searchConsoleR")
library(searchConsoleR)</pre><p>Then let&#8217;s load <em>tidyverse</em>. For those who don&#8217;t know it, it&#8217;s a very popular collection of packages that lets us work with data frames in a graceful way.</p><pre class="crayon-plain-tag">install.packages("tidyverse")
library(tidyverse)</pre><p>And finally, something to help deal with Google account authentication (also by Mark Edmondson). It will spare us the pain of having to set up an API key.</p><pre class="crayon-plain-tag">install.packages("googleAuthR")
library(googleAuthR)</pre><p></p>
<h2>step 2 &#8211; gather DATA</h2>
<p>Let&#8217;s initiate authentication. This should open a new browser window asking you to validate access to your GSC account. The script will be allowed to make requests for a limited period of time.</p><pre class="crayon-plain-tag">scr_auth()</pre><p>This will create a <strong>sc.oauth</strong> file inside your working directory. It stores your temporary access tokens. If you wish to switch between Google accounts, just delete the file, re-run the command and log in with another account.</p>
<p>Let&#8217;s list all websites we are allowed to send requests about:</p><pre class="crayon-plain-tag">sc_websites &lt;- list_websites()
View(sc_websites)</pre><p>and pick one</p><pre class="crayon-plain-tag">hostname &lt;- "https://www.example.com/"</pre><p><em><small>don&#8217;t forget to update this with your hostname</small></em></p>
<p>As you may know, Search Console data is not available right away. That&#8217;s why we request data for the last <em>available</em> two months, i.e. between 3 days ago and roughly 2 months before that&#8230; again using a useful little package!</p><pre class="crayon-plain-tag">install.packages("lubridate")
require(lubridate)
three_days_ago &lt;- lubridate::today()-3
beforedate &lt;- three_days_ago
month(beforedate) &lt;- month(beforedate) - 2
day(beforedate) &lt;- days_in_month(beforedate)</pre><p>and <strong>now the actual request (at last!)</strong></p><pre class="crayon-plain-tag">gsc_all_queries &lt;-
 search_analytics(hostname,
                  beforedate, three_days_ago,
                 c("query", "page"), rowLimit = 80000)</pre><p>We are requesting the &#8216;query&#8217; and &#8216;page&#8217; dimensions. If you wish, it&#8217;s possible to restrict the request to one device type, like &#8216;desktop only&#8217;. See the function <a href="https://www.rdocumentation.org/packages/searchConsoleR/versions/0.3.0/topics/search_analytics">documentation.</a></p>
<p>There is no point in asking for a longer time period: we want to know whether our webpages compete with one another <em>now</em>.</p>
<p><em>rowLimit</em> is a deliberately large number, which should be enough. If you have a popular website with a lot of long-tail traffic, you might need to increase it.</p>
<p>The API response is stored inside the <em>gsc_all_queries</em> variable as a data frame.</p>
<p><img wpfc-lazyload-disable="true" class="alignnone size-full wp-image-793" src="https://www.gokam.fr/wp-content/uploads/2019/03/google_search_r.png" alt="" width="1336" height="382" srcset="https://www.gokam.co.uk/wp-content/uploads/2019/03/google_search_r.png 1336w, https://www.gokam.co.uk/wp-content/uploads/2019/03/google_search_r-300x86.png 300w, https://www.gokam.co.uk/wp-content/uploads/2019/03/google_search_r-768x220.png 768w, https://www.gokam.co.uk/wp-content/uploads/2019/03/google_search_r-1024x293.png 1024w, https://www.gokam.co.uk/wp-content/uploads/2019/03/google_search_r-1080x309.png 1080w" sizes="(max-width: 1336px) 100vw, 1336px" /></p>
<p>If you happen to have several domains/subdomains that compete with each other for the same keywords, repeat this process for each of them. The results will then have to be aggregated; the <a href="https://dplyr.tidyverse.org/reference/bind.html"><em>bind_rows</em></a> function will bind them together. This is how to use it:</p><pre class="crayon-plain-tag">bind_rows(gsc_queries_1,gsc_queries_2)</pre><p></p>
<h2>step 3 &#8211; clean up</h2>
<p>First, we&#8217;ll filter out queries that are not on the first 2 SERPs and that don&#8217;t generate any clicks. There is no point in making useless, time-consuming calculations.</p>
<p>We&#8217;ll also remove branded search queries using a regex. As said earlier, having several positions for your brand name is pretty classic and shouldn&#8217;t be seen as a problem.</p><pre class="crayon-plain-tag">gsc_queries_filtered &lt;-gsc_all_queries %&gt;%
                             filter(position&lt;=20) %&gt;%
                             filter(clicks!=0) %&gt;%
                             filter(!str_detect(query, 'brandname|brand name'))</pre><p><em><small>update this with your brand name</small></em></p>
<h2>step 4 &#8211; computations</h2>
<p>We want to know, for each query, what percentage of clicks goes to each landing page.</p>
<p>First, we&#8217;ll create a new column, <strong>clicksT</strong>, with the aggregated number of clicks for each search query.<br />
Then we use this value to calculate the percentage we need inside a new <strong>per</strong> column.</p><pre class="crayon-plain-tag">gsc_queries_computed &lt;- gsc_queries_filtered %&gt;%
                        group_by(query) %&gt;%
                        mutate(clicksT= sum(clicks)) %&gt;%
                        group_by(page, add=TRUE) %&gt;%
                        mutate(per=round(100*clicks/clicksT,2))

View(gsc_queries_computed)</pre><p>A <strong>per</strong> column value of 100 means that all clicks go to the same URL.</p>
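To make the <strong>clicksT</strong> / <strong>per</strong> computation concrete, here is a self-contained toy example using base R only (the queries, pages and numbers are made up):

```r
# toy Search Console extract: two queries, two landing pages each
gsc <- data.frame(
  query  = c("q1", "q1", "q2", "q2"),
  page   = c("/a", "/b", "/c", "/d"),
  clicks = c(97, 3, 63, 37)
)

# total clicks per query (same role as the clicksT column above)
gsc$clicksT <- ave(gsc$clicks, gsc$query, FUN = sum)

# share of clicks going to each landing page, in percent
gsc$per <- round(100 * gsc$clicks / gsc$clicksT, 2)

gsc
# query "q1" shows no real cannibalization (97% to one page),
# while "q2" splits 63% / 37% between two pages
```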
<p>As a final step, we will sort the rows</p><pre class="crayon-plain-tag">gsc_queries_final &lt;- gsc_queries_computed %&gt;%
                     arrange(desc(clicksT))</pre><p>[edit:] It could also make sense to remove rows where cannibalization is not significant, i.e. where the <strong>per</strong> column value is not very high. [end of edit]</p>
<p>Now remove the columns we no longer need: clicks, impressions and total clicks per query group</p><pre class="crayon-plain-tag">gsc_queries_final &lt;-gsc_queries_final[,c(-3,-4,-7)]</pre><p>You can either display it inside rstudio</p><pre class="crayon-plain-tag">View(gsc_queries_final)</pre><p>Or write a CSV file to open it elsewhere</p><pre class="crayon-plain-tag">write.csv(gsc_queries_final,"./gsc_queries_final.csv")</pre><p>Here is my rstudio view (anonymized, sorry &#x1f64a;)</p>
<p><img wpfc-lazyload-disable="true" class="alignnone size-full wp-image-794" src="https://www.gokam.fr/wp-content/uploads/2019/03/Screenshot-2019-03-19-19.23.34-copy.png" alt="" width="1522" height="838" srcset="https://www.gokam.co.uk/wp-content/uploads/2019/03/Screenshot-2019-03-19-19.23.34-copy.png 1522w, https://www.gokam.co.uk/wp-content/uploads/2019/03/Screenshot-2019-03-19-19.23.34-copy-300x165.png 300w, https://www.gokam.co.uk/wp-content/uploads/2019/03/Screenshot-2019-03-19-19.23.34-copy-768x423.png 768w, https://www.gokam.co.uk/wp-content/uploads/2019/03/Screenshot-2019-03-19-19.23.34-copy-1024x564.png 1024w, https://www.gokam.co.uk/wp-content/uploads/2019/03/Screenshot-2019-03-19-19.23.34-copy-1080x595.png 1080w" sizes="(max-width: 1522px) 100vw, 1522px" /></p>
<h2>step 5 &#8211; analysis</h2>
<p>You should check the data inside each &#8220;query pack&#8221;. Everything is sorted by the total number of clicks, so the first rows are critical, the bottom rows not so much.</p>
<p>To help you deal with this, let&#8217;s check the first ones</p>
<p><img wpfc-lazyload-disable="true" class="alignnone size-full wp-image-817" src="https://www.gokam.fr/wp-content/uploads/2019/03/seqrch-query-1.jpg" alt="" width="1522" height="838" srcset="https://www.gokam.co.uk/wp-content/uploads/2019/03/seqrch-query-1.jpg 1522w, https://www.gokam.co.uk/wp-content/uploads/2019/03/seqrch-query-1-300x165.jpg 300w, https://www.gokam.co.uk/wp-content/uploads/2019/03/seqrch-query-1-768x423.jpg 768w, https://www.gokam.co.uk/wp-content/uploads/2019/03/seqrch-query-1-1024x564.jpg 1024w, https://www.gokam.co.uk/wp-content/uploads/2019/03/seqrch-query-1-1080x595.jpg 1080w" sizes="(max-width: 1522px) 100vw, 1522px" /></p>
<p style="padding-left: 30px;"><em>For Search query 1:</em><br />
97% of clicks are going to the same page. There is no keyword cannibalization here. It&#8217;s interesting to notice that the &#8216;second&#8217; landing page only earns 1.4% of clicks, even though it has an average position of 1.5. Users really don&#8217;t like the second landing page; its metadata probably sucks.</p>
<p style="padding-left: 30px;">Check that the first landing page is the right one, then move on.</p>
<p><img wpfc-lazyload-disable="true" class="alignnone size-full wp-image-818" src="https://www.gokam.fr/wp-content/uploads/2019/03/seqrch-query-2.jpg" alt="" width="1522" height="838" srcset="https://www.gokam.co.uk/wp-content/uploads/2019/03/seqrch-query-2.jpg 1522w, https://www.gokam.co.uk/wp-content/uploads/2019/03/seqrch-query-2-300x165.jpg 300w, https://www.gokam.co.uk/wp-content/uploads/2019/03/seqrch-query-2-768x423.jpg 768w, https://www.gokam.co.uk/wp-content/uploads/2019/03/seqrch-query-2-1024x564.jpg 1024w, https://www.gokam.co.uk/wp-content/uploads/2019/03/seqrch-query-2-1080x595.jpg 1080w" sizes="(max-width: 1522px) 100vw, 1522px" /></p>
<p style="padding-left: 30px;"><em>For Search query 2:</em><br />
63% of clicks are going to the first landing page and 36% to the second. This is keyword cannibalization.<br />
It could make sense to adapt the internal linking between the landing pages involved to influence which one ranks above the other, depending on your goals, page bounce rates, etc.</p>
<p>And so on&#8230;</p>
<p>This is it, my friends. I hope you&#8217;ll find it useful!</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Google Analytics Classic and Universal parameters explained</title>
		<link>https://www.gokam.co.uk/google-analytics-classic-and-universal-parameters/</link>
		
		<dc:creator><![CDATA[François Joly]]></dc:creator>
		<pubDate>Sat, 02 Mar 2019 12:33:23 +0000</pubDate>
				<category><![CDATA[Google Analytics]]></category>
		<guid isPermaLink="false">https://www.gokam.fr/?p=778</guid>

					<description><![CDATA[Universal Analytics (analytics.js) Tag sends a hit on the url /collect, here is the meaning of the parameters &#160; Parameters Meaning in English Example a ?? cd1 Custom Dimension cid Client ID 17444485575.14222271155 de Document Encoding UTF-8 dl Document location URL http://www.website.io/test.html dp Document Path /foo dr Document Referrer dt Document Title Accueil &#8211; Mon [&#8230;]]]></description>
										<content:encoded><![CDATA[<h2>Universal Analytics (analytics.js)</h2>
<p>The tag sends a hit to the URL /collect; here is the meaning of the parameters</p>
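For illustration, a pageview hit assembled only from the example values in the table below might look like this (all values are dummies):

```
https://www.google-analytics.com/collect?v=1&tid=UA-20202-14&cid=17444485575.14222271155&t=pageview&dp=%2Ffoo&dt=Accueil%20-%20Mon%20site&ul=en-us&de=UTF-8&sr=1280x800&sd=24-bit
```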
<p>&nbsp;</p>
<table>
<tbody>
<tr>
<td><strong>Parameters</strong></td>
<td><strong>Meaning in English</strong></td>
<td><strong>Example</strong></td>
</tr>
<tr>
<td>a</td>
<td>??</td>
<td></td>
</tr>
<tr>
<td>cd1</td>
<td>Custom Dimension</td>
<td></td>
</tr>
<tr>
<td>cid</td>
<td>Client ID</td>
<td>17444485575.14222271155</td>
</tr>
<tr>
<td>de</td>
<td>Document Encoding</td>
<td>UTF-8</td>
</tr>
<tr>
<td>dl</td>
<td>Document location URL</td>
<td>http://www.website.io/test.html</td>
</tr>
<tr>
<td>dp</td>
<td>Document Path</td>
<td>/foo</td>
</tr>
<tr>
<td>dr</td>
<td>Document Referrer</td>
<td></td>
</tr>
<tr>
<td>dt</td>
<td>Document Title</td>
<td>Accueil &#8211; Mon site</td>
</tr>
<tr>
<td>fl</td>
<td>Flash Version</td>
<td>20.0 r</td>
</tr>
<tr>
<td>gtm</td>
<td>GTM ID</td>
<td>GTM-TR5S4R</td>
</tr>
<tr>
<td>je</td>
<td>Java Enabled</td>
<td>0</td>
</tr>
<tr>
<td>jid</td>
<td>&#8220;Display Join Beacon&#8221; for linking analytics with the double click cookie <a href="https://productforums.google.com/forum/?hl=hu&amp;nomobile=true#!topic/tag-manager/wCoT99zVE_s;context-place=forum/tag-manager">source: Simo</a></td>
<td>60036755</td>
</tr>
<tr>
<td>sd</td>
<td>Screen Colors</td>
<td>24-bit</td>
</tr>
<tr>
<td>sr</td>
<td>Screen Resolution</td>
<td>1280&#215;800</td>
</tr>
<tr>
<td>t</td>
<td>Track Type. Must be one of &#8216;pageview&#8217;, &#8216;screenview&#8217;, &#8216;event&#8217;, &#8216;transaction&#8217;, &#8216;item&#8217;, &#8216;social&#8217;, &#8216;exception&#8217;, &#8216;timing&#8217;.</td>
<td>pageview</td>
</tr>
<tr>
<td>tid</td>
<td>Tracking ID / Web Property ID</td>
<td>UA-20202-14</td>
</tr>
<tr>
<td>ul</td>
<td>User Language</td>
<td>en-us</td>
</tr>
</tbody>
</table>
<h2>Classic Google Analytics (ga.js)</h2>
<p>The tag sends a hit to the URL /r/__utm.gif; here is the meaning of the parameters</p>
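Again for illustration, a classic ga.js hit built from the example values in the table below might look like this (all values, including the tracking-code version, are dummies):

```
https://www.google-analytics.com/r/__utm.gif?utmwv=5.7.2&utmac=UA-1202056-1&utmhn=apps.google.com&utmt=event&utmcs=ISO-8859-1&utmn=847523694
```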
<table>
<tbody>
<tr>
<td><strong>Parameters</strong></td>
<td><strong>Meaning in English</strong></td>
<td><strong>Example</strong></td>
</tr>
<tr>
<td>utmac</td>
<td>Account ID</td>
<td>UA-1202056-1</td>
</tr>
<tr>
<td>utmcc</td>
<td>Analytics cookie string: contains the combined strings of the __utma and __utmz Google Analytics cookies. This string is URL-encoded.</td>
<td></td>
</tr>
<tr>
<td>utmcs</td>
<td>Character set</td>
<td>ISO-8859-1</td>
</tr>
<tr>
<td>utmdt</td>
<td>Page title</td>
<td></td>
</tr>
<tr>
<td>utmfl</td>
<td>Flash version</td>
<td></td>
</tr>
<tr>
<td>utmhid</td>
<td>Hit ID, random number</td>
<td></td>
</tr>
<tr>
<td>utmhn</td>
<td>Hostname</td>
<td>apps.google.com</td>
</tr>
<tr>
<td>utmht</td>
<td>Timestamp, in milliseconds since the UNIX epoch</td>
<td></td>
</tr>
<tr>
<td>utmipc</td>
<td>eCommerce &#8211; Product code / SKU</td>
<td></td>
</tr>
<tr>
<td>utmipn</td>
<td>eCommerce &#8211; Product name</td>
<td></td>
</tr>
<tr>
<td>utmipr</td>
<td>eCommerce &#8211; Product price</td>
<td></td>
</tr>
<tr>
<td>utmiqt</td>
<td>eCommerce &#8211; Quantity</td>
<td></td>
</tr>
<tr>
<td>utmiva</td>
<td>eCommerce &#8211; Product category / variation</td>
<td></td>
</tr>
<tr>
<td>utmje</td>
<td>Java enabled? (1 = yes, 0 = no)</td>
<td></td>
</tr>
<tr>
<td>utmjid</td>
<td>Display Join Beacon, for linking analytics with the DoubleClick cookie. If you&#8217;ve enabled display advertising (e.g. for demographic data), your hits will be recycled through the DoubleClick servers, and this ID is used to join the data together.</td>
<td></td>
</tr>
<tr>
<td>utmn</td>
<td>Random ID to prevent gif caching</td>
<td></td>
</tr>
<tr>
<td>utmp</td>
<td>Page path</td>
<td></td>
</tr>
<tr>
<td>utmr</td>
<td>Full referral URL</td>
<td></td>
</tr>
<tr>
<td>utmredir</td>
<td>redirection?</td>
<td></td>
</tr>
<tr>
<td>utms</td>
<td>Requests made this session (max. 500)</td>
<td></td>
</tr>
<tr>
<td>utmsc</td>
<td>Screen colour depth (e.g. 24-bit)</td>
<td></td>
</tr>
<tr>
<td>utmsr</td>
<td>Screen resolution</td>
<td></td>
</tr>
<tr>
<td>utmt</td>
<td>Request type (e.g. &#8216;event&#8217;, &#8216;tran&#8217; etc&#8230;)</td>
<td>event</td>
</tr>
<tr>
<td>utmtci</td>
<td>Billing City</td>
<td></td>
</tr>
<tr>
<td>utmtco</td>
<td>Billing Country</td>
<td></td>
</tr>
<tr>
<td>utmtid</td>
<td>Order ID The utmtid order ID must be unique for each order, otherwise Google Analytics will group multiple transactions under a single entry. All monetary fields should be filled in without a currency symbol, e.g.: 12.50</td>
<td></td>
</tr>
<tr>
<td>utmtrg</td>
<td>Billing Region</td>
<td></td>
</tr>
<tr>
<td>utmtsp</td>
<td>Shipping cost</td>
<td></td>
</tr>
<tr>
<td>utmtst</td>
<td>Store name</td>
<td></td>
</tr>
<tr>
<td>utmtto</td>
<td>Order Total (inc. tax and shipping)</td>
<td></td>
</tr>
<tr>
<td>utmttx</td>
<td>Tax cost</td>
<td></td>
</tr>
<tr>
<td>utmu</td>
<td>Client usage / Error data (encoded)</td>
<td></td>
</tr>
<tr>
<td>utmul</td>
<td>Language code (e.g. en-us)</td>
<td></td>
</tr>
<tr>
<td>utmvp</td>
<td>Viewport resolution</td>
<td></td>
</tr>
<tr>
<td>utmwv</td>
<td>Tracking code version</td>
<td></td>
</tr>
<tr>
<td>v</td>
<td>Protocol Version</td>
<td></td>
</tr>
<tr>
<td>vp</td>
<td>????</td>
<td></td>
</tr>
<tr>
<td>z</td>
<td>????</td>
<td></td>
</tr>
<tr>
<td>_r</td>
<td>????</td>
<td></td>
</tr>
<tr>
<td>_s</td>
<td>????</td>
<td></td>
</tr>
<tr>
<td>_u</td>
<td>????</td>
<td></td>
</tr>
<tr>
<td>_utma</td>
<td>????</td>
<td></td>
</tr>
<tr>
<td>_utmht</td>
<td>????</td>
<td></td>
</tr>
<tr>
<td>_utmz</td>
<td>????</td>
<td></td>
</tr>
<tr>
<td>_v</td>
<td>????</td>
<td></td>
</tr>
<tr>
<td>aip</td>
<td>Anonymize IP</td>
<td></td>
</tr>
</tbody>
</table>
<h2>Enhanced Ecommerce Universal Analytics (ec.js)</h2>
<p>&nbsp;</p>
<h3 id="title_2322_6634" class="cheat_sheet_output_title">Product Impressions</h3>
<div id="block_2322_6634" class="cheat_sheet_output_block">
<table id="cheat_sheet_output_table" class="cheat_sheet_output_twocol" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr class="altrow countrow">
<td class="cheat_sheet_output_cell_1" valign="top">
<div>il[index]nm</div>
</td>
<td class="cheat_sheet_output_cell_2" valign="top">
<div>il (impression list) The list or collection to which the product belongs (example: il1nm)</div>
</td>
</tr>
<tr class="countrow">
<td class="cheat_sheet_output_cell_1" valign="top">
<div>il[index]pi[index]nm</div>
</td>
<td class="cheat_sheet_output_cell_2" valign="top">
<div>The name of the product impression #</div>
</td>
</tr>
<tr class="altrow countrow">
<td class="cheat_sheet_output_cell_1" valign="top">
<div>il[index]pi[index]id</div>
</td>
<td class="cheat_sheet_output_cell_2" valign="top">
<div>The product ID or SKU of the product impression #</div>
</td>
</tr>
<tr class="countrow">
<td class="cheat_sheet_output_cell_1" valign="top">
<div>il[index]pi[index]pr</div>
</td>
<td class="cheat_sheet_output_cell_2" valign="top">
<div>The price of the product impression #</div>
</td>
</tr>
<tr class="altrow countrow">
<td class="cheat_sheet_output_cell_1" valign="top">
<div>il[index]pi[index]br</div>
</td>
<td class="cheat_sheet_output_cell_2" valign="top">
<div>The brand of the product impression #</div>
</td>
</tr>
<tr class="countrow">
<td class="cheat_sheet_output_cell_1" valign="top">
<div>il[index]pi[index]ca</div>
</td>
<td class="cheat_sheet_output_cell_2" valign="top">
<div>The category of the product impression #</div>
</td>
</tr>
<tr class="altrow countrow">
<td class="cheat_sheet_output_cell_1" valign="top">
<div>il[index]pi[index]va</div>
</td>
<td class="cheat_sheet_output_cell_2" valign="top">
<div>The variant of the product impression #</div>
</td>
</tr>
<tr class="countrow">
<td class="cheat_sheet_output_cell_1" valign="top">
<div>il[index]pi[index]ps</div>
</td>
<td class="cheat_sheet_output_cell_2" valign="top">
<div>The product's position in the list, for product impression #</div>
</td>
</tr>
<tr class="altrow countrow">
<td class="cheat_sheet_output_cell_1" valign="top">
<div>il[index]pi[index]cd[index]</div>
</td>
<td class="cheat_sheet_output_cell_2" valign="top">
<div>The custom dimension # of the product impression #</div>
</td>
</tr>
<tr class="countrow">
<td class="cheat_sheet_output_cell_1" valign="top">
<div>il[index]pi[index]cm[index]</div>
</td>
<td class="cheat_sheet_output_cell_2" valign="top">
<div>The custom metric # of the product impression #</div>
</td>
</tr>
</tbody>
</table>
</div>
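<p>As a sketch of how the impression parameters above combine, here is a Python snippet building a Measurement Protocol payload for one impression list containing two products. The parameter names follow the table; the property ID, client ID and product values are placeholders.</p>

<pre class="crayon-plain-tag">from urllib.parse import urlencode

# Impression list 1 with two products; all values are placeholders.
payload = {
    "v": "1",
    "tid": "UA-XXXXX-Y",         # placeholder property ID
    "cid": "555",                # placeholder client ID
    "t": "pageview",
    "il1nm": "Search Results",   # impression list 1: name
    "il1pi1id": "SKU-123",       # list 1, product impression 1: ID
    "il1pi1nm": "Blue T-Shirt",  # list 1, product impression 1: name
    "il1pi1pr": "19.99",         # list 1, product impression 1: price
    "il1pi2id": "SKU-456",       # list 1, product impression 2: ID
    "il1pi2nm": "Red T-Shirt",   # list 1, product impression 2: name
}
print(urlencode(payload))</pre>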
<h3 id="title_2322_6635" class="cheat_sheet_output_title">Promotion Impressions</h3>
<div id="block_2322_6635" class="cheat_sheet_output_block">
<table id="cheat_sheet_output_table" class="cheat_sheet_output_twocol" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr class="altrow countrow">
<td class="cheat_sheet_output_cell_1" valign="top">
<div>promo[index]id</div>
</td>
<td class="cheat_sheet_output_cell_2" valign="top">
<div>Promotion ID #</div>
</td>
</tr>
<tr class="countrow">
<td class="cheat_sheet_output_cell_1" valign="top">
<div>promo[index]nm</div>
</td>
<td class="cheat_sheet_output_cell_2" valign="top">
<div>Promotion Name #</div>
</td>
</tr>
<tr class="altrow countrow">
<td class="cheat_sheet_output_cell_1" valign="top">
<div>promo[index]cr</div>
</td>
<td class="cheat_sheet_output_cell_2" valign="top">
<div>Promotion Creative #</div>
</td>
</tr>
<tr class="countrow">
<td class="cheat_sheet_output_cell_1" valign="top">
<div>promo[index]ps</div>
</td>
<td class="cheat_sheet_output_cell_2" valign="top">
<div>Promotion Position #</div>
</td>
</tr>
</tbody>
</table>
</div>
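<p>The promotion parameters work the same way. A hypothetical payload fragment for a single promotion slot, with placeholder values throughout:</p>

<pre class="crayon-plain-tag">from urllib.parse import urlencode

# One promotion impression (index 1); all values are placeholders.
promo = {
    "promo1id": "SUMMER10",     # Promotion ID
    "promo1nm": "Summer Sale",  # Promotion Name
    "promo1cr": "banner_top",   # Promotion Creative
    "promo1ps": "slot_1",       # Promotion Position
}
print(urlencode(promo))</pre>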
<h3 id="title_2322_6636" class="cheat_sheet_output_title">Product Info</h3>
<div id="block_2322_6636" class="cheat_sheet_output_block">
<table id="cheat_sheet_output_table" class="cheat_sheet_output_twocol" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr class="altrow countrow">
<td class="cheat_sheet_output_cell_1" valign="top">
<div>pa</div>
</td>
<td class="cheat_sheet_output_cell_2" valign="top">
<div>product action (click, detail, add, remove, checkout, checkout_option, purchase, refund)</div>
</td>
</tr>
<tr class="countrow">
<td class="cheat_sheet_output_cell_1" valign="top">
<div>pr[index]nm</div>
</td>
<td class="cheat_sheet_output_cell_2" valign="top">
<div>product # Name</div>
</td>
</tr>
<tr class="altrow countrow">
<td class="cheat_sheet_output_cell_1" valign="top">
<div>pr[index]id</div>
</td>
<td class="cheat_sheet_output_cell_2" valign="top">
<div>product # ID or SKU</div>
</td>
</tr>
<tr class="countrow">
<td class="cheat_sheet_output_cell_1" valign="top">
<div>pr[index]pr</div>
</td>
<td class="cheat_sheet_output_cell_2" valign="top">
<div>product # Price</div>
</td>
</tr>
<tr class="altrow countrow">
<td class="cheat_sheet_output_cell_1" valign="top">
<div>pr[index]va</div>
</td>
<td class="cheat_sheet_output_cell_2" valign="top">
<div>product # Variant</div>
</td>
</tr>
<tr class="countrow">
<td class="cheat_sheet_output_cell_1" valign="top">
<div>pr[index]qt</div>
</td>
<td class="cheat_sheet_output_cell_2" valign="top">
<div>product # Quantity</div>
</td>
</tr>
<tr class="altrow countrow">
<td class="cheat_sheet_output_cell_1" valign="top">
<div>pr[index]cd[index]</div>
</td>
<td class="cheat_sheet_output_cell_2" valign="top">
<div>product # Custom Dimension #</div>
</td>
</tr>
<tr class="countrow">
<td class="cheat_sheet_output_cell_1" valign="top">
<div>pr[index]cm[index]</div>
</td>
<td class="cheat_sheet_output_cell_2" valign="top">
<div>product # Custom Metric #</div>
</td>
</tr>
</tbody>
</table>
</div>
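<p>Putting the product action and product fields together: the sketch below builds a payload for a product detail view (pa=detail) of a single product. The parameter names come from the table above; the property ID, client ID and product values are placeholders.</p>

<pre class="crayon-plain-tag">from urllib.parse import urlencode

# Product detail view for one product; all values are placeholders.
detail_hit = {
    "v": "1",
    "tid": "UA-XXXXX-Y",      # placeholder property ID
    "cid": "555",             # placeholder client ID
    "t": "pageview",
    "pa": "detail",           # product action
    "pr1id": "SKU-123",       # product 1 ID or SKU
    "pr1nm": "Blue T-Shirt",  # product 1 name
    "pr1pr": "19.99",         # product 1 price
    "pr1qt": "1",             # product 1 quantity
}
print(urlencode(detail_hit))</pre>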
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
