HTTP Lessons – Lesson 3 – Client Identity
So far, you've learned the basic concepts and some of the architectural aspects of HTTP. This brings us to the next important topic for HTTP: client identification.
In this article, you'll learn why client identification is important and how web servers can identify you (your web client). You'll also see how this information is used and stored.
In this article, you will learn more about:
- Client identification and why it's so important
- Different ways to identify the client
- HTTP request headers used for identification
- IP address
- Long (fat) URLs
- Cookies
First, let's see why websites need to know about you.
Client Identification and Why It's So Important
As you are aware, every website, at least those that care enough about you and your actions, includes some content personalization.
What do I mean by that?
These personalizations include things like recommended products when you visit e-commerce sites, recommendations of “people you may know/want to add” on social media, recommended videos, ads that know what you need, articles that may interest you.
This feels like a double-edged sword. On one hand, it's great to have personalized, tailored content delivered to you. On the other hand, it can feed confirmation bias, which can lead to all sorts of stereotypes and prejudices.
But how can we live without knowing how our favorite team scored last night or what celebrities did last night?
Either way, content personalization has become a part of our daily lives, and we probably don't want to do anything about it.
Let's see how web servers can identify you to achieve this effect.
Different Methods to Identify the Client
There are several ways a web server can identify you:
- HTTP request headers
- IP address
- Long URLs
- Cookies
- Login information (authentication)
Let's go over each of them, except for authentication, which is explained in more detail in Lesson 4 of HTTP Lessons.
HTTP Request Headers Used for Identification
Web servers have several ways to extract information about you directly from HTTP request headers.
These headers are:
- From – contains the user's email address
- User-Agent – contains information about the web client
- Referer – contains the source the user came from
- Authorization – contains the username and password
- Client-IP – contains the user's IP address
- X-Forwarded-For – contains the user's IP address (when the request goes through a proxy server)
- Cookie – contains an ID label generated by the server
In theory, the From header would be ideal for uniquely identifying the user, but in practice it is very rarely used because of security concerns about email address harvesting.
The User-Agent header contains information such as the browser version and operating system. While this is important for customizing content, it does not identify the user in any meaningful way.
The Referer header tells the server where the user is coming from. This information is used to better understand user behavior, and less often to identify the user.
While these headers provide some useful information about the client, they are not enough to personalize the content in a meaningful way.
The remaining headers provide more precise identification mechanisms.
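As a minimal sketch of how a server might collect these hints, the function below picks the identification-related headers out of a request. The function name `identification_hints` and the sample header values are hypothetical and only serve as an illustration; a real server framework would hand you the headers in its own structure.

```python
def identification_hints(headers):
    """Collect the identification-related headers a request happens to carry."""
    interesting = ("From", "User-Agent", "Referer", "Authorization",
                   "Client-IP", "X-Forwarded-For", "Cookie")
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered[name.lower()]
            for name in interesting
            if name.lower() in lowered}


# Hypothetical request headers, used only for illustration.
request_headers = {
    "Host": "www.example.com",
    "user-agent": "ExampleBrowser/1.0",
    "Referer": "https://www.example.com/start",
}
print(identification_hints(request_headers))
```

Most requests carry only a few of these headers, which is one reason they are not enough on their own.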
IP Address
Client identification by IP address was widely used in the days when IP addresses were not so easily spoofed/swapped. While it can be used as an additional security check, it is not reliable enough to be used on its own.
Here are some of the reasons:
- An IP address identifies the machine, not the user
- NAT firewalls – many ISPs (Internet service providers) use NAT firewalls to increase security and deal with the IP address shortage
- Dynamic IP addresses – users usually get a dynamic IP address from their ISP
- HTTP proxies and gateways – these can hide the original IP address. Some proxies use Client-IP or X-Forwarded-For to preserve the original IP address
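To make the proxy case concrete, here is a small sketch of how a server could guess the client address. It assumes the common convention that X-Forwarded-For holds a comma-separated list with the original client first; the function name `guess_client_ip` and the sample addresses are hypothetical. Keep in mind that the client can set these headers to anything, so the result is a hint, not proof.

```python
def guess_client_ip(remote_addr, headers):
    """Best-effort guess of the client IP; the headers involved are easy to forge."""
    forwarded = headers.get("X-Forwarded-For", "")
    # Conventional format: "client, proxy1, proxy2" (leftmost is the client).
    candidates = [part.strip() for part in forwarded.split(",") if part.strip()]
    if candidates:
        return candidates[0]
    # Fall back to Client-IP, then to the address of the TCP connection.
    return headers.get("Client-IP", remote_addr)


# Example addresses taken from the ranges reserved for documentation.
print(guess_client_ip("198.51.100.2", {"X-Forwarded-For": "203.0.113.7, 198.51.100.2"}))
print(guess_client_ip("198.51.100.2", {}))
```

Without a proxy header, the server only sees the address of the connection itself, which may well be a NAT gateway shared by many users.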
Long (fat) URLs
It's not uncommon to see websites use URLs to improve the user experience. As the user browses, the website keeps adding information to the URLs until they become cluttered and unreadable.
You can see what the long URL looks like by browsing the Amazon store.
https://www.amazon.com/gp/product/1942788002/ref=s9u_psimh_gw_i2?ie=UTF8&fpl=fresh&pd_rd_i=1942788002&pd_rd_r=70BRSEN2K19345MWASF0&pd_rd_w=KpLza&pd_rd_wg=gTIeL&pf_rd_m=ATVPDKIKX0DER&pf_rd_s=&pf_rd_r=RWRKQXA6PBHQG52JTRW2&pf_rd_t=36701&pf_rd_p=1cf9d009-399c-49e1-901a-7b8786e59436&pf_rd_i=desktop
There are several problems with using this approach.
- They are ugly
- They cannot be shared safely, because the URL carries the user's state
- They break caching
- They are limited to the current session
- They increase the load on the server
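The following sketch shows the idea behind fat URLs: the server rewrites every link it sends out so that the link carries a session identifier. The parameter name `sid`, the function name `add_session_id`, and the example URLs are assumptions made for this illustration, not something taken from a real site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_session_id(url, session_id):
    """Return the URL with a (hypothetical) 'sid' query parameter attached."""
    parts = urlsplit(url)
    # Keep the existing parameters, but drop any stale session ID first.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "sid"]
    query.append(("sid", session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(add_session_id("https://www.example.com/products?page=2", "abc123"))
print(add_session_id("https://www.example.com/cart", "abc123"))
```

Since every page now has a different URL for every user, shared caches cannot reuse responses, and anyone who receives a copied link inherits the sender's session.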
Cookies
Cookies are the best method of client identification to date, excluding authentication. They were developed by Netscape, but every browser supports them now.
There are two types of cookies: session cookies and persistent cookies. Session cookies are deleted when the user exits the browser, while persistent cookies are saved to disk and can last longer. For a cookie to be treated as persistent rather than as a session cookie, the Max-Age or Expires attribute must be set.
Modern browsers like Chrome and Firefox can keep background processes running when you close them, so you can pick up where you left off. This can cause session cookies to be retained, so be careful.
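As a rough sketch of the difference, the snippet below builds the value of a Set-Cookie header with Python's standard `http.cookies` module, once without a lifetime (a session cookie) and once with Max-Age (a persistent cookie). The helper name `set_cookie_value`, the cookie name `sid`, and its value are made up for the example.

```python
from http.cookies import SimpleCookie


def set_cookie_value(name, value, max_age=None):
    """Build the value of a Set-Cookie header for a single cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    if max_age is not None:
        # A lifetime in seconds turns the cookie into a persistent one.
        cookie[name]["max-age"] = max_age
    return cookie[name].OutputString()


session_cookie = set_cookie_value("sid", "abc123")
persistent_cookie = set_cookie_value("sid", "abc123", max_age=3600)
print("Set-Cookie:", session_cookie)
print("Set-Cookie:", persistent_cookie)
```

The only difference between the two headers is the Max-Age attribute; without it, the browser is expected to drop the cookie when the session ends.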
So how do cookies work?
Cookies contain a list of name-value pairs that the server sets using the Set-Cookie or Set-Cookie2 response header. Set-Cookie2 comes from RFC 2965, which has since been obsoleted by RFC 6265, so current browsers only use Set-Cookie. Typically, the information stored in a cookie is some form of client ID, but some websites store other information as well.
The browser stores this information in its cookie database and returns it the next time the user visits the page/website. The browser can process thousands of different cookies and knows when to serve each one.
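To illustrate how a browser decides which cookies to send, here is a deliberately tiny cookie store. It is a sketch under simplifying assumptions: cookies are matched by exact host name and by a plain path prefix, and expiry, the Secure flag, and domain matching are ignored. The class name `TinyCookieJar` and the host names are hypothetical.

```python
class TinyCookieJar:
    """A toy cookie store: exact host match and simple path-prefix match only."""

    def __init__(self):
        self._cookies = {}  # (host, path, name) -> value

    def store(self, host, path, name, value):
        # Setting a cookie with the same host, path, and name overwrites it.
        self._cookies[(host, path, name)] = value

    def cookie_header(self, host, request_path):
        """Return the Cookie header value for a request, or '' if none apply."""
        pairs = [f"{name}={value}"
                 for (cookie_host, cookie_path, name), value in self._cookies.items()
                 if cookie_host == host and request_path.startswith(cookie_path)]
        return "; ".join(pairs)


jar = TinyCookieJar()
jar.store("www.example.com", "/acme", "Customer", "WILE_E_COYOTE")
jar.store("other.example.com", "/", "Theme", "dark")
print(jar.cookie_header("www.example.com", "/acme/pickitem"))
print(repr(jar.cookie_header("www.example.com", "/blog")))
```

Only the cookie whose host and path match the request is sent back, which is how a browser can hold thousands of cookies and still serve each website just its own.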
Here is a sample flow.
1. User-Agent -> Server
    POST /acme/login HTTP/1.1
    [form data]
2. Server -> User-Agent
    HTTP/1.1 200 OK
    Set-Cookie2: Customer="WILE_E_COYOTE"; Version="1"; Path="/acme"
The server sends the Set-Cookie2 response header to tell the User-Agent (browser) to store a cookie with information about the user.
3. User-Agent -> Server
    POST /acme/pickitem HTTP/1.1
    Cookie: $Version="1"; Customer="WILE_E_COYOTE"; $Path="/acme"
    [form data]
The user adds an item to the shopping cart.
4. Server -> User-Agent
    HTTP/1.1 200 OK
    Set-Cookie2: Part_Number="Rocket_Launcher_0001"; Version="1"; Path="/acme"
The shopping cart now contains one item.
5. User-Agent -> Server
    POST /acme/shipping HTTP/1.1
    Cookie: $Version="1"; Customer="WILE_E_COYOTE"; $Path="/acme"; Part_Number="Rocket_Launcher_0001"; $Path="/acme"
    [form data]
The user chooses the shipping method.
6. Server -> User-Agent
    HTTP/1.1 200 OK
    Set-Cookie2: Shipping="FedEx"; Version="1"; Path="/acme"
The new cookie reflects the shipping method.
7. User-Agent -> Server
    POST /acme/process HTTP/1.1
    Cookie: $Version="1"; Customer="WILE_E_COYOTE"; $Path="/acme"; Part_Number="Rocket_Launcher_0001"; $Path="/acme"; Shipping="FedEx"; $Path="/acme"
    [form data]
That's it.
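On the server side, the Cookie header from the last request has to be turned back into name-value pairs. The sketch below does this in the simplest possible way, assuming that no value contains a semicolon and that names starting with `$` are attributes (such as `$Version` and `$Path`) rather than cookies. The function name `parse_cookie_header` is made up for this example.

```python
def parse_cookie_header(header_value):
    """Naively split a Cookie header into a dict of cookie names and values."""
    cookies = {}
    for part in header_value.split(";"):
        name, separator, value = part.strip().partition("=")
        # Skip empty fragments and '$'-prefixed attributes like $Version or $Path.
        if not separator or name.startswith("$"):
            continue
        cookies[name] = value.strip('"')
    return cookies


# The Cookie header from the final request of the flow.
header = ('$Version="1"; Customer="WILE_E_COYOTE"; $Path="/acme"; '
          'Part_Number="Rocket_Launcher_0001"; $Path="/acme"; '
          'Shipping="FedEx"; $Path="/acme"')
print(parse_cookie_header(header))
```

With the customer, the selected part, and the shipping method recovered from the header, the server has everything it needs to process the order without keeping any of this state itself.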
There's one more thing I want you to be aware of. Cookies aren't perfect either. Besides security concerns, there is also the problem of cookies colliding with the REST architectural style (see the section about misusing cookies in the REST anti-patterns article listed in the references).
You can find more information about cookies in RFC 2965.
Conclusion
You've learned about the strengths and potential pitfalls of content personalization. You're also aware of the different methods servers can use to identify you. In Lesson 4 of this series, we'll discuss the most important type of client identification: authentication.
If you find some of the concepts in this lesson unclear, see Lessons 1 and 2 of the HTTP series.
Thanks for reading and feel free to leave a comment.
Next Lesson: HTTP Lessons – Lesson 4 – Client Authentication Mechanisms
References:
- The HTTP reference: https://www.code-maze.com/the-http-reference
- The HTTP series part 1: https://www.code-maze.com/http-protocol-overview-part1
- The HTTP series part 2: https://www.code-maze.com/http-series-part-2
- HTTP: The Definitive Guide: http://shop.oreilly.com/product/9781565925090.do
- Confirmation bias explained: https://en.wikipedia.org/wiki/Confirmation_bias
- REST anti-patterns: https://www.infoq.com/articles/rest-anti-patterns
- Cookies RFC: https://www.ietf.org/rfc/rfc2965.txt
Original Article:
https://www.code-maze.com/http-series-part-3/