HTTP Lessons - Lesson 3 - Client Identity

HTTP Lessons – Lesson 1 – Overview of basic concepts
HTTP Lessons – Lesson 2 – Architectural Aspects
HTTP Lessons – Lesson 3 – Client Identity
HTTP Lessons – Lesson 4 – Client Authentication Mechanisms
HTTP Lessons – Lesson 5 – Security
HTTP Lessons – Glossary

So far, you've learned the basic concepts and some of the architectural aspects of HTTP. This brings us to the next important topic for HTTP: client identification.

In this article, you'll learn why client identification is important and how web servers can identify you (your web client). You'll also see how this information is used and stored.

In this article, you will learn more about:

Client identification and why it's so important
Different ways to identify the client
HTTP request headers used for identification
IP address
Long (fat) URLs
Cookies

First, let's see why websites need to know about you.

Client Identification and Why It's So Important

As you are aware, every website, at least those that care enough about you and your actions, includes some content personalization.

What do I mean by that?

These personalizations include things like recommended products when you visit e-commerce sites, recommendations of “people you may know/want to add” on social media, recommended videos, ads that know what you need, articles that may interest you.

This feels like a double-edged sword. On one hand, it's great to have personalized, tailored content delivered to you. On the other hand, it can lead to all sorts of stereotypes and prejudices.

But how can we live without knowing how our favorite team scored last night or what celebrities did last night?

Either way, content personalization has become a part of our daily lives, and we probably don't want to do anything about it.

Let's see how web servers can identify you to achieve this effect.

Different Methods to Identify the Client

There are several ways a Web server can identify you:

HTTP request headers
IP address
Long URLs
Cookies
Login information (authentication)

Let's go over each of them. HTTP authentication is explained in more detail in Lesson 4 of HTTP Lessons.

HTTP Request Headers Used for Identification

Web servers have several ways to extract information about you directly from HTTP request headers.

These headers are:

From – contains the user's email address
User-Agent – contains information about the web client
Referer – contains the source the user came from
Authorization – contains the username and password
Client-IP – contains the user's IP address
X-Forwarded-For – contains the user's IP address (when the request passes through a proxy server)
Cookie – contains an ID tag generated by the server

In theory, the From header would be ideal for uniquely identifying the user, but in practice it is very rarely sent, because of security concerns about email address harvesting.

The User-Agent header contains information such as the browser version and the operating system. While this is useful for customizing content, it does not identify a particular user.

The Referer header tells the server where the user is coming from. This information is used to better understand user behavior, and less often to identify the user.

While these headers provide some useful information about the client, they are not enough to personalize the content in a meaningful way.
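As a rough illustration of how little these three headers reveal, here is a minimal Python sketch that collects them from a request. The function name `identification_hints` and the sample header values are made up for this example; they are not part of any real framework.

```python
def identification_hints(headers):
    """Collect the weak identification hints from a dict of request headers."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    return {
        "email": normalized.get("from"),            # rarely sent in practice
        "client": normalized.get("user-agent"),     # browser and OS, not a person
        "came_from": normalized.get("referer"),     # previous page, if any
    }


# Hypothetical request headers, for illustration only.
request_headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Referer": "https://example.com/search?q=http",
}
hints = identification_hints(request_headers)
print(hints)
```

Two different people using the same browser and following the same link would produce exactly the same hints, which is why these headers alone are not enough.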

The remaining headers provide more precise identification mechanisms.

IP address

Client identification by IP address was widely used in the days when IP addresses were not so easily spoofed/swapped. While it can be used as an additional security check, it is not reliable enough to be used on its own.

Here are some of the reasons:

An IP address identifies the machine, not the user
NAT firewalls – many ISPs (Internet service providers) use NAT firewalls to increase security and to deal with the IP address shortage, so many users share one public address
Dynamic IP addresses – users usually get a dynamic IP address from their ISP
HTTP proxies and gateways – these can hide the original IP address. Some proxies use Client-IP or X-Forwarded-For to preserve the original IP address
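The sketch below shows one way a server might guess the original client address from these headers. It assumes the common convention that the left-most entry in X-Forwarded-For is the original client; the function name `client_ip` and the sample addresses are hypothetical. Keep in mind that both headers are set by whoever sends the request, so they can be forged.

```python
def client_ip(headers, peer_address):
    """Best-effort guess at the original client IP address.

    headers is a dict of request headers; peer_address is the IP address
    of the machine that opened the TCP connection (possibly a proxy).
    """
    normalized = {name.lower(): value for name, value in headers.items()}

    forwarded = normalized.get("x-forwarded-for")
    if forwarded:
        # Assumed convention: "client, proxy1, proxy2".
        return forwarded.split(",")[0].strip()

    # Fall back to Client-IP, then to the address of the direct peer.
    return normalized.get("client-ip") or peer_address


print(client_ip({"X-Forwarded-For": "203.0.113.7, 10.0.0.1"}, "10.0.0.2"))
print(client_ip({}, "198.51.100.4"))
```

Even when the guess is right, the result still identifies a machine or a shared NAT address, not a person.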

Long (fat) URLs

It's not uncommon for websites to embed information about the user in URLs to improve the user experience. As the user browses the site, more information is appended, until the URLs become cluttered and unreadable.

You can see what the long URL looks like by browsing the Amazon store.

https://www.amazon.com/gp/product/1942788002/ref=s9u_psimh_gw_i2?ie=UTF8&fpl=fresh&pd_rd_i=1942788002&pd_rd_r=70BRSEN2K19345MWASF0&pd_rd_w=KpLza&pd_rd_wg=gTIeL&pf_rd_m=ATVPDKIKX0DER&pf_rd_s=&pf_rd_r=RWRKQXA6PBHQG52JTRW2&pf_rd_t=36701&pf_rd_p=1cf9d009-399c-49e1-901a-7b8786e59436&pf_rd_i=desktop

There are several problems with using this approach.

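Ugly – the URLs are confusing to read
Cannot be shared – a shared link carries the original user's accumulated information with it
Breaks caching – every user gets a different URL for the same resource, so cached copies cannot be reused
Limited to the current session – the information is lost once the user leaves the site, unless the URL was bookmarked
Increases the load on the server – the server has to rewrite the links on every page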
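To make the mechanism concrete, here is a minimal sketch of how a server could produce a fat URL by rewriting a link. The query parameter name `sid`, the host `shop.example`, and both function names are assumptions chosen for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_session_id(url, session_id):
    """Return url with a hypothetical `sid` query parameter attached."""
    parts = urlsplit(url)
    # Keep the existing parameters, dropping any stale session ID.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "sid"]
    query.append(("sid", session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


def read_session_id(url):
    """Extract the session ID from a fat URL, or None if it is absent."""
    return dict(parse_qsl(urlsplit(url).query)).get("sid")


fat_url = add_session_id("https://shop.example/products?page=2", "abc123")
print(fat_url)  # https://shop.example/products?page=2&sid=abc123
print(read_session_id(fat_url))  # abc123
```

The server has to run something like `add_session_id` over every link it emits, which is where the extra load comes from, and anyone who receives `fat_url` also receives the session ID.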

Cookies

Cookies are the most widely used method of client identification today, excluding authentication. They were developed by Netscape, but are now supported by every browser.

There are two types of cookies: session cookies and persistent cookies. Session cookies are deleted when the user exits the browser, while persistent cookies are saved to disk and can live longer. For a cookie to be treated as persistent rather than as a session cookie, the Max-Age or Expires attribute must be set.
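The difference is visible in the header the server sends. The following sketch uses Python's standard `http.cookies` module to build one cookie of each kind; the cookie names `session_id` and `remember_me` and the 30-day lifetime are arbitrary choices for illustration.

```python
from http.cookies import SimpleCookie

THIRTY_DAYS = 30 * 24 * 60 * 60  # lifetime in seconds

cookies = SimpleCookie()

# Session cookie: no Max-Age or Expires, so the browser drops it on exit.
cookies["session_id"] = "abc123"

# Persistent cookie: Max-Age tells the browser to keep it for that long.
cookies["remember_me"] = "abc123"
cookies["remember_me"]["max-age"] = THIRTY_DAYS

session_header = cookies["session_id"].OutputString()
persistent_header = cookies["remember_me"].OutputString()

print("Set-Cookie:", session_header)
print("Set-Cookie:", persistent_header)
```

Only the second header carries a Max-Age attribute, and that attribute is the only thing that makes the cookie persistent.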

Modern browsers like Chrome and Firefox can keep background processes running when you close them, so you can pick up where you left off. This can cause session cookies to be retained, so be careful.

So how do cookies work?

Cookies contain a list of name-value pairs that the server sets using the Set-Cookie or Set-Cookie2 response header. (Set-Cookie2 comes from RFC 2965, which has since been obsoleted by RFC 6265; current browsers rely on Set-Cookie.) Typically, the information stored in a cookie is some form of client identification, but some websites store other information as well.

The browser stores this information in its cookie database and sends it back the next time the user visits the page or website. The browser can handle thousands of different cookies and knows when to send each one.
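How the browser decides which cookies to send can be sketched with a toy cookie store. This is a simplified model, not how any particular browser is implemented: it matches on the path only, while real browsers also check the domain, expiry, and security attributes. The class name `ToyCookieJar` and the `Theme` cookie are invented for the example.

```python
def path_matches(request_path, cookie_path):
    """Return True if a cookie scoped to cookie_path applies to request_path."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # "/acme" must match "/acme/login" but not "/acmeco".
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


class ToyCookieJar:
    """Toy cookie store keyed by (path, name)."""

    def __init__(self):
        self._cookies = {}

    def store(self, name, value, path="/"):
        self._cookies[(path, name)] = value

    def cookie_header(self, request_path):
        """Build the Cookie header value for a request to request_path."""
        pairs = [f"{name}={value}"
                 for (path, name), value in self._cookies.items()
                 if path_matches(request_path, path)]
        return "; ".join(pairs)


jar = ToyCookieJar()
jar.store("Customer", "WILE_E_COYOTE", path="/acme")
jar.store("Theme", "dark", path="/blog")

print(jar.cookie_header("/acme/pickitem"))  # Customer=WILE_E_COYOTE
print(jar.cookie_header("/blog/post-1"))    # Theme=dark
```

Each request carries only the cookies whose scope covers the requested path, so the `/blog` cookie never reaches the `/acme` pages.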

Here is a sample flow.

1. User-Agent -> Server

POST /acme/login HTTP/1.1
[form data]

2. Server -> User-Agent

HTTP/1.1 200 OK
Set-Cookie2: Customer="WILE_E_COYOTE"; Version="1"; Path="/acme"

The server sends the Set-Cookie2 response header to tell the User-Agent (browser) to store a cookie that identifies the user.

3. User-Agent -> Server

POST /acme/pickitem HTTP/1.1
Cookie: $Version="1"; Customer="WILE_E_COYOTE"; $Path="/acme"
[form data]

The user adds an item to the shopping cart.

4. Server -> User-Agent

HTTP/1.1 200 OK
Set-Cookie2: Part_Number="Rocket_Launcher_0001"; Version="1"; Path="/acme"

The shopping cart now contains one item.

5. User-Agent -> Server

POST /acme/shipping HTTP/1.1
Cookie: $Version="1";
        Customer="WILE_E_COYOTE"; $Path="/acme";
        Part_Number="Rocket_Launcher_0001"; $Path="/acme"
[form data]

The user chooses the shipping method.

6. Server -> User-Agent

HTTP/1.1 200 OK
Set-Cookie2: Shipping="FedEx"; Version="1"; Path="/acme"

The new cookie reflects the shipping method.

7. User-Agent -> Server

POST /acme/process HTTP/1.1
Cookie: $Version="1";
        Customer="WILE_E_COYOTE"; $Path="/acme";
        Part_Number="Rocket_Launcher_0001"; $Path="/acme";
        Shipping="FedEx"; $Path="/acme"
[form data]

That's it.
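Notice that in this flow the whole order travels in the Cookie header, so by the last request the server can rebuild it without having stored anything itself. Here is a minimal sketch of that last step. The parser is deliberately naive: it splits on semicolons, so it would break on quoted values that contain one, and it skips the `$Version` and `$Path` attributes. The function name `parse_cookie_header` is made up for this example.

```python
def parse_cookie_header(value):
    """Parse a Cookie header value into a dict of name-value pairs."""
    cookies = {}
    for part in value.split(";"):
        part = part.strip()
        # Skip empty pieces and "$"-prefixed attributes such as $Path.
        if not part or part.startswith("$"):
            continue
        name, _, raw_value = part.partition("=")
        cookies[name.strip()] = raw_value.strip().strip('"')
    return cookies


# The Cookie header from the final request of the flow above.
header = (
    '$Version="1"; '
    'Customer="WILE_E_COYOTE"; $Path="/acme"; '
    'Part_Number="Rocket_Launcher_0001"; $Path="/acme"; '
    'Shipping="FedEx"; $Path="/acme"'
)
order = parse_cookie_header(header)
print(order)
```

For this header, `order` holds the customer, the part number, and the shipping method, which is everything the server needs to process the request.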

There's one more thing I want you to be aware of. Cookies aren't perfect either. Besides the security concerns, cookies are a problem when working with the REST architectural style (see the section about misusing cookies).

You can find more information about cookies in RFC 2965.

Conclusion

You've learned about the strengths and potential pitfalls of content personalization. You're also aware of the different methods servers can use to identify you. In Lesson 4 of this series, we'll discuss the most important type of client identification: authentication.

If you find some of the concepts in this lesson unclear, see Lessons 1 and 2 of the HTTP series.

Thanks for reading and feel free to leave a comment.