HTTP Lessons – Lesson 2 – Architectural Aspects

HTTP Lessons – Lesson 1 – Overview of basic concepts
HTTP Lessons – Lesson 2 – Architectural Aspects
HTTP Lessons – Lesson 3 – Client Identity
HTTP Lessons – Lesson 4 – Client Authentication Mechanisms
HTTP Lessons – Lesson 5 – Security
HTTP Lessons – Glossary

In the first article of this series, we discussed the fundamental concepts of HTTP. Now that we have a foundation to build upon, we can discuss some of HTTP's architectural aspects. There's more to HTTP than simply sending and receiving data.

HTTP cannot function as an application protocol on its own. It requires an infrastructure of hardware and software that provides various services and makes communication over the World Wide Web possible and efficient.

In this article, you will learn about:

  • Web Servers
  • Proxy Servers
  • Caching
  • Gateways, Tunnels and Relays
  • Web Crawlers (Spiders)

They're an integral part of our online lives, and you'll learn exactly what each one is for and how it works. This information will help you connect the dots from the first article and further understand the HTTP communication flow.

Let's get started.

Web Servers

As explained in the first article, the primary function of a web server is to store resources and serve them when it receives requests. You access a web server using a web client (aka a web browser) and, in return, retrieve the requested resource or change the state of existing resources. Web servers can also be accessed automatically by web crawlers, which we'll discuss later in this article.

Some of the most popular web servers, and the ones you've probably heard of, are Apache HTTP Server, Nginx, IIS, and GlassFish.

Web servers can range from very simple and easy to use to sophisticated and complex pieces of software. Modern web servers can perform a wide variety of tasks. The main tasks a web server can perform include:

  • Establishing connections – accepting or closing the client connection
  • Receiving requests – reading an HTTP request message
  • Processing requests – interpreting the request message and taking action
  • Accessing resources – accessing the resource specified in the message
  • Creating responses – creating the HTTP response message
  • Sending responses – sending the response back to the client
  • Logging transactions – writing the completed transaction to a log file

I'll walk through the basic web server flow in several distinct steps. These steps represent a very simplified version of what a web server actually does.

Step 1: Establishing a connection

When a web client wants to access a web server, it attempts to open a new TCP connection. The server, in turn, determines the client's IP address. It's then up to the server to decide whether to open or close the TCP connection to that client.

If the server accepts the connection, it adds it to the list of available connections and tracks the data on that connection.

It can also close the connection if the client is not authorized or is blacklisted (malicious).

The server may also attempt to determine the client's hostname using "reverse DNS." This information can be helpful when logging messages, but hostname lookups can take some time and slow down operations.
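
As a rough illustration of this step (and not how production servers like Apache or Nginx are written), here is a minimal Python sketch that accepts a TCP connection, checks a made-up blacklist, and attempts a reverse DNS lookup:

import socket

BLACKLIST = {"203.0.113.7"}                      # hypothetical list of blocked client addresses

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("0.0.0.0", 8080))                 # port chosen just for this example
listener.listen()

conn, (client_ip, client_port) = listener.accept()   # step 1: accept the TCP connection

if client_ip in BLACKLIST:
    conn.close()                                 # refuse blacklisted (malicious) clients
else:
    try:
        hostname = socket.gethostbyaddr(client_ip)[0]   # optional "reverse DNS" lookup
    except OSError:
        hostname = client_ip                     # lookups can fail or be slow, fall back to the IP
    print(f"accepted connection from {hostname}:{client_port}")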

Step 2: Receiving and processing requests

When receiving an incoming request, the web server parses the request line, the headers, and the body (if there is one). One thing to note is that the connection can pause at any time, in which case the server must temporarily store the data it has received until the rest arrives.

High-end web servers must be able to open many simultaneous connections. This includes multiple simultaneous connections from the same client. A typical web page may request many different resources from the server.
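
To make the parsing step concrete, here is a minimal Python sketch that splits a raw HTTP request (already read from the socket) into its request line, headers, and body:

raw = (b"GET /images/codemazeblog.txt HTTP/1.1\r\n"
       b"Host: www.example.com\r\n"
       b"Accept: text/plain\r\n"
       b"\r\n")

head, _, body = raw.partition(b"\r\n\r\n")            # headers end at the first blank line
request_line, *header_lines = head.decode("iso-8859-1").split("\r\n")

method, target, version = request_line.split(" ", 2)  # e.g. GET /images/codemazeblog.txt HTTP/1.1
headers = dict(line.split(": ", 1) for line in header_lines)

print(method, target, version, headers["Host"], len(body))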

Step 3: Accessing the resource

Because web servers are primarily resource providers, they have multiple ways to map and access resources.

The simplest way to map a resource is to use the request URI to locate a file in the web server's file system. Typically, resources are placed in a special folder on the server called the docroot. For example, the docroot on a Windows server might be F:\WebResources\. If a GET request asks for /images/codemazeblog.txt, the server maps it to the file F:\WebResources\images\codemazeblog.txt and returns that file in the response message. When multiple websites are hosted on the same web server, each can have its own separate docroot.

If a Web server receives a request for a directory instead of a file, it can resolve it in several ways. It can return an error message, return the default index file instead of the directory, or scan the directory and return the content in an HTML file.
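
Here is a minimal Python sketch of that mapping, reusing the hypothetical docroot from the example above and refusing paths that try to escape it:

from pathlib import Path

DOCROOT = Path("F:/WebResources").resolve()      # hypothetical docroot from the example above

def map_resource(request_target):
    candidate = (DOCROOT / request_target.lstrip("/")).resolve()
    if not candidate.is_relative_to(DOCROOT):    # refuse paths that escape the docroot
        return None
    if candidate.is_dir():
        index = candidate / "index.html"         # serve a default index file for directories
        return index if index.exists() else None
    return candidate if candidate.exists() else None

print(map_resource("/images/codemazeblog.txt"))  # the mapped file, if it exists; otherwise None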

A server can also map the request URI to a dynamic resource—a software application that generates the result on demand. There is a whole class of servers, called application servers, whose job is to connect web servers to complex software solutions and deliver dynamic content.

Step 4: Generating and sending the response

After determining which resource it should serve, the server generates the response message. The response message includes the status code, response headers, and, if necessary, the response body.

If the response contains a body, the message typically includes a Content-Length header that describes the size of the body and a Content-Type header that describes the MIME type of the returned resource.

After the response is generated, the server selects the client it needs to send the response to. For nonpersistent connections, the server must close the connection once the entire response message has been sent.
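
A minimal Python sketch of this step, assembling a response with the Content-Type and Content-Length headers mentioned above:

import mimetypes

def build_response(body, resource_path):
    # Guess the MIME type from the file name; fall back to a generic binary type.
    content_type = mimetypes.guess_type(resource_path)[0] or "application/octet-stream"
    head = (
        "HTTP/1.1 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"      # size of the body in bytes
        "Connection: close\r\n"                 # non-persistent connection: close after sending
        "\r\n"
    )
    return head.encode("iso-8859-1") + body

print(build_response(b"Hello from the docroot!", "codemazeblog.txt"))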

Step 5: Logging

When the transaction is completed, the server logs all transaction information to a file. Many servers provide specialized logging.
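
As an illustration, many servers write entries similar to Apache's Common Log Format; here is a minimal Python sketch with made-up values:

from datetime import datetime, timezone

def log_transaction(client_ip, request_line, status, body_size):
    timestamp = datetime.now(timezone.utc).strftime("%d/%b/%Y:%H:%M:%S %z")
    # Common Log Format: host ident authuser [date] "request" status bytes
    entry = f'{client_ip} - - [{timestamp}] "{request_line}" {status} {body_size}'
    with open("access.log", "a") as log_file:
        log_file.write(entry + "\n")

log_transaction("203.0.113.7", "GET /images/codemazeblog.txt HTTP/1.1", 200, 23)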

Proxy Servers

Proxy servers are intermediary servers. They typically sit between a web server and a web client. By their nature, proxy servers must act as both a web client and a web server.

But why do we need proxy servers? Why not just communicate directly between web clients and web servers? Isn't that much simpler and faster?

Simple, maybe, but faster? Not faster, really. But we'll get to that.

Before I explain what proxy servers are used for, I need to mention one thing: the concept of a reverse proxy, or rather the difference between a forward proxy and a reverse proxy.

A forward proxy acts on behalf of a client requesting resources from a web server. It protects the client by filtering requests through a firewall or by hiding information about the client. A reverse proxy works the other way around. It's typically placed behind a firewall and protects the web servers. For all the clients know, they're talking to the real web server, and they remain unaware of the network behind the reverse proxy.
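
From the client's point of view, using a forward proxy is often just a matter of pointing the HTTP library at it. Here's a minimal sketch using Python's standard library; the proxy address is hypothetical:

import urllib.request

# Route plain-HTTP traffic through a (hypothetical) forward proxy.
proxy_handler = urllib.request.ProxyHandler({"http": "http://proxy.example.com:8080"})
opener = urllib.request.build_opener(proxy_handler)

with opener.open("http://example.com/") as response:
    print(response.status, response.headers.get("Content-Type"))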

Proxy Server

Reverse Proxy Server

Proxies are very useful and have a wide range of applications. Let's explore some of the ways proxy servers are used.

  • Compression – compressing content directly increases communication speed. It's that simple.
  • Monitoring and filtering – want to block access to adult websites for children in a primary school? A proxy is the right solution for you.
  • Security – proxies can serve as a single entry point to an entire network. They can detect malicious applications and restrict application-level protocols.
  • Anonymity – requests can be modified by the proxy to achieve greater anonymity. It can strip sensitive information from a request and keep only what's essential. Sending less information to the server can degrade the user experience, but sometimes anonymity is the more important factor.
  • Access control – quite simply, you can centralize access control for many servers on a single proxy server.
  • Caching – you can use a proxy server to cache popular content and thus significantly reduce load times.
  • Load balancing – if you have a service that receives a lot of "peak traffic," you can use a proxy to distribute the workload across more computing resources or web servers. Load balancers route traffic so that a single server doesn't become overloaded during a peak.
  • Transcoding – modifying the content of the message body can also be the proxy's responsibility.

As you can see, proxies can be versatile and flexible.

Caching

Web caches are devices that automatically create copies of requested data and store them locally.

By doing this, you can:

  • Reduce traffic
  • Eliminate network bottlenecks
  • Prevent server overload
  • Reduce response delays over long distances

So, you could say that web caches improve both user experience and web server performance. And of course, you can potentially make more money.

The fraction of requests served from the cache is called the Hit Rate. It can range from 0 to 1, where 0 means 0% and 1 means 100% of requests are served from the cache. The ideal goal is, of course, 100%, but the real number is usually closer to 40%.
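
To make the hit rate concrete, here is a toy in-memory cache sketch (not a real HTTP cache) that counts how many requests it serves locally:

class SimpleCache:
    """Toy cache that tracks its hit rate (requests served from cache / all requests)."""

    def __init__(self):
        self.store, self.hits, self.requests = {}, 0, 0

    def get(self, url, fetch):
        self.requests += 1
        if url in self.store:
            self.hits += 1                    # served locally, origin server not contacted
        else:
            self.store[url] = fetch(url)      # cache miss: fetch from the origin and keep a copy
        return self.store[url]

    @property
    def hit_rate(self):
        return self.hits / self.requests if self.requests else 0.0

cache = SimpleCache()
for url in ["/index.html", "/logo.png", "/index.html", "/index.html", "/about.html"]:
    cache.get(url, lambda u: f"<content of {u}>")
print(cache.hit_rate)                          # 0.4 -> 40% of requests served from the cache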

Here's what the basic Web caching workflow looks like:


Gateways, Tunnels and Relays

Over time, as HTTP matured, people found many different ways to use it. HTTP became useful as a framework for connecting different applications and protocols.

Let's see how it goes.

Gateways

Gateways are pieces of hardware that enable HTTP to communicate with different protocols and applications by abstracting the way a resource is retrieved. They are also called protocol converters, and they are much more complex than routers or switches because they work with multiple protocols.

For example, you can use a gateway to retrieve a file via FTP by sending an HTTP request. Or, you can take an encrypted message over SSL and convert it to HTTP (Client-Side Security Accelerator Gateways) or convert HTTP into more secure HTTP messages (Server-Side Security Gateways).
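
As a rough sketch of the idea (not how real gateways are implemented), the handler below answers an incoming HTTP GET by fetching the same path over FTP with Python's standard library; the FTP host is made up:

import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class FtpGatewayHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Translate the incoming HTTP request into an FTP fetch (the FTP host is hypothetical).
        with urllib.request.urlopen(f"ftp://ftp.example.com{self.path}") as ftp_resource:
            body = ftp_resource.read()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)                   # hand the FTP payload back over HTTP

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), FtpGatewayHandler).serve_forever()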

Tunnels

Tunnels use the CONNECT request method and enable sending non-HTTP data over HTTP. The CONNECT method asks the tunnel to open a connection to the destination server and then relay data between the client and that server.

A CONNECT request example:

CONNECT api.github.com:443 HTTP/1.0
User-Agent: Chrome/58.0.3029.110
Accept: text/html,application/xhtml+xml,application/xml

The CONNECT response:

HTTP/1.0 200 Connection Established
Proxy-agent: Netscape-Proxy/1.1

The CONNECT response does not need to specify the Content-Type, unlike a regular HTTP response.

Once the connection is established, data can be sent directly between the client and the server.
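
Python's standard library can set up such a tunnel through a proxy for you; http.client sends a CONNECT request like the one above before switching to TLS. A minimal sketch with a hypothetical proxy address:

import http.client

# Ask a (hypothetical) proxy to open a tunnel to api.github.com:443 with a CONNECT request.
conn = http.client.HTTPSConnection("proxy.example.com", 8080)
conn.set_tunnel("api.github.com", 443)

conn.request("GET", "/", headers={"User-Agent": "tunnel-example"})
print(conn.getresponse().status)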

Relays

Relays are the outlaws of the HTTP world; they don't have to comply with HTTP rules. They are dumbed-down versions of proxies that pass on any information they receive, as long as they can establish a connection using the minimal information in the request messages.

Their sole purpose is to fulfill the need for a proxy implementation with as little hassle as possible. That can potentially lead to trouble, but relays are very convenient to use, and their trade-offs are certainly worth considering.

Spiders (Web Crawlers)


Commonly called spiders, they are bots that crawl the World Wide Web and index its content. Spiders are the primary tools of search engines and many other websites.

Spiders are completely automated pieces of software and require no human interaction to function. Spiders can vary widely in complexity, and some spiders are quite complex pieces of software (like search engines).

Spiders consume resources on the websites they visit. For this reason, public websites have a mechanism for telling crawlers which parts of the site to crawl, or to crawl nothing at all. This is done with robots.txt (the robots exclusion standard).

Of course, since it's just a standard, robots.txt can't prevent uninvited spiders from crawling a website. Some malicious robots include email harvesters, spambots, and malware.

Here are a few examples of robots.txt files:

User-agent: *
Disallow: /

This tells all spiders to stay out.

User-agent: *
Disallow: /somefolder/
Disallow: /notinterestingstuff/
Disallow: /directory/file.html

This only refers to specific directories and a single file.

User-agent: Googlebot
Disallow: /private/

And you can target a specific spider, as with Googlebot (Google's crawler) in this case, keeping it out of the /private/ directory.
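
A well-behaved spider checks robots.txt before fetching a page. Python ships a parser for the robots exclusion standard, so a minimal check could look like this (the URLs are only examples):

from urllib.robotparser import RobotFileParser

robots = RobotFileParser()
robots.set_url("https://example.com/robots.txt")
robots.read()                                   # download and parse the robots.txt file

# Ask whether our user agent may crawl a given URL before requesting it.
print(robots.can_fetch("MySpider", "https://example.com/somefolder/page.html"))
print(robots.can_fetch("MySpider", "https://example.com/index.html"))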

Given the vast nature of the World Wide Web, even the most powerful spiders cannot crawl and index all of it. Therefore, they use a selection policy to crawl the most relevant sections. Furthermore, the WWW changes frequently and dynamically, so spiders must use a freshness policy to decide when to revisit websites. Spiders can also easily overwhelm servers by sending requests too quickly, so a politeness policy is enforced: most well-known spiders poll servers at intervals ranging from 20 seconds to 3-4 minutes to avoid putting load on the server.
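
A crawl loop typically enforces that politeness delay itself. A minimal sketch, assuming a fixed 20-second interval and a placeholder fetch function:

import time

def crawl(urls, fetch, delay_seconds=20):
    """Visit URLs one by one, pausing between requests so the server isn't overwhelmed."""
    for url in urls:
        page = fetch(url)          # fetch() is a placeholder for the actual HTTP request
        print(f"indexed {url} ({len(page)} bytes)")
        time.sleep(delay_seconds)  # politeness delay between requests to the same server

crawl(["https://example.com/", "https://example.com/about"], lambda u: "<html>...</html>")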

You may have heard of the mysterious and sinister-sounding deep web, or dark web. It's nothing more than the part of the web that is intentionally left unindexed by search engines in order to hide information.

Conclusion

You should now have a better picture of how HTTP works, and there's much more to it than just requests, responses, and status codes. HTTP has a whole infrastructure of different hardware and software components that it uses to achieve its potential as an application protocol.

Every concept I've discussed in this article is large enough to cover an entire article or even a book. My goal is to give you a rough overview of the different ideas so you know how they all fit together and what to look for when the need arises.

If you found some of the explanations a bit short and vague and missed my previous articles, be sure to visit part 1 of the series and the HTTP reference where I talk about the basic concepts of HTTP.

Thanks for reading, and keep an eye out for part 3 of the series, where I explain the different ways servers can identify clients.

If you find this article useful, please leave a comment.

Next lesson: HTTP Lessons – Lesson 3 – Client Identity

References:

Original Article: https://www.code-maze.com/http-series-part-2/
