How a Browser Works: A Beginner-Friendly Guide to Browser Internals

Have you ever typed a URL in your browser and wondered:
“What actually happens after I press Enter?”
Most people think a browser just opens websites. In truth, browsers are complex systems that transform code into pixels on your screen, often in a fraction of a second. Let’s take a step-by-step journey to understand how this happens.
What Is a Browser?
A browser is an application that fetches websites and displays them on the user’s screen. Internally, it:
Fetches resources like HTML, CSS, and JavaScript documents from a server
Understands the structure of the page
Applies styles and layouts
Executes scripts
Renders pixels on your screen
Now let’s look at what is inside the browser and how it does all of this.
Main Parts of a Browser
A browser is a collection of components working together. Each component does its own part, and the combined result is what we see on the screen. Let’s take a look at each component:
User Interface (UI) – address bar, tabs, buttons, bookmarks. This is what we see.
Browser Engine – the conductor, coordinating other parts. Acts like a bridge between UI and Rendering Engine.
Rendering Engine – the artist, painting the web page. Examples of rendering engines: Blink (Chrome, Edge), Gecko (Firefox), WebKit (Safari)
Networking – fetches resources from the server. Uses protocols like HTTP, HTTPS, DNS.
JavaScript Engine – executes scripts and manipulates the page. Examples of JS engines: V8 (Chrome, Edge), SpiderMonkey (Firefox), JavaScriptCore (Safari)
UI Backend – interacts with your operating system to draw windows and widgets.
Data Storage – stores cookies, cache, localStorage, IndexedDB.

How a Browser Works, Step by Step
The networking component sends an HTTP request to the server, and the server sends a response back to the browser.
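To make this exchange a little more concrete, here is a minimal sketch in Python of what a simple HTTP/1.1 GET request looks like as text, and how a response splits into a status code, headers, and a body. The host example.com, the helper names build_get_request and parse_response, and the canned response are all assumptions made for illustration. Nothing is sent over the network, and a real browser also deals with DNS, TLS, redirects, caching, and much more.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build the raw bytes of a very simple HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # the empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes):
    """Split a raw HTTP response into (status code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(status_line.split(" ")[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


# A canned response, standing in for what a server might send back.
canned = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"\r\n"
    b"<html><body>Hello</body></html>"
)

request = build_get_request("example.com")
status, headers, body = parse_response(canned)
print(request.decode("ascii"))
print(status, headers["content-type"])
```

The body bytes at the end of the response are what the parsing steps described next work on.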
Once the browser receives the first chunk of data, it begins parsing. Parsing is done by the rendering engine.
Parsing is the step the browser takes to turn the data it receives over the network into the DOM and CSSOM, which are used by the rendering engine to paint a page to the screen.
DOM Tree
The browser first reads the raw HTML document from the network and converts each byte to a character based on the file’s encoding (e.g., UTF-8).
Then the HTML parser (part of the Rendering Engine) converts these characters into distinct tokens. Each token represents a part of the HTML structure, such as <html>, <body>, or a text string.
Each token is then converted into an object, or node: an HTML tag becomes an "element node", text becomes a "text node", and a comment becomes a "comment node".
Finally, the parser links these nodes together in a hierarchical tree structure, which creates the DOM tree.
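The bytes → characters → tokens → nodes → tree pipeline can be sketched with Python’s standard html.parser module. This is only an illustration, not how a real browser engine is implemented. The Node and TreeBuilder classes and the dump helper are hypothetical names, and the sketch assumes well-formed HTML where every opening tag has a matching closing tag (an unclosed tag such as <br> would confuse it).

```python
from html.parser import HTMLParser


class Node:
    """A tiny stand-in for a DOM node."""

    def __init__(self, kind, value):
        self.kind = kind      # "document", "element", "text", or "comment"
        self.value = value    # tag name, text content, or comment text
        self.children = []


class TreeBuilder(HTMLParser):
    """Turn the tokens reported by HTMLParser into a tree of Node objects."""

    def __init__(self):
        super().__init__()
        self.root = Node("document", "#document")
        self.open_elements = [self.root]  # stack of currently open elements

    def handle_starttag(self, tag, attrs):
        node = Node("element", tag)
        self.open_elements[-1].children.append(node)
        self.open_elements.append(node)

    def handle_endtag(self, tag):
        if len(self.open_elements) > 1:
            self.open_elements.pop()

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only text for readability
            self.open_elements[-1].children.append(Node("text", data.strip()))

    def handle_comment(self, data):
        self.open_elements[-1].children.append(Node("comment", data.strip()))


def dump(node, depth=0):
    """Return the tree as indented lines, one node per line."""
    lines = ["  " * depth + f"{node.kind}: {node.value}"]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


raw_bytes = b"<html><body><!-- greeting --><p>Hello</p></body></html>"
characters = raw_bytes.decode("utf-8")  # bytes -> characters

builder = TreeBuilder()
builder.feed(characters)                # characters -> tokens -> nodes
builder.close()
tree_lines = dump(builder.root)
print("\n".join(tree_lines))
```

The stack of open elements is what links each new node to its parent, which is how a flat stream of tokens becomes a tree.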

CSSOM
The browser first fetches the CSS files from the network (or reads <style> blocks in the HTML).
The CSS parser (part of the Rendering Engine) reads the raw CSS text character by character.
It tokenizes the CSS: selectors, properties, and values.
Each rule is converted into an object called a CSS rule.
These objects are then linked together into a hierarchical tree called the CSS Object Model (CSSOM).
- For example, rules inside a media query are nested under that query node.
The CSSOM represents all style information the browser will apply to the DOM nodes.
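As a rough illustration of these steps, the toy parser below tokenizes a small stylesheet and turns it into rule objects, nesting the rules of a media query under that query. It is a deliberately simplified sketch: the StyleRule and MediaRule classes and the function names are made up for this example, and it ignores comments, strings, other at-rules, and almost everything else a real CSS parser must handle.

```python
import re
from dataclasses import dataclass, field


@dataclass
class StyleRule:
    selector: str
    declarations: dict = field(default_factory=dict)


@dataclass
class MediaRule:
    condition: str
    rules: list = field(default_factory=list)


def tokenize(css):
    """Split CSS text into '{', '}', and the text chunks between them."""
    parts = re.split(r"([{}])", css)
    return [part.strip() for part in parts if part.strip()]


def parse_declarations(text):
    """Turn 'margin: 0; color: black;' into a property -> value dict."""
    declarations = {}
    for item in text.split(";"):
        if ":" in item:
            prop, _, value = item.partition(":")
            declarations[prop.strip()] = value.strip()
    return declarations


def parse_rules(tokens, pos=0):
    """Parse tokens into a list of rules; returns (rules, next position)."""
    rules = []
    while pos < len(tokens) and tokens[pos] != "}":
        prelude = tokens[pos]          # a selector or "@media ..."
        pos += 2                       # skip the prelude and the "{"
        if prelude.startswith("@media"):
            nested, pos = parse_rules(tokens, pos)
            rules.append(MediaRule(prelude[len("@media"):].strip(), nested))
        else:
            declarations = {}
            if tokens[pos] != "}":     # the block may be empty
                declarations = parse_declarations(tokens[pos])
                pos += 1
            rules.append(StyleRule(prelude, declarations))
        pos += 1                       # skip the closing "}"
    return rules, pos


css = """
body { margin: 0; color: black; }
@media (max-width: 600px) {
  p { font-size: 14px; }
}
"""
cssom, _ = parse_rules(tokenize(css))
for rule in cssom:
    print(rule)
```

Here the p rule ends up inside the MediaRule object rather than at the top level, which is the kind of nesting the CSSOM tree captures.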
Render Tree
The Rendering Engine combines the DOM tree (structure) and the CSSOM tree (styles).
This combination creates the Render Tree.
The Render Tree contains only visible elements.
- Elements with display: none are excluded.
Each node in the Render Tree knows:
What element it represents
What styles apply to it
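A small Python sketch can show the idea of combining structure and styles while leaving out hidden elements. The dictionaries below are stand-ins for the DOM and CSSOM, the function name build_render_tree is hypothetical, and styles are matched by tag name only, which is far simpler than real selector matching.

```python
def build_render_tree(dom_node, styles):
    """Return a render node for dom_node, or None if it is not rendered."""
    computed = styles.get(dom_node["tag"], {})
    if computed.get("display") == "none":
        return None  # the element and its whole subtree are skipped
    children = []
    for child in dom_node.get("children", []):
        render_child = build_render_tree(child, styles)
        if render_child is not None:
            children.append(render_child)
    # Each render node knows its element and the styles that apply to it.
    return {"tag": dom_node["tag"], "style": computed, "children": children}


dom = {"tag": "body", "children": [
    {"tag": "h1", "children": []},
    {"tag": "aside", "children": [{"tag": "p", "children": []}]},
    {"tag": "p", "children": []},
]}
styles = {"h1": {"color": "blue"}, "aside": {"display": "none"}}

render_tree = build_render_tree(dom, styles)
rendered_tags = [child["tag"] for child in render_tree["children"]]
print(rendered_tags)
```

The aside element is still in the DOM, but neither it nor the paragraph inside it appears in the render tree.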
JavaScript Execution
While parsing HTML, if the browser encounters JavaScript:
- The JavaScript Engine executes the code.
JavaScript can:
Modify the DOM (add/remove elements)
Modify styles (which updates the CSSOM)
If DOM or CSSOM changes:
The Render Tree may need to be rebuilt
Layout and painting may happen again
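One way to picture this is a "dirty" flag: a change marks the page as out of date, and the expensive work is redone only when needed. The Page class below is a hypothetical toy written in Python for illustration. It is not how any particular browser engine tracks changes, and the render step is only a counter.

```python
class Page:
    """Toy page that tracks whether its rendered output is out of date."""

    def __init__(self):
        self.elements = []
        self.dirty = True          # nothing has been rendered yet
        self.render_count = 0

    def append_element(self, tag):
        self.elements.append(tag)  # a DOM change...
        self.dirty = True          # ...invalidates the previous render

    def render_if_needed(self):
        if self.dirty:
            # Stand-in for: rebuild the render tree, run layout, repaint.
            self.render_count += 1
            self.dirty = False


page = Page()
page.render_if_needed()            # initial render
page.render_if_needed()            # nothing changed, so no work is done
page.append_element("p")           # a script modifies the DOM
page.append_element("img")         # a second change before the next render
page.render_if_needed()            # both changes are handled in one pass
print(page.render_count)
```

In this sketch, two changes made back to back cost only one extra render, which hints at why grouping DOM updates together can be cheaper than spreading them out.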
Frame Construction
After the Render Tree is ready, the Rendering Engine performs Frame Construction.
Each render tree node is converted into a frame (box).
Frames store:
Size
Position
Style information
Think of frames as boxes that will be placed on the page.
Layout (Reflow)
Now the browser calculates:
Exact width and height of each frame
Exact position on the screen
This step is called Layout or Reflow.
Any change in size (font-size, width, viewport resize) can trigger reflow.
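The sketch below shows the flavor of this calculation in Python for the simplest possible case: boxes stacked vertically, each as wide as the viewport minus a margin. The Frame class, the layout function, the fixed heights, and the margin value are all assumptions for illustration. Real layout has to handle inline text, floats, flexbox, grid, and much more.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    name: str
    height: int          # assumed to be known already (e.g. from content)
    x: int = 0
    y: int = 0
    width: int = 0


def layout(frames, viewport_width, margin=10):
    """Stack frames vertically, each as wide as the viewport minus margins."""
    cursor_y = margin
    for frame in frames:
        frame.x = margin
        frame.y = cursor_y
        frame.width = viewport_width - 2 * margin
        cursor_y += frame.height + margin
    return frames


frames = [Frame("header", 50), Frame("article", 200), Frame("footer", 30)]

layout(frames, viewport_width=800)
positions_wide = [(f.name, f.x, f.y, f.width) for f in frames]

layout(frames, viewport_width=400)   # a viewport resize triggers reflow
positions_narrow = [(f.name, f.x, f.y, f.width) for f in frames]

print(positions_wide)
print(positions_narrow)
```

Resizing the viewport changes every width here, so the whole calculation runs again. That is the reflow described above.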
Painting
After layout, the browser starts painting.
Painting means:
Filling pixels with colors
Drawing text, borders, shadows, images
Each visual part of a frame is painted separately.
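To get a feel for painting, here is a toy Python "painter" that fills a small grid of characters instead of real pixels. The paint function and the tuple format for frames are invented for this example, and each frame is filled with a single character rather than text, borders, or shadows.

```python
def paint(frames, width, height):
    """Fill a character grid; later frames are painted over earlier ones."""
    canvas = [["." for _ in range(width)] for _ in range(height)]
    for x, y, w, h, fill in frames:
        for row in range(y, min(y + h, height)):
            for col in range(x, min(x + w, width)):
                canvas[row][col] = fill
    return ["".join(row) for row in canvas]


# (x, y, width, height, fill character), assumed to come from layout.
frames = [
    (0, 0, 8, 1, "H"),   # a header bar
    (1, 2, 4, 2, "A"),   # a content box
    (3, 3, 4, 1, "B"),   # a box that overlaps the content box
]
rows = paint(frames, width=8, height=4)
print("\n".join(rows))
```

Because frames are painted in order, box B covers part of box A where they overlap, so paint order matters.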
Final Render
After painting, the painted parts are combined in the correct order (a step called compositing), and the final image is sent to the screen.
You now see the web page.

Very Basic Idea of Parsing
Imagine you see this written:
2 + 3 × 4
Step 1: Read the raw characters
The computer first sees characters, not math:
'2' '+' '3' '×' '4'
This is just text.
Step 2: Tokenization (breaking into meaningful pieces)
Now the parser groups characters into tokens:
2 → Number token
+ → Operator token
3 → Number token
× → Operator token
4 → Number token
So the parser understands: “These are numbers and operators”
Step 3: Build a structure (tree)
Math has rules (operator precedence).
So the parser builds a tree structure like this:
    +
   / \
  2   ×
     / \
    3   4
This tree means:
First calculate 3 × 4
Then add 2
Step 4: Use the structure
Now the computer can:
Evaluate it → 2 + (3 × 4) = 14
Or transform it
Or optimize it
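The four steps can be written as a short Python program. This is a minimal sketch that only understands whole numbers, + and ×. The function names and the tuple-based tree are choices made for this example, and characters the tokenizer does not recognize are silently skipped.

```python
import re


def tokenize(text):
    """Group characters into (kind, value) tokens, skipping spaces."""
    tokens = []
    for match in re.finditer(r"\d+|[+×]", text):
        value = match.group()
        kind = "number" if value.isdigit() else "operator"
        tokens.append((kind, value))
    return tokens


def parse(tokens):
    """Build a tree in which × binds more tightly than +."""
    pos = 0

    def parse_term():
        # A term is a number, optionally multiplied by more numbers.
        nonlocal pos
        node = int(tokens[pos][1])
        pos += 1
        while pos < len(tokens) and tokens[pos][1] == "×":
            pos += 1
            right = int(tokens[pos][1])
            pos += 1
            node = ("×", node, right)
        return node

    # An expression is one or more terms joined by +.
    node = parse_term()
    while pos < len(tokens) and tokens[pos][1] == "+":
        pos += 1
        node = ("+", node, parse_term())
    return node


def evaluate(node):
    """Walk the tree from the leaves up to compute the result."""
    if isinstance(node, int):
        return node
    op, left, right = node
    if op == "+":
        return evaluate(left) + evaluate(right)
    return evaluate(left) * evaluate(right)


tokens = tokenize("2 + 3 × 4")
tree = parse(tokens)
result = evaluate(tree)
print(tokens)
print(tree)    # ('+', 2, ('×', 3, 4))
print(result)  # 14
```

The tree places 3 × 4 below the +, so the multiplication is evaluated first, just like in the drawing above. An HTML parser follows the same idea, only with tags and text instead of numbers and operators.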
Summary
When you type a URL and press Enter, the browser fetches data from the server using the networking layer. The rendering engine then parses HTML to build the DOM and parses CSS to build the CSSOM. These two trees are combined to create the render tree, which is used to calculate layout, paint pixels, and display the page on the screen. JavaScript can modify the DOM and CSSOM during this process. Overall, a browser works as a collection of components that cooperate to turn raw web data into a visible, interactive webpage.
Thank you for reading & Happy Coding 👨💻
References
MDN Web Docs – How Browsers Work
https://developer.mozilla.org/en-US/docs/Web/Performance/How_browsers_work




