The Hard Parts of Servers and Node.js - Notebook


Hi folks, these are my public notes from an awesome workshop deep dive into the internals of Node.js with Will Sentance.

Hope you will find something helpful here.

JavaScript, Node, and the computer

We need to understand JavaScript better in order to understand Node.js.

It’s a language that does 3 things, and 1 involves a lot of help from C++

  1. Saves data and functionality (code)

  2. Uses that data by running functionality (code) on it.

  3. Has a ton of built-in labels that trigger Node features that are built in C++ to use our computer’s internals

Let’s see JavaScript’s other talent - built-in labels that trigger Node features.

With a JavaScript label, we can set up a Node.js feature (and the computer’s internals) to wait for requests for html/css/js from our users.

But how?

The most powerful built-in Node feature of all is http - and its associated built-in label in JS is, conveniently, also http.

  • Using the http feature of Node to set up an open socket

const http = require('http'); // the JS label for Node's http feature

const server = http.createServer();
server.listen(80);

Inbound web request → run code to send back message

If inbound message → send back data

But at what moment?

Using Node APIs

Node auto-runs the code (function) for us when a request arrives from a user

const http = require('http');

function doOnIncoming(incomingData, functionsToSetOutGoingData) {
    functionsToSetOutGoingData.end("welcome to our server!");
}

const server = http.createServer(doOnIncoming);
server.listen(80);
  1. We don’t know when the inbound request will come - we have to rely on Node to trigger JS code to run

  2. People often end up using req and res for the parameters

  3. JavaScript is single-threaded & synchronous. All slow work (ex: speaking to the DB) is done by Node in the background

2 parts of calling a function - executing its code and inserting input (arguments)

  • Node will not only run our function at the right moment, it will also automatically insert the relevant data as arguments (input)

  • Sometimes it will even insert, as an argument, an object full of functions that give us direct access to the message Node is sending back to the user, and that let us add data to that message

Messages are sent in HTTP format - The “protocol” for browser-server interaction

  • HTTP Message: Request line (url, method), Headers (metadata about the request), Body (optional)

Our return message is also in HTTP format

We can use the body to send the data and the headers to send important metadata.

In the headers we can include info on the format of the data being sent back - for example, that it’s HTML, so the browser loads it as a webpage.

Events and Error Handling

In server-side development, do we get errors?

It’s understandable, because we’re interacting with other people’s computers over the internet.

There are a lot of issues that could arise.

How can we handle this?

We need to understand our background Node http server feature better.

Node will automatically emit the appropriate event depending on what it gets back from the computer internals (an http message, or an error)

const http = require('http');

function doOnIncoming(incomingData, functionsToSetOutGoingData) {
    functionsToSetOutGoingData.end("welcome to our server!");
}

function doOnError(errorInfo) {
    console.error(errorInfo);
}

const server = http.createServer();
server.listen(80);

server.on('request', doOnIncoming);
server.on('clientError', doOnError);

Reading from the File System with fs

Importing tweets with fs

const fs = require('fs');

function cleanTweets(tweetsToClean) {
    // code that removes bad tweets (placeholder: pass the data through)
    return tweetsToClean;
}

function useImportedTweets(errorData, data) {
    const cleanedTweetsJson = cleanTweets(data);
    const tweetsObj = JSON.parse(cleanedTweetsJson);
    console.log(tweetsObj.tweet2);
}

fs.readFile('./tweets.json', useImportedTweets);
  • Every file has a path (like a local URL on our own machine)

  • JSON is a JS-ready data format

Here we want to use the JavaScript labels for Node features written in C++ - features that do have access to our file system, at the operating-system level of the computer's internals.

fs.readFile('./tweets.json', useImportedTweets);

The auto-run function useImportedTweets will be executed when tweets.json has finished being read.

Streams in Node

  • A stream - in Node, or in computer science in general - is data handled in chunks

What if Node used the event (message-broadcasting) pattern to send out a message (event) each time a sufficient batch of the JSON data had been loaded in?

And at each point, take that data and start cleaning it - in batches

const fs = require('fs');

let cleanedTweets = "";

function cleanTweets(tweetsToClean) {
    // algorithm to remove bad tweets from `tweetsToClean`
    return tweetsToClean; // placeholder: pass the batch through
}

function doOnNewBatch(data) {
    cleanedTweets += cleanTweets(data);
}

const accessTweetsArchive = fs.createReadStream('./tweetsArchive.json');

accessTweetsArchive.on('data', doOnNewBatch);

We can break down any inbound flow of data into chunks.

On each chunk, we run a function to process that batch.

The call stack, event loop and callback queue in Node

  • All the work we rely on Node to auto-run at some later point - handing it back into JavaScript - only gets to run AFTER all the regular (synchronous) JS code has finished.
const fs = require('fs');

function useImportedTweets(errorData, data) {
    const tweetsObj = JSON.parse(data);
    console.log(tweetsObj.tweet1);
}

function immediately() { console.log("Run me last"); }

function printHello() { console.log("Hello"); }

function blockFor500ms() {
    // Block the JS thread DIRECTLY for 500ms
    // e.g. with a long-running for loop
}

setTimeout(printHello, 0);

fs.readFile("./tweet.json", useImportedTweets);

blockFor500ms();

console.log("ME FIRST");

setImmediate(immediately);

First

setTimeout(printHello, 0);

At 0ms this timeout is finished, but the printHello function will not be pushed onto the call stack and executed immediately.

Instead, it will be registered on a Timer Queue to run later.

Next,

fs.readFile("./tweet.json", useImportedTweets);

With the help of Node's C++ features and libuv, we can access the file system at the operating-system level.

But readFile runs non-blocking in the background, so it will not resolve immediately.

Let’s move on

blockFor500ms();

Here we have a regular JavaScript function that runs and takes a long time to finish - it blocks the JS main thread of execution for literally 500ms.

When this function is executed, its execution context is pushed on top of the call stack.

During this time, our fs.readFile finishes - we get the data back from the file system.

But at this moment, the auto-run callback function attached to readFile is NOT executed immediately either; it is registered and waits in the I/O Queue.

Maybe 95% of the functions you’ll have auto-run in Node end up in this queue - for instance, anything involving data from the file system or a network socket.

AFTER 500ms,

Our blockFor500ms finishes, and its execution context is popped off the call stack.

NOW, we move to the next line

console.log("ME FIRST");

We have a log of ME FIRST

LAST LINE

setImmediate(immediately);

By the name itself, maybe you thought this line would be executed IMMEDIATELY?

NOPE!

  • This function is the ABSOLUTE OPPOSITE of running immediately

  • It’s the WORST-named function in all of history

  • It is certainly NOT going to run immediately - its queue is the LAST to be checked

The immediately function will be registered in the Check Queue

Now, we have 0 things left in the global code, and 0 things on the call stack.

HERE is where the Event Loop comes into play!

FIRST, we check the Timer Queue. Push printHello on top of the call stack and execute it, logging Hello to the console.

SECOND, check the I/O Callback Queue. useImportedTweets is executed, with its arguments auto-inserted by Node - specifically error = null and the file data - and logs the content of tweet1 to the console.

LAST, check the Check Queue, put immediately on top of the call stack, and execute it.

Log Run me last

Rules for the automatic execution of the JS code by Node

  • Hold each deferred function in ONE of the task queues when its Node background API “completes”

  • Add the function to the Call Stack (i.e. execute the function) ONLY when the call stack is totally EMPTY (the event loop checks this condition)

  • Prioritize tasks in the Microtask Queue OVER the Timer Queue, over the I/O Queue, over the Check (setImmediate) Queue, and over the Close Queue

  • Any close event, with its associated functions, goes into the Close Queue

In the Microtask Queue, we have 2 smaller ones:

a) for process.nextTick (which we don’t really use anymore)

b) for any function deferred using Promises - they end up in this one

In between checking each of the other queues, the event loop will always go back and check the Microtask Queue before it moves on to the next queue.