Analyzing logs - the lame way

Naineel Soyantar - Jul 8 - Dev Community

In the previous blog post, we saw how to create a production-ready logging system. In this blog post, we will see how to analyze the logs generated by the system.


🫱🫱🫱🫱Let's build a production-ready logger using Winston🫲🫲🫲🫲

While many advanced tools are available for log analysis, such as the ELK stack and Splunk, we will see how to do it the lame way using JavaScript🥱🥱.

The Log Format

The logs generated by the logger are in the following format:

2024-07-04 10:42:09 error: ::ffff:72.200.26.133 GET /api/v1/resource 201 0.082 ms 0.177 ms ::ffff:72.200.26.133

Understanding The Log Format

The log format includes the following fields:

  • timestamp: The time at which the log was generated
  • level: The log level (info, warn, error, debug)
  • Source IP: The IP address of the client making the request
  • kind of request: The HTTP method of the request (GET, POST, PUT, DELETE)
  • path: The path of the request made by the client
  • status code: The status code of the response sent by the server
  • response time: The time taken by the server to respond to the request
  • total time: The total time taken by the server to process the request
  • Remote IP: The remote IP address recorded at the end of the log line
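
Taking the example line shown above, the pieces map onto those fields in order:

2024-07-04 10:42:09   → timestamp
error                 → level
::ffff:72.200.26.133  → Source IP
GET                   → kind of request
/api/v1/resource      → path
201                   → status code
0.082 ms              → response time
0.177 ms              → total time
::ffff:72.200.26.133  → Remote IP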

The Log Analyzer

Laying The Basis For The Log Analyzer

First, create a regular expression to parse the log format.

const logLevelPattern = /(?<date>\d{4}-\d{2}-\d{2})\s+(?<time>\d{2}:\d{2}:\d{2})\s+(?<level>\w+):\s+(?<client_ip>::ffff:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|::1)\s+(?<method>\w+)\s+(?<path>\/\S*)\s+(?<status>\d{3})\s+(?<response_time_1>\d+\.\d{3})\s+ms\s+(?<response_time_2>\d+\.\d{3})\s+ms\s+(?<other_ip>::ffff:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|::1)/g;


The regular expression seems daunting at first sight, but it's not that hard to understand. It consists of named groups for each field in the log format. The named groups are enclosed in (?<name>pattern).
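
If the one-liner is still hard to scan, here is the same pattern assembled from labelled pieces. This is purely a readability sketch; the resulting regex is equivalent to the one above, so use whichever form you prefer.

// Drop-in equivalent of the one-liner above, built piece by piece.
const ipPattern = String.raw`::ffff:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|::1`; // IPv4-mapped IPv6 address or the IPv6 loopback ::1

const logLevelPattern = new RegExp(
    String.raw`(?<date>\d{4}-\d{2}-\d{2})\s+` +          // 2024-07-04
    String.raw`(?<time>\d{2}:\d{2}:\d{2})\s+` +          // 10:42:09
    String.raw`(?<level>\w+):\s+` +                      // error:
    String.raw`(?<client_ip>${ipPattern})\s+` +          // ::ffff:72.200.26.133
    String.raw`(?<method>\w+)\s+` +                      // GET
    String.raw`(?<path>\/\S*)\s+` +                      // /api/v1/resource
    String.raw`(?<status>\d{3})\s+` +                    // 201
    String.raw`(?<response_time_1>\d+\.\d{3})\s+ms\s+` + // 0.082 ms
    String.raw`(?<response_time_2>\d+\.\d{3})\s+ms\s+` + // 0.177 ms
    String.raw`(?<other_ip>${ipPattern})`,               // ::ffff:72.200.26.133
    'g'
);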

Using named groups makes it easier to extract the fields from the log line. For example, to extract the level field from the log line, we can use the following code:

const logLine = '2024-07-04 10:42:09 error: ::ffff:72.200.26.133 GET /api/v1/resource 201 0.082 ms 0.177 ms ::ffff:72.200.26.133';

// matchAll is a String method (not a RegExp method) and requires the regex to carry the g flag.
const matches = logLine.matchAll(logLevelPattern);

for (const m of matches) {
    // We are interested in the groups property of each match because we used named groups in the regex.
    console.log(m.groups);
    console.log(m.groups.level);
}

The match object will have this kind of output:

{
  date: '2024-07-04',
  time: '10:42:09',
  level: 'error',
  client_ip: '::ffff:72.200.26.133',
  method: 'GET',
  path: '/api/v1/resource',
  status: '201',
  response_time_1: '0.082',
  response_time_2: '0.177',
  other_ip: '::ffff:72.200.26.133'
}

The groups property of the match object will have the named groups extracted from the log line. We can access the level field using m.groups.level.

Analyzing The Logs Using The Log Analyzer

Now that we have the regular expression to parse the log format, let's create a log analyzer that reads the logs from a file and analyzes them.

import { EventEmitter } from 'events';
import fs from 'fs';
import path from 'path';

class LogAnalyzer extends EventEmitter {
    // initialize the log analyzer with the log file stream and the log level pattern and initialize the objects to store the analysis results
    constructor(logFileStream, logLevelPattern) {
        super();
        this.logFileStream = logFileStream;
        this.logLevelPattern = logLevelPattern;
        this.time = {};
        this.paths = {};
        this.ips = {};
        this.responseTime = [];
        this.totalResponseTime = [];
        this.count = 0;
    }

    //method to start the analysis
    analyze() {
        this.logFileStream.on('ready', () => console.log('================START======================='));
        this.logFileStream.on('data', this.processChunk.bind(this));
        this.logFileStream.on('end', this.finishAnalysis.bind(this));
    }

    //method to process each chunk of data read from the log file
    processChunk(chunk) {
        console.log('Processing chunk:', this.count);

        this.logFileStream.pause();

        const output = chunk.toString().matchAll(this.logLevelPattern);

        for (const match of output) {
            //extract the named groups from the match object
            const { groups } = match;
            //update the objects with the extracted fields
            this.updateObjects(groups);
        }
        this.count++;

        this.logFileStream.resume();
    }

    //method to update the objects with the extracted fields
    updateObjects(groups) {
        const hourKey = groups.time.split(':')[0] + ':00';
        this.time[hourKey] = (this.time[hourKey] || 0) + 1;
        this.updateObject(this.paths, groups.path);
        this.updateObject(this.ips, groups.client_ip);
        this.responseTime.push(parseFloat(groups.response_time_1));
        this.totalResponseTime.push(parseFloat(groups.response_time_2));
    }

    //method to update an object with a key
    updateObject(obj, key) {
        obj[key] = (obj[key] || 0) + 1;
    }

    //method to finish the analysis
    finishAnalysis() {
        console.log('================END=========================');
        console.log("Let's perform some analysis on the log file");

        this.emit('analysisComplete', this.getAnalysisResults());
    }

    //method to sort an object based on the count of the keys
    sortingObject(obj, max = true) {
        //sorting is based on the count of the keys to get the most used or least used keys
        if (max) {
            return Object.entries(obj).sort((a, b) => b[1] - a[1]);
        } else {
            return Object.entries(obj).sort((a, b) => a[1] - b[1]);
        }
    }

    //method to get the analysis results
    getAnalysisResults() {
        return {
            timeDistribution: this.sortingObject(this.time, true).slice(0, 4),
            mostUsedPathDistribution: this.sortingObject(this.paths, true).slice(0, 4),
            leastUsedPathDistribution: this.sortingObject(this.paths, false).slice(0, 4),
            ipDistribution: this.sortingObject(this.ips, true).slice(0, 4),
            avgResponseTime: this.average(this.responseTime).toFixed(5),
            avgTotalResponseTime: this.average(this.totalResponseTime).toFixed(5),
        };
    }

    average(arr) {
        return arr.reduce((a, b) => a + b, 0) / arr.length;
    }
}

// Usage:
const logLevelPattern = /(?<date>\d{4}-\d{2}-\d{2})\s+(?<time>\d{2}:\d{2}:\d{2})\s+(?<level>\w+):\s+(?<client_ip>::ffff:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|::1)\s+(?<method>\w+)\s+(?<path>\/\S*)\s+(?<status>\d{3})\s+(?<response_time_1>\d+\.\d{3})\s+ms\s+(?<response_time_2>\d+\.\d{3})\s+ms\s+(?<other_ip>::ffff:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|::1)/g;

const logFileStream = fs.createReadStream(path.resolve('./logs/combined/generated_logs.log'), {
    encoding: 'utf-8',
    highWaterMark: 10 * 1024, // 10KB
});

const analyzer = new LogAnalyzer(logFileStream, logLevelPattern);

analyzer.on('analysisComplete', (results) => {
    console.log('Analysis complete. Results:', results);
});

analyzer.analyze();

Workflow:

  • The logFileStream is created using fs.createReadStream to read the log file in chunks (10KB at a time, per the highWaterMark option).
  • A LogAnalyzer instance is created with the logFileStream and logLevelPattern as arguments.
  • The analyze method is called on the instance to start the analysis.
  • The processChunk method runs for each chunk read from the log file. It applies logLevelPattern to the chunk and extracts the named groups from every log line it matches.
  • For each match, updateObjects updates the counters (hourly time buckets, paths, IPs) and records the two response times.
  • When the stream ends, finishAnalysis runs and emits an analysisComplete event with the analysis results.
  • The sortingObject method sorts an object's entries by count, so we can pick out the most or least used keys.
  • The getAnalysisResults method assembles the final results, using average to compute the mean response times.
  • The analysisComplete listener logs the results to the console.

The LogAnalyzer class reads the log file in chunks and processes each chunk to extract the fields from the log line. It updates the objects with the extracted fields and calculates the average response time and total response time. Finally, it emits an event analysisComplete with the analysis results.

The analysis results include the time distribution, most used path distribution, least used path distribution, IP distribution, average response time, and average total response time.

Output:

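The analysisComplete listener prints something along these lines. The structure comes straight from getAnalysisResults; the counts and averages below are made up for illustration, so expect different numbers on your own logs:

{
  timeDistribution: [ [ '10:00', 1000 ] ],
  mostUsedPathDistribution: [ [ '/login', 262 ], [ '/', 255 ], [ '/register', 247 ], [ '/api/v1/resource', 236 ] ],
  leastUsedPathDistribution: [ [ '/api/v1/resource', 236 ], [ '/register', 247 ], [ '/', 255 ], [ '/login', 262 ] ],
  ipDistribution: [ [ '::ffff:72.200.26.133', 118 ], [ '::ffff:45.94.188.156', 110 ], [ '::ffff:91.113.1.90', 104 ], [ '::ffff:83.65.134.149', 99 ] ],
  avgResponseTime: '0.07496',
  avgTotalResponseTime: '0.19941'
}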

This way we can analyze the logs generated by the logger using the log analyzer in an absolutely lame way😂😂

You can find the complete code for the log analyzer here.

Generating random logs

Let's address the elephant in the room: not everyone has a log file from a real production system to analyze. So, let's generate some random logs using the faker library.

fakerjs documentation

import fs from 'fs';
import { faker } from '@faker-js/faker';

// Sample data
const methods = ["GET", "POST", "PUT", "DELETE"];
const paths = ["/", "/api/v1/resource", "/login", "/register"];
const ips = ['45.94.188.156', '238.249.31.148', '91.113.1.90', '113.232.207.105', '96.129.247.250', '105.171.179.234', '144.42.125.14', '109.111.74.178', '72.200.26.133', '83.65.134.149'];
const statusCodes = [200, 201, 400, 404, 500];
const levels = ["info", "error", "warn", "debug"];

// Function to generate a random timestamp
const randomTimestamp = (start, end) => {
    const startDate = start.getTime();
    const endDate = end.getTime();
    return new Date(startDate + Math.random() * (endDate - startDate));
};

// Generate log entries
const logEntries = [];
const startDate = new Date(2024, 6, 4, 10, 41, 0); // Months are 0-based in JavaScript
const endDate = new Date(2024, 6, 4, 10, 45, 0);

for (let i = 0; i < 1000; i++) {  // Generate 1000 log entries

    //Generating random log entries
    const timestamp = randomTimestamp(startDate, endDate);
    const dateStr = timestamp.toISOString().replace('T', ' ').substring(0, 19); // note: toISOString() is UTC, so the printed time may be offset from the local 10:41-10:45 window
    const level = faker.helpers.arrayElement(levels);
    const clientIp = faker.helpers.arrayElement(ips);
    const method = faker.helpers.arrayElement(methods);
    const path = faker.helpers.arrayElement(paths);
    const status = faker.helpers.arrayElement(statusCodes);
    const responseTime1 = (Math.random() * (0.1 - 0.05) + 0.05).toFixed(3);
    const responseTime2 = (Math.random() * (0.3 - 0.1) + 0.1).toFixed(3);
    const otherIp = faker.helpers.arrayElement(ips);

    logEntries.push(`${dateStr} ${level}: ::ffff:${clientIp} ${method} ${path} ${status} ${responseTime1} ms ${responseTime2} ms ::ffff:${otherIp}`);
}

// Write to log file (the ./logs/combined directory must already exist)
fs.writeFileSync('./logs/combined/generated_logs.log', logEntries.join('\n'));

console.log("Log file generated successfully.");


This script generates 1000 random log entries and writes them to a log file generated_logs.log. The log entries are generated with random timestamps, log levels, client IPs, methods, paths, status codes, response times, and other IPs.
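
A generated entry looks something like this (the values are drawn from the sample arrays above, so your exact timestamps and timings will differ), which is exactly the shape the logLevelPattern regex expects:

2024-07-04 10:42:33 info: ::ffff:45.94.188.156 POST /login 200 0.063 ms 0.214 ms ::ffff:91.113.1.90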

You can find the complete code for generating random logs here.

Conclusion

In this blog post, we saw how to analyze the logs generated by the logger using the log analyzer. We created a log analyzer that reads the logs from a file, parses the log format, and analyzes the logs. We also generated random logs using the faker library to test the log analyzer.

While this log analyzer is a simple way to analyze logs, it is not suitable for large log files or real-time log analysis; among other things, it matches the regex chunk by chunk, so a log line split across a 10KB chunk boundary can be missed entirely. For large log files or real-time analysis, use dedicated tools like the ELK stack or Splunk. But for small log files or testing purposes, the log analyzer is a good starting point.
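
If you want to keep the lame approach but sidestep the chunk-boundary problem, one option is to read the file line by line with Node's built-in readline module instead of raw chunks. Here is a minimal sketch, assuming the same logLevelPattern and log file path used above:

import fs from 'fs';
import readline from 'readline';

// Read the log file one line at a time so no entry is ever split across a chunk boundary.
const rl = readline.createInterface({
    input: fs.createReadStream('./logs/combined/generated_logs.log', { encoding: 'utf-8' }),
    crlfDelay: Infinity,
});

const levelCounts = {};

rl.on('line', (line) => {
    // Each line holds at most one log entry, so matchAll yields zero or one match here.
    for (const match of line.matchAll(logLevelPattern)) {
        const { level } = match.groups;
        levelCounts[level] = (levelCounts[level] || 0) + 1;
    }
});

rl.on('close', () => console.log('Log level counts:', levelCounts));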

So... Thanks for reading this blog post. I hope you enjoyed it. If you have any questions or feedback, feel free to leave a comment below. And I'll...

brb

PS:

I typically write these articles in TIL form, sharing the things I learn during my daily work or afterward, and I aim to post once or twice a week.
