My Experiences with Data Scripting

Data Science Jul 01, 2019

Scripting is something every developer has to do at some point to get work done without manual labour. One of the good things about being a lone developer on your side projects is that it teaches you to do all the field work: fetching the data, uploading it, and working on it. So while I was finding my way through one such side project of my own, I learnt just that: scripting.

The Problem

I had built a backend for a website, and had to upload some data into the database. The database had an array for a quiz, so each record in it would be in the form of:

{
    "question": "What does the fox say?",
    "option1": "Ding Ding Ding",
    "option2": "Meow Meow",
    "option3": "Bow Bow",
    "option4": "Whatever",
    "correct": 1
}

I was given 50 questions, listed one after another in a Rich Text Format file, and I had to make sure each and every question ended up in the database in the format shown above. I had 2 options at this point: copy-paste each question into the database manually, filling out the question and option fields, which would take 50*5 = 250 copy-pastes, or find a magical way to do all of this through... scripting!

Before we move ahead with the story, it's important to note that I wasn't a regular script writing magician before this. So if you think I knew exactly what I was going to do at this time, you couldn't be more wrong.

The Process

At first, I had to visualize my solution. I would make a JavaScript file, and run it on Node. What it should do is:

  1. Pick up all the questions in the file
  2. Segregate them into different questions
  3. Build objects in the above given format with their values being the texts in the text file.
  4. Push them to the database.  

Okay, so first question: how do I handle files in JS? Well, Node ships with a built-in module for exactly this: fs. It is part of Node itself, so there is nothing to install from npm. I headed to the documentation of this module, and the reading and writing was just a code snippet away. Step 1 completed!
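As a minimal sketch of step 1, assuming the questions have been saved as plain text, reading a whole file into one string could look like the following. The helper name `readQuestionsFile` and the throwaway demo file are placeholders for illustration, not part of the original project.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: read an entire text file into a single string.
function readQuestionsFile(filePath) {
    // Passing an encoding makes readFileSync return a string instead of a Buffer.
    return fs.readFileSync(filePath, 'utf-8');
}

// Demo with a throwaway file in the temp directory so the sketch runs on its own.
const demoPath = path.join(os.tmpdir(), 'questions-demo.txt');
fs.writeFileSync(demoPath, 'Q1\nA\nB\n\nQ2\nC\nD\n');

const text = readQuestionsFile(demoPath);
console.log(typeof text, text.length);
```

The synchronous call keeps the sketch short; for a one-off script either `readFileSync` or the callback-based `readFile` does the job.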

Now how do I segregate the entire text picked up from the file into different question sets, and each question set into different key-value pairs? To be honest, this step really depends on how you analyze the data pattern and how lucky you are with the data format provided.

Organizing the Data

I logged the data onto the screen, and I saw a pattern which was obvious in the text file anyway. Each question was separated from the next by a blank line, and the question and each of its options were printed on their own lines. (Unfortunately, I cannot share a preview of the data here for security reasons.) When the entire text was read by JavaScript as one string, the new lines were read as '\n'. That means the questions were separated from each other by 2 '\n's, while each question and each of its options were separated by a single '\n'.
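One way to make those separators visible while logging is to print the string through JSON.stringify, which shows escape sequences literally. The sample text below is made up, since the real data is not shown here, and the '\r\n' handling is an assumption that only matters if the file was saved with Windows line endings.

```javascript
// Made-up sample text; the real quiz data is not shown in this post.
const raw = '1) First question?\r\na) One\r\nb) Two\r\n\r\n2) Second question?\r\na) Three\r\nb) Four';

// Files saved on Windows use '\r\n'; normalising to '\n' keeps the later splits simple.
const normalised = raw.replace(/\r\n/g, '\n');

// JSON.stringify prints escape sequences literally, so the separators become visible.
console.log(JSON.stringify(normalised));

// Count the blank-line separators between questions.
const separators = normalised.split('\n\n').length - 1;
console.log(separators);
```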

So my first step was to build an array of different questions by splitting the string everywhere 2 '\n's are encountered.    

let questionsList = myString.split('\n\n')

There, I have an array of questions with me. Now, for each question, I have to separate the key-value pairs. Using the same split function, this time on a single '\n', I can get a 5-element array (the question plus its four options), which I can build an object with. I can apply the map function to the questionsList array for it to do this to each and every element.
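A small sketch of the split-then-map idea, run on made-up sample text in the pattern described above. The helper name `toQuestionObject` is hypothetical, and `correct` is set to a placeholder value because this sample carries no answer key.

```javascript
// Made-up sample: a blank line between questions, one line per question or option.
const myString = [
    'What does the fox say?\nDing Ding Ding\nMeow Meow\nBow Bow\nWhatever',
    'Which module reads files in Node?\nfs\nhttp\npath\nos'
].join('\n\n');

// Hypothetical helper: turn one question block into the record shape used by the quiz.
function toQuestionObject(block) {
    // Line 0 is the question, lines 1 to 4 are the options.
    const [question, option1, option2, option3, option4] = block.split('\n');
    // Placeholder answer index; the sample text has no answer key.
    return { question, option1, option2, option3, option4, correct: 1 };
}

// Split into question blocks, then convert each block into an object.
const questionsList = myString.split('\n\n').map(toQuestionObject);
console.log(questionsList);
```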

So the questionsList is now not an array of Strings of questions, but an array of Objects of questions, in the exact same format which I wanted.

That completes the 2nd and 3rd Step.

Pushing Content to the Database

This is the easiest part of the job. Even if your database platform doesn't allow you to upload multiple records at once, you can do it my way.

  1. JSONify your data: You have an array of JS objects, but what you send to your server has to be JSON text. It's not that big a deal anyway: just call JSON.stringify(yourObject) and you're done. Moving on.
  2. Store this content in a new file using the fs module again.
  3. Make a POST request: You have a server file that's connected to the hosted database, and that's all you need. Build a temporary endpoint that picks up the data you send and updates a record or makes a new one for it. That's it!

Here's my un-optimized code, because let's be honest, it's just 50 questions after all!

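router.post('/addquiz', (req, res) => {
	// Accept either { id, questions: [...] } or a single question in the body
	const questions = Array.isArray(req.body.questions) ? req.body.questions : [req.body]
	const updates = questions.map(thisQuestion => {
		const question = {
			question: thisQuestion.question,
			option1: thisQuestion.option1,
			option2: thisQuestion.option2,
			option3: thisQuestion.option3,
			option4: thisQuestion.option4,
			correct: thisQuestion.correct
		}
		// Append this question to the course's questions array
		return Courses.findOneAndUpdate({courseId: req.body.id}, {$push: {questions: question}})
	})
	// Reply only after every update has finished, and report failures
	Promise.all(updates)
		.then(() => res.send('Done'))
		.catch(err => {
			console.log(err)
			res.status(500).send('Failed')
		})
})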
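For the sending side, here is a rough sketch of how a payload could be prepared for a POST request to an endpoint like this one. The helper `buildQuizRequest`, the sample payload, and the URL are all assumptions for illustration; the commented-out call relies on the global fetch available in Node 18 and later.

```javascript
// Hypothetical helper: build the options object that fetch() expects.
function buildQuizRequest(payload) {
    return {
        method: 'POST',
        // Tells the server the body is JSON; the server needs a JSON body parser to read it.
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(payload)
    };
}

// Made-up payload in an { id, questions } shape.
const payload = {
    id: '17',
    questions: [{
        question: 'What does the fox say?',
        option1: 'Ding Ding Ding',
        option2: 'Meow Meow',
        option3: 'Bow Bow',
        option4: 'Whatever',
        correct: 1
    }]
};

const request = buildQuizRequest(payload);
console.log(request.body);

// Assumed local URL; uncomment to actually send the request (Node 18+).
// fetch('http://localhost:3000/addquiz', request)
//     .then(res => res.text())
//     .then(console.log)
//     .catch(console.error);
```

Any HTTP client would work just as well here; pasting the saved JSON into a tool such as Postman is an equally valid way to hit a temporary endpoint.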

What you should take away from this...

is how you can use the simplest of things to build or mould the most Brobdingnagian data (TBBT reference). As a programmer who did this for the first time, I honestly felt amazing! This might not be the exact formula for data mining, and there never will be one, because every day you might have a new challenge and a new format to mould that data into. All you have to do is have faith in JavaScript or Python to do the work for you.

Here's my code if you want to have a look at it:

const fs = require('fs')

fs.readFile('Scenarios for quiz.txt', 'utf-8', function (err, data) {
    // Stop early if the file could not be read
    if (err) {
        console.error(err)
        return
    }
    // Questions are separated by a blank line; the first two blocks are skipped
    const blocks = data.split('\n\n').slice(2)
    const newArray = blocks.map((block, index) => {
        // One line per question or option; cut everything up to the first ')' (e.g. a label like "a)")
        const lines = block.split('\n').map(line => line.slice(line.indexOf(')') + 1).trim())
        // Debug: print the second parsed question to check the result
        if (index === 1) {
            console.log('Original String', lines)
        }
        return {
            question: lines[0],
            option1: lines[1],
            option2: lines[2],
            option3: lines[3],
            option4: lines[4],
            correct: 1
        }
    })
    console.log(newArray.length)
    // Wrap the questions with the course id and serialise to JSON
    const newData = JSON.stringify({
        id: '17',
        questions: newArray
    })
    // console.log(newData)
    fs.writeFile('POST Request content.txt', newData, (err) => {
        if (err) console.error(err)
    })
})


Utkarsh Singh

Always curious, frequently moody and occasionally handsome