Mimersbrunn
Mimersbrunn.se is a website where students can upload self-written essays for others to read, comment on, and rate. The site's flaw is that an article's view count is incremented on every request for that article, with no deduplication by IP address or browser fingerprint, so view counts can be inflated artificially. My goal was to make the article about Danderyds Gymnasium the most viewed article on the website.
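The flaw can be illustrated with a toy counter. The classes below are illustrative only, not Mimersbrunn's actual code: a naive counter increments on every request, while a deduplicating counter keys on the client's IP address.

```python
class NaiveCounter:
    """Counts every request -- trivially inflatable."""
    def __init__(self):
        self.views = 0

    def record(self, ip):
        self.views += 1


class DedupedCounter:
    """Counts each IP address at most once -- resists simple replay."""
    def __init__(self):
        self.seen = set()

    def record(self, ip):
        self.seen.add(ip)

    @property
    def views(self):
        return len(self.seen)


naive, deduped = NaiveCounter(), DedupedCounter()
for _ in range(1000):  # one client hammering the same article
    naive.record("203.0.113.7")
    deduped.record("203.0.113.7")

print(naive.views)    # 1000 -- inflated
print(deduped.views)  # 1    -- one unique visitor
```

Mimersbrunn behaved like the naive counter, which is what makes the rest of this project possible.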
Tools Used
- Python: For sending requests.
- Node.js: For creating the request-sending server.
- Glitch.com: For hosting the Node.js server.
Method
To establish which article was the most viewed, I needed to know the view count of the top article. This involved iterating through all ~60,000 article IDs:
Python Script to Gather Article Data
```python
from time import sleep
from threading import Thread
import json

import requests
import urllib3
from bs4 import BeautifulSoup
from tqdm import trange

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

def article_data(articleID):
    """Fetch one article page and record its title and view count."""
    try:
        url = f"https://mimersbrunn.se/article?id={articleID}"
        response = requests.get(url, verify=False)
        soup = BeautifulSoup(response.content, 'html.parser', from_encoding='utf-8')
        # The view count sits in the first <span> inside the first <small> tag.
        views = soup.find_all('small')[0].find_all('span')[0].decode_contents()
        title = soup.find_all('title')[0].decode_contents()
        ARTICLES.append({'Views': int(views), 'Title': title, 'ID': articleID})
    except Exception:
        ARTICLES.append({'Views': -1, 'Title': 'Unable to read data', 'ID': articleID})

def _thread_article_data(startID, stopID):
    for ID in range(startID, stopID):
        article_data(ID)

MAX_ARTICLES = 60000
NUM_THREADS = 6
ARTICLES = []
THREADS = []
STEP = MAX_ARTICLES // NUM_THREADS

# Split the ID range evenly across the worker threads.
for i in range(NUM_THREADS):
    THREADS.append(Thread(target=_thread_article_data, args=(i * STEP, (i + 1) * STEP)))
for t in THREADS:
    t.start()

# Progress bar: wait up to 10 s for each result to arrive.
for i in trange(MAX_ARTICLES):
    for _ in range(20):
        if i < len(ARTICLES):
            break
        sleep(0.5)

ARTICLES = sorted(ARTICLES, key=lambda a: a['Views'], reverse=True)
with open('articleData.json', 'w') as f:
    json.dump(ARTICLES, f)
```
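The scraper above assumes the view count is the first `<span>` inside the first `<small>` element. That extraction can be sketched with only the standard library; the HTML snippet here is made up for illustration, and the real page layout may have differed:

```python
import re

# Made-up snippet approximating the structure the scraper relies on.
sample_html = '<small><span>2356325</span> visningar</small>'

# Grab the digits of the first <span> inside the first <small>,
# mirroring what the BeautifulSoup calls in the script above do.
match = re.search(r'<small[^>]*>.*?<span[^>]*>(\d+)</span>', sample_html, re.S)
views = int(match.group(1)) if match else -1
print(views)  # 2356325
```

A regex is fine for a quick sketch, but BeautifulSoup (as used in the script) is far more robust against markup variations.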
It turns out the most viewed article is the Islänningasagor article with 2,356,325 views at the time of writing.
To put the Danderyds Gymnasium article two million views ahead, I had to write another script to automate requests to its page.
Python Script to Inflate Views
```python
from time import sleep
from threading import Thread

import requests
import urllib3
from tqdm import trange

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

def generate_view():
    """Request the article page once; each request counts as one view."""
    url = f"https://mimersbrunn.se/article?id={ARTICLE_ID}"
    requests.get(url, verify=False)

def _thread_view_generator(views):
    global VIEW_COUNT
    for _ in range(views):
        # Retry until the request succeeds.
        while True:
            try:
                generate_view()
                break
            except Exception:
                sleep(0.1)
        VIEW_COUNT += 1

ARTICLE_ID = 1441  # the Danderyds Gymnasium article
VIEW_COUNT = 0
NUM_VIEWS = 2000000
NUM_THREADS = 5
STEP = NUM_VIEWS // NUM_THREADS
THREADS = []

for i in range(NUM_THREADS):
    THREADS.append(Thread(target=_thread_view_generator, args=(STEP,)))
for t in THREADS:
    t.start()

# Progress bar: poll the shared counter as the threads work.
for i in trange(NUM_VIEWS):
    for _ in range(20):
        if i < VIEW_COUNT:
            break
        sleep(0.5)
```
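One caveat: `VIEW_COUNT += 1` from several threads is a read-modify-write that is not guaranteed to be atomic in Python, so increments can be lost under contention. For a progress counter that is harmless, but the safer pattern uses a lock. A minimal sketch (variable names are illustrative):

```python
from threading import Thread, Lock

count = 0
count_lock = Lock()

def worker(n):
    global count
    for _ in range(n):
        with count_lock:  # serialize the read-modify-write
            count += 1

threads = [Thread(target=worker, args=(10_000,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(count)  # 50000 -- no lost increments
```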
Although this method works, it only generates about 500,000 views per night. That is slower than I had hoped, and I would rather not have my computer fan running all night. A better solution was to port the Python script to Node.js and host it externally on glitch.com, with a visual interface showing stats.
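Some quick arithmetic on those numbers, using the rough per-night throughput measured above:

```python
target_views = 2_000_000   # lead needed over the top article
views_per_night = 500_000  # rough throughput of the local script

nights_needed = target_views / views_per_night
print(nights_needed)  # 4.0 -- about four nights of fan noise
```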
Node.js Script for External Hosting
```javascript
const https = require("https");
const restify = require("restify");

// Minimal web server so Glitch keeps the project alive.
const server = restify.createServer();
server.listen(process.env.PORT || 3000, function () {
  console.log("Listening on %s", process.env.PORT);
});

server.get("/", function (req, res) {
  res.send("Hello World!");
});

// Request the article page once; each request counts as a view.
function requestHandler() {
  const url = "https://mimersbrunn.se/article?id=1441";
  https
    .get(url, (resp) => {
      resp.on("data", () => {}); // drain the response body
      resp.on("end", () => {
        console.log("Request completed");
      });
    })
    .on("error", (err) => {
      console.log("Error: " + err.message);
    });
}

// Fire one request per second.
setInterval(requestHandler, 1000);
```
The completed website can be found on glitch.me: https://mimers-brunn.glitch.me/
Result
After about a week, the article about Danderyds Gymnasium became the most viewed article on Mimersbrunn.se, concluding my project. This experience enhanced my understanding of external hosting and view counter security. I also developed an API for my website's view counting using Node.js.
Unfortunately, Mimersbrunn.se has since shut down, at an unknown date after this project concluded.