TIL about referer spam

This is cute. Suppose you are a spammer and you want to target website owners, how do you do this? Submitting spam comments is one possibility; either your spam is posted, in which case you win, or you get stuck in the spam filter, and the site admin will at some point scroll past it when checking if any proper comments were misflagged, and you’ll sort of win anyway. Here is a better strategy: access their website while spoofing your referer.

What is a referer, you ask? Let’s say you are browsing bethzero.com/2018/12/01/referer-spam, and you click a link to example.org. Your browser will, in its HTTP request to example.org, mention that you got referred to their page from bethzero.com/2018/12/01/referer-spam. This is so that website owners can know where their visitors come from. Super neat, how can we use this for spamming?

The trick is to send requests with fake referer attributes. (This is the correct spelling, except in some cases, such as rel="noreferrer".) Request the page https://bethzero.com, put https://bestblogideas.com as the referer. When the proprietor of bethzero.com looks at her visitor statistics, she will see that bestblogideas.com linked to her, and probably visit it herself to see what they write about her. This is where you sell her access to your cheap special $300 blogging video course.

Another fun way to use this is probably for doxxing. Create a special page tracking.com/<unique-code> and use that as referer. If you make sure that no other entry points to that page exist, then every visitor will be the Beth you’re targeting. For bonus points, you put Facebook’s tracking code on the page so that you’ll forever be able to target advertisements directly to Beth.

Beth might try to outsmart you and satisfy her interests in her referrers by pointing her browser at tracking.com without the unique code. In that case, you can try buying a number of domain names for a kind of adaptive group testing procedure. If you have a million dollars you can probably do this on bigger scales, finding the IP address of a good fraction of pseudonymous website owners. Looking at my own stats, I think this is really happening.
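A defender-side check for the spoofed-referer hits described above, as an added example that is not part of the original post: it parses access-log lines and reports external referers whose clients never fetched a static asset. The log lines are constructed samples, and the no-assets heuristic and the name `suspicious_referers` are assumptions of this example.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Constructed sample lines in the Apache/nginx "combined" log format.
SAMPLE_LOG = """\
203.0.113.7 - - [01/Dec/2018:10:00:01 +0000] "GET / HTTP/1.1" 200 5120 "https://bestblogideas.com/" "Mozilla/5.0"
198.51.100.4 - - [01/Dec/2018:10:02:10 +0000] "GET /2018/12/01/referer-spam HTTP/1.1" 200 8300 "https://news.example.net/links" "Mozilla/5.0"
198.51.100.4 - - [01/Dec/2018:10:02:11 +0000] "GET /wp-content/themes/x/style.css HTTP/1.1" 200 900 "https://bethzero.com/2018/12/01/referer-spam" "Mozilla/5.0"
203.0.113.9 - - [01/Dec/2018:11:15:00 +0000] "GET / HTTP/1.1" 200 5120 "https://tracking.com/a8f3c2" "Mozilla/5.0"
"""

LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<ref>[^"]*)" "(?P<ua>[^"]*)"')
ASSET = re.compile(r'\.(css|js|png|jpe?g|gif|svg|woff2?|ico)(\?|$)', re.I)


def suspicious_referers(log_text, own_host):
    """Map each external referer URL to the client IPs that sent it
    without ever loading a static asset (assumed heuristic)."""
    fetched_assets = set()       # (ip, user agent) pairs that loaded an asset
    external = defaultdict(set)  # referer URL -> clients that presented it
    for line in log_text.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        client = (m["ip"], m["ua"])
        if ASSET.search(m["path"]):
            fetched_assets.add(client)
        host = urlsplit(m["ref"]).hostname  # None for "-" or an empty referer
        if host and host != own_host:
            external[m["ref"]].add(client)
    report = {}
    for ref, clients in external.items():
        # A browser that followed a real link renders the page and pulls
        # its CSS, scripts and images; a scripted hit usually does not.
        if not any(c in fetched_assets for c in clients):
            report[ref] = sorted(ip for ip, _ in clients)
    return report


if __name__ == "__main__":
    for ref, ips in suspicious_referers(SAMPLE_LOG, "bethzero.com").items():
        print(ref, ips)
```

Feed readers, text-mode browsers and clients with warm caches also skip assets, so the output is a list to review, not a verdict.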

Delayed WordPress RSS feed

In the category “2-minute hacks for skilled people, 30-minute hacks for me because I can’t code for shit”, let’s replay an old RSS feed. Useful if you want to read the old posts on a blog but don’t want to do it all at once.

This way, you can relive the past of your favourite WordPress blogs. Requires a server with a PHP installation and a feed reader that updates at least once per day. I’ve got a Raspberry Pi 3B running selfoss.

We’ll be using a fancy feature of WordPress, namely that https://example.com/yyyy/mm/dd/feed gives the RSS feed with posts from that day interval. Actually every public-facing WordPress page can be appended with /feed to produce something meaningful. It will not be a truly faithful replay of old posts, because if posts get edited or deleted we won’t get to see the original post.

With this knowledge in mind, create a php file

$ touch public/delay.php
$ chmod 755 public/delay.php

and fill it with the following code

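<?php
// base URL of the blog (without a trailing /feed) and the delay to apply
$url = rtrim($_GET['url'] ?? '', '/');
$years = floatval($_GET['years'] ?? 0);
$months = floatval($_GET['months'] ?? 0);
$days = floatval($_GET['days'] ?? 0);

// htmlspecialchars() escapes HTML output; it does not validate a value
// used in a Location header. Accept only well-formed http(s) URLs. Any
// such site is still a valid target, so restrict hosts if others can reach this.
$scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
if (filter_var($url, FILTER_VALIDATE_URL) === false
    || !in_array($scheme, ['http', 'https'], true)) {
    http_response_code(400);
    exit('invalid url parameter');
}

// calculate the time stamp of some day in the past
// (we subtract a fixed amount of seconds so that we don't
// run into issues with how many days a month has and
// because I am not good enough at coding to handle this
// the right way.)
$months_delayed = $months + 12 * $years;
$days_delayed = $days + 30 * $months_delayed;
$moment = time() - intval(60 * 60 * 24 * $days_delayed);

// redirect to the RSS file with posts from that day
// (the status code is header()'s third argument; the second is $replace)
header('Location: ' . $url . '/' . date('Y/m/d', $moment) . '/feed', true, 303);
exit;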

Requesting the page http://example.org/delay.php?url=https://bethzero.com&months=1 will redirect you to an RSS file containing all posts made exactly 30 days ago. Because we’re using a temporary 303 redirect, the page that you get redirected to changes every day. If your feed reader updates every day, you should get to see every post.

One small issue happens if a day has more blog posts than can appear in the RSS feed (default is 10 in WordPress). In that case, you might miss out on the oldest posts of the day.

Bayes’ theorem and transgender lesbians

Some time ago I read a cool post on Tumblr, but I can’t find it anymore. It was about calculating P(trans|WLW), the fraction of women who love women that is transgender, from P(trans), the fraction of the general population that is transgender, P(WLW), the fraction of the population that is a woman-loving woman, and P(WLW|trans), the fraction of gynephilic trans women among trans people. Bayes’ theorem says

P(trans|WLW) = P(WLW|trans)P(trans)/P(WLW).

I remember that the resulting number was significant. As I could not find it again, here is a quick and dirty reconstruction. For every statistic, I picked the first one I found that did not seem completely unrealistic.

So Bayes’ theorem gives us P(trans|WLW) = 0.15.

Bonus: suicide attempts

While preparing this post, I stumbled upon this report. Page 8 lists:

  • P(attempted suicide) = 0.016.
  • P(attempted suicide|trans) = 0.41.

Bayes now says P(trans|attempted suicide) = 0.26. Big if true.*

Section of Doubt

Applying Bayes’ theorem like this seems to give unreasonably good mileage. That suggests that social scientists aren’t allowed to use numbers from different studies and get conclusions from them, or asymmetric misreporting makes these calculations error-prone.

The last number above is big. Makes one wonder why so little effort is spent explicitly targeting at-risk trans people.

* Added July 16th: I just met a subject expert, she said this figure sounded about right.