Is the economy linear programming or is it gradient descent?

This will be a bad take, but also a fun one. There will be no thinking before typing, and no second draft.

There is a good tradition of people knowing that in a functioning economy, prices for goods are just the Lagrange multipliers of the constraints of that economy. Many people are aware of this fact, and more still would make much better decisions if they knew. You can learn this from any good text on linear optimization, such as The Best Use of Economic Resources by L.V. Kantorovich.

The linear optimization perspective aligns well with the typical Effective Altruism mindset: prices represent exchange value, so to be good we should allocate our money to maximize value produced per dollar spent. If a thing is very cost-inefficient, don’t spend anything on it. If everyone behaved that way, we would collectively solve the optimization problem that is our economy. The result would be an economy producing the most utility possible.

The convex optimization perspective on the economy contrasts with a more recent view implicitly held by many people: everything is gradient descent, the best algorithm is always gradient descent. Machine Learning? Gradient descent. Bandit problems? Gradient descent. Maximum flow? Gradient descent. Online algorithms? Gradient descent!

If prices carry gradient information, then we should spend money on a good in proportion to one over its price. If everyone spends their money that way, we will descend our loss function as quickly as possible, arriving at the utility-maximizing economy in little time.

If prices are Lagrange multipliers, you should be frugal and spend most of your money on bed nets and stuff. If prices are gradient thingies, you can and must buy some of that expensive tasty vegan white chocolate. I just did exactly that. Yum yum yum.

Edit: How did I mess up this post so badly. This Lagrange multipliers versus gradient descent argument is a silly red herring. The real fun would have been in comparing a gradient descent based interpretation of prices and a mirror descent based interpretation. It wouldn’t be too hard to shitpost a legitimate utilitarian argument based on that, arguing that it is everyone’s moral duty to learn about the difference between gradient descent and mirror descent and to apply this knowledge in everyday life. Ah well. Writing prompt for whoever reads this.

Browsers leaking previously visited pages?

I have this WordPress plugin called Statify running. It produces a pretty visitor count graph, a list of which pages were visited today, as well as referer addresses of visitors. Now the strange thing is that sometimes they contain meaningful addresses that do not seem like spam, but are also of pages that in no way link to me. One example is today’s list:

None of the 4 listed pages seem to contain a link to my blog. I am guessing that some browser reports a referer when it shouldn’t, but I am not sure.

The AI-Safety-as-an-effective-cause hypothesis has already been soundly refuted by others, but I think it never hurts to show more disdain for this self-recursive Bayesian superintelligence nonsense. With that in mind, let’s talk about Bostrom’s awful book. Specifically, this graph:

There are two claims embedded in here. One is “when you are past village idiot level, then Einstein level is very close”. The other is “there is a lot of room beyond Einstein level”. I would argue that both are preposterous, in particular if you define intelligence by some kind of problem-solving ability.

For this post I want to focus on the empirical prediction made by these two claims. Take any task for which “AI” is better than unskilled humans, for example a game. Look at the growth of the rating over time. The claims predict that there will be very little (development/training) time between an AI becoming competitive with unskilled humans and surpassing the very best humans. Proof by picture:

Does that prediction hold up? Let’s look at everything Deepmind and OpenAI did. Go? Nope. Starcraft? Nope. Rubik’s cube? Nope. Any other shitty thing? Nope. The prediction failed just as badly before the neural net hype, considering chess and go and Age of Empires and any other game you could play against the computer.

Bostrom is talking out of his ass.

We all know the type who thinks Bayes’ rule is the key to rational thought. They are talking out of their ass. This short note intends to explain why I say that, without any trite arguments and focusing just on the computer science. In short: Bayes’ rule is an *information-theoretic* concept without computational content.

For one, conditional probability distributions are not computable in general. Of course, you say, but no sane person would limit themselves to only computable endeavors? And moreover, that result only applies when there is no noise, i.e., in the realm of spherical cows! Your objection is valid, but we have to observe this theoretical impossibility result for the sake of completeness.

The real issue occurs in real-world complexity classes. Random SAT/UNSAT is hard in the appropriate density range. So if I pick a random SAT instance from a probability distribution known to both of us, and I tell you the bits encoding the instance one by one, your Bayesian updates on the probability that my instance is satisfiable will be hard to compute as well. Both in theory and in practice. No matter how much computing power you have, I can easily sample SAT instances that make your Bayes’ rule calculations suffer.
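To make the sampling step concrete, here is a minimal sketch of such a generator; the function names and the brute-force checker are my own illustration, not from the post. It draws a random 3-CNF at a given clause-to-variable density (around 4.27 is the empirically hard regime for random 3-SAT).

```python
import random
from itertools import product

def random_3sat(n_vars, density, rng):
    # one clause = three distinct variables, each negated with probability 1/2
    clauses = []
    for _ in range(int(density * n_vars)):
        trio = rng.sample(range(1, n_vars + 1), 3)
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in trio))
    return clauses

def satisfiable(clauses, n_vars):
    # brute force over all assignments; only viable for toy sizes
    for assignment in product([False, True], repeat=n_vars):
        if all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False
```

At the hard density, roughly half the sampled instances are satisfiable, and telling which half you are in is exactly the expensive part of the Bayesian update.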

This hardness carries over to everyday problems. Bayes’ rule is typically insufficient to tell you the answers you are looking for.

Economic systems are all the same when you take out the stupid parts.

I’ve got a brilliant hot take here. Consider the following:

- Capitalism but without money
- Communism but without central planning
- Kantorovich-style dual certificates but without being a dumb-ass

In this post I will argue that each of these three is better than their normie counterparts, and that they are all the exact same thing.

Capitalism has a lot of good things going for it. In particular, it is a viable way of allowing agents to communicate what they consider worthwhile doing and of communicating to agents how they can make themselves of use. I am not saying that its current instantiation is super good at those things, just that putting prices on things is a way for society to collaborate.

However, as the second fundamental theorem of welfare economics tells us, there are a bunch of free variables in capitalism that could be set any which way: how much money does every agent have? Most settings of these free parameters will lead to sub-optimal outcomes, so under an assumption of naturalness we should look for a variant that is unique in its class. I argue that we get this by removing money.

Specifically, let’s get rid of money but keep prices. Prices will no longer reflect what you pay, just how much effort and resources were needed to procure it. When you “buy” one product instead of another, you are communicating which things are worth the effort to make and which are not.

Let’s take my grocery trip from today as an example. I bought cheap bread and expensive fake chicken pieces, even though in both cases the more expensive option is better. I have plenty of money, so I could have easily bought expensive bread and expensive fake chicken. But the expensive bread costs much more while being only a little better, whereas the expensive fake chicken is a lot better. By buying what I bought, I communicate what I think is worth the effort/resources and what is not. I would have bought the same if I were a millionaire, because money was no part of my purchasing decision. Only the prices matter.

Why would capitalism without money be better? For one, some people are too poor to buy healthy food, which is a huge loss of welfare. For two, billionaires will literally buy private yachts and £250,000 watches with useless tourbillons. This is a clear example of money causing resources to be spent badly, and there are many less visible examples as well.

I won’t say too much about this. Communism was a disaster for many reasons. Economies are not legible in a top-down fashion. Even if they were, a central bureau couldn’t obtain all necessary data in time. Central planning is inherently undemocratic. We need a decentralized way of arranging the economy, led largely by consumers and built on the assumption that people can decide for themselves what they find important or worthwhile.

Kantorovich was a Soviet mathematician who invented the discipline of linear programming for the sake of planning the Soviet economy.

For those who are unaware, a linear program is an optimization problem that can be expressed as \max c^{\rm T} x, ~{\rm s.t.}~ Ax \leq b, with given c \in \mathbb{R}^d, b \in \mathbb{R}^n, A \in \mathbb{R}^{n \times d}, over variables x \in \mathbb{R}^d. That is to say, one aims to maximize a linear function \sum_{i=1}^d c_i x_i of variables x_1,x_2,\dots,x_d subject to linear inequalities of the form \sum_{i=1}^d a_{ji}x_i \leq b_j for j = 1,2,\dots,n. This is a very powerful framework for optimization theory and is used for many practical applications.

Any linear program allows for a *dual program*, given by \min b^{\rm T} y, ~{\rm s.t.}~ A^{\rm T}y = c, y \geq 0. The optimal solutions to these two programs have the same objective value, and each tells you a lot about what the other looks like.
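As a sanity check, here is a tiny made-up instance of this primal/dual pair (the numbers are mine, purely for illustration). By weak duality, any feasible dual solution upper-bounds the primal objective, so matching objective values certify that both candidate solutions are optimal.

```python
# A small instance of max c^T x s.t. Ax <= b and its dual
# min b^T y s.t. A^T y = c, y >= 0 (illustrative numbers).
A = [[1, 1],   # x1 + x2 <= 4
     [1, 0],   # x1      <= 2
     [0, 1]]   #      x2 <= 3
b = [4, 2, 3]
c = [2, 3]

x = [1, 3]     # candidate primal solution
y = [2, 0, 1]  # candidate dual solution

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# primal feasibility: Ax <= b
assert all(dot(row, x) <= bj for row, bj in zip(A, b))
# dual feasibility: A^T y = c and y >= 0
assert all(yj >= 0 for yj in y)
assert all(dot([row[i] for row in A], y) == ci for i, ci in enumerate(c))
# equal objective values: weak duality then certifies both are optimal
assert dot(c, x) == dot(b, y) == 11
```

Note how the binding constraints (the first and third) are exactly the ones with nonzero dual values, a pattern that returns below in Kantorovich's labor-price trouble.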

Kantorovich basically considered x_1,\dots,x_d to be numbers describing the economy: what goods should be produced and where. The linear constraints Ax \leq b represent the constraints of the economy: how many workers, resources and factories are there and what are their capacities. The objective function c^{\rm T}x encodes the goal of maximizing the welfare of the Soviet citizens. Thus, the optimal solution to this program would tell you the optimal way the Soviet economy could be run.

So far so noble. The dual linear program to Kantorovich’s optimization problem represents market prices. Its solution tells you the exchange value of different goods. If you know the correct exchange values, that will roughly tell you what the rest of the economy should look like, and if you know the shape of the economy, you know what the exchange values are.

This is where it got political for Kantorovich. In an optimal dual solution, every primal constraint that is not tight (that has slack) gets dual value 0. The Soviet economy had surplus labor (it was constrained by other factors), so Kantorovich put the value of labor at 0. The Party thought this flew in the face of Marx’s teachings, and it is a small miracle that Kantorovich survived.

But this view of prices as dual solutions makes a lot of sense. Once you leave Kantorovich’s worst simplifying assumptions, no price will be exactly 0. At that point, you get into territory resembling interior-point methods (a class of algorithms for solving linear programming problems), in which your dual solution (prices) tells you how to improve your primal solution (production and consumption of goods and services) and your primal solution tells you how to improve your dual solution.

A lot of people are the smartest person in their country

**Definition** For a point set S \subset \mathbb{R}^n and a point x \in S, we say that x is *Pareto-optimal* if, for any y \in S, x \neq y, there exists an index i \leq n such that x_i > y_i.

**Exercise** For P any probability distribution on \mathbb{R}^n where the different coordinates are independent random variables and independent x_1,\dots,x_m \sim P, the expected number of Pareto optimal points among them is \sum_{i_1 = 1}^m \frac{1}{i_1} \sum_{i_2 = 1}^{i_1} \frac{1}{i_2} \cdots \sum_{i_{n-1}=1}^{i_{n-2}} \frac{1}{i_{n-1}}. (hint: induction on n)
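The exercise can be checked numerically. The sketch below (function names are my own) evaluates the nested harmonic sums and compares them against a Monte Carlo estimate using independent uniform coordinates.

```python
import random

def is_pareto_optimal(x, points):
    # x is Pareto-optimal if no other point matches or beats it in every coordinate
    return not any(y != x and all(yi >= xi for xi, yi in zip(x, y)) for y in points)

def expected_pareto(m, n):
    # evaluate the nested harmonic sums from the exercise
    if n == 1:
        return 1.0
    tab = [1.0 / k for k in range(1, m + 1)]
    for _ in range(n - 2):          # one prefix-sum pass per extra dimension
        s = 0.0
        for i in range(m):
            s += tab[i]
            tab[i] = s / (i + 1)
    return sum(tab)

def monte_carlo(m, n, trials, rng):
    total = 0
    for _ in range(trials):
        pts = [tuple(rng.random() for _ in range(n)) for _ in range(m)]
        total += sum(is_pareto_optimal(p, pts) for p in pts)
    return total / trials
```

For m = 3 and n = 2 the formula gives H_3 ≈ 1.83 expected Pareto-optimal points, and the simulation agrees.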

Principal component analysis is a standard tool for anyone who wants to do quantitative science. You start with a pile of data points that vary along different axes, find the axis of most variation in the data, find the axis of second most variation, and so on. Bonus points if afterwards you pronounce that this statistical ghost is a real and meaningful quantity. See: Spearman’s g, big-five personality traits, among others.

If your data is normally distributed, and, let’s face it, all data is normally distributed if you squint enough, then the principal components are exactly the eigenvectors of your covariance matrix. That is, when you re-parametrize your data such that the principal components align with the coordinate directions, the coordinates of any data point are independent random variables.

The different quantities resulting from a principal component analysis are always orthogonal directions. While the first principal components are the axes of largest variation in the data set (the single component of Spearman’s *g*, the five personality traits), they don’t come equipped with any sense of being the “most important”. Just the biggest variance, a notion that is well-defined only when you choose the units of the original coordinates well. Therefore, we cannot say that one of the components is “the real one”, i.e., we cannot impose a meaningful linear ordering of quality after calculating the principal components.

Anyone can tell you that intelligence varies along different dimensions. Easy example: some people are good at math but bad with languages, for others it is the other way around. How many distinct directions are there? I don’t think it is a stretch to think there are at least 11 different ones. Turning the exercise above into a computation, we can now count the expected number of Pareto-optimal points for given m, n.

```
def initialize(i):
    # tab[k-1] = (k, 1/k): the terms of the innermost sum
    tab = []
    for k in range(1, i + 1):
        tab.append((k, 1. / k))
    return tab

def iterate(tab):
    # replace each value by the prefix sum divided by its index,
    # i.e. apply one more level of the nested sum
    s = 0.
    for i in range(len(tab)):
        s = s + tab[i][1]
        tab[i] = (tab[i][0], s / tab[i][0])
    return tab

def final(tab):
    out = 0.
    for (x, y) in tab:
        out = out + y
    return out

def count(m, n):
    # expected number of Pareto-optimal points among m samples in dimension n
    if n == 1:
        return 1
    table = initialize(m)
    while n > 2:
        table = iterate(table)
        n = n - 1
    return final(table)
```

In our intelligence interpretation, this counts how many people can be said to be “the smartest”, in the sense that nobody else is smarter in all 11 different ways. That is, we can call someone “the smartest” if their location in intelligence space is Pareto-optimal among all people.

The above code has the maximum size of `m` limited by memory space, but we can calculate `count(4882495, 11) = 427462.8`, where 4882495 is the population count of Ireland. Making the appropriate division, we conclude that one in every 11.4 Irish people is the smartest person in Ireland.

Professor: Welcome back to Algorithms class. Today, we will cover Dijkstra’s shortest path algorithm, the core of all navigation software. Next week’s assignment is to write code to compute shortest paths in a number of real world instances.

Professor: Suppose we have a road network described by a graph with n vertices and m edges and a vector c \in \mathbb{R}^E assigning a travel time to every edge. We want to minimize total travel time, and –

Student: I have a question. What if we want to take other things into account, like finding a simple route with few turns?

Professor: The formalism allows for that by changing the objective and graph. Anyway, so we want to minimize travel time and the first step is to [blah blah blah]

Professor: Welcome back to Operations Research class. Last week we discussed the mathematical fundamentals of optimization theory, and this week we will see how to apply it to practice. Suppose we have n machines and m employees working on k different products. We want to maximize total profit, and –

Student: I have a question. What if we want to take other things into account, like work satisfaction among employees?

Professor: The formalism allows for that by changing the objective and constraints. Anyway, so we want to maximize profit and the first step is to [blah blah blah]

Case study in the politics of mathematics: optimality of Nash equilibria

Back in the Soviet era, Leonid Vitaliyevich Kantorovich got himself in trouble. He developed a theory of optimization problems, originally for the sake of optimizing plywood production, but he saw a potential for using his techniques to optimize the entire Soviet economy. So he sent letters to the central planning bureau to convince them to make use of his ideas.

He has been involved in advanced mathematical research since the age of 15; in 1939 he invented linear programming, one of the most significant contributions to economic management in the twentieth century. Kantorovich has spent most of his adult life battling to win acceptance for his revolutionary concept from Soviet academic and economic bureaucracies; the value of linear programming to Soviet economic practices was not really recognized by his country’s authorities until 1965, when Kantorovich was awarded a Lenin prize for his work.

Excerpt from original CIA file on Kantorovich.

There was just one problem, arising from the theory of linear programming duality. For any linear optimization problem, we can derive a dual problem. If you solve both the primal and dual problems, they turn out to have the same solution value, and moreover the optimal solution to the one certifies optimality of the solution to the other. These “dual solutions” can have clear interpretations. In the case of optimizing resource allocations, the dual solution can be interpreted as market prices.

LP duality theory connects the notion of optimal resource allocation as used for central planning with the efficient market hypothesis and Nash equilibria. This connection can be interpreted as “capitalist markets find the optimal allocation of goods and services”. Obviously, the communists did not like that.

This interpretation has been popular in the US as well, where it is seen in light of the slightly weaker “first welfare theorem” and Smith’s invisible hand.

Of course there are numerous solid arguments for why these interpretations are bogus. To name just a few: markets are not convex, barriers to market entry are non-zero, humans are not perfectly rational nor omniscient, computation and communication are neither free nor instantaneous, negative externalities are common, the dual LP only takes prices into account and not net cash flow of individuals, and the “welfare function” doesn’t necessarily exist.

All of this was known 50+ years ago, but apart from the cutesy 1950 example of the prisoner’s dilemma, game theorists didn’t do much of note to dispute the notion that Nash equilibria are typically good. Nothing really got formalized until 20 or so years ago with the introduction of the price of anarchy. The price of anarchy (stability) of a game is the ratio of the social welfare of the optimal solution to the social welfare of the worst (best) Nash equilibrium. The PoA and PoS can be unboundedly bad, and recent decades have seen a lot of exciting work happening here.
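For concreteness, here is a minimal sketch of the price of anarchy for the prisoner’s dilemma mentioned above; the payoff numbers are a standard textbook choice and the code is my own illustration.

```python
# Payoffs for (row action, column action); C = cooperate, D = defect.
payoffs = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
actions = ("C", "D")

def best_response(their_action, me):
    # me: 0 = row player, 1 = column player
    def pay(a):
        key = (a, their_action) if me == 0 else (their_action, a)
        return payoffs[key][me]
    return max(actions, key=pay)

def pure_nash():
    # a profile is a pure Nash equilibrium if both actions are best responses
    return [(a, b) for a in actions for b in actions
            if a == best_response(b, 0) and b == best_response(a, 1)]

welfare = {profile: sum(p) for profile, p in payoffs.items()}
optimum = max(welfare.values())                           # both cooperate: 6
worst_equilibrium = min(welfare[e] for e in pure_nash())  # both defect: 2
price_of_anarchy = optimum / worst_equilibrium            # 6 / 2 = 3.0
```

Changing the payoff matrix changes the ratio, and for some games it grows without bound, which is the “unboundedly bad” claim above.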

Mathematics is political. When we want to apply mathematical theorems to the real world, we need to make certain simplifying assumptions. Or actually, if we want to think any thoughts about the real world, but whatever. The simplifications we make determine the theorems we can prove. We can work on theorems that “prove” that capitalism is good or we can work on theorems that “prove” that capitalism is bad. Both would be valid mathematics, but both individuals and communities can decide to value one over the other, resulting in undergrads being taught either “10/10 scientists agree: capitalism is good” or “beware: capitalism can be arbitrarily bad”.

What does it mean to call something a social construct?

Many explanations of the term “social construct” are severely lacking. Let me explain what I personally mean when I call something a social construct. Largely, it means that *we could have done it differently.*

Gender is a social construct. Our ideas of which things are for men and which are for women have differed over place and time. Little boys used to wear dresses, women used to be considered mentally unfit to own property, gender nonconforming people (like women wearing pants) used to be shunned by their community. And that is just Western countries over the past century, without considering different millennia or cultures. We could change our current notions of gender without the gods coming down to interfere with our ambitions.

Money is a social construct. By choosing to redistribute or not, we shape the role of money and the way it influences the rich and poor. Same for (dis)allowing usury, theft or private ownership of the means of production, to name a few socially constructed aspects of private property and thus of money. Every facet of money is constructed through a social process, but the resulting structure can impact a person’s life as much as the law of gravity.

Mathematics is a social construct. There is no linear “tech tree” forcing people to develop Euclidean geometry before graph theory or predator/prey models. No law of nature tells pure mathematicians to abhor the thought of their work finding practical use. Instead of sorting computational problems into various Turing degrees, we could have been doing something useful instead. Peer reviewers use their own judgement to decide whether a result is novel, interesting and proven with enough rigor, no objective measurement devices involved. The choice of what is Real Mathematics and what is not is similarly arbitrary.

I tell people that a thing is socially constructed when I want to remind them that *things could be different*. When they’re talking about the thing as if it were a fixture of life and I want to invite them to imagine how it could be changed. Because no matter your philosophy of mathematics or external reality, the way we understand and value various ideas is always different among individuals and communities.

We can’t simulate nematodes with only 302 neurons

C. elegans is a wonderful creature. It’s a nematode, so it looks like a tiny worm. It is one of biologists’ favorite model organisms and we know a lot about it.

It has roughly 1000 cells, and its cell lineage is almost identical for every individual. This creature has been mapped out more thoroughly than any other species on the planet. You can browse a map of its cells through the OpenWorm Browser. The hermaphrodite has 302 neurons and its connectome is completely known.

And faithfully simulating a nematode remains far out of reach. The state of the art is concerned with modeling basic aspects of its locomotion, like its different gaits on different surfaces.

It is silly to think we’ll be simulating human minds any time soon, when people have been trying to simulate C. elegans for decades now.

I am fond of things that should not work but still do. Stuff like induction, SAT solvers, or the preservation of approximate truth under logical deduction. One more example is the O(n³) phenomenon.

To fix notation, we say an algorithm runs in time O(n³) if there are numbers a, b > 0 such that, given an input of size n, the algorithm, run on a reasonable theoretic computer model with random-access memory, takes time upper bounded by a * n³ + b, for any n. This value n might be the number of letters in an input string, the number of vertices or edges in an input graph, the ambient dimension of a convex body, the number of variables in your 3SAT instance, or any other such number measuring input size.

In CS theory, the O(n³) time phenomenon is the observation that any problem which is solvable in polynomial time has an algorithm solving it in O(n³) time. It is false, and showing so is a nice undergrad exercise in diagonalization. It is also kind of true, as its prediction comes true again and again for almost all problems that we care about.

In case you missed it, last week brought the most exciting piece of AI news of the past couple years. Deepmind’s Alphastar will be playing real non-NDA’d humans for multiple games (via Ars Technica).

This is cool news. Partly because they finally seem intent on fixing their cheating with superhuman micro, but mostly because repeat games could allow humans to develop counter-strategies. The exhibition matches earlier this year only had the humans play one match against any given bot, giving the bot a huge informational advantage. These new matches will hopefully end up being more fair. However, Blizzard’s press release is vague enough that Deepmind could still decide to play only a handful of games, which would prevent humans from gaining enough knowledge to devise counter-strategies. Worse, the Alphastar team could decide to stop playing once they observe online message boards sharing good tactics between human players, or engage in other nasty p-hacking.

The hype labs haven’t been doing well in fair play or reproducibility so far, but I’m willing to hope they’ll improve.

Mathematical hobbyists tend to be fascinated with information theory, Kolmogorov complexity and Solomonoff induction. This sentiment is very understandable. When I first learned of them, these subjects felt like they touched upon some fundamental truth of life that you don’t normally hear about. But for all their being fundamental properties of life and understanding, mathematicians treat them as a cute mathematical curiosity at most. In this post I will explain some of the reasons why so few mathematicians and computer scientists have cared about them over the past 50 years.

The zeroth reason is that Kolmogorov complexity depends on your choice of encoding. You cannot cover this up by saying that any Turing machine can simulate any other with constant overhead, because a 2000-bit difference is not something you can compensate for in data-constrained settings.

The first reason is obvious and dates back to ancient times. Kolmogorov complexity is not computable. Properties that we can literally never know are pretty useless in everyday life.

The second reason is related: we cannot compute Kolmogorov complexity in practice. Even time-constrained variants are hellishly expensive to compute for large data sets.
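The practical fallback is an upper bound from off-the-shelf compressors, which conveniently also illustrates the zeroth reason: the bound you get depends on the encoder. A minimal sketch (the data and setup are my own illustration):

```python
import bz2
import random
import zlib

rng = random.Random(0)
regular = b"ab" * 5000                                   # tiny true description
noisy = bytes(rng.getrandbits(8) for _ in range(10000))  # essentially incompressible

for name, blob in [("regular", regular), ("noisy", noisy)]:
    # each compressed length is an upper bound on K(blob), up to the constant
    # for the decompressor -- and that constant differs per encoder
    print(name, len(blob), len(zlib.compress(blob)), len(bz2.compress(blob)))
```

The regular string compresses to a few dozen bytes while the random one does not compress at all, but zlib and bz2 report different numbers for the same data, so even this crude proxy is encoding-relative.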

The third reason is more typical of modern thinking in computer science theory. Namely, that any theory of information needs a theory of computation to be useful in practice. This is directly related to the difference between computational and statistical indistinguishability, as well as the myth that your computer’s entropy pool could run out. Cryptography is safe not because it is information-theoretically impossible to retrieve the plaintext, but because it is computationally infeasible to retrieve the plaintext. The Kolmogorov complexity of a typical encrypted data stream is low, but it would be a mistake to think that anyone could compute a short description. Along another route, once I have told you an NP-complete problem instance (with a unique solution), telling you the answer adds no new information. But you would still learn something by getting the answer from me, because you couldn’t *compute* it yourself even knowing all the requisite information.

Kolmogorov complexity is useless based on classical CS theory, practice and modern CS theory. This is how you know that anyone who proposes that it is an integral part of rational thought is full of shit.

11. Did MIRI cause a good thing to happen?

The Future Perfect podcast from Vox did an episode proclaiming that AI Risk used to be a fringe concern but is now mainstream. That is why OpenAI did not open up GPT-2 to the general public. This was good. Everything thanks to Jaan Tallinn and Eliezer Yudkowsky. I cannot let this go uncontested.

Today: why are the people who made our writing bot so worried about what it could do? The short answer is they think that artificial intelligence models like this one can have major unintended consequences. And that’s an idea that’s moved from the fringe to the mainstream with the help of philanthropy.

[0:02:21-0:02:43]

This here is the central claim: Tallinn’s money is responsible for people thinking critically about artificial intelligence. Along the way we hear that Tallinn acquired his AI worries from Yudkowsky. Hence, Yudkowsky did something good.

AI might have unintended consequences, like taking our jobs or messing with our privacy. Or worse. There are serious researchers who think the AI could lead to people dying: lots of people. Today, this is a pretty mainstream idea. It gets a lot of mentions it any round-up by AI expert of their thinking on AI and so it’s easy to forget that a decade ago this was a pretty fringe position. If you hear this kind of thing and your reaction is like “Come on, Killer Robots? Really that sounds like science fiction”, don’t worry, you are part of a long tradition of dismissing the real world dangers of AI. The founders of the field wrote papers in which they said as an aside. “Yes. This will probably like transform human civilization and maybe kill us.” But in the last decade or so, something has started to change. AI Risk stopped being a footnote in papers because a small group of people in a small group of donors started to believe that the risks were real. Some people started saying wait if this is true, it should be our highest priority and we should be working on it. And those were mostly fringe people in the beginning. A significant driver of the focus on AI was Eliezer Yudkowsky.

[0:04:17-0:05:50]

So the driving force behind all worries about AI is said to be Yudkowsky. Because of his valiant essay-writing, Tallinn got convinced and put his money towards funding MIRI and OpenAI. Because of course his real fears center around Mickey Mouse and Magic Broomstick, not on algorithms being biased against minorities or facial recognition software being used to put the Uyghur peoples in China in concentration camps. Because rational white men only focus on important problems.

Yes, so here are a couple examples every year the Pentagon discovers some bugs in their system that make them vulnerable to cybersecurity attacks. Usually they discover those before any outsiders do and they’re there for able to handle them. But if an AI system were sufficiently sophisticated, it could maybe identify the bugs that the Pentagon wouldn’t discover for years to come and therefore be able to do things like make it look to the US government like we’re being attacked by a foreign nuclear power.

[0:13:03-0:13:34]

This doesn’t have anything to do with my point; I just think it’s cute how people from America, the country whose army of cryptographers and hackers (all human) developed and lost the weapons responsible for some of the most devastating cyberattacks in history, worry that other countries *might* be able to do the same things if only they obtain the magical object that is speculated to exist in the future.

[GPT-2] is a very good example of how philantropic donations from people like Jaan Tallinn have reshaped our approach to AI. The organization that made GPT-2 to is called OpenAI. OpenAI got funding from Jaan Tallinn among many others and their mission is not just to create Artificial Intelligence. But also to make sure that the Artificial Intelligence it creates doesn’t make things worse for Humanity. They’re thinking about, as we make progress in AI, as we develop these systems with new capabilities, as we’re able to do all these new things, what’s a responsible process for letting our inventions into the world? What does being safe and responsible here look like and that’s just not something anybody thought about very much, you know, they haven’t really asked what is the safe and responsible approach to this. And when OpenAI started thinking about being responsible, they realized “Oh man, that means we should hold off on releasing GPT-2”.

[0:17:53-0:19:05]

This is bootlicking journalism, going along completely with the narrative that OpenAI’s PR department is spinning, just like Vox’s original puff-piece coverage and the whole Effective Altruism community’s reaction to it. There is something profoundly absurd about taking a corporate lab’s press release at face value and believing that those people live in a vacuum. A vacuum where nobody had previously made unsupervised language models, and where nobody had previously thought about what responsible release of ML models entails. OpenAI is fundamentally in the business of hype, to stroke its funders’ egos, and providing compute-heavy incremental progress in ML is just the means to that end.

It’s kind of reassuring that this organization is a voice at the table saying hey, let’s take this just a little slower. And the contributions from donors like Jaan Tallinn helped to put that cautionary voice at the table, and they put it there early. You know, I think it mattered. I think that the conversation we’re having now is probably more sophisticated, more careful, a little more aware of some of the risks than it would have been if there hadn’t been these groups starting 10-15 years ago to start this conversation. I think it was one of those cases where something was always going to be funded only from the fringe and where it really did matter that it got that funding from the fringe.

[0:20:18-0:20:53]

The writer makes a clear statement here: the people on the fringe (Yudkowsky et al.) are a significant part of the reason why people are thinking about this. I can hardly imagine how a journalist could say this after having done any research on the topic outside of their own cult-bubble, so I assume they didn’t do any.

People in EA, people in ML and the staff at Vox seem almost willfully ignorant of all previous academic debate on dual-use technology, none of which derives from MIRI’s fairy tales of evil genies. I blame this phenomenon on rationalists’ contempt for the social sciences. If Yudkowsky contributed anything here, it might mainly be in making socio-political worries about technology seem marginally more exciting to his tech-bro audience. But the counterfactual is unclear to me.

]]>Continue reading "10. Compute does not scale like you think it does"

]]>One argument for why AGI might be unimaginably smarter than humans is that the physical limits of computation are so large. If humans achieve some amount of intelligence with some amount of compute, then an AGI with many times more compute will be many times more intelligent. **This line of thought does not match modern thinking on computation.**

The first obvious obstacle is that not every problem is solvable in linear time. If intelligence scales as log(compute), then adding more compute will hardly affect a system’s intelligence. But if you believe in AI Risk, then this likely won’t convince you.
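To make the scaling argument concrete, here is a toy calculation. The log2 model and the compute figures are purely illustrative assumptions, not claims about real brains:

```python
import math

def intelligence(compute_flops: float) -> float:
    """Hypothetical model: intelligence grows with the log of compute."""
    return math.log2(compute_flops)

# Suppose (for the sake of argument) a human runs on ~1e16 FLOPS
# and an AGI gets a million times more compute.
human = intelligence(1e16)   # ~53.2
agi = intelligence(1e22)     # ~73.1
print(agi / human)           # a million times the compute, ~1.4x the "intelligence"
```

Under logarithmic scaling, each doubling of compute buys the same small constant increment, so piling on hardware yields rapidly diminishing returns.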

The second, more concrete, obstacle is architecture. Let’s compare two computing devices. Device A is a cluster consisting of one billion first-generation Raspberry Pis, for a total of 41 PFLOPS. Device B is a single PlayStation 4, coming in at 1.84 TFLOPS. Although the cluster has 22,000 times more FLOPS, there are plenty of problems that we can solve faster on the single PlayStation 4. Not all problems can be solved more quickly through parallelization.
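Amdahl’s law, which this paragraph gestures at, puts a hard ceiling on what the cluster can do. A minimal sketch (the 95% figure is just an example):

```python
def amdahl_speedup(parallel_fraction: float, n_processors: float) -> float:
    """Maximum speedup of a task in which only `parallel_fraction`
    of the work can be spread across `n_processors` (Amdahl's law)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_processors)

# Even with a billion processors, a task that is 95% parallelizable
# can never run more than 20x faster than on a single processor.
print(amdahl_speedup(0.95, 1e9))  # ≈ 20.0
```

The serial 5% dominates no matter how many Raspberry Pis you add, which is why raw aggregate FLOPS is such a misleading measure of a machine’s power.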

Modern computers are only as fast as they are because of very specific properties of existing software. Locality of reference is probably the biggest one. There is spatial locality of reference: if a processor accesses memory location *x*, it is likely to use location *x+1* soon after that. Modern RAM exploits this fact by optimizing for sequential access, and slows down considerably when you do actual random access. There is also temporal locality of reference: if a processor accesses value *x* now, it is likely to access value *x* again in a short while. This is why processor cache provides a speedup over just having RAM, and why having RAM provides a speedup over just having flash memory.
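You can observe spatial locality from plain Python. This sketch does the same total work twice, once in sequential order and once in shuffled order; on typical hardware the shuffled scan is noticeably slower because it defeats caching and prefetching (exact numbers vary by machine):

```python
import random
import timeit

N = 2_000_000
data = list(range(N))
seq_idx = list(range(N))
rnd_idx = seq_idx[:]
random.shuffle(rnd_idx)

def scan(indices):
    total = 0
    for i in indices:
        total += data[i]
    return total

# Identical work, different access pattern: only the order of the
# memory accesses changes between the two runs.
t_seq = timeit.timeit(lambda: scan(seq_idx), number=1)
t_rnd = timeit.timeit(lambda: scan(rnd_idx), number=1)
print(f"sequential: {t_seq:.3f}s, shuffled: {t_rnd:.3f}s")
```

Both scans compute the same sum; any timing gap you see is the hardware rewarding locality of reference.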

Brains don’t exhibit such locality nearly as much. As a result, it is much easier to simulate a small “brain” than a large “brain”. Adding neurons increases the practical difficulty of simulation much more than linearly. It is *possible* that this would not be an obstacle for AGI, but it is also possible for the ocean to explode, so that doesn’t tell us anything.

Continue reading "Ovens have secret built-in automatic timers"

]]>Every oven I’ve ever used has had a secret function, a mechanism that automatically tells you when the food is ready. It is wonderful and I want to tell you about it.

So most ovens control their temperature using a bimetallic strip. When the temperature inside is less than the target temperature, the strip closes a circuit that activates the heating. As soon as the temperature is high enough, the strip will have deformed enough to open the circuit and stop the heating. In many ovens, especially older ones, you can hear this as a soft ***click***. If you are lucky, the mechanism is sensitive enough to rapidly switch on and off to stay on temperature, at least for a couple of seconds.
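The strip is a bang-bang controller, and you can sketch its clicking in a few lines. All the numbers here (heat rate, loss rate, target) are made up for illustration:

```python
def simulate_oven(target=200.0, heat_rate=3.0, loss_rate=0.5, steps=400):
    """Toy bang-bang controller, like a bimetallic strip: heat while
    below target, switch off at or above it. Counts the switch-offs,
    i.e. the audible *clicks*."""
    temp, heating, clicks = 20.0, True, 0
    for _ in range(steps):
        if heating and temp >= target:
            heating, clicks = False, clicks + 1  # strip deforms, circuit opens
        elif not heating and temp < target:
            heating = True                       # strip cools back, circuit closes
        if heating:
            temp += heat_rate
        temp -= loss_rate                        # heat lost to the room (and the food)
    return clicks

print(simulate_oven())  # dozens of clicks once the target is reached
```

As long as heat keeps flowing into the food, the oven stays below target and silent; once the load stops absorbing heat, the controller starts oscillating around the setpoint and clicking away.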

If you eat frozen pizza, it often only has to be heated to a certain temperature. When it reaches that temperature, the pizza stops cooling down the air around it, allowing the oven to reach its target temperature and start saying ***click***. So the sound tells you when the food is ready, no need to read the packaging to find the correct baking time.

The same happens for dishes that are ready when enough water has evaporated, or when a certain endothermic chemical reaction has stopped happening. All are done the moment the oven says ***click***. There might be some exceptions to this phenomenon, but I have yet to run into one. Which is great, because I always forget to read oven instructions on packaging or recipes before throwing them out. Try it out with your own electrically powered food heating units.

Continue reading "9. Don’t work on long-term AGI x-risk now"

]]>Suppose you believe AGI will be invented in 200 years, and, if it is invented before the alignment problem is solved, everyone will be dead forever. Then you probably shouldn’t work on AGI Safety right now.

On the one hand, our ability to work on AGI Safety will increase as we get closer to making AGI. It is preposterous to think such a problem can be solved by reasoning purely from first principles. No science makes progress without observation, not even pure mathematics. Trying to solve AGI risk now is as absurd as trying to solve aging before the invention of the microscope.

On the other hand, spending resources now is much more expensive than spending resources in 100 years. Assuming a 4% annual growth rate of the economy, it would be around 50 times as expensive.
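The discount-rate arithmetic checks out:

```python
# At 4% annual growth, the economy a century from now is larger by
growth = 1.04 ** 100
print(round(growth, 1))  # ≈ 50.5, i.e. "around 50 times as expensive" today
```

Compounding does the work here: a modest 4% per year multiplies to roughly a factor of fifty over 100 years.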

Solving AGI Safety becomes easier over time, and relatively cheaper on top of that. Hence you should not work on AGI Safety if you think it can wait.

]]>I’ve been looking on and off for mp3 players for a couple months. I wanted a device with proper playlist support, bluetooth, and sufficient battery and storage capacity. It had to be cheap and with a UI that does not make me wish for death.

I ended up buying a $30 second-hand Nokia Lumia 650. I deleted everything except the music player and maps app, downloaded maps of every country I might reasonably ever visit and the complete contents of Wikivoyage, copied my music onto it from my PC and put it permanently in airplane mode. It is a bit too laggy, but other than that I like this setup a lot.

But more important than my love for Windows Phone, is my hate for Android and iOS. I dislike the former for its role in the global surveillance economy and its butt-ugly interface. I dislike the latter because of its adversarial pricing model and excessively walled garden.

I don’t want to get my dinner from the pathologically neoliberal butcher, the dominant-strategy-playing externality-indifferent brewer or the stalking, price-discriminating, search-engine-optimizing baker. Their antithesis probably consists of the local organic farmer’s market, self-hosted FOSS software and artisan everything, but I do like economies of scale.

I’m still searching for the synthesis. For now, I’ll start with trying to minimize my interactions with companies who relate to their users or customers in a very adversarial manner.

]]>Continue reading "8. Links #3: the real AI was inside us all along"

]]>Olivia Solon: The rise of ‘pseudo-AI’: how tech firms quietly use humans to do bots’ work

It’s hard to build a service powered by artificial intelligence. So hard, in fact, that some startups have worked out it’s cheaper and easier to get humans to behave like robots than it is to get machines to behave like humans.

Brian X. Chen and Cade Metz: Google’s Duplex Uses A.I. to Mimic Humans (Sometimes)

In other words, Duplex, which Google first showed off last year as a technological marvel using A.I., is still largely operated by humans. While A.I. services like Google’s are meant to help us, their part-machine, part-human approach could contribute to a mounting problem: the struggle to decipher the real from the fake, from bogus reviews and online disinformation to bots posing as people.