**GAN Paint: Learn to paint with an AI system:**
*…Generating pictures out of neuron activations – a new, AI-infused Photoshop filter…*
MIT researchers have figured out how to extract more information from trained generative adversarial networks, letting them identify specific ‘neurons’ in the network that correlate to specific visual concepts. They’ve built a website that lets anyone learn to paint with these systems. The effect is akin to having a competent, ultra-fast painter standing at your shoulder: you broadly spraypaint an area where you’d like, for instance, some sky, and the software activates the relevant ‘neurons’ in the GAN model and uses them to paint that part of the image for you.
**Why it matters:** Demos like this give a broader set of people a more natural way to interact with contemporary AI research, and help us develop intuitions about how the technology behaves.
**Paint with an AI yourself here:** GANpaint (MIT-IBM Watson AI Lab website).
**Read more about the research here:** GAN Dissection: Visualizing and Understanding Generative Adversarial Networks (MIT CSAIL).
**Paint with a GAN here** (GANPaint website).
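Under the hood, the GAN Dissection work identifies channels (‘units’) in an intermediate layer of a trained generator that correlate with a visual concept, and GANPaint-style editing boosts or suppresses those channels inside the region the user has painted before finishing the forward pass. Below is a minimal sketch of that editing step in PyTorch; the toy generator, the unit indices, and the activation strength are placeholder assumptions for illustration, not the authors’ actual model or code.

```python
# Rough sketch of GANPaint-style unit editing (placeholder model, not the
# authors' code): run the generator to an intermediate feature map, overwrite
# the channels associated with a concept inside the user's mask, then finish
# the forward pass.
import torch
import torch.nn as nn

class ToyGenerator(nn.Module):
    """Stand-in for a pretrained GAN generator, split at an intermediate layer."""
    def __init__(self, z_dim=128, feat_ch=64):
        super().__init__()
        # early layers: latent code -> 8x8 feature map
        self.early = nn.Sequential(
            nn.ConvTranspose2d(z_dim, feat_ch, kernel_size=8),
            nn.ReLU(),
        )
        # late layers: 8x8 features -> 32x32 RGB image
        self.late = nn.Sequential(
            nn.ConvTranspose2d(feat_ch, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),
            nn.Tanh(),
        )

@torch.no_grad()
def paint_concept(gen, z, unit_ids, mask, strength=8.0):
    """Force the given units to fire inside `mask` (an H x W tensor of 0s and 1s)."""
    feats = gen.early(z)                 # (1, C, H, W) intermediate activations
    mask = mask.to(feats.dtype)          # broadcasts over the batch dimension
    for u in unit_ids:                   # units correlated with a concept, e.g. 'tree'
        feats[:, u] = feats[:, u] * (1 - mask) + strength * mask
    return gen.late(feats)               # render the edited features to pixels

gen = ToyGenerator()
z = torch.randn(1, 128, 1, 1)                     # random latent code
mask = torch.zeros(8, 8); mask[2:5, 2:5] = 1.0    # user 'spraypaints' a region
tree_units = [3, 17, 42]                          # hypothetical concept units
edited = paint_concept(gen, z, tree_units, mask)
print(edited.shape)                               # torch.Size([1, 3, 32, 32])
```

In the real system the unit indices are not chosen by hand: the dissection method scores each unit by how well its activation map agrees with a semantic segmentation of the generated image for a given concept.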
**DeepMind says the future of AI safety is all about agents that learn their own reward functions:**
*…History shows that human-specified reward functions are brittle and prone to creating agents with unsafe behaviors…*
Researchers with DeepMind have laid out a long-term strategy for creating AI agents that do what humans want in complex domains where it is difficult for humans to construct an appropriate reward function. The basic idea is this: to create safe AI agents, we want agents that learn a reward function from information collected from the (typically human) user, and then use reinforcement learning to optimize that learned reward function (a rough sketch of this loop appears after this item). The nice thing about this approach, according to DeepMind, is that it should work even for agents that have the potential to become smarter than humans: “agents trained with reward modeling can assist the user in the evaluation process when training the next agent”.
**A long-term alignment strategy:** DeepMind thinks this approach has three properties that give it a chance of being adopted by researchers: it is scalable, it is economical, and it is pragmatic.
**Next steps:** The researchers say these ideas are “shovel-ready for empirical research today”. The company believes that “deep RL is a particularly promising technique for solving real-world problems. However, in order to unlock its potential, we need to train agents in the absence of well-specified reward functions.” This research agenda sketches out ways to do that.
**Challenges:** Reward modeling faces a few challenges: the amount of feedback (how much data you need to collect for the agent to successfully learn the reward function); the distribution of feedback (the agent visits new states which lead it to perceive higher reward for actions that are in reality sub-optimal); reward hacking (the agent finds a way to exploit the task to give itself reward, learning a function that does not reflect the user’s implicitly expressed wishes); unacceptable outcomes (taking actions a human would likely never approve of, such as an industrial robot breaking its own hardware to achieve a task, or a personal assistant automatically writing a very rude email); and the reward-result gap (the gap between the optimal reward model and the reward function actually learned by the agent). DeepMind thinks each of these challenges can potentially be dealt with by specific technical approaches, and today there exist several distinct ways to tackle each of them, which seems to increase the chance of one working out satisfactorily.
**Why it might matter – human empowerment:** Putting aside the general utility of having AI agents that can learn to do difficult things in hard domains without inflicting harm on humans, this research agenda also implies something not directly discussed in the paper: it offers a way to empower humans with AI. If AI systems continue to scale in capability, then it seems likely that in a matter of decades we will fill society with very large AI systems which large numbers of people interact with. We can see the initial outlines of this today in the form of large-scale surveillance systems being deployed in countries like China; in self-driving car fleets being rolled out in increasing numbers in places like Phoenix, Arizona (via Google’s Waymo); and so on.
I wonder what it might be like if we could figure out a way to maximize the number of people in society who were engaged in training AI agents via expressing preferences. After all, the central mandate of many of the world’s political systems comes from people regularly expressing their preferences via voting (and, yes, these systems are a bit rickety and unstable at the moment, but I’m a bit of an optimist here). Could we better align society with increasingly powerful AI systems by more deeply integrating a wider subset of society into the training and development of AI systems?
**Read more:** Scalable agent alignment via reward modeling: a research direction (Arxiv).
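To make the core loop of the agenda concrete – gather feedback from a user, fit a reward model to it, then have an RL agent optimize the learned reward – here is a minimal sketch, assuming pairwise preference feedback over short trajectory segments. The network shapes, the synthetic ‘user’ preferences, and the absence of a real environment are illustrative assumptions; this is not DeepMind’s implementation.

```python
# Minimal sketch of reward modeling from pairwise preferences (placeholder
# shapes and synthetic feedback, not DeepMind's implementation): a reward
# model r(s, a) is fit so that the trajectory segments the "user" prefers
# receive higher predicted return; an RL agent would then be trained against
# r instead of a hand-written reward function.
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, ACT_DIM, SEG_LEN = 4, 2, 10

class RewardModel(nn.Module):
    """Maps a (state, action) pair to a scalar learned reward."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + ACT_DIM, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def segment_return(model, seg):
    obs, act = seg                   # (SEG_LEN, OBS_DIM), (SEG_LEN, ACT_DIM)
    return model(obs, act).sum()     # predicted return of the whole segment

# Synthetic stand-in for user feedback: pairs of segments plus a label saying
# which one the user preferred (1.0 = first segment preferred).
def fake_preference_batch(n=16):
    segs_a = [(torch.randn(SEG_LEN, OBS_DIM), torch.randn(SEG_LEN, ACT_DIM)) for _ in range(n)]
    segs_b = [(torch.randn(SEG_LEN, OBS_DIM), torch.randn(SEG_LEN, ACT_DIM)) for _ in range(n)]
    prefs = torch.randint(0, 2, (n,)).float()
    return segs_a, segs_b, prefs

reward_model = RewardModel()
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

for step in range(100):
    segs_a, segs_b, prefs = fake_preference_batch()
    ret_a = torch.stack([segment_return(reward_model, s) for s in segs_a])
    ret_b = torch.stack([segment_return(reward_model, s) for s in segs_b])
    # Bradley-Terry / logistic loss: the preferred segment should get the
    # higher predicted return.
    loss = F.binary_cross_entropy_with_logits(ret_a - ret_b, prefs)
    opt.zero_grad()
    loss.backward()
    opt.step()

# An RL agent would now maximize reward_model(obs, act) at each step instead
# of an environment-provided reward; that half of the loop is omitted here.
```

In the paper’s agenda the reward model is trained online alongside the agent rather than once up front, which is part of how the challenges listed above (such as reward hacking and the distribution of feedback) are meant to be addressed.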
**Global police, global government likely necessary to ensure stability from powerful AI, says Bostrom:**
*…If it turns out we’re playing with a rigged slot machine, then how do we make ourselves safe?…*
Nick Bostrom, researcher and author of *Superintelligence* (which influenced the thinking of a large number of people with regard to AI), has published new research in which he tries to figure out what problems policymakers might encounter if it turns out planet Earth is a “vulnerable world”; that is, a world in which “there is some level of technological development at which civilization almost certainly gets devastated by default”. Bostrom compares the process of technological development to a person or group of people steadily withdrawing balls from a vase. Most balls are white (beneficial, e.g. medicines), while some are various shades of gray (for instance, technologies that can equally power industry or warmaking). What Bostrom’s *Vulnerable World Hypothesis* paper worries about is whether we could at some point withdraw a “black ball” from the vase: “a technology that invariably or by default destroys the civilization that invents it”. The reason we haven’t drawn one yet, he argues, “is not that we have been particularly careful or wise in our technology policy. We have just been lucky.” In this research, Bostrom creates a framework for thinking about the different types of risks that such balls could embody, and outlines some potential (extreme!) policy responses to allow civilization to prepare for such a black ball.
**Types of risks:** To help us think about these black balls, Bostrom lays out a few different types of civilizational vulnerability that could be stressed by such technologies.
**Type-1 (“easy nukes”):** “There is some technology which is so destructive and so easy to use that, given the semi-anarchic default condition, the actions of actors in the apocalyptic residual make civilizational devastation extremely likely”.
**Type-2a (“safe first strike”):** “There is some level of technology at which powerful actors have the ability to produce civilization-devastating harms and, in the semi-anarchic default condition, face incentives to use that ability”.
**Type-2b (“worse global warming”):** “There is some level of technology at which, in the semi-anarchic default condition, a great many actors face incentives to take some slightly damaging action such that the combined effect of those actions is civilizational devastation”.
**Type-0:** “There is some level of technology that carries a hidden risk such that the default outcome when it is discovered is inadvertent civilizational devastation”.
**Policy responses for a risky world: bad ideas:** How could we make a world with any of these vulnerabilities safe and stable? Bostrom initially considers four options, then puts aside two as unlikely to yield sufficient stability to be worth pursuing. These discarded ideas are to restrict technological development, and to “ensure that there does not exist a large population of actors representing a wide and recognizably human distribution of motives” (aka, brainwashing).
**Policy responses for a risky world: good ideas:** There are two types of policy response that Bostrom says could increase the safety and stability of the world: adopting “preventive policing” (which he also gives the deliberately inflammatory nickname “High-tech Panopticon”) and “global governance”. Both of these policy approaches are challenging.
Preventive policing would require all states to be able to “monitor their citizens closely enough to allow them to intercept anybody who begins preparing an act of mass destruction”. Global governance is necessary because states will need “to extremely reliably suppress activities that are very strongly disapproved of by a very large supermajority of the population (and of power-weighted domestic stakeholders)”, Bostrom writes.
**Why it matters:** Work like this grapples with one of the essential problems of AI research: are we developing a technology so powerful that it can fundamentally alter the landscape of technological risk, even more so than the discovery of nuclear fission? It seems unlikely that today’s AI systems fit this description, but it does seem plausible that future AI technologies could. What will we do, then? “Perhaps the reason why the world has failed to eliminate the risk of nuclear war is that the risk was insufficiently great? Had the risk been higher, one could eupeptically argue, then the necessary will to solve the global governance problem would have been found,” Bostrom writes.
**Read more:** The Vulnerable World Hypothesis (Nick Bostrom’s website).