Import AI

Import AI #97: Faking Obama and Putin with Deep Video Portraits, Berkeley releases a 100,000+ video self-driving car dataset, and what happens when you add the sensation of touch to robots.

by Jack Clark

Try a little tenderness: researchers add touch sensors to robots.…It’s easier to manipulate objects if you can feel them…Researchers with the University of California at Berkeley have added GelSight touch sensors to a standard 7-DoF Rethink Robotics ‘Sawyer’ robot with an attached Weiss WSG-50 parallel gripper to explore how touch inputs can improve performance at grasping objects – a crucial skill for robots to have if used in commercial settings.   Technique: The researchers construct four sub-networks that operate over specific data inputs (camera image, two GelSight images to model texture senses before and after contact, and an action network that processes 3D motion, in-plane rotation, and change in force. They link these networks together within a larger network and train the resulting model over a dataset of objects. The researchers pre-train the image components of the network with a model previously trained to classify objects on ImageNet. The approach yields a model that adapts to novel surfaces, learns interpretable policies, and can be taught to apply specific constraints when handling an object, like grasping it gently. **   Results: The researchers test their model and find that systems trained with vision and action inputs get 73.03% accuracy, compared to 79.34% for systems trained on tactile inputs and action, compared to 80.28% for systems trained with tactile and vision and action.Harder than you think: This task, like most that require applying deep learning components to real-world systems, contains a few quirks which might seem non-obvious from the outset, for example: “The robot only receives tactile input intermittently, when its fingers are in contact with the object and, since each re-grasp attempt can disturb the object position and pose, the scene changes with each interaction”.   Read more:** More Than a Feeling: Learning to Grasp and Regrasp using Vision and Touch (Arxiv).

Want 100,000 self-driving car videos? Berkeley has you covered!…”The largest and most diverse open driving video dataset so far for computer vision research”., according to the researchers..Researchers with the University of California at Berkeley and Nexar have published BDD100K, a self-driving car dataset which BDD100K contains ~120,000,000** images spread across ~100,000 videos. “Our database covers different weather conditions, including sunny, overcast, and rainy, as well as different times of day including daytime and nighttime,” they say. The dataset is substantially larger than ones released by the University of Toronto (KITTI), Baidu (ApolloScape), Mapillary, and others, they say.DeepDrive:The dataset release is significant for where it comes from: DeepDrive, a Berkeley-led self-driving car research effort with a vast range of capable partners, including automotive companies such as Honda, Toyota, and Ford. DeepDrive was set up partially so its many sponsors could pool research efforts on self-driving cars, seeking to close an implicit gap with other players.   Rich data: The videos are annotated with hundreds of thousands of labels for objects like cars, trucks, persons, bicycles, and so on, as well as richer annotations for road lines drivable areas, and more; they also provide a subset of roughly ~10,000 images with full-frame instance segmentation.  Why it matters – the rise of the multi-modal dataset: The breadth of the dataset with its millions of labels and carefully refined aspects will likely empower researchers in other areas of AI, as well as its obvious self-driving car audience. I expect that in the future these multi-modal datasets will become increasingly attractive targets to use to evaluate transfer learning from other systems, for instance by training a self-driving car model in a rich simulated world then applying it to real-world data, such as BDD100K.   Challenges: The researchers are hosting three challenges at computer vision conference CVPR relating to the dataset, and are asking groups to compete to develop systems for road object detection, drivable area prediction, and domain adaptation.  Read more: **BDD100K: A Large-scale Diverse Driving Video Database (Berkeley AI Research blog).


An incomplete timeline of dubious things that people have synthesized via AI– Early 2017:**Montreal Startup Lyrebird launches with audio recording featuring synthesized voices of Donald Trump, Barack Obama, Hillary Clinton.– Late 2017:** “DeepFakes” arrive on the internet via Reddit with a user posting pornographic movies with celebrity faces animated onto them. A consumer-oriented free editing application follows and DeepFakes rapidly proliferate across the internet, then consumer sites start to clamp down on them.**– 2018:Belgian socialist party makes a video containing a synthesized Donald Trump giving a (fake) speech about climate change. Party says video designed to create debate and not trick viewers.– Listen: Politicians discussing about Lyrebird (Lyrebird Soundcloud).– Read more: DeepFakes Wikipedia entry.– Read more: Belgian Socialist Party Circulates “Deep Fake” Donald Trump Video (Politico Europe).**

Why all footage of all politicians is about to become suspect:…Think fake news is bad now? ‘Deep Video Portraits’ will make it much, much worse…A couple of years ago European researchers caused a stir with ‘face2face’, technology which they demonstrated by mapping their own facial expressions onto synthetically rendered footage of famous VIPs, like George Bush, Barack Obama, and so on. Now, new research from a group of American and European researchers has pushed this fake-anyone technology further, increasing the fidelity of the rendered footage, reducing the amount of data needed to construct such convincing fakes, and also dealing with visual bugs that would make it easier to identify the output as being synthesized.  In their words: **“We address the problem of synthesizing a photo-realistic video portrait of a target actor that mimics the actions of a source actor, where source and target can be different subjects,” they write. “Our approach enables a source actor to take **full control of the rigid head pose, face expressions and eye motion of the target actor”. (Emphasis mine.)**   Technique:** The technique involves a few stages: first, the researchers track the people within the source and target videos via a monocular face reconstruction approach, which allows them to extract information about the identity, head pose, expression, eye gaze, and scene lighting for each video frame. They also separately track the gaze of each subject. They then essentially transfer the synthetic renderings of the input actor onto the target actor and perform a couple of clever tricks to make the resulting output high fidelity and less prone to synthetic tells like visual smearing/blurring of the background behind the manipulated actor.  Why it matters: **Techniques like this will have a bunch of benefits for people working in media and CGI, but they’ll also be used by nation states, fringe groups, and extremists, to attack and pollute information spaces and reduce overall trust in the digital infrastructure of societal discourse and information transmittion. I worry that we’re woefully unprepared for the ramifications of the rapid proliferation of these techniques and applications. (And controlling the spread of such a technology is a) extremely difficult and b) of dubious practicality and c) potentially harmful to broader beneficial scientific progress.)   An astonishing absence of consideration: I find it remarkable that the researchers don’t take time in the paper to discuss the ramifications of this sort of technology, given that they’re demonstrating it by doing things like transferring President Obama’s expressions onto Putin’s, or Obama’s onto Reagan’s. They make no mention of the political dimension to this work in their ‘Applications’ section, which focuses on the technical details of the approach and how it can be used for applications like ‘visual dubbing’ (getting an actor’s mouth movements to map to an audio track’.  Read more:** Deep Video Portraits (Arxiv).**   Watch video for details:** Deep Video Portraits – SIGGRAPH 2018 (YouTube).DARPA to host synthetic video/image competition:..US defense-oriented research organization to try and push state-of-the-art in creation and detection of fakes…Nation states have become aware of the tremendous potential for subterfuge that this technology poses and are reacting by dumping research money into both exploiting this for gain and for defending against it. This summer, DARPA will hold a competition to see who can create the most convincing synthetic images, and also to see who can detect them.  Read more: The US military is funding an effort to catch deepfakes and other AI trickery (MIT Technology Review).



Google will not renew contract for Project Maven, plans to release principles for how it approaches military and intelligence contracting:….Big Tech continues to grapple with ethical challenges around military AI…Google announced internally on Friday that it would not be renewing the contract it had with the US military for Project Maven, an AI-infused drone surveillance platform, according to Gizmodo. Google also said it is drafting ethical principles for how it approaches military and intelligence contracting.   Why it matters: Military uses of AI remains one of the most contentious issues for the industry, and society, to grapple with. Tech giants will have a role in setting standards for the industry at large. (One wonders how much of a role Google can play here in the USA now, given that it will now be viewed as deeply partisan by the DoD – Jack) *Given that AI is a particularly powerful ‘dual-use’ technology, direct military applications may end up being one of the *easier ethical dilemmas the industry faces in the future.**   Read more: Google plans not to renew Project Maven contract (Gizmodo).   Read more:** How a Pentagon Contract Became an Identity Crisis for Google (NYT).

UK public opposed to AI decision-making in most parts of public life:…The Brits don’t like the bots…The RSA and DeepMind have initiated a project to create ‘meaningful public engagement on the real-world impacts of AI’. The project’s first report includes a survey of UK public attitudes towards automated decision-making systems.   Lack of familiarity: With the exception of online advertising (48% familiar), respondents were overwhelmingly unfamiliar with the use of automated decision-making in key areas. Only 9% were aware of its use in the criminal justice system, 14% in immigration, and 15% in the workplace.**   Opposition to AI decision-making: Most respondents were opposed to the usage of these methods in most parts of society. The strongest opposition was in the usage of AI in the workplace (60% opposed vs 11% support) and criminal justice (60% opposed vs. 12% support).   What the public want:** While 29% said nothing would increase their support for automated decision-making, the poll pointed to a few potential re-mediations that people would support:  36%: The right to demand an explanation for an automated decision.   33%: Penalties for companies failing to monitor systems appropriately.   24%: A set of common principles guiding the use of such systems.   Why it matters: **The report notes that a public backlash against these technologies cannot be ruled out if issues are not addressed. The RSA’s proposal for public engagement via deliberative processes and ‘citizens’ juries’, if successful, could provide a model for other countries.   Read more:** Artificial Intelligence – Real Public Engagement (RSA). Open Philanthropy Project launches AI Fellows program:Over $1 million in funding for high-impact AI researchers…The Open Philanthropy Project, the grant-making foundation funded by Cari Tuna and Dustin Moskovitz, is providing $1.1m in PhD funding for seven AI/ML researchers focused on minimizing potential risks from advanced AI systems. “Increasing the probability of positive outcomes from transformative AI”, is one of Open Philanthropy’s priorities.  Read more: Announcing the 2018 AI Fellows.**   Read more:** AI Fellowship Program.

OpenAI Bits & Pieces:

OpenAI Fellows:We designed this program for people who want to be an AI researcher, but do not have a formal background in the field. Applications for Fellows starting in September are open now and will close on July 8th at 12AM PST.**   Read more: OpenAI Fellows (OpenAI Blog). **

Walking Through Shadows That Feel Like Sand

People walked off cliffs. People walked into the middle of the street. People left stoves on. People forgot to eat. People lost their minds.

But the technology got better.

At some point the technology started *saving *more people than it killed.

People stopped short of ledges. People pulled back from traffic. People remembered to turn stoves off. People would eat just the right amount. People healed their minds.

Did we adopt it? Yes, as we always do. Are we happy? Yes, most of us are. And the more of us that are happy, the more likely everyone is going to be happy.

Now, where are we? We are everywhere.

We walk through city streets and get to feel the density of other people via vibrations superimposed onto our bodies by our clothes.We watch our own pet computer programs climbing the synth-neon signs that hang off of real church steeples.We see sunsets superimposed on one another and we can choose whenever to see them, even in the thick of night.

We are many and always growing and learning. What we experience is our choice. But our world grows richer by the day and we feel the world receding, as though a masterpiece overlaid with other paints from other artists, growing richer by the moment.

We do not feel this world, this base layer, so much anymore.

We worry we do not understand the people that choose to stay within it.We worry they do not understand why we choose to stay above it.

Things that inspired this story: Augmented Reality, Virtual Reality, group AR simulations such as Pokemon Go, touch-aware clothing, force feedback, cameras, social and class and technological diffusion dynamics of the 21st century, self-adjusting feedback engines, communal escapism, cults.  Like this:Like Loading…