Census Bureau researchers Marta Murray-Close and Misty L. Heggeness compared income reported in the Current Population Survey against income reported on tax filings. The former can be misstated, while the latter is constrained by law to be accurate. The researchers found a statistical discrepancy suggesting that when a wife earns more than her husband, couples report a smaller gap between their incomes in the survey.
Top 20 Python AI and Machine Learning Open Source Projects
TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google’s Machine Intelligence research organization. The system is designed to facilitate research in machine learning, and to make it quick and easy to transition from research prototype to production system. Contributors: 1324 (up 168%), Commits: 28476, Stars: 92359. GitHub URL: Tensorflow
Import AI:
Rosie the Robot takes a step closer with new CMU robotics research:…What’s the best way to gather a new robotics research dataset – AirBNB?!…Carnegie Mellon researchers have done the robotics research equivalent of ‘having their cake and eating it too’ – they have created a new dataset to evaluate generalization within robotics, and have successfully built low-cost robots which show meaningful performance on the dataset. The motivation for the research is that most robotics datasets are specific to highly-controlled lab environments; instead, it’s worth gathering data from more real-world locations (in this case, homes rented on AirBNB), developing a system that can learn to grasp objects within these datasets, and checking whether using these datasets improves generalization relative to other techniques. **How it works:** The approach has three key components: a Grasp Prediction Network (GPN) which takes in pixel imagery and tries to predict the correct grasp to take (and which is fine-tuned from a pretrained ResNet-18 model); a Noise Modelling Network (NMN) which tries to estimate the latent noise based on the image of the scene and information from the robot; and a marginalization layer which combines the two data streams to predict the best grasp to use. **The robot:** They use a Dobot Magician robotic arm with five degrees of freedom, customized with a two-axis wrist with electric gripper, and mounted on a Kobuki mobile base. For sensing, they equip it with an Intel R200 RGB camera on a pan-tilt attachment positioned 1m above the ground. The robot’s onboard processor is a laptop with an i5-8250U CPU and 8GB of RAM. Each of these robots costs about $3,000 – far less than the $20k+ prices for most other research robots. **Data gathering:** To gather data for the robots the researchers used six different properties from AirBNB.
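The combination step above can be sketched with a toy model. This is a minimal NumPy illustration, not the paper’s implementation: it assumes a simplified discrete latent noise variable and uses random stand-ins for the GPN and NMN outputs, just to show what marginalizing grasp predictions over latent noise looks like.

```python
import numpy as np

# Toy stand-ins (assumptions, not the paper's networks):
#   gpn_probs[z, g] ~ P(grasp g succeeds | image, noise z)  -- GPN output
#   nmn_probs[z]    ~ P(noise z | image, robot info)        -- NMN output
rng = np.random.default_rng(0)
n_noise, n_grasps = 4, 5
gpn_probs = rng.random((n_noise, n_grasps))
nmn_probs = rng.random(n_noise)
nmn_probs /= nmn_probs.sum()  # normalize into a distribution over z

# Marginalization layer: P(grasp | image) = sum_z P(grasp | image, z) * P(z | image)
marginal = nmn_probs @ gpn_probs      # shape: (n_grasps,)
best_grasp = int(np.argmax(marginal)) # pick the grasp with the highest marginal score
```

The point of the marginalization is that the grasp chosen is the one expected to succeed across plausible noise conditions, rather than under a single assumed noise state.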
They then deployed the robot in each home, used a low-cost ‘YOLO’ model to generate bounding boxes around objects near the robot, then let the robot’s GPN and NMN work together to help it predict how to grasp objects. They collected about 28,000 grasps in this manner. **Results:** The researchers evaluate their new dataset (which they call Home-LCA) as well as their new ‘Robust-Grasp’ two-part GPN network architecture. First, they examine the test accuracy of their Robust-Grasp network trained on the Home-LCA dataset and applied to other home environments, as well as to two datasets collected in traditional lab settings (Lab-Baxter and Lab-LCA). The results here are very encouraging, as their approach seems to generalize better to the lab datasets than other approaches do, suggesting that the Home-LCA dataset is rich enough to create policies which generalize somewhat. They also test their approach in unseen physical home environments (three novel AirBNBs). The results show that Home-LCA does substantially better than lab-derived datasets, with accuracy of around 60%, compared to between 20% and 30% for other approaches – convincing results. **Why it matters:** Most robotics research suffers from one of two problems: 1) the robot is trained and tested entirely in simulation, so it’s hard to trust the results; or 2) the robot is evaluated on such a constricted task that it’s hard to tell whether algorithmic progress on that task will generalize to others. This paper neatly deals with both of those problems by situating the task and robot in reality, collecting real data, and also evaluating generalization.
It also provides further evidence that robot component costs are falling while network performance is improving sufficiently for academic researchers to conduct large-scale real-world robotic trials and development, which will no doubt further accelerate progress in this domain. **Read more:** Robot Learning in Homes: Improving Generalization and Reducing Dataset Bias (Arxiv).
Year 3 of Data, Beer, & Inspiration
Two years ago, we hosted our first Meetup in NYC. Our goal? Community. We’re still at it today, and our data community is growing stronger than ever. But as it grows, we want to know more about you - and we need your help to tell us!
AI, Machine Learning and Data Science Roundup: July 2018
A monthly roundup of news about Artificial Intelligence, Machine Learning and Data Science. This is an eclectic collection of interesting blog posts, software announcements and data applications I’ve noted over the past month or so.
AWS Deep Learning AMIs now with optimized TensorFlow 1.9 and Apache MXNet 1.2 with Keras 2 support to accelerate deep learning on Amazon EC2 instances
The AWS Deep Learning AMIs for Ubuntu and Amazon Linux now come with an optimized build of TensorFlow 1.9 custom-built directly from source and fine-tuned for high performance training across Amazon EC2 instances. In addition, the AMIs come with the latest Apache MXNet 1.2 with several performance and usability improvements, the new Keras 2-MXNet backend with high performance multi-GPU training support, and a new MXBoard tool for improved debugging and visualization for training MXNet models.
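As a sketch of how one might switch between these frameworks on such an AMI: the conda environment names below follow the conventions the AWS Deep Learning AMIs used in this period, but they vary across AMI releases, so treat them as assumptions and check `conda env list` on your instance.

```shell
# Assumed environment names -- verify with `conda env list` on your AMI release.
# Activate the optimized TensorFlow 1.9 build (Python 3.6 environment):
source activate tensorflow_p36
python -c "import tensorflow as tf; print(tf.__version__)"
source deactivate

# Activate Apache MXNet 1.2 (used by the Keras 2-MXNet backend):
source activate mxnet_p36
python -c "import mxnet as mx; print(mx.__version__)"
```

Keeping each framework in its own environment is what lets the AMI ship custom-built, mutually incompatible binaries side by side.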
DeepLearning GitHub Rankings
Defining data science in 2018
I got my first data science job in 2012, the year Harvard Business Review announced data scientist to be the sexiest job of the 21st century. Two years later, I published a post on my then-favourite definition of data science, as the intersection between software engineering and statistics. Unfortunately, that definition became somewhat irrelevant as more and more people jumped on the data science bandwagon – possibly to the point of making data scientist useless as a job title. However, I still call myself a data scientist. Even better – I still get paid for being a data scientist. But what does it mean? What do I actually do here? This article is a short summary of my understanding of the definition of data science in 2018.
Of statistics class and judo class: Beyond the paradigm of sequential education
In judo class they kinda do the same thing every time: you warm up and then work on different moves. Different moves in different classes, and there are different levels, but within any level the classes don’t really have a sequence. You just start where you start, practice over and over, and gradually improve. Different students in the class are at different levels, both when it comes to specific judo expertise and also general strength, endurance, and flexibility, so it wouldn’t make sense to set up the class sequentially. Even during the semester, some people show up at the dojo once a week, others twice or three times a week.
The Real Problems with Neural Machine Translation
TLDR: No! Your Machine Translation Model is not “prophesying”, but let’s look at the six major issues with neural machine translation (NMT).