SunJackson Blog

A more systematic look at suppressed data by @ellis2013nz

Reposted from: http://feedproxy.google.com/~r/RBloggers/~3/E7lQKqzKQbI/

free range statistics - R

Posted on 2018-11-17

Last week I blogged about some different ways of dealing with data in a cross tab that has been suppressed as a means of disclosure control, when the count in a cell is less than six. I tried simple replacement of those cells with “3”, two different multiple imputation methods, and left-censored Poisson regression based on survival methods. I tested those methods on a single two-way simulated cross-tab of the counts of three different types of animals in four different regions, with two suppressed cells.
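
Here is a minimal sketch of that setup. It is written in Python rather than the post's R, and it uses made-up counts instead of the post's simulated data. The code suppresses every cell with a count below six, then applies the simplest fix the post tried: replacing each suppressed cell with 3. The helper names `suppress` and `fill_suppressed` are hypothetical. The multiple-imputation and censored-Poisson approaches are not shown.

```python
# Sketch only: hypothetical counts, not the data from the original post.
SUPPRESSION_THRESHOLD = 6  # counts below this value are hidden before release
REPLACEMENT_VALUE = 3      # naive stand-in used for every suppressed cell


def suppress(table, threshold=SUPPRESSION_THRESHOLD):
    """Return a copy of a two-way table with small counts replaced by None."""
    return [[None if count < threshold else count for count in row] for row in table]


def fill_suppressed(table, value=REPLACEMENT_VALUE):
    """Replace every suppressed (None) cell with a fixed value."""
    return [[value if count is None else count for count in row] for row in table]


# Rows: three animal types; columns: four regions (hypothetical counts).
counts = [
    [20, 4, 35, 12],
    [8, 15, 2, 30],
    [40, 22, 18, 9],
]

published = suppress(counts)         # what a data consumer would actually see
filled = fill_suppressed(published)  # simple replacement with 3
print(published)
print(filled)
```

Fixed replacement ignores the uncertainty in the hidden cells. That limitation is why the post goes on to compare it with imputation and censored regression.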

Anticipating the next move in data science – my interview with Thomson Reuters

Reposted from: http://feedproxy.google.com/~r/kdnuggets-data-mining-analytics/~3/R4C13noCo9c/gps-anticipating-next-move-data-science.html

Gregory Piatetsky

Posted on 2018-11-17

Congress Over Time

Reposted from: http://feedproxy.google.com/~r/RBloggers/~3/GnK5BQktORs/

R on kieranhealy.org

Posted on 2018-11-17

Since the U.S. midterm elections I’ve been playing around with some Congressional Quarterly data about the composition of the House and Senate since 1945. Unfortunately I’m not allowed to share the data, but here are two or three things I had to do with it that you might find useful.

“Using numbers to replace judgment”

Reposted from: https://andrewgelman.com/2018/11/17/using-numbers-replace-judgment/

Andrew

Posted on 2018-11-17

Julian Marewski and Lutz Bornmann write:

Benford’s Law for Fraud Detection with an Application to all Brazilian Presidential Elections from 2002 to 2018

Reposted from: http://feedproxy.google.com/~r/RBloggers/~3/ARgdw51dMmE/

insightr

Posted on 2018-11-17

The intuition

Let us begin with a brief explanation of Benford’s law and why it should work as a fraud-detection method. Given a set of numbers, the first thing we need to do is extract the first digit of each number. For example, for (121, 245, 12, 55) the first digits are (1, 2, 1, 5). Our intuition might say that, for a large set of numbers, each first digit from 1 to 9 would appear in equal proportion, that is, $P(x = d) = 1/9$ for each digit $d$ between 1 and 9. However, Benford’s law shows that this is not true: smaller digits have larger probabilities, namely $P(x = d) = \log_{10}(1 + 1/d)$, so a leading 1 appears about 30% of the time and a leading 9 less than 5% of the time. If you want a very didactic explanation of why this happens, just watch this video: https://www.youtube.com/watch?v=XXjlR2OK1kM&t=460s. We could not give a better explanation.
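
The first-digit extraction and Benford's probabilities can be sketched in a few lines of Python. This is not the post's own (R) code. The mean-absolute-deviation score is just one simple way to compare observed and expected shares, chosen here for illustration. The function names are hypothetical.

```python
import math
from collections import Counter


def first_digit(n):
    """Leading digit of a nonzero integer (the sign is ignored)."""
    if n == 0:
        raise ValueError("0 has no leading nonzero digit")
    return int(str(abs(n))[0])


def benford_probability(d):
    """Benford's law: P(first digit = d) = log10(1 + 1/d), for d = 1..9."""
    return math.log10(1 + 1 / d)


def first_digit_proportions(numbers):
    """Observed share of each leading digit 1..9 in a collection of integers."""
    counts = Counter(first_digit(n) for n in numbers)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


def mean_absolute_deviation(numbers):
    """Average |observed - Benford| gap over the nine digits (one simple screening score)."""
    observed = first_digit_proportions(numbers)
    return sum(abs(observed[d] - benford_probability(d)) for d in range(1, 10)) / 9


print([first_digit(n) for n in (121, 245, 12, 55)])  # [1, 2, 1, 5]
print({d: round(benford_probability(d), 3) for d in range(1, 10)})

# Powers of two are a classic example of Benford-distributed data.
powers_of_two = [2 ** k for k in range(1, 1001)]
print(round(first_digit_proportions(powers_of_two)[1], 3))  # 0.301
```

In a fraud-screening setting, you would compute the same score on, say, vote counts per polling station. A large deviation is a flag for closer inspection, not proof of fraud.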

Convert Data Frame to Dictionary List in R

Reposted from: http://feedproxy.google.com/~r/RBloggers/~3/OrphPx6l2Yo/

statcompute

Posted on 2018-11-17

In R, there are a couple of ways to convert a column-oriented data frame to a row-oriented dictionary list or a similar structure, e.g. a list of lists.
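
To make the column-to-row idea concrete, here is a Python analogue rather than the post's R code. It assumes a "data frame" stored as a dict of equal-length column lists. It shows two equivalent ways to produce a list of row dicts. The function names and sample data are hypothetical.

```python
def columns_to_records(columns):
    """Way 1: zip the columns together so each tuple is one row."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    names = list(columns)
    return [dict(zip(names, row)) for row in zip(*columns.values())]


def columns_to_records_by_index(columns):
    """Way 2: walk the row indices and pick the i-th value from every column.

    Assumes all columns already have the same length (no validation here).
    """
    n_rows = len(next(iter(columns.values()), []))
    return [{name: values[i] for name, values in columns.items()} for i in range(n_rows)]


# A small column-oriented "data frame" (hypothetical data).
frame = {"id": [1, 2, 3], "name": ["a", "b", "c"], "score": [0.5, 0.7, 0.9]}

records = columns_to_records(frame)
print(records[0])  # {'id': 1, 'name': 'a', 'score': 0.5}
```

If you use pandas, `DataFrame.to_dict(orient="records")` produces the same row-oriented shape directly.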

Document worth reading: “Multi-Agent Reinforcement Learning: A Report on Challenges and Approaches”

Reposted from: https://analytixon.com/2018/11/17/document-worth-reading-multi-agent-reinforcement-learning-a-report-on-challenges-and-approaches/

Michael Laux

Posted on 2018-11-17

Reinforcement Learning (RL) is a learning paradigm concerned with learning to control a system so as to maximize an objective over the long term. This approach to learning has received immense interest in recent times, and its success manifests itself in human-level performance on games like Go. While RL is emerging as a practical component in real-life systems, most successes have been in single-agent domains. This report instead focuses specifically on challenges that are unique to Multi-Agent Systems interacting in mixed cooperative and competitive environments. The report concludes with advances in a paradigm for training Multi-Agent Systems called Decentralized Actor, Centralized Critic, based on an extension of MDPs called Decentralized Partially Observable MDPs, which has seen renewed interest lately.

Multi-Agent Reinforcement Learning: A Report on Challenges and Approaches

Tis the Season to Check your SSL/TLS Cipher List Thrice (RCurl/curl/openssl)

Reposted from: http://feedproxy.google.com/~r/RBloggers/~3/SYfKrSm-XAM/

hrbrmstr

Posted on 2018-11-17

The libcurl library (the foundational library behind the RCurl and curl packages) switched to using OpenSSL’s default ciphers in version 7.56.0 (October 4, 2017). If you’re a regular updater of curl/httr, you should be fairly current with these cipher suites. If you’re not a keen updater, or you use RCurl for your web-content tasks, you are likely not working with a recent cipher list. You may start running into trouble as the internet’s self-proclaimed web guardians keep pushing, with wild abandon, towards “HTTPS Everywhere”.
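
The post is about R's RCurl/curl stack. The Python sketch below does not inspect that stack. It is only an analogous check for Python's own `ssl` module. It reports which OpenSSL build the interpreter is linked against and which cipher suites a default client context enables, so you can spot a stale installation. The helper names are hypothetical. `ssl.OPENSSL_VERSION`, `ssl.create_default_context` and `SSLContext.get_ciphers` are standard-library APIs.

```python
import ssl


def openssl_version():
    """Version string of the OpenSSL library this Python interpreter is linked against."""
    return ssl.OPENSSL_VERSION


def default_cipher_names():
    """Names of the cipher suites enabled in a default client-side TLS context."""
    context = ssl.create_default_context()
    return [cipher["name"] for cipher in context.get_ciphers()]


print(openssl_version())
names = default_cipher_names()
print(len(names), "cipher suites enabled by default")
for name in names[:5]:
    print(" ", name)
```

The output depends entirely on the local OpenSSL build, which is the point. Two machines with different library versions can negotiate different ciphers against the same server.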

If you did not already know

Reposted from: https://analytixon.com/2018/11/17/if-you-did-not-already-know-547/

Michael Laux

Posted on 2018-11-17

Halide

Halide is a computer programming language designed for writing digital image-processing code that takes advantage of memory locality, vectorized computation, and multi-core CPUs and GPUs. Halide is implemented as an internal domain-specific language (DSL) in C++. The main innovation Halide brings is the separation of the algorithm being implemented from its execution schedule, i.e. the code specifying the loop nesting, parallelization, loop unrolling and vector instructions. These two are usually interleaved, and experimenting with a different schedule requires the programmer to rewrite large portions of the algorithm with every change. With Halide, changing the schedule does not require any changes to the algorithm, which lets the programmer experiment with schedules and find the most efficient one.

DNN Dataflow Choice Is Overrated …
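
The Halide entry's core idea, keeping *what* to compute apart from *the order in which to compute it*, can be illustrated with a toy Python sketch. This is not Halide's API. The names `blur_x`, `row_major`, `column_major`, `tiled` and `realize` are hypothetical. The "algorithm" is a 3-tap horizontal blur with edge clamping. Each schedule only yields pixel coordinates in a different order, so swapping schedules never touches the algorithm.

```python
from typing import Callable, Iterator, List, Tuple

Image = List[List[int]]
Schedule = Callable[[int, int], Iterator[Tuple[int, int]]]


def blur_x(img: Image, x: int, y: int) -> float:
    """Algorithm: 3-tap horizontal box blur, clamping at the image edges."""
    width = len(img[0])
    left = img[y][max(x - 1, 0)]
    right = img[y][min(x + 1, width - 1)]
    return (left + img[y][x] + right) / 3


def row_major(width: int, height: int) -> Iterator[Tuple[int, int]]:
    """Schedule: visit pixels row by row."""
    for y in range(height):
        for x in range(width):
            yield x, y


def column_major(width: int, height: int) -> Iterator[Tuple[int, int]]:
    """Schedule: visit pixels column by column."""
    for x in range(width):
        for y in range(height):
            yield x, y


def tiled(tile: int) -> Schedule:
    """Schedule factory: visit pixels in tile x tile blocks (a locality-oriented order)."""
    def schedule(width: int, height: int) -> Iterator[Tuple[int, int]]:
        for ty in range(0, height, tile):
            for tx in range(0, width, tile):
                for y in range(ty, min(ty + tile, height)):
                    for x in range(tx, min(tx + tile, width)):
                        yield x, y
    return schedule


def realize(algorithm, img: Image, schedule: Schedule) -> List[List[float]]:
    """Evaluate the algorithm at every pixel, in the order the schedule dictates."""
    height, width = len(img), len(img[0])
    out = [[0.0] * width for _ in range(height)]
    for x, y in schedule(width, height):
        out[y][x] = algorithm(img, x, y)
    return out


img = [[(x * 7 + y * 3) % 11 for x in range(6)] for y in range(5)]
print(realize(blur_x, img, row_major) == realize(blur_x, img, tiled(4)))  # True
```

In pure Python, the tiled order brings no speedup. The sketch only shows the structural separation. Halide compiles the schedule into real loop nests, vector code and parallel code.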

If you did not already know

Reposted from: https://analytixon.com/2018/11/16/if-you-did-not-already-know-546/

Michael Laux

Posted on 2018-11-16

Information Extraction Technology

With the rise of the digital age, there is an explosion of information in the form of news, articles, social media, and so on. Much of this data lies in unstructured form, and managing it and making effective use of it manually is tedious, boring and labor-intensive. This explosion of information, and the need for more sophisticated and efficient information-handling tools, gave rise to Information Extraction (IE) and Information Retrieval (IR) technology. Information Extraction systems take natural language text as input and produce structured information, specified by certain criteria, that is relevant to a particular application. Various sub-tasks of IE, such as Named Entity Recognition, Coreference Resolution, Named Entity Linking, Relation Extraction and Knowledge Base reasoning, form the building blocks of various high-end Natural Language Processing (NLP) tasks. These include Machine Translation, Question-Answering Systems, Natural Language Understanding, Text Summarization and Digital Assistants like Siri, Cortana and Google Now. This paper introduces Information Extraction technology and its various sub-tasks, and highlights state-of-the-art research in various IE sub-tasks, current challenges and future research directions. …
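
Here is a toy illustration of "natural language text in, structured information out". The Python sketch below is a rule-based relation extractor built on a single regular expression. The "works for" pattern, the capitalized-name heuristic and the function name are hypothetical simplifications. Real IE systems rely on the sub-tasks listed above, such as named entity recognition and coreference resolution, rather than one hand-written pattern.

```python
import re

# Hypothetical heuristic: a name is one or more capitalized words.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
# Hypothetical relation pattern: "<Person> works for <Organization>".
WORKS_FOR = re.compile(rf"(?P<person>{NAME}) works for (?P<org>{NAME})")


def extract_employment(text):
    """Turn every 'X works for Y' match into a structured relation record."""
    return [
        {"relation": "works_for", "person": m.group("person"), "organization": m.group("org")}
        for m in WORKS_FOR.finditer(text)
    ]


text = "Alice Smith works for Acme Corp. Bob works for Initech in Austin."
for record in extract_employment(text):
    print(record)
```

The pattern misses paraphrases ("is employed by") and pronouns ("She works for…"). Covering them is exactly what the statistical sub-tasks in the abstract are for.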