Combining CNNs and RNNs – Crazy or Genius?
What's new on arXiv
X-GANs: Image Reconstruction Made Easy for Extreme Cases
Document worth reading: “Sequences, yet Functions: The Dual Nature of Data-Stream Processing”
Data-stream processing has continuously risen in importance as the amount of available data has been steadily increasing over the last decade. Besides traditional domains such as data-center monitoring and click analytics, there is an increasing number of network-enabled production machines that generate continuous streams of data. Due to their continuous nature, queries on data-streams can be more complex, and distinctly harder to understand than database queries. As users have to consider operational details, maintenance and debugging become challenging. Current approaches model data-streams as sequences, because this is the way they are physically received. These models result in an implementation-focused perspective. We explore an alternative way of modeling data-streams by focusing on time-slicing semantics. This focus results in a model based on functions, which is better suited for reasoning about query semantics. By adapting the definitions of relevant concepts in stream processing to our model, we illustrate the practical usefulness of our approach. Thereby, we link data-streams and query primitives to concepts in functional programming and mathematics. Most noteworthy, we prove that data-streams are monads, and show how to derive monad definitions for current data-stream models. We provide an abstract, yet practical perspective on data-stream related subjects based on a sound, consistent query model. Our work can serve as a solid foundation for future data-stream query-languages. Sequences, yet Functions: The Dual Nature of Data-Stream Processing
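The abstract's central move, treating a data-stream as a function of time slices with a monad structure, can be sketched concretely. The Python below is not from the paper; `Stream`, `unit`, and `bind` are hypothetical names used only to illustrate how time-slicing semantics and a monad-like interface might look.

```python
# A minimal sketch (not from the paper) of the "streams as functions" view:
# a data-stream is modelled as a function from a time slice to the values
# observed in that slice, and unit/bind give it a monad-like interface.
from typing import Callable, List, TypeVar

A = TypeVar("A")

# A stream maps a time-slice index to the list of values seen in that slice.
Stream = Callable[[int], List[A]]

def unit(value: A) -> Stream:
    """Lift a single value into a stream that emits it in every slice."""
    return lambda t: [value]

def bind(stream: Stream, f: Callable[[A], Stream]) -> Stream:
    """Feed every value of `stream` at slice t into f, collecting the
    values the resulting streams produce at the same slice t."""
    return lambda t: [y for x in stream(t) for y in f(x)(t)]

# Example: a hypothetical sensor stream and a per-slice query on it.
sensor: Stream = lambda t: [t, t + 0.5]          # two readings per slice
doubled = bind(sensor, lambda x: unit(2 * x))    # a simple map as bind + unit
print(doubled(3))                                # -> [6, 7.0]
```

Evaluating `doubled(3)` runs the composed query only at that time slice, which is the kind of semantics-first reasoning the function-based view is meant to support.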
R Packages worth a look
Estimation and Simulation of Trawl Processes (trawl): Contains R functions for simulating and estimating integer-valued trawl processes as described in the article ‘Modelling, simulation and inference for …
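As a rough illustration of what such a process is, here is a minimal Monte Carlo sketch of an integer-valued trawl process with an exponential trawl set, written from the general definition rather than from the trawl package's own routines; all names, parameters, and defaults below are illustrative.

```python
# Hedged sketch: count the points of a planar Poisson process that fall in an
# exponentially shaped trawl set dragged along the time axis.
import numpy as np

def simulate_exp_trawl(T=50.0, nu=2.0, lam=0.5, burn_in=20.0, seed=0):
    """X(t) counts the rate-`nu` Poisson points (s, x) on [t-burn_in, t] x [0, 1]
    that satisfy x < exp(-lam * (t - s))."""
    rng = np.random.default_rng(seed)
    # Poisson points (s_i, x_i) on the strip [-burn_in, T] x [0, 1].
    n = rng.poisson(nu * (T + burn_in))
    s = rng.uniform(-burn_in, T, size=n)
    x = rng.uniform(0.0, 1.0, size=n)

    times = np.arange(0.0, T, 0.1)
    counts = [int(np.sum((s <= t) & (x < np.exp(-lam * (t - s))))) for t in times]
    return times, np.array(counts)

times, X = simulate_exp_trawl()
# Marginally X(t) is approximately Poisson(nu / lam) = Poisson(4) here.
print(X[:10], "mean:", X.mean(), "(theory: nu/lam =", 2.0 / 0.5, ")")
```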
✚ Visualization Away from the Computer, Developing Ideas, Bring in the Constraints
Made-by-hand visualization has been making a mini comeback as of late, and it’s been fun to see what people do with data away from the computer.
The Law and Order of Data Science
Roger Peng · 2018/08/15
Document worth reading: “How Important Is a Neuron”
The problem of attributing a deep network’s prediction to its input/base features is well-studied. We introduce the notion of conductance to extend the notion of attribution to understanding the importance of hidden units. Informally, the conductance of a hidden unit of a deep network is the flow of attribution via this hidden unit. We use conductance to understand the importance of a hidden unit to the prediction for a specific input, or over a set of inputs. We evaluate the effectiveness of conductance in multiple ways, including theoretical properties, ablation studies, and a feature selection task. The empirical evaluations are done using the Inception network over ImageNet data, and a sentiment analysis network over reviews. In both cases, we demonstrate the effectiveness of conductance in identifying interesting insights about the internal workings of these networks. How Important Is a Neuron
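A small worked example can make the "flow of attribution" idea concrete. The sketch below is not the authors' code: it uses a made-up two-hidden-layer ReLU network and approximates the conductance of each first-layer unit as a Riemann sum of (dF/dh1) · dh1 along the straight-line path from a baseline to the input, so the per-unit values should roughly sum to F(x) - F(baseline).

```python
# Hedged sketch of conductance for a tiny, randomly weighted ReLU network.
import numpy as np

rng = np.random.default_rng(0)
W1, W2, w3 = rng.normal(size=(4, 3)), rng.normal(size=(5, 4)), rng.normal(size=5)

def forward(x):
    h1 = np.maximum(W1 @ x, 0.0)     # first hidden layer (units of interest)
    z2 = W2 @ h1
    h2 = np.maximum(z2, 0.0)
    return h1, z2, w3 @ h2           # scalar output F(x)

def grad_F_wrt_h1(z2):
    # dF/dh2 = w3, dh2/dz2 = 1[z2 > 0], so dF/dh1 = W2^T (w3 * 1[z2 > 0]).
    return W2.T @ (w3 * (z2 > 0.0))

def conductance(x, baseline, steps=200):
    """Riemann-sum approximation of the conductance of each unit in h1:
    the integral of (dF/dh1) . dh1 along the straight path baseline -> x."""
    cond = np.zeros(W1.shape[0])
    h1_prev, _, _ = forward(baseline)
    for k in range(1, steps + 1):
        x_k = baseline + (k / steps) * (x - baseline)
        h1_k, z2_k, _ = forward(x_k)
        cond += grad_F_wrt_h1(z2_k) * (h1_k - h1_prev)
        h1_prev = h1_k
    return cond

x, baseline = rng.normal(size=3), np.zeros(3)
cond = conductance(x, baseline)
_, _, Fx = forward(x)
_, _, Fb = forward(baseline)
# The per-unit conductances should approximately sum to F(x) - F(baseline).
print(cond, "sum:", cond.sum(), "F(x) - F(baseline):", Fx - Fb)
```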
It should be ok to just publish the data.
Gur Huberman asked for my reaction to a recent manuscript, Are CEOs Different? Characteristics of Top Managers, by Steven Kaplan and Morten Sorensen. The paper begins:
Build an automatic alert system to easily moderate content at scale with Amazon Rekognition Video
There has been a steep increase in the number of people creating, watching, and sharing videos. Most of the videos created today are user-generated content, but publishing this raw content comes with risk. To help ensure a positive website experience for customers, companies need a scalable content moderation process for removing inappropriate or unwanted content.
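As a rough sketch of the kind of pipeline the post describes, the snippet below starts an Amazon Rekognition Video content-moderation job on a video stored in S3, polls until the job completes, and prints an alert for each flagged label; the bucket name, object key, and confidence threshold are placeholders, not values from the post.

```python
# Hedged sketch of a simple video-moderation alert flow with boto3.
import time
import boto3

rekognition = boto3.client("rekognition")

def moderate_video(bucket, key, min_confidence=80.0):
    # Start an asynchronous content-moderation job on the S3 video.
    job = rekognition.start_content_moderation(
        Video={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,
    )
    job_id = job["JobId"]

    # Poll until the job finishes (a production system would typically use
    # the SNS NotificationChannel instead of polling).
    while True:
        result = rekognition.get_content_moderation(JobId=job_id)
        if result["JobStatus"] in ("SUCCEEDED", "FAILED"):
            break
        time.sleep(5)

    # Collect (timestamp, label, confidence) for every flagged segment.
    return [
        (d["Timestamp"], d["ModerationLabel"]["Name"], d["ModerationLabel"]["Confidence"])
        for d in result.get("ModerationLabels", [])
    ]

# Example with placeholder bucket/key: raise an alert if anything was flagged.
for ts, name, conf in moderate_video("my-video-bucket", "uploads/clip.mp4"):
    print(f"ALERT at {ts} ms: {name} ({conf:.1f}%)")
```

A production setup would also paginate the GetContentModeration results (via NextToken) rather than reading only the first page as this sketch does.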
Document worth reading: “Radial Basis Function Approximations: Comparison and Applications”
Approximation of scattered data is a common task in many engineering problems. The Radial Basis Function (RBF) approximation is appropriate for large scattered (unordered) datasets in d-dimensional space. This approach is useful for higher dimensions d > 2, because other methods require the conversion of a scattered dataset to an ordered dataset (i.e. a semi-regular mesh obtained by using some tessellation technique), which is computationally expensive. The RBF approximation is non-separable, as it is based on the distance between two points. This method leads to the solution of a linear system of equations (LSE) Ac = h. In this paper several RBF approximation methods are briefly introduced and compared with respect to the stability and accuracy of computation. The proposed RBF approximation offers lower memory requirements and better quality of approximation. Radial Basis Function Approximations: Comparison and Applications
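To make the system Ac = h concrete, here is a generic least-squares RBF fit with a Gaussian kernel; the kernel choice, shape parameter, and test data are illustrative defaults, not the specific method proposed in the paper.

```python
# Hedged sketch: build A[i, j] = phi(||x_i - c_j||) for scattered points x_i
# and centers c_j, solve A c = h in the least-squares sense, then evaluate.
import numpy as np

def rbf_fit(points, values, centers, eps=1.0):
    """Solve A c = h where A[i, j] = exp(-(eps * ||x_i - c_j||)^2)."""
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    A = np.exp(-(eps * d) ** 2)
    coeffs, *_ = np.linalg.lstsq(A, values, rcond=None)
    return coeffs

def rbf_eval(x_new, centers, coeffs, eps=1.0):
    d = np.linalg.norm(x_new[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-(eps * d) ** 2) @ coeffs

# Scattered 2-D data sampled from a smooth test function.
rng = np.random.default_rng(1)
pts = rng.uniform(-1, 1, size=(200, 2))
h = np.sin(np.pi * pts[:, 0]) * np.cos(np.pi * pts[:, 1])
centers = pts[::5]                  # fewer centers than points -> least squares
c = rbf_fit(pts, h, centers, eps=2.0)
approx = rbf_eval(pts, centers, c, eps=2.0)
print("max abs error on the sample points:", np.abs(approx - h).max())
```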