Reading List
By @lucasdicioccio, 597 words, 0 code snippets, 144 links, 0 images.
CS
- composing contracts: an adventure in financial engineering
- Gale-Shapley algorithm (pdf)
- analysis of random forests models
- stochastic neighbor embedding (pdf)
- separation logic for sequential programs
- finger trees explained (YouTube video)
- achieving high-perf the functional way
- grokking the sequent calculus
- a DSL for experimental game theory (pdf)
- data types a la carte (pdf)
- applicative programming with effects (pdf)
- the zipper
- more on zippers
- a program to solve sudokus (pdf)
- generic discrimination (acm)
- probabilistic functional programming
- functional programming for scalable Bayesian modelling
- hyper-loglog
- data streams algorithms and applications
- an online edge-deletion problem
- max-flow and min-cost flow in almost-linear time
- avoiding scalability collapse by restricting concurrency
Systems
nice systems
There is always a host of interesting things to learn from studying large systems.
- The Architecture of Open Source Applications
- Readings in db systems
- How does a database work?
- sqlgraph
- bigtable
- dremel
- raft
- cassandra
- maglev
- tao
- dynamo db
- kafka
- ragel
- graphviz
- graphviz (more)
Haskell Garbage Collection
Functional programming used to have a bad reputation for being slow, mostly because of garbage collection. The Haskell garbage collector (GC) has actually received quite some care, and a number of runtime options and programming constructs make it possible to reduce GC pause times (a small sketch follows the links below).
- Generational GC For Haskell
- Optimising GC Overhead
- Incremental GC
- GHC Memory management
- GHC Special Objects
- Writing Fast Haskell
- Fast XML parser
- Performance Checklist
- Parallelism and GC
- Pusher moving away from Haskell to Go for GC reasons
- Is Haskell real-time ready yet?
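To make this a bit more concrete, here is a minimal sketch (assuming GHC 8.10 or later, with the bundled ghc-compact and containers packages) of two such constructs: a long-lived map is moved into a compact region so that major collections no longer traverse it, and the trailing comments show the RTS flag that enables the non-moving old-generation collector. The data and sizes are invented for illustration, not taken from the links above.

```haskell
-- A minimal sketch, not a benchmark: the Map contents and size are made up.
module Main where

import qualified Data.Map.Strict as Map
import GHC.Compact (compact, getCompact)  -- from the ghc-compact package

main :: IO ()
main = do
  -- A large, long-lived structure that the copying GC would otherwise
  -- traverse (and copy) on every major collection.
  let table = Map.fromList [ (k, show k) | k <- [1 .. 1000000 :: Int] ]
  -- Move it into a compact region: the GC then treats the whole region
  -- as a single object and never walks its internals again.
  region <- compact table
  let frozen = getCompact region
  print (Map.lookup 123456 frozen)

-- Running with the low-latency (non-moving) old-generation collector:
--   ghc -threaded Main.hs
--   ./Main +RTS --nonmoving-gc -N -RTS
```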
metrics and scores
- entropy
- perplexity
- jaccard similarity
- tf/idf
- cosine similarity
- KL divergence
- silhouette score
- davies bouldin
- dunn index
- Calinski Harabasz
- RMS
- estimating likelihood in DL (NLLoss)
- rouge: a package for automatic evaluation of summaries
- greedy function approximation: a gradient boosting machine (pdf)
- random forests (pdf)
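As a quick reminder of what two of the scores above compute, here is a minimal Haskell sketch of Jaccard and cosine similarity; the function names and the tiny inputs are illustrative, not taken from the linked material.

```haskell
import qualified Data.Set as Set

-- cosine(u, v) = (u . v) / (|u| * |v|); no guard against zero vectors here.
cosineSimilarity :: [Double] -> [Double] -> Double
cosineSimilarity u v = dot u v / (norm u * norm v)
  where
    dot a b = sum (zipWith (*) a b)
    norm a  = sqrt (dot a a)

-- Jaccard(a, b) = |a ∩ b| / |a ∪ b|
jaccardSimilarity :: Ord a => Set.Set a -> Set.Set a -> Double
jaccardSimilarity a b =
  fromIntegral (Set.size (Set.intersection a b))
    / fromIntegral (Set.size (Set.union a b))

main :: IO ()
main = do
  print (cosineSimilarity [1,2,3] [2,4,6])                              -- 1.0 (colinear)
  print (jaccardSimilarity (Set.fromList "abcd") (Set.fromList "bcde")) -- 0.6
```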
NN basics
- learning internal representations by backprop
- handwritten digit recognition with backprop (pdf)
- Gradient-Based Learning Algorithms for Recurrent Networks and Their Computational Complexity (pdf)
- long short term memory (pdf)
- efficient backprop (pdf)
- Gradient-Based Learning Applied to Document Recognition (pdf)
- convolutional networks and applications in vision
- On the importance of initialization and momentum in deep learning
- Dropout
- CReLU
- ReLU (pdf)
- Convolution for image classification
- ImageNet Classification with DCNN (pdf)
- AdaDelta
- layer normalization
- dropout layers: Improving neural networks by preventing co-adaptation of feature detectors
nns, llms and transformers
- a practical guide to training RBMs
- reducing the dimensionality of data with NNs
- neural machine translation by jointly learning to align and translate
- generating sequences with RNNs
- pointer sentinel mixture models
- Attention is all you need
- LLMs are unsupervised multitask learners (pdf)
- scalable diffusion models with transformers
- LL-Diffusion-Ms
training, optim
- A General Language Assistant as a Laboratory for Alignment
- Training language models to follow instructions with human feedback
- Fine-tuning LMs for Factuality
- Distilling the Knowledge in NNs
- Training Compute-Optimal Large Language Models
- Distillation scaling laws
reasoning
- Determinants of LLM-assisted decision making
- HybridFlow
- CoT empowers transformers to solve serial problems
- ReAct: Synergizing Reasoning and Acting in Language Models
- llm post-training: a deep dive into reasoning LLMs
- DeepSeek-R1
- deepseek-math
- infinite retrieval
- Learning to Model the World with Language (website)
- world models
- Reasoning with LM is planning with WM
perf optim, deployment
- TinyStories
- Code Llama
- LaTable: towards LTM
- Operationalizing ML
- Scaling laws for Neural LMs
- Rules of ML
- the ultra-scale playbook
- Methods and tools for efficient training on single GPU
- playing pokemon with RL
- OPT: open pre-trained transformer
reviews, courses and misc
- Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
- Visualizing and Understanding Neural Models in NLP
- Generating wikipedia by summarizing long sequences
- Improving language understanding by generative pre-training (pdf)
- Attention Mechanisms
- What is the alignment objective of GRPO?
- Practical DL for coders
- Nvidia Course
- large lambda model - gpt2 inference in haskell
- real-time single image and video super-resolution using efficient sub-pixel CNN (pixel-shuffle)
- deep-dive llama3
- foundations of LLMs
- probabilistic AI
- generalized interpolated discrete diffusion
Optimization
It’s a bit harder to find good blog-style material on #optimization topics. Besides the articles on this site, here are some links.
Mixed topics
- What Sequential Games, the Tychonoff Theorem, and the double-negation shift have in common
- Lazy Time Reversal, and Automatic Differentiation
Static and personal site technology
If you read about how this blog is built, you’ll figure out that I have some affection for personal and personalized websites. Thus I collect other viewpoints on how people built their websites (do not hesitate to send me yours).
- My Website is one binary by jes
- XHTMLBoy’s Website recipe
- A list of Cool things people do with their blogs by Wouter Groeneveld
- Theming Static Sites