Curiosities for July 2022
Stressing myself out on history books, a few notes from my blogpost, etc.
Howdy folks,
Wow, July took its time. I don’t know about you, but the nonstop rains here in Manila made time seem slow. But it’s always a good time to collect and share some curiosities! So strap in and enjoy this month’s finds!
📝 From the blog
I worked on a spaCy pipeline component called the Span Categorizer a few months ago. It’s one of the best deep dives I’ve done in an open-source library, and I learned a lot. If you ask me what the most exciting part of the spaCy codebase is, I’ll say it’s the registry system. It’s an interesting API design that, in my opinion, streamlines a lot of class loading and serialization. Under the hood, it uses a library called catalogue. You won’t notice this if you only interact with spaCy through the command line. Still, as a library maintainer, it’s fun to play around with these components! I wrote a walkthrough on my blog, using the spancat architecture as an example. If you want to add new patterns to your API design toolkit, you’ll learn a lot from how spaCy does it internally.
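If you just want a feel for the pattern, here is a toy sketch in plain Python. To be clear, this is not spaCy's or catalogue's actual code: the Registry class, the resolve helper, and the "toy.SpanScorer.v1" name are all made up for illustration. The only thing borrowed from spaCy is the idea of a config key such as "@architectures" that points to a registered function by its string name.

```python
from typing import Any, Callable, Dict


class Registry:
    """Toy registry (hypothetical): maps string names to factory functions."""

    def __init__(self, namespace: str) -> None:
        self.namespace = namespace
        self._functions: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        """Return a decorator that stores the decorated function under `name`."""

        def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
            if name in self._functions:
                raise ValueError(f"'{name}' is already registered in {self.namespace}")
            self._functions[name] = func
            return func  # the function itself is left unchanged

        return decorator

    def get(self, name: str) -> Callable[..., Any]:
        """Look up a registered function by its string name."""
        if name not in self._functions:
            raise KeyError(f"No function named '{name}' in {self.namespace}")
        return self._functions[name]


# One registry per kind of component (hypothetical namespace).
architectures = Registry("architectures")


@architectures.register("toy.SpanScorer.v1")
def make_span_scorer(threshold: float = 0.5) -> Callable[[float], bool]:
    """Factory for a trivial 'keep this span?' rule."""

    def keep(score: float) -> bool:
        return score >= threshold

    return keep


def resolve(config: Dict[str, Any]) -> Any:
    """Build an object from a config dict that names a registered function."""
    params = dict(config)  # copy so the caller's config is not mutated
    factory = architectures.get(params.pop("@architectures"))
    return factory(**params)


# The config only holds strings and numbers, so it is easy to save and reload.
scorer = resolve({"@architectures": "toy.SpanScorer.v1", "threshold": 0.8})
print(scorer(0.9), scorer(0.5))
```

The design choice worth noticing is that the config never stores the function itself, only its name and arguments. That is what makes the loading and serialization story simple: anything that can be named can be rebuilt later.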
If you're a data science practitioner, have you ever encountered cases where your model works well on your test set but fails terribly in the wild? I investigated this phenomenon and concluded that the way we split our data may do us a disservice by overestimating our model's capability. Two papers, "We need to talk about standard splits" (Gorman and Bedrick, 2019) and "We need to talk about random splits" (Søgaard et al., 2021), study this effect on different natural language processing (NLP) tasks. It's pretty fun! I got intrigued by this concept and did my own little experiment for NER. I also explored ways to do alternative splits, i.e., creating a "worst" test set to get a more realistic range of performance for your model. I ran out of ideas for an edgy title, so I named the blog post "Your train-test split may be doing you a disservice."
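To make the "alternative split" idea concrete, here is a minimal sketch in plain Python. It is only one possible heuristic, assuming that longer sentences are harder for the model. It is not the exact procedure from my experiment or from either paper, and the function names and toy sentences are made up for illustration.

```python
import random
from typing import List, Sequence, Tuple


def random_split(
    examples: Sequence[str], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Usual baseline: shuffle, then hold out a random slice as the test set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def length_split(
    examples: Sequence[str], test_fraction: float = 0.2
) -> Tuple[List[str], List[str]]:
    """Heuristic 'hard' split: the longest sentences go to the test set.

    Assumption: length (in whitespace tokens) is a rough proxy for difficulty.
    """
    ordered = sorted(examples, key=lambda text: len(text.split()))
    n_test = max(1, int(len(ordered) * test_fraction))
    return ordered[:-n_test], ordered[-n_test:]


# Toy data, made up for this example.
sentences = [
    "Manila is rainy .",
    "I use a Lamy 2000 every day .",
    "spaCy has a registry system .",
    "Amartya Sen wrote Development as Freedom , which I highly recommend reading .",
    "Aaron Reed gave a talk at Narrascope about fifty years of text games .",
]

train, test = length_split(sentences, test_fraction=0.4)
for text in test:
    print(len(text.split()), text)
```

Evaluating on both the random split and the length-based split gives you a range instead of a single number: the random split is the optimistic end, and the heuristic split is a more pessimistic one.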
Outside of machine learning, I wrote a blog post that serves as a love letter to my analog writing tools, i.e., pen and paper. I'm a stationery geek, and I've always been very productive with just pen and paper. Last year, I tried using Obsidian and some personal knowledge management (PKM) frameworks. Still, I got overwhelmed in the end and decided that it was not for me. Of course, I still use Obsidian for collecting info, but all my thinking has been guided by writing with pen and paper.
I enjoyed writing this blog post! I even decided to share it on Hacker News. It got a lot of traction (and, of course, attracted its fair share of pedantic comments). But I really enjoyed the long thread about fountain pens and notebooks. I use a Lamy 2000 as my daily driver. To be honest, my “search” for the best fountain pen ended after I used a Lamy. It has a bit of heft and writes smoothly. If I were to buy another fountain pen, it would probably be a Pilot 823. It has a large ink capacity and a very classic appearance. Neil Gaiman also uses it! To be honest, I just need someone to geek out with about stationery!
Recently
📖 Reading
I’ve been rereading Guns, Germs and Steel, as I admit that I only skimmed through it the first time. Aside from that, these two books are hot on my bookshelf:
Why Nations Fail by Acemoglu and Robinson. Man, I really don't know where to start. This book has been hyped up, but I was slightly disappointed by how simplistic the analysis is. The central thesis of the book is: inclusive institutions, good. Extractive institutions, bad.[1] That's true, but it fails to explain what makes an institution inclusive or extractive. It's also funny that there seems to be a slight glorification of Western examples (Britain's parliament is inclusive, etc.) and an "after-the-fact" explanation of what's happening in the Global South. Bill Gates feels the same way, and Walden Bello (a prominent social activist in the Philippines) has some harsh thoughts too.
Development as Freedom by Amartya Sen. I got so stressed reading Why Nations Fail that I went back and read a copy of this book. I appreciate that this book considers sustainable development as a framework, not an effect. Development is not as simple as "increase X, then Y happens." Instead, it's a series of overlapping mechanisms that are integrated. I highly recommend this book!
Anime and Manga
Aside from that, Chainsaw Man Part 2 has started its weekly release, so I’ve been catching up on that. Fujimoto’s really awesome. If a 190-chapter manga isn’t your cup of tea, I recommend checking out his other works, Sayonara Eri and Look Back. They are one-shots, consisting of at most a hundred pages. Lately, I’ve also picked up Ayashimon (from Yuji Kaku, the creator of Jigokuraku). It is set in a modern Japan where monsters run gangs and bike squads. The premise is interesting and I love the story, but unfortunately the manga got cancelled (albeit ending on a hopeful note).
Interactive Fiction
Speaking of fiction, I joined the online Narrascope Conference last week! It’s a conference about interactive fiction, adventure games, and nonlinear narratives. It’s interesting to see a combination of writers and software developers in one place! To start, I highly recommend watching Aaron Reed’s talk, “5 Lessons from 50 Years of Text Games.” It was pretty fun:
You might want to supplement this with Game Maker’s Toolkit’s (not Narrascope related) deep dive into Return of the Obra Dinn. Narrascope also had a workshop on Ink, a scripting language for making text-based games. Pretty cool! If you want something more generative than “rules-based,” why not read this awesome Wordplay paper, “Craft an Iron Sword: Dynamically Generating Interactive Game Characters by Prompting Large Language Models Tuned on Code” (don’t forget to check out Lynn Cherny’s keynote on Text Toys and Glitch Poetics).
🎮 Playing
Funnily enough, I haven’t played anything new this month. I know I’m missing out on Stray and the PC release of Final Fantasy VII Remake. So yeah, it’s quite sad. Nevertheless, I want to share a few fun co-op games I played with my girlfriend last month.
Nidhogg 2. This is one of the most fun couch games ever. I played this game drunk, sober, and happy. It’s a straightforward combat game. There’s a 1v1 mode where you try to reach the end of the game screen using a variety of weapons. The music, as it turns out, is soo groovy. I mean, listen to this:
Groovy, huh? Listen to this personal favorite of mine. It’s just “Oh Yeah” all throughout:
Playing Nidhogg 2 is always a good time. This is a good game for when you have friends coming over and just want to try something different from the usual Super Smash Bros.
Towerfall Ascension. This is the game that both of us enjoyed. The cooperation element here is too slick. Towerfall is from the creators of Celeste, a personal favorite of mine. It’s an archery combat platformer that you can play with up to 3 more friends. You have a limited number of arrows to hit enemies, but you can pick them up again. It shines best once you and your partner start synergizing: she shoots an arrow here, I pick it up then shoot it somewhere else. It’s a game with simple mechanics but a very high skill ceiling. Watch some high-level gameplay here:
And that’s it! You might notice that I haven’t watched anything new today. I don’t know. No new series have interested me recently. I tried watching Money Heist (some of you know how I love heist movies), but the first episode didn’t really steal my heart. I might give it a try again, but if there’s anything you’d recommend, go ahead and email me!
Wow, we’re way past the first half of this year. Looking back, time flies really fast. I wish you well in the coming -ber months! Stay healthy :)
Best,
Lj
[1] I know that this is an oversimplification of their work.