The Boring Future of GenAI
Today, most people use LLMs as an information lookup and generation system.
Develop a personalized workout plan for me.
Summarize and respond to this email.
Give me some recipes I can make with these ingredients.
Without some new breakthrough in AI that opens the door to AGI, we probably won’t get any novel answers when we ask ChatGPT how to monetize it.
Big Data Tar Pits
What is Big Data
My personal definition of Big Data is:
Any Data Processing that requires special considerations to overcome time and/or space limitations. Yours Truly
Which lines up pretty closely with Wikipedia’s definition of Big Data:
Big data primarily refers to data sets that are too large or complex to be dealt with by traditional data-processing application software. Wikipedia
In essence, “Big Data” is an ambiguous term that is generally understood to mean any data that can’t be processed using “traditional” data processing mechanisms, for some definition of traditional. The need for special data processing usually stems from either the speed at which the data is changing, called the data velocity, or the size of the data, called the volume.
Managing big data usually means designing a system that distributes the data, and the processing of the data, across multiple compute and storage nodes in a cluster, aka a Distributed Data Processing System. Even if the exact mechanics of distributing the data and computation are abstracted behind a framework like Hadoop, Spark, or Flink, the distributed nature of the system needs to be taken into account.
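To make the idea of distributing both the data and the processing a little more concrete, here is a minimal single-machine sketch in Python. It only imitates the shape of a distributed job: the "nodes" are plain list partitions, and the function names (partition, map_count, merge, word_count) are made up for illustration. It is not how Hadoop, Spark, or Flink are implemented.

```python
from collections import Counter
from functools import reduce


def partition(records, num_nodes):
    """Split records round-robin across a hypothetical number of nodes."""
    return [records[i::num_nodes] for i in range(num_nodes)]


def map_count(chunk):
    """Work a single node does locally: count the words in its own chunk."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def merge(left, right):
    """Combine two partial results; order does not matter for word counts."""
    return left + right


def word_count(records, num_nodes=3):
    # Each partition is processed independently, then the partial results
    # are merged. In a real cluster these steps would run on separate machines.
    partials = [map_count(chunk) for chunk in partition(records, num_nodes)]
    return reduce(merge, partials, Counter())
```

The part that carries over to real systems is the constraint: each node only sees its own partition, so the per-node work has to produce partial results that can be merged afterwards.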
What is the Fediverse
By the time you are reading this you’ve probably already heard of “The Fediverse”, or at least Mastodon. At the time I am writing this post, the Fediverse is the fastest growing social media platform. At the same time, most people don’t know what the Fediverse is or what makes it different from Facebook, TikTok or X, formerly Twitter.
Let’s start with what the Fediverse is. Wikipedia defines the Fediverse as:
The fediverse (a portmanteau of “federation” and “universe”) is an ensemble of federated (i.e. interconnected) servers that are used for web publishing […] and file hosting, but which, while independently hosted, can communicate with each other.
More commonly, the Fediverse is defined as the set of applications (e.g. Mastodon or PeerTube) that can communicate using the ActivityPub protocol, the servers and groups that host the applications, and the users of those applications and the content they create. All of that together makes up the Fediverse. The key difference between the Fediverse and traditional social media sites like Facebook or YouTube is the addition of the ActivityPub protocol and independent servers into the mix. Anyone can set up a Mastodon server, and anyone can write a new application that uses the ActivityPub protocol, and that server or that application can, within reason, communicate with any other server or application in the Fediverse.
DNS security has been getting a lot of attention these past couple of years. This has led to a number of DNS security-enhancing standards being proposed, with the three big ones being DNS-over-TLS, DNSSEC and DNS-over-HTTPS. In this article we will discuss all three of those standards, the threat model they assume and what protection they provide.
DevComo Bitcoin Transactions Presentation
Here is a link to the slides for a presentation I gave at DevComo describing Bitcoin Transactions and how they work: The Anatomy of Bitcoin Transactions
FCC Filing in Support of Net Neutrality
With the internet having become an integral part of Americans’ lives, it is necessary to protect and preserve free, open and nondiscriminatory internet access for all of us. Internet Service Providers are tasked with connecting users with the content and services available on the internet, not with regulating and managing what content users are able to connect to and how they connect.
It would be unacceptable for a phone company to redirect phone calls from one business to another and would be unfair for a phone company to charge different rates for equal service to two equal businesses. It would also be scandalous for a phone company to record customers’ phone conversations and then sell that data in order to inject advertisements into a customer’s telephone conversation.
In the same manner, it is and should be unacceptable for an Internet Service Provider to redirect requests from one website to another, and for an ISP to provide more bandwidth when requesting one site over another. It should be illegal for an ISP to record a customer’s web history and later sell that history to advertisers in order to inject targeted ads into the pages a customer has requested.
Title II places restrictions on phone companies that both protect consumers and create fertile ground for a healthy and robust communication infrastructure. In the same manner, the public needs restrictions on ISPs to both protect consumers from ISP overreach and create a healthy and fertile internet communication infrastructure that benefits all.
Introduction
Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety.
Benjamin Franklin
There has been a recent push by governments and government agencies to ban End-to-End Strong Encryption, all under the guise of stopping terrorists and pedophiles. My hope here is to provide simple arguments in layman’s terms as to why banning strong end-to-end encryption will not improve the government’s, nor its agencies’, ability to catch the “Bad Guys”. In fact, banning strong encryption will only demonize legitimate users of strong encryption and undermine the rest of the population’s security, while having almost no effect on those who wish to use it for nefarious means.
Markov Chains
Introduction
A Markov Chain is a set of transitions from one state to the next, such that the transition from the current state to the next depends only on the current state; the previous and future states do not affect the probability of the transition. A transition’s independence from past and future states is called the Markov Property. What we are going to do is explore Markov Chains through a little story and some code.
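As a small taste of what that looks like in code, here is a minimal Python sketch of a two-state chain. The weather states, the probabilities, and the function names (next_state, walk) are all invented for illustration. The thing to notice is that next_state only ever looks at the current state, which is the Markov Property in action.

```python
import random

# Hypothetical two-state weather model; states and probabilities are made up.
# Each row holds the probabilities of moving from that state to the next one.
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}


def next_state(current, rng):
    """Pick the next state using only the current state (the Markov Property)."""
    options = TRANSITIONS[current]
    return rng.choices(list(options), weights=list(options.values()), k=1)[0]


def walk(start, steps, seed=None):
    """Generate a chain of steps + 1 states, beginning at start."""
    rng = random.Random(seed)
    chain = [start]
    for _ in range(steps):
        # The history in chain is never consulted, only its last entry.
        chain.append(next_state(chain[-1], rng))
    return chain


if __name__ == "__main__":
    print(" -> ".join(walk("sunny", 7, seed=42)))
```

Passing a seed makes a run repeatable, which is handy when you want to compare two walks side by side.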
MiniMax and Tic-Tac-Toe
What it is
Let’s start this post off with some techno-babble. Minimax is a depth-first search algorithm for finding the least-loss strategy in zero-sum, two-person, turn-based games.
A zero-sum game is a game where, if one player loses, then another player must win the same number of points, and vice versa. For example, chess is a zero-sum game because if one player wins (let’s say winning a score of +1) then the other player must lose (getting a score of -1), or the game is a draw and both players get a score of 0. We don’t give players that lose a score of 0 and the winner +1 because that would mean that one player won the game and the other player is at a draw, which is impossible in chess. The same goes for other games such as tic-tac-toe, Go, and four in a row. Another set of zero-sum games are games where the total number of points is constant (think poker where players can’t buy chips) and if someone wins (a bet) then the rest of the players must lose points (money) equal to the number of points won.
Depth-first search means that we follow the algorithm all the way down until the end, and then start moving backwards, looking for the best move as we go. The reason why we have to follow the algorithm all the way to the end will become clear soon.
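Here is a minimal Python sketch of that idea applied to tic-tac-toe, using the +1 / -1 / 0 scoring described above. The board representation (a list of nine cells holding "X", "O", or " ") and the function names are assumptions made for this example. It searches the whole game tree with no pruning or caching, so treat it as an illustration rather than a tuned implementation.

```python
# Index triples that make a winning line on a 3x3 board stored as a flat list.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def minimax(board, player):
    """Return (score, move) from X's point of view: +1 X wins, -1 O wins, 0 draw."""
    won = winner(board)
    if won == "X":
        return 1, None
    if won == "O":
        return -1, None
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if not moves:
        return 0, None  # board is full and nobody won: a draw

    best_score, best_move = None, None
    for move in moves:
        board[move] = player  # try the move
        score, _ = minimax(board, "O" if player == "X" else "X")
        board[move] = " "     # undo it before trying the next one
        better = (
            best_score is None
            or (player == "X" and score > best_score)  # X maximizes
            or (player == "O" and score < best_score)  # O minimizes
        )
        if better:
            best_score, best_move = score, move
    return best_score, best_move


if __name__ == "__main__":
    # X holds cells 0 and 1, O holds cells 3 and 4, and it is X's turn.
    print(minimax(list("XX OO    "), "X"))  # (1, 2)
```

Scores are only known once a game has actually ended, which is why the recursion has to run all the way down to a finished board before any move higher up can be compared against its alternatives.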
What Tor Users Connect To
The other day I was playing with my Tor node’s configuration when I found out that with absolutely no trickery you can get Tor to log some interesting addresses. One of the things Tor logs is the address of edge (aka exit) connections, which tells you what the node is connecting to but not who the connection is for.
Naturally it would be interesting to know what other people are using Tor for, so I kept a couple of days’ worth of logs and wrote this little Perl script to parse them out for me.