Show HN: The Population Project

Two years ago, I turned 50. After a successful career as an entrepreneur, a business angel and a novelist, I set out to start a philanthropic venture under the following constraints:
- it had to be global.
- it had to be beautiful (in my eyes, at least).
- it had to be technology and stats driven.

I decided I would try to list the full name and date of birth of all humans alive. While some may find the concept pointless, I immediately knew I had struck gold:
- it was global and incredibly hard.
- it had an almost artistic quality to it, like an ever-changing installation.
- as a libertarian, I resent that states conduct censuses and then sit on the data.
- one billion people in the world aren't officially registered. At least someone would acknowledge their existence.

I created a non-profit called The Population Project. I would never make a dime off it, but at least my costs would be tax-deductible.

I then started researching lists of names online. I quickly adopted two principles. First, I would collect a minimal set of information: full name, birth date, and birth place. Second, I would only scrape public information, i.e. nothing behind a password.

After a few months, I realized I needed help from more experienced developers. I chose to work on 4D, a platform I had used in the past to develop my company's information system. It was a tough choice: 4D is not a leading player in the back-end world, but I figured the growth of API tooling would make language choice less critical.

The first iteration of our database was frustrating: it was way too slow to put behind a public website. I learned the power of incremental change, with each marginal improvement saving you a few percent of speed or space. I also got to implement concepts I had heard about but never used, such as mirroring, partitioning, or hash-indexing.

Then I hired a team of six data processors in Madagascar who clean up and process the lists found online. Lots of Python and Excel macros in their day-to-day. I have instilled in them an obsession with quality: a bad record will sit in our database forever. After trying dozens of tools, we've settled on Adobe Acrobat and Octoparse.

The final piece was the website. I lucked out in finding a strong team in Romania. They build with Next.js and deploy on Vercel. I gave them Wikipedia as the model to aim for. We/they haven't been able to match Wikipedia's simplicity. Our pages are too heavy. But I find the site user-friendly, pleasing to the eye and reasonably fast. We can and we will do better.

A word about privacy. Some people complain that because it publishes names and DOBs, the Population Project infringes on their privacy. We obviously don't see it that way.
- All our info is public. That DOB you find on the site is probably in the voter list of your state, a list that anyone can request or simply download.
- The info we publish is minimal. Basically, we say that you exist. No one will find anything about your race, religion, sexual preferences, job or income.
- We have adopted Wikipedia's privacy policy. We do not record your IP, unless you create or edit a record.
- We're using Matomo for our analytics. Great stuff. It's not free, but unlike GA, they do not use your data.

Why am I telling you all this? From the beginning, I've envisioned a three-step process:
1) Build the database and populate it with millions of Western profiles.
2) Launch the site, where anybody can create or edit records and share them with their family.
3) When we've reached critical mass (1B records?), start making deals with NGOs and governments, and venture into other alphabets.

We have just completed step 1. Step 2 is daunting as hell. I have grown a business but I have never grown a website. While I am ready to spend a bit of money on PR or SEO, I am not delusional: to reach the level of success we have in mind, we need this thing to go (somewhat) viral. How do you do that?
https://ift.tt/F7E8k4x August 7, 2023 at 09:16PM