Paul's List of Database Driven Project Ideas

PaulH · November 12, 2018, 8:19pm

Here are some ideas using public domain data from data.gov. This would be data anyone can use and make money from…
https://www.data.gov/

These are just rough ideas; you will have to dig through the data to refine them.

Create a map explorer of poverty levels and education levels with this data:
https://catalog.data.gov/dataset/county-level-data-sets
(Can also add correlations)
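A minimal sketch of the correlation step, assuming the county data has already been reduced to one poverty figure and one education figure per county. The rows below are made up for illustration and are not taken from the data.gov files; the column meanings are assumptions too.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical rows: (county, poverty %, % of adults with a bachelor's degree)
sample_counties = [
    ("County A", 10.0, 40.0),
    ("County B", 15.0, 30.0),
    ("County C", 20.0, 20.0),
    ("County D", 25.0, 10.0),
]
poverty = [row[1] for row in sample_counties]
degree = [row[2] for row in sample_counties]
r = pearson(poverty, degree)  # negative r: more poverty goes with fewer degrees
```

With real data the value would land somewhere between -1 and 1, and could be shown next to the map as a single summary number.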

Make a filter table for economic data, with other data visualisations:
https://catalog.data.gov/dataset/statistics-of-us-businesses-susb

Make a real-time map of air quality:
https://catalog.data.gov/dataset/air-quality-measures-on-the-national-environmental-health-tracking-network

PaulH · November 14, 2018, 11:22pm

Another idea (though not necessarily database driven) would be to create a program that trains people to make better decisions per prospect theory.

PaulH · December 27, 2018, 1:26am

The NIH database of toxic substances is very cool and in the public domain: https://toxnet.nlm.nih.gov/

How cool would it be to scan the barcode of a product and have the toxic substance warnings appear?

Plus a nice clear reading of safe levels etc…

Josh · December 27, 2018, 4:16am

I wonder if open-source machine learning libraries are good enough yet to identify specific products. That could be interesting.

PaulH · December 27, 2018, 4:12pm

It would be nice to build an ingredient database if one is not out there already. Also the toxnet data is not formatted well for the “toxic” levels of certain chemicals, so it would be a job to format that as well.

PaulH · March 29, 2019, 7:01pm

Chatting with Deena, another nice idea would be to enter some text and have a program go through peer-reviewed literature for good references.

Deena · March 29, 2019, 7:21pm

Hey Paul,
Can you give us an example and a link to your website, to show where the reference would be cited? -D

PaulH · March 29, 2019, 7:44pm

Thanks for the interest Deena.

Here is an article: https://www.myfooddata.com/articles/foods-high-in-calcium.php

References are linked by number beside the claim. Then there is a references section at the bottom. I don’t really follow proper citation guidelines…though maybe I should change that. Or I might just try the publication title and journal. Names always had odd characters which caused rendering problems…

Josh · March 29, 2019, 9:42pm

I’m not sure exactly what you’re looking for, but there are some “APIs for scholarly resources”:

Home - Resources and Tools for Computational Research - LibGuides at MIT Libraries
scholarly · PyPI
https://scholar.google.com/ (probably has CAPTCHAs to interfere with automation)

PaulH · August 9, 2019, 5:59pm

Random thought: LiChess has a “luck” factor in your profile that quantifies when you have won because of a “blunder” of your opponent vs your skill. It would be nice to get the aggregate luck data and see the distribution. Are there people who really are luckier than others, or does everyone have the same average luck? (In terms of chess)

bdjewkes · August 9, 2019, 7:05pm

because of a “blunder” of your opponent vs your skill.

The interesting question to me here is how they’re defining a ‘blunder’. Are they able to determine a game-losing misplay, or a move that is unexpectedly bad given a player’s rating? Or are their blunders just a coarse “you’ve beaten someone of equal skill, therefore the loser made a mistake”?

If it’s the latter, your luck factor would just be a leading signal that your rating needs adjustment.
If it’s the former, luck sounds more proximate to weaknesses in the rating algorithm than anything else… which might actually be interesting to look at.

PaulH · August 9, 2019, 7:29pm

Chess is a defined game, and a computer engine can search through the possible continuations. From the total basket of moves it can then rate which move is “best”. If the move you make is 2 standard deviations from the basket of “good” and “best” moves, that counts as a blunder.



inaccuracy = 0.5-1 deviation (engine proposal - your move)
mistake = 1-2 deviations
blunder > 2 deviations
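The thresholds above can be sketched as a small classifier. This is only an illustration of the bands as written in this post, not LiChess's actual code. The function name and the handling of the exact boundaries (1 counts as a mistake, 2 still counts as a mistake because a blunder is strictly greater than 2) are assumptions.

```python
from collections import Counter

def classify_move(deviation):
    """Label a move by how far it falls from the engine's proposal."""
    if deviation < 0:
        raise ValueError("deviation must be non-negative")
    if deviation > 2:
        return "blunder"
    if deviation >= 1:
        return "mistake"
    if deviation >= 0.5:
        return "inaccuracy"
    return "ok"

# Hypothetical deviations for the moves of one game
deviations = [0.1, 0.6, 1.4, 2.5, 0.0]
counts = Counter(classify_move(d) for d in deviations)
```

Counting the labels per game like this is the kind of tally a "luck" number could be built on, for example wins in games where the opponent has at least one blunder.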

So in this case, if you are in too many games where your opponent blunders, your rating would go up. I think the rating system is purely based on number of games played, and then winning and losing based on the rating of the opponent.
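To make the rating point concrete, here is a rough sketch using the classic Elo update as a stand-in. This is an assumption for illustration only: LiChess reportedly uses Glicko-2, which also tracks a rating deviation, and the K-factor of 20 below is arbitrary.

```python
def expected_score(rating, opponent_rating):
    """Elo expected score (between 0 and 1) against a given opponent."""
    return 1 / (1 + 10 ** ((opponent_rating - rating) / 400))

def update_rating(rating, opponent_rating, score, k=20):
    """New rating after one game; score is 1 for a win, 0.5 draw, 0 loss."""
    return rating + k * (score - expected_score(rating, opponent_rating))

# A win against an equally rated opponent, however it came about
new_rating = update_rating(1500, 1500, 1.0)
```

The update only sees the result and the opponent's rating, so a win handed over by an opponent's blunder moves the rating exactly as much as a win earned by good play.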

Interesting questions though, thanks for the interest!
Also, from the slack:
I was talking with other people about this concept of going on a “streak” with many wins in a row, and so on. The luck data matches that. Other times, I feel like nothing works. Kind of reflects “good days” and “bad days”. There probably are studies out quantifying

Also RE: database driven projects – I am not sure that LiChess would release everyone’s luck factor, but it would be interesting! One controversial statement is that “good” players are less likely to be influenced by luck than “bad” players… hmm…

gkarp · August 17, 2019, 10:05pm

Luck would be pretty interesting on a short time scale, like analyzing the results of a specific tournament. Though I bet that the rate of blunders gets really small among higher skilled players, to the point where it wouldn’t have a significant impact.
On a related topic, I wonder if certain openings lead to games with a higher rate of blunders. For example, you could look at your game history and identify that you make a lot more blunders when you play as black against the King’s Gambit.
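A minimal sketch of that opening analysis, assuming each game in your history has already been reduced to a record with the opening name, your color, your blunder count, and your move count. The field names and sample games are hypothetical, not a real export format.

```python
from collections import defaultdict

def blunder_rate_by_opening(games, color=None):
    """Blunders per move for each opening, optionally for one color only."""
    totals = defaultdict(lambda: [0, 0])  # opening -> [blunders, moves]
    for game in games:
        if color is not None and game["color"] != color:
            continue
        entry = totals[game["opening"]]
        entry[0] += game["blunders"]
        entry[1] += game["moves"]
    return {name: b / m for name, (b, m) in totals.items() if m > 0}

# Made-up game records for illustration
sample_games = [
    {"opening": "King's Gambit", "color": "black", "blunders": 3, "moves": 30},
    {"opening": "King's Gambit", "color": "black", "blunders": 1, "moves": 10},
    {"opening": "Sicilian Defense", "color": "black", "blunders": 1, "moves": 40},
    {"opening": "King's Gambit", "color": "white", "blunders": 0, "moves": 25},
]
rates = blunder_rate_by_opening(sample_games, color="black")
```

Dividing by moves rather than by games keeps long games from looking worse than short ones. With only a handful of games per opening the rates will be noisy.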