Adam Oudad

(Machine) Learning log.

Machine learning requires data. One way to obtain it is to scrape or retrieve it directly from a source. Another is to reuse datasets that are freely available on the internet and purpose-built for the most common tasks. Data scientists spend so much time searching for and processing data that Google has built its own Dataset Search tool. There are also large lists of datasets for research purposes, such as this one.

Here is my personal list of datasets I have used or am interested in using, organized by type of data.

Music

Name | Notes
--- | ---
MIDI DB | midi, music
Feelyoursound | music, chord, progression
Jingle | jingle, music
Lakh pianoroll | midi, pianoroll, music
RWC | popular, music
Composing.ai | midi
Chord progressions of 5000 songs | chords

Links to dataset lists

Lyrics

MoodyLyrics is a sentiment-annotated lyrics dataset. DALI is a dataset of lyrics in which words, lines and paragraphs are aligned with notes.

Text

Name | Notes
--- | ---
Sentiment140 | sentiment, twitter
Twitter 2010 | twitter
Twitter sentiment | twitter, sentiment
OPUS | parallel, translation
Emergent | stance, politics
Europarl | speech
DEFT | speech, french

Links to dataset lists

About

This website is a weblog where I write about computer science, machine learning, and language learning.