QScore¶
What is QScore?¶
QScore is a competition platform for Data Science.
It is simple, scalable and can host your competition in a minute.
It works with Node.js, Python, RabbitMQ, Redis, Auth0, AngularJS’s CoreUI and it is open source!
Why did we create QScore?¶
QScore can handle a large number of users in a short time.
During the competition “Le Meilleur Data Scientist de France 2018”, we saw peaks of 300 submissions in less than 5 seconds. Most of the open source platforms we tested did not hold up under that load.
Who uses QScore?¶
QScore is used by Zelros for “Le Meilleur Data Scientist de France 2018”.
Documentation¶
You can begin with My first submission or look at the Changelog.
Then continue with Installation, and become an expert with Advanced.
My first submission¶
Register to the competition¶
TODO: To be written
Get all the data & tutorial¶
TODO: To be written
Open the tutorial notebook¶
TODO: To be written
Set your submission key¶
TODO: To be written
Submit a prediction¶
TODO: To be written
The Apache 2.0 License¶
Copyright 2018 Fabien Vauchelles
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Simple installation¶
Recommended requirements¶
We recommend, but do not require, a virtual machine with these specifications.
Hardware¶
- RAM: 8 GB
- vCPU: 2
- HDD: 10 GB
Software¶
- OS: Ubuntu/Debian
- Node.js: 8.9
- Docker: 18.03-ce (with docker-compose)
Clone the repository¶
Clone the QScore repository:
git clone https://github.com/fabienvauchelles/qscore.git
Go to the qscore directory:
cd qscore
Configure parameters¶
Go to the deployment/simple directory:
cd deployment/simple
Copy the configuration template:
cp variables.example.env variables.env
Fill in the missing parameters in variables.env:
Parameter | Description | Example |
---|---|---|
AUTH_PLAYER_ISSUER | Use Domain from Auth0. Template is: https://<domain>/ | https://stuff.eu.auth0.com/ |
AUTH_PLAYER_JWKS_URI | Use Domain from Auth0. Template is: https://<domain>/.well-known/jwks.json | https://stuff.eu.auth0.com/.well-known/jwks.json |
NG_QS_AUTH_PLAYER_AUDIENCE | Use Identifier from Auth0 | https://www.stuff.com |
NG_QS_AUTH_PLAYER_CLIENT_ID | Use Client ID from Auth0 | 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ |
NG_QS_AUTH_PLAYER_DOMAIN | Use Domain from Auth0 | stuff.eu.auth0.com |
NG_QS_AUTH_PLAYER_REDIRECT_URI | Use your server URL like http://<your server url>/callback | http://localhost:3000/callback |
AUTH_ADMIN_SECRET | Use a random string | FgkqZ41Qlal410q40calw412SQSF |
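Before loading the environment, it can help to check that every parameter from the table has a value. The sketch below is not part of QScore: `parse_env` and `missing_parameters` are hypothetical helpers, and the file is assumed to contain plain `KEY=VALUE` lines with `#` comments.

```python
# Parameter names copied from the table above.
REQUIRED = [
    "AUTH_PLAYER_ISSUER",
    "AUTH_PLAYER_JWKS_URI",
    "NG_QS_AUTH_PLAYER_AUDIENCE",
    "NG_QS_AUTH_PLAYER_CLIENT_ID",
    "NG_QS_AUTH_PLAYER_DOMAIN",
    "NG_QS_AUTH_PLAYER_REDIRECT_URI",
    "AUTH_ADMIN_SECRET",
]


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_parameters(text, required=REQUIRED):
    """Return the required parameters that are absent or empty."""
    values = parse_env(text)
    return [name for name in required if not values.get(name)]


# Inline sample (assumption): one parameter filled, one left empty.
sample = "# Auth0 settings\nAUTH_ADMIN_SECRET=abc\nAUTH_PLAYER_ISSUER=\n"
print(missing_parameters(sample))
```

To check the real file, pass the content of variables.env to `missing_parameters` instead of the inline sample.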
Load the environment¶
From the deployment/simple directory, export the variables into your shell:
export $(cat variables.env | grep "^[^#]" | xargs)
Deploy the project¶
From the deployment/simple directory, build and start the containers:
docker-compose build
docker-compose up -d
Connect to the interface¶
See Connect to QScore.
Make yourself an admin¶
See Be an admin.
Create your first competition¶
See My first competition.
Create your own scorer¶
Create the scorer¶
Step 1: Create a new directory for your scorer¶
- Go to the score-engine/src/scorers directory
- Create a new directory for your scorer:
mkdir myscorer
Step 2: Create a new scorer¶
Create a new scorer file __init__.py:
# -*- coding: utf-8 -*-
from .. import BaseScorer
import pandas as pd


class Scorer(BaseScorer):
    def __init__(self):
        super().__init__()

    def score(self, data_submission):
        # Read the submitted CSV (anything pd.read_csv accepts)
        df_submission = pd.read_csv(data_submission)
        # Placeholder: replace with your own score computation
        score = 0.0
        return score
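As an illustration of what the score computation might look like, here is a minimal sketch of a scorer that returns the mean absolute error against a reference. It makes several assumptions: `BaseScorer` is replaced by a local stand-in so the snippet runs on its own, the `id` and `value` column names are hypothetical, and the reference data is an inline string, whereas a real scorer would read a reference file shipped in its own directory.

```python
import io

import pandas as pd


class BaseScorer:
    """Stand-in for QScore's BaseScorer (assumption, for a standalone run)."""

    def __init__(self):
        pass


# Hypothetical reference data; a real scorer would load a CSV file instead.
REFERENCE_CSV = "id,value\n1,10.0\n2,20.0\n"


class Scorer(BaseScorer):
    def __init__(self):
        super().__init__()
        self.df_reference = pd.read_csv(io.StringIO(REFERENCE_CSV), index_col=0)

    def score(self, data_submission):
        df_submission = pd.read_csv(data_submission, index_col=0)
        # Series subtraction aligns rows on the id index, so row order is free.
        errors = (self.df_reference["value"] - df_submission["value"]).abs()
        return float(errors.mean())


# Usage: the submission lists the ids in a different order than the reference.
submission = io.StringIO("id,value\n2,18.0\n1,11.0\n")
print(Scorer().score(submission))
```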
Re-Deploy the project¶
From the deployment/simple directory, stop, rebuild and restart the containers:
docker-compose down
docker-compose build
docker-compose up -d
Use the new scorer in your competition¶
- Go to http://localhost:3000
- Open the competition
- Select Edit info on the sidebar
- Write scorers.myscorer.Scorer in Scorer Class
- Click on Update
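The value scorers.myscorer.Scorer reads as a dotted Python path: the package scorers.myscorer (the directory holding `__init__.py`) followed by the class name Scorer. How QScore resolves this path internally is not described here. The sketch below only shows one common way to turn such a string into a class, using a hypothetical helper named `load_class`.

```python
import importlib


def load_class(dotted_path):
    """Resolve 'package.module.ClassName' to the class object."""
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError("Expected 'module.ClassName', got {!r}".format(dotted_path))
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Demonstration with a standard-library class, since the scorers package
# is only importable inside the score engine.
ordered_dict_class = load_class("collections.OrderedDict")
print(ordered_dict_class.__name__)
```

With this reading, a typo in the directory name or the class name would surface as an `ImportError` or `AttributeError` at load time.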
Example 1: Scorer of MDSF 2016¶
Here is the scorer of the competition “Le Meilleur Data Scientist de France 2016”.
We use a MAPE metric:
# -*- coding: utf-8 -*-
from .. import BaseScorer
import pandas as pd
import numpy as np


# Mean Absolute Percentage Error
def mape_error(y_true, y_pred):
    return np.mean(np.abs((y_true - y_pred) / y_true))[0]


class Scorer(BaseScorer):
    def __init__(self):
        super().__init__()

    def score(self, data_submission):
        df_submission = pd.read_csv(
            data_submission,
            sep=';',
            decimal='.',
            index_col=0,
            header=0,
            names=['id', 'price'],
        )

        submission_columns_count = df_submission.shape[1]
        if submission_columns_count != 1:
            raise Exception('Submission has {} columns and should have 1 columns with ";" separator'.format(
                submission_columns_count
            ))

        df_reference = pd.read_csv(
            'scorers/mdsf2016/y_test.csv',
            sep=';',
            decimal='.',
            index_col=0,
            header=0,
            names=['id', 'price'],
        )

        reference_rows_count = df_reference.shape[0]
        submission_rows_count = df_submission.shape[0]
        if submission_rows_count != reference_rows_count:
            raise Exception('Submission has {} rows and should have {} rows'.format(
                submission_rows_count, reference_rows_count)
            )

        df_reference.sort_index(inplace=True)
        df_submission.sort_index(inplace=True)

        score = mape_error(df_reference, df_submission)
        return score
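To make the metric concrete, here is a small worked example of MAPE on plain arrays. It is a simplified variant of the `mape_error` function above, not the competition code: it takes lists or arrays rather than DataFrames, and the prices are made-up values.

```python
import numpy as np


def mape_error(y_true, y_pred):
    """Mean of |y_true - y_pred| / |y_true| over all rows."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))


# Made-up prices: relative errors are 10%, 10% and 0%.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]
score = mape_error(y_true, y_pred)
print(round(score, 4))
```

A lower score is better, and 0 means a perfect prediction. The metric is undefined when a reference value is 0, because of the division by `y_true`.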
Example 2: Scorer of MDSF 2018¶
Here is the scorer of the competition “Le Meilleur Data Scientist de France 2018”.
We use the log loss metric:
# -*- coding: utf-8 -*-
from .. import BaseScorer
from sklearn.metrics import log_loss
import pandas as pd


class Scorer(BaseScorer):
    def __init__(self):
        super().__init__()

    def score(self, data_submission):
        df_submission = pd.read_csv(
            data_submission,
            sep=',',
            decimal='.',
            header=0,
            names=['id', 'cl1', 'cl2', 'cl3'],
            index_col=0,
        )

        submission_columns_count = df_submission.shape[1]
        if submission_columns_count != 3:
            raise Exception('Submission has {} columns and should have 3 columns with comma separator'.format(
                submission_columns_count
            ))

        df_reference = pd.read_csv(
            'scorers/mdsf2018/y_test.csv',
            sep=',',
            decimal='.',
            index_col=0,
            header=0,
            names=['id', 'delai_vente'],
        )

        reference_rows_count = df_reference.shape[0]
        submission_rows_count = df_submission.shape[0]
        if submission_rows_count != reference_rows_count:
            raise Exception('Submission has {} rows and should have {} rows'.format(
                submission_rows_count, reference_rows_count)
            )

        df_reference.sort_index(inplace=True)
        df_submission.sort_index(inplace=True)

        score = log_loss(df_reference, df_submission)
        return score
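For intuition about the metric, the sketch below computes a multiclass log loss by hand on two rows. It is not scikit-learn's implementation, which does more (for example inferring the label set). The function name `multiclass_log_loss` is hypothetical, and the example assumes the true classes are encoded as integers 0, 1, 2 that index the three probability columns in order.

```python
import numpy as np


def multiclass_log_loss(labels, probabilities):
    """Average negative log of the probability given to the true class."""
    labels = np.asarray(labels, dtype=int)
    probabilities = np.asarray(probabilities, dtype=float)
    # Pick, for each row, the probability assigned to its true class.
    p_true = probabilities[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(p_true)))


# Made-up data: each row gives probability 0.5 to its true class.
labels = [0, 2]
probabilities = [
    [0.50, 0.25, 0.25],
    [0.25, 0.25, 0.50],
]
score = multiclass_log_loss(labels, probabilities)
print(round(score, 4))
```

A lower score is better. A confident wrong answer is penalized heavily, since the logarithm grows without bound as the probability of the true class approaches 0.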
Distributed installation with Jenkins¶
TODO: To be written
Contribute¶
You can open an issue on this repository for any feedback (bug, question, request, pull request, etc.).