GNES is Generic Neural Elastic Search¶

GNES (pronounced "jee-nes") is Generic Neural Elastic Search, a cloud-native semantic search system based on deep neural networks.
GNES enables large-scale indexing and semantic search for text-to-text, image-to-image, video-to-video and any-to-any content forms.
Highlights¶
☁️ Cloud-Native & Elastic
GNES is all-in-microservice! Encoder, indexer, preprocessor and router each run in their own containers. They communicate via versioned APIs and collaborate under the orchestration of Docker Swarm, Kubernetes, etc. Scaling, load-balancing and automated recovery all come off-the-shelf in GNES.
🐣 Easy-to-Use
How long would it take to deploy a change that involves just switching a layer in VGG? In GNES, this is a one-line change in a YAML file. We abstract the encoding and indexing logic into a YAML config, so that you can change or stack encoders and indexers without touching the codebase.
🔬 State-of-the-Art
Taking advantage of the fast-evolving AI/ML/NLP/CV communities, we learn from best-of-breed deep learning models and plug them into GNES, making sure you always enjoy state-of-the-art performance.
🌌 Generic & Universal
Searching for text, images or even short videos? Using Python/C/Java/Go/HTTP as the client? No matter which content form you have or which language you use, GNES can handle them all.
📦 Model as Plugin
When built-in models do not meet your requirements, simply build your own with one Python file and one YAML file (see the sketch below). There is no need to rebuild the GNES framework; your models are loaded as plugins and rolled out online directly.
💯 Best Practice
We love to learn best practices from the community, helping GNES reach the next level of availability, resiliency, performance and durability. If you have any ideas or suggestions, feel free to contribute.
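To make the "model as plugin" point concrete, below is a minimal sketch of what such a one-file Python plugin could look like, assuming a custom text encoder built on gnes.encoder.base.BaseTextEncoder (documented later in this reference). The class name, the toy vector scheme and the YAML pairing mentioned in the comments are illustrative, not part of GNES.

import numpy as np

from gnes.encoder.base import BaseTextEncoder


class MyTextEncoder(BaseTextEncoder):
    """A toy character-count encoder; illustrative only."""

    is_trained = True  # this toy model needs no training step

    def encode(self, text, *args, **kwargs) -> np.ndarray:
        # map each input string to a fixed 128-dim character-count vector
        out = np.zeros((len(text), 128), dtype=np.float32)
        for row, t in enumerate(text):
            for ch in t:
                out[row, ord(ch) % 128] += 1.0
        return out

# Paired with a YAML config that references MyTextEncoder, this file would be
# handed to the encoder service via --py_path so the class is loaded as a plugin.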
All Microservices in GNES¶
GNES v0.0.46: Generic Neural Elastic Search, a cloud-native semantic search system based on deep neural networks. It enables large-scale indexing and semantic search for text-to-text, image-to-image, video-to-video and any-to-any content forms. Visit https://gnes.ai for tutorials and documentation.
usage: gnes [-h] [-v] [--verbose]
{frontend,encode,index,route,preprocess,grpc,client,compose,healthcheck}
...
Named Arguments¶
-v, --version | show program’s version number and exit |
--verbose | turn on detailed logging for debugging Default: False |
GNES sub-commands¶
use "gnes [sub-command] --help" to get detailed information about each sub-command
cli | Possible choices: frontend, encode, index, route, preprocess, grpc, client, compose, healthcheck |
Sub-commands:¶
frontend¶
start a frontend service
gnes frontend [-h] [--port_in PORT_IN] [--port_out PORT_OUT]
[--host_in HOST_IN] [--host_out HOST_OUT]
[--socket_in {PULL_BIND,PULL_CONNECT,PUSH_BIND,PUSH_CONNECT,SUB_BIND,SUB_CONNECT,PUB_BIND,PUB_CONNECT,PAIR_BIND,PAIR_CONNECT}]
[--socket_out {PULL_BIND,PULL_CONNECT,PUSH_BIND,PUSH_CONNECT,SUB_BIND,SUB_CONNECT,PUB_BIND,PUB_CONNECT,PAIR_BIND,PAIR_CONNECT}]
[--port_ctrl PORT_CTRL] [--timeout TIMEOUT]
[--dump_interval DUMP_INTERVAL] [--read_only]
[--parallel_backend {thread,process}]
[--num_parallel NUM_PARALLEL]
[--parallel_type {PUSH_BLOCK,PUSH_NONBLOCK,PUB_BLOCK,PUB_NONBLOCK}]
[--check_version] [--identity IDENTITY] [--route_table]
[--squeeze_pb] [--ctrl_with_ipc] [--grpc_host GRPC_HOST]
[--grpc_port GRPC_PORT] [--max_message_size MAX_MESSAGE_SIZE]
[--proxy] [--max_concurrency MAX_CONCURRENCY]
[--dump_route DUMP_ROUTE]
[--max_pending_request MAX_PENDING_REQUEST]
Named Arguments¶
--port_in | port for input data, default a random port between [49152, 65536] Default: 65456 |
--port_out | port for output data, default a random port between [49152, 65536] Default: 62176 |
--host_in | host address for input Default: “0.0.0.0” |
--host_out | host address for output Default: “0.0.0.0” |
--socket_in | Possible choices: PULL_BIND, PULL_CONNECT, PUSH_BIND, PUSH_CONNECT, SUB_BIND, SUB_CONNECT, PUB_BIND, PUB_CONNECT, PAIR_BIND, PAIR_CONNECT socket type for input port Default: PULL_BIND |
--socket_out | Possible choices: PULL_BIND, PULL_CONNECT, PUSH_BIND, PUSH_CONNECT, SUB_BIND, SUB_CONNECT, PUB_BIND, PUB_CONNECT, PAIR_BIND, PAIR_CONNECT socket type for output port Default: PUSH_BIND |
--port_ctrl | port for controlling the service, default a random port between [49152, 65536] Default: 57219 |
--timeout | timeout (ms) of all communication, -1 for waiting forever Default: -1 |
--dump_interval | serialize the model in the service every n seconds if the model changes; -1 means --read_only Default: 5 |
--read_only | do not allow the service to modify the model, dump_interval will be ignored Default: True |
--parallel_backend | Possible choices: thread, process parallel backend of the service Default: "thread" |
--num_parallel, --replicas | number of parallel services running at the same time (i.e. replicas); port_in and port_out will be set to random, and routers will be added automatically when necessary Default: 1 |
--parallel_type, --replica_type | Possible choices: PUSH_BLOCK, PUSH_NONBLOCK, PUB_BLOCK, PUB_NONBLOCK parallel type of the concurrent services Default: PUSH_NONBLOCK |
--check_version, --no-check_version, --no_check_version | compare the GNES and proto version of the incoming message with the local setup; a mismatch raises an exception Default: True |
--identity | identity of the service, empty by default Default: "" |
--route_table, --no-route_table, --no_route_table | show a route table with time cost after receiving the result Default: False |
--squeeze_pb, --no-squeeze_pb, --no_squeeze_pb | send bytes and ndarray separately apart from the protobuf message, which usually yields better network efficiency Default: True |
--ctrl_with_ipc | use the ipc protocol for the control socket Default: False |
--grpc_host | host address of the grpc service Default: "0.0.0.0" |
--grpc_port | host port of the grpc service Default: 8800 |
--max_message_size | maximum send and receive size for the grpc server in bytes, -1 means unlimited Default: -1 |
--proxy, --no-proxy, --no_proxy | respect the http_proxy and https_proxy environment variables, otherwise unset these proxy variables before starting; gRPC seems to prefer --no_proxy Default: False |
--max_concurrency | maximum concurrent connections allowed Default: 10 |
--dump_route | dump route information to a file |
--max_pending_request | maximum number of pending requests allowed; when exceeded, wait until a response is received Default: 100 |
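For reference, most of these flags can also be set programmatically: as documented in the gnes.flow package later in this reference, Flow and its add() shortcuts forward extra keyword arguments to the underlying service CLI. A minimal sketch, with illustrative flag values:

from gnes.flow import Flow

# keyword arguments map to the CLI flags above, e.g. --check_version, --route_table
f = (Flow(check_version=False, route_table=True)
     .add_router(yaml_path='BaseRouter', num_parallel=2))

with f.build(backend='thread') as flow:
    pass  # all services (including the frontend) are up; torn down on exit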
encode¶
start an encoder service
gnes encode [-h] [--port_in PORT_IN] [--port_out PORT_OUT] [--host_in HOST_IN]
[--host_out HOST_OUT]
[--socket_in {PULL_BIND,PULL_CONNECT,PUSH_BIND,PUSH_CONNECT,SUB_BIND,SUB_CONNECT,PUB_BIND,PUB_CONNECT,PAIR_BIND,PAIR_CONNECT}]
[--socket_out {PULL_BIND,PULL_CONNECT,PUSH_BIND,PUSH_CONNECT,SUB_BIND,SUB_CONNECT,PUB_BIND,PUB_CONNECT,PAIR_BIND,PAIR_CONNECT}]
[--port_ctrl PORT_CTRL] [--timeout TIMEOUT]
[--dump_interval DUMP_INTERVAL] [--read_only]
[--parallel_backend {thread,process}]
[--num_parallel NUM_PARALLEL]
[--parallel_type {PUSH_BLOCK,PUSH_NONBLOCK,PUB_BLOCK,PUB_NONBLOCK}]
[--check_version] [--identity IDENTITY] [--route_table]
[--squeeze_pb] [--ctrl_with_ipc] --yaml_path YAML_PATH
[--py_path PY_PATH [PY_PATH ...]]
Named Arguments¶
--port_in | port for input data, default a random port between [49152, 65536] Default: 56247 |
--port_out | port for output data, default a random port between [49152, 65536] Default: 53409 |
--host_in | host address for input Default: “0.0.0.0” |
--host_out | host address for output Default: “0.0.0.0” |
--socket_in | Possible choices: PULL_BIND, PULL_CONNECT, PUSH_BIND, PUSH_CONNECT, SUB_BIND, SUB_CONNECT, PUB_BIND, PUB_CONNECT, PAIR_BIND, PAIR_CONNECT socket type for input port Default: PULL_BIND |
--socket_out | Possible choices: PULL_BIND, PULL_CONNECT, PUSH_BIND, PUSH_CONNECT, SUB_BIND, SUB_CONNECT, PUB_BIND, PUB_CONNECT, PAIR_BIND, PAIR_CONNECT socket type for output port Default: PUSH_BIND |
--port_ctrl | port for controlling the service, default a random port between [49152, 65536] Default: 49270 |
--timeout | timeout (ms) of all communication, -1 for waiting forever Default: -1 |
--dump_interval | serialize the model in the service every n seconds if the model changes; -1 means --read_only Default: 5 |
--read_only | do not allow the service to modify the model, dump_interval will be ignored Default: False |
--parallel_backend | Possible choices: thread, process parallel backend of the service Default: "thread" |
--num_parallel, --replicas | number of parallel services running at the same time (i.e. replicas); port_in and port_out will be set to random, and routers will be added automatically when necessary Default: 1 |
--parallel_type, --replica_type | Possible choices: PUSH_BLOCK, PUSH_NONBLOCK, PUB_BLOCK, PUB_NONBLOCK parallel type of the concurrent services Default: PUSH_NONBLOCK |
--check_version, --no-check_version, --no_check_version | compare the GNES and proto version of the incoming message with the local setup; a mismatch raises an exception Default: True |
--identity | identity of the service, empty by default Default: "" |
--route_table, --no-route_table, --no_route_table | show a route table with time cost after receiving the result Default: False |
--squeeze_pb, --no-squeeze_pb, --no_squeeze_pb | send bytes and ndarray separately apart from the protobuf message, which usually yields better network efficiency Default: True |
--ctrl_with_ipc | use the ipc protocol for the control socket Default: False |
--yaml_path | yaml config of the service; it should be a readable stream, a valid file path, or a supported class name |
--py_path | the file path(s) of external python module(s) |
index¶
start an indexer service
gnes index [-h] [--port_in PORT_IN] [--port_out PORT_OUT] [--host_in HOST_IN]
[--host_out HOST_OUT]
[--socket_in {PULL_BIND,PULL_CONNECT,PUSH_BIND,PUSH_CONNECT,SUB_BIND,SUB_CONNECT,PUB_BIND,PUB_CONNECT,PAIR_BIND,PAIR_CONNECT}]
[--socket_out {PULL_BIND,PULL_CONNECT,PUSH_BIND,PUSH_CONNECT,SUB_BIND,SUB_CONNECT,PUB_BIND,PUB_CONNECT,PAIR_BIND,PAIR_CONNECT}]
[--port_ctrl PORT_CTRL] [--timeout TIMEOUT]
[--dump_interval DUMP_INTERVAL] [--read_only]
[--parallel_backend {thread,process}] [--num_parallel NUM_PARALLEL]
[--parallel_type {PUSH_BLOCK,PUSH_NONBLOCK,PUB_BLOCK,PUB_NONBLOCK}]
[--check_version] [--identity IDENTITY] [--route_table]
[--squeeze_pb] [--ctrl_with_ipc] --yaml_path YAML_PATH
[--py_path PY_PATH [PY_PATH ...]] [--sorted_response]
[--as_response AS_RESPONSE]
Named Arguments¶
--port_in | port for input data, default a random port between [49152, 65536] Default: 64397 |
--port_out | port for output data, default a random port between [49152, 65536] Default: 52558 |
--host_in | host address for input Default: “0.0.0.0” |
--host_out | host address for output Default: “0.0.0.0” |
--socket_in | Possible choices: PULL_BIND, PULL_CONNECT, PUSH_BIND, PUSH_CONNECT, SUB_BIND, SUB_CONNECT, PUB_BIND, PUB_CONNECT, PAIR_BIND, PAIR_CONNECT socket type for input port Default: PULL_BIND |
--socket_out | Possible choices: PULL_BIND, PULL_CONNECT, PUSH_BIND, PUSH_CONNECT, SUB_BIND, SUB_CONNECT, PUB_BIND, PUB_CONNECT, PAIR_BIND, PAIR_CONNECT socket type for output port Default: PUSH_BIND |
--port_ctrl | port for controlling the service, default a random port between [49152, 65536] Default: 50698 |
--timeout | timeout (ms) of all communication, -1 for waiting forever Default: -1 |
--dump_interval | serialize the model in the service every n seconds if the model changes; -1 means --read_only Default: 5 |
--read_only | do not allow the service to modify the model, dump_interval will be ignored Default: False |
--parallel_backend | Possible choices: thread, process parallel backend of the service Default: "thread" |
--num_parallel, --replicas | number of parallel services running at the same time (i.e. replicas); port_in and port_out will be set to random, and routers will be added automatically when necessary Default: 1 |
--parallel_type, --replica_type | Possible choices: PUSH_BLOCK, PUSH_NONBLOCK, PUB_BLOCK, PUB_NONBLOCK parallel type of the concurrent services Default: PUSH_NONBLOCK |
--check_version, --no-check_version, --no_check_version | compare the GNES and proto version of the incoming message with the local setup; a mismatch raises an exception Default: True |
--identity | identity of the service, empty by default Default: "" |
--route_table, --no-route_table, --no_route_table | show a route table with time cost after receiving the result Default: False |
--squeeze_pb, --no-squeeze_pb, --no_squeeze_pb | send bytes and ndarray separately apart from the protobuf message, which usually yields better network efficiency Default: True |
--ctrl_with_ipc | use the ipc protocol for the control socket Default: False |
--yaml_path | yaml config of the service; it should be a readable stream, a valid file path, or a supported class name |
--py_path | the file path(s) of external python module(s) |
--sorted_response | sort the response (if it exists) by the score Default: False |
--as_response | convert the message type from request to response after indexing; turn it off if you want to chain other services after this index service Default: True |
route¶
start a router service
gnes route [-h] [--port_in PORT_IN] [--port_out PORT_OUT] [--host_in HOST_IN]
[--host_out HOST_OUT]
[--socket_in {PULL_BIND,PULL_CONNECT,PUSH_BIND,PUSH_CONNECT,SUB_BIND,SUB_CONNECT,PUB_BIND,PUB_CONNECT,PAIR_BIND,PAIR_CONNECT}]
[--socket_out {PULL_BIND,PULL_CONNECT,PUSH_BIND,PUSH_CONNECT,SUB_BIND,SUB_CONNECT,PUB_BIND,PUB_CONNECT,PAIR_BIND,PAIR_CONNECT}]
[--port_ctrl PORT_CTRL] [--timeout TIMEOUT]
[--dump_interval DUMP_INTERVAL] [--read_only]
[--parallel_backend {thread,process}] [--num_parallel NUM_PARALLEL]
[--parallel_type {PUSH_BLOCK,PUSH_NONBLOCK,PUB_BLOCK,PUB_NONBLOCK}]
[--check_version] [--identity IDENTITY] [--route_table]
[--squeeze_pb] [--ctrl_with_ipc] --yaml_path YAML_PATH
[--py_path PY_PATH [PY_PATH ...]] [--sorted_response]
[--num_part NUM_PART]
Named Arguments¶
--port_in | port for input data, default a random port between [49152, 65536] Default: 55870 |
--port_out | port for output data, default a random port between [49152, 65536] Default: 53106 |
--host_in | host address for input Default: “0.0.0.0” |
--host_out | host address for output Default: “0.0.0.0” |
--socket_in | Possible choices: PULL_BIND, PULL_CONNECT, PUSH_BIND, PUSH_CONNECT, SUB_BIND, SUB_CONNECT, PUB_BIND, PUB_CONNECT, PAIR_BIND, PAIR_CONNECT socket type for input port Default: PULL_BIND |
--socket_out | Possible choices: PULL_BIND, PULL_CONNECT, PUSH_BIND, PUSH_CONNECT, SUB_BIND, SUB_CONNECT, PUB_BIND, PUB_CONNECT, PAIR_BIND, PAIR_CONNECT socket type for output port Default: PUSH_BIND |
--port_ctrl | port for controlling the service, default a random port between [49152, 65536] Default: 54305 |
--timeout | timeout (ms) of all communication, -1 for waiting forever Default: -1 |
--dump_interval | serialize the model in the service every n seconds if the model changes; -1 means --read_only Default: 5 |
--read_only | do not allow the service to modify the model, dump_interval will be ignored Default: True |
--parallel_backend | Possible choices: thread, process parallel backend of the service Default: "thread" |
--num_parallel, --replicas | number of parallel services running at the same time (i.e. replicas); port_in and port_out will be set to random, and routers will be added automatically when necessary Default: 1 |
--parallel_type, --replica_type | Possible choices: PUSH_BLOCK, PUSH_NONBLOCK, PUB_BLOCK, PUB_NONBLOCK parallel type of the concurrent services Default: PUSH_NONBLOCK |
--check_version, --no-check_version, --no_check_version | compare the GNES and proto version of the incoming message with the local setup; a mismatch raises an exception Default: True |
--identity | identity of the service, empty by default Default: "" |
--route_table, --no-route_table, --no_route_table | show a route table with time cost after receiving the result Default: False |
--squeeze_pb, --no-squeeze_pb, --no_squeeze_pb | send bytes and ndarray separately apart from the protobuf message, which usually yields better network efficiency Default: True |
--ctrl_with_ipc | use the ipc protocol for the control socket Default: False |
--yaml_path | yaml config of the service; it should be a readable stream, a valid file path, or a supported class name |
--py_path | the file path(s) of external python module(s) |
--sorted_response | sort the response (if it exists) by the score Default: False |
--num_part | explicitly set the number of parts of the message |
preprocess¶
start a preprocessor service
gnes preprocess [-h] [--port_in PORT_IN] [--port_out PORT_OUT]
[--host_in HOST_IN] [--host_out HOST_OUT]
[--socket_in {PULL_BIND,PULL_CONNECT,PUSH_BIND,PUSH_CONNECT,SUB_BIND,SUB_CONNECT,PUB_BIND,PUB_CONNECT,PAIR_BIND,PAIR_CONNECT}]
[--socket_out {PULL_BIND,PULL_CONNECT,PUSH_BIND,PUSH_CONNECT,SUB_BIND,SUB_CONNECT,PUB_BIND,PUB_CONNECT,PAIR_BIND,PAIR_CONNECT}]
[--port_ctrl PORT_CTRL] [--timeout TIMEOUT]
[--dump_interval DUMP_INTERVAL] [--read_only]
[--parallel_backend {thread,process}]
[--num_parallel NUM_PARALLEL]
[--parallel_type {PUSH_BLOCK,PUSH_NONBLOCK,PUB_BLOCK,PUB_NONBLOCK}]
[--check_version] [--identity IDENTITY] [--route_table]
[--squeeze_pb] [--ctrl_with_ipc] --yaml_path YAML_PATH
[--py_path PY_PATH [PY_PATH ...]]
Named Arguments¶
--port_in | port for input data, default a random port between [49152, 65536] Default: 51109 |
--port_out | port for output data, default a random port between [49152, 65536] Default: 61281 |
--host_in | host address for input Default: “0.0.0.0” |
--host_out | host address for output Default: “0.0.0.0” |
--socket_in | Possible choices: PULL_BIND, PULL_CONNECT, PUSH_BIND, PUSH_CONNECT, SUB_BIND, SUB_CONNECT, PUB_BIND, PUB_CONNECT, PAIR_BIND, PAIR_CONNECT socket type for input port Default: PULL_BIND |
--socket_out | Possible choices: PULL_BIND, PULL_CONNECT, PUSH_BIND, PUSH_CONNECT, SUB_BIND, SUB_CONNECT, PUB_BIND, PUB_CONNECT, PAIR_BIND, PAIR_CONNECT socket type for output port Default: PUSH_BIND |
--port_ctrl | port for controlling the service, default a random port between [49152, 65536] Default: 55495 |
--timeout | timeout (ms) of all communication, -1 for waiting forever Default: -1 |
--dump_interval | serialize the model in the service every n seconds if the model changes; -1 means --read_only Default: 5 |
--read_only | do not allow the service to modify the model, dump_interval will be ignored Default: True |
--parallel_backend | Possible choices: thread, process parallel backend of the service Default: "thread" |
--num_parallel, --replicas | number of parallel services running at the same time (i.e. replicas); port_in and port_out will be set to random, and routers will be added automatically when necessary Default: 1 |
--parallel_type, --replica_type | Possible choices: PUSH_BLOCK, PUSH_NONBLOCK, PUB_BLOCK, PUB_NONBLOCK parallel type of the concurrent services Default: PUSH_NONBLOCK |
--check_version, --no-check_version, --no_check_version | compare the GNES and proto version of the incoming message with the local setup; a mismatch raises an exception Default: True |
--identity | identity of the service, empty by default Default: "" |
--route_table, --no-route_table, --no_route_table | show a route table with time cost after receiving the result Default: False |
--squeeze_pb, --no-squeeze_pb, --no_squeeze_pb | send bytes and ndarray separately apart from the protobuf message, which usually yields better network efficiency Default: True |
--ctrl_with_ipc | use the ipc protocol for the control socket Default: False |
--yaml_path | yaml config of the service; it should be a readable stream, a valid file path, or a supported class name |
--py_path | the file path(s) of external python module(s) |
grpc¶
start a general purpose grpc service
gnes grpc [-h] [--port_in PORT_IN] [--port_out PORT_OUT] [--host_in HOST_IN]
[--host_out HOST_OUT]
[--socket_in {PULL_BIND,PULL_CONNECT,PUSH_BIND,PUSH_CONNECT,SUB_BIND,SUB_CONNECT,PUB_BIND,PUB_CONNECT,PAIR_BIND,PAIR_CONNECT}]
[--socket_out {PULL_BIND,PULL_CONNECT,PUSH_BIND,PUSH_CONNECT,SUB_BIND,SUB_CONNECT,PUB_BIND,PUB_CONNECT,PAIR_BIND,PAIR_CONNECT}]
[--port_ctrl PORT_CTRL] [--timeout TIMEOUT]
[--dump_interval DUMP_INTERVAL] [--read_only]
[--parallel_backend {thread,process}] [--num_parallel NUM_PARALLEL]
[--parallel_type {PUSH_BLOCK,PUSH_NONBLOCK,PUB_BLOCK,PUB_NONBLOCK}]
[--check_version] [--identity IDENTITY] [--route_table]
[--squeeze_pb] [--ctrl_with_ipc] [--grpc_host GRPC_HOST]
[--grpc_port GRPC_PORT] [--max_message_size MAX_MESSAGE_SIZE]
[--proxy] --pb2_path PB2_PATH --pb2_grpc_path PB2_GRPC_PATH
--stub_name STUB_NAME --api_name API_NAME
Named Arguments¶
--port_in | port for input data, default a random port between [49152, 65536] Default: 57049 |
--port_out | port for output data, default a random port between [49152, 65536] Default: 59811 |
--host_in | host address for input Default: “0.0.0.0” |
--host_out | host address for output Default: “0.0.0.0” |
--socket_in | Possible choices: PULL_BIND, PULL_CONNECT, PUSH_BIND, PUSH_CONNECT, SUB_BIND, SUB_CONNECT, PUB_BIND, PUB_CONNECT, PAIR_BIND, PAIR_CONNECT socket type for input port Default: PULL_BIND |
--socket_out | Possible choices: PULL_BIND, PULL_CONNECT, PUSH_BIND, PUSH_CONNECT, SUB_BIND, SUB_CONNECT, PUB_BIND, PUB_CONNECT, PAIR_BIND, PAIR_CONNECT socket type for output port Default: PUSH_BIND |
--port_ctrl | port for controlling the service, default a random port between [49152, 65536] Default: 54864 |
--timeout | timeout (ms) of all communication, -1 for waiting forever Default: -1 |
--dump_interval | serialize the model in the service every n seconds if the model changes; -1 means --read_only Default: 5 |
--read_only | do not allow the service to modify the model, dump_interval will be ignored Default: False |
--parallel_backend | Possible choices: thread, process parallel backend of the service Default: "thread" |
--num_parallel, --replicas | number of parallel services running at the same time (i.e. replicas); port_in and port_out will be set to random, and routers will be added automatically when necessary Default: 1 |
--parallel_type, --replica_type | Possible choices: PUSH_BLOCK, PUSH_NONBLOCK, PUB_BLOCK, PUB_NONBLOCK parallel type of the concurrent services Default: PUSH_NONBLOCK |
--check_version, --no-check_version, --no_check_version | compare the GNES and proto version of the incoming message with the local setup; a mismatch raises an exception Default: True |
--identity | identity of the service, empty by default Default: "" |
--route_table, --no-route_table, --no_route_table | show a route table with time cost after receiving the result Default: False |
--squeeze_pb, --no-squeeze_pb, --no_squeeze_pb | send bytes and ndarray separately apart from the protobuf message, which usually yields better network efficiency Default: True |
--ctrl_with_ipc | use the ipc protocol for the control socket Default: False |
--grpc_host | host address of the grpc service Default: "0.0.0.0" |
--grpc_port | host port of the grpc service Default: 8800 |
--max_message_size | maximum send and receive size for the grpc server in bytes, -1 means unlimited Default: -1 |
--proxy, --no-proxy, --no_proxy | respect the http_proxy and https_proxy environment variables, otherwise unset these proxy variables before starting; gRPC seems to prefer --no_proxy Default: False |
--pb2_path | the path of the python file generated by the protocol buffer compiler |
--pb2_grpc_path | the path of the python file generated by the gRPC Python protocol compiler plugin |
--stub_name | the name of the gRPC Stub |
--api_name | the api name for calling the stub |
client¶
start a GNES client of the selected type
gnes client [-h] {http,cli} ...
GNES client sub-commands¶
use “gnes client [sub-command] –help” to get detailed information about each client sub-command
client | Possible choices: http, cli |
Sub-commands:¶
http¶
start a client that allows HTTP requests as input
gnes client http [-h] [--grpc_host GRPC_HOST] [--grpc_port GRPC_PORT]
[--max_message_size MAX_MESSAGE_SIZE] [--proxy]
[--http_port HTTP_PORT] [--http_host HTTP_HOST]
[--max_workers MAX_WORKERS] [--top_k TOP_K]
[--batch_size BATCH_SIZE]
--grpc_host | host address of the grpc service Default: “0.0.0.0” |
--grpc_port | host port of the grpc service Default: 8800 |
--max_message_size | maximum send and receive size for the grpc server in bytes, -1 means unlimited Default: -1 |
--proxy, --no-proxy, --no_proxy | respect the http_proxy and https_proxy environment variables, otherwise unset these proxy variables before starting; gRPC seems to prefer --no_proxy Default: False |
--http_port | http port to deploy the service Default: 80 |
--http_host | http host to deploy the service Default: “0.0.0.0” |
--max_workers | max workers to deal with the message Default: 100 |
--top_k | default top_k for query mode Default: 10 |
--batch_size | batch size for feed data for train mode Default: 2560 |
cli¶
start a client that allows stdin as input
gnes client cli [-h] [--grpc_host GRPC_HOST] [--grpc_port GRPC_PORT]
[--max_message_size MAX_MESSAGE_SIZE] [--proxy]
[--txt_file TXT_FILE | --image_zip_file IMAGE_ZIP_FILE | --video_zip_file VIDEO_ZIP_FILE]
[--batch_size BATCH_SIZE] --mode {index,query,train}
[--top_k TOP_K] [--start_doc_id START_DOC_ID]
[--max_concurrency MAX_CONCURRENCY]
--grpc_host | host address of the grpc service Default: “0.0.0.0” |
--grpc_port | host port of the grpc service Default: 8800 |
--max_message_size | maximum send and receive size for the grpc server in bytes, -1 means unlimited Default: -1 |
--proxy, --no-proxy, --no_proxy | respect the http_proxy and https_proxy environment variables, otherwise unset these proxy variables before starting; gRPC seems to prefer --no_proxy Default: False |
--txt_file | text file to be used, each line is a doc/query Default: <_io.TextIOWrapper name='<stdin>' mode='r' encoding='UTF-8'> |
--image_zip_file | image zip file to be used, consisting of multiple images |
--video_zip_file | video zip file to be used, consisting of multiple videos |
--batch_size | the size of the request to split Default: 100 |
--mode | Possible choices: index, query, train the mode of the client and the server |
--top_k | top_k results returned in the query mode Default: 10 |
--start_doc_id | the start number of doc id Default: 0 |
--max_concurrency | maximum concurrent connections allowed Default: 10 |
compose¶
start a GNES Board to visualize YAML configs
gnes compose [-h] [--port PORT] [--name NAME] [--yaml_path YAML_PATH]
[--html_path HTML_PATH] [--shell_path SHELL_PATH]
[--swarm_path SWARM_PATH] [--k8s_path K8S_PATH]
[--graph_path GRAPH_PATH]
[--shell_log_redirect SHELL_LOG_REDIRECT] [--mermaid_leftright]
[--docker_img DOCKER_IMG] [--flask | --serve]
[--http_port HTTP_PORT]
Named Arguments¶
--port | host port of the grpc service Default: 8800 |
--name | name of the instance Default: “GNES app” |
--yaml_path | yaml config of the service Default: <_io.BufferedReader name='gnes/resources/compose/gnes-example.yml'> |
--html_path | output path of the HTML file, will contain all possible generations |
--shell_path | output path of the shell-based starting script |
--swarm_path | output path of the docker-compose file for Docker Swarm |
--k8s_path | output path of the YAML config for Kubernetes |
--graph_path | output path of the mermaid graph file |
--shell_log_redirect | the file path for redirecting shell output; when not given, the output is flushed to stdout |
--mermaid_leftright | show the flow in a left-to-right manner rather than top-down Default: False |
--docker_img | the docker image used in Docker Swarm & Kubernetes Default: "gnes/gnes:latest-alpine" |
--flask | start a Flask server and serve the composer in interactive mode, aka GNES board Default: False |
--serve | start a basic HTTP server and serve the composer in interactive mode, aka GNES board Default: False |
--http_port | server port for receiving HTTP requests Default: 8080 |
healthcheck¶
do health check on any GNES microservice
gnes healthcheck [-h] [--host HOST] --port PORT [--timeout TIMEOUT]
[--retries RETRIES]
Named Arguments¶
--host | host address of the checked service Default: “127.0.0.1” |
--port | control port of the checked service |
--timeout | timeout (ms) of one check, -1 for waiting forever Default: 1000 |
--retries | max number of health checks attempted before exiting with code 1 Default: 3 |
gnes package¶
Subpackages¶
gnes.base package¶
Module contents¶
class gnes.base.TrainableBase(*args, **kwargs)[source]¶
Bases: object
The base class for preprocessor, encoder, indexer and router.

dump(filename=None)[source]¶
Serialize the object to a binary file.
Parameters: filename (Optional[str]) – file path of the serialized file; if not given, dump_full_path is used
Return type: None

dump_full_path¶
Get the binary dump path.

dump_yaml(filename=None)[source]¶
Serialize the object to a yaml file.
Parameters: filename (Optional[str]) – file path of the yaml file; if not given, dump_yaml_path is used
Return type: None

store_args_kwargs = False¶

yaml_full_path¶
Get the file path of the yaml config.
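To make the persistence API above concrete, here is a minimal sketch using CharEmbeddingEncoder (a TrainableBase subclass described later in this reference); the file paths are illustrative:

from gnes.encoder.text.char import CharEmbeddingEncoder

enc = CharEmbeddingEncoder(dim=128)

# with no filename argument, dump_full_path / yaml_full_path would be used instead
enc.dump('/tmp/char_encoder.bin')       # binary serialization of the object
enc.dump_yaml('/tmp/char_encoder.yml')  # YAML spec of the object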
gnes.cli package¶
gnes.client package¶
Submodules¶
class gnes.client.base.GrpcClient(args)[source]¶
Bases: object
A base unary gRPC client from which other client applications can be built.
handler = <gnes.client.base.ResponseHandler object>

class gnes.client.cli.CLIClient(args, start_at_init=True)[source]¶
Bases: gnes.client.base.GrpcClient
bytes_generator
Return type: Iterator[bytes]

class gnes.client.stream.StreamingClient(args)[source]¶
Bases: gnes.client.base.GrpcClient
handler = <gnes.client.base.ResponseHandler object>
Module contents¶
gnes.composer package¶
Submodules¶
class gnes.composer.base.YamlComposer(args)[source]¶
Bases: object

class Layer(layer_id=0)[source]¶
Bases: object
default_values = {'image': None, 'income': 'pull', 'name': None, 'py_path': None, 'replicas': 1, 'yaml_path': None}
get_component_name
is_heto_single_component
is_homo_multi_component
is_homogenous
is_single_component

static build_dockerswarm(all_layers, docker_img='gnes/gnes:latest-alpine', volumes=None, networks=None)[source]¶
Return type: str

comp2args = {'Encoder': Namespace(check_version=True, ctrl_with_ipc=False, dump_interval=5, host_in='0.0.0.0', host_out='0.0.0.0', identity='', num_parallel=1, parallel_backend='thread', parallel_type=<ParallelType.PUSH_NONBLOCK: 1>, port_ctrl=61193, port_in=58714, port_out=55122, py_path=None, read_only=False, route_table=False, socket_in=<SocketType.PULL_BIND: 0>, socket_out=<SocketType.PUSH_BIND: 2>, squeeze_pb=True, timeout=-1, verbose=False, yaml_path=<_io.StringIO object>), 'Frontend': Namespace(check_version=True, ctrl_with_ipc=False, dump_interval=5, dump_route=None, grpc_host='0.0.0.0', grpc_port=8800, host_in='0.0.0.0', host_out='0.0.0.0', identity='', max_concurrency=10, max_message_size=-1, max_pending_request=100, num_parallel=1, parallel_backend='thread', parallel_type=<ParallelType.PUSH_NONBLOCK: 1>, port_ctrl=57108, port_in=54958, port_out=54698, proxy=False, read_only=True, route_table=False, socket_in=<SocketType.PULL_BIND: 0>, socket_out=<SocketType.PUSH_BIND: 2>, squeeze_pb=True, timeout=-1, verbose=False), 'Indexer': Namespace(as_response=True, check_version=True, ctrl_with_ipc=False, dump_interval=5, host_in='0.0.0.0', host_out='0.0.0.0', identity='', num_parallel=1, parallel_backend='thread', parallel_type=<ParallelType.PUSH_NONBLOCK: 1>, port_ctrl=57195, port_in=62576, port_out=56615, py_path=None, read_only=False, route_table=False, socket_in=<SocketType.PULL_BIND: 0>, socket_out=<SocketType.PUSH_BIND: 2>, sorted_response=False, squeeze_pb=True, timeout=-1, verbose=False, yaml_path=<_io.StringIO object>), 'Preprocessor': Namespace(check_version=True, ctrl_with_ipc=False, dump_interval=5, host_in='0.0.0.0', host_out='0.0.0.0', identity='', num_parallel=1, parallel_backend='thread', parallel_type=<ParallelType.PUSH_NONBLOCK: 1>, port_ctrl=55187, port_in=59890, port_out=49570, py_path=None, read_only=True, route_table=False, socket_in=<SocketType.PULL_BIND: 0>, socket_out=<SocketType.PUSH_BIND: 2>, squeeze_pb=True, timeout=-1, verbose=False, yaml_path=<_io.StringIO object>), 'Router': Namespace(check_version=True, ctrl_with_ipc=False, dump_interval=5, host_in='0.0.0.0', host_out='0.0.0.0', identity='', num_parallel=1, num_part=None, parallel_backend='thread', parallel_type=<ParallelType.PUSH_NONBLOCK: 1>, port_ctrl=61112, port_in=54976, port_out=59192, py_path=None, read_only=True, route_table=False, socket_in=<SocketType.PULL_BIND: 0>, socket_out=<SocketType.PUSH_BIND: 2>, sorted_response=False, squeeze_pb=True, timeout=-1, verbose=False, yaml_path=<_io.StringIO object>)}

comp2file = {'Encoder': 'encode', 'Frontend': 'frontend', 'Indexer': 'index', 'Preprocessor': 'preprocess', 'Router': 'route'}
Module contents¶
gnes.encoder package¶
Subpackages¶
Global parameters for the VGGish model.
See vggish_slim.py for more information.
Post-process embeddings from VGGish.
class gnes.encoder.audio.vggish_cores.vggish_postprocess.Postprocessor(pca_params_npz_path)[source]¶
Bases: object
Post-processes VGGish embeddings.
The initial release of AudioSet included 128-D VGGish embeddings for each segment of AudioSet. These released embeddings were produced by applying a PCA transformation (technically, a whitening transform is included as well) and 8-bit quantization to the raw embedding output from VGGish, in order to stay compatible with the YouTube-8M project, which provides visual embeddings in the same format for a large set of YouTube videos. This class implements the same PCA (with whitening) and quantization transformations.
Constructs a postprocessor.
Args:
pca_params_npz_path: Path to a NumPy-format .npz file that contains the PCA parameters used in postprocessing.

postprocess(embeddings_batch)[source]¶
Applies postprocessing to a batch of embeddings.
Args:
embeddings_batch: An nparray of shape [batch_size, embedding_size] containing output from the embedding layer of VGGish.
Returns:
An nparray of the same shape as the input but of type uint8, containing the PCA-transformed and quantized version of the input.
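A short usage sketch of the postprocessor described above; the .npz path and the random batch are illustrative:

import numpy as np

from gnes.encoder.audio.vggish_cores.vggish_postprocess import Postprocessor

raw = np.random.randn(16, 128).astype(np.float32)  # 16 raw VGGish embeddings

pp = Postprocessor('vggish_pca_params.npz')  # illustrative parameter file
quantized = pp.postprocess(raw)              # same shape as input, dtype uint8
assert quantized.shape == raw.shape and quantized.dtype == np.uint8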
class gnes.encoder.image.cvae.CVAEEncoder(model_dir, latent_dim=300, select_method='MEAN', l2_normalize=False, use_gpu=True, *args, **kwargs)[source]¶
Bases: gnes.encoder.base.BaseImageEncoder
batch_size = 64
train(*args, **kwargs) – Train the model; needs to be overridden.

class gnes.encoder.image.inception.TFInceptionEncoder(model_dir, select_layer='PreLogitsFlatten', *args, **kwargs)[source]¶
Bases: gnes.encoder.base.BaseImageEncoder
batch_size = 64
train(*args, **kwargs) – Train the model; needs to be overridden.

class gnes.encoder.image.onnx.BaseONNXImageEncoder(model_name, model_dir, *args, **kwargs)[source]¶
Bases: gnes.encoder.base.BaseImageEncoder
batch_size = 64
train(*args, **kwargs) – Train the model; needs to be overridden.

class gnes.encoder.numeric.hash.HashEncoder(num_bytes, num_bits=8, num_idx=3, kmeans_clusters=100, method='product_uniform', *args, **kwargs)[source]¶
Bases: gnes.encoder.base.BaseNumericEncoder
batch_size = 2048

class gnes.encoder.numeric.pca.PCAEncoder(output_dim, whiten=False, *args, **kwargs)[source]¶
Bases: gnes.encoder.base.BaseNumericEncoder
batch_size = 2048

class gnes.encoder.numeric.pca.PCALocalEncoder(output_dim, num_locals, *args, **kwargs)[source]¶
Bases: gnes.encoder.base.BaseNumericEncoder
batch_size = 2048

class gnes.encoder.numeric.pooling.PoolingEncoder(pooling_strategy='REDUCE_MEAN', backend='numpy', *args, **kwargs)[source]¶
Bases: gnes.encoder.base.BaseNumericEncoder
train(*args, **kwargs) – Train the model; needs to be overridden.

class gnes.encoder.numeric.quantizer.QuantizerEncoder(dim_per_byte, cluster_per_byte=255, upper_bound=10000, lower_bound=-10000, partition_method='average', *args, **kwargs)[source]¶
Bases: gnes.encoder.base.BaseBinaryEncoder
batch_size = 2048
train(*args, **kwargs) – Train the model; needs to be overridden.

class gnes.encoder.numeric.standarder.StandarderEncoder(*args, **kwargs)[source]¶
Bases: gnes.encoder.base.BaseNumericEncoder
batch_size = 2048

class gnes.encoder.numeric.tf_pq.TFPQEncoder(num_bytes, cluster_per_byte=255, *args, **kwargs)[source]¶
Bases: gnes.encoder.numeric.pq.PQEncoder
batch_size = 8192
train(vecs, *args, **kwargs)

class gnes.encoder.numeric.vlad.VladEncoder(num_clusters, using_faiss_pred=False, *args, **kwargs)[source]¶
Bases: gnes.encoder.base.BaseNumericEncoder
batch_size = 2048

class gnes.encoder.text.bert.BertEncoder(*args, **kwargs)[source]¶
Bases: gnes.encoder.base.BaseTextEncoder
is_trained = True
store_args_kwargs = True
train(*args, **kwargs) – Train the model; needs to be overridden.

class gnes.encoder.text.char.CharEmbeddingEncoder(dim=128, *args, **kwargs)[source]¶
Bases: gnes.encoder.base.BaseTextEncoder
A random character embedding model. Only useful for testing.
is_trained = True
train(*args, **kwargs) – Train the model; needs to be overridden.

class gnes.encoder.text.flair.FlairEncoder(word_embedding='glove', flair_embeddings=('news-forward', 'news-backward'), pooling_strategy='mean', *args, **kwargs)[source]¶
Bases: gnes.encoder.base.BaseTextEncoder
is_trained = True
train(*args, **kwargs) – Train the model; needs to be overridden.

class gnes.encoder.video.incep_mixture.IncepMixtureEncoder(model_dir_inception, model_dir_mixture, select_layer='PreLogitsFlatten', feature_size=300, vocab_size=28, cluster_size=256, method='fvnet', input_size=1536, vocab_size_2=174, max_frames=30, multitask_method='Attention', *args, **kwargs)[source]¶
Bases: gnes.encoder.base.BaseVideoEncoder
batch_size = 64
train(*args, **kwargs) – Train the model; needs to be overridden.

class gnes.encoder.video.inception.InceptionVideoEncoder(model_dir, select_layer='PreLogitsFlatten', *args, **kwargs)[source]¶
Bases: gnes.encoder.base.BaseVideoEncoder
batch_size = 64
train(*args, **kwargs) – Train the model; needs to be overridden.

class gnes.encoder.video.yt8m_feature_extractor.YouTube8MFeatureExtractor(model_dir, pca_dir, select_layer='PreLogits', ignore_audio_feature=True, *args, **kwargs)[source]¶
Bases: gnes.encoder.base.BaseVideoEncoder
Extracts YouTube8M features for RGB frames.
Constructing this class for the first time creates a directory yt8m inside your home directory and downloads the Inception model (85 MB) and the YouTube8M PCA matrix (15 MB). If you want to use another directory, pass it to the model_dir argument of the constructor.
If model_dir exists and contains the necessary files, the files are re-used without downloading.
Usage example:

import os

from PIL import Image
import numpy

# Instantiate the extractor. Slow when called for the first time on your
# machine, as it needs to download about 100 MB.
extractor = YouTube8MFeatureExtractor()

image_file = os.path.join(extractor._model_dir, 'cropped_panda.jpg')
im = numpy.array(Image.open(image_file))
features = extractor.extract_rgb_frame_features(im)

Note: OpenCV reverses the order of channels (i.e. it orders channels as BGR instead of RGB). If you are using OpenCV, you must do:

im = im[:, :, ::-1]  # reverse the order of the last (i.e. channel) dimension

then call extractor.extract_rgb_frame_features(im).

batch_size = 64
train(*args, **kwargs) – Train the model; needs to be overridden.
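As a concrete example of the numeric encoders above, a PCAEncoder round-trip might look like the following sketch. That train() and encode() each take a 2-D float array is an assumption based on the usual GNES encoder convention; it is not spelled out in this reference.

import numpy as np

from gnes.encoder.numeric.pca import PCAEncoder

vecs = np.random.randn(1000, 512).astype(np.float32)  # 1000 input vectors

enc = PCAEncoder(output_dim=64, whiten=True)
enc.train(vecs)             # fit the PCA projection (assumed signature)
reduced = enc.encode(vecs)  # expected shape: (1000, 64)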
Submodules¶
class gnes.encoder.base.BaseAudioEncoder(*args, **kwargs)[source]¶
Bases: gnes.encoder.base.BaseEncoder
train(*args, **kwargs) – Train the model; needs to be overridden.

class gnes.encoder.base.BaseBinaryEncoder(*args, **kwargs)[source]¶
Bases: gnes.encoder.base.BaseEncoder
train(*args, **kwargs) – Train the model; needs to be overridden.

class gnes.encoder.base.BaseEncoder(*args, **kwargs)[source]¶
Bases: gnes.base.TrainableBase
train(*args, **kwargs) – Train the model; needs to be overridden.

class gnes.encoder.base.BaseImageEncoder(*args, **kwargs)[source]¶
Bases: gnes.encoder.base.BaseEncoder
train(*args, **kwargs) – Train the model; needs to be overridden.

class gnes.encoder.base.BaseNumericEncoder(*args, **kwargs)[source]¶
Bases: gnes.encoder.base.BaseEncoder
Note that no NumericEncoder can be used as the first encoder of the pipeline.
train(*args, **kwargs) – Train the model; needs to be overridden.

class gnes.encoder.base.BaseTextEncoder(*args, **kwargs)[source]¶
Bases: gnes.encoder.base.BaseEncoder
train(*args, **kwargs) – Train the model; needs to be overridden.

class gnes.encoder.base.BaseVideoEncoder(*args, **kwargs)[source]¶
Bases: gnes.encoder.base.BaseEncoder
train(*args, **kwargs) – Train the model; needs to be overridden.
Module contents¶
gnes.flow package¶
Submodules¶
class gnes.flow.base.BaseIndexFlow(*args, **kwargs)[source]¶
Bases: gnes.flow.Flow
BaseIndexFlow defines a common service pipeline for indexing.
It can not be used directly, as all services use the base module by default. You have to use set() to change the yaml_path of each service.

train(bytes_gen=None, **kwargs)
Do training on the current flow. It will start a CLIClient and call train().
Example:

with f.build(backend='thread') as flow:
    flow.train(txt_file='aa.txt')
    flow.train(image_zip_file='aa.zip', batch_size=64)
    flow.train(video_zip_file='aa.zip')
    ...

This will call the pre-built reader to read files into an iterator of bytes and feed them to the flow.
One may also build a reader/generator on one's own.
Example:

def my_reader():
    for _ in range(10):
        yield b'abcdfeg'  # each yield generates a document for training

with f.build(backend='thread') as flow:
    flow.train(bytes_gen=my_reader())

Parameters:
- bytes_gen (Optional[Iterator[bytes]]) – an iterator of bytes. If not given, then you have to specify it in kwargs.
- kwargs – accepts all keyword arguments of the gnes client CLI

class gnes.flow.base.BaseQueryFlow(*args, **kwargs)[source]¶
Bases: gnes.flow.Flow
BaseQueryFlow defines a common service pipeline for querying.
It can not be used directly, as all services use the base module by default. You have to use set() to change the yaml_path of each service.

train(bytes_gen=None, **kwargs)
Do training on the current flow. It will start a CLIClient and call train(). See BaseIndexFlow.train above for the full example and parameters.
class gnes.flow.helper.BuildLevel[source]¶
Bases: gnes.service.base.BetterEnum
An enumeration.
EMPTY = 0
GRAPH = 1
RUNTIME = 2

exception gnes.flow.helper.FlowBuildLevelMismatch[source]¶
Bases: ValueError
Exception when the required level is higher than the current build level

exception gnes.flow.helper.FlowIncompleteError[source]¶
Bases: ValueError
Exception when the flow is missing some important component to run

exception gnes.flow.helper.FlowMissingNode[source]¶
Bases: ValueError
Exception when the topology is ambiguous

exception gnes.flow.helper.FlowTopologyError[source]¶
Bases: ValueError
Exception when the topology is ambiguous
Module contents¶
class gnes.flow.Flow(with_frontend=True, is_trained=True, *args, **kwargs)[source]¶
Bases: gnes.base.TrainableBase
GNES Flow: an intuitive way to build a workflow for GNES.
You can use add() and then build() to customize your own workflow. For example:

from gnes.flow import Flow

f = (Flow(check_version=False, route_table=True)
     .add_preprocessor(yaml_path='BasePreprocessor')
     .add_encoder(yaml_path='BaseEncoder')
     .add_router(yaml_path='BaseRouter'))

with f.build(backend='thread') as flow:
    flow.index()
    ...

You can also use add('Encoder', ...) or add(Service.Encoder, ...) to add a service to the flow. The generic add() provides a convenient way to build the flow.
As shown above, it is recommended to use the flow as a context manager, as it automatically manages all opened sockets/processes/threads when exiting the context.
Note the different copy behaviors of add() and build(): add() always copies the flow by default, whereas build() modifies the flow in place. You can change this behavior by specifying the argument copy_flow=False.
Create a new Flow object.

Frontend = 0

add(service, name=None, recv_from=None, send_to=None, copy_flow=True, **kwargs)[source]¶
Add a service to the current flow object and return the new, modified flow object. The attributes of the service can later be changed with set() or deleted with remove().
Note there are shortcut versions of this method. It is recommended to use add_encoder(), add_preprocessor(), add_router() and add_indexer() whenever possible.
Parameters:
- service (Union[Service, str]) – a 'Service' enum or string, possible choices: Encoder, Router, Preprocessor, Indexer, Frontend
- name (Optional[str]) – the name identifier of the service; can be used in 'recv_from', 'send_to', set() and remove()
- recv_from (Union[str, Tuple[str], List[str], Service, None]) – the name of the service(s) that this service receives data from. One can also use 'Service.Frontend' to indicate the connection with the frontend.
- send_to (Union[str, Tuple[str], List[str], Service, None]) – the name of the service(s) that this service sends data to. One can also use 'Service.Frontend' to indicate the connection with the frontend.
- copy_flow (bool) – when set to true, always copy the current flow, do the modification on the copy and return it; otherwise modify in place
- kwargs – other keyword-value arguments that the service CLI supports
Return type: Flow
Returns: a (new) flow object with modification

add_encoder(*args, **kwargs)[source]¶
Add an encoder to the current flow, a shortcut of add(Service.Encoder).
Return type: Flow

add_frontend(*args, **kwargs)[source]¶
Add a frontend to the current flow, a shortcut of add(Service.Frontend). Usually you don't need to call this function explicitly; a flow object contains a frontend service by default. This function is useful when you build a flow without the frontend and want to customize the frontend later.
Return type: Flow

add_indexer(*args, **kwargs)[source]¶
Add an indexer to the current flow, a shortcut of add(Service.Indexer).
Return type: Flow

add_preprocessor(*args, **kwargs)[source]¶
Add a preprocessor to the current flow, a shortcut of add(Service.Preprocessor).
Return type: Flow

add_router(*args, **kwargs)[source]¶
Add a router to the current flow, a shortcut of add(Service.Router).
Return type: Flow

build(backend='process', copy_flow=False, *args, **kwargs)[source]¶
Build the current flow and make it ready to use.
Parameters:
- backend (Optional[str]) – supported: 'thread', 'process', 'swarm', 'k8s', 'shell'; if None, then only build the graph
- copy_flow (bool) – return a copy of the current flow
Return type: Flow
Returns: the current flow (by default)

index(bytes_gen=None, **kwargs)[source]¶
Do indexing on the current flow. It will start a CLIClient and call index().
Example:

with f.build(backend='thread') as flow:
    flow.index(txt_file='aa.txt')
    flow.index(image_zip_file='aa.zip', batch_size=64)
    flow.index(video_zip_file='aa.zip')
    ...

This will call the pre-built reader to read files into an iterator of bytes and feed them to the flow.
One may also build a reader/generator on one's own.
Example:

def my_reader():
    for _ in range(10):
        yield b'abcdfeg'  # each yield generates a document to index

with f.build(backend='thread') as flow:
    flow.index(bytes_gen=my_reader())

Parameters:
- bytes_gen (Optional[Iterator[bytes]]) – an iterator of bytes. If not given, then you have to specify it in kwargs.
- kwargs – accepts all keyword arguments of the gnes client CLI

query(bytes_gen=None, **kwargs)[source]¶
Do querying on the current flow. It will start a CLIClient and call query().
Example:

with f.build(backend='thread') as flow:
    flow.query(txt_file='aa.txt')
    flow.query(image_zip_file='aa.zip', batch_size=64)
    flow.query(video_zip_file='aa.zip')
    ...

This will call the pre-built reader to read files into an iterator of bytes and feed them to the flow.
One may also build a reader/generator on one's own.
Example:

def my_reader():
    for _ in range(10):
        yield b'abcdfeg'  # each yield generates a query for searching

with f.build(backend='thread') as flow:
    flow.query(bytes_gen=my_reader())

Parameters:
- bytes_gen (Optional[Iterator[bytes]]) – an iterator of bytes. If not given, then you have to specify it in kwargs.
- kwargs – accepts all keyword arguments of the gnes client CLI

remove(name=None, copy_flow=True)[source]¶
Remove a service from the flow.
Parameters:
- name (Optional[str]) – the name of the existing service
- copy_flow (bool) – when set to true, always copy the current flow, do the modification on the copy and return it; otherwise modify in place
Return type: Flow
Returns: a (new) flow object with modification

set(name, recv_from=None, send_to=None, copy_flow=True, clear_old_attr=False, as_last_service=False, **kwargs)[source]¶
Set the attributes of an existing service (added by add()) in the flow. Attributes or kwargs that are not given remain unchanged.
Parameters:
- name (str) – the name of the existing service
- recv_from (Union[str, Tuple[str], List[str], Service, None]) – the name of the service(s) that this service receives data from. One can also use 'Service.Frontend' to indicate the connection with the frontend.
- send_to (Union[str, Tuple[str], List[str], Service, None]) – the name of the service(s) that this service sends data to. One can also use 'Service.Frontend' to indicate the connection with the frontend.
- copy_flow (bool) – when set to true, always copy the current flow, do the modification on the copy and return it; otherwise modify in place
- clear_old_attr (bool) – remove the old attribute value before setting the new one
- as_last_service (bool) – whether to set the changed service as the last service in the graph
- kwargs – other keyword-value arguments that the service CLI supports
Return type: Flow
Returns: a (new) flow object with modification

set_last_service(name, copy_flow=True)[source]¶
Set a service as the last service in the flow, useful when modifying the flow.
Parameters:
- name (str) – the name of the existing service
- copy_flow (bool) – when set to true, always copy the current flow, do the modification on the copy and return it; otherwise modify in place
Return type: Flow
Returns: a (new) flow object with modification

to_jpg(path='flow.jpg', **kwargs)[source]¶
Render the current flow as a jpg image; this calls to_mermaid() and needs an internet connection.
Parameters:
- path (str) – the file path of the image
- kwargs – keyword arguments of to_mermaid()
Return type: None

to_mermaid(left_right=True)[source]¶
Output the mermaid graph for visualization.
Parameters: left_right (bool) – render the flow in a left-to-right manner, otherwise top-down
Return type: str
Returns: a mermaid-formatted string

to_python_code(indent=4)[source]¶
Generate the python code of this flow.
Parameters: indent (int) – the number of whitespaces of indent
Return type: str
Returns: the generated python code

to_swarm_yaml(image='gnes/gnes:latest-alpine')[source]¶
Generate the docker swarm YAML compose file.
Parameters: image (str) – the default GNES docker image
Return type: str
Returns: the generated YAML compose file

to_url(**kwargs)[source]¶
Render the current flow as a URL that points to an SVG; it needs an internet connection.
Parameters: kwargs – keyword arguments of to_mermaid()
Return type: str
Returns: the URL pointing to an SVG

train(bytes_gen=None, **kwargs)[source]¶
Do training on the current flow. It will start a CLIClient and call train().
Example:

with f.build(backend='thread') as flow:
    flow.train(txt_file='aa.txt')
    flow.train(image_zip_file='aa.zip', batch_size=64)
    flow.train(video_zip_file='aa.zip')
    ...

This will call the pre-built reader to read files into an iterator of bytes and feed them to the flow.
One may also build a reader/generator on one's own.
Example:

def my_reader():
    for _ in range(10):
        yield b'abcdfeg'  # each yield generates a document for training

with f.build(backend='thread') as flow:
    flow.train(bytes_gen=my_reader())

Parameters:
- bytes_gen (Optional[Iterator[bytes]]) – an iterator of bytes. If not given, then you have to specify it in kwargs.
- kwargs – accepts all keyword arguments of the gnes client CLI
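Putting the Flow API together, an end-to-end index-then-query session could look like the sketch below. It reuses only calls documented above; 'BaseIndexer' as a yaml_path follows the same base-module naming as the other examples and is an assumption.

from gnes.flow import Flow

f = (Flow(check_version=False)
     .add_preprocessor(yaml_path='BasePreprocessor')
     .add_encoder(yaml_path='BaseEncoder')
     .add_indexer(yaml_path='BaseIndexer'))

def my_docs():
    for _ in range(10):
        yield b'hello world'  # each yield is one document / query

with f.build(backend='thread') as flow:
    flow.index(bytes_gen=my_docs())
    flow.query(bytes_gen=my_docs())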
gnes.indexer package¶
Subpackages¶
class gnes.indexer.chunk.annoy.AnnoyIndexer(num_dim, data_path, metric='angular', n_trees=10, *args, **kwargs)[source]¶
Bases: gnes.indexer.base.BaseChunkIndexer
Initialize an AnnoyIndexer.
Parameters:
- num_dim (int) – when set to -1, num_dim is decided automatically on the first .add()
- data_path (str) – index data file managed by the annoy indexer
- metric (str) –
- n_trees (int) –

add(keys, vectors, weights, *args, **kwargs)[source]¶
Add new chunks and their vector representations.
Parameters:
- keys (List[Tuple[int, Any]]) – list of (doc_id, offset) tuples
- vectors (ndarray) – vector representations
- weights (List[float]) – weights of the chunks

train(*args, **kwargs) – Train the model; needs to be overridden.

class gnes.indexer.chunk.faiss.FaissIndexer(num_dim, index_key, data_path, *args, **kwargs)[source]¶
Bases: gnes.indexer.base.BaseChunkIndexer
Initialize a FaissIndexer.
Parameters:
- num_dim (int) – when set to -1, num_dim is decided automatically on the first .add()
- data_path (str) – index data file managed by the faiss indexer

add(keys, vectors, weights, *args, **kwargs)[source]¶
Add new chunks and their vector representations.
Parameters:
- keys (List[Tuple[int, Any]]) – list of (doc_id, offset) tuples
- vectors (ndarray) – vector representations
- weights (List[float]) – weights of the chunks

train(*args, **kwargs) – Train the model; needs to be overridden.
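A sketch of feeding chunks into the AnnoyIndexer documented above; keys, vectors and weights follow the documented add() signature, and the data path is illustrative:

import numpy as np

from gnes.indexer.chunk.annoy import AnnoyIndexer

idx = AnnoyIndexer(num_dim=128, data_path='/tmp/annoy.idx')

keys = [(0, 0), (0, 1), (1, 0)]                        # (doc_id, offset) tuples
vectors = np.random.randn(3, 128).astype(np.float32)  # one vector per chunk
weights = [1.0, 1.0, 0.5]                             # one weight per chunk

idx.add(keys, vectors, weights)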
class gnes.indexer.chunk.helper.DictKeyIndexer(*args, **kwargs)[source]¶
Bases: gnes.indexer.base.BaseChunkIndexerHelper
add(keys, weights, *args, **kwargs)[source]¶
Add new chunks and their vector representations.
Parameters:
- keys (List[Tuple[int, int]]) – list of (doc_id, offset) tuples
- weights (List[float]) – weights of the chunks
Return type: int
train(*args, **kwargs) – Train the model; needs to be overridden.

class gnes.indexer.chunk.helper.ListKeyIndexer(*args, **kwargs)[source]¶
Bases: gnes.indexer.base.BaseChunkIndexerHelper
add(keys, weights, *args, **kwargs)[source]¶
Add new chunks and their vector representations.
Parameters:
- keys (List[Tuple[int, int]]) – list of (doc_id, offset) tuples
- weights (List[float]) – weights of the chunks
Return type: int
train(*args, **kwargs) – Train the model; needs to be overridden.

class gnes.indexer.chunk.helper.ListNumpyKeyIndexer(*args, **kwargs)[source]¶
Bases: gnes.indexer.chunk.helper.ListKeyIndexer
add(*args, **kwargs)[source]¶
Add new chunks and their vector representations.
Return type: int
train(*args, **kwargs) – Train the model; needs to be overridden.

class gnes.indexer.chunk.helper.NumpyKeyIndexer(buffer_size=10000, col_size=3, *args, **kwargs)[source]¶
Bases: gnes.indexer.base.BaseChunkIndexerHelper
add(keys, weights, *args, **kwargs)[source]¶
Add new chunks and their vector representations.
Parameters:
- keys (List[Tuple[int, int]]) – list of (doc_id, offset) tuples
- weights (List[float]) – weights of the chunks
Return type: int
capacity
train(*args, **kwargs) – Train the model; needs to be overridden.
-
class gnes.indexer.chunk.numpy.NumpyIndexer(is_binary=False, *args, **kwargs)[source]¶
Bases: gnes.indexer.base.BaseChunkIndexer
An exhaustive search indexer using numpy. The distance is computed as the L1 distance normalized by the number of dimensions.

add(keys, vectors, weights, *args, **kwargs)[source]¶
Add new chunks and their vector representations.
Parameters:
- keys (List[Tuple[int, Any]]) – list of (doc_id, offset) tuples
- vectors (ndarray) – vector representations
- weights (List[float]) – weights of the chunks

train(*args, **kwargs)¶
Train the model; needs to be overridden.
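To make the distance above concrete, a tiny worked example with hypothetical vectors:

import numpy as np

q = np.array([0.1, 0.2, 0.3])
d = np.array([0.2, 0.0, 0.3])
# L1 distance normalized by the number of dimensions:
dist = np.abs(q - d).sum() / q.shape[0]  # (0.1 + 0.2 + 0.0) / 3 ≈ 0.1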
class gnes.indexer.doc.dict.DictIndexer(*args, **kwargs)[source]¶
Bases: gnes.indexer.base.BaseDocIndexer

add(keys, docs, *args, **kwargs)[source]¶
Add new docs and their protobuf representation.
Parameters:
- keys (List[int]) – list of doc_id
- docs (List[Document]) – list of protobuf Document objects

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.indexer.doc.filesys.DirectoryIndexer(data_path, keep_na_doc=True, file_suffix='gif', *args, **kwargs)[source]¶
Bases: gnes.indexer.base.BaseDocIndexer

add(keys, docs, *args, **kwargs)[source]¶
Write the GIFs of each document to disk, with the folder structure /data_path/doc_id/0.gif, 1.gif, …
Parameters:
- keys (List[int]) – list of doc ids
- docs (List[Document]) – list of docs

query(keys, *args, **kwargs)[source]¶
Find the docs according to the keys.
Parameters: keys (List[int]) – list of doc ids
Return type: List[Document]
Returns: list of documents whose chunks field contains all the GIFs of that doc (one GIF per chunk)

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.indexer.doc.leveldb.AsyncLVDBIndexer(data_path, keep_na_doc=True, drop_raw_bytes=False, drop_chunk_blob=False, *args, **kwargs)[source]¶
Bases: gnes.indexer.doc.leveldb.LVDBIndexer

add(keys, docs, *args, **kwargs)[source]¶
Add new docs and their protobuf representation.
Parameters:
- keys (List[int]) – list of doc_id
- docs (List[Document]) – list of protobuf Document objects

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.indexer.doc.leveldb.LVDBIndexer(data_path, keep_na_doc=True, drop_raw_bytes=False, drop_chunk_blob=False, *args, **kwargs)[source]¶
Bases: gnes.indexer.base.BaseDocIndexer

add(keys, docs, *args, **kwargs)[source]¶
Add new docs and their protobuf representation.
Parameters:
- keys (List[int]) – list of doc_id
- docs (List[Document]) – list of protobuf Document objects

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.indexer.doc.rocksdb.RocksDBIndexer(data_path, drop_raw_data=False, drop_chunk_blob=False, read_only=False, *args, **kwargs)[source]¶
Bases: gnes.indexer.base.BaseDocIndexer

add(keys, docs, *args, **kwargs)[source]¶
Add new docs and their protobuf representation.
Parameters:
- keys (List[int]) – list of doc_id
- docs (List[Document]) – list of protobuf Document objects

train(*args, **kwargs)¶
Train the model; needs to be overridden.
Submodules¶
class gnes.indexer.base.BaseChunkIndexer(helper_indexer=None, *args, **kwargs)[source]¶
Bases: gnes.indexer.base.BaseIndexer
Stores chunks and their vector representations.

add(keys, vectors, weights, *args, **kwargs)[source]¶
Add new chunks and their vector representations.
Parameters:
- keys (List[Tuple[int, int]]) – list of (doc_id, offset) tuples
- vectors (ndarray) – vector representations
- weights (List[float]) – weights of the chunks

num_chunks¶

num_docs¶

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.indexer.base.BaseChunkIndexerHelper(helper_indexer=None, *args, **kwargs)[source]¶
Bases: gnes.indexer.base.BaseChunkIndexer
A helper class for storing chunk info, doc mapping and weights. This is especially useful when a ChunkIndexer cannot store this information by itself.

add(keys, weights, *args, **kwargs)[source]¶
Add new chunks and their vector representations.
Parameters:
- keys (List[Tuple[int, int]]) – list of (doc_id, offset) tuples
- vectors – vector representations
- weights (List[float]) – weights of the chunks
Return type: int

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.indexer.base.BaseDocIndexer(normalize_fn=None, score_fn=None, is_big_score_similar=False, *args, **kwargs)[source]¶
Bases: gnes.indexer.base.BaseIndexer
Stores documents and their contents. A valid indexer must implement the add() and query() methods.

add(keys, docs, *args, **kwargs)[source]¶
Add new docs and their protobuf representation.
Parameters:
- keys (List[int]) – list of doc_id
- docs (List[Document]) – list of protobuf Document objects

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.indexer.base.BaseIndexer(normalize_fn=None, score_fn=None, is_big_score_similar=False, *args, **kwargs)[source]¶
Bases: gnes.base.TrainableBase
Base indexer; a valid indexer must implement the add() and query() methods.

num_chunks¶

num_docs¶

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.indexer.base.JointIndexer(*args, **kwargs)[source]¶
Bases: gnes.base.CompositionalTrainableBase

components¶

train(*args, **kwargs)¶
Train the model; needs to be overridden.
Module contents¶
gnes.preprocessor package¶
Subpackages¶
gnes.preprocessor.audio.vggish_example_helper.mel_features.frame(data, window_length, hop_length)[source]¶
Convert an array into a sequence of successive, possibly overlapping frames.
An n-dimensional array of shape (num_samples, …) is converted into an (n+1)-D array of shape (num_frames, window_length, …), where each frame starts hop_length points after the preceding one.
This is accomplished using stride_tricks, so the original data is not copied. However, there is no zero-padding, so any incomplete frames at the end are not included.
Args:
- data: np.array of dimension N >= 1.
- window_length: number of samples in each frame.
- hop_length: advance (in samples) between each window.
Returns:
(N+1)-D np.array with as many rows as there are complete frames that can be extracted.
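A quick worked example of the shape arithmetic above, where num_frames = 1 + (num_samples - window_length) // hop_length for complete frames:

import numpy as np
from gnes.preprocessor.audio.vggish_example_helper import mel_features

data = np.arange(10)
frames = mel_features.frame(data, window_length=4, hop_length=2)
print(frames.shape)  # (4, 4): 1 + (10 - 4) // 2 = 4 complete frames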
gnes.preprocessor.audio.vggish_example_helper.mel_features.hertz_to_mel(frequencies_hertz)[source]¶
Convert frequencies to the mel scale using the HTK formula.
Args:
- frequencies_hertz: scalar or np.array of frequencies in hertz.
Returns:
Object of the same size as frequencies_hertz containing the corresponding values on the mel scale.
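For reference, the HTK formula is commonly implemented as mel = 1127 * ln(1 + hz / 700); a self-contained sketch (not this module's own code):

import numpy as np

def hertz_to_mel_htk(frequencies_hertz):
    # HTK mel scale: 1127 * ln(1 + f / 700)
    return 1127.0 * np.log(1.0 + np.asarray(frequencies_hertz) / 700.0)

print(hertz_to_mel_htk(700.0))  # ≈ 781.2 mel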
gnes.preprocessor.audio.vggish_example_helper.mel_features.log_mel_spectrogram(data, audio_sample_rate=8000, log_offset=0.0, window_length_secs=0.025, hop_length_secs=0.01, **kwargs)[source]¶
Convert a waveform to a log-magnitude mel-frequency spectrogram.
Args:
- data: 1D np.array of waveform data.
- audio_sample_rate: the sampling rate of data.
- log_offset: add this to values when taking the log, to avoid -Infs.
- window_length_secs: duration of each window to analyze.
- hop_length_secs: advance between successive analysis windows.
- **kwargs: additional arguments to pass to spectrogram_to_mel_matrix.
Returns:
2D np.array of (num_frames, num_mel_bins) consisting of log mel filterbank magnitudes for successive frames.
gnes.preprocessor.audio.vggish_example_helper.mel_features.periodic_hann(window_length)[source]¶
Calculate a “periodic” Hann window.
The classic Hann window is defined as a raised cosine that starts and ends on zero, and where every value appears twice, except the middle point for an odd-length window. Matlab calls this a “symmetric” window and np.hanning() returns it. However, for Fourier analysis, this actually represents just over one cycle of a period N-1 cosine, and thus is not compactly expressed on a length-N Fourier basis. Instead, it’s better to use a raised cosine that ends just before the final zero value, i.e. a complete cycle of a period-N cosine. Matlab calls this a “periodic” window. This routine calculates it.
Args:
- window_length: the number of points in the returned window.
Returns:
A 1D np.array containing the periodic Hann window.
gnes.preprocessor.audio.vggish_example_helper.mel_features.spectrogram_to_mel_matrix(num_mel_bins=20, num_spectrogram_bins=129, audio_sample_rate=8000, lower_edge_hertz=125.0, upper_edge_hertz=3800.0)[source]¶
Return a matrix that can post-multiply spectrogram rows to make them mel.
Returns a np.array matrix A that can be used to post-multiply a matrix S of spectrogram values (STFT magnitudes) arranged as frames x bins to generate a “mel spectrogram” M of frames x num_mel_bins: M = S A.
The classic HTK algorithm exploits the complementarity of adjacent mel bands to multiply each FFT bin by only one mel weight, then add it, with positive and negative signs, to the two adjacent mel bands to which that bin contributes. Here, by expressing this operation as a matrix multiply, we go from num_fft multiplies per frame (plus around 2*num_fft adds) to around num_fft^2 multiplies and adds. However, because these are all presumably accomplished in a single call to np.dot(), it’s not clear which approach is faster in Python. The matrix multiplication has the attraction of being more general and flexible, and much easier to read.
Args:
- num_mel_bins: how many bands in the resulting mel spectrum. This is the number of columns in the output matrix.
- num_spectrogram_bins: how many bins there are in the source spectrogram data, which is understood to be fft_size/2 + 1, i.e. the spectrogram only contains the nonredundant FFT bins.
- audio_sample_rate: samples per second of the audio at the input to the spectrogram. We need this to figure out the actual frequencies for each spectrogram bin, which dictates how they are mapped into mel.
- lower_edge_hertz: lower bound on the frequencies to be included in the mel spectrum. This corresponds to the lower edge of the lowest triangular band.
- upper_edge_hertz: the desired top edge of the highest frequency band.
Returns:
An np.array with shape (num_spectrogram_bins, num_mel_bins).
Raises:
ValueError: if frequency edges are incorrectly ordered or out of range.
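A short sketch of the M = S A relation described above; the spectrogram values here are random placeholders:

import numpy as np
from gnes.preprocessor.audio.vggish_example_helper import mel_features

A = mel_features.spectrogram_to_mel_matrix(num_mel_bins=20, num_spectrogram_bins=129)
S = np.abs(np.random.randn(100, 129))  # 100 frames x 129 STFT magnitude bins
M = np.dot(S, A)                       # mel spectrogram: 100 frames x 20 mel bins
print(M.shape)                         # (100, 20)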
gnes.preprocessor.audio.vggish_example_helper.mel_features.stft_magnitude(signal, fft_length, hop_length=None, window_length=None)[source]¶
Calculate the short-time Fourier transform magnitude.
Args:
- signal: 1D np.array of the input time-domain signal.
- fft_length: size of the FFT to apply.
- hop_length: advance (in samples) between each frame passed to the FFT.
- window_length: length of each block of samples to pass to the FFT.
Returns:
2D np.array where each row contains the magnitudes of the fft_length/2+1 unique values of the FFT for the corresponding frame of input samples.
class gnes.preprocessor.audio.vggish_example.VggishPreprocessor(num_frames=96, num_bands=64, sample_rate=16000, log_offset=0.01, example_window_seconds=0.96, example_hop_seconds=0.96, stft_window_length_seconds=0.025, stft_hop_length_seconds=0.01, mel_min_hz=125, mel_max_hz=7500, *args, **kwargs)[source]¶
Bases: gnes.preprocessor.base.BaseAudioPreprocessor

train(*args, **kwargs)¶
Train the model; needs to be overridden.

waveform_to_examples(data, sample_rate)[source]¶
Converts an audio waveform into an array of examples for VGGish.
Args:
- data: np.array of either one dimension (mono) or two dimensions (multi-channel, with the outer dimension representing channels). Each sample is generally expected to lie in the range [-1.0, +1.0], although this is not required.
- sample_rate: sample rate of data.
Returns:
3-D np.array of shape [num_examples, num_frames, num_bands] which represents a sequence of examples, each of which contains a patch of log mel spectrogram, covering num_frames frames of audio and num_bands mel frequency bands, where the frame length is vggish_params.STFT_HOP_LENGTH_SECONDS.
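A minimal sketch of the method above, using three seconds of synthetic mono audio; all values are hypothetical:

import numpy as np
from gnes.preprocessor.audio.vggish_example import VggishPreprocessor

prep = VggishPreprocessor()                    # defaults: 96 frames x 64 bands
wav = np.random.uniform(-1.0, 1.0, 3 * 16000)  # 3 s of mono audio at 16 kHz
examples = prep.waveform_to_examples(wav, sample_rate=16000)
print(examples.shape)                          # (num_examples, 96, 64)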
class gnes.preprocessor.image.resize.ResizeChunkPreprocessor(target_width=224, target_height=224, *args, **kwargs)[source]¶
Bases: gnes.preprocessor.image.resize.SizedPreprocessor

train(*args, **kwargs)¶
Train the model; needs to be overridden.

class gnes.preprocessor.image.resize.SizedPreprocessor(target_width=224, target_height=224, *args, **kwargs)[source]¶
Bases: gnes.preprocessor.base.BaseImagePreprocessor

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.preprocessor.image.segmentation.SegmentPreprocessor(model_name, model_dir, *args, **kwargs)[source]¶
Bases: gnes.preprocessor.image.resize.SizedPreprocessor

train(*args, **kwargs)¶
Train the model; needs to be overridden.
gnes.preprocessor.io_utils.ffmpeg.compile_args(input_fn='pipe:', output_fn='pipe:', video_filters=[], audio_filters=[], input_options={}, output_options={}, overwrite_output=True)[source]¶
Wrapper for various FFmpeg-related applications (ffmpeg, ffprobe).

gnes.preprocessor.io_utils.helper.run_command(cmd_args, input=None, pipe_stdin=True, pipe_stdout=False, pipe_stderr=False, quiet=False)[source]¶

gnes.preprocessor.io_utils.video.capture_frames(input_fn='pipe:', input_data=None, pix_fmt='rgb24', fps=-1, scale=None, start_time=None, end_time=None, vframes=-1, **kwargs)[source]¶
Return type: List[ndarray]
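A hedged sketch of decoding frames with capture_frames; 'my_video.mp4' is a hypothetical local file and the ffmpeg binary must be available on the system:

from gnes.preprocessor.io_utils.video import capture_frames

# sample at most 5 frames at 1 fps; with pix_fmt='rgb24' each returned
# frame is expected to be an H x W x 3 uint8 ndarray
frames = capture_frames(input_fn='my_video.mp4', fps=1, vframes=5)
print(len(frames), frames[0].shape)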
class gnes.preprocessor.text.split.SentSplitPreprocessor(min_sent_len=1, max_sent_len=256, deliminator='.!?。!?', is_json=False, *args, **kwargs)[source]¶
Bases: gnes.preprocessor.base.BaseTextPreprocessor

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.preprocessor.video.ffmpeg.FFmpegPreprocessor(frame_size='192:168', frame_rate=10, frame_num=-1, duplicate_rm=True, use_phash_weight=False, phash_thresh=5, *args, **kwargs)[source]¶
Bases: gnes.preprocessor.base.BaseVideoPreprocessor

train(*args, **kwargs)¶
Train the model; needs to be overridden.

class gnes.preprocessor.video.ffmpeg.FFmpegVideoSegmentor(frame_size='192:168', frame_rate=10, frame_num=-1, segment_method='cut_by_frame', segment_interval=-1, segment_num=3, max_frames_per_doc=-1, use_image_input=False, splitter='__split__', *args, **kwargs)[source]¶
Bases: gnes.preprocessor.base.BaseVideoPreprocessor

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.preprocessor.video.ffmpeg.GifChunkPreprocessor(uniform_doc_weight=True, *args, **kwargs)[source]¶
Bases: gnes.preprocessor.base.RawChunkPreprocessor, gnes.preprocessor.base.BaseVideoPreprocessor

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.preprocessor.video.shot_detector.ShotDetectorPreprocessor(descriptor='block_hsv_histogram', distance_metric='bhattacharya', detect_method='threshold', frame_size=None, frame_rate=10, vframes=-1, sframes=-1, drop_raw_data=False, *args, **kwargs)[source]¶
Bases: gnes.preprocessor.base.BaseVideoPreprocessor

store_args_kwargs = True¶

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.preprocessor.video.video_decoder.VideoDecoderPreprocessor(frame_rate=10, frame_size=None, vframes=-1, drop_raw_data=False, chunk_spliter=None, *args, **kwargs)[source]¶
Bases: gnes.preprocessor.base.BaseVideoPreprocessor

store_args_kwargs = True¶

train(*args, **kwargs)¶
Train the model; needs to be overridden.
Submodules¶
class gnes.preprocessor.base.BaseAudioPreprocessor(uniform_doc_weight=True, *args, **kwargs)[source]¶
Bases: gnes.preprocessor.base.BasePreprocessor
doc_type = 4¶
train(*args, **kwargs)¶
Train the model; needs to be overridden.

class gnes.preprocessor.base.BaseImagePreprocessor(uniform_doc_weight=True, *args, **kwargs)[source]¶
Bases: gnes.preprocessor.base.BasePreprocessor
doc_type = 2¶
train(*args, **kwargs)¶
Train the model; needs to be overridden.

class gnes.preprocessor.base.BasePreprocessor(uniform_doc_weight=True, *args, **kwargs)[source]¶
Bases: gnes.base.TrainableBase
doc_type = 0¶
train(*args, **kwargs)¶
Train the model; needs to be overridden.

class gnes.preprocessor.base.BaseTextPreprocessor(uniform_doc_weight=True, *args, **kwargs)[source]¶
Bases: gnes.preprocessor.base.BasePreprocessor
doc_type = 1¶
train(*args, **kwargs)¶
Train the model; needs to be overridden.

class gnes.preprocessor.base.BaseVideoPreprocessor(uniform_doc_weight=True, *args, **kwargs)[source]¶
Bases: gnes.preprocessor.base.BasePreprocessor
doc_type = 3¶
train(*args, **kwargs)¶
Train the model; needs to be overridden.

class gnes.preprocessor.base.RawChunkPreprocessor(uniform_doc_weight=True, *args, **kwargs)[source]¶
Bases: gnes.preprocessor.base.BasePreprocessor
train(*args, **kwargs)¶
Train the model; needs to be overridden.
gnes.preprocessor.helper.block_descriptor(image, descriptor_fn, num_blocks=3)[source]¶
Return type: ndarray

gnes.preprocessor.helper.check_motion(prev_dists, cur_dist, motion_threshold=0.75)[source]¶
Returns a boolean value deciding whether the peak is due to motion.

gnes.preprocessor.helper.compare_descriptor(descriptor1, descriptor2, metric='chisqr')[source]¶
Return type: float

gnes.preprocessor.helper.compute_descriptor(image, method='rgb_histogram', **kwargs)[source]¶
Return type: ndarray

gnes.preprocessor.helper.detect_peak_boundary(distances, method='kmeans', **kwargs)[source]¶
Return type: List[int]

gnes.preprocessor.helper.get_audio(buffer_data, sample_rate, interval, duration)[source]¶
Return type: List[ndarray]
Module contents¶
gnes.proto package¶
Submodules¶
Module contents¶
class gnes.proto.RequestGenerator[source]¶
Bases: object

gnes.proto.send_message(sock, msg, timeout=-1, squeeze_pb=False, **kwargs)[source]¶
Return type: None
gnes.router package¶
Submodules¶
class gnes.router.base.BaseEmbedReduceRouter(*args, **kwargs)[source]¶
Bases: gnes.router.base.BaseReduceRouter

apply(msg, accum_msgs, *args, **kwargs)[source]¶
Reduce embeddings from encoders (mean, concat, …).
Parameters:
- msg (Message) – the current message
- accum_msgs (List[Message]) – accumulated messages
Return type: None

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.router.base.BaseMapRouter(*args, **kwargs)[source]¶
Bases: gnes.router.base.BaseRouter

apply(msg, *args, **kwargs)[source]¶
Modify the incoming message.
Parameters: msg (Message) – incoming message
Return type: Generator[+T_co, -T_contra, +V_co]

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.router.base.BaseReduceRouter(*args, **kwargs)[source]¶
Bases: gnes.router.base.BaseRouter

apply(msg, accum_msgs, *args, **kwargs)[source]¶
Modify the current message based on accumulated messages.
Parameters:
- msg (Message) – the current message
- accum_msgs (List[Message]) – accumulated messages
Return type: None

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.router.base.BaseRouter(*args, **kwargs)[source]¶
Bases: gnes.base.TrainableBase
Base class for the router. Inherit from this class to create a new router.
A router forwards messages between services: essentially, it receives a gnes_pb2.Message and calls its apply() method on it.

apply(msg, *args, **kwargs)[source]¶
Modify the incoming message.
Parameters: msg (Message) – incoming message

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.router.base.BaseTopkReduceRouter(reduce_op='sum', *args, **kwargs)[source]¶
Bases: gnes.router.base.BaseReduceRouter

apply(msg, accum_msgs, *args, **kwargs)[source]¶
Modify the current message based on accumulated messages.
Parameters:
- msg (Message) – the current message
- accum_msgs (List[Message]) – accumulated messages

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.router.map.BlockRouter(sleep_sec=5, *args, **kwargs)[source]¶
Bases: gnes.router.base.BaseMapRouter
Wait for sleep_sec seconds, then forward the message; useful for benchmarking.

apply(msg, *args, **kwargs)[source]¶
Modify the incoming message.
Parameters: msg (Message) – incoming message

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.router.map.DocBatchRouter(*args, **kwargs)[source]¶
Bases: gnes.router.base.BaseMapRouter

apply(msg, *args, **kwargs)[source]¶
Modify the incoming message.
Parameters: msg (Message) – incoming message
Return type: Generator[+T_co, -T_contra, +V_co]

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.router.map.PublishRouter(num_part, *args, **kwargs)[source]¶
Bases: gnes.router.base.BaseMapRouter
Copy a message num_part times and forward it; useful for PUB-SUB sockets. num_part is an indicator for a downstream sync-barrier, e.g. a ReduceRouter.

apply(msg, *args, **kwargs)[source]¶
Modify the incoming message.
Parameters: msg (Message) – incoming message
Return type: Generator[+T_co, -T_contra, +V_co]

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.router.reduce.AvgEmbedRouter(*args, **kwargs)[source]¶
Bases: gnes.router.base.BaseEmbedReduceRouter
Gather all embeddings from multiple encoders and average them over a specific axis. By default, averaging happens on the first axis. chunk_idx and doc_idx denote the indices of the for-loop used in BaseEmbedReduceRouter.

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.router.reduce.Chunk2DocTopkReducer(reduce_op='sum', *args, **kwargs)[source]¶
Bases: gnes.router.base.BaseTopkReduceRouter
Gather all chunks by their doc_id, resulting in a top-k doc list. This is almost always useful, as the final result should be grouped by doc_id, not by chunk.

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.router.reduce.ChunkTopkReducer(reduce_op='sum', *args, **kwargs)[source]¶
Bases: gnes.router.base.BaseTopkReduceRouter
Gather all chunks by their chunk_id (aka doc_id-offset) from all shards, resulting in a top-k chunk list.

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.router.reduce.ConcatEmbedRouter(*args, **kwargs)[source]¶
Bases: gnes.router.base.BaseEmbedReduceRouter
Gather all embeddings from multiple encoders and concatenate them on a specific axis. By default, concatenation happens on the last axis. chunk_idx and doc_idx denote the indices of the for-loop used in BaseEmbedReduceRouter.

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.router.reduce.DocFillReducer(*args, **kwargs)[source]¶
Bases: gnes.router.base.BaseReduceRouter
Gather the raw content of all documents from multiple shards. This is only useful when you have multiple doc-indexers with docs spread over multiple shards, and you require full-doc retrieval with the original content, not just a doc id. Ideally, each doc should belong to only one shard.

apply(msg, accum_msgs, *args, **kwargs)[source]¶
Modify the current message based on accumulated messages.
Parameters:
- msg (gnes_pb2.Message) – the current message
- accum_msgs (List[Message]) – accumulated messages

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.router.reduce.DocTopkReducer(reduce_op='sum', *args, **kwargs)[source]¶
Bases: gnes.router.base.BaseTopkReduceRouter
Gather all docs by their doc_id, resulting in a top-k doc list.

train(*args, **kwargs)¶
Train the model; needs to be overridden.
Module contents¶
gnes.score_fn package¶
Submodules¶
class gnes.score_fn.base.BaseScoreFn(context=None, *args, **kwargs)[source]¶
Bases: gnes.base.TrainableBase
Base score function. A score function must implement the __call__ method.

train(*args, **kwargs)¶
Train the model; needs to be overridden.

warn_unnamed = False¶
class gnes.score_fn.base.CombinedScoreFn(score_mode='multiply', *args, **kwargs)[source]¶
Bases: gnes.score_fn.base.BaseScoreFn
Combine multiple scores into one score; defaults to 'multiply'.
Parameters: score_mode (str) – specifies how the computed scores are combined

supported_ops¶

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.score_fn.base.ModifierScoreFn(modifier='none', factor=1.0, factor_name='GivenConstant', *args, **kwargs)[source]¶
Bases: gnes.score_fn.base.BaseScoreFn
Modifier to apply to the value: score = modifier(factor * value)

supported_ops¶

train(*args, **kwargs)¶
Train the model; needs to be overridden.
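A worked instance of the rule above with hypothetical numbers: taking modifier='sqrt' (one of the ops listed in ScoreOps below) and factor=0.5, a raw value of 8.0 becomes:

import math

factor, value = 0.5, 8.0
score = math.sqrt(factor * value)  # sqrt(0.5 * 8.0) = 2.0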
class gnes.score_fn.base.ScoreOps[source]¶
Bases: object
abs = <gnes.score_fn.base.ModifierScoreFn object>¶
avg = <gnes.score_fn.base.CombinedScoreFn object>¶
ln = <gnes.score_fn.base.ModifierScoreFn object>¶
ln1p = <gnes.score_fn.base.ModifierScoreFn object>¶
ln2p = <gnes.score_fn.base.ModifierScoreFn object>¶
log = <gnes.score_fn.base.ModifierScoreFn object>¶
log1p = <gnes.score_fn.base.ModifierScoreFn object>¶
log2p = <gnes.score_fn.base.ModifierScoreFn object>¶
max = <gnes.score_fn.base.CombinedScoreFn object>¶
min = <gnes.score_fn.base.CombinedScoreFn object>¶
multiply = <gnes.score_fn.base.CombinedScoreFn object>¶
none = <gnes.score_fn.base.ModifierScoreFn object>¶
reciprocal = <gnes.score_fn.base.ModifierScoreFn object>¶
reciprocal1p = <gnes.score_fn.base.ModifierScoreFn object>¶
sqrt = <gnes.score_fn.base.ModifierScoreFn object>¶
square = <gnes.score_fn.base.ModifierScoreFn object>¶
sum = <gnes.score_fn.base.CombinedScoreFn object>¶
class gnes.score_fn.chunk.BM25ChunkScoreFn(threshold=0.8, *args, **kwargs)[source]¶
Bases: gnes.score_fn.base.CombinedScoreFn
score = relevance * idf(q_chunk) * tf(q_chunk) * (k1 + 1) / (tf(q_chunk) + k1 * (1 - b + b * (chunk_in_doc / avg_chunk_in_doc)))
In the BM25 algorithm:
idf(q_chunk) = log(1 + (doc_count - f(q_chunk) + 0.5) / (f(q_chunk) + 0.5)),
where f(q_chunk) is the number of docs that contain q_chunk. In our system, this denotes the number of docs appearing in the query results.
In Elasticsearch, b = 0.75 and k1 = 1.2.

train(*args, **kwargs)¶
Train the model; needs to be overridden.
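To make the idf term above concrete, a tiny worked example with hypothetical counts:

import math

doc_count = 1000  # total docs appearing in the query results
f_q = 50          # docs that contain q_chunk
idf = math.log(1 + (doc_count - f_q + 0.5) / (f_q + 0.5))  # ≈ 2.99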
class gnes.score_fn.chunk.CoordChunkScoreFn(score_mode='multiply', *args, **kwargs)[source]¶
Bases: gnes.score_fn.base.CombinedScoreFn
score = relevance * query_coordination, where query_coordination = #chunks returned / #chunks in this doc (the query doc)
Parameters: score_mode (str) – specifies how the computed scores are combined

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.score_fn.chunk.TFIDFChunkScoreFn(threshold=0.8, *args, **kwargs)[source]¶
Bases: gnes.score_fn.base.CombinedScoreFn
score = relevance * tf(q_chunk) * (idf(q_chunk)**2)
tf(q_chunk) is calculated from the relevance of the query results: tf(q_chunk) = number of queried chunks where relevance >= threshold
idf(q_chunk) = log(total_chunks / tf(q_chunk) + 1)

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.score_fn.chunk.WeightedChunkOffsetScoreFn(score_mode='multiply', *args, **kwargs)[source]¶
Bases: gnes.score_fn.base.CombinedScoreFn
score = d_chunk.weight * relevance * offset_divergence * q_chunk.weight
offset_divergence is calculated based on doc_type: for TEXT, VIDEO and AUDIO the offset is 1-D; for IMAGE the offset is 2-D.
Parameters: score_mode (str) – specifies how the computed scores are combined

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.score_fn.chunk.WeightedChunkScoreFn(score_mode='multiply', *args, **kwargs)[source]¶
Bases: gnes.score_fn.base.CombinedScoreFn
score = d_chunk.weight * relevance * q_chunk.weight
Parameters: score_mode (str) – specifies how the computed scores are combined

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.score_fn.doc.CoordDocScoreFn(score_mode='multiply', *args, **kwargs)[source]¶
Bases: gnes.score_fn.base.CombinedScoreFn
score = score * query_coordination, where query_coordination = #chunks recalled / #chunks in this doc
Parameters: score_mode (str) – specifies how the computed scores are combined

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.score_fn.doc.WeightedDocScoreFn(score_mode='multiply', *args, **kwargs)[source]¶
Bases: gnes.score_fn.base.CombinedScoreFn
Parameters: score_mode (str) – specifies how the computed scores are combined

train(*args, **kwargs)¶
Train the model; needs to be overridden.
class gnes.score_fn.normalize.Normalizer1[source]¶
Bases: gnes.score_fn.base.ModifierScoreFn
Normalizes via: score = 1 / (1 + sqrt(score))

train(*args, **kwargs)¶
Train the model; needs to be overridden.

class gnes.score_fn.normalize.Normalizer2(num_dim)[source]¶
Bases: gnes.score_fn.base.ModifierScoreFn
Normalizes via: score = 1 / (1 + score / num_dim)

train(*args, **kwargs)¶
Train the model; needs to be overridden.

class gnes.score_fn.normalize.Normalizer3(num_dim)[source]¶
Bases: gnes.score_fn.normalize.Normalizer2
Normalizes via: score = 1 / (1 + sqrt(score) / num_dim)

train(*args, **kwargs)¶
Train the model; needs to be overridden.

class gnes.score_fn.normalize.Normalizer4(num_bytes)[source]¶
Bases: gnes.score_fn.base.ModifierScoreFn
Normalizes via: score = 1 - score / num_bytes

train(*args, **kwargs)¶
Train the model; needs to be overridden.

class gnes.score_fn.normalize.Normalizer5[source]¶
Bases: gnes.score_fn.base.ModifierScoreFn
Normalizes via: score = 1 / (1 + sqrt(abs(score)))

train(*args, **kwargs)¶
Train the model; needs to be overridden.
Module contents¶
gnes.service package¶
Submodules¶
class gnes.service.base.BaseService(args)[source]¶
Bases: object
default_host = '0.0.0.0'¶
handler = <gnes.service.base.MessageHandler object>¶
status¶
class gnes.service.base.MessageHandler(mh=None)[source]¶
Bases: object
class gnes.service.base.ParallelType[source]¶
Bases: gnes.service.base.BetterEnum
An enumeration.
PUB_BLOCK = 2¶
PUB_NONBLOCK = 3¶
PUSH_BLOCK = 0¶
PUSH_NONBLOCK = 1¶
is_block¶
is_push¶
class gnes.service.base.ReduceOp[source]¶
Bases: gnes.service.base.BetterEnum
An enumeration.
ALWAYS_ONE = 1¶
CONCAT = 0¶
class gnes.service.base.SocketType[source]¶
Bases: gnes.service.base.BetterEnum
An enumeration.
PAIR_BIND = 8¶
PAIR_CONNECT = 9¶
PUB_BIND = 6¶
PUB_CONNECT = 7¶
PULL_BIND = 0¶
PULL_CONNECT = 1¶
PUSH_BIND = 2¶
PUSH_CONNECT = 3¶
SUB_BIND = 4¶
SUB_CONNECT = 5¶
is_bind¶
paired¶
Module contents¶
Submodules¶
gnes.component module¶
gnes.helper module¶
gnes.helper.batching(func=None, *, batch_size=None, num_batch=None, iter_axis=0, concat_axis=0, chunk_dim=-1)[source]¶
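A hedged sketch of how this decorator is typically applied, assuming it iterates over the first data argument in mini-batches and concatenates the per-batch outputs, which is what its iter_axis/concat_axis parameters suggest; the encoder class here is hypothetical:

import numpy as np
from gnes.helper import batching

class ToyEncoder:
    @batching(batch_size=64)         # iterate over axis 0 in mini-batches of 64
    def encode(self, data, *args, **kwargs):
        return data * 2              # stand-in for real per-batch work

out = ToyEncoder().encode(np.ones([300, 8]))  # processed batch by batch, then concatenated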
gnes.helper.profiling(func)¶
class gnes.helper.FileLock(lock_file='LOCK')[source]¶
Bases: object
Implements POSIX-based file locking (Linux, Ubuntu, MacOS, etc.)

is_locked¶
gnes.helper.progressbar(i, prefix='', suffix='', count=100, size=60)[source]¶
Example:
for i in range(10000):
    progressbar(i, prefix='computing: ', count=100, size=60)
The resulting output is:
computing: [###########################################################.] 99/100
computing: [###########################################################.] 199/200
computing: [###########################################################.] 299/300
computing: [###########################################################.] 399/400
computing: [###########################################################.] 499/500
computing: [###########################################################.] 599/600
computing: [###########################################################.] 699/700
computing: [###########################################################.] 799/800
computing: [###########################################################.] 899/900
computing: [#############################.................................] 950/1000
gnes.uuid module¶
Module contents¶
Troubleshooting¶
Check if docker swarm/stack runs successfully¶
docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
j7b533zxmzg5 gnes-swarm-2654_encoder replicated 0/1 ccr.ccs.tencentyun.com/gnes/aipd-gnes:master
0vlxu4acg1ph gnes-swarm-2654_income-proxy replicated 0/1 ccr.ccs.tencentyun.com/gnes/aipd-gnes:master *:4962->4962/tcp
equqrhsn7pky gnes-swarm-2654_indexer replicated 0/3 ccr.ccs.tencentyun.com/gnes/aipd-gnes:master
nd7euo7mcpa9 gnes-swarm-2654_middleman-proxy replicated 0/1 ccr.ccs.tencentyun.com/gnes/aipd-gnes:master
ssdlk9gzmggw gnes-swarm-2654_outgoing-proxy replicated 0/1 ccr.ccs.tencentyun.com/gnes/aipd-gnes:master *:4963->4963/tcp
xgxeetyhos6t my-gnes_encoder replicated 1/1 ccr.ccs.tencentyun.com/gnes/aipd-gnes:a799a0f
zny37400p225 my-gnes_income-proxy replicated 1/1 ccr.ccs.tencentyun.com/gnes/aipd-gnes:a799a0f *:8598->8598/tcp
taqqg6qwrxlw my-gnes_indexer replicated 3/3 ccr.ccs.tencentyun.com/gnes/aipd-gnes:a799a0f
j96gnny8ysbn my-gnes_middleman-proxy replicated 1/1 ccr.ccs.tencentyun.com/gnes/aipd-gnes:a799a0f
e28spnuksjw8 my-gnes_outgoing-proxy replicated 1/1 ccr.ccs.tencentyun.com/gnes/aipd-gnes:a799a0f *:8599->8599/tcp
In the above example, we started two swarms, i.e. gnes-swarm-2654 and my-gnes. Unfortunately, gnes-swarm-2654 fails to start and is not running at all. But how can one tell that?
Note the column REPLICAS, which indicates the number of running replicas versus the number of required replicas for each service. gnes-swarm-2654 shows 0 running replicas for all of its services (e.g. 0/1 and 0/3). This suggests the swarm failed to start. The next step is to investigate the reason.
Investigate the reason of a failed service¶
One cannot print all logs of a docker swarm at once. Instead, one can inspect it service by service, e.g.
docker service ps gnes-swarm-2654_encoder --format "{{json .Error}}" --no-trunc
"\"invalid mount config for type \"bind\": bind source path does not exist: /data/han/test-shell/output_data\""
"\"invalid mount config for type \"bind\": bind source path does not exist: /data/han/test-shell/output_data\""
"\"invalid mount config for type \"bind\": bind source path does not exist: /data/han/test-shell/output_data\""
"\"invalid mount config for type \"bind\": bind source path does not exist: /data/han/test-shell/output_data\""
Now the reason is clear: output_data did not exist when the swarm was started. But why are there duplicated lines? This is because docker swarm retried three times before giving up on starting this service, hitting the same problem each time. Hence four duplicated lines in total.
Delete a failed service¶
Now that the reason is clear, we can delete the failed service and release the resources.
docker stack rm gnes-swarm-2654
Removing service gnes-swarm-2654_encoder
Removing service gnes-swarm-2654_income-proxy
Removing service gnes-swarm-2654_indexer
Removing service gnes-swarm-2654_middleman-proxy
Removing service gnes-swarm-2654_outgoing-proxy
Removing network gnes-swarm-2654_gnes-net
Locate internal errors by looking at logs¶
Sometimes the service fails to start but docker service ps gives no error,
docker service ps gnes-swarm-4254_encoder --format "{{json .Error}}" --no-trunc
""
Or it shows an error that is not self-explanatory.
"\"task: non-zero exit (2)\""
Often in this case, the service fails to start not due to the docker config, but due to a GNES-internal error. To see it,
docker service logs gnes-swarm-4254_income-proxy
gnes-swarm-4254_income-proxy.1.yj5v8n4dhfgv@VM-0-3-ubuntu | [--proxy_type {BS,Dict,MapProxyService,Message,MessageHandler,ProxyService,ReduceProxyService,defaultdict}]
gnes-swarm-4254_income-proxy.1.yj5v8n4dhfgv@VM-0-3-ubuntu | [--batch_size BATCH_SIZE] [--num_part NUM_PART]
gnes-swarm-4254_income-proxy.1.kmgk21qo6m0n@VM-0-3-ubuntu | [--proxy_type {BS,Dict,MapProxyService,Message,MessageHandler,ProxyService,ReduceProxyService,defaultdict}]
gnes-swarm-4254_income-proxy.1.w04d552cuj93@VM-0-3-ubuntu | gnes proxy: error: argument --batch_size: invalid int value: ''
gnes-swarm-4254_income-proxy.1.kmgk21qo6m0n@VM-0-3-ubuntu | [--batch_size BATCH_SIZE] [--num_part NUM_PART]
One can now clearly see that the error comes from an incorrectly given --batch_size, which is thrown from the GNES CLI.
Protobuf Implementation¶
The file gnes/proto/gnes.proto
defines the protobuf used in GNES. It is the core message protocol used for communication between services. It also defines the interface of a gRPC service.
gnes_pb2.py
and gnes_pb2_grpc.py
are Python interfaces automatically generated by the protobuf tools.
Developers who want to change the protobuf definition need to first edit gnes/proto/gnes.proto
and then regenerate the Python code (i.e. gnes_pb2.py
and gnes_pb2_grpc.py
).
Generating gnes_pb2.py
and gnes_pb2_grpc.py
¶
Take MacOS as an example,
- Download
protoc-$VERSION-$PLATFORM.zip
from the official site and decompress it.
- Copy the binary and the include files to your system paths:
cp ~/Downloads/protoc-3.7.1-osx-x86_64/bin/protoc /usr/local/bin/
cp -r ~/Downloads/protoc-3.7.1-osx-x86_64/include/* /usr/local/include/
- Install gRPC tools dependencies:
brew install automake autoconf libtool
- Install gRPC and
grpc_python_plugin
from the source:
git clone https://github.com/grpc/grpc.git
git submodule update --init
make grpc_python_plugin
- This will compile the grpc-python-plugin and build it to, e.g.,
/Documents/grpc/bins/opt/grpc_python_plugin
- Generate the python codes:
SRC_DIR=gnes/proto/
PLUGIN_PATH=/Documents/grpc/bins/opt/grpc_python_plugin
protoc -I $SRC_DIR --python_out=$SRC_DIR --grpc_python_out=$SRC_DIR --plugin=protoc-gen-grpc_python=${PLUGIN_PATH} ${SRC_DIR}gnes.proto
- Fix the import in gnes_pb2_grpc.py. For some reason (probably a gRPC bug?), the generated import in gnes_pb2_grpc.py is not correct; you have to change it to the following:
# Generated by the gRPC Python protocol compiler plugin. DO NOT EDIT!
import grpc
from . import gnes_pb2 as gnes__pb2
Environment Variables¶
There are a couple of environment variables that GNES respects at runtime.
GNES_PROFILING
¶
Set to any non-empty string to turn on service-level time profiling for GNES.
Default is disabled.
GNES_PROFILING_MEM
¶
Set to any non-empty string to turn on service-level memory profiling for GNES. Warning: memory profiling can hurt efficiency significantly.
Default is disabled.
GNES_WARN_UNNAMED_COMPONENT
¶
Set to 0
to turn off the warning like this object is not named ("name" is not found under "gnes_config" in YAML config), i will call it "BaseRouter-51ce94cc". naming the object is important as it provides an unique identifier when serializing/deserializing this object.
Set to 1
to enable it.
Default is enabled.
GNES_VCS_VERSION
¶
Git version of GNES. This is used when --check_version
is turned on. For GNES official docker image, GNES_VCS_VERSION
is automatically set to the git version during the building procedure.
Default is the git HEAD version when building the docker image; otherwise it is not set.
GNES_CONTROL_PORT
¶
Control port of the microservice. Useful when doing health check via gnes healthcheck
.
Default is not set. A random port will be used.
GNES_CONTRIB_MODULE
¶
(deprecated) Paths of third-party components. See examples in GNES hub for the latest usage.
GNES_IPC_SOCK_TMP
¶
Temp directory for ipc sockets, not used on Windows.
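A minimal sketch of toggling these variables from Python before starting a service in the same process (values are illustrative; equivalently, export them in your shell before running the gnes CLI):

import os

os.environ['GNES_PROFILING'] = '1'               # any non-empty string enables time profiling
os.environ['GNES_WARN_UNNAMED_COMPONENT'] = '0'  # silence the unnamed-component warning
os.environ['GNES_CONTROL_PORT'] = '56789'        # pin the control port for gnes healthcheck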
Using GNES with Docker Swarm¶
Build your first GNES app on local machine¶
Let’s start with a typical indexing procedure by writing a YAML config (see the left column of the table):
YAML config | GNES workflow (generated by GNES board) |
---|---|
port: 5566
services:
- name: Preprocessor
yaml_path: text-prep.yml
- name: Encoder
yaml_path: gpt2.yml
- name: Indexer
yaml_path: b-indexer.yml
|
|
Now let’s see what the YAML config says. First impression: it is pretty intuitive. It defines a pipeline workflow consisting of preprocessing, encoding and indexing, where the output of one component is the input of the next. This pipeline is a typical workflow at index or query time. Each component is also associated with a YAML config specifying how it should work. Right now these are not important for understanding the big picture; nonetheless, curious readers can check out what each YAML looks like by expanding the items below.
Preprocessor config: text-prep.yml (click to expand...)
!SentSplitPreprocessor
parameters:
start_doc_id: 0
random_doc_id: True
deliminator: "[.!?]+"
gnes_config:
is_trained: true
Encoder config: gpt2.yml (click to expand...)
!PipelineEncoder
components:
- !GPT2Encoder
parameters:
model_dir: $GPT2_CI_MODEL
pooling_stragy: REDUCE_MEAN
gnes_config:
is_trained: true
- !PCALocalEncoder
parameters:
output_dim: 32
num_locals: 8
gnes_config:
batch_size: 2048
- !PQEncoder
parameters:
cluster_per_byte: 8
num_bytes: 8
gnes_config:
work_dir: ./
name: gpt2bin-pipe
Indexer config: b-indexer.yml (click to expand...)
!BIndexer
parameters:
num_bytes: 8
data_path: /out_data/idx.binary
gnes_config:
work_dir: ./
name: bindexer
On the right side of the above table, you can see what the actual data flow looks like. There is an additional component, gRPCFrontend, automatically added to the workflow; it allows you to feed data and fetch results via the gRPC protocol through port 5566.
Now it’s time to run! GNES board can automatically generate a starting script/config based on the YAML config you give, saving you the trouble of writing them on your own.
💡 You can also start a GNES board locally. Simply run docker run -d -p 0.0.0.0:80:8080/tcp gnes/gnes compose --serve
As a cloud-native application, GNES requires an orchestration engine to coordinate all micro-services. We support Kubernetes, Docker Swarm and shell-based multi-process. Let’s see what the generated script looks like in this case.
Shell-based starting script (click to expand...)
#!/usr/bin/env bash
set -e
trap 'kill $(jobs -p)' EXIT
printf "starting service gRPCFrontend with 0 replicas...\n"
gnes frontend --grpc_port 5566 --port_out 49668 --socket_out PUSH_BIND --port_in 60654 --socket_in PULL_CONNECT &
printf "starting service Preprocessor with 0 replicas...\n"
gnes preprocess --yaml_path text-prep.yml --port_in 49668 --socket_in PULL_CONNECT --port_out 61911 --socket_out PUSH_BIND &
printf "starting service Encoder with 0 replicas...\n"
gnes encode --yaml_path gpt2.yml --port_in 61911 --socket_in PULL_CONNECT --port_out 49947 --socket_out PUSH_BIND &
printf "starting service Indexer with 0 replicas...\n"
gnes index --yaml_path b-indexer.yml --port_in 49947 --socket_in PULL_CONNECT --port_out 60654 --socket_out PUSH_BIND &
wait
DockerSwarm compose file (click to expand...)
version: '3.4'
services:
gRPCFrontend00:
image: gnes/gnes-full:latest
command: frontend --grpc_port 5566 --port_out 49668 --socket_out PUSH_BIND --port_in
60654 --socket_in PULL_CONNECT --host_in Indexer30
ports:
- 5566:5566
Preprocessor10:
image: gnes/gnes-full:latest
command: preprocess --port_in 49668 --socket_in PULL_CONNECT
--port_out 61911 --socket_out PUSH_BIND --yaml_path /Preprocessor10_yaml --host_in
gRPCFrontend00
configs:
- Preprocessor10_yaml
Encoder20:
image: gnes/gnes-full:latest
command: encode --port_in 61911 --socket_in PULL_CONNECT
--port_out 49947 --socket_out PUSH_BIND --yaml_path /Encoder20_yaml --host_in
Preprocessor10
configs:
- Encoder20_yaml
Indexer30:
image: gnes/gnes-full:latest
command: index --port_in 49947 --socket_in PULL_CONNECT
--port_out 60654 --socket_out PUSH_BIND --yaml_path /Indexer30_yaml --host_in
Encoder20
configs:
- Indexer30_yaml
volumes: {}
networks:
gnes-net:
driver: overlay
attachable: true
configs:
Preprocessor10_yaml:
file: text-prep.yml
Encoder20_yaml:
file: gpt2.yml
Indexer30_yaml:
file: b-indexer.yml
For the sake of simplicity, we will just use the generated shell script to start GNES. Create a new file, say run.sh, copy the content into it and run it via $ bash ./run.sh. You should see output as follows:
This suggests the GNES app is ready and waiting for the incoming data. You may now feed data to it through the gRPCFrontend
. Depending on your language (Python, C, Java, Go, HTTP, Shell, etc.) and the content form (image, video, text, etc), the data feeding part can be slightly different.
To stop a running GNES, simply press Control-C.
Scale your GNES app to the cloud¶
Now let’s juice it up a bit. To be honest, building a single-machine process-based pipeline is not that impressive. The true power of GNES is that you can scale any component at any time. Encoding is slow? Add more machines. Preprocessing takes too long? More machines. Index file too large? Add shards, aka more machines!
In this example, we compose a more complicated GNES workflow for images. This workflow consists of multiple preprocessors, encoders and two types of indexers. In particular, we introduce two types of indexers: one for storing the encoded binary vectors, the other for storing the original images, i.e. a full-text index. These two types of indexers work in parallel. Check out the YAML file on the left side of the table for more details, and note how replicas
is defined for each component.
YAML config | GNES workflow (generated by GNES board) |
---|---|
port: 5566
services:
- name: Preprocessor
replicas: 2
yaml_path: image-prep.yml
- name: Encoder
replicas: 3
yaml_path: incep-v3.yml
- name: Indexer
yaml_path: faiss.yml
replicas: 4
- name: Indexer
yaml_path: fulltext.yml
replicas: 3
|
|
You may notice that besides the gRPCFrontend, multiple Routers have been added to the workflow. Routers serve as message brokers between microservices, determining how and where messages are received and sent. In the last pipeline example, the data flow was simple enough that no router was needed. In this example, routers are necessary for connecting multiple preprocessors and encoders; otherwise preprocessors wouldn’t know where to send the message. GNES Board automatically adds routers to the workflow when necessary, based on the types of two consecutive layers. It may also add stacked routers, as you can see between the encoder and indexer in the right graph.
Again, the detailed YAML config of each component is not important for understanding the big picture, hence we omit it for now.
This time we will run GNES via Docker Swarm. To do that, simply copy the generated Docker Swarm YAML config to a file, say my-gnes.yml, and then do
docker stack deploy --compose-file my-gnes.yml gnes-531
Note that gnes-531
is your GNES stack name, keep that name in mind. If you forget about that name, you can always use docker stack ls
to find out. To tell whether the whole stack is running successfully or not, you can use docker service ls -f name=gnes-531
. The number of replicas 1/1
or 4/4
suggests everything is fine.
Generally, a complete and successful Docker Swarm starting process should look like the following:
When the GNES stack is ready and waiting for the incoming data, you may now feed data to it through the gRPCFrontend
. Depending on your language (Python, C, Java, Go, HTTP, Shell, etc.) and the content form (image, video, text, etc), the data feeding part can be slightly different.
To stop a running GNES stack, you can use docker stack rm gnes-531
.
Customize GNES to your need¶
With the help of GNES Board, you can easily compose a GNES app for different purposes. The table below summarizes some common compositions with the corresponding workflow visualizations. Note, we hide the component-wise YAML config (i.e. yaml_path
) for the sake of clarity.
YAML config | GNES workflow (generated by GNES board) |
---|---|
Parallel preprocessing only
port: 5566
services:
- name: Preprocessor
replicas: 2
|
|
Training an encoder
port: 5566
services:
- name: Preprocessor
replicas: 3
- name: Encoder
|
|
Index-time with 3 vector-index shards
port: 5566
services:
- name: Preprocessor
- name: Encoder
- name: Indexer
replicas: 3
|
|
Query-time with 2 vector-index shards followed by 3 full-text-index shards
port: 5566
services:
- name: Preprocessor
- name: Encoder
- name: Indexer
income: sub
replicas: 2
- name: Indexer
income: sub
replicas: 3
|
|
Contributing to GNES¶
🙇 Thanks for your interest in contributing! GNES always welcomes contributions from the open-source community, individual committers and other partners. Without you, GNES can’t be successful.
❤️ Making Your First Commit¶
The beginning is always the hardest. But fear not: even if you just find a typo, a missing docstring or a missing unit test, you can simply correct it by making a commit to GNES. Here are the steps:
- Create a new branch, say
fix-gnes-typo-1
- Fix/improve the codebase
- Commit the changes. Note the commit message must follow the naming style, say
fix(readme): improve the readability and move sections
- Make a pull request. Note the commit message must follow the naming style. It can simply be one of your commit messages, just copy paste it, e.g.
fix(readme): improve the readability and move sections
- Submit your pull request and wait for all checks to pass (usually 10 minutes):
- Coding style
- Commit and PR styles check
- All unit tests
- Request a review from one of the developers on our core team.
- Get an LGTM 👍 and the PR gets merged.
Well done! Once a PR gets merged, here is what happens next:
- All Docker images tagged with -latest will be automatically updated within an hour. You may check their build status here.
- Every Friday, when a new release is published, PyPi packages and all Docker images tagged with -stable will be updated accordingly.
- Your contribution and commits will be included in our weekly release note. 🍻
Commit Message Naming¶
To help everyone understand the commit history of GNES, we employ commitlint
in the CI pipeline to enforce the commit style. Specifically, our convention is:
type(scope?): subject
where type
is one of the following:
- build
- ci
- chore
- docs
- feat
- fix
- perf
- refactor
- revert
- style
- test
scope
is optional and represents the module your commit works on.
subject
explains the commit.
As an example, a commit that implements a new encoder should be phrased as:
feat(encoder): add new inceptionV3 as image encoder
Merging Process¶
A pull request has to meet the following conditions to be merged into master:
- Coding style check (PEP8, via Codacy)
- Commit style check (in CI pipeline via Drone.io)
- Unit tests (via Drone.io)
- Review and approval from a GNES team member.
After the merge is triggered, the build will be delivered to the following:
- Docker Hub:
gnes:latest
will be updated. - Tencent Container Service:
gnes:latest
will be updated. - ReadTheDoc:
latest
will be updated. - Benchmark: speed test will be updated.
Note that merging into master does not mean an official release. For the release process, please refer to the next section.
Release Process¶
A new release is scheduled on every Friday (triggered and approved by Han Xiao) summarizing all new commits since the last release. The release will increment the third (revision) part of the version number, i.e. from 0.0.24
to 0.0.25
.
After a release is triggered, the build will be delivered to the following:
- Docker Hub: a new image with the release version tag will be created,
gnes:latest
will be updated. - Tencent Container Service: a new image with the release version tag will be created,
gnes:latest
will be updated.
- PyPi Package: a new version of the Python package is uploaded to PyPi, allowing one to pip install -U gnes
- ReadTheDoc: a new version of the documentation will be built; latest will be updated and the old version will be archived
- Benchmark: speed test will be updated.
Meanwhile, a new pull request containing the updated CHANGELOG and the new version number will be made automatically, pending review and merge.
Major and minor version increments¶
- MAJOR version when GNES makes incompatible API changes;
- MINOR version when GNES adds functionality in a backwards-compatible manner.
The decision of incrementing major and minor version, i.e. from 0.0.0
to 0.1.0
or from 1.0.0
to 2.0.0
, is made by the GNES team.
Testing Locally¶
The best way to test GNES is using a Docker container, in which you don’t have to worry about the dependencies.
We provide a public Docker image gnes/ci-base
, which contains the required dependencies and some pretrained models used in our continuous integration pipeline.
You can find the image here or pull it via:
docker pull gnes/ci-base
To test GNES inside this image, you may run
docker run --network=host --rm --entrypoint "/bin/bash" -it gnes/ci-base
# now you are inside the 'gnes/ci-base' container
# first sync your local modification, then
pip install -e .[all]
python -m unittest tests/*.py
Interesting Points¶
Currently there are three major directions of contribution:
- Porting state-of-the-art models to GNES. This includes new preprocessing algorithms, new DNN networks for encoding, and new high-performance indexes. Believe me, it is super easy to wrap an algorithm and use it in GNES. Check out this example.
- Adding tutorial and learning experience. What is good and what can be improved? If you apply GNES in your domain, whether it’s about NLP or CV, whether it’s a blog post or a Reddit/Twitter thread, we are always eager to hear your thoughts.
- Completing the user experience of other programming languages. GNES offers a generic interface with gRPC and protobuf, therefore it is easy to add an interface for other languages, e.g. Java, C, Go.
Release Note (v0.0.46
)¶
Release time: 2019-10-17 18:14:45
🙇 We’d like to thank all contributors for this new release! In particular, Han Xiao, felix, raccoonliukai, hanhxiao, Jem, 🙇
🐞 Bug fixes¶
- [dbf1a5e7] - release: add some hints to the release script (Han Xiao)
- [a641d5c7] - shot-detector: rename shot detector (felix)
- [d48eb53a] - preprocessor: add max_shot_num for shotdetect (raccoonliukai)
- [9331ef58] - flow: use recommend flow api to reduce confusion (hanhxiao)
- [660f8f99] - flow: make base flow as class not object (hanhxiao)
- [3062c43c] - ci: remove cffi from gnes docker image (hanhxiao)
- [38147fe8] - helper: fix gpuutil exception (hanhxiao)
- [15d9b4fe] - ci: fix cffi version to 1.12.3 (hanhxiao)
- [cbac2de4] - ci: fix cffi to 1.12.3 (hanhxiao)
- [24e41bec] - preprocessor: add numpy transform (Jem)
- [6008f7d1] - service: revert socket log (Han Xiao)
- [474deddf] - control-sock: build control socket at the begining (felix)
- [707d9e96] - service-logging: show socket creating (felix)
- [01531f74] - stream-call: hungry mode to receive responses (felix)
💥 Breaking Changes (v0.0.45 -> v0.0.46
)¶
The new GNES Flow API introduced since v0.0.46
has become the main API of GNES. It provides a pythonic and intuitive way of building pipelines in GNES, enabling run/debug on a local machine. It also supports graph visualization, swarm/k8s config export, etc. More information about GNES Flow can be found at here.
As a consequence, the composer
module as well as gnes compose
CLI and GNES board web UI will be removed in the next releases.
GNES board will be redesigned using the GNES Flow API. We highly welcome your contribution on this thread!
Release Note (v0.0.45
)¶
Release time: 2019-10-15 14:01:07
🙇 We’d like to thank all contributors for this new release! In particular, hanhxiao, felix, Han Xiao, 🙇
🆕 New Features¶
- [166698ce] - flow: add index and query flow as common flow (hanhxiao)
- [8a60c261] - flow: flow can not export docker swarm config (hanhxiao)
- [80cb530e] - flow: add flow to python generator (hanhxiao)
- [f6536c87] - flow: add eq operator to the flow to enable comparison (hanhxiao)
- [9ca757b4] - flow: add set remove and set_last (hanhxiao)
- [3c3c54b5] - webp-encoder: support webp encoder (felix)
- [b94490f1] - flow: allow add service to be str (hanhxiao)
- [4055ad8e] - flow: add support to replicas plot (hanhxiao)
- [7265f76c] - grpc: add proxy argument to cli (hanhxiao)
- [3901078c] - incep_v4_encoder: add inception v4 encoder for video (felix)
🐞 Bug fixes¶
- [8911314b] - style: double quote to single quote (Han Xiao)
- [228a2b19] - flow: fix unit test assert in flow (hanhxiao)
- [7d2c681e] - flow: add warning to jpg downloader (hanhxiao)
- [fce94d94] - service: fix ServiceManager replicas router (hanhxiao)
- [2705c287] - video-decoder: none chunk spliter (felix)
🚧 Code Refactoring¶
Release Note (v0.0.44
)¶
Release time: 2019-10-11 15:27:37
🙇 We’d like to thank all contributors for this new release! In particular, hanhxiao, felix, Jem, 🙇
🆕 New Features¶
- [2fb0f4f9] - flow: add dump to jpg (hanhxiao)
- [552fcdfe] - indexer-cli: add as_response switcher to indexer cli (hanhxiao)
- [c8cedd04] - service: remove async dump for better stability (hanhxiao)
- [1739c7b6] - flow: add client to flow (hanhxiao)
- [43b9d014] - flow: add context manager to flow (hanhxiao)
- [ae0d4056] - flow: first version of gnes flow (hanhxiao)
🐞 Bug fixes¶
- [c23ea61f] - frontend: fix frontend blocking behavior (hanhxiao)
- [c880c9b0] - service: make service handler thread-safe (hanhxiao)
- [a3da0582] - flow: fix flow unit test (hanhxiao)
- [6d118404] - ffmpeg: threads=1 (felix)
- [bca5b5b7] - base: fix env expansion in gnes_config (hanhxiao)
- [72f4a044] - indexer: fix empty chunk and dump_interval (hanhxiao)
- [9b79cdf5] - memory-leak: try to fix memory leak danger (felix)
- [16097f3f] - video-decoder: fix name (felix)
- [199a71a6] - frontend: remove duplicate receive (hanhxiao)
- [73dae6bd] - service: minor fix on the dump_interval (hanhxiao)
- [6f401905] - client: fix bugs for client (Jem)
- [c5af9308] - parser: use str instead of textio stream to prevent serializer err (hanhxiao)
- [6a368335] - cli: show more detailed version info in cli (hanhxiao)
Release Note (v0.0.43
)¶
Release time: 2019-09-30 17:37:58
🙇 We’d like to thank all contributors for this new release! In particular, felix, hanhxiao, raccoonliukai, 🙇
🆕 New Features¶
- [
bbf1ed8e
] - frontend: add max pending request to frontend (hanhxiao) - [
946df39b
] - indexer: delay the num_dim spec on first add (hanhxiao) - [
fdc38d57
] - cd: trigger benchmark in push and tag (hanhxiao) - [
f8b9e00e
] - cd: smaller num document for benchmarking (hanhxiao) - [
3974a1ba
] - cd: adding benchmark to cd pipeline (hanhxiao) - [
086a73cb
] - proto: add vcs version to pb (hanhxiao) - [
285d9dde
] - docker: add vcs version as env var (hanhxiao) - [
a3e22db3
] - service: add healthcheck for arbitary service (hanhxiao) - [
dedb8ba2
] - proto: add ready status for healthcheck (hanhxiao)
🐞 Bug fixes¶
- [
47add702
] - ffmpeg-threads: threads=0 (felix) - [
9365ddb9
] - video-decoder: minor revision video-decoder chunk spliter (felix) - [
be09bb09
] - cd: fix duplicate step name in cd (hanhxiao) - [
ccf4efc8
] - cd: fix trigger in cd pipeline (hanhxiao) - [
ca73b702
] - shotdetect: support get arguments from yaml (raccoonliukai) - [
17aa78da
] - shotdetect: fix bug with thre_algo after histcmp (raccoonliukai) - [
a6d1484e
] - shotdetect: fix bug with thre_algo (raccoonliukai) - [
b69591de
] - shotdetect: fix shot boundary (raccoonliukai) - [
f6c263a7
] - ffmpeg: use -threads = 1 for ffmpeg (felix) - [
beafdb3a
] - docker: fix vcs ref url and add build date as env (hanhxiao) - [
0367334a
] - service: fix error message (hanhxiao) - [
434bc8db
] - service: styling (hanhxiao)
🚧 Code Refactoring¶
Release Note (v0.0.42
)¶
Release time: 2019-09-26 10:40:48
🙇 We’d like to thank all contributors for this new release! In particular, hanhxiao, felix, raccoonliukai, Jem, 🙇
🆕 New Features¶
- [
8bef90dd
] - cd: docker images now push to github during merge and tag (hanhxiao) - [
0ea566ff
] - service: send ndarray separately (hanhxiao) - [
09199d82
] - preprocessor: add frame selector (Jem) - [
a2f10589
] - parser: add raw_bytes_in_sep to cli (hanhxiao) - [
10788951
] - proto: speedup send/recv by separating raw_bytes from pb (hanhxiao) - [
2326fe97
] - preprocessor: add preprocessor for mp4 and gif decode (raccoonliukai) - [
803afb34
] - snoflake-uuid: add snowflake uuid generator (felix) - [
fe7025f5
] - frontend: dump route in the frontend (hanhxiao) - [
8fbb0945
] - router: add a block router for benchmarking (hanhxiao)
🐞 Bug fixes¶
- [
43145019
] - unittest: fix unit test for send recv (hanhxiao) - [
b6f2cdaf
] - service: fix send/recv for better compatability (hanhxiao) - [
8c6f2558
] - fix route table sum time (raccoonliukai) - [
8a0beec8
] - service: send single long message rather than multiple (hanhxiao) - [
3b1f963c
] - preprocessor: add solution for raw_video (raccoonliukai) - [
1b4a04fe
] - preprocessor: add videodecode in init (raccoonliukai) - [
7108460a
] - memory_leak: try to fix memory leak (felix) - [
82951d95
] - frontend: use poll for better efficiency (hanhxiao) - [
2f539b7a
] - snowflake: fix error shift (felix) - [
84e67792
] - frontend: fix progressbar and route table (hanhxiao) - [
9a65e4fe
] - frontend: flush dump (hanhxiao) - [
2e326af5
] - catch exception in hook function (hanhxiao) - [
402867cc
] - fix route table total time (hanhxiao) - [
30976179
] - docker: decoupling prerequest and gnes install (hanhxiao) - [
c5347a5b
] - service: make route_table as option for all services (hanhxiao) - [
45a078d9
] - docker: reduce the size of built image (hanhxiao)
Release Note (v0.0.41
)¶
Release time: 2019-09-20 20:51:39
🙇 We’d like to thank all contributors for this new release! In particular, felix, raccoonliukai, hanhxiao, Jem, 🙇
🆕 New Features¶
- [
e255bd48
] - shot-detector: limit number of frames in shots (felix) - [
6e87afa4
] - traffic-controller: network traffic controller in frontend (felix) - [
6833a27c
] - preprocessor: add sframes for shots frame number (raccoonliukai) - [
ea89d8cb
] - stream-call: only 1000 pending tasks (felix) - [
ca53c65f
] - video-encoder: encode video from list of images (felix)
🐞 Bug fixes¶
- [
6aa0c3ca
] - ffmpeg-video: fig bug for scaling videos to stdout (felix) - [
780aad0d
] - subprocess: close stdout and stderr to avoid memory leak (felix) - [
205962fb
] - socket: raise socket rec/send message exception (felix) - [
64acb4cd
] - preprocessor: fix bug when num_frames < 4 in shotdetect (raccoonliukai) - [
05db02f7
] - stream-client: request queue size is limited by 1000 (felix) - [
4f389449
] - socket-buffer: set hwm and buffer limit for zmq socket (felix) - [
092379e1
] - preprocessor: fix type of index in shotdetect (raccoonliukai) - [
9023afcd
] - test: add test to cover three runtimes (hanhxiao) - [
65fff1a9
] - cli: fix progressbar (hanhxiao) - [
8828535c
] - proto: fix version check in recv message (hanhxiao)
🚧 Code Refactoring¶
Release Note (v0.0.40
)¶
Release time: 2019-09-12 19:54:34
🙇 We’d like to thank all contributors for this new release! In particular, Han Xiao, Jem, hanhxiao, felix, 🙇
🆕 New Features¶
🐞 Bug fixes¶
- [
8704331e
] - proto: fix version check in recv message (hanhxiao) - [
563a48c7
] - cli: fix cli client required (hanhxiao) - [
3db34449
] - proto: fix merge route logic (hanhxiao) - [
c31f21db
] - parser: fix default dump interval to 5 (hanhxiao) - [
00c25f39
] - parser: remove limite on message size (hanhxiao) - [
f89b4363
] - parser: set dump_interval to -1 (hanhxiao)
🚧 Code Refactoring¶
🍹 Other Improvements¶
- [
0b22029a
] - indexer: fix styles in indexer (Han Xiao) - [
edba197a
] - clean and format codes (felix) - [
2a781aee
] - license: remove aiohttp from barebone GNES license (hanhxiao) - [
cc72cf2b
] - docker: revert alpine docker to reduce size (hanhxiao) - [
9f58fb35
] - changelog: update change log to v0.0.39 (hanhxiao)
Release Note (v0.0.39
)¶
Release time: 2019-09-11 17:22:11
🙇 We’d like to thank all contributors for this new release! In particular, hanhxiao, Jem, raccoonliukai, Larry Yan, 🙇
🆕 New Features¶
🐞 Bug fixes¶
- [
31c796d3
] - client: fix weights in helper indexer (hanhxiao) - [
21c3a8a9
] - client: rename stub to _stub (hanhxiao) - [
235d901a
] - parser: add max_concurrency to client (hanhxiao) - [
c988b327
] - client: fix sync client (hanhxiao) - [
54a252e5
] - indexer: add helper indexer to registeration (hanhxiao) - [
a1aed8f4
] - client: use StreamingClient as the parent class of CLIClient (raccoonliukai) - [
a5999828
] - preprocessor: add vframe(frame_num) for video and gif frames capture (raccoonliukai) - [
8357754a
] - encoder: fix PCAEncoder mean from fp64 to fp32 (raccoonliukai) - [
654a5ba4
] - encoder: fix vlad to speed up centroids calculation (Larry Yan) - [
814b2ee6
] - encoder: fix vald encocer (Larry Yan) - [
ffc822b3
] - encoder: fix vlad unittest (Larry Yan) - [
ddf13ff1
] - encoder: fix bug in vlad encoder (Larry Yan) - [
1ba4e11c
] - encoder: fix vald encoder and add unittest (Larry Yan) - [
f8e18d06
] - encoder: fix vald in numeric encoder (Larry Yan) - [
fbfa1e47
] - transformer: add model eval (Jem)
🚧 Code Refactoring¶
Release Note (v0.0.38
)¶
Release time: 2019-09-06 17:25:48
🙇 We’d like to thank all contributors for this new release! In particular, hanhxiao, 🙇
Release Note (v0.0.37
)¶
Release time: 2019-09-06 16:46:20
🙇 We’d like to thank all contributors for this new release! In particular, hanhxiao, felix, Jem, raccoonliu, raccoonliukai, Han Xiao, Larry Yan, 🙇
🆕 New Features¶
- [
105a0abf
] - encoder: add debug hook (hanhxiao) - [
0f04877f
] - service: add pre and post hooks to baseservice (hanhxiao) - [
92860848
] - reducer: add concat reducer (Jem) - [
2e6e80db
] - encoder: add PCAEncoder support in gnes buster image (raccoonliukai) - [
16fa80bd
] - tests: add unittest for PCAEncoder (raccoonliukai) - [
5a745b1e
] - tests: add unittest for EncoderService and IndexerService (raccoonliukai) - [
a0fec684
] - service: logging elapsed time and body type change (hanhxiao) - [
57cc95ff
] - encoder: add quantizer (Jem) - [
00e6280d
] - score_fn: use numpy for score fn (hanhxiao) - [
201c27e7
] - cli: add –sorted_response as cli argument (hanhxiao) - [
81b21093
] - index: move sort logic to base (hanhxiao) - [
a2d55dda
] - index: move sort logic out to base (hanhxiao) - [
674a9da2
] - encoder: add lab video model (Jem) - [
50a944b6
] - encoder: add yt8m feature extractor (Jem) - [
f908f381
] - score_fn: make score_fn as a TrainableBase (hanhxiao) - [
14c7e522
] - score_fn: make score_fn dumpable (hanhxiao) - [
0b78798d
] - score_fn: add score_fn as a new module (hanhxiao) - [
da56544f
] - encoder: add PCAEncoder for incremental pca training (raccoonliukai) - [
97bb6de2
] - lab encoder: add vggish for audio (Jem) - [
8cdcb7e8
] - chunk scorer: add offset divergence (Jem)
🐞 Bug fixes¶
- [
d404b8a7
] - tests: use lowercase for true (raccoonliu) - [
bb9bbe9d
] - tests: modify EncoderService unittest (raccoonliukai) - [
cd53a24b
] - indexer: fix numpy indexer (hanhxiao) - [
d70e877e
] - shot-detector: fix case of only one shot in video (felix) - [
e631d396
] - service: indexer service return empty when no chunk (hanhxiao) - [
67b211da
] - encoder: remove image resize from TFInceptionEncoder (raccoonliukai) - [
40849abc
] - indexer: fix is_sorted in response flush away the request (hanhxiao) - [
ab819387
] - ffmpeg: use tempfile as input instead of pipe (felix) - [
a8d2acfd
] - service: is input list is false when query (Jem) - [
ba21c4e7
] - service: fix bug for doc type in encoder (Larry Yan) - [
a4658250
] - scorer: fix np float conversion (hanhxiao) - [
2d6c70fc
] - indexer: fix vec np.concat (hanhxiao) - [
2ba135db
] - indexer: fix empty chunks indexing (hanhxiao) - [
40dd1d5a
] - encoder: fix embed_chunks_in_docs function (hanhxiao) - [
d94329b3
] - preprocess: fix offset in sentence splitter (hanhxiao)
🚧 Code Refactoring¶
- [
a8e87d9f
] - service: minimize event loop, move handling to handler (hanhxiao) - [
06aab813
] - grpc-client: implement async client via multi-threaded (felix) - [
35fa3ba4
] - pb: remove unused field (hanhxiao) - [
6bbfc993
] - score_fn: rename score functions (hanhxiao) - [
e9feaa61
] - score_fn: use post_init instead of property (hanhxiao) - [
f406f8f0
] - score_fn: move normalize_fn and score_fn to the init (hanhxiao)
🍹 Other Improvements¶
Release Note (v0.0.36
)¶
Release time: 2019-08-30 17:32:23
🙇 We’d like to thank all contributors for this new release! In particular, hanhxiao, Jem, felix, 🙇
🆕 New Features¶
🐞 Bug fixes¶
- [
f1402f50
] - cli: fix cli chanel close (hanhxiao) - [
b140cca9
] - service: fix exception when no chunks (hanhxiao) - [
cee99a63
] - logger: change the color semantic for loglevel (hanhxiao) - [
4efea726
] - service: raise except when empty chunk (hanhxiao) - [
31bffeb7
] - preprocessor: add min_len to split preprocessor (hanhxiao) - [
7b16354a
] - style: fix style issues (hanhxiao) - [
c6183960
] - service: fix training logic in encoderservice (hanhxiao) - [
5828d20a
] - preprocessor: fix SentSplitPreprocessor (hanhxiao) - [
522c5a4e
] - preprocessor: rename SentSplitPreprocessor (hanhxiao) - [
030d6c66
] - setup: fix path in setup script (hanhxiao) - [
3818c9a3
] - test: fix router tests (hanhxiao) - [
9d03441e
] - proto: regenerate pb2 (hanhxiao) - [
f49f9a5b
] - indexer: fix parsing in DictIndexer (hanhxiao) - [
0215c6bf
] - ffmpeg: fix issue for start and durtion argument position (felix) - [
a735a719
] - service: log error in base service (hanhxiao) - [
3263e96c
] - service: move py_import from service manager to base service (hanhxiao) - [
990c879d
] - client: fix client progress bar, http (hanhxiao) - [
d02cd757
] - router: respect num_part when set (hanhxiao) - [
a76a4604
] - ffmpeg-video: fig bug for scaling videos to stdout (felix)
🚧 Code Refactoring¶
- [
42e7c13b
] - indexer: separate score logic and index logic (hanhxiao) - [
0c6f4851
] - preprocessor: use io utils in audio and gif (Jem) - [
bae75b8c
] - router: separate router and scoring logics (hanhxiao) - [
c3ebb93a
] - proto: refactor offset nd (Jem) - [
e3bbbd9b
] - shot_detector: update ffmpeg api (felix) - [
10cef54e
] - ffmpeg: refactor ffmpeg again (felix)
🏁 Unit Test and CICD¶
Release Note (v0.0.35
)¶
Release time: 2019-08-26 18:15:02
🙇 We’d like to thank all contributors for this new release! In particular, hanhxiao, Jem, 🙇
🆕 New Features¶
🐞 Bug fixes¶
🚧 Code Refactoring¶
Release Note (v0.0.34
)¶
Release time: 2019-08-23 19:00:27
🙇 We’d like to thank all contributors for this new release! In particular, hanhxiao, 🙇
Release Note (v0.0.34
)¶
Release time: 2019-08-23 18:44:34
🙇 We’d like to thank all contributors for this new release! In particular, hanhxiao, 🙇
Release Note (v0.0.33
)¶
Release time: 2019-08-23 18:34:28
🙇 We’d like to thank all contributors for this new release! In particular, Jem, hanhxiao, felix, raccoonliukai, 🙇
🆕 New Features¶
🐞 Bug fixes¶
- [
6cfbda9d
] - preprocessor: move dependency into function (Jem) - [
0e88b77a
] - frontend: fix request_id zero is none (hanhxiao) - [
ca28ecb9
] - video preprocessor: use rgb as standard color (raccoonliukai) - [
5b5feb0b
] - video preprocessor: use dict update (raccoonliukai) - [
47721b1c
] - video preprocessor: remove custom canny threshold (raccoonliukai) - [
16aaa777
] - video preprocessor: modify inaccurate names (raccoonliukai) - [
dfb54b62
] - video preprocessor: Remove incorrect comments (raccoonliukai)
🚧 Code Refactoring¶
- [
3d63fac6
] - proto: request_id is now an integer (hanhxiao) - [
4497d765
] - shotdetector: use updated ffmpeg api to capture frames from videos (felix) - [
dbc06a85
] - ffmpeg: refactor ffmpeg to read frames, vides and gif (felix) - [
a7b12cb6
] - preprocessor: add gif chunk prep (Jem) - [
559a9971
] - compose: unify flask and http handler (hanhxiao)
Release Note (v0.0.32
)¶
Release time: 2019-08-21 17:23:13
🙇 We’d like to thank all contributors for this new release! In particular, hanhxiao, Han Xiao, Jem, 🙇
🆕 New Features¶
🐞 Bug fixes¶
🚧 Code Refactoring¶
Release Note (v0.0.31
)¶
Release time: 2019-08-20 14:01:04
🙇 We’d like to thank all contributors for this new release! In particular, hanhxiao, 🙇
Release Note (v0.0.30
)¶
Release time: 2019-08-19 14:13:03
🙇 We’d like to thank all contributors for this new release! In particular, hanhxiao, 🙇
Release Note (v0.0.29
)¶
Release time: 2019-08-16 15:40:31
🙇 We’d like to thank all contributors for this new release! In particular, hanhxiao, Jem, 🙇
🐞 Bug fixes¶
🚧 Code Refactoring¶
🏁 Unit Test and CICD¶
Release Note (v0.0.28
)¶
Release time: 2019-08-14 20:54:26
🙇 We’d like to thank all contributors for this new release! In particular, hanhxiao, Jem, raccoonliukai, Larry Yan, 🙇
🆕 New Features¶
- [
0133905c
] - client: add a client for benchmarking and testing (hanhxiao) - [
732f2e64
] - encoder: add pytorch transformers support in text encoder (raccoonliukai) - [
6aab48c8
] - docker: add buster image with minimum dependencies (hanhxiao) - [
da1bbc0d
] - docker: add alpine image with minimum dependencies (hanhxiao)
🐞 Bug fixes¶
- [
315bd16a
] - doc sum router: use meta info instead of doc id to do doc sum (Jem) - [
c9e92722
] - encoder: use offline model in ci-base for pytorch transformer (raccoonliukai) - [
d7b42d39
] - setup: remove unused dependencies (hanhxiao) - [
5b8acf7c
] - test: fix routes assert in tests (hanhxiao) - [
5fedf6df
] - encoder: fix unused variable (raccoonliukai) - [
df616463
] - cli: remove unnecessary argument (hanhxiao) - [
fd76aa79
] - request_generator: send index request in index mode (Jem) - [
64163cb1
] - batching: enable to process three dimension output in batching (Jem) - [
415456d6
] - preprocessor: fix bug (Larry Yan) - [
c150ad59
] - preprocessor: modify ffmpeg video pre add video cutting method (Larry Yan) - [
b0f22d04
] - audio preprocessor: filter audio with zero length (Jem) - [
d1cfa539
] - preprocessor: modify ffmpeg video preprocessor (Larry Yan)
🏁 Unit Test and CICD¶
Release Note (v0.0.27
)¶
Release time: 2019-08-09 19:51:57
🙇 We’d like to thank all contributors for this new release! In particular, hanhxiao, Jem, Larry Yan, raccoonliu, Han Xiao, raccoonliukai, 🙇
🆕 New Features¶
- [
55126f2b
] - grpc: add a general purpose grpc service (hanhxiao) - [
23c6e68a
] - reduce router: add chunk and doc reduce routers for audio (Jem) - [
6d3d2b4c
] - cli: use ServiceManager as default service runner (hanhxiao) - [
ccfd474a
] - service: add ServiceManager and enable parallel services in one container (hanhxiao) - [
63f9173f
] - service: enabling the choose of thread or process as the backend (hanhxiao) - [
2647b848
] - audio: add preprocess and mfcc encoder for audio (Jem) - [
208e1937
] - audio: add preprocess and mfcc encoder for audio, update protobuf (Jem) - [
77a2ea42
] - parser: improve yaml_path parsing (hanhxiao) - [
762535ca
] - vlad: add vlad and enable multiple chunks and frames (Jem) - [
64e948d4
] - encoder: add onnxruntime for image encoder (raccoonliukai) - [
f03e6fc2
] - encoder: add onnxruntime suport for image encoder (raccoonliukai)
🐞 Bug fixes¶
- [
5ae46d61
] - composer: rename grpcfrontend to frontend (hanhxiao) - [
4cb83383
] - audio: restrict max length for mfcc encoding (Jem) - [
e516646f
] - grpc: add max_message_size to the argparser (hanhxiao) - [
0493e6fc
] - encoder: fix netvlad (Larry Yan) - [
e773aa33
] - service manager: fix nonetype for service manager (Jem) - [
d5d15d7f
] - compose: fix a bug in doc_reduce_test (hanhxiao) - [
6856cb0a
] - compose: copy args on every request (hanhxiao) - [
f80e8c03
] - cli: set default num_part is None (hanhxiao) - [
7031fe20
] - preprocessor: add random sampling to ffmpeg (Larry Yan) - [
fd37e6d9
] - encoder: fix bug caused by batching in inception_mixture (Larry Yan) - [
2191b27b
] - composer: fix yaml generation (hanhxiao) - [
e5fefcee
] - encoder: fix batching in encoder (hanhxiao) - [
e35e3b3c
] - composer: fix composer router generation logic (hanhxiao) - [
7300e055
] - preprocessor: quanlity improvement (Larry Yan) - [
47efaba4
] - unittest: fix unittest of video preprocessor 2 (Larry Yan) - [
a6efb4af
] - unittest: fix unittest of video preprocessor (Larry Yan) - [
dd1216bb
] - unittest: fix unittest for video processor (Larry Yan) - [
8e6dc4c6
] - encoder: add func for preprocessor (Larry Yan) - [
2b21dc5a
] - encoder: fix unused import and variable (raccoonliu) - [
fd576915
] - test: fix import (Han Xiao) - [
a0fdad36
] - test: fix broken code (Han Xiao) - [
8ca07a74
] - test: fix img_process_for_test (Han Xiao) - [
7c16fb8b
] - preprocessor: fix bug in ffmpeg.py and add more func to helper (Larry Yan) - [
e6a37119
] - preprocessor: fix bug in params in ffmepg (Larry Yan) - [
f8d2abe5
] - preprocessor: fix bug in ffmpeg (Larry Yan) - [
67610f86
] - preprocessor: add more method for cutting video (Larry Yan)
🚧 Code Refactoring¶
🏁 Unit Test and CICD¶
Release Note (v0.0.26
)¶
Release time: 2019-08-02 18:18:45
🙇 We’d like to thank all contributors for this new release! In particular, hanhxiao, Jem, Larry Yan, 🙇
🆕 New Features¶
🐞 Bug fixes¶
- [
fc5026da
] - board: improve gnes board 500 message (hanhxiao) - [
823bdeda
] - test: fix grpc gentle shutdown (hanhxiao) - [
f6a801f7
] - test: fix preprocessor building for image test (hanhxiao) - [
50fdc041
] - base: fix ref to CompositionalTrainableBase (hanhxiao) - [
54a931c7
] - test: fix test images by removing mac stuff (hanhxiao) - [
14cdfabe
] - sliding window: fix the boundary (Jem) - [
46b5c94e
] - encoder: fix name for video encoder (Larry Yan) - [
15eb50b4
] - encoder: fix params in basevideo encoder (Larry Yan) - [
5b0fe7c6
] - preprocessor: fix FFmpegVideoSegmentor (Larry Yan) - [
d6a46fa6
] - encoder: fix import path for mixture encoder (Larry Yan) - [
17779676
] - encoder: fix mixture encoder (Larry Yan) - [
95f03c56
] - encoder: fix bug in video mixture encoder (Larry Yan) - [
3fdf1c06
] - encoder: fix mixture (Larry Yan) - [
67991533
] - encoder: add netvlad and netfv register class (Larry Yan) - [
92500f0f
] - encoder: add netvlad and netfv (Larry Yan)
🚧 Code Refactoring¶
- [
c430ef64
] - base: better batch_size control (hanhxiao) - [
58217d8c
] - base: moving is_trained to class attribute (hanhxiao) - [
7126d496
] - preprocessor: separate resize logic from the unary preprocessor (hanhxiao) - [
52f87c7f
] - base: make pipelineencoder more general and allow pipelinepreprocessor (hanhxiao)
Release Note (v0.0.25
)¶
Release time: 2019-07-26 19:45:21
🙇 We’d like to thank all contributors for this new release! In particular, hanhxiao, felix, Larry Yan, Jem, Han Xiao, Felix, 🙇
🆕 New Features¶
- [
66aec9c9
] - grpc: add StreamCall and decouple send and receive (hanhxiao) - [
5697441b
] - indexer: consider offset relevance at query time (Jem) - [
04c9c745
] - image preprocessor: calculate offsetnd for each chunk (Jem) - [
b34a765a
] - compose: add interactive mode of GNES board using Flask (hanhxiao) - [
5876c15e
] - base: support loading external modules from py and yaml (hanhxiao)
🐞 Bug fixes¶
- [
a20672d3
] - preprocessor: add logging in helper module (felix) - [
f9500c1f
] - protobuffer: add doc_type as func argument in RequestGenerator (felix) - [
1c3bb01a
] - service: fix bug in doc_type name in indexer service (Larry Yan) - [
d834f578
] - service: add doc type to req generator (Larry Yan) - [
80e234e1
] - service: fix bug in req Generator add doc_type (Larry Yan) - [
5743e258
] - indexer: fix bug in indexer service (Larry Yan) - [
11dde2bf
] - encoder: fix bug in tf inception (Larry Yan) - [
ded92c57
] - indexer: fix bug for indexer service dealing with empty doc (Larry Yan) - [
1dff06f1
] - encoder: fix bug for encoder service dealing with empty doc (Larry Yan) - [
7e43d5a2
] - preprocessor: fix ffmpeg to deal with broken image (Larry Yan) - [
83ebaced
] - preprocessor: move import imagehash to inside (hanhxiao) - [
7c669a70
] - test: rename the yaml test file (hanhxiao) - [
2cc26342
] - compose: change textarea font to monospace (hanhxiao) - [
e644e391
] - encoder: fix gpu limitation in inception (Larry Yan) - [
89d8b70c
] - grpc: fix bug in RequestGenerator query (Larry Yan) - [
c52c2cc6
] - base: fix gnes_config mixed in kwargs (hanhxiao) - [
68c15fac
] - base: fix redundant warning in pipeline encoder (hanhxiao) - [
aadeeefb
] - composer: fix composer state machine (hanhxiao) - [
c0bffe6c
] - indexer: normalize weight (Jem) - [
2c696483
] - indexer: fix weight in indexer call (Larry Yan) - [
139a02d9
] - compose: fix compose bug of pub-sub rule, duplicate yaml_path (hanhxiao) - [
649ed131
] - encoder: add normalize option in cvae encoder (Larry Yan) - [
eb487799
] - encoder: fix tf scope error in cvae encoder (Larry Yan) - [
ab6c88cc
] - encoder: fix error in cvae encoder (Larry Yan) - [
a4b883ac
] - indexer: add drop raw bytes option to leveldb (Larry Yan) - [
4b52bcba
] - grpc: fix grpc plugin path (Larry Yan) - [
d3fbbcac
] - weighting: add simple normalization to chunk search (Jem) - [
08a9a4e3
] - grpc: fix grpc service (Larry Yan) - [
6e6bbf83
] - grpc: add auto-gen grpc code (Larry Yan) - [
b89d8fa2
] - grpc: add stream index and train in proto (Larry Yan) - [
15cd7e58
] - base: fix dump and load on compositional encoder (hanhxiao) - [
bab48919
] - encoder: fix tf inception (Larry Yan) - [
973672ef
] - encoder: fix bug for encoder bin load (Larry Yan) - [
1bef3971
] - setup: fix setup script (hanhxiao) - [
67fb5766
] - compose: fix argparser (hanhxiao) - [
63c4515f
] - compose: accept parser argument only (hanhxiao) - [
887d89cc
] - release: ask BOT_URL before releasing (hanhxiao)
🚧 Code Refactoring¶
📗 Documentation¶
- [
c853e3da
] - tutorial: fix svg size (hanhxiao) - [
04cccdcd
] - tutorial: fix svg path (hanhxiao) - [
8927cd4f
] - tutorial: add yaml explain (hanhxiao) - [
5b52ce4c
] - fix doc path (hanhxiao) - [
45751e1f
] - readme: add quick start for readme (hanhxiao) - [
73891ecc
] - readme: add install guide to readme and contribution guide (hanhxiao)
🏁 Unit Test and CICD¶
- [
6ff3079b
] - unittest: skip all os environ test (hanhxiao) - [
816fa043
] - unittest: skip blocked test (hanhxiao) - [
79a9c106
] - unittest: run test in verbose mode (hanhxiao) - [
83276f90
] - torchvision: install torchvision dependency to enable tests (hanhxiao) - [
499682ce
] - base: add unit test for load a dumped pipeline from yaml (hanhxiao) - [
26a7ad18
] - composer: add unit test for flask (hanhxiao) - [
87ec1fd2
] - base: move module delete to teardown (hanhxiao) - [
479b183d
] - compose: skip unit test (hanhxiao)
Release Note (v0.0.24
)¶
Release time: 2019-07-19 18:18:46
🙇 We’d like to thank all contributors for this new release! In particular, hanhxiao, Jem, Larry Yan, felix, 🙇
🆕 New Features¶
🐞 Bug fixes¶
- [
1b526832
] - base: fix dump yaml kwargs (hanhxiao) - [
086f3cea
] - base: fix ump instance (hanhxiao) - [
12dfde42
] - base: move name setting to trainable base (hanhxiao) - [
16f1a497
] - base: move set config to metaclass (hanhxiao) - [
b97acd6c
] - base: fix duplicate warning (hanhxiao) - [
991e4425
] - base: fix duplicate load and init from yaml (hanhxiao) - [
69a486e5
] - compose: fix import (hanhxiao) - [
4977aa3c
] - vector indexer: reorder relevance and chunk weight (Jem) - [
2448411d
] - encoder: modify CVAE (Larry Yan) - [
b4bf0bf8
] - indexer: add path check for dir and file (hanhxiao) - [
92f36c33
] - fasterrcnn: handle imgs with 0 chunk (Jem) - [
a1329913
] - fasterrcnn: fix bug for gpu (Jem) - [
38eca0ce
] - grpc: change grpc client message size limit (felix) - [
3836020a
] - preprocessor: fix preprocessor service handler function name error (felix) - [
599a3c3d
] - compose: fix composer logic (hanhxiao) - [
7f3b2fb5
] - release: fix git tag version (hanhxiao)
🚧 Code Refactoring¶
🏁 Unit Test and CICD¶
Release Note (v0.0.23
)¶
Release time: 2019-07-17 18:28:08
🙇 We’d like to thank all contributors for this new release! In particular, hanhxiao, Jem, felix, Larry Yan, Han Xiao, 🙇
🆕 New Features¶
- [
cb4d9cf2
] - release: add auto release and keep change log (hanhxiao) - [
c667d874
] - image_preprocessor: add fasterRCNN (Jem) - [
a6c2975b
] - composer: improve the gnes board with cards (hanhxiao) - [
6ec4233d
] - composer: add swarm and bash generator (hanhxiao) - [
08aa30f4
] - composer: add shell script generator (hanhxiao) - [
033a4b9c
] - composer: add composer and mermaid renderer (hanhxiao)
🐞 Bug fixes¶
- [
2b7c3f18
] - compose: resolve unclosed file warning (hanhxiao) - [
8030feb2
] - compose: fix router logic in compose (hanhxiao) - [
736f6053
] - gnesboard: fix cdn (hanhxiao) - [
fb07ff02
] - doc_reducer_router: fix reduce error (felix) - [
a7236308
] - image encoder: define use_cuda variable via args (felix) - [
cba5e190
] - image_encoder: enable batching encoding (felix) - [
3423ec83
] - composer: add compose api to api.py (hanhxiao) - [
70ba3fca
] - composer: in bash mode always run job in background (hanhxiao) - [
054981ce
] - composer: fix gnes board naming (hanhxiao) - [
743ec3b0
] - composer: fix unit test and add tear down (hanhxiao) - [
64aef413
] - composer: fix styling according to codacy (hanhxiao) - [
dca4b03b
] - service: fix bug grpc (Larry Yan) - [
09e68da2
] - service: fix grpc server size limit (Larry Yan) - [
3da8da19
] - encoder: rm un-used import in inception (Larry Yan) - [
8780a4da
] - bugs for integrated test (Jem) - [
38fff782
] - preprocessor: move cv2 dep to pic_weight (Han Xiao) - [
37155bba
] - preprocessor-video: move sklearn dep to apply (Han Xiao) - [
1f6a06a2
] - encoder: rm tf inception unittest (Larry Yan) - [
eaffbbff
] - encoder: register tf inception in init (Larry Yan) - [
d0099b79
] - encoder: add necessary code from tf (Larry Yan) - [
b480774a
] - encoder: add inception tf (Larry Yan)
🏁 Unit Test and CICD¶
Tutorials¶
Warning
🚧 Tutorial is still under construction. Stay tuned! Meanwhile, we sincerely welcome you to contribute your own learning experience / case study with GNES!