2023-02-09 23:46:44 +01:00
|
|
|
from typing import Literal
|
|
|
|
|
|
|
|
|
|
import pydantic
|
|
|
|
|
from oasst_shared.schemas import inference
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
class GenerateStreamParameters(pydantic.BaseModel):
    """Sampling/generation parameters sent with a generate-stream request.

    Mirrors the parameter schema of the text-generation backend; defaults
    here apply when the corresponding field is not supplied by the caller.
    """

    max_new_tokens: int = 1024
    do_sample: bool = True
    top_k: int | None = None
    top_p: float | None = None
    typical_p: float | None = None
    temperature: float | None = None
    repetition_penalty: float | None = None
    seed: int | None = None
    # default_factory instead of a mutable `[]` literal so each instance
    # gets its own fresh list.
    stop: list[str] = pydantic.Field(default_factory=list)
    details: bool = True
    # Plain `list` as the factory: the previous
    # `default_factory=list[inference.PluginEntry]` called a subscripted
    # generic alias, which works but is misleading — the element type is
    # already expressed by the annotation.
    plugins: list[inference.PluginEntry] = pydantic.Field(default_factory=list)

    @staticmethod
    def from_work_parameters(params: inference.WorkParameters) -> "GenerateStreamParameters":
        """Build stream parameters from a work request.

        Most knobs come from ``params.sampling_parameters``; note that
        ``do_sample``, ``seed`` and ``plugins`` live directly on ``params``.
        """
        sampling = params.sampling_parameters
        return GenerateStreamParameters(
            max_new_tokens=sampling.max_new_tokens,
            do_sample=params.do_sample,
            top_k=sampling.top_k,
            top_p=sampling.top_p,
            typical_p=sampling.typical_p,
            temperature=sampling.temperature,
            repetition_penalty=sampling.repetition_penalty,
            seed=params.seed,
            plugins=params.plugins,
        )
|
|
|
|
|
|
|
|
|
|
|
2023-03-19 15:39:31 +01:00
|
|
|
class GenerateStreamRequest(pydantic.BaseModel):
    """Request body for a streaming text-generation call."""

    # Prompt text fed to the generation backend.
    inputs: str

    # Sampling/generation settings for this request.
    parameters: GenerateStreamParameters
|
|
|
|
|
|
|
|
|
|
|
2023-02-09 23:46:44 +01:00
|
|
|
class Token(pydantic.BaseModel):
    """A single token emitted by the generation stream."""

    # Decoded text of the token.
    text: str
    # Log-probability assigned by the model, if reported.
    logprob: float | None
    # Numeric token id in the model vocabulary.
    id: int

    def __len__(self) -> int:
        """Length of the token's decoded text."""
        return len(self.text)

    def to_token_response(self, request_id: str) -> inference.TokenResponse:
        """Wrap this token in the shared-schema response for ``request_id``."""
        response = inference.TokenResponse(
            request_id=request_id,
            text=self.text,
            log_prob=self.logprob,
            token_id=self.id,
        )
        return response
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
class StreamDetails(pydantic.BaseModel):
    """Summary statistics reported once a generation stream finishes."""

    # Number of tokens produced for this request.
    generated_tokens: int

    # Seed used for sampling, if any was set.
    seed: int | None

    # Why generation stopped: token budget, end-of-sequence token,
    # or a configured stop sequence.
    finish_reason: Literal["length", "eos_token", "stop_sequence"]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
class GenerateStreamResponse(pydantic.BaseModel):
    """One event from the generation stream.

    Exactly one of the payload fields is expected per event: a ``token``
    while streaming, the final ``generated_text`` (plus ``details``) at the
    end, or an ``error`` message on failure.
    """

    token: Token | None
    generated_text: str | None
    details: StreamDetails | None
    error: str | None

    @property
    def is_end(self) -> bool:
        """Whether this event carries the final generated text."""
        finished = self.generated_text is not None
        return finished

    @property
    def is_error(self) -> bool:
        """Whether this event reports an error."""
        failed = self.error is not None
        return failed
|