Pipeline Module¶
pipelines.pipelines.pipelines.standard_pipelines ¶
BaseStandardPipeline ¶
Base class for pre-made standard pipelines. This class does not inherit from Pipeline.
Source code in pipelines/pipelines/pipelines/standard_pipelines.py
add_node ¶
Add a new node to the pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| component | | The object to be called when the data is passed to the node. It can be a pipelines component (like Retriever, Reader, or Generator) or a user-defined object that implements a run() method to process incoming data from the predecessor node. | required |
| name | str | The name for the node. It must not contain any dots. | required |
| inputs | List[str] | A list of inputs to the node. If the predecessor node has a single outgoing edge, the name of the node is sufficient. For instance, an 'ElasticsearchRetriever' node always outputs a single edge with a list of documents, so it can be represented as ["ElasticsearchRetriever"]. When the predecessor node has multiple outputs, e.g., a "QueryClassifier", the output must be specified explicitly as "QueryClassifier.output_2". | required |
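The input naming rule above can be illustrated with a small helper. This is a sketch for illustration, not the library's code: `parse_input` and `validate_node_name` are hypothetical names, and treating a bare node name as shorthand for an edge called `output_1` is an assumption.

```python
from typing import Tuple


def parse_input(spec: str) -> Tuple[str, str]:
    """Split an `inputs` entry into (node_name, edge_name).

    Hypothetical helper. Assumes a bare node name refers to the node's
    single outgoing edge, which is called "output_1" here.
    """
    if "." in spec:
        # Explicit form such as "QueryClassifier.output_2".
        node_name, edge_name = spec.split(".", 1)
    else:
        # Short form such as "ElasticsearchRetriever".
        node_name, edge_name = spec, "output_1"
    if not node_name or not edge_name:
        raise ValueError(f"Malformed input spec: {spec!r}")
    return node_name, edge_name


def validate_node_name(name: str) -> str:
    """Reject names containing dots, because a dot separates node and edge."""
    if "." in name:
        raise ValueError(f"Node name must not contain dots: {name!r}")
    return name
```

Under this reading, the ban on dots in node names follows directly from the `node.edge` notation: a dot inside a name would make the split ambiguous.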
draw ¶
Create a Graphviz visualization of the pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path | The path to save the image to. | Path('pipeline.png') |
get_document_store ¶
Return the document store object used in the current pipeline.
Returns:
| Type | Description |
|---|---|
| Optional[BaseDocumentStore] | Instance of DocumentStore or None |
get_node ¶
Get a node from the Pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the node. | required |
get_nodes_by_class ¶
Gets all nodes in the pipeline that are an instance of a certain class (including subclasses). This is helpful, for example, if you have loaded a pipeline and want to interact directly with the document store. Example:
```python
from pathlib import Path
from pipelines.document_stores.base import BaseDocumentStore

# Pipeline is the library's Pipeline class; PIPELINE_YAML_PATH and INDEXING_PIPELINE_NAME are placeholders you define.
INDEXING_PIPELINE = Pipeline.load_from_yaml(Path(PIPELINE_YAML_PATH), pipeline_name=INDEXING_PIPELINE_NAME)
res = INDEXING_PIPELINE.get_nodes_by_class(class_type=BaseDocumentStore)
```
Returns:
| Type | Description |
|---|---|
| List[Any] | List of components that are an instance of the requested class |
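The "instance of a certain class (including subclasses)" behaviour can be sketched with plain `isinstance` filtering. The classes below are stand-ins defined only for this example, and the free function is an assumed approximation of the method, not the library's implementation.

```python
from typing import Any, Dict, List


class BaseDocumentStore:
    """Stand-in for the library's base document store class."""


class InMemoryStore(BaseDocumentStore):
    """Hypothetical subclass, included to show that subclasses match too."""


class Reader:
    """Stand-in for an unrelated component."""


def get_nodes_by_class(nodes: Dict[str, Any], class_type: type) -> List[Any]:
    """Return every component that is an instance of class_type or a subclass."""
    return [component for component in nodes.values() if isinstance(component, class_type)]


# Node name -> component, as a simplified picture of a pipeline graph.
nodes = {"Store": InMemoryStore(), "Reader": Reader()}
matches = get_nodes_by_class(nodes, class_type=BaseDocumentStore)
```

Here `matches` holds only the `InMemoryStore` instance, because it is the one component derived from the stand-in `BaseDocumentStore`.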
load_from_yaml classmethod ¶
load_from_yaml(path: Path, pipeline_name: Optional[str] = None, overwrite_with_env_variables: bool = True)
Load Pipeline from a YAML file defining the individual components and how they're tied together to form
a Pipeline. A single YAML can declare multiple Pipelines, in which case an explicit pipeline_name must
be passed.
Here's a sample configuration:
```yaml
version: '0.8'

components:    # define all the building-blocks for Pipeline
  - name: MyReader       # custom-name for the component; helpful for visualization & debugging
    type: FARMReader     # pipelines Class name for the component
    params:
      no_ans_boost: -10
      model_name_or_path: ernie-gram-zh-finetuned-dureader-robust
  - name: MyESRetriever
    type: ElasticsearchRetriever
    params:
      document_store: MyDocumentStore    # params can reference other components defined in the YAML
      custom_query: null
  - name: MyDocumentStore
    type: ElasticsearchDocumentStore
    params:
      index: pipelines_test

pipelines:    # multiple Pipelines can be defined using the components from above
  - name: my_query_pipeline    # a simple extractive-qa Pipeline
    nodes:
      - name: MyESRetriever
        inputs: [Query]
      - name: MyReader
        inputs: [MyESRetriever]
```
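The rule that an explicit `pipeline_name` is needed when a file declares several pipelines can be sketched as follows. The dict mirrors the sample configuration after parsing; `select_pipeline` is a hypothetical helper and the exact error behaviour is an assumption, not taken from the library.

```python
from typing import Any, Dict, Optional


def select_pipeline(config: Dict[str, Any], pipeline_name: Optional[str] = None) -> Dict[str, Any]:
    """Pick one pipeline definition from a parsed configuration (hypothetical helper)."""
    definitions = config["pipelines"]
    if pipeline_name is None:
        # Without a name, the choice is only unambiguous for a single pipeline.
        if len(definitions) != 1:
            raise ValueError("Expected exactly one pipeline when pipeline_name is omitted.")
        return definitions[0]
    for definition in definitions:
        if definition["name"] == pipeline_name:
            return definition
    raise KeyError(f"No pipeline named {pipeline_name!r} in the configuration.")


# The sample configuration above, written as the dict a YAML parser would produce.
config = {
    "version": "0.8",
    "components": [
        {"name": "MyReader", "type": "FARMReader", "params": {"no_ans_boost": -10}},
        {"name": "MyESRetriever", "type": "ElasticsearchRetriever",
         "params": {"document_store": "MyDocumentStore", "custom_query": None}},
        {"name": "MyDocumentStore", "type": "ElasticsearchDocumentStore",
         "params": {"index": "pipelines_test"}},
    ],
    "pipelines": [
        {"name": "my_query_pipeline", "nodes": [
            {"name": "MyESRetriever", "inputs": ["Query"]},
            {"name": "MyReader", "inputs": ["MyESRetriever"]},
        ]},
    ],
}

selected = select_pipeline(config)
```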
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path | Path of the YAML file. | required |
| pipeline_name | Optional[str] | If the YAML contains multiple pipelines, the pipeline_name to load must be set. | None |
| overwrite_with_env_variables | bool | Overwrite the YAML configuration with environment variables. For example, to change index name param for an ElasticsearchDocumentStore, an env variable 'MYDOCSTORE_PARAMS_INDEX=documents-2021' can be set. Note that an | True |
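The environment-variable override can be pictured with the sketch below. It assumes the variable name follows the pattern `<COMPONENT NAME IN UPPER CASE>_PARAMS_<PARAM IN UPPER CASE>`, inferred from the single `MYDOCSTORE_PARAMS_INDEX` example; `apply_env_overrides` is a hypothetical helper, and the library's real matching rules may differ.

```python
import copy
from typing import Any, Dict, List, Mapping


def apply_env_overrides(
    components: List[Dict[str, Any]], env: Mapping[str, str]
) -> List[Dict[str, Any]]:
    """Return a copy of the component definitions with params overridden from env.

    Hypothetical helper. Values taken from the environment stay strings.
    """
    updated = copy.deepcopy(components)
    for component in updated:
        prefix = f"{component['name'].upper()}_PARAMS_"
        for key, value in env.items():
            if key.startswith(prefix):
                param = key[len(prefix):].lower()
                if param:
                    component.setdefault("params", {})[param] = value
    return updated


components = [
    {"name": "MyDocStore", "type": "ElasticsearchDocumentStore", "params": {"index": "pipelines_test"}},
]
# A plain dict stands in for os.environ so the example has no side effects.
env = {"MYDOCSTORE_PARAMS_INDEX": "documents-2021", "HOME": "/home/user"}
overridden = apply_env_overrides(components, env)
```

Passing `os.environ` instead of the dict would apply the same logic to the real environment.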
run_batch ¶
Run a batch of queries through the pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| queries | List[str] | List of query strings. | required |
| params | Optional[dict] | Parameters for the individual nodes of the pipeline. For instance, | None |
| debug | Optional[bool] | Whether the pipeline should instruct nodes to collect debug information about their execution. By default, these include the input parameters they received and the output they generated. All debug information can then be found in the dict returned by this method under the key "_debug". | None |
save_to_yaml ¶
Save a YAML configuration for the Pipeline that can be used with Pipeline.load_from_yaml().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path | Path of the output YAML file. | required |
| return_defaults | bool | Whether to output parameters that have the default values. | False |
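One reading of `return_defaults` is sketched below: with `False`, parameters whose value equals the component's default are left out of the saved configuration. The helper name, its signature, and the sample defaults are all hypothetical, and the library may track explicitly passed parameters differently.

```python
from typing import Any, Dict


def params_to_save(
    params: Dict[str, Any], defaults: Dict[str, Any], return_defaults: bool = False
) -> Dict[str, Any]:
    """Select which parameters to write out (hypothetical helper)."""
    if return_defaults:
        # Keep everything, including values that match the defaults.
        return dict(params)
    # Keep only parameters that differ from, or have no, default value.
    return {
        name: value
        for name, value in params.items()
        if name not in defaults or defaults[name] != value
    }


# Assumed defaults for an imaginary component.
defaults = {"top_k": 10, "index": "document"}
params = {"top_k": 10, "index": "pipelines_test"}

compact = params_to_save(params, defaults)
full = params_to_save(params, defaults, return_defaults=True)
```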
set_node ¶
Set the component for a node in the Pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the node. | required |
| component | | The component object to be set at the node. | required |
DocPipeline ¶
Pipeline for document intelligence.
__init__ ¶
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| preprocessor | BaseComponent | File/image preprocessor instance. | required |
| docreader | BaseComponent | Document model runner instance. | required |
run ¶
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | | The query string. | required |
| params | Optional[dict] | params for the | None |
| debug | Optional[bool] | Whether the pipeline should instruct nodes to collect debug information about their execution. By default, these include the input parameters they received and the output they generated. All debug information can then be found in the dict returned by this method under the key "_debug". | None |
ExtractiveQAPipeline ¶
Pipeline for Extractive Question Answering.
__init__ ¶
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| reader | BaseReader | Reader instance | required |
| retriever | BaseRetriever | Retriever instance | required |
run ¶
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | The search query string. | required |
| params | Optional[dict] | Params for the | None |
| debug | Optional[bool] | Whether the pipeline should instruct nodes to collect debug information about their execution. By default, these include the input parameters they received and the output they generated. All debug information can then be found in the dict returned by this method under the key "_debug". | None |
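How `params` and `debug` interact in a retriever-then-reader run can be sketched with toy stand-ins. Everything here is illustrative: the node functions, the `answers` result key, keying `params` by node name, and the layout of each `_debug` entry are assumptions; only the top-level `"_debug"` key comes from the description above.

```python
from typing import Any, Dict, List, Optional


def retrieve(query: str, top_k: int = 2) -> List[str]:
    """Stand-in retriever: fabricates document ids for the query."""
    return [f"{query}-doc{i}" for i in range(top_k)]


def read(documents: List[str], top_k: int = 1) -> List[str]:
    """Stand-in reader: keeps the first top_k documents as 'answers'."""
    return documents[:top_k]


def run(query: str, params: Optional[dict] = None, debug: Optional[bool] = None) -> Dict[str, Any]:
    """Pass the query through the retriever, then the reader (toy pipeline)."""
    params = params or {}
    nodes = [("Retriever", retrieve), ("Reader", read)]
    result: Dict[str, Any] = {"query": query}
    if debug:
        result["_debug"] = {}
    data: Any = query
    for name, node in nodes:
        # Assumed convention: params are keyed by node name.
        node_params = params.get(name, {})
        output = node(data, **node_params)
        if debug:
            # Record what the node received and what it produced.
            result["_debug"][name] = {
                "input": {"data": data, "params": node_params},
                "output": output,
            }
        data = output
    result["answers"] = data
    return result


result = run("q", params={"Retriever": {"top_k": 3}}, debug=True)
```

With `debug` left at `None`, the returned dict in this sketch carries no `"_debug"` key at all.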
QAGenerationPipeline ¶
Pipeline for question-answer pair generation.
__init__ ¶
__init__(answer_extractor: AnswerExtractor, question_generator: QuestionGenerator, qa_filter: QAFilter)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| retriever | | Retriever instance | required |
run ¶
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | | The query string. | required |
| params | Optional[dict] | params for the | None |
| debug | Optional[bool] | Whether the pipeline should instruct nodes to collect debug information about their execution. By default, these include the input parameters they received and the output they generated. All debug information can then be found in the dict returned by this method under the key "_debug". | None |
SemanticSearchPipeline ¶
Pipeline for semantic search.
__init__ ¶
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| retriever | BaseRetriever | Retriever instance | required |
run ¶
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | The query string. | required |
| params | Optional[dict] | params for the | None |
| debug | Optional[bool] | Whether the pipeline should instruct nodes to collect debug information about their execution. By default, these include the input parameters they received and the output they generated. All debug information can then be found in the dict returned by this method under the key "_debug". | None |
SentaPipeline ¶
Pipeline for sentiment analysis.
__init__ ¶
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| preprocessor | BaseComponent | File preprocessor instance. | required |
| senta | BaseComponent | Senta model instance. | required |
run ¶
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | | The query string. | required |
| params | Optional[dict] | params for the | None |
| debug | Optional[bool] | Whether the pipeline should instruct nodes to collect debug information about their execution. By default, these include the input parameters they received and the output they generated. All debug information can then be found in the dict returned by this method under the key "_debug". | None |
TextToImagePipeline ¶
A simple pipeline that takes prompt texts as input and generates images.
WebQAPipeline ¶
Pipeline for Generative Question Answering performed based on Documents returned from a web search engine.
__init__ ¶
__init__(retriever: WebRetriever, prompt_node: PromptNode, sampler: Optional[BaseRanker] = None, shaper: Optional[Shaper] = None)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| retriever | WebRetriever | The WebRetriever used for retrieving documents from a web search engine. | required |
| prompt_node | PromptNode | The PromptNode used for generating the answer based on retrieved documents. | required |
| shaper | Optional[Shaper] | The Shaper used for transforming the documents and scores into a format that can be used by the PromptNode. Optional. | None |
run ¶
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | The search query string. | required |
| params | Optional[dict] | Params for the | None |
| debug | Optional[bool] | Whether the pipeline should instruct nodes to collect debug information about their execution. By default, these include the input parameters they received and the output they generated. You can then find all debug information in the dict this method returns under the key "_debug". | None |