Name: apache-parquet-utils | Distribution: openSUSE Tumbleweed |
Version: 19.0.1 | Vendor: openSUSE |
Release: 2.1 | Build date: Thu Mar 13 19:57:51 2025 |
Group: Productivity/Scientific/Math | Build host: reproducible |
Size: 180136 | Source RPM: apache-arrow-19.0.1-2.1.src.rpm |
Packager: http://bugs.opensuse.org | |
Url: https://arrow.apache.org/ | |
Summary: Development platform for in-memory data - development files |
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. This package provides utilities for working with the Parquet format.
Apache-2.0 AND BSD-3-Clause AND BSD-2-Clause AND MIT
* Thu Mar 13 2025 Ben Greiner <code@bnavigator.de>
- Add missing dependencies for libboost_process explicitly boo#1239599
* Wed Feb 19 2025 Ben Greiner <code@bnavigator.de>
- disable flight because of gh#grpc/grpc#37968 boo#1237422
* Mon Feb 17 2025 Ben Greiner <code@bnavigator.de>
- Update to 19.0.1
  ## Bug Fixes
  * [C++] Fix overflow issues for large build side in swiss join (#45108)
  * [C++][Fuzzing] Fix Negation bug discovered by fuzzing (#45181)
  * [C++][Parquet] Omit level histogram when max level is 0 (#45285)
  * [Parquet][C++] Fix statistics load logic for no row group and multiple row groups (#45350)
  * [C++] Disable Flight test (#45232)
  ## Improvements
  * [C++][Parquet] Improve performance of generating size statistics (#45202)
  * [C++][S3] Workaround compatibility issue between AWS SDK and MinIO (#45310)
- Release 19.0.0
  ## New Features and Improvements
  * [CI][C++] Add a nightly job to test offline build (#44721)
  * [C++] GcsFileSystem::Make should return Result (#44503)
  * [C++][Parquet] Implement SizeStatistics (#40594)
  * [C++] Reduce string inlining in Substrait serde (#45174)
  * [C++][Acero] Enhance asof_join to work in multi-threaded execution by sequencing input (#44083)
  * [C++] Support the AWS S3 SSE-C encryption (#43601)
  * [C++][Parquet] Parquet Metadata Printer supports print sort-columns (#43599)
  * [C++] Add C++ implementation of Async C Data Interface (#44495)
  * [C++][Acero] Support AVX2 swiss join decoding (#43832)
  * [C++] skip -0117 in StrptimeZoneOffset for old glibc (#44621)
  * [C++] Add arrow::RecordBatch::MakeStatisticsArray() (#44252)
  * [C++] Improve merge step in chunked sorting (#44217)
  * [C++][Parquet] Tools: Debug Print for Json should be valid JSON (#44532)
  * [C++][FS][Azure] Implement SAS token authentication (#45021)
  * [C++] Don’t export template class (#44365)
  * [C++][Docs] Update the URL to C++ Development in README.md (#44427)
  * [C++] Added rvalue-reference-qualified overload for arrow::Result::status() returning value instead of reference (#44477)
  * [C++] StatusConstant- cheaply copied const Status (#44493)
  * [C++][Compute] Allow casting struct to bigger nullable struct (#44587)
  * [C++] Use array type to compute min/max statistics Arrow type (#45094)
  * [C++] Minor: ArrayData ctor can assign null_count directly (#44582)
  * [C++] Add const and & to arrow::Array::statistics() return type (#44592)
  * [Python][C++] Add version suffix to libarrow_python* libraries (#44702)
  * [C++] NumericBuilder::AppendValues append vector prevent from ub (#44794)
  * [C++][Parquet] Remove obsolete parquet_constants generated files from old thrift (#44772)
  * [Docs][C++] Add arrow::ArrayStatistics to API doc (#44764)
  * [C++] Upgrade ORC to 2.0.3 (#44745)
  * [C++][Parquet] Add arrow::Result version of parquet::arrow::OpenFile() (#44785)
  * [C++] Fix a couple of maybe-uninitialized warnings (#44789)
  * [C++] Use arrow::util::span on arrow::util::bitmap_builders_utilities instead of std::vector (#44796)
  * [C++][Parquet] Add arrow::Result version of parquet::arrow::FileReader::GetRecordBatchReader() (#44809)
  * [C++] minor optimize cancel and thread pool (#44812)
  * [C++][Parquet] Add an example to dump statistics read as arrow::ArrayStatistics (#44816)
  * [C++] Add the Expm1(exponent) scalar arithmetic function (#44904)
  * [C++] Add WithinUlp testing functions (#44906)
  * [C++][Python] Add Hyperbolic Trig functions (#44630)
  * [C++] Enable mimalloc by default, disable jemalloc by default and more (#44951)
  * [C++] Add support for building system OpenTelemetry (#44983)
  * [C++][CMake] Use librt only for Linux (#44984)
  * [C++] Support for fixed-size list in conversion of range tuple (#45008)
  * [C++][Parquet] Allow configuring the default footer read size (#45016)
  * [C++] Remove result_internal.h (#45066)
  * [FlightRPC][C++] Deprecate InitializeFlightUcx before removing UCX (#45080)
  * [C++][Parquet] Add GetReadRanges function to FileReader (#45093)
  * [C++] Apply a cstdint patch to bundled Thrift for GCC 15 (#45097)
  * [C++] Remove useless “hash table ready” states in swiss join (#45136)
  * [CI][C++] Add a GCC 15 job (#45138)
  * [C++] Ensure using cpp/cmake_modules/*.cmake (#45143)
  * [CI][C++] Upgrade Alpine Linux to 3.18 from 3.16 (#45168)
  ## Bug Fixes
  * [C++] Fix CopyFiles when destination is a FileSystem with background_writes (#44897)
  * [C++][Python] Fix ORC crash when file contains unknown timezone (#45051)
  * [C++] Replace std::aligned_storage that is deprecated in C++23 (#45019)
  * [C++][Parquet] Refuse writing non-nullable column that contains nulls (#44921)
  * [C++] Initialize offset vector head as 0 after memory allocated in grouper.cc (#43123)
  * [C++] io::BufferedInput: Fix invalid state after SetBufferSize (#44387)
  * [C++][Parquet] Fix schema conversion from two-level encoding nested list (#43995)
  * [C++] Use “lib” for generating bundled dependencies even with “clang-cl” (#44391)
  * [C++] Fix unaligned load/store implementation for clang-18 (#44468)
  * [C++] Use CMAKE_LIBTOOL on macOS (#44385)
  * [CI][C++] Use setup-python on hosted runner (#44411)
  * [C++] Update vendored date to 3.0.3 (#44482)
  * [GLib][C++] Meson searches libraries with specific versions. (#44475)
  * [C++][Acero] Fix crash when thread in asof_join is not running (#44584)
  * [C++] NumericArray should not use ctor from parent directly (#44542)
  * [C++] FunctionOptions::{Serialize,Deserialize}() return an error without ARROW_IPC (#45171)
  * [C++][Acero] Enhance partition sort example (#44678)
  * [C++][Python] Fix Flight Timestamp precision, revert workaround from #43537 (#44681)
  * [C++] Add S3 option to ignore SIGPIPE signals (#44735)
  * [C++] Keep field metadata for keys and values when importing a map type via the C data interface (#44715)
  * [C++][CI] Fix arrow-c-bridge-test timeout with threading disabled (#44737)
  * [C++] Use lowercased windows.h to enable cross-platform builds (#44755)
  * [C++] Fix Float16.To{Little,Big}Endian on big endian machines (#44768)
  * [C++][Parquet] Fix read/write of metadata length footer on big-endian systems (#44787)
  * [C++][CI] Migrate to arrow::Result based parquet::arrow::OpenFile() API in example tutorials (#44807)
  * [C++] Fix thread-unsafe access in ConcurrentQueue::UnsyncFront (#44849)
  * [C++] Fix compilation error on GCC 8 (#44899)
  * [C++][CI] Silence protobuf-generated deprecations (#44955)
  * [C++] Use recommended downloads URLs for ORC and Thrift (#44977)
  * [C++] Include path in the documentation is wrong (#45031)
  * [C++] Remove Parquet requirement from Arrow Acero and from Arrow Dataset when not necessary (#45035)
  * [C++] Add support for Boost 1.87.0 (#45057)
  * [C++][CI] Fix test-build-cpp-fuzz failures (#45060)
  * [C++][Parquet] Fix generation of repetition levels for encryption test data (#45074)
  * [C++] Avoid static const variable in the status.h (#45100)
  * [C++][Parquet] Fix Null-dereference READ in parquet::arrow::ListToSchemaField (#45152)
  * [C++][Release] Add llvm-dev back to setup-ubuntu.sh (#45184)
  * [C++][Parquet] test-conda-cpp-valgrind fails on arrow-dataset-file-parquet-encryption-test
- Release 18.1.0
  ## Bug Fixes
  * [C++] Add support for overwriting grpc_cpp_plugin path for cross-compiling (#44507)
  * [Docs][C++] Fix documentation directive for ChunkLocation (#44505)
  * [C++] Add find module for abseil that handles missing version (#44613)
  * [C++][Dev] Update bundled Thrift, update mirrors to use CDN (#44685)
  ## New Features and Improvements
  * [C++] Move ChunkResolver to the public API (#44357)
- Release 18.0.0
  ## Bug Fixes
  * [C++] data corruption when using `group_by` and `aggregate` on large data sets
  * [C++] Use PutObject request for S3 in OutputStream when only uploading small data (#41564)
  * [C++] Clean up implicit fallthrough warnings (#41892)
  * [C++] Fix avx2 gather rows more than 2^31 issue in CompareColumnsToRows (#43065)
  * [C++][ArrowFlight] Crash due to UCS thread mode
  * [C++] Add workaround for missing Boost dependency of Thrift (#43328)
  * [C++] Skip not Emscripten ready tests in CSV tests (#43724)
  * [C++] Add date{32,64} to date{32,64} cast (#43192)
  * [C++][Compute] Detect and explicit error for offset overflow in row table (#43226)
  * [C++] Fix decimal benchmarks to avoid out-of-bounds accesses (#43212)
  * [C++] Resolve Abseil like any other dependency in the build system (#43219)
  * [C++][Parquet] Refactor parquet::encryption::AesEncryptor to use unique_ptr (#43222)
  * [C++] Fix Abseil compile error on GCC 13 (#43157)
  * [C++] Add missing serde methods to Location (#43332)
  * [C++][Parquet] min-max Statistics doesn’t work well when one of min-max is truncated (#43383)
  * [C++][Parquet] parquet-dump-footer: Remove redundant link and fix --debug processing (#43375)
  * [C++] Ensure using bundled GoogleTest when we use bundled GoogleTest (#43465)
  * [C++][Compute] Fix invalid memory access when resizing var-length buffer in row table (#43415)
  * [C++][FlightRPC] Fix Flight UCX build issues (#43430)
  * [C++] FIlter out zero length buffers on gRPC transport (#43448)
  * [C++][Gandiva] Always use gdv_function_stubs.h in context_helper.cc (#43464)
  * [C++] Add support for the official LZ4 CMake package (#43468)
  * [C++] Register the new Opaque extension type by default (#43788)
  * [C++][Acero] Fix typos in join benchmark (#43871)
  * [C++][CI] Catch potential integer overflow in PoolBuffer (#43886)
  * [C++] Leak S3 structures if finalization happens too late (#44090)
  * [C++][Parquet] Fix reported metrics in parquet-arrow-reader-writer-benchmark (#44082)
  * [C++] Don’t use Boost.Process with Emscripten (#44097)
  * [C++] Add home made _mm256_set_m128i for compilers who are missing it (#44116)
  * [C++] JsonExtensionType equality check ignores storage type (#44215)
  * [CI][C++][AppVeyor] Use conda instead of Mamba (#44235)
  * [C++][FS][Azure] Fix edgecase where GetFileInfo incorrectly returns NotFound on flat namespace and Azurite (#44302)
  * [C++][FS][Azure] Catch missing exceptions on HNS support check (#44274)
  * [C++][FS][Azure] Fix minor hierarchical namespace bugs (#44307)
  * [C++] Fix S3 error handling in ObjectOutputStream (#44335)
  * [C++] Disable jemalloc by default on ARM (#44380)
  ## New Features and Improvements
  * [C++][Python] Native support for UUID (#37298)
  * [C++][Python] Bool8 Extension Type Implementation (#43488)
  * [C++][Parquet] Add JSON canonical extension type (#13901)
  * [C++][Compute] Replace explicit checking with DCHECK for invariants in row segmenter (#44236)
  * [C++][CI] Improve IPC fuzzing seed corpus (#43621)
  * [Documentation][C++] Explicitly note that compute is optional (#43629)
  * [C++] Azure file system write buffering & async writes (#43096)
  * [C++][Parquet] Separate encoders and decoder (#43972)
  * [C++][Python][Parquet] Support reading/writing key-value metadata from/to ColumnChunkMetaData (#41580)
  * [Docs][C++] Is arrow::dataset namespace still experimental?
  * [C++] Add arrow::ArrayStatistics (#43273)
  * [CI][C++] Update Minio version (#44225)
  * [C++][Parquet] Add binary that extracts a footer from a parquet file (#42174)
  * [C++] Support casting to and from utf8_view/binary_view (#43302)
  * [C++] Update bundled vendor/datetime to support for building with libc++ and C++20 (#43094)
  * [C++] Implement PathFromUri support for Azure file system (#43098)
  * [C++][Compute] Fix the unnecessary allocation of extra bytes when encoding row table (#43125)
  * [C++][Parquet] Replace use of int with int32_t in the internal Parquet encryption APIs (#43413)
  * [C++][Parquet] Refactor Encryptor API to use arrow::util::span instead of raw pointers (#43195)
  * [C++][Parquet] Default initialize some parquet metadata variables (#43144)
  * [C++] Fix CMake link order for AWS SDK (#43230)
  * [C++] Suggest a cast when Concatenate fails due to offsets overflow (#43190)
  * [C++] Support basic is_in predicate simplification (#43761)
  * [C++][AzureFS] Ignore password field in URI (#44220)
  * [C++] Add lint for DCHECK in public headers (#43248)
  * [C++][FlightRPC] Reduce repetition in flight/types.cc in serde functions (#43237)
  * [C++][Parquet] remove useless template parameter of DeltaLengthByteArrayEncoder (#43250)
  * [C++] Always prefer mimalloc to jemalloc (#40875)
  * [C++][Flight] Use a Base CRTP type for the types used in RPC calls (#43255)
  * [C++] Expand the ‘take’ function tests to cover more chunked-array cases (#43292)
  * [C++][Parquet] Enhance the comment for ColumnReader/Decoder (#44003)
  * [C++] Order classes in flight/types.h according to Flight.proto (#43330)
  * [C++][Parquet] Deprecate ColumnChunk::file_offset field and no longer write Metadata at end of Chunk (#43428)
  * [C++] Add benchmark for binary view builder (#43445)
  * [C++][Python] Add Opaque canonical extension type (#43458)
  * [Java][C++] Support more CsvFragmentScanOptions in JNI call (#43482)
  * [C++] Thirdparty: Bump lz4 to 1.10.0 (#43493)
  * [C++][Compute] Widen the row offset of the row table to 64-bit (#43389)
  * [C++] Use ViewOrCopyTo instead of CopyTo when pretty printing non-CPU data (#43508)
  * [FlightRPC][C++] Reduce the number of references to protobuf::Any (#43544)
  * [C++] Simplify arrow::ArrayStatistics::ValueType (#43581)
  * [C++][GLib] Don’t install arrow-cuda.pc/arrow-cuda-glib.pc on Windows (#43593)
  * [C++] Remove redundant default constructor/deconstructor in arrow::ArrayStatistics (#43579)
  * [C++] Remove std::optional from arrow::ArrayStatistics::is_{min,max}_exact (#43595)
  * [C++][FlightRPC] Move the FlightTestServer to its own .cc and .h files (#43678)
  * [C++] Compute: fix register kernel SimdLevel for AddMinMax512AggKernels (#43704)
  * [C++] Prevent Snappy from disabling RTTI when bundled (#43706)
  * [C++][FS][Azure] Use the latest Azurite and update the bundled Azure SDK for C++ to azure-identity_1.9.0 (#43723)
  * [C++][Parquet][CI] Parquet: Introducing more bad_data for testing (#43708)
  * [C++][Parquet] Dataset: Handle num-nulls in Parquet correctly when !HasNullCount() (#43726)
  * [C++] Clarify the way SIMD-enabled agg kernels come from the same code in different compilation units (#43720)
  * [C++] Fix Scalar boolean handling in row encoder (#43734)
  * [C++] Add support for Boost 1.86 (#43766)
  * [C++] Compute: More comment in RowEncoder (#43763)
  * [C++] Acero: Minor code enhancement for Join (#43760)
  * [C++] Fix the case when boolean_{any all} meets constant input with length in Acero (#43799)
  * [C++] Add chunked Take benchmarks with a small selection factor (#43772)
  * [C++] Indent preprocessor directives (#43798)
  * [C++] Attach arrow::ArrayStatistics to arrow::ArrayData (#43801)
  * [C++] Enable filesystem automatically when one of ARROW_{AZURE,GCS,HDFS,S3}=ON is specified (#43806)
  * [C++] Expose the set of device types where a ChunkedArray is allocated (#43853)
  * [C++] Make ChunkResolver::ResolveMany output a list of ChunkLocations (#43928)
  * [C++][Parquet] Add support for arrow::ArrayStatistics: non zero-copy int based types (#43945)
  * [C++][Parquet] Guard against use of cleared decryptor/encryptor (#43947)
  * [C++] Add tests based on random data and benchmarks to ChunkResolver::ResolveMany (#43954)
  * [C++] Enhance error message for URI parsing (#43938)
  * [CI][C++][Dev] Add cpplint to pre-commit (#43982)
  * [C++][Parquet] Add support for arrow::ArrayStatistics: zero-copy types (#43984)
  * [C++][Acero] Some code cleanup to Grouper (#43988)
  * [C++] Add missing std::move() in array_nested.cc (#43993)
  * [C++][Docs] Add missing install command in building docs (#44000)
  * [C++][Parquet] Add support for arrow::ArrayStatistics: boolean (#44009)
  * [C++] IPC: ipc reader/writer code enhancement (#44019)
  * [C++][Compute] Reduce the complexity of row segmenter (#44053)
  * [C++][Parquet] Add Float16 reading benchmarks (#44073)
  * [C++][Parquet] Remove deprecated APIs (#44080)
  * [C++][Acero] Add more row segmenter tests (#44166)
  * [C++][Parquet] Fix typo in parquet/column_writer.cc (#40856)
  * [C++] Avoid repeated ArrayData::offset lookups (#44190)
  * [C++][Gandiva] Accept LLVM 19.1 (#44233)
  * [C++] Unify simd header includings (#44250)
  * [C++][Decimal] Use 0E+1 not 0.E+1 for broader compatibility (#44275)
  * [Packaging][C++] Enable Azure file system for deb/rpm (#44348)
- Drop apache-arrow-pr43766-boost1_86.patch
- Release notes for 18.0.0 and 19.0.0
* Fri Sep 27 2024 Guang Yee <gyee@suse.com>
- Set the appropriate C++ compiler for the given platform so it will compile on Leap 15.x.
* Wed Sep 18 2024 Ben Greiner <code@bnavigator.de>
- Add apache-arrow-pr43766-boost1_86.patch for Boost 1.86
  * gh#apache/arrow#43766
* Mon Aug 12 2024 Ben Greiner <code@bnavigator.de>
- Update to 17.0.0
  ## Bug Fixes
  * [C++] Add option to string ‘center’ kernel to control left/right alignment on odd number of padding (#41449)
  * [C++][Python] Fix casting to extension type with fixed size list storage type (#42219)
  * [C++] Replace null_count with MayHaveNulls in ListArrayFromArray and MapArray (#41957)
  * [C++][Python] RecordBatch.filter() segfaults if passed a ChunkedArray (#40971)
  * [C++][Parquet] Timestamp conversion from Parquet to Arrow does not follow compatibility guidelines for convertedType
  * [C++] Use LargeStringArray for casting when writing tables to CSV (#40271)
  * [C++][Python] Map child Array constructed from keys and items shouldn’t have offset (#40871)
  * [C++] Fix compile warning with ‘implicitly-defined constructor does not initialize’ in encoding_benchmark (#41060)
  * [C++] Get null_bit_id according to are_cols_in_encoding_order in NullUpdateColumnToRow_avx2 (#40998)
  * [C++] Clean up unused parameter warnings (#41111)
  * [C++][Acero] Fix asof join race (#41614)
  * [C++] support for single threaded joins (#41125)
  * [C++] Fix hashjoin benchmark failed at make utf8’s random batches (#41195)
  * [C++] Check to avoid copying when NullBitmapBuffer is Null (#41452)
  * [C++] Fix crash on invalid Parquet file (#41366)
  * [C++][Parquet] More strict Parquet level checking (#41346)
  * [C++][Gandiva] Fix gandiva cache size env var (#41330)
  * [C++][CMake][Windows] Remove needless .dll suffix from link libraries (#41341)
  * [C++][CMake] Remove unused ARROW_NO_DEPRECATED_API (#41345)
  * [C++][maybe_unused] with Arrow macro (#41359)
  * [C++][Large] ListView and Map nested types for scalar_if_else’s kernel functions (#41419)
  * [C++][Gandiva] Fix ascii_utf8 function to return same result on x86 and Arm (#41434)
  * [C++] Reuse deduplication logic for direct registration (#41466)
  * [C++] Clean up more redundant move warnings (#41487)
  * [C++][Compute] Remove redundant logic for ArrayData as ExecResults in ExecScalarCaseWhen (#41380)
  * [C++][CMake] correctly use Protobuf_PROTOC_EXECUTABLE (#41582)
  * [C++][CMake] Fix ARROW_USE_BOOST detect condition (#41622)
  * [C++][Python] Add optional null_bitmap to MapArray::FromArrays (#41757)
  * [C++] macros.h: Fix ARROW_FORCE_INLINE for MSVC (#41712)
  * [C++][Acero] Remove an useless parameter for QueryContext::Init called in hash_join_benchmark (#41716)
  * [C++] Fix the issue that temp vector stack may be under sized (#41746)
  * [C++] Check that extension metadata key is present before attempting to delete it (#41763)
  * [C++] Iterator releases its resource immediately when it reads all values (#41824)
  * [C++][Flight][Benchmark] Ensure waiting server ready (#41793)
  * [C++] Fix avx2 gather offset larger than 2GB in CompareColumnsToRows (#42188)
  * [C++][S3] Fix potential deadlock when closing output stream (#41876)
  * [CI][C++] Clear cache for mamba on AppVeyor (#41977)
  * [CI][Python][C++] Fix utf8proc detection for wheel on Windows (#42022)
  * [C++] Support list-views on list_slice (#42067)
  * [C++] Fix an OTel test failure and remove needless logs (#42122)
  * [C++][FS][Azure] Ensure setting BlobSasBuilder::Protocol (#42108)
  * [C++] Support list-view typed arrays in array_take and array_filter (#42117)
  * [C++] Fix some potential uninitialized variable warnings (#42207)
  * [C++] Avoid invalid accesses in parquet-encoding-benchmark (#42141)
  * [C++] Use FetchContent for bundled ORC (#43011)
  * [C++] Fix GetRecordBatchPayload crashes for device data (#42199)
  * [C++] Use non-stale c-ares download URL (#42250)
  * [C++][Parquet] Check for valid ciphertext length to prevent segfault (#43071)
  * [C++][Compute] Mark KeyCompare.CompareColumnsToRowsLarge as large memory test (#43128)
  * [C++] Upgrade bundled google-cloud-cpp to 2.22.0 (#43136)
  ## New Features and Improvements
  * [C++][Compute] Implement Grouper::Reset (#41352)
  * [Go][C++] Implement Flight SQL Bulk Ingestion (#38385)
  * [C++][FS][Azure] Support azure cli auth (#41976)
  * [C++][FS][Azure] Add support for environment credential (#41715)
  * [C++] Optimize Take for fixed-size types including nested fixed-size lists (#41297)
  * [C++][Device] Add Copy/View slice functions to a CPU pointer (#41477)
  * [C++] Add support for OpenTelemetry logging (#39905)
  * [C++] Import/Export ArrowDeviceArrayStream (#40807)
  * [C++] move LocalFileSystem to the registry (#40356)
  * [C++] Make flatbuffers serialization more deterministic (#40392)
  * [C++][Gandiva] add RE2::Options set_dot_nl(true) for Like function (#40970)
  * [C++] Introduce portable compiler assumptions (#41021)
  * [C++] Add a grouper benchmark for preventing performance regression (#41036)
  * [C++] Support flatten for combining nested list related types (#41092)
  * [C++] Clean up remaining tasks related to half float casts (#41084)
  * [C++][FS][Azure] Add support for CopyFile with hierarchical namespace support (#41276)
  * [C++] Add is_validity_defined_by_bitmap() predicate (#41115)
  * [C++] IO: enhance boundary checking in CompressedInputStream (#41117)
  * [C++][Python] Expose recursive flatten for lists on list_flatten kernel function and pyarrow bindings (#41295)
  * [C++][Parquet][Doc] Denote PARQUET:field_id in parquet.rst (#41187)
  * [C++] Extract the kernel loops used for PrimitiveTakeExec and generalize to any fixed-width type (#41373)
  * [C++][Acero] Use per-node basis temp vector stack to mitigate overflow (#41335)
  * [C++][Parquet] Optimize DelimitRecords by batch execution when max_rep_level > 1 (#41362)
  * [C++][FS][Azure][Docs] Add AzureFileSystem to Filesystems API reference (#41411)
  * [C++] Use ASAN to poison temp vector stack memory (#41695)
  * [C++][S3] Add a new option to check existence before CreateDir (#41822)
  * [C++][Parquet] Fix DeltaLengthByteArrayEncoder::EstimatedDataEncodedSize (#41546)
  * [C++] Thirdparty: Upgrade xsimd to 13.0.0 (#41548)
  * [C++] Improve fixed_width_test_util.h (#41575)
  * [C++] ChunkResolver: Implement ResolveMany and add unit tests (#41561)
  * [C++] fixed_width_internal.h: Simplify docstring and support bit-sized types (BOOL) (#41597)
  * [C++][Python] Extends the add_key_value to parquet::arrow and PyArrow (#41633)
  * [C++][CMake][Windows] Don’t build needless object libraries (#41658)
  * [C++][Python] PrettyPrint non-cpu data by copying to default CPU device (#42010)
  * [C++][Parquet] Thrift: generate template method to accelerate reading thrift (#41703)
  * [C++][Parquet] Minor: moving EncodedStats by default rather than copying (#41727)
  * [C++][ORC] Ensure setting detected ORC version (#41767)
  * [C++][Parquet] Add file metadata read/write benchmark (#41761)
  * [C++] Make git-dependent definitions internal (#41781)
  * [C++][S3] Remove GetBucketRegion hack for newer AWS SDK versions (#41798)
  * [C++][Parquet] normalize dictionary encoding to use RLE_DICTIONARY (#41819)
  * [C++] IPC: Minor enhance the code of writer (#41900)
  * [C++] Fix ExecuteScalar deduce all_scalar with chunked_array (#41925)
  * [C++] Minor enhance code style for FixedShapeTensorType (#41954)
  * [C++] Follow up of adding null_bitmap to MapArray::FromArrays (#41956)
  * [C++] Misc changes making code around list-like types and list-view types behave the same way (#41971)
  * [C++] : kernel.cc: Remove defaults on switch so that compiler can check full enum coverage for us (#41995)
  * [C++][Parquet] ParquetFilePrinter::JSONPrint print length of FLBA (#41981)
  * [C++][CMake] Add preset for Valgrind (#42110)
  * [C++] Move TakeXXX free functions into TakeMetaFunction and make them private (#42127)
  * [C++][FS][Azure] Validate AzureOptions::{blob,dfs}_storage_scheme (#42135)
  * [C++] list_parent_indices: Add support for list-view types (#42236)
  * [C++] Reduce the recursion of many-join test (#43042)
  * [C++] Limit buffer size in BufferedInputStream::SetBufferSize with raw_read_bound (#43064)
- Require cmake lz4 for 1.10
* Sun Apr 21 2024 Ben Greiner <code@bnavigator.de>
- Update to 16.0.0
  ## Bug Fixes
  * [C++][ORC] Catch all ORC exceptions to avoid crash (#40697)
  * [C++][S3] Handle conventional content-type for directories (#40147)
  * [C++] Strengthen handling of duplicate slashes in S3, GCS (#40371)
  * [C++] Avoid hash_mean overflow (#39349)
  * [C++] Fix spelling (array) (#38963)
  * [C++][Parquet] Fix crash in Modular Encryption (#39623)
  * [C++][Dataset] Fix failures in dataset-scanner-benchmark (#39794)
  * [C++][Device] Fix Importing nested and string types for DeviceArray (#39770)
  * [C++] Use correct (non-CPU) address of buffer in ExportDeviceArray (#39783)
  * [C++] Improve error message for "chunker out of sync" condition (#39892)
  * [C++] Use make -j1 to install bundled bzip2 (#39956)
  * [C++] DatasetWriter avoid creating zero-sized batch when max_rows_per_file enabled (#39995)
  * [C++][CI] Disable debug memory pool for ASAN and Valgrind (#39975)
  * [C++][Gandiva] Make Gandiva's default cache size to be 5000 for object code cache (#40041)
  * [C++][FS][Azure] Fix CreateDir and DeleteDir trailing slash issues on hierarchical namespace accounts (#40054)
  * [C++][FS][Azure] Validate containers in AzureFileSystem::Impl::MovePaths() (#40086)
  * [C++] Decimal types with different precisions and scales bind failed in resolve type when call arithmetic function (#40223)
  * [C++][Docs] Correct the console emitter link (#40146)
  * [C++][Python] Fix test_gdb failures on 32-bit (#40293)
  * [Python][C++] Fix large file handling on 32-bit Python build (#40176)
  * [C++] Support glog 0.7 build (#40230)
  * [C++] Fix cast function bind failed after add an alias name through AddAlias (#40200)
  * [C++] TakeCC: Concatenate only once and delegate to TakeAA instead of TakeCA (#40206)
  * [C++] Fix an abort on asof_join_benchmark run for lost an arg (#40234)
  * [C++] Fix an simple buffer-overflow case in decimal_benchmark (#40277)
  * [C++] Reduce S3Client initialization time (#40299)
  * [C++] Fix a wrong total_bytes to generate StringType's test data in vector_hash_benchmark (#40307)
  * [C++][Gandiva] Add support for compute module's decimal promotion rules (#40434)
  * [C++][Parquet] Add missing config.h include in key_management_test.cc (#40330)
  * [C++][CMake] Add missing glog::glog dependency to arrow_util (#40332)
  * [C++][Gandiva] Add missing OpenSSL dependency to encrypt_utils_test.cc (#40338)
  * [C++] Remove const qualifier from Buffer::mutable_span_as (#40367)
  * [C++] Avoid simplifying expressions which call impure functions (#40396)
  * [C++] Expose protobuf dependency if opentelemetry or ORC are enabled (#40399)
  * [C++][FlightRPC] Add missing expiration_time arguments (#40425)
  * [C++] Move key_hash/key_map/light_array related files to internal for prevent using by users (#40484)
  * [C++] Add missing Threads::Threads dependency to arrow_static (#40433)
  * [C++] Fix static build on Windows (#40446)
  * [C++] Ensure using bundled FlatBuffers (#40519)
  * [C++][CI] Fix TSAN and ASAN/UBSAN crashes (#40559)
  * [C++] Repair FileSystem merge error (#40564)
  * [C++] Fix 3.12 Python support (#40322)
  * [C++] Move mold linker flags to variables (#40603)
  * [C++] Enlarge dest buffer according to dest offset for CopyBitmap benchmark (#40769)
  * [C++][Gandiva] 'ilike' function does not work (#40728)
  * [C++] Fix protobuf package name setting for builds with substrait (#40753)
  * [C++][ORC] Fix std::filesystem related link error with ORC 2.0.0 or later (#41023)
  * [C++] Fix TSAN link error for module library (#40864)
  * [C++][FS][Azure] Don't run TestGetFileInfoGenerator() with Valgrind (#41163)
  * [C++] Fix null count check in BooleanArray.true_count() (#41070)
  * [C++] IO: fixing compiling in gcc 7.5.0 (#41025)
  * [C++][Parquet] Bugfixes and more tests in boolean arrow decoding (#41037)
  * [C++] formatting.h: Make sure space is allocated for the 'Z' when formatting timestamps (#41045)
  * [C++] Ignore ARROW_USE_MOLD/ARROW_USE_LLD with clang < 12 (#41062)
  * [C++] Fix: left anti join filter empty rows. (#41122)
  * [CI][C++] Don't use CMake 3.29.1 with vcpkg (#41151)
  * [CI][C++] Use newer LLVM on Ubuntu 24.04 (#41150)
  * [CI][R][C++] test-r-linux-valgrind has started failing
  * [C++][Python] Sporadic asof_join failures in PyArrow
  * [C++] Fix Valgrind error in string-to-float16 conversion (#41155)
  * [C++] Stop defining ARROW_TEST_MEMCHECK in config.h.cmake (#41177)
  * [C++] Fix mistake in integration test. Explicitly cast std::string to avoid compiler interpreting char* -> bool (#41202)
  ## New Features and Improvements
  * [C++] Filesystem implementation for Azure Blob Storage
  * [C++] Implement cast to/from halffloat (#40067)
  * [C++] Add residual filter support to swiss join (#39487)
  * [C++] Add support for building with Emscripten (#37821)
  * [C++][Python] Add missing methods to RecordBatch (#39506)
  * [C++][Java][Flight RPC] Add Session management messages (#34817)
  * [C++] build filesystems as separate modules (#39067)
  * [C++][Parquet] Rewrite BYTE_STREAM_SPLIT SSE optimizations using xsimd (#40335)
  * [C++] Add support for service-specific endpoint for S3 using AWS_ENDPOINT_URL_S3 (#39160)
  * [C++][FS][Azure] Implement DeleteFile() (#39840)
  * [C++] Implement Azure FileSystem Move() via Azure DataLake Storage Gen 2 API (#39904)
  * [C++] Add ImportChunkedArray and ExportChunkedArray to/from ArrowArrayStream (#39455)
  * [CI][C++][Go] Don't run jobs that use a self-hosted GitHub Actions Runner on fork (#39903)
  * [C++][FS][Azure] Use the generic filesystem tests (#40567)
  * [C++][Compute] Add binary_slice kernel for fixed size binary (#39245)
  * [C++] Avoid creating memory manager instance for every buffer view/copy (#39271)
  * [C++][Parquet] Minor: Style enhancement for parquet::FileMetaData (#39337)
  * [C++] IO: Reuse same buffer in CompressedInputStream (#39807)
  * [C++] Use more permissable return code for rename (#39481)
  * [C++][Parquet] Use std::count in ColumnReader ReadLevels (#39397)
  * [C++] Support cast kernel from large string, (large) binary to dictionary (#40017)
  * [C++] Pass -jN to make in external projects (#39550)
  * [C++][Parquet] Add integration test for BYTE_STREAM_SPLIT (#39570)
  * [C++] Ensure top-level benchmarks present informative metrics (#40091)
  * [C++] Ensure CSV and JSON benchmarks present a bytes/s or items/s metric (#39764)
  * [C++] Ensure dataset benchmarks present a bytes/s or items/s metric (#39766)
  * [C++][Gandiva] Ensure Gandiva benchmarks present a bytes/s or items/s metric (#40435)
  * [C++][Parquet] Benchmark levels decoding (#39705)
  * [C++][FS][Azure] Remove StatusFromErrorResponse as it's not necessary (#39719)
  * [C++][Parquet] Make BYTE_STREAM_SPLIT routines type-agnostic (#39748)
  * [C++][Device] Generic CopyBatchTo/CopyArrayTo memory types (#39772)
  * [C++] Document and micro-optimize ChunkResolver::Resolve() (#39817)
  * [C++] Allow building cpp/src/arrow/**/*.cc without waiting bundled libraries (#39824)
  * [C++][Parquet] Parquet binary length overflow exception should contain the length of binary (#39844)
  * [C++][Parquet] Minor: avoid creating a new Reader object in Decoder::SetData (#39847)
  * [C++] Thirdparty: Bump google benchmark to 1.8.3 (#39878)
  * [C++] DataType::ToString support optionally show metadata (#39888)
  * [C++][Gandiva] Accept LLVM 18 (#39934)
  * [C++] Use Requires instead of Libs for system RE2 in arrow.pc (#39932)
  * [C++] Small CSV reader refactoring (#39963)
  * [C++][Parquet] Expand BYTE_STREAM_SPLIT to support FIXED_LEN_BYTE_ARRAY, INT32 and INT64 (#40094)
  * [C++][FS][Azure] Add support for reading user defined metadata (#40671)
  * [C++][FS][Azure] Add AzureFileSystem support to FileSystemFromUri() (#40325)
  * [C++][FS][Azure] Make attempted reads and writes against directories fail fast (#40119)
  * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor (#40064)
  * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - add support for different data types (#40359)
  * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - add option to cast NULL to NaN (#40803)
  * [C++][FS][Azure] Implement DeleteFile() for flat-namespace storage accounts (#40075)
  * [CI][C++] Add a job on ARM64 macOS (#40456)
  * [C++][Parquet] Remove AVX512 variants of BYTE_STREAM_SPLIT encoding (#40127)
  * [C++][Parquet][Tools] Print FIXED_LEN_BYTE_ARRAY length (#40132)
  * [C++] Make S3 narrative test more flexible (#40144)
  * [C++] Remove redundant invocation of BatchesFromTable (#40173)
  * [C++][CMake] Use "RapidJSON" CMake target for RapidJSON (#40210)
  * [C++][CMake] Use arrow/util/config.h.cmake instead of add_definitions() (#40222)
  * [C++] Fix: improve the backpressure handling in the dataset writer (#40722)
  * [C++][CMake] Improve description why we need to initialize AWS C++ SDK in arrow-s3fs-test (#40229)
  * [C++] Add support for system glog 0.7 (#40275)
  * [C++] Specialize ResolvedChunk::Value on value-specific types instead of entire class (#40281)
  * [C++][Docs] Add documentation of array factories (#40373)
  * [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed (#40329)
  * [FlightRPC][C++][Java][Go] Add URI scheme to reuse connection (#40084)
  * [C++] Add benchmark for ToTensor conversions (#40358)
  * [C++] Define ARROW_FORCE_INLINE for non-MSVC builds (#40372)
  * [C++] Add support for mold (#40397)
  * [C++] Add support for LLD (#40927)
  * [C++] Produce better error message when Move is attempted on flat-namespace accounts (#40406)
  * [C++][ORC] Upgrade ORC to 2.0.0 (#40508)
  * [CI][C++] Don't install FlatBuffers (#40541)
  * [C++] Ensure pkg-config flags include -ldl for static builds (#40578)
  * [Dev][C++][Python][R] Use pre-commit for clang-format (#40587)
  * [C++] Rename Function::is_impure() to is_pure() (#40608)
  * [C++] Add missing util/config.h in arrow/io/compressed_test.cc (#40625)
  * [Python][C++] Support conversion of pyarrow.RunEndEncodedArray to numpy/pandas (#40661)
  * [C++] Expand Substrait type support (#40696)
  * [C++] Create registry for Devices to map DeviceType to MemoryManager in C Device Data import (#40699)
  * [C++][Parquet] Minor enhancement code of encryption (#40732)
  * [C++][Parquet] Simplify PageWriter and ColumnWriter creation (#40768)
  * [C++] Re-order loads and stores in MemoryPoolStats update (#40647)
  * [C++] Revert changes from PR #40857 (#40980)
  * [C++] Correctly report asimd/neon in GetRuntimeInfo (#40857)
  * [C++] Thirdparty: bump zstd to 1.5.6 (#40837)
  * [Docs][C++][Python] Add initial documentation for RecordBatch::Tensor conversion (#40842)
  * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - add support for row-major (#40867)
  * [C++][Parquet] Encoding: Optimize DecodeArrow/Decode(bitmap) for PlainBooleanDecoder (#40876)
  * [C++] Suppress shorten-64-to-32 warnings in CUDA/Skyhook codes (#40883)
  * [C++] Fix unused function build error (#40984)
  * [C++][Parquet] RleBooleanDecoder supports DecodeArrow with nulls (#40995)
  * [C++][FS][Azure] Adjust DeleteDir/DeleteDirContents/GetFileInfoSelector behaviors against Azure for generic filesystem tests (#41068)
  * [C++][Parquet] Avoid allocating buffer object in RecordReader's SkipRecords (#39818)
- Drop apache-arrow-pr40230-glog-0.7.patch
- Drop apache-arrow-pr40275-glog-0.7-2.patch
- Belated inclusion of submission without changelog (by Shani Hadiyanto <shanipribadi@gmail.com>)
  * disable static devel packages by default: The CMake targets require them for all builds, if not disabled
  * Add subpackages for Apache Arrow Flight and Flight SQL
* Sat Mar 23 2024 Ben Greiner <code@bnavigator.de>
- Update to 15.0.2
  ## Bug Fixes
  * [C++][Acero] Increase size of Acero TempStack (#40007)
  * [C++][Dataset] Add missing Protobuf static link dependency (#40015)
  * [C++] Possible data race when reading metadata of a parquet file (#40111)
  * [C++] Make span SFINAE standards-conforming to enable compilation with nvcc (#40253)
* Wed Feb 28 2024 Ben Greiner <code@bnavigator.de>
- Reenable logging
  * Add apache-arrow-pr40230-glog-0.7.patch
  * Add apache-arrow-pr40275-glog-0.7-2.patch
  * now requires glog devel files to be present
for apache-arrow-devel; ArrowConfig.cmake fails otherwise * gh#apache/arrow#40181 * gh#apache/arrow#40230 * gh#apache/arrow#40275 * Fri Feb 23 2024 Ben Greiner <code@bnavigator.de> - Update to 15.0.1 [#]# Bug Fixes * [C++] "iso_calendar" kernel returns incorrect results for array length > 32 (#39360) * [C++] Explicit error in ExecBatchBuilder when appending var length data exceeds offset limit (int32 max) (#39383) * [C++][Parquet] Pass memory pool to decoders (#39526) * [C++][Parquet] Validate page sizes before truncating to int32 (#39528) * [C++] Fix tail-word access cross buffer boundary in `CompareBinaryColumnToRow` (#39606) * [C++] Fix the issue of ExecBatchBuilder when appending consecutive tail rows with the same id may exceed buffer boundary (for fixed size types) (#39585) * [Release] Update platform tags for macOS wheels to macosx_10_15 (#39657) * [C++][FlightRPC] Fix nullptr dereference in PollInfo (#39711) * [C++] Fix tail-byte access cross buffer boundary in key hash avx2 (#39800) * [C++][Acero] Fix AsOfJoin with differently ordered schemas than the output (#39804) * [C++] Expression ExecuteScalarExpression execute empty args function with a wrong result (#39908) * [C++] Strip extension metadata when importing a registered extension (#39866) * [C#] Restore support for .NET 4.6.2 (#40008) * [C++] Fix out-of-line data size calculation in BinaryViewBuilder::AppendArraySlice (#39994) * [C++][CI][Parquet] Fixing parquet column_writer_test building (#40175) [#]# New Features and Improvements * [C++] PollFlightInfo does not follow rule of 5 * [C++] Fix filter and take kernel for month_day_nano intervals (#39795) * [C++] Thirdparty: Bump zlib to 1.3.1 (#39877) * [C++] Add missing "#include <algorithm>" (#40010) - Release 15.0.0 [#]# Bug Fixes * [C++] Bring back case_when tests for union types (#39308) * [C++] Fix the issue of ExecBatchBuilder when appending consecutive tail rows with the same id may exceed buffer boundary (#39234) * [C++][Python] Add a no-op 
    kernel for dictionary_encode(dictionary) (#38349)
  * [C++] Use the latest tagged version of flatbuffers (#38192)
  * [C++] Don't use MSVC_VERSION to determine -fms-compatibility-version (#36595)
  * [C++] Optimize hash kernels for Dictionary ChunkedArrays (#38394)
  * [C++][Gandiva] Avoid registering exported functions multiple times in gandiva (#37752)
  * [C++][Acero] Fix race condition caused by straggling input in the as-of-join node (#37839)
  * [C++][Parquet] add more closed file checks for ParquetFileWriter (#38390)
  * [C++][FlightRPC] Add missing app_metadata arguments (#38231)
  * [C++][Parquet] Fix Valgrind memory leak in arrow-dataset-file-parquet-encryption-test (#38306)
  * [C++][Parquet] Don't initialize OpenSSL explicitly with OpenSSL 1.1 (#38379)
  * [C++] Re-generate flatbuffers C++ for Skyhook (#38405)
  * [C++] Avoid passing null pointer to LZ4 frame decompressor (#39125)
  * [C++] Add missing explicit size_t cast for i386 (#38557)
  * [C++] Fix: add TestingEqualOptions for gtest functions. (#38642)
  * [C++][Gandiva] Use arrow io util to replace std::filesystem::path in gandiva (#38698)
  * [C++] Protect against PREALLOCATE preprocessor defined on macOS (#38760)
  * [C++] Check variadic buffer counts in bounds (#38740)
  * [C++][FS][Azure] Do nothing for CreateDir("/container", true) (#38783)
  * Fix TestArrowReaderAdHoc.ReadFloat16Files to use new uncompressed files (#38825)
  * [C++] S3FileSystem export s3 sdk config "use_virtual_addressing" to arrow::fs::S3Options (#38858)
  * [C++][Gandiva] Fix Gandiva to_date function's validation for suppress errors parameter (#38987)
  * [C++][Parquet] Fix spelling (#38959)
  * [C++] Fix spelling (acero) (#38961)
  * [C++] Fix spelling (compute) (#38965)
  * [C++] Fix spelling (util) (#38967)
  * [C++] Fix spelling (dataset) (#38969)
  * [C++] Fix spelling (filesystem) (#38972)
  * [C++] Fix spelling (#38978)
  * [C++] Fix spelling (#38980)
  * [C++][Acero] union node output batches should be unordered (#39046)
  * [C++][CI] Fix Valgrind failures (#39127)
  * [C++] Remove needless system Protobuf dependency with -DARROW_HDFS=ON (#39137)
  * [C++][Compute] Fix negative duration division (#39158)
  * [C++] Add missing data copy in StreamDecoder::Consume(data) (#39164)
  * [C++] Remove compiler warnings with -Wconversion -Wno-sign-conversion in public headers (#39186)
  * [C++][Benchmarking] Remove hardcoded min times (#39307)
  * [C++] Don't use "if constexpr" in lambda (#39334)
  * [C++] Disable -Werror=attributes for Azure SDK's identity.hpp (#39448)
  * [C++] Fix compile warning (#39389)
  * [CI][JS] Force node 20 on JS build on arm64 to fix build issues (#39499)
  * [C++] Disable parallelism for jemalloc external project (#39522)
  * [C++][Parquet] Fix crash in test_parquet_dataset_lazy_filtering (#39632)
  * [C++] Disable parallelism for all `make`-based externalProjects when CMake >= 3.28 is used
  [#]# New Features and Improvements
  * [C++][JSON] Change the max rows to Unlimited(int_32) (#38582)
  * [C++][Python] Add "Z" to the end of timestamp print string when tz defined (#39272)
  * [C++][Python] DLPack implementation for Arrow Arrays (producer) (#38472)
  * [C++] Diffing of Run-End Encoded arrays (#35003)
  * [C++][Python][R] Allow users to adjust S3 log level by environment variable (#38267)
  * [C++][Format] Implementation of the LIST_VIEW and LARGE_LIST_VIEW array formats (#35345)
  * [C++] Use Cast() instead of CastTo() for Scalar in test (#39044)
  * [C++][Python][Parquet] Implement Float16 logical type (#36073)
  * [C++] Add Utf8View and BinaryView to the c ABI (#38443)
  * [C++][Parquet] Add api to get RecordReader from RowGroupReader (#37003)
  * [C++] Expose a span converter for Buffer and ArraySpan (#38027)
  * [C++] Add A Dictionary Compaction Function For DictionaryArray (#37418)
  * [C++] Add arrow::ipc::StreamDecoder::Reset() (#37970)
  * [C++] Implement file reads for Azure filesystem (#38269)
  * [C++][Integration] Add C++ Utf8View implementation (#37792)
  * [C++][Gandiva] Add external function registry support (#38116)
  * [C++][Gandiva] Migrate LLVM
JIT engine from MCJIT to ORC v2/LLJIT (#39098) * [C++] Feature: support concatenate recordbatches. (#37896) * [C++] Add support for specifying custom Array opening and closing delimiters to arrow::PrettyPrintDelimiters (#38187) * [R] Allow code() to return package name prefix. (#38144) * [C++][Benchmark] Add non-stream Codec Compression/Decompression (#38067) * [C++][Parquet] Change DictEncoder dtor checking to warning log (#38118) * [C++][Parquet] Support reading parquet files with multiple gzip members (#38272) * [C++][Parquet] check the decompressed page size same as size in page header (#38327) * [C++][Azure] Use properties for input stream metadata (#38524) * [C++][FS][Azure] Implement file writes (#38780) * [C++] Implement GetFileInfo for a single file in Azure filesystem (#38505) * [C++][CMake] Use transitive dependency for system GoogleTest (#38340) * [C++][Parquet] Use new encrypted files for page index encryption test (#38347) * Add validation logic for offsets and values to arrow.array.ListArray.fromArrays (#38531) * [C++][Acero] Create a sorted merge node (#38380) * [C++][Benchmark] Adding benchmark for LZ4/Snappy Compression (#38453) * [C++] Support LogicalNullCount for DictionaryArray (#38681) * [C++][Parquet] Faster scalar BYTE_STREAM_SPLIT (#38529) * [C++][Gandiva] Support registering external C functions (#38632) * [C++] Implement GetFileInfo(selector) for Azure filesystem (#39009) * [C++][FS][Azure] Implement CreateDir() (#38708) * [C++][FS][Azure] Implement DeleteDir() (#38793) * [C++][FS][Azure] Implement DeleteDirContents() (#38888) * [C++] : Implement AzureFileSystem::DeleteRootDirContents (#39151) * [C++][FS][Azure] Implement CopyFile() (#39058) * [C++][Go][Parquet] Add tests for reading Float16 files in parquet-testing (#38753) * [C++][FS][Azure] Rename AzurePath to AzureLocation (#38773) * [C++] Implement directory semantics even when the storage account doesn't support HNS (#39361) * [C++][Parquet] Update parquet.thrift to sync with 2.10.0 
(#38815) * [C++] Replace "#ifdef ARROW_WITH_GZIP" in dataset test to ARROW_WITH_ZLIB (#38853) * [C++][Parquet] Using length to optimize bloom filter read (#38863) * [C++][Parquet] Minor: making parquet TypedComparator operation as const method (#38875) * [C++] DatasetWriter release rows_in_flight_throttle when allocate writing failed (#38885) * [C++][Parquet] Move EstimatedBufferedValueBytes from TypedColumnWriter to ColumnWriter (#39055) * [C++] Stop installing internal bpacking_simd* headers (#38908) * [C++][Gandiva] Refactor function holder to return arrow Result (#38873) * [C++] Use Cast() instead of CastTo() for Dictionary Scalar in test (#39362) * [C++] Use Cast() instead of CastTo() for Timestamp Scalar in test (#39060) * [C++] Use Cast() instead of CastTo() for List Scalar in test (#39353) * [C++][Parquet] Support row group filtering for nested paths for struct fields (#39065) * [C++] Refactor the Azure FS tests and filesystem class instantiation (#39207) * [C++][Parquet] Optimize FLBA record reader (#39124) * Create module info compiler plugin (#39135) * [C++] : Try to make Buffer::device_type_ non-optional (#39150) * [C++][Parquet] Remove deprecated AppendRowGroup(int64_t num_rows) (#39209) * [C++][Parquet] Avoid WriteRecordBatch from produce zero-sized RowGroup (#39211) * [C++] Support binary to fixed_size_binary cast (#39236) * [C++][Azure][FS] Add default credential auth configuration (#39263) * [C++] Don't install bundled Azure SDK for C++ with CMake 3.28+ (#39269) * [C++][FS] : Remove the AzureBackend enum and add more flexible connection options (#39293) * [C++][FS] : Inform caller of container not-existing when checking for HNS support (#39298) * [C++][FS][Azure] Add workload identity auth configuration (#39319) * [C++][FS][Azure] Add managed identity auth configuration (#39321) * [C++] Forward arguments to ExceptionToStatus all the way to Status::FromArgs (#39323) * [C++] Flaky DatasetWriterTestFixture.MaxRowsOneWriteBackpresure test (#39379) * 
[C++] Add ForceCachedHierarchicalNamespaceSupport to help with testing (#39340) * [C++][FS][Azure] Add client secret auth configuration (#39346) * [C++] Reduce function.h includes (#39312) * [C++] Use Cast() instead of CastTo() for Parquet (#39364) * [C++][Parquet] Vectorize decode plain on FLBA (#39414) * [C++][Parquet] Style: Using arrow::Buffer data_as api rather than reinterpret_cast (#39420) * [C++][ORC] Upgrade ORC to 1.9.2 (#39431) * [C++] Use default Azure credentials implicitly and support anonymous credentials explicitly (#39450) * [C++][Parquet] Allow reading dictionary without reading data via ByteArrayDictionaryRecordReader (#39153) - Disable logging until compatibility with glog is restored gh#apache/arrow#40181 * Mon Jan 15 2024 Ben Greiner <code@bnavigator.de> - Update to 14.0.2 [#]# New Features and Improvements * GH-38449 - [Release][Go][macOS] Use local test data if possible (#38450) * GH-38591 - [Parquet][C++] Remove redundant open calls in ParquetFileFormat::GetReaderAsync (#38621) [#]# Bug Fixes * GH-38345 - [Release] Use local test data for verification if possible (#38362) * GH-38438 - [C++] Dataset: Trying to fix the async bug in Parquet dataset (#38466) * GH-38577 - Reading parquet file behavior change from 13.0.0 to 14.0.0 * GH-38618 - [C++] S3FileSystem: fix regression in deleting explicitly created sub-directories (#38845) * GH-38861 - [C++] Add missing “-framework Security” to Libs.private in arrow.pc (#38869) * GH-39072 - [Release][CI] Python3.11-devel is required for the verification job on AlmaLinux 8 (#39073) * GH-39074 - [Release][Packaging] Use UTF-8 explicitly for KEYS (#39082) * Thu Jan 11 2024 pgajdos@suse.com - disable some tests for s390x [bsc#1218592] * Mon Nov 13 2023 Ondřej Súkup <mimi.vx@gmail.com> - update 14.0.1 * GH-38431 - [Python][CI] Update fs.type_name checks for s3fs tests * GH-38607 - [Python] Disable PyExtensionType autoload - update to 14.0.1 * very long list of changes can be found here: 
    https://arrow.apache.org/release/14.0.0.html
* Fri Aug 25 2023 Ben Greiner <code@bnavigator.de>
- Update to 13.0.0
  [#]# Acero
  * Handling of unaligned buffers in input nodes can be configured programmatically or by setting the environment variable ACERO_ALIGNMENT_HANDLING. The default behavior is to warn when an unaligned buffer is detected (GH-35498).
  [#]# Compute
  * Several new functions have been added:
    - aggregate functions “first”, “last”, “first_last” (GH-34911);
    - vector functions “cumulative_prod”, “cumulative_min”, “cumulative_max” (GH-32190);
    - vector function “pairwise_diff” (GH-35786).
  * Sorting now works on dictionary arrays, with much better performance than the naive approach of sorting the decoded dictionary (GH-29887). Sorting also works on struct arrays, and nested sort keys are supported using FieldRef (GH-33206).
  * The check_overflow option has been removed from CumulativeSumOptions as it was redundant with the availability of two different functions: “cumulative_sum” and “cumulative_sum_checked” (GH-35789).
  * Run-end encoded filters are efficiently supported (GH-35749).
  * Duration types are supported with the “is_in” and “index_in” functions (GH-36047). They can be multiplied with all integer types (GH-36128).
  * “is_in” and “index_in” now cast their inputs more flexibly: they first attempt to cast the value set to the input type, then in the other direction if the former fails (GH-36203).
  * Multiple bugs have been fixed in “utf8_slice_codeunits” when the stop option is omitted (GH-36311).
  [#]# Dataset
  * A custom schema can now be passed when writing a dataset (GH-35730). The custom schema can alter nullability or metadata information, but is not allowed to change the datatypes written.
  [#]# Filesystems
  * The S3 filesystem now writes files in equal-sized chunks, for compatibility with Cloudflare’s “R2” Storage (GH-34363).
  * A long-standing issue where S3 support could crash at shutdown because of resources still being alive after S3 finalization has been fixed (GH-36346).
    Now, attempts to use S3 resources (such as making filesystem calls) after S3 finalization should result in a clean error.
  * The GCS filesystem accepts a new option to set the project id (GH-36227).
  [#]# IPC
  * Nullability and metadata information for sub-fields of map types is now preserved when deserializing Arrow IPC (GH-35297).
  [#]# Orc
  * The Orc adapter now maps Arrow field metadata to Orc type attributes when writing, and vice-versa when reading (GH-35304).
  [#]# Parquet
  * It is now possible to write additional metadata while a ParquetFileWriter is open (GH-34888).
  * Writing a page index can be enabled selectively per-column (GH-34949). In addition, page header statistics are not written anymore if the page index is enabled for the given column (GH-34375), as the information would be redundant and less efficiently accessed.
  * Parquet writer properties allow specifying the sorting columns (GH-35331). The user is responsible for ensuring that the data written to the file actually complies with the given sorting.
  * CRC computation has been implemented for v2 data pages (GH-35171). It was already implemented for v1 data pages.
  * Writing compliant nested types is now enabled by default (GH-29781). This should not have any negative implication.
  * Attempting to load a subset of an Arrow extension type is now forbidden (GH-20385). Previously, if an extension type’s storage is nested (for example a “Point” extension type backed by a struct<x: float64, y: float64>), it was possible to load selectively some of the columns of the storage type.
  [#]# Substrait
  * Support for various functions has been added: “stddev”, “variance”, “first”, “last” (GH-35247, GH-35506).
  * Deserializing sorts is now supported (GH-32763). However, some features, such as clustered sort direction or custom sort functions, are not implemented.
  [#]# Miscellaneous
  * FieldRef sports additional methods to get a flattened version of nested fields (GH-14946).
    Compared to their non-flattened counterparts, the methods GetFlattened, GetAllFlattened, GetOneFlattened and GetOneOrNoneFlattened combine a child’s null bitmap with its ancestors’ null bitmaps so as to compute the field’s overall logical validity bitmap.
  * In other words, given the struct array [null, {'x': null}, {'x': 5}], FieldRef("x")::Get might return [0, null, 5] while FieldRef("x")::GetFlattened will always return [null, null, 5].
  * Scalar::hash() has been fixed for sliced nested arrays GH-35360.
  * A new floating-point to decimal conversion algorithm exhibits much better precision GH-35576.
  * It is now possible to cast between scalars of different list-like types GH-36309.
* Mon Jun 12 2023 Ben Greiner <code@bnavigator.de>
- Update to 12.0.1
  * [GH-35423] - [C++][Parquet] Parquet PageReader Force decompression buffer resize smaller (#35428)
  * [GH-35498] - [C++] Relax EnsureAlignment check in Acero from requiring 64-byte aligned buffers to requiring value-aligned buffers (#35565)
  * [GH-35519] - [C++][Parquet] Fixing exception handling in parquet FileSerializer (#35520)
  * [GH-35538] - [C++] Remove unnecessary status.h include from protobuf (#35673)
  * [GH-35730] - [C++] Add the ability to specify custom schema on a dataset write (#35860)
  * [GH-35850] - [C++] Don't disable optimization with RelWithDebInfo (#35856)
- Drop cflags.patch -- fixed upstream
* Thu May 18 2023 Ben Greiner <code@bnavigator.de>
- Update to 12.0.0
  * Run-End Encoded Arrays have been implemented and are accessible (GH-32104)
  * The FixedShapeTensor Logical value type has been implemented using ExtensionType (GH-15483, GH-34796)
  [#]# Compute
  * New kernel to convert timestamp with timezone to wall time (GH-33143)
  * Cast kernels are now built into libarrow by default (GH-34388)
  [#]# Acero
  * Acero has been moved out of libarrow into its own shared library, allowing for smaller builds of the core libarrow (GH-15280)
  * Exec nodes now can have a concept of “ordering” and will reject non-sensible plans (GH-34136)
  * New exec nodes: “pivot_longer” (GH-34266), “order_by” (GH-34248) and “fetch” (GH-34059)
  * Breaking Change: Reorder output fields of “group_by” node so that keys/segment keys come before aggregates (GH-33616)
  [#]# Substrait
  * Add support for the round function GH-33588
  * Add support for the cast expression element GH-31910
  * Added API reference documentation GH-34011
  * Added an extension relation to support segmented aggregation GH-34626
  * The output of the aggregate relation now conforms to the spec GH-34786
  [#]# Parquet
  * Added support for DeltaLengthByteArray encoding to the Parquet writer (GH-33024)
  * NaNs are correctly handled now for Parquet predicate push-downs (GH-18481)
  * Added support for reading Parquet page indexes (GH-33596) and writing page indexes (GH-34053)
  * Parquet writer can write columns in parallel now (GH-33655)
  * Fixed incorrect number of rows in Parquet V2 page headers (GH-34086)
  * Fixed incorrect Parquet page null_count when stats are disabled (GH-34326)
  * Added support for reading BloomFilters to the Parquet Reader (GH-34665)
  * Parquet File-writer can now add additional key-value metadata after it has been opened (GH-34888)
  * Breaking Change: The default row group size for the Arrow writer changed from 64Mi rows to 1Mi rows. GH-34280
  [#]# ORC
  * Added support for the union type in ORC writer (GH-34262)
  * Fixed ORC CHAR type mapping with Arrow (GH-34823)
  * Fixed timestamp type mapping between ORC and arrow (GH-34590)
  [#]# Datasets
  * Added support for reading JSON datasets (GH-33209)
  * Dataset writer now supports specifying a function callback to construct the file name in addition to the existing file name template (GH-34565)
  [#]# Filesystems
  * GcsFileSystem::OpenInputFile avoids unnecessary downloads (GH-34051)
  [#]# Other changes
  * Convenience Append(std::optional...) methods have been added to array builders (GH-14863)
  * A deprecated OpenTelemetry header was removed from the Flight library (GH-34417)
  * Fixed crash in “take” kernels on ExtensionArrays with an underlying dictionary type (GH-34619)
  * Fixed bug where the C-Data bridge did not preserve nullability of map values on import (GH-34983)
  * Added support for EqualOptions to RecordBatch::Equals (GH-34968)
  * zstd dependency upgraded to v1.5.5 (GH-34899)
  * Improved handling of “logical” nulls such as with union and RunEndEncoded arrays (GH-34361)
  * Fixed incorrect handling of uncompressed body buffers in IPC reader, added IpcWriteOptions::min_space_savings for optional compression optimizations (GH-15102)
* Mon Apr 03 2023 Andreas Schwab <schwab@suse.de>
- cflags.patch: fix option order to compile with optimisation
- Adjust constraints
* Wed Mar 29 2023 Ben Greiner <code@bnavigator.de>
- Remove gflags-static. It was only needed due to a packaging error with gflags which is about to be fixed in Tumbleweed
- Disable build of the jemalloc memory pool backend
  * It requires every consuming application to LD_PRELOAD libjemalloc.so.2, even when it is not set as the default memory pool, due to static TLS block allocation errors
  * Usage of the bundled jemalloc as a workaround is not desired (gh#apache/arrow#13739)
  * jemalloc does not seem to have a clear advantage over the system glibc allocator: https://ursalabs.org/blog/2021-r-benchmarks-part-1
  * This overrides the default behavior documented in https://arrow.apache.org/docs/cpp/memory.html#default-memory-pool
* Sun Mar 12 2023 Ben Greiner <code@bnavigator.de>
- Update to v11.0.0
  * ARROW-4709 - [C++] Optimize for ordered JSON fields (#14100)
  * ARROW-11776 - [C++][Java] Support parquet write from ArrowReader to file (#14151)
  * ARROW-13938 - [C++] Date and datetime types should autocast from strings
  * ARROW-14161 - [C++][Docs] Improve Parquet C++ docs (#14018)
  * ARROW-14999 - [C++]
Optional field name equality checks for map and list type (#14847) * ARROW-15538 - [C++] Expanding coverage of math functions from Substrait to Acero (#14434) * ARROW-15592 - [C++] Add support for custom output field names in a substrait::PlanRel (#14292) * ARROW-15732 - [C++] Do not use any CPU threads in execution plan when use_threads is false (#15104) * ARROW-16782 - [Format] Add REE definitions to FlatBuffers (#14176) * ARROW-17144 - [C++][Gandiva] Add sqrt function (#13656) * ARROW-17301 - [C++] Implement compute function "binary_slice" (#14550) * ARROW-17509 - [C++] Simplify async scheduler by removing the need to call End (#14524) * ARROW-17520 - [C++] Implement SubStrait SetRel (UnionAll) (#14186) * ARROW-17610 - [C++] Support additional source types in SourceNode (#14207) * ARROW-17613 - [C++] Add function execution API for a preconfigured kernel (#14043) * ARROW-17640 - [C++] Add File Handling Test cases for GlobFile handling in Substrait Read (#14132) * ARROW-17798 - [C++][Parquet] Add DELTA_BINARY_PACKED encoder to Parquet writer (#14191) * ARROW-17825 - [C++] Allow the possibility to write several tables in ORCFileWriter (#14219) * ARROW-17836 - [C++] Allow specifying alignment of buffers (#14225) * ARROW-17837 - [C++][Acero] Create ExecPlan-owned QueryContext that will store a plan's shared data structures (#14227) * ARROW-17859 - [C++] Use self-pipe in signal-receiving StopSource (#14250) * ARROW-17867 - [C++][FlightRPC] Expose bulk parameter binding in Flight SQL (#14266) * ARROW-17932 - [C++] Implement streaming RecordBatchReader for JSON (#14355) * ARROW-17960 - [C++][Python] Implement list_slice kernel (#14395) * ARROW-17966 - [C++] Adjust to new format for Substrait optional arguments (#14415) * ARROW-17975 - [C++] Create at-fork facility (#14594) * ARROW-17980 - [C++] As-of-Join Substrait extension (#14485) * ARROW-17989 - [C++][Python] Enable struct_field kernel to accept string field names (#14495) * ARROW-18008 - [Python][C++] Add 
use_threads to run_substrait_query * ARROW-18051 - [C++] Enable tests skipped by ARROW-16392 (#14425) * ARROW-18095 - [CI][C++][MinGW] All tests exited with 0xc0000139 * ARROW-18113 - [C++] Add RandomAccessFile::ReadManyAsync (#14723) * ARROW-18135 - [C++] Avoid warnings that ExecBatch::length may be uninitialized (#14480) * ARROW-18144 - [C++] Improve JSONTypeError error message in testing (#14486) * ARROW-18184 - [C++] Improve JSON parser benchmarks (#14552) * ARROW-18206 - [C++][CI] Add a nightly build for C++20 compilation (#14571) * ARROW-18235 - [C++][Gandiva] Fix the like function implementation for escape chars (#14579) * ARROW-18249 - [C++] Update vcpkg port to arrow 10.0.0 * ARROW-18253 - [C++][Parquet] Add additional bounds safety checks (#14592) * ARROW-18259 - [C++][CMake] Add support for system Thrift CMake package (#14597) * ARROW-18280 - [C++][Python] Support slicing to end in list_slice kernel (#14749) * ARROW-18282 - [C++][Python] Support step >= 1 in list_slice kernel (#14696) * ARROW-18287 - [C++][CMake] Add support for Brotli/utf8proc provided by vcpkg (#14609) * ARROW-18342 - [C++] AsofJoinNode support for Boolean data field (#14658) * ARROW-18350 - [C++] Use std::to_chars instead of std::to_string (#14666) * ARROW-18367 - [C++] Enable the creation of named table relations (#14681) * ARROW-18373 - Fix component drop-down, add license text (#14688) * ARROW-18377 - MIGRATION: Automate component labels from issue form content (#15245) * ARROW-18395 - [C++] Move select-k implementation into separate module * ARROW-18402 - [C++] Expose DeclarationInfo (#14765) * ARROW-18406 - [C++] Can't build Arrow with Substrait on Ubuntu 20.04 (#14735) * ARROW-18409 - [GLib][Plasma] Suppress deprecated warning in building plasma-glib (#14739) * ARROW-18413 - [C++][Parquet] Expose page index info from ColumnChunkMetaData (#14742) * ARROW-18419 - [C++] Update vendored fast_float (#14817) * ARROW-18420 - [C++][Parquet] Introduce ColumnIndex & OffsetIndex (#14803) * 
ARROW-18421 - [C++][ORC] Add accessor for stripe information in reader (#14806) * ARROW-18427 - [C++] Support negative tolerance in AsofJoinNode (#14934) * ARROW-18435 - [C++][Java] Update ORC to 1.8.1 (#14942) * GH-14869 - [C++] Add Cflags.private defining _STATIC to .pc.in. (#14900) * GH-14920 - [C++][CMake] Add missing -latomic to Arrow CMake package (#15251) * GH-14937 - [C++] Add rank kernel benchmarks (#14938) * GH-14951 - [C++][Parquet] Add benchmarks for DELTA_BINARY_PACKED encoding (#15140) * GH-15072 - [C++] Move the round functionality into a separate module (#15073) * GH-15074 - [Parquet][C++] change 16-bit page_ordinal to 32-bit (#15182) * GH-15096 - [C++] Substrait ProjectRel Emit Optimization (#15097) * GH-15100 - [C++][Parquet] Add benchmark for reading strings from Parquet (#15101) * GH-15151 - [C++] Adding RecordBatchReaderSource to solve an issue in R API (#15183) * GH-15185 - [C++][Parquet] Improve documentation for Parquet Reader column_indices (#15184) * GH-15199 - [C++][Substrait] Allow AGGREGATION_INVOCATION_UNSPECIFIED as valid invocation (#15198) * GH-15200 - [C++] Created benchmarks for round kernels. 
(#15201) * GH-15216 - [C++][Parquet] Parquet writer accepts RecordBatch (#15240) * GH-15226 - [C++] Add DurationType to hash kernels (#33685) * GH-15237 - [C++] Add ::arrow::Unreachable() using std::string_view (#15238) * GH-15239 - [C++][Parquet] Parquet writer writes decimal as int32/64 (#15244) * GH-15290 - [C++][Compute] Optimize IfElse kernel AAS/ASA case when the scalar is null (#15291) * GH-33607 - [C++] Support optional additional arguments for inline visit functions (#33608) * GH-33657 - [C++] arrow-dataset.pc doesn't depend on parquet.pc without ARROW_PARQUET=ON (#33665) * PARQUET-2179 - [C++][Parquet] Add a test for skipping repeated fields (#14366) * PARQUET-2188 - [parquet-cpp] Add SkipRecords API to RecordReader (#14142) * PARQUET-2204 - [parquet-cpp] TypedColumnReaderImpl::Skip should reuse scratch space (#14509) * PARQUET-2206 - [parquet-cpp] Microbenchmark for ColumnReader ReadBatch and Skip (#14523) * PARQUET-2209 - [parquet-cpp] Optimize skip for the case that number of values to skip equals page size (#14545) * PARQUET-2210 - [C++][Parquet] Skip pages based on header metadata using a callback (#14603) * PARQUET-2211 - [C++] Print ColumnMetaData.encoding_stats field (#14556) - Remove unused python3-arrow package declaration * Add options as recommended for python support - Provide test data for unittests - Don't use system jemalloc but bundle it in order to avoid static TLS errors in consuming packages like python-pyarrow * gh#apache/arrow#13739 * Sun Aug 28 2022 Stefan Brüns <stefan.bruens@rwth-aachen.de> - Revert ccache change, using ccache in a pristine buildroot just slows down OBS builds (use --ccache for local builds). - Remove unused gflags-static-devel dependency. 
* Mon Aug 22 2022 John Vandenberg <jayvdb@gmail.com>
- Speed up builds with ccache
* Sat Aug 06 2022 Stefan Brüns <stefan.bruens@rwth-aachen.de>
- Update to v9.0.0
  No (current) changelog provided
- Spec file cleanup:
  * Remove lots of duplicate, unused, or wrong build dependencies
  * Do not package outdated Readmes and Changelogs
- Enable tests, disable ones requiring external test data
/usr/bin/parquet-dump-arrow-statistics
/usr/bin/parquet-dump-footer
/usr/bin/parquet-dump-schema
/usr/bin/parquet-reader
/usr/bin/parquet-scan
/usr/share/doc/packages/apache-parquet-utils
/usr/share/doc/packages/apache-parquet-utils/README.md
/usr/share/licenses/apache-parquet-utils
/usr/share/licenses/apache-parquet-utils/LICENSE.txt
/usr/share/licenses/apache-parquet-utils/NOTICE.txt
/usr/share/licenses/apache-parquet-utils/header
Generated by rpm2html 1.8.1
Fabrice Bellet, Thu Oct 23 22:58:29 2025